Distribute encrypted chunk files

drops on water surface
Photo by Jan-Willem / Unsplash

When files are backed up, their data is spread over assemblies and encrypted. Each assembly is broken up into chunk files of 256 kB. These chunk files can be distributed to different storage media for long-term archiving. Copies of the chunks can be made to a number of targets:

  • AWS S3 - a commercially available storage solution in the cloud
  • MinIO - a self-hosted S3 storage system that can be scaled up to petabyte-sized storage clusters
  • Filesystems - like external disks, USB sticks, or network attached storage (NAS)

A good practice is not to put all your eggs in one basket, but to distribute the chunks across several storage media in an overlapping way so as to keep a certain redundancy.

lxr_distribute

The program "lxr_distribute" accepts a number of arguments and either distributes chunks to storage targets or fetches chunks from them.

lxr_distribute:                      
  -v verbose output
  -d sets direction of copy: GET | PUT (default)
  -x sets path for encrypted chunks
  -n sets number of chunks (16-256) per assembly
  -i sets own identifier
  -c sets path for sink configuration file
  -a sets assembly id
  -help  Display this list of options
  --help  Display this list of options
  • -d sets the direction of the transfer: either GET or PUT chunks; PUT is the default
  • -x the path to the local directory that holds the chunks
  • -n the number of chunks per assembly
  • -i our own identifier
  • -c points to a configuration file that describes the sinks (targets) and how to access them
  • -a identifies the assembly whose chunks are to be transferred
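
To get a feel for what the -n option means in terms of storage, the following minimal sketch computes the size of one assembly from the number of chunks. It assumes that "256 kB" means 256 * 1024 bytes; the function name assembly_size_bytes is made up for this illustration and is not part of lxr_distribute.

```python
# Assumption: a chunk of "256 kB" is 256 * 1024 bytes.
CHUNK_SIZE_BYTES = 256 * 1024


def assembly_size_bytes(n_chunks: int) -> int:
    """Return the total size in bytes of one assembly made of n_chunks chunks."""
    # The help text of lxr_distribute gives 16-256 as the valid range for -n.
    if not 16 <= n_chunks <= 256:
        raise ValueError("n_chunks must be between 16 and 256")
    return n_chunks * CHUNK_SIZE_BYTES


if __name__ == "__main__":
    for n in (16, 64, 256):
        mib = assembly_size_bytes(n) // (1024 * 1024)
        print(f"-n {n}: {mib} MiB per assembly")
```

Under this assumption an assembly holds between 4 MiB (-n 16) and 64 MiB (-n 256) of encrypted data.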

Note: the chunk identifiers are computed from the provided information: assembly identifier, chunks per assembly, and own identifier. Entering any one of these incorrectly results in unknown chunk identifiers.

example call:

lxr_distribute -v -d PUT -x ../elykseer.chunks -n 16 -i mytest \
-c sinks.json -a "c74b..95f4" 8 8 16

with the definitions of the targets in sinks.json:

{
  "version": "1.0.0",
  "sinks": [
    {
        "type": "S3",
        "name": "s3_minio",
        "description": "minio storage cluster",
        "credentials": {
            "access-key": "minioadmin",
            "secret-key": "s3cr3t"
        },
        "access": {
            "bucket": "lxr",
            "prefix": "lxr",
            "host": "localhost",
            "port": "9000",
            "protocol": "https"
        }
    },
    {
        "type": "S3",
        "name": "s3_aws",
        "description": "AWS S3 storage",
        "credentials": {
            "access-key": "AKIA4VLHJ8YV6EXAMPLE",
            "secret-key": "s3cr3t"
        },
        "access": {
            "bucket": "mybucket",
            "prefix": "lxr",
            "host": "mybucket.s3.eu-west-3.amazonaws.com",
            "port": "443",
            "protocol": "https"
        }
    },
    {
        "type": "FS",
        "name": "fs_copy",
        "description": "filesystem copy",
        "credentials": {
            "user": "*",
            "group": "root",
            "permissions": "640"
        },
        "access": {
            "basepath": "/data/secure_stick"
        }
    }
  ]
}
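
Before running a transfer it can help to check that the configuration file parses and to see the sinks in the order in which they appear. The sketch below does this with Python's standard json module. The helper load_sinks and the set of required keys are assumptions derived from the example above, not a schema published by the tool; the embedded JSON is a shortened copy of the example.

```python
import json

# Shortened copy of the example configuration (two of the three sinks).
EXAMPLE = """
{
  "version": "1.0.0",
  "sinks": [
    {"type": "S3", "name": "s3_minio", "description": "minio storage cluster",
     "credentials": {"access-key": "minioadmin", "secret-key": "s3cr3t"},
     "access": {"bucket": "lxr", "prefix": "lxr", "host": "localhost",
                "port": "9000", "protocol": "https"}},
    {"type": "FS", "name": "fs_copy", "description": "filesystem copy",
     "credentials": {"user": "*", "group": "root", "permissions": "640"},
     "access": {"basepath": "/data/secure_stick"}}
  ]
}
"""

# Assumption: every sink carries these keys, as in the example above.
REQUIRED_KEYS = ("type", "name", "credentials", "access")


def load_sinks(text: str) -> list[dict]:
    """Parse the configuration text and return the sinks in file order."""
    config = json.loads(text)
    sinks = config["sinks"]
    for position, sink in enumerate(sinks):
        missing = [key for key in REQUIRED_KEYS if key not in sink]
        if missing:
            raise ValueError(f"sink #{position} is missing keys: {missing}")
    return sinks


sinks = load_sinks(EXAMPLE)

if __name__ == "__main__":
    for position, sink in enumerate(sinks):
        print(position, sink["type"], sink["name"])
```

To check a real file, read it with open("sinks.json").read() and pass the text to load_sinks.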

The last arguments on the command line define how many chunks are copied to which sink. The targets are in the same order as in the sinks.json file. In the above example we copy the first eight chunks to MinIO, then the next eight chunks to AWS S3, and all 16 chunks to the filesystem path /data/secure_stick. Overall this gives us 2x redundancy, and only half of the chunks are copied to each cloud storage.
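
The mapping from the trailing numbers to chunks can be sketched as follows. This is only a model of the behaviour described above: it assumes that chunks are handed out from a running position that wraps around at the end of the assembly, and it numbers chunks from 0 for convenience. The functions plan_distribution and copies_per_chunk are made up for this illustration.

```python
def plan_distribution(n_chunks: int, counts: list[int]) -> list[list[int]]:
    """Return, for each sink position, the chunk indices it would receive.

    Assumption: chunks are assigned from a running cursor that wraps around
    after the last chunk of the assembly.
    """
    plan = []
    cursor = 0
    for count in counts:
        plan.append([(cursor + k) % n_chunks for k in range(count)])
        cursor = (cursor + count) % n_chunks
    return plan


def copies_per_chunk(n_chunks: int, plan: list[list[int]]) -> list[int]:
    """Count on how many sinks each chunk ends up."""
    copies = [0] * n_chunks
    for indices in plan:
        for index in set(indices):  # a sink stores each chunk at most once
            copies[index] += 1
    return copies


# The example call: 16 chunks per assembly, counts 8 8 16.
plan = plan_distribution(16, [8, 8, 16])
copies = copies_per_chunk(16, plan)

if __name__ == "__main__":
    for sink_name, indices in zip(["s3_minio", "s3_aws", "fs_copy"], plan):
        print(f"{sink_name}: chunks {indices[0]}..{indices[-1]}")
    print("minimum copies of any chunk:", min(copies))
```

For the example call every chunk ends up on exactly two sinks, which matches the 2x redundancy stated above. A plan whose minimum count is 0 would leave some chunks without any copy.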