<![CDATA[eLyKseeR cryptographic data archive]]>
https://elykseer.github.io/documentation/
Ghost 5.82
Mon, 15 Jul 2024 22:32:13 GMT
<![CDATA[Distribute encrypted chunk files]]>

]]>
https://elykseer.github.io/documentation/distribute-encrypted-chunk-files/
Mon, 15 Jul 2024 22:30:34 GMT

When backing up files, their data are spread over assemblies and encrypted. These assemblies are broken up into chunk files of 256 kB in size. These chunk files can be distributed to different storage media for long-term archiving. A copy of chunks can be made to a number of targets:

  • AWS S3 - a commercially available storage solution in the cloud
  • MinIO - a self-hosted S3 storage system that can be extended to PB storage clusters
  • Filesystems - like external disks, USB sticks, or network attached storage (NAS)

A good practice is not to put all eggs in the same basket, but to distribute the chunks to several storage media in an overlapping way so as to keep some redundancy.

lxr_distribute

The program "lxr_distribute" accepts a number of arguments and will distribute chunks to storage targets, or will fetch chunks from these.

lxr_distribute:                      
  -v verbose output
  -d sets direction of copy: GET | PUT (default)
  -x sets path for encrypted chunks
  -n sets number of chunks (16-256) per assembly
  -i sets own identifier
  -c sets path for sink configuration file
  -a sets assembly id
  -help  Display this list of options
  --help  Display this list of options
  • -d sets the direction of the transfer: either GET or PUT chunks; PUT is the default
  • -x the path to the local directory that holds the chunks
  • -n the number of chunks per assembly
  • -i our own identifier
  • -c points to a configuration file that describes the sinks (targets) and how to access them
  • -a indicates the assembly whose chunks need to be transferred

Note: the chunk identifiers are computed from the provided information: assembly identifier, chunks per assembly, and own identifier. If any one of these is entered incorrectly, the computed identifiers will not match any known chunk.

example call:

lxr_distribute -v -d PUT -x ../elykseer.chunks -n 16 -i mytest \
-c sinks.json -a "c74b..95f4" 8 8 16

with the definitions of the targets in sinks.json:

{
  "version": "1.0.0",
  "sinks": [
    {
        "type": "S3",
        "name": "s3_minio",
        "description": "minio storage cluster",
        "credentials": {
            "access-key": "minioadmin",
            "secret-key": "s3cr3t"
        },
        "access": {
            "bucket": "lxr",
            "prefix": "lxr",
            "host": "localhost",
            "port": "9000",
            "protocol": "https"
        }
    },
    {
        "type": "S3",
        "name": "s3_aws",
        "description": "AWS S3 storage",
        "credentials": {
            "access-key": "AKIA4VLHJ8YV6EXAMPLE",
            "secret-key": "s3cr3t"
        },
        "access": {
            "bucket": "mybucket",
            "prefix": "lxr",
            "host": "mybucket.s3.eu-west-3.amazonaws.com",
            "port": "443",
            "protocol": "https"
        }
    },
    {
        "type": "FS",
        "name": "fs_copy",
        "description": "filesystem copy",
        "credentials": {
            "user": "*",
            "group": "root",
            "permissions": "640"
        },
        "access": {
            "basepath": "/data/secure_stick"
        }
    }
  ]
}

The last arguments on the command line define how many chunks are copied to which sink. The targets are in the same order as in the sinks.json file. In the above example we copy the first eight chunks to MinIO, then the next eight chunks to AWS S3, and all 16 chunks to the filesystem path /data/secure_stick. Overall this gives us 2x redundancy, and we copy only half of the chunks to each cloud storage.
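
The allocation described here can be sketched in plain POSIX shell. The helper plan_distribution is hypothetical and not part of eLyKseeR. It assumes that chunks are numbered from 1, that the counts are consumed in sink order, and that numbering wraps around to the first chunk once the last chunk of the assembly has been assigned, which is how the example with the counts 8 8 16 reads.

```shell
#!/bin/sh
# Hypothetical helper, not part of eLyKseeR. Assumptions: chunks are
# numbered from 1, counts are consumed in sink order, and numbering wraps
# around after the last chunk of the assembly.
plan_distribution() {
    nchunks=$1
    shift
    next=1      # next chunk number to assign
    sink=1      # position of the sink in sinks.json
    total=0     # total number of chunk copies made
    for count in "$@"; do
        list=""
        i=0
        while [ "$i" -lt "$count" ]; do
            list="${list}${list:+ }${next}"
            next=$(( next % nchunks + 1 ))
            i=$(( i + 1 ))
        done
        echo "sink ${sink}: ${list}"
        total=$(( total + count ))
        sink=$(( sink + 1 ))
    done
    echo "copies: ${total}, chunks: ${nchunks}"
}

# The example call: 16 chunks per assembly, counts 8 8 16
plan_distribution 16 8 8 16
```

For 16 chunks and the counts 8 8 16 this lists chunks 1 to 8 for the first sink, 9 to 16 for the second, and all 16 for the third: 32 copies of 16 chunks, so every chunk exists twice.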

]]>
<![CDATA[Example using binary releases]]>
]]>
https://elykseer.github.io/documentation/example-using-binary-releases/
Mon, 24 Jun 2024 12:22:03 GMT

Starting with version v0.9.12 we are releasing precompiled binaries for various platforms and architectures.
See our GitHub site: https://github.com/eLyKseeR/elykseer-ml/releases

Preparations

Extract the zip of downloaded binaries to a directory and point the environment variable at them:

export LXR_BINARIES=${HOME}/Downloads/Darwin_arm64

(this might change depending on your platform)

Only on macOS: go into this directory and set up the binaries explicitly with the following commands. This is necessary because macOS by default prevents downloaded binaries from being executed.

cd ${LXR_BINARIES}
sh ./setup.sh

(only required on macOS)

Also, define where to store meta data and encrypted chunks:

export LXR_DB=${HOME}/elykseer.db

export LXR_CHUNKS=${HOME}/elykseer.chunks

First, we need to create directories that will hold the meta data and the encrypted chunks from backups.

mkdir -v ${LXR_DB}

mkdir -v ${LXR_CHUNKS}

And, next we create irmin's configuration file and initialise the database:

cat << EOF > irmin.yml
root: ${LXR_DB}
store: git
contents: json-value
EOF

${LXR_BINARIES}/irmin init

Test run

Preparations

Let's create a few files with random data and remember their checksums:

MYID=test1
dd if=/dev/random of=test1M bs=1M count=1
dd if=/dev/random of=test4M bs=1M count=4
dd if=/dev/random of=test8M bs=1M count=8
md5sum test[148]M > md5sums || md5 test[148]M > md5sums

Backup

Then, encrypt and backup these files:

${LXR_BINARIES}/lxr_backup.exe -v -x ${LXR_CHUNKS} -d ${LXR_DB} -n 16 -i $MYID test1M test4M test8M

The output will look like this:

INFO finalising assembly 161e4e0d5096cb8b5cfffcbb2b437c7eb9a612a9424a6d03659e485b812ed96a with apos = 4161552
INFO encrypted assembly: 161e4e0d5096cb8b5cfffcbb2b437c7eb9a612a9424a6d03659e485b812ed96a
INFO block backup succeeded of file: test8M
INFO finalising assembly 318a57be77e51a00a4561ebc82691c97ee9ba894379a7b177cb215a29551b4a3 with apos = 4161552
INFO encrypted assembly: 318a57be77e51a00a4561ebc82691c97ee9ba894379a7b177cb215a29551b4a3
INFO block backup succeeded of file: test4M
INFO finalising assembly 755c4033d76f8a0a669fee253db850c7dc47502179bde27d7f71db17d3019638 with apos = 4161552
INFO encrypted assembly: 755c4033d76f8a0a669fee253db850c7dc47502179bde27d7f71db17d3019638
INFO block backup succeeded of file: test1M
INFO finalising assembly d7b1f4bf29cfcf2a6ce994ebece3ecb18127dab027a94ec7125418ecffbc195d with apos = 1146896
INFO encrypted assembly: d7b1f4bf29cfcf2a6ce994ebece3ecb18127dab027a94ec7125418ecffbc195d
done.
    total allocated: 9997965.000000

We can inspect a file's meta data using its filehash:

FHASH=$(${LXR_BINARIES}/lxr_filehash.exe -f test1M -i ${MYID} | cut -d ' ' -f 2)

${LXR_BINARIES}/irmin get ${MYID}/relfiles/${FHASH:4:2}/${FHASH} | jq -r .
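
The key passed to irmin get is built from the identifier and the filehash. The expansion ${FHASH:4:2} is bash substring syntax: it takes two characters starting at offset 4, i.e. the fifth and sixth characters of the hash. A minimal sketch of the same construction in POSIX shell follows; the function name relfiles_key is hypothetical, and the filehash is a sample value taken from the Docker example in this documentation.

```shell
#!/bin/sh
# Hypothetical helper, not part of eLyKseeR: builds the irmin key
# <id>/relfiles/<two characters of the hash>/<hash> for a given filehash.
relfiles_key() {
    myid=$1
    fhash=$2
    # characters 5 and 6 (1-based) correspond to bash's ${fhash:4:2}
    bucket=$(printf '%s' "$fhash" | cut -c5-6)
    printf '%s/relfiles/%s/%s\n' "$myid" "$bucket" "$fhash"
}

# sample identifier and filehash
relfiles_key test 726563f83800d7c5d8ab76fbc71e2d38ba060df36a417eb9615bd3759380a699
```

With this sample hash the two-character directory is 63, so the key starts with test/relfiles/63/.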

Restore

Let's recreate the files from the encrypted chunks with the help of meta data:

PREVPWD=$(pwd)
TEMPORARY=$(mktemp -d)
${LXR_BINARIES}/lxr_restore.exe -v -x ${LXR_CHUNKS} -d ${LXR_DB} -n 16 -i $MYID -o ${TEMPORARY} test1M test4M test8M
cd ${TEMPORARY}
md5sum -c ${PREVPWD}/md5sums
cd ${PREVPWD}

The output will look like this:

INFO restoring file test8M from 256 blocks
INFO restoring file test4M from 128 blocks
INFO restoring file test1M from 32 blocks
  restored 3 files with 13631488 bytes in total
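
The block counts in this output are consistent with a block size of 32768 bytes (32 KiB). Treat that size as an assumption inferred from the numbers shown, not as a documented constant. The following sketch recomputes the block counts and the total for the three test files; the function name blocks_for is hypothetical.

```shell
#!/bin/sh
BLOCKSIZE=32768   # assumption: inferred from the output, not a documented constant

# number of blocks needed for a file of the given size in bytes (rounded up)
blocks_for() {
    echo $(( ($1 + BLOCKSIZE - 1) / BLOCKSIZE ))
}

total=0
for size in 1048576 4194304 8388608; do
    echo "${size} bytes -> $(blocks_for "$size") blocks"
    total=$(( total + size ))
done
echo "total: ${total} bytes"
```

Under this assumption the 1, 4, and 8 MiB files need 32, 128, and 256 blocks, and the sizes add up to the 13631488 bytes reported by lxr_restore.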

The restored files can then be validated like this:

cd ${TEMPORARY}
md5sum -c ${PREVPWD}/md5sums || { md5 test[148]M | diff - ${PREVPWD}/md5sums && echo OK || echo failed; }
cd ${PREVPWD}

The above command should output "OK" to indicate that the checksums of the restored files match the ones previously recorded.

]]>
<![CDATA[Executable eLyKseeR Docker image]]>
https://elykseer.github.io/documentation/executable-elykseer-docker-image/
Fri, 17 May 2024 19:23:00 GMT

This Docker image contains just the compiled command line programs of eLyKseeR to run backups and restore files on your computer.

Visit Docker Hub: https://hub.docker.com/r/codieplusplus/elykseer-ml-binaries

Pull the image

There are images for Intel/AMD CPUs:

ARCH=amd64
docker pull codieplusplus/elykseer-ml-binaries:${ARCH}

or for ARM (aarch64) processors, as found in the newer Macs (M1, M2, M3):

ARCH=arm64
docker pull codieplusplus/elykseer-ml-binaries:${ARCH}
$ARCH indicates the platform (arm64 | amd64)

Preparations

We are using Docker volumes to store chunks and meta data independent of the container's lifecycle. Create these volumes, if not already present:

docker volume list

docker volume create elykseer_db
docker volume create elykseer_chunks

Running a backup

Let's start with defining some settings:

SRCDIR=/Users/alex/Documents
TGTDIR=/tmp/test_restore
mkdir -vp ${TGTDIR}
$SRCDIR points to a directory that will be mounted read-only into the container at /data; this is the data to back up
$TGTDIR points to a directory that will contain the restored files

And, then run a container that will not be persisted after we leave it:

docker run -it --rm \
  -v elykseer_db:/home/coq/elykseer.db \
  -v elykseer_chunks:/home/coq/elykseer.chunks \
  --mount type=bind,source="${SRCDIR}",target=/data,readonly \
  --mount type=bind,source="${TGTDIR}",target=/restore \
  codieplusplus/elykseer-ml-binaries:${ARCH}

For a first run, check that the permissions on these directories in the container are similar to these:

coq@1ab703a5a27c:~$ ls -ld elykseer.chunks elykseer.db
drwxrwsr-x 210 root coq 4096 May 16 22:15 elykseer.chunks
drwxrwsr-x   3 root coq 4096 May 15 09:44 elykseer.db


# if necessary change with:
sudo chgrp coq elykseer.chunks elykseer.db
sudo chmod g+ws elykseer.chunks elykseer.db

We are located in the home directory of user coq where we set the identifier of our backups:

coq@c7d5c065ca31:~$ MYID=test

Let's start a shallow backup of only the files directly under /data:

coq@c7d5c065ca31:~$ lxr_backup -v -x ${HOME}/elykseer.chunks -d ${HOME}/elykseer.db -n 16 -i $MYID -D /data/
done.
prepend '-R' to the argument '-D' to have the backup recurse into subdirectories

List meta data

A backup outputs encrypted chunks but also meta data which we can inspect:

coq@c7d5c065ca31:~$ irmin list
DIR test

coq@c7d5c065ca31:~$ irmin list $MYID
DIR relfiles
DIR relkeys

coq@c7d5c065ca31:~$ irmin list $MYID/relfiles
DIR 07
DIR 0e
DIR 2f
DIR 3a
DIR 3e
DIR 63
DIR aa
DIR fa
DIR fc

coq@c7d5c065ca31:~$ irmin list $MYID/relfiles/63
FILE 726563f83800d7c5d8ab76fbc71e2d38ba060df36a417eb9615bd3759380a699

Let's have a look at one of them using jq to format the output:

coq@c7d5c065ca31:~$ irmin get test/relfiles/63/726563f83800d7c5d8ab76fbc71e2d38ba060df36a417eb9615bd3759380a699 | jq
{
  "version": {
    "major": "0",
    "minor": "9",
    "build": "11"
  },
  "fileinformation": {
    "fname": "/data/something.dat",
    "fhash": "726563f8..9380a699",
    "fsize": "64798",
    "fowner": "1000",
    "fpermissions": "644",
    "fmodified": "2022-07-31 17:51:59",
    "fchecksum": "9e76dd4b..2973fe25"
  },
  "blocks": [
    {
      "blockid": "1",
      "bchecksum": "b0ebf725..b4ff33d5",
      "blocksize": "32768",
      "filepos": "0",
      "blockaid": "35d5411f..5653bff3",
      "blockapos": "182033"
    },
    {
      "blockid": "2",
      "bchecksum": "82f178b5..e533c0bf",
      "blocksize": "32030",
      "filepos": "32768",
      "blockaid": "35d5411f..5653bff3",
      "blockapos": "150003"
    }
  ]
}

The file has been backed up in two blocks. The checksums of all blocks and of the original file are recorded so that a subsequent backup can decide which parts of the file have changed.
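
The block entries can be cross-checked against the file size: each block should start where the previous one ended, and the block sizes should add up to fsize. The following sketch does this for values copied by hand from the meta data. The function check_blocks is hypothetical and not part of eLyKseeR, and it assumes the blocks are listed in file order.

```shell
#!/bin/sh
# Hypothetical check, not part of eLyKseeR: each argument after the file
# size is a "filepos:blocksize" pair copied from the meta data.
# Assumption: the pairs are given in file order.
check_blocks() {
    fsize=$1
    shift
    offset=0
    for pair in "$@"; do
        filepos=${pair%%:*}
        blocksize=${pair##*:}
        if [ "$filepos" -ne "$offset" ]; then
            echo "gap or overlap at file position ${filepos}"
            return 1
        fi
        offset=$(( offset + blocksize ))
    done
    if [ "$offset" -ne "$fsize" ]; then
        echo "blocks cover ${offset} bytes, expected ${fsize}"
        return 1
    fi
    echo "ok: ${offset} bytes in $# blocks"
}

# values from the meta data: fsize 64798, two blocks
check_blocks 64798 0:32768 32768:32030
```

For this file, 32768 + 32030 = 64798 bytes, which matches fsize.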

Restore a file

From the encrypted chunks and the saved meta data this file can be restored:

coq@c7d5c065ca31:~$ lxr_restore -v -x ${HOME}/elykseer.chunks -d ${HOME}/elykseer.db -i $MYID -o /restore/ /data/something.dat
INFO restoring file /data/something.dat from 2 blocks
  restored 1 files with 64798 bytes in total

done.

]]>
<![CDATA[Docker development image]]>
https://elykseer.github.io/documentation/docker-development-image/
Fri, 17 May 2024 09:18:14 GMT

Setting up a development environment with Coq can take some time. We have prepared a Docker image one can use for development of eLyKseeR and running proofs in Coq for formal verification of the system. This also gives us a stable environment to run reproducible tests.

The image also contains an OCaml compiler with all necessary libraries preinstalled. This makes it possible to extract code from Coq to OCaml and to compile the command line programs.

Versions installed:

  • coq 8.18.0
  • ocaml 5.1.1

Visit Docker Hub: https://hub.docker.com/r/codieplusplus/elykseer-ml

Pull the image

There are images for Intel/AMD CPUs:

docker pull codieplusplus/elykseer-ml:amd64

or for ARM (aarch64) processors, as found in the newer Macs (M1, M2, M3):

docker pull codieplusplus/elykseer-ml:arm64

Running an eLyKseeR container

Let's start with running a container that will not be persisted after we leave it:

docker run -it --rm codieplusplus/elykseer-ml:arm64

We are located in the checkout of the project where we can run git commands:

coq@9f37ae23a26b:~/elykseer-ml.git$ git fetch
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 19 (delta 11), reused 19 (delta 11), pack-reused 0
Unpacking objects: 100% (19/19), 4.82 KiB | 329.00 KiB/s, done.
From https://github.com/eLyKseeR/elykseer-ml
   dffe907..1425c92  main              -> origin/main

A run with Coq to verify the program:

coq@9f37ae23a26b:~/elykseer-ml.git$ make clean
CLEAN
coq@9f37ae23a26b:~/elykseer-ml.git$ make
COQDEP VFILES
COQC theories/Cstdio.v
COQC theories/Filesystem.v
COQC theories/Conversion.v
COQC theories/Nchunks.v
COQC theories/Configuration.v
COQC theories/Filesupport.v
COQC theories/Utilities.v
COQC theories/Assembly.v
COQC theories/Store.v
COQC theories/Environment.v
COQC theories/AssemblyCache.v
COQC theories/Processor.v
COQC theories/Version.v
COQC theories/MakeML.v

If no error is output, Coq has checked all proofs successfully. This step also extracted the OCaml code, which we now compile and install so that the programs can be executed.

dune build && dune install --prefix ${HOME}/.local

Run tests

Now, we change to the home directory and prepare the setup for testing:

coq@9f37ae23a26b:~/elykseer-ml.git$ cd

coq@9f37ae23a26b:~$ ./setup.sh 
mkdir: created directory '/home/coq/elykseer.chunks'
mkdir: created directory '/home/coq/elykseer.db'

all setup.

The test run uses files of 1, 4, and 8 megabytes consisting of random data. It first creates a backup of these, which outputs a number of chunk files in the directory elykseer.chunks and meta data in elykseer.db. Then it restores them and compares their checksums to those of the original files.

coq@9f37ae23a26b:~$ ./run_test.sh 
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00975533 s, 107 MB/s
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0218318 s, 192 MB/s
8+0 records in
8+0 records out
8388608 bytes (8.4 MB, 8.0 MiB) copied, 0.0268092 s, 313 MB/s
test1M: OK
test4M: OK
test8M: OK
done.
get fchecksum: d4a2d5ce52f8675cbdebaaf13bbc6f41e5d5ebc3d200a0c3899d4017eaf644be
get fblocks: d4a2d5ce52f8675cbdebaaf13bbc6f41e5d5ebc3d200a0c3899d4017eaf644be
get fchecksum: 21994160f4af06b068e408585ad1f84bcf743455fed4d5ebfc50ee3d74672cec
get fblocks: 21994160f4af06b068e408585ad1f84bcf743455fed4d5ebfc50ee3d74672cec
get fchecksum: 7f4c347c34d4aafc55e23eb5642e0311dff57d92122b5e91d573c3f4882b1c31
get fblocks: 7f4c347c34d4aafc55e23eb5642e0311dff57d92122b5e91d573c3f4882b1c31
FILE 7f4c347c34d4aafc55e23eb5642e0311dff57d92122b5e91d573c3f4882b1c31
  restoring 4194304 bytes in file 'test4M' from 128 blocks
     -> 4194304 bytes
  restoring 8388608 bytes in file 'test8M' from 256 blocks
     -> 8388608 bytes
  restoring 1048576 bytes in file 'test1M' from 32 blocks
     -> 1048576 bytes
  restored 3 files with 13631488 bytes in total
test1M: OK
test4M: OK
test8M: OK

Done.

]]>
<![CDATA[What are the use cases?]]>
https://elykseer.github.io/documentation/use-case/
Sat, 13 Apr 2024 19:45:39 GMT

backup

File backup will output meta data, i.e. encryption keys, and encrypted file content in chunk files.

restore

Pass in encryption keys and other meta data, then the software will restore file content by assembling chunks and decrypting them.

verify

Given the meta data from file backup, the system can verify the validity of the encrypted data.

distribute

A file backup creates chunk files containing the encrypted data. These chunks can be stored redundantly on different servers or storage media to ensure a successful file restore in the future.

share

Sharing the meta data with somebody else allows that user to restore the original file content.

]]>
<![CDATA[Welcome]]>
]]>
https://elykseer.github.io/documentation/welcome/
Sat, 13 Apr 2024 17:17:42 GMT

This is the documentation site for https://github.com/eLyKseeR - a cryptographic data archive.

I was inspired by this article: https://zzamboni.org/post/hosting-a-ghost-blog-in-github/ which describes how to dump a "ghost" site to GitHub Pages.

Content here will be kept up to date, which means it can also disappear once it is outdated. New material will be added continually, though at irregular intervals.

A good way to follow the progress is to connect on Mastodon

eLyKseeR_FOSS (@elykseer@ioc.exchange)
]]>
<![CDATA[Source code]]>
https://elykseer.github.io/documentation/source-code/
Sat, 13 Apr 2024 16:53:26 GMT

The code for eLyKseeR is organised in a few repositories:

elykseer-crypto

GitHub: eLyKseeR/elykseer-crypto

This library provides implementations for the cryptographic primitives that we require. It is written in C++ and interfaces with Crypto++ or OpenSSL.

The library provides bindings for C, C#, and OCaml.

elykseer-ml

GitHub: eLyKseeR/elykseer-ml

This repository hosts the formal specification of the eLyKseeR software in Coq/Rocq. Source code is extracted to OCaml and command line tools are built as reference implementations.

elykseer-cpp

GitHub: eLyKseeR/elykseer-cpp

Based on the library elykseer-crypto this code implements the formally specified eLyKseeR software (elykseer-ml) in C++.

]]>
<![CDATA[Formal verification]]>
https://elykseer.github.io/documentation/formal-verification/
Sat, 13 Apr 2024 00:52:20 GMT

There is a lot to prove. A proof establishes a truth that holds in all circumstances.

I am using Coq (soon to be renamed to Rocq) for the formalisation of the software eLyKseeR. This means that the end result, the software, will behave as specified by the formalisation. And, this can be proved.

in short..

The most important property is that a program or function will eventually finish its computation once started. All functions in Coq must be written in a way that allows their termination to be proven. Otherwise, the theorem prover will reject them.

Further properties can be stated as lemmas or theorems in Coq, which will then verify them.

💡
to be done: provide illustrative examples of code
]]>
<![CDATA[Coming soon]]>
]]>
https://elykseer.github.io/documentation/coming-soon/
Sat, 13 Apr 2024 00:38:00 GMT

We are working on:

  • filesystem access (fuse?)
  • GUI - a web-based user interface
]]>