A command-line utility to download HSDS domains from an S3 bucket

methodpark/hss3dump

hss3dump - Dump HSDS Domains to local filesystem

Important: hss3dump is still in an early stage of development, so results may vary. It has only been tested with small datasets.

hss3dump is a command-line utility that lists and/or replicates one or more HSDS domains from an S3 bucket to your local filesystem. The data is laid out so that the target directory can serve as the root directory of a local HSDS instance, which in turn allows h5 files to be restored with h5pyd's hsget.

Additionally, if your S3 bucket has versioning enabled and its lifecycle configuration does not delete old versions, the tool can fetch specific S3 object versions, which allows HSDS domains to be restored to an earlier state.

Installation

go install github.com/methodpark/hss3dump@latest

Usage

Run hss3dump with the -h flag to print usage information:

$ hss3dump -h
usage: hss3dump [OPTIONS] BUCKET DOMAIN...

Hss3dump downloads one or more HSDS domains from an S3 bucket, storing them on
the local filesystem in such a way that the target directory can be used as the
root directory for a local HSDS deployment.

It can restore different states of the target domain based on the versions
available in the S3 bucket. If an RFC3339 timestamp is supplied with the -b
flag, hss3dump will download the most recent versions of a domain's files that
are older or equal to the supplied time.

Options:
  -b string
        Return the first version of the domain before the given RFC3339 timestamp.
  -h    Print this command information.
  -l    Output a list with all available file versions of each domain's files.
  -r string
        Choose the root directory of the local HSDS filesystem. (default ".")

Fetching Most Recent Data

To fetch the most recent version of an HSDS domain called home/user/domain.h5 from an S3 bucket called hsds-bucket and dump it to the current directory, run the following command:

$ hss3dump hsds-bucket home/user/domain.h5

Supplying a Different Target Directory

The directory to which files are written can be changed by specifying the root directory with -r. To replicate the domain above to the directory /var/db/hsds_data, run:

$ hss3dump -r /var/db/hsds_data hsds-bucket home/user/domain.h5

Restoring Previous Domain Versions

If we want to restore a previous version of a domain, we have to take a look at the available versions first. Hss3dump makes this easy with its -l flag. Assuming we have accidentally deleted data from a domain, the output could look something like this:

$ hss3dump -l hsds-bucket home/user/domain.h5
home/user/domain.h5:
    db/e32b60a5-6c27622f/d/693e-302825-f8c087/.dataset.json
        ock7uFraVWjrotdTtGwXFR1N0TasC+ln        489 Bytes      2022-10-05 16:06:57+0100
    db/e32b60a5-6c27622f/d/693e-302825-f8c087/0
        HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL        0 Bytes        2022-10-10 09:06:59+0100
        U9LG1wDd4EdzQj0PtZqPvvTH9/BdzvVH        1296 Bytes     2022-10-05 16:06:59+0100
    db/e32b60a5-6c27622f/g/40c5-5e41ac-92006c/.group.json
        sQwXZJAcjr1M0do1BsaFmnN6FlDLRwzM        1056 Bytes     2022-10-10 09:07:00+0100
        zkRK4cagD9alQWUeN3BKTi9T+SqQdjcO        193 Bytes      2022-10-05 16:07:00+0100

The output shows that the most recent version (HikS0B1PNyvCKLO+BmagsRaAnF1sL9zL) of the file db/e32b60a5-6c27622f/d/693e-302825-f8c087/0 is 0 bytes in size, while its previous version was 1296 bytes. This means that its content was deleted on October 10th. To restore the old data, tell hss3dump to download only versions from before October 10th by supplying a corresponding RFC3339 timestamp via the -b flag:

$ hss3dump -b "2022-10-10T00:00:00+01:00" hsds-bucket home/user/domain.h5

For each object, hss3dump then downloads the most recent version that satisfies this condition. If no version of an object satisfies it, the oldest available version is downloaded instead.
