-
Is it possible to load some specific content from the tar.gz archive snapshots using a remote source, i.e. without having to download the full Argo DOI snapshot? Something like this:

```python
from argopy import ArgoIndex
idx = ArgoIndex(host='https://www.seanoe.org/data/00311/42182/data/110195.tar.gz', index_file='bio-profile').load()
```

This would come in handy, e.g. for reproducing the status figures and maps.
-
The Argo dataset is constantly evolving, as the flow of new data and updated quality-control procedures continuously changes file content.
Using Argo DOI snapshots is a great solution for science reproducibility: they provide a monthly archive of the dataset, each with its own DOI.
argopy has a helper class to access and discover all Argo DOIs.
However, these snapshots are quite large and slow to decompress: the full GDAC archive is about 60 GB compressed, while the BGC subset of synthetic files is a more reasonable 4 GB.
So, what if you'd like to inspect an archive before decompressing it, or analyse how the content of several archives evolves over time?
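As a minimal sketch of what this means in practice, here is how a remote tar.gz archive can be streamed and a single index file pulled out of it, without writing or decompressing the whole archive locally. It only uses the Python standard library; the member name searched for is an assumption for illustration, not necessarily the actual file name inside the SEANOE snapshot.

```python
import tarfile
import urllib.request

# URL of an Argo DOI snapshot archive (from the example above)
URL = "https://www.seanoe.org/data/00311/42182/data/110195.tar.gz"

# Member we want: this exact name is an assumption for illustration,
# the real layout inside the snapshot may differ.
WANTED = "argo_synthetic-profile_index.txt.gz"

def fetch_index_from_snapshot(url: str, wanted: str) -> bytes:
    """Stream a remote .tar.gz and return the bytes of the first member
    whose name ends with `wanted`, without writing the archive to disk.

    Note: gzip streams are not seekable, so the compressed stream is read
    sequentially until the member is found; the transfer then stops.
    """
    with urllib.request.urlopen(url) as resp:
        # 'r|gz' = streaming mode: members are yielded as they are read
        with tarfile.open(fileobj=resp, mode="r|gz") as tar:
            for member in tar:
                if member.isfile() and member.name.endswith(wanted):
                    return tar.extractfile(member).read()
    raise FileNotFoundError(f"{wanted} not found in {url}")

# Example usage (commented out, since it triggers a potentially long download):
# raw = fetch_index_from_snapshot(URL, WANTED)
# print(f"Got {len(raw)} bytes of index file")
```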
Since it is possible to load some specific content from within an archive without fully decompressing it, and in particular the index files, it may be of some interest to make `ArgoIndex` compatible with a DOI snapshot. It could look like this:
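For instance, something along the lines of the call suggested in the reply above, where the snapshot tarball URL is passed as the index host. This is a proposed signature for discussion, not an existing argopy API:

```python
from argopy import ArgoIndex

# Proposed usage: point the index host at a DOI snapshot tarball and name
# the index file to be read from inside it (proposed, not an existing API)
idx = ArgoIndex(
    host="https://www.seanoe.org/data/00311/42182/data/110195.tar.gz",
    index_file="bio-profile",
).load()

# From there, the usual ArgoIndex methods would apply, e.g.:
# idx.search_wmo(6902746)
# idx.to_dataframe()
```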
What do you think?
Poke @antoine250, @HCBScienceProducts, @catsch