-
Is it possible to load some specific content from the tar.gz archive snapshots using a remote source, i.e. without having to download the full Argo DOI snapshot? Something like this:

```python
from argopy import ArgoIndex
idx = ArgoIndex(host='https://www.seanoe.org/data/00311/42182/data/110195.tar.gz', index_file='bio-profile').load()
```

This would come in handy, e.g. for reproducing the status figures and maps.
-
The Argo dataset is constantly evolving, as the flow of new data and updated quality-control procedures continuously changes file content.
Using Argo DOI snapshots is a great solution for science reproducibility: they provide a monthly archive of the dataset, each with its own DOI.
argopy has a helper class to access and discover all Argo DOIs.
However, these snapshots are quite large and slow to decompress: the full GDAC archive is about 60 GB compressed, while the BGC subset of synthetic files is a more reasonable 4 GB.
So, what if you'd like to inspect an archive before decompressing it, or analyse how the content of several archives evolves over time?
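As a minimal sketch of what this means in practice, here is how a remote tar.gz archive can be streamed and a single index file pulled out of it, without writing or decompressing the whole archive locally. It only uses the Python standard library; the member name searched for is an assumption for illustration, not necessarily the actual file name inside the SEANOE snapshot.

```python
import tarfile
import urllib.request

# URL of an Argo DOI snapshot archive (from the example above)
URL = "https://www.seanoe.org/data/00311/42182/data/110195.tar.gz"

# Member we want: this exact name is an assumption for illustration,
# the real layout inside the snapshot may differ.
WANTED = "argo_synthetic-profile_index.txt.gz"

def fetch_index_from_snapshot(url: str, wanted: str) -> bytes:
    """Stream a remote .tar.gz and return the bytes of the first member
    whose name ends with `wanted`, without writing the archive to disk.

    Note: gzip streams are not seekable, so the compressed stream is read
    sequentially until the member is found; the transfer then stops.
    """
    with urllib.request.urlopen(url) as resp:
        # 'r|gz' = streaming mode: members are yielded as they are read
        with tarfile.open(fileobj=resp, mode="r|gz") as tar:
            for member in tar:
                if member.isfile() and member.name.endswith(wanted):
                    return tar.extractfile(member).read()
    raise FileNotFoundError(f"{wanted} not found in {url}")

# Example usage (commented out, since it triggers a potentially long download):
# raw = fetch_index_from_snapshot(URL, WANTED)
# print(f"Got {len(raw)} bytes of index file")
```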
Since it is possible to load some specific content from within an archive without fully decompressing it, and in particular the index files, it may be of some interest to make `ArgoIndex` compatible with a DOI snapshot. It could look like this:
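For instance, something along the lines of the call suggested in the reply above, where the snapshot tarball URL is passed as the index host. This is a proposed signature for discussion, not an existing argopy API:

```python
from argopy import ArgoIndex

# Proposed usage: point the index host at a DOI snapshot tarball and name
# the index file to be read from inside it (proposed, not an existing API)
idx = ArgoIndex(
    host="https://www.seanoe.org/data/00311/42182/data/110195.tar.gz",
    index_file="bio-profile",
).load()

# From there, the usual ArgoIndex methods would apply, e.g.:
# idx.search_wmo(6902746)
# idx.to_dataframe()
```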
What do you think?
Poke @antoine250, @HCBScienceProducts, @catsch