Create shelves files for pds4 #57
absolute path of a filespec.
checksums-bundles directory for pds4
_infoshelf-bundles
_linkshelf-* directory
The |
when trying to parse each entry to get the basename of a file in the archive.
Updated the latest status; the top comments are also updated (10/22/24)
- volset_abspath to bundleset_abspath
- volset_pdsfile to bundleset_pdsfile
- volume_abspath to bundle_abspath
- volume_pdsfile to bundle_pdsfile
- voltype_ to bundletype_
- volume_publication_date to bundle_publication_date
- volume_version_id to bundle_version_id
names and directories included in one archive by doing these:
- Add rules ARCHIVE_PATHS & ARCHIVE_DIRS in pds4 to specify the mapping from a bundle set path to its corresponding archive files, and the mapping from an archive file to its included directories. (line 602-614 in rules/__init__.py)
- Add bundle set specific archive_paths & archive_dirs rules for uranus occs (line 430-447 in rules/uranus_occs_earthbased.py)
- Add pds4 specific functions: (line 189-218 in pds4file/__init__.py)
  - archive_paths: return the absolute path to the archive file associated with this bundleset.
  - archive_dirs: return a dictionary keyed by archive path, with the list of directories included in that archive as the value.
- Modify write_archive to get tarpath and its included dirs from archive_paths & archive_dirs (line 181-203 in pds4/pds4archives.py)
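As a rough illustration of the write_archive change described above, the sketch below writes one .tar.gz per archive path from a dictionary shaped like the one archive_dirs is described as returning. The function name write_archives_from_dirs and the choice to store each directory under its basename are assumptions for illustration, not the code in pds4archives.py; only the standard-library tarfile module is used.

```python
import os
import tarfile


def write_archives_from_dirs(archive_dirs):
    """Write one .tar.gz per key of archive_dirs (hypothetical helper).

    archive_dirs is assumed to map an archive file path to the list of
    directory paths that should be packed into that archive.
    Returns the list of archive paths written.
    """
    written = []
    for tarpath, dirs in archive_dirs.items():
        parent = os.path.dirname(tarpath)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with tarfile.open(tarpath, "w:gz") as tar:
            for dirpath in dirs:
                # Store each directory under its own basename so the archive
                # does not embed machine-specific absolute paths.
                tar.add(dirpath, arcname=os.path.basename(dirpath))
        written.append(tarpath)
    return written
```

Keeping the mapping in one dictionary means the writer does not need to know how a bundle set splits its directories across archive files.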
functions in pds4 to use glob_glob to properly get the abspath for included archive dirs.
cassini_iss_saturn. (line 292-376 in rules/cassini_iss.py)
Status update on 11/04/24 (top comments are updated):
cassini_iss_saturn
and archive_dirs rules for cassini_uvis_solarocc_beckerjarmak2023
Updates on 11/12/24:
Here is what I'm seeing:
This is just a text file Dave created by hand for our review. It's not part of the tool chain.
We discussed this in the group meeting and agreed to break the browse products in the same way as the data products.
I'm assuming you mean 6 GB, not 6 TB? If the entire collection compresses to only 6 GB, then I agree one file is a good choice.
I'm assuming you mean 4 GB, not 4 TB? If the entire collection compresses to only 4 GB, then I agree one file is a good choice.
The
Seems like a good idea. |
I'll break the
The total sizes of archive files for
The entire
I'll update
Got it, I'll remove |
That's fine, but part of what I was trying to figure out at the group meeting is how and where these decisions are encoded. Can you point me to the file(s)?
I understand that. I just want to have some place where I can be informed about what rules are being set up. If it's not these files, then don't bother changing these files.
We will be removing |
They are encoded in these two variables under the
For reference, we can also call
- Split browse_raw into multiple archive files based on sclk for cassini_iss_cruise
- Include collection_*_raw.csv/xml in every browse_raw & data_raw .tar.gz file based on sclk
- Remove data_raw_col_xml_csv_metadata.tar.gz since the metadata directory will be removed and all collection_*_raw.csv/xml are included in every browse_raw & data_raw archive file based on sclk
Updates:
opus_products function (line 4792-4795, pdsfile.py)
- Sort the list of sublists by version and filepath (in the order of decreasing version, or reversed alphabetical order if version is the same)
- Sort the sublist by filepath (alphabetical order)
the abspath of element in the sublist.
0 when there is a 'REDO', 'TIRETRACK', or 'REPAIRED' substring in one of the paths in a sublist so that prioritizer can be properly sorted.
Okay, thanks for that. I see that everything seems to be hardcoded in an ad hoc manner. Maybe that's what we want to do. A more systematic approach could conceivably be attractive but might be more trouble than it's worth. |
- for the same opus type (header), combining different lists of the same version to one sublist
- sorting each sublist by filepath (alphabetical order)
- sorting the list of sublists by version (in the order of decreasing version)
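The three steps above can be sketched as follows for a single opus type. This is a minimal illustration, not the code in pdsfile.py: the function name group_and_sort and the input shape (pairs of an integer version rank and a file path) are assumptions.

```python
from collections import defaultdict


def group_and_sort(products):
    """Group file paths by version and order them (hypothetical helper).

    products: iterable of (version_rank, filepath) pairs for one opus type,
    where a larger version_rank is assumed to mean a newer version.
    Returns a list of sublists of file paths: one sublist per version,
    newest version first, each sublist in alphabetical order.
    """
    by_version = defaultdict(list)
    for version_rank, filepath in products:
        # Combine all paths of the same version into one sublist.
        by_version[version_rank].append(filepath)
    # Alphabetical order inside each sublist, decreasing version across them.
    return [sorted(by_version[v]) for v in sorted(by_version, reverse=True)]


products = [(1, "b.img"), (2, "z.img"), (1, "a.img"), (2, "c.img")]
result = group_and_sort(products)
# result == [["c.img", "z.img"], ["a.img", "b.img"]]
```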
Current status of creating shelves files for pds4:
Create files in checksums-* directory (pds4checksums.py)
- BUNDLENAME_REGEX
python holdings_maintenance/pds4/pds4checksums.py --init /Volumes/rms-holdings/pds4-holdings/bundles/uranus_occs_earthbased
python holdings_maintenance/pds4/pds4checksums.py --init /Volumes/rms-holdings/pds4-holdings/metadata/uranus_occs_earthbased
python holdings_maintenance/pds4/pds4checksums.py --init /Volumes/rms-holdings/pds4-holdings/diagrams/uranus_occs_earthbased
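For readers unfamiliar with what a checksum pass over a holdings directory involves, here is a minimal sketch of the general idea: walk a directory tree and compute one digest per file. It is not the implementation of pds4checksums.py. The use of MD5, the helper names md5_of_file and checksum_tree, and the returned (relative path, digest) pairs are all assumptions made for illustration.

```python
import hashlib
import os


def md5_of_file(path, blocksize=1 << 20):
    """Return the MD5 hex digest of a file, read in blocks to bound memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            md5.update(block)
    return md5.hexdigest()


def checksum_tree(root):
    """Return sorted (relative_path, md5_hex) pairs for all files under root."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            abspath = os.path.join(dirpath, name)
            pairs.append((os.path.relpath(abspath, root), md5_of_file(abspath)))
    return sorted(pairs)
```

Sorting the pairs keeps the output stable between runs, which makes it easier to compare a fresh pass against a previously stored one.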
Create files in _infoshelf-* directory (pds4infoshelf.py); the corresponding checksums files from the above steps are required
- pds4checksums
python holdings_maintenance/pds4/pds4infoshelf.py --init /Volumes/rms-holdings/pds4-holdings/bundles/uranus_occs_earthbased
python holdings_maintenance/pds4/pds4infoshelf.py --init /Volumes/rms-holdings/pds4-holdings/metadata/uranus_occs_earthbased
python holdings_maintenance/pds4/pds4infoshelf.py --init /Volumes/rms-holdings/pds4-holdings/diagrams/uranus_occs_earthbased
Create files in _indexshelf-metadata (pds4indexshelf.py)
- BUNDLENAME_REGEX to Pds3File & Pds4File classes since they are different for pds3 & pds4
- IDX_EXT and LBL_EXT to Pds3File & Pds4File to replace '.tab' & '.lbl' in pdsfile.py
  - .xml and idx extension is .csv
  - .lbl and idx extension is .tab
Create files in _linkshelf-* directory (pds4linkshelf.py)
- .TXT in EXTS_WO_LABELS, .TXT could have a label in pds4
python holdings_maintenance/pds4/pds4linkshelf.py --init /Volumes/rms-holdings/pds4-holdings/bundles/uranus_occs_earthbased
Create archive files (pds4archives.py)
- ARCHIVE_PATHS and ARCHIVE_DIRS rules to determine the archive file names and the included directories for each archive file. (each bundle set has its own rules)
  - ARCHIVE_PATHS: map a bundle set or a bundle to a list of logical paths of the archive file names.
  - ARCHIVE_DIRS: map a logical path of an archive file name to a list of logical paths of the included directories.
python holdings_maintenance/pds4/pds4archives.py --init /Volumes/rms-holdings/pds4-holdings/bundles/uranus_occs_earthbased
python holdings_maintenance/pds4/pds4archives.py --init /Volumes/rms-holdings/pds4-holdings/bundles/cassini_iss/cassini_iss_cruise
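The two-step lookup that ARCHIVE_PATHS and ARCHIVE_DIRS are described as providing (bundle set to archive file names, then archive file name to included directories) can be sketched with plain regular-expression tables. This is not the rule machinery in pds4file/rules, whose form is not shown in this thread. The table layout, the logical paths, and the translate helper are all hypothetical.

```python
import re

# Hypothetical rule tables: each entry is (compiled pattern, list of templates).
# A template may refer to groups captured by its pattern, e.g. \1.
ARCHIVE_PATHS = [
    (re.compile(r"bundles/(\w+)"), [r"archive-bundles/\1.tar.gz"]),
]
ARCHIVE_DIRS = [
    (re.compile(r"archive-bundles/(\w+)\.tar\.gz"), [r"bundles/\1"]),
]


def translate(rules, logical_path):
    """Return the expanded templates of the first rule matching logical_path."""
    for pattern, templates in rules:
        match = pattern.fullmatch(logical_path)
        if match:
            return [match.expand(t) for t in templates]
    return []


paths = translate(ARCHIVE_PATHS, "bundles/uranus_occs_earthbased")
dirs = translate(ARCHIVE_DIRS, paths[0])
```

A bundle set that is split across several archive files, as discussed for cassini_iss_cruise, would simply list more than one template in its ARCHIVE_PATHS entry and add one ARCHIVE_DIRS entry per archive file.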
- Pds4FileTest/archive-bundles for review
- uranus_occs_earthbased.tar.gz with bundle set as the file name, and all bundles are included in the archive file.
Pending items:
- rms-pdstable (pds3, pdstable) repo, create a pds4 version of it to read the pds4 table.
- Pds4FileTest/archive-bundles for review
- pds4file/rules/
- pds4indexshelf.py to create _indexshelf-metadata for pds4
- pds4linkshelf.py to create _linkshelf-metadata for pds4
Note:
- ring_models and _support are included when running the scripts.
- use_shelves_only set to True now.
- opus_products output is in alphabetical order now (work with 1384 sort filenames on details tab rms-opus#1396)