Feature: combine METS and metadata files for delivery to AIS #77

Closed
sallain opened this issue Nov 5, 2024 · 3 comments · Fixed by #80

sallain commented Nov 5, 2024

Is your feature request related to a problem? Please describe.

DPS must deliver both the METS file and the metadata.xml/UpdatedAreldaMetadata.xml file to the AIS during the post-preservation workflow. However, AIS only expects one file.

Describe the solution you'd like

Combine the METS and the metadata.xml/UpdatedAreldaMetadata.xml files together into one metadata file. For migration files (files identified as DigitizedAIP or BornDigitalAIP), UpdatedAreldaMetadata.xml should be used.
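
The file selection rule above could be sketched roughly as follows. This is only an illustration: the function name `metadataFileName` and the idea of passing the SIP type as a plain string are assumptions, not the actual preprocessing-sfa code.

```go
package main

// metadataFileName returns the name of the Arelda metadata file that should
// be combined with the METS. The sipType strings mirror the identifiers used
// in this issue; how the real code represents SIP types is an assumption.
func metadataFileName(sipType string) string {
	switch sipType {
	case "DigitizedAIP", "BornDigitalAIP":
		// Migration packages carry an updated copy of the metadata.
		return "UpdatedAreldaMetadata.xml"
	default:
		return "metadata.xml"
	}
}
```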

The newly created file should be named with the prefix AIS_ followed by the accession number, which can be found in the metadata.xml (or UpdatedAreldaMetadata.xml, but should be the same value) under <ablieferungsnummer>. There should only be one ablieferungsnummer per metadata file. The number is formatted as 2002/05; the / should be replaced with an _. The final file name will be AIS_2002_05.

Within the file, SFA would like the contents of metadata.xml/UpdatedAreldaMetadata.xml first, since it contains the higher hierarchies, and then the METS. The contents of the two files should probably be tagged in some way but I think it can be pretty simple - perhaps just indicating the source file.

Describe alternatives you've considered

None

Additional context

There's a very real chance that, when operating at scale, the resulting file will be too big for AIS to handle; it might make sense to then limit which fields from each file we're combining into this new file. But we'll tackle that if/when it happens.

@sallain sallain added this to Enduro Nov 5, 2024
@sallain sallain moved this to 👍 Ready in Enduro Nov 5, 2024
@djjuhasz djjuhasz self-assigned this Nov 6, 2024
djjuhasz added a commit that referenced this issue Nov 7, 2024
djjuhasz added a commit that referenced this issue Nov 8, 2024

djjuhasz commented Nov 8, 2024

@sallain I originally planned to try and merge the SFA Arelda metadata into the METS XML as a proper XML document with one root node and proper namespacing. I see now that SFA would like the Arelda metadata first in the document, and I've also realized that adding the Arelda XML inside the METS XML is going to be quite a bit of work. So, I've settled for now on just concatenating the two XML files with the Arelda first and the METS second. It's a work in progress (still needs testing) but I think the concatenation code should work now: https://github.com/artefactual-sdps/preprocessing-sfa/tree/dev/issue-77-combine-ais-metadata

djjuhasz added a commit that referenced this issue Nov 12, 2024
djjuhasz added a commit that referenced this issue Nov 12, 2024
djjuhasz added a commit that referenced this issue Nov 13, 2024
djjuhasz added a commit that referenced this issue Nov 13, 2024
Fixes #77.

Concatenate the Arelda metadata file from the original package and the
METS file created by Archivematica into a single "AIS" metadata file.
djjuhasz added a commit that referenced this issue Nov 13, 2024
Fixes #77.

Concatenate the Arelda metadata file from the original package and the
METS file created by Archivematica into a single "AIS" metadata file.

[skip codecov]
djjuhasz commented

Attached is a zipped AIS package created by Enduro with the combined AIS metadata file.
search-md_little_digitized_sip-15da98b9-5953-46dd-8dc9-2b31ee544bff.zip

Note that the current name of the AIS metadata file is "AIS_1974_47_3578513" with no file extension. From the description above, I think that's what the filename should be, but let me know if I should add an extension (e.g. ".xml").

djjuhasz added a commit that referenced this issue Nov 14, 2024
Fixes #77.

Concatenate the Arelda metadata file from the original package and the
METS file created by Archivematica into a single "AIS" metadata file.

[skip codecov]
@github-project-automation github-project-automation bot moved this from ⏳ In Progress to 🎉 Done in Enduro Nov 14, 2024
djjuhasz added a commit that referenced this issue Nov 15, 2024
Refs #77.

- Remove extraneous `filepath.Join()` calls
- Improve commentary a bit
- Correct "search_md" zip name in workflow tests
djjuhasz added a commit that referenced this issue Nov 15, 2024
Refs #77.

- Remove extraneous `filepath.Join()` calls
- Improve commentary a bit
- Correct "search_md" zip name in workflow tests

[skip codecov]

sallain commented Nov 20, 2024

Results as expected!
