Feature: check to see if incoming SIP has already been ingested #106

Open
sallain opened this issue Jan 13, 2025 · 0 comments

sallain commented Jan 13, 2025

Is your feature request related to a problem? Please describe.

When processing a large volume of packages, it is easy to upload the same package twice. The same thing can happen when package deposit is automated, among other situations. In any case, processing a package twice wastes compute resources and storage space and should be avoided.

Describe the solution you'd like

I would like Enduro to check whether an incoming package has already been ingested. This could be done by computing a checksum for the package and recording it in a database, so that future packages can be checked against it. All incoming packages will be compressed, so computing a checksum should be fast. Any checksum that matches one already recorded in the database will trigger a failure.
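
To make the proposal concrete, here is a minimal sketch of the check in Python. It is not Enduro code: the table name `ingested_package`, the helper names, the choice of SHA-256, and the use of SQLite as the database are all assumptions made for illustration. The `enabled` flag stands in for the "optional" requirement.

```python
import hashlib
import sqlite3
from pathlib import Path


class DuplicatePackageError(Exception):
    """Raised when an incoming package's checksum is already recorded."""


def package_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the package file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_registry(db_path=":memory:"):
    """Open the (hypothetical) checksum registry, creating the table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingested_package ("
        " checksum TEXT PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    return conn


def register_package(conn, path, enabled=True):
    """Record the package checksum; fail if it has been seen before.

    When `enabled` is False the check is skipped entirely and nothing
    is written to the registry.
    """
    checksum = package_checksum(path)
    if not enabled:
        return checksum
    try:
        # The PRIMARY KEY constraint makes "check and record" a single
        # atomic step, so two concurrent uploads cannot both succeed.
        with conn:
            conn.execute(
                "INSERT INTO ingested_package (checksum, name) VALUES (?, ?)",
                (checksum, Path(path).name),
            )
    except sqlite3.IntegrityError as exc:
        raise DuplicatePackageError(
            f"{Path(path).name} has already been ingested ({checksum})"
        ) from exc
    return checksum
```

In this sketch the first call to `register_package` for a given file records its checksum, and a second call with byte-identical content raises `DuplicatePackageError`, which a workflow could turn into the failure described above.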

This should be optional 😉

Describe alternatives you've considered

This is a fairly high-level check that will not catch packages that are very similar but not identical. It would be possible to be much more specific, for example by checking individual file checksums or certain metadata elements, but I think this will suffice for the migration (and therefore the MVP).

Additional context

@sallain sallain added this to Enduro Jan 13, 2025
@sallain sallain moved this to 👍 Ready in Enduro Jan 13, 2025
@sallain sallain moved this from 👍 Ready to ⏳ In Progress in Enduro Jan 14, 2025