Non redundant provenance #101

Merged 19 commits on Jan 29, 2024
alexdunnjpl
Contributor

Rebased on #100 - consider only commit 3f94d33

🗒️ Summary

Implements #92

Changes behaviour: the latest version of a product is now assigned "ops:Provenance/ops:superseded_by": null, rather than having the attribute left unassigned.

Implements software-version-based reprocessing avoidance, as already exists for repairkit and ancestry.
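A minimal sketch of what version-based reprocessing avoidance could look like. The constant `SWEEPER_VERSION`, the key `example:provenance_sweeper_version` and the function `is_up_to_date` are hypothetical names chosen for illustration; the actual names used by the sweeper are not stated in this description.

```python
from typing import Any, Mapping

# Hypothetical names, for illustration only: the real version constant and
# metadata key are not given in this PR description.
SWEEPER_VERSION = 2
VERSION_KEY = "example:provenance_sweeper_version"


def is_up_to_date(doc_source: Mapping[str, Any], current_version: int = SWEEPER_VERSION) -> bool:
    """Return True if the document was already processed by this version or a newer one."""
    processed_version = doc_source.get(VERSION_KEY)
    # A missing or non-integer value means the document has never been processed.
    return isinstance(processed_version, int) and processed_version >= current_version
```

Under this assumption, bumping the version constant after a logic change would force every document to be reprocessed once, while unchanged software would leave already-processed documents alone.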

Reads all documents and builds version chains for distinct LIDs. Drops all singleton products (as no links exist), then builds links, tainting any product whose successor data has changed. Finally, produces updates, skipping up-to-date records unless they have been tainted.
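The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Record` class, its `up_to_date` flag, `parse_lidvid` and `build_updates` are all hypothetical, and LIDVIDs are assumed to have the form `<lid>::<major>.<minor>`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

SUPERSEDED_BY_KEY = "ops:Provenance/ops:superseded_by"


@dataclass
class Record:  # hypothetical container, not the PR's actual class
    lidvid: str                   # e.g. "urn:nasa:pds:a::1.0"
    superseded_by: Optional[str]  # successor currently stored in the db, if any
    up_to_date: bool              # already processed by the current software version
    tainted: bool = False         # set when the successor differs from the stored one


def parse_lidvid(lidvid: str) -> Tuple[str, Tuple[int, int]]:
    """Split a LIDVID into its LID and a sortable (major, minor) version tuple."""
    lid, vid = lidvid.split("::")
    major, minor = vid.split(".")
    return lid, (int(major), int(minor))


def build_updates(records: Iterable[Record]) -> Dict[str, dict]:
    # Build one version chain per distinct LID.
    chains: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        lid, _ = parse_lidvid(record.lidvid)
        chains[lid].append(record)

    updates: Dict[str, dict] = {}
    for chain in chains.values():
        if len(chain) < 2:
            continue  # singleton: no links exist, so nothing to write
        chain.sort(key=lambda r: parse_lidvid(r.lidvid)[1])
        # Each product is superseded by the next in the chain; the latest gets None.
        successors: List[Optional[str]] = [r.lidvid for r in chain[1:]] + [None]
        for record, successor in zip(chain, successors):
            if record.superseded_by != successor:
                record.tainted = True
            # Skip records that are up to date, unless they have been tainted.
            if record.tainted or not record.up_to_date:
                updates[record.lidvid] = {SUPERSEDED_BY_KEY: successor}
    return updates
```

In this sketch, adding a new latest version to an already-processed chain taints only its immediate predecessor, so older members of the chain are not rewritten.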

⚙️ Test Data and/or Report

Functional tests pass, but none are relevant to provenance, per #13
Manually tested, comparing updates produced before/after change.

♻️ Related Issues

fixes #92

@alexdunnjpl
Contributor Author

Benchmarking against sbnpsi shows a speed-up from 5m30s to 4m20s (about a 21% reduction) due to inherent speed improvements, but sbnpsi has only ~250 non-singleton products out of 1.5M total.

Results are likely to be significantly more impressive when it is actually avoiding a significant quantity of avoidable db writes.

@alexdunnjpl alexdunnjpl merged commit 2850c2e into main Jan 29, 2024
2 checks passed
@alexdunnjpl alexdunnjpl deleted the non-redundant-provenance branch January 29, 2024 17:44
Successfully merging this pull request may close these issues.

Investigate/implement non-redundant provenance processing