Use xmlvalidate to validate SIP manifests #64

Merged
djjuhasz merged 2 commits into main from dev/issue-39-use-xmlvalidate-activity
Oct 22, 2024

Conversation

@djjuhasz djjuhasz commented Oct 18, 2024

Fixes #39

Switch from the internal "validate metadata" activity, which calls a
Python script (xsdval.py), to the temporal-activities/xmlvalidate
module. xmlvalidate calls the xmllint C program to validate the SIP
manifest file against the XSD schema files included in the SIP.

  • Install xmllint in the preprocessing-sfa Docker image
  • Import github.com/artefactual-sdps/temporal-activities/xmlvalidate
  • Switch to xmlvalidate with the xmllint validator for validating the
    SIP metadata file
  • Remove the internal "validate metadata" activity
  • Remove the sampledata directory containing the xsdval.py script,
    Arelda XSD files, and sample SIP
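
For readers unfamiliar with xmllint, the validation step described above amounts to something like the following. This is a minimal sketch, not the xmlvalidate module's actual API: the function names and the file paths are hypothetical, and it assumes only that `xmllint` is on the `PATH` and that its `--schema` and `--noout` options are used.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
)

// xmllintArgs builds the xmllint arguments for validating xmlPath
// against the XSD at xsdPath. --noout suppresses the echoed document
// so only validation messages are printed.
// (Hypothetical helper, not part of temporal-activities/xmlvalidate.)
func xmllintArgs(xmlPath, xsdPath string) []string {
	return []string{"--noout", "--schema", xsdPath, xmlPath}
}

// validate runs xmllint and returns its combined stdout and stderr.
// xmllint exits with a non-zero status when validation fails, which
// surfaces here as a non-nil error.
func validate(ctx context.Context, xmlPath, xsdPath string) (string, error) {
	var out bytes.Buffer
	cmd := exec.CommandContext(ctx, "xmllint", xmllintArgs(xmlPath, xsdPath)...)
	cmd.Stdout = &out
	cmd.Stderr = &out
	err := cmd.Run()
	return out.String(), err
}

func main() {
	// Hypothetical paths to a SIP manifest and the XSD shipped in the SIP.
	out, err := validate(
		context.Background(),
		"sip/header/metadata.xml",
		"sip/header/xsd/arelda.xsd",
	)
	if err != nil {
		fmt.Printf("validation failed: %v\n%s", err, out)
		return
	}
	fmt.Println("manifest is valid")
}
```

Capturing stdout and stderr together keeps the xmllint diagnostics available for the error message, which helps when debugging a rejected SIP.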

@djjuhasz djjuhasz force-pushed the dev/issue-39-use-xmlvalidate-activity branch from 097b581 to 1aa1951 Compare October 18, 2024 23:39
@djjuhasz djjuhasz changed the title Use temporal-activties/xmlvalidate to validate SIP manifests Use xmlvalidate to validate SIP manifests Oct 18, 2024

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 54.06%. Comparing base (0839404) to head (919f17f).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
cmd/worker/workercmd/cmd.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
+ Coverage   53.59%   54.06%   +0.47%     
==========================================
  Files          30       29       -1     
  Lines        2103     2090      -13     
==========================================
+ Hits         1127     1130       +3     
+ Misses        907      891      -16     
  Partials       69       69              


@djjuhasz djjuhasz force-pushed the dev/issue-39-use-xmlvalidate-activity branch 3 times, most recently from af248d5 to 11e9c98 Compare October 18, 2024 23:46

@jraddaoui jraddaoui left a comment

Really nice, thanks @djjuhasz! Just one comment about the Docker image.

Dockerfile (review thread resolved)
- Install libxml2-utils in the preprocessing-worker Docker image to
  provide xmllint, which is required by
  https://github.com/artefactual-sdps/temporal-activities/tree/main/xmlvalidate
- Update Python to version 3.13
- Download and build the latest development version of bagit-python to
  get the fixes made since the v1.8.1 release, and for compatibility
  with Python 3.13
- Update the Dockerfile syntax version to the latest version of 1.x
- Add stdout & stderr output to error message when running the Python
  metadata validation script (`xsdval.py`) to aid debugging
@djjuhasz djjuhasz force-pushed the dev/issue-39-use-xmlvalidate-activity branch from 11e9c98 to 3ad31d4 Compare October 22, 2024 17:29
Fixes #39

Switch from the internal `ValidateMetadata` activity, which calls a
Python script (`xsdval.py`), to the `temporal-activities/xmlvalidate`
activity. xmlvalidate calls the xmllint C program to validate the SIP
manifest file against the XSD schema files included in the SIP.

- Import github.com/artefactual-sdps/temporal-activities/xmlvalidate
- Switch to xmlvalidate with the xmllint validator for validating the
  SIP metadata file
- Remove the internal "validate metadata" activity
- Remove the sampledata directory containing the `xsdval.py` script,
  Arelda XSD files, and sample SIP
- Remove Python, python-bagit and lxml from the Docker image
@djjuhasz djjuhasz force-pushed the dev/issue-39-use-xmlvalidate-activity branch from 3ad31d4 to 919f17f Compare October 22, 2024 19:57
Dockerfile (review thread resolved)
@djjuhasz djjuhasz requested a review from jraddaoui October 22, 2024 20:58

@jraddaoui jraddaoui left a comment

Really nice, thanks @djjuhasz!

@djjuhasz djjuhasz merged commit c72409c into main Oct 22, 2024
9 checks passed
@djjuhasz djjuhasz deleted the dev/issue-39-use-xmlvalidate-activity branch October 22, 2024 22:50
Development

Successfully merging this pull request may close these issues.

Feature: use the xmlvalidate activity and SIP XSD files to validate XML metadata
2 participants