Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup Avram schema files of K10plus and allow updating the schema #291

Open
nichtich opened this issue Jul 6, 2023 · 6 comments
Open
Assignees
Milestone

Comments

@nichtich
Copy link
Collaborator

nichtich commented Jul 6, 2023

The repository contains several copies of Avram Schema of K10plus:

src/main/resources/pica/avram-k10plus-title.json
src/main/resources/pica/avram-k10plus.json
src/test/resources/pica/avram-k10plus-title.json
src/test/resources/pica/avram-k10plus.json

There should at most be

  • src/main/resources/pica/avram-k10plus-title.json with the current schema
  • src/test/resources/pica/avram-k10plus-title.json with a smaller schema only used for testing (not sure if this is needed)

I was not able to update the schema with src/main/resources/pica/update-avram-k10plus-title.sh because afterwards tests did not succeed, so how can the schema be updated?

@pkiraly
Copy link
Owner

pkiraly commented Jul 6, 2023

Unfortunatey it is a maven and Java specific thing that during tests the resources are read from the test directory by default, that is the reason for the duplication. Needs further investigation to read it always from the main directory. A symbolic link would be enough for this purpose, but it had a conflict with git.
I agree with removing the older version of the schema.

@pkiraly pkiraly self-assigned this Jul 6, 2023
@pkiraly pkiraly added this to PICA Jul 6, 2023
@pkiraly pkiraly added this to the PICA: 1.3 milestone Jul 6, 2023
@nichtich
Copy link
Collaborator Author

nichtich commented Jul 6, 2023

I've manually updated src/main/resources/pica/avram-k10plus-title.json except for recent changes that affect conditions hard-coded in the tests:

  • change field 036F/00 to 036F/00-09 and its occurrence from null to `
  • change repeatability of field 041A/00-99 from true to false

Such changes will always require to change both schema and tests, so it would help to make sure there is only one place to do changes.

Unfortunatey it is a maven and Java specific thing that during tests the resources are read from the test directory by default, that is the reason for the duplication.

I'll do a feature request to fully remove schema from the jar file for phase 3.

@nichtich nichtich added investigation investigation and removed priority:high labels Jul 6, 2023
@pkiraly
Copy link
Owner

pkiraly commented Aug 29, 2023

Estimation: 4 hours

@pkiraly
Copy link
Owner

pkiraly commented Nov 6, 2023

It is done, testable.

@pkiraly
Copy link
Owner

pkiraly commented Nov 6, 2023

The same situation happened with the vocabularies.json:

  • src/main/resources/pica/vocabularies.json
  • src/test/resources/pica/vocabularies.json

I removed all 3 from the test and updated the links. SO now you can maintain them in src/main/resources/pica/.

@nichtich
Copy link
Collaborator Author

Thanks! It's better but still we have 5 different files when it should be only one or two:

Source

  • src/main/resources/pica/avram-k10plus-title.json used for validation ("title" level 0 only)
  • src/main/resources/pica/avram-k10plus.json (used only in tests so, it should be move to src/test/)

Test

  • src/test/resources/pica/schema/k10plus.json (used in most tests)
  • src/test/resources/pica/schema/pica-schema.json (used only in src/test/java/de/gwdg/metadataqa/marc/utils/pica/reader/PicaPlainReaderTest.java)
  • src/test/resources/pica/schema/pica-schema-extra.json (used only in src/test/java/de/gwdg/metadataqa/marc/utils/pica/reader/PicaPlainReaderTest.java)

The latter two can be merged for sure, I'm not sure about the other three test files. Unless the schema file can be configured on runtime (which I'd welcome), there is no need to have multiple versions in the tests, no? If symlinks don't work, the test file could be removed from git and copied via script on each test run to be sure its always up-to-date.

I've also tried to update the schema in src/main but this change does not pass tests as its result seems to be hard-coded in Java.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 👀 In review
Development

No branches or pull requests

2 participants