CSV Compound Child metadata ignored #494

Open
bondjimbond opened this issue Jan 21, 2019 · 7 comments

Comments

@bondjimbond
Collaborator

I'm trying to run a Compound CSV job, and running into problems getting the metadata in child rows to end up in the child MODS.xml files. MIK is treating the objects as if they do not appear in the spreadsheet.

Here are my relevant config settings:

; MIK configuration file for an OAI-PMH toolchain.

[CONFIG]
config_id = DOH compound
last_updated_on = "2018-07-30"
last_update_by = "bw"

[SYSTEM]
date_default_timezone = 'America/Vancouver'
verify_ca = 0

[FETCHER]
class = Csv
input_file = "DOH/metadata/princeton_allenby.csv"
temp_directory = "/Volumes/Arca/doh_temp"
record_key = key
child_key = child_key

[METADATA_PARSER]
class = mods\CsvToMods
repeatable_wrapper_elements[] = name
repeatable_wrapper_elements[] = subject
repeatable_wrapper_elements[] = identifier
mapping_csv_path = "DOH/metadata/doh_mapping.csv"

[FILE_GETTER]
class = CsvCompound
input_directory = "/Volumes/Arca/DOH_FILES/allenby"
temp_directory = "/Volumes/Arca/doh_temp"
compound_directory_field = Directory

[WRITER]
;datastreams[] = MODS
class = CsvCompound
metadata_filename = MODS.xml
preserve_content_filenames = true
;require_source_file = false
output_directory = "/Volumes/Arca/doh/allenby"
child_title = "%parent_title%, side %sequence_number%"
child_sequence_separator = _
min_children = 1
postwritehooks[] = "/usr/bin/php extras/scripts/postwritehooks/generate_compound_structure_file.php"

Am I missing some setting to allow MIK to read the child objects' metadata rows, or is MIK broken?

@mjordan
Collaborator

mjordan commented Jan 21, 2019

@bondjimbond I can take a look at this tonight. Can you share your metadata spreadsheet and mappings file with me via email?

@bondjimbond
Collaborator Author

Sorry @mjordan, I just figured out what the problem is.

So my CSV's child_key column contains "1" for the first child and "2" for the second, most likely because spreadsheet editors tend to auto-format numeric cells as numbers instead of text. The filenames, meanwhile, end in _01.tif and _02.tif.

MIK throws the error:

[2019-01-21 21:04:04] ErrorException.ERROR: ErrorException {"message":"file_put_contents(/Volumes/Arca/doh/allenby/PRIN_Wright_34/2/MODS.xml): failed to open stream: No such file or directory"

This happens because, based on the filenames, MIK created directories named "01" and "02", but based on the metadata it tries to write the MODS.xml files into directories named "1" and "2", which do not exist.
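One possible workaround, sketched in Python and not part of MIK itself: zero-pad the child_key column before running the job, so the values match the _01/_02 suffixes in the filenames. The function name `pad_child_keys`, the padding width of 2, and the sample rows are assumptions for illustration; only the `key` and `child_key` column names come from the config above.

```python
import csv
import io


def pad_child_keys(rows, column="child_key", width=2):
    """Return copies of the rows with numeric child keys zero-padded to `width`."""
    padded = []
    for row in rows:
        fixed = dict(row)
        value = (fixed.get(column) or "").strip()
        # Only touch purely numeric values; leave blanks and other keys alone.
        if value.isdigit():
            fixed[column] = value.zfill(width)
        padded.append(fixed)
    return padded


# Hypothetical sample data; a real spreadsheet will have more columns.
sample = io.StringIO("key,child_key\nPRIN_Wright_34,1\nPRIN_Wright_34,2\n")
rows = pad_child_keys(list(csv.DictReader(sample)))
print([row["child_key"] for row in rows])  # ['01', '02']
```

The padded rows could then be written back out with `csv.DictWriter`. Formatting the column as text in the spreadsheet editor before export should achieve the same result.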

So, a couple of takeaways:

  1. We should document this rather common problem somehow and warn users about it.
  2. Maybe it would be nice for MIK to be able to recognize that "1" and "01" could mean the same thing when it comes to child keys.
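For the second point, here is a minimal Python sketch of what "recognizing that 1 and 01 mean the same thing" could look like. It is not how MIK compares keys today; `normalize_child_key` and `same_child_key` are hypothetical names, and the rule of stripping leading zeros from purely numeric keys is an assumption.

```python
import re


def normalize_child_key(value):
    """Strip leading zeros from purely numeric keys; leave other keys unchanged."""
    value = str(value).strip()
    if re.fullmatch(r"[0-9]+", value):
        return str(int(value))
    return value


def same_child_key(a, b):
    """Compare two child keys, treating "1" and "01" as equal."""
    return normalize_child_key(a) == normalize_child_key(b)


print(same_child_key("1", "01"))   # True
print(same_child_key("1", "10"))   # False
print(same_child_key("a1", "a01")) # False, non-numeric keys are compared as-is
```

Comparing normalized values on both sides leaves the directory names on disk untouched, so the files keep whatever padding they already have.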

@mjordan
Collaborator

mjordan commented Jan 21, 2019

Agreed that we should do something here. These two things are a good start, but I wonder if we should have a --checkconfig option for this as well.

@bondjimbond
Collaborator Author

Indeed!

@mjordan
Collaborator

mjordan commented Jan 21, 2019

Or possibly building a check into https://github.com/mjordan/iipqa.

@bondjimbond
Collaborator Author

I'd rather see it just built into the checks done by --checkconfig... Check that the values in the child_key column match the sequence numbers at the end of the filenames.

@bondjimbond
Collaborator Author

It should be allowed for the directory to contain file numbers not mentioned in the CSV, but it should be illegal for the CSV to contain child_key values that are not found in the directory.
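To make the proposed rule concrete, here is a rough Python sketch of the check. It is not an existing --checkconfig feature; the function names and sample filenames are made up, and it assumes the sequence number is whatever follows the last `_` (the child_sequence_separator from the config) in the filename stem.

```python
from pathlib import PurePath


def sequences_from_filenames(filenames, separator="_"):
    """Collect the part of each filename stem that follows the last separator."""
    found = set()
    for name in filenames:
        _, sep, tail = PurePath(name).stem.rpartition(separator)
        if sep and tail:
            found.add(tail)
    return found


def missing_child_keys(csv_child_keys, file_sequences):
    """Return child_key values with no matching file sequence number.

    Extra files in the directory are allowed; only CSV keys without a
    matching file are reported. The comparison is deliberately strict,
    so "1" does not match "01".
    """
    available = set(file_sequences)
    return sorted(key for key in set(csv_child_keys) if key and key not in available)


# Hypothetical filenames for one compound object directory.
files = ["PRIN_Wright_34_01.tif", "PRIN_Wright_34_02.tif", "PRIN_Wright_34_03.tif"]
sequences = sequences_from_filenames(files)

print(missing_child_keys(["1", "2"], sequences))    # ['1', '2']
print(missing_child_keys(["01", "02"], sequences))  # []
```

An empty result means every child_key in the CSV has a file to go with it; anything else is a list of keys worth reporting before the job runs.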
