
Data flow

Jakob Voß edited this page Jul 10, 2023 · 2 revisions

This table summarizes which data (files and/or Solr) each task of qa-catalogue reads as input and writes as output.

| task | input | output |
|---|---|---|
| validate | $MARC_DIR/$MASK | .. |
| validate-sqlite | | |
| completeness | | |
| completeness-sqlite | | |
| classifications | | |
| authorities | | |
| tt-completeness | | |
| shelf-ready-completeness | | |
| bl-classification | | |
| serial-score | | |
| format | | |
| functional-analysis | | |
| network-analysis | | |
| pareto | | |
| marc-history | | |
| record-patterns | | |
| prepare-solr | Solr (status of cores only) | Solr cores (only if they were missing) |
| index | $MARC_DIR/$MASK (and some files in $BASE_OUTPUT_DIR/$NAME?) | Solr records (into new core, then swaps cores) |
| export-schema-files | none (Java code) | ./marc-schema/*.json |
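
To make the `$MARC_DIR/$MASK` input concrete, here is a minimal Python sketch of how such a path could expand to a list of input files. It assumes that `$MASK` is a shell-style glob pattern; the function name `resolve_inputs`, the file names, and the pattern `*.mrc.gz` are hypothetical and serve only as illustration, not as the way qa-catalogue itself resolves its inputs.

```python
import glob
import os
import tempfile


def resolve_inputs(marc_dir: str, mask: str) -> list[str]:
    """Return the sorted paths matching marc_dir/mask (glob expansion)."""
    return sorted(glob.glob(os.path.join(marc_dir, mask)))


# Demonstration with a temporary directory standing in for $MARC_DIR.
with tempfile.TemporaryDirectory() as marc_dir:
    # Hypothetical file names: two MARC dumps and one unrelated file.
    for name in ("a.mrc.gz", "b.mrc.gz", "notes.txt"):
        open(os.path.join(marc_dir, name), "w").close()

    # "*.mrc.gz" plays the role of $MASK (an assumed example value).
    files = resolve_inputs(marc_dir, "*.mrc.gz")
    names = [os.path.basename(path) for path in files]
    print(names)
```

Only the files matching the mask are selected; anything else in the directory is ignored.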

In addition, most steps write to $BASE_INPUT_DIR/_reports/.
