

Update JSON schema to validate that URLs are valid #974

Open
jacobthill opened this issue Sep 28, 2022 · 1 comment
Labels
question Further information is requested

Comments

@jacobthill
Contributor

We need a test to ensure that the data harvested from a data provider makes it all the way through our ETL pipeline and into the web application. The data is harvested, transformed in traject, and loaded into the DLME web application. We have tests in Airflow that check that the number of records harvested matches the number of records in the Intermediate Representation (IR) after transformation. However, there are still cases where traject does not throw an error but Spotlight rejects something about a record in the IR. Sometimes this produces an error when the records are loaded into Spotlight, but sometimes no error surfaces and some of the records simply don't load. In these cases, the only sign that something went wrong is that the record count in Spotlight doesn't match the record count in the IR. We need a way to compare the record count in the IR to the record count in Spotlight. This might be a browser test, a Solr query, or both.
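One possible shape for the Solr-query option, sketched in Python under assumptions this issue does not state: the IR is newline-delimited JSON with one record per line, and the Solr core behind Spotlight is reachable over HTTP. The Solr URL and any filter query (`fq`) that restricts results to one provider's records are placeholders to fill in; `count_ir_records`, `solr_record_count`, `check_counts`, and `RecordCountMismatch` are hypothetical names, not existing DLME code. A standard Solr `select` request with `rows=0` returns the hit count in `response.numFound` without transferring any documents.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Optional


class RecordCountMismatch(Exception):
    """Raised when the IR and Solr disagree on how many records exist."""


def count_ir_records(ir_path: str) -> int:
    """Count records in an IR file, ASSUMING one JSON object per line (NDJSON)."""
    count = 0
    with Path(ir_path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():      # ignore blank lines
                json.loads(line)  # fail loudly on a malformed record
                count += 1
    return count


def solr_record_count(solr_select_url: str, fq: Optional[str] = None) -> int:
    """Ask Solr how many docs match, fetching zero rows (placeholder URL/fq)."""
    params = {"q": "*:*", "rows": "0", "wt": "json"}
    if fq:
        params["fq"] = fq  # e.g. restrict to one provider; field name is up to you
    url = f"{solr_select_url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return body["response"]["numFound"]


def check_counts(ir_count: int, solr_count: int) -> None:
    """Fail on a mismatch in either direction instead of passing silently."""
    if ir_count != solr_count:
        raise RecordCountMismatch(
            f"record counts differ (IR={ir_count}, Solr={solr_count})"
        )
```

Run after the load step (for example as an extra Airflow task next to the existing harvest-versus-IR count check), this turns the silent failure described above into a failing task.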

@thatbudakguy thatbudakguy added the question Further information is requested label Feb 8, 2023
@thatbudakguy
Member

One reliable way to reproduce this is to try to index a record that has a non-URL string value for agg_preview.
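A schema-level guard against that repro could look like the sketch below. It is a hedged Python illustration, not the project's actual schema or validator: `AGG_PREVIEW_URL_SCHEMA`, `is_valid_url`, `iter_strings`, and `invalid_urls` are hypothetical names, and since the exact shape of `agg_preview` in the IR isn't given here, the check walks every string under that field whether it is a plain string or a nested object. One design note: in JSON Schema, `"format": "uri"` is treated as an annotation by many validators unless format checking is explicitly enabled, so pairing it with a `pattern` (or running a check like this one before loading) is what actually rejects a value such as a bare filename.

```python
from typing import Any, Iterator, List
from urllib.parse import urlparse

# Hypothetical JSON Schema fragment for a URL-valued string. "pattern" is
# enforced by every validator; "format": "uri" may only be an annotation.
AGG_PREVIEW_URL_SCHEMA = {
    "type": "string",
    "format": "uri",
    "pattern": "^https?://",
}


def is_valid_url(value: str) -> bool:
    """True if value is an absolute http(s) URL with a host and no whitespace."""
    if any(ch.isspace() for ch in value):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested inside value (a str, list, or dict)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)


def invalid_urls(record: dict, field: str = "agg_preview") -> List[str]:
    """Return the strings under record[field] that are not valid http(s) URLs."""
    return [s for s in iter_strings(record.get(field)) if not is_valid_url(s)]
```

Records for which `invalid_urls` returns a non-empty list could be reported at transform time, before Spotlight ever sees them.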

@thatbudakguy thatbudakguy changed the title Test record count in Spotlight to record count in Intermediate Representation Indexing some records silently fails Feb 17, 2023
@thatbudakguy thatbudakguy transferred this issue from sul-dlss/dlme Feb 17, 2023
@thatbudakguy thatbudakguy changed the title Indexing some records silently fails Update JSON schema to validate that URLs are valid Feb 17, 2023
Projects
No open projects
Status: No status
Development

No branches or pull requests

2 participants