Avoid errorId in search #187

Open
nichtich opened this issue May 27, 2024 · 7 comments
Labels: partner:vzg (Feature request from VZG), priority:high

Comments

@nichtich
Collaborator

nichtich commented May 27, 2024

Links to issues use the Solr field errorId, but this identifier is not stable across updates. At least for errors of the "unknown subfield" type, the search would be better served by the specific subfield index. For instance, instead of errorId:112, use 017Ex_ss:* (unknown PICA subfield 017E$x) to list all records having this subfield.

Linking to errors based on unknown field requires #128.

@nichtich nichtich added priority:high partner:vzg Feature request from VZG labels May 27, 2024
@pkiraly
Owner

pkiraly commented May 27, 2024

The errors are stored in 3 places:

  • files
  • a sqlite3 database
  • Solr

The Solr indexing happens later, but the sqlite index is built as part of the validation.
What we can do is check whether the Solr index is older than the validation. If it is newer, we use Solr; otherwise we use sqlite3.
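
The freshness check described above could be sketched roughly as follows. This is only an illustration, not qa-catalogue code: the function name and the two timestamp parameters are hypothetical, and how the timestamps would actually be obtained (for example from file modification times or index metadata) is left open.

```python
from datetime import datetime


def choose_error_source(validated_at: datetime, solr_indexed_at: datetime) -> str:
    """Pick the backend for error lookups (hypothetical helper).

    Returns "solr" only if the Solr index is strictly newer than the
    validation run; otherwise falls back to "sqlite3", which is written
    during validation and therefore never lags behind it.
    """
    return "solr" if solr_indexed_at > validated_at else "sqlite3"


# Example: validation ran at 10:00, Solr was indexed at 12:00 the same day.
source = choose_error_source(
    validated_at=datetime(2024, 5, 27, 10, 0),
    solr_indexed_at=datetime(2024, 5, 27, 12, 0),
)
```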

@nichtich
Collaborator Author

nichtich commented May 27, 2024 via email

@pkiraly
Owner

pkiraly commented May 27, 2024

The ideal case is that validation runs in the same session as Solr indexing (run all runs all the analyses and Solr). If I understand the situation correctly, your problem is that after validation you did not run Solr indexing, and that is why Solr is outdated. Is that true? If yes, then the solution is simply to run Solr indexing again. If that is not possible for whatever reason (e.g. because it takes too much time), then the solution I described might work.

@nichtich
Collaborator Author

If I understand the situation correctly, your problem is that after validation you did not run Solr indexing, and that is why Solr is outdated.

No, the problem is that links to errors are not stable, because errorId is just an internal identifier that changes over time. The Solr query 017Ex_ss:* is stable unless qa-catalogue changes how fields are indexed. If people share a link to the list of records with a specific error, that link should always lead to the same error.

@nichtich
Collaborator Author

The search link can be adjusted as follows:

  • undefined subfield a of field XXXX: search XXXXa_ss:*
  • undefined field XXXX: search XXXX_count_i:* (requires configuration indexFieldCounts)
  • repetition of non-repeatable field XXXX: search XXXX_count_i:[1 TO *] (requires configuration indexFieldCounts)
  • repetition of non-repeatable subfield a of field XXXX: search XXXXa_count_i:[1 TO *] (requires configuration indexSubfieldCounts, not implemented yet)
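
As a rough illustration of the mapping listed above, a lookup from error type to Solr query might look like the sketch below. The function name and the error type keys are made up for this example and are not taken from qa-catalogue; the query patterns are copied from the list as written.

```python
# Query templates copied from the proposal above; the keys are hypothetical labels.
QUERY_TEMPLATES = {
    "undefined_subfield": "{field}{subfield}_ss:*",
    "undefined_field": "{field}_count_i:*",  # needs indexFieldCounts
    "repeated_field": "{field}_count_i:[1 TO *]",  # needs indexFieldCounts
    "repeated_subfield": "{field}{subfield}_count_i:[1 TO *]",  # needs indexSubfieldCounts
}


def search_query(error_type: str, field: str, subfield: str = "") -> str:
    """Build a Solr query for an error type (hypothetical helper)."""
    return QUERY_TEMPLATES[error_type].format(field=field, subfield=subfield)


# Unknown PICA subfield 017E$x, as in the opening comment.
query = search_query("undefined_subfield", "017E", "x")
```

A link built from such a query depends only on the field and subfield names, so it stays valid as long as the indexing scheme does not change.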

@pkiraly
Owner

pkiraly commented May 30, 2024

Good idea! For PICA it will work perfectly. For MARC, I am not sure whether all other errors can be transformed into queries like this, but it is worth investigating. Based on a quick overview of the error types, most of them will work.

@nichtich
Collaborator Author

nichtich commented Dec 3, 2024

If errorId is still needed, it could be generated as a hash value of the actual search query (#187 (comment)), so that it is stable as well.
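
A minimal sketch of this idea, assuming the search query string is the only input to the identifier. The choice of SHA-1, the 12-character truncation, and the function name are assumptions made for illustration; the thread does not specify a hash function or an ID format.

```python
import hashlib


def stable_error_id(query: str) -> str:
    """Derive an identifier from the search query itself (hypothetical helper).

    The same query always yields the same ID, independent of the order
    in which errors happen to be encountered during validation.
    """
    digest = hashlib.sha1(query.encode("utf-8")).hexdigest()
    return digest[:12]  # truncated for readability; the length is an arbitrary choice


error_id = stable_error_id("017Ex_ss:*")
```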

@pkiraly pkiraly self-assigned this Dec 11, 2024
@pkiraly pkiraly added this to the PICA 1.3 milestone Dec 11, 2024