-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crashing index
command when targeted directory contains subject files
#705
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #705 +/- ##
=======================================
Coverage 99.66% 99.66%
=======================================
Files 89 89
Lines 6288 6295 +7
=======================================
+ Hits 6267 6274 +7
Misses 21 21
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the problem for sure and the variable renamings are sensible.
However, I think another approach would also be possible. DocumentDirectory already takes a require_subjects
flag, and when set to False, the subjects are not required, but they are still parsed from tsv/key files and returned if such files exist. This is wasted effort in the case of the annif index
operation, because the subjects aren't going to be used anyway. So another solution would be to modify DocumentDirectory so it doesn't even check for tsv/key files if require_subjects==False
. Then it would also be unnecessary to pass the subject index and language to the DocumentDirectory. If the default value of these parameters were set to None, it could be constructed like this: DocumentDirectory(directory, require_subjects=False)
. I think that would be even more elegant than the solution in this PR. However, I'm happy to accept either solution.
SonarCloud Quality Gate failed. 0 Bugs No Coverage information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Dear Annif colleagues, thank you for already working on this problem. We in the DNB have recently started using Annif 0.61.0 and have now noticed that we cannot currently use an "index" command with this version, so we have to switch back to 0.60.0. Just one question: Is a bugfix release 0.61.1 planned or will the solution come with 1.0? Thanks and greetings, Sandro |
Hi @san-uh, This bug realized only when the targeted directory contains files with either *.tsv or *.key extension (as far as I'm aware). Does |
Hello @juhoinkinen, |
The
annif index
command crashes when the targeted directory contains subject files (files with either*.tsv
or*.key
extensions):The bug is due to this change done in PR #663.
I made the change because it seemed unnecessary and confusing to pass the subject index and language to
DocumentDirectory
when it is used only for obtaining documents to give suggestions to.Reverting the change, i.e. passing subject index and language to
DocumentDirectory
is the simplest way to fix the bug, and on second thought maybe it also is better conceptually: corpora should have subject index and language defined, even if they are not used.Also renames vars to better correspond their origin/usage: keyfile(name) -> subjfile(name).