Remove condition "If ISBN is equal means it's a duplicate" in "duplicate check" #11191
… 'ISBN' a weight Signed-off-by: AbdAlRahmanGad <[email protected]>
```diff
@@ -65,6 +64,7 @@ public class DuplicateCheck {
 DuplicateCheck.FIELD_WEIGHTS.put(StandardField.NOTE, 0.1);
 DuplicateCheck.FIELD_WEIGHTS.put(StandardField.COMMENT, 0.1);
 DuplicateCheck.FIELD_WEIGHTS.put(StandardField.DOI, 3.);
+DuplicateCheck.FIELD_WEIGHTS.put(StandardField.ISBN, 2.5);
```
Does the modification also work without this addition? Reading the comment, this adds some duplicate detection for articles that share the same ISBN.
I think it's good to keep, but I am wondering... :)
Yes, it works without this addition. Should I remove it for now?
Yes, remove it. We can add it back once we have more tests for similar entries. I think that if entries are different, the same ISBN could still make them count as the same. However, I am not sure about the algorithm. Is it about typos in the author? If the ISBN is the same, are the entries the same? If there is no ISBN, are they not the same? That feels strange. Thus, remove it.
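To illustrate the weighting idea discussed in this thread, here is a minimal sketch of how a per-field weight map such as `FIELD_WEIGHTS` could feed a weighted similarity score. It assumes that a field missing from either entry is skipped (so "no ISBN" neither helps nor hurts) and that each shared field contributes weight × similarity. The class name, the string field keys (used instead of `StandardField`), the exact-match similarity, the title and pages weights, and the 0.8 threshold are all hypothetical; only the DOI weight (3.0) and the proposed ISBN weight (2.5, dropped in this review) come from the diff. This is not JabRef's actual `DuplicateCheck` algorithm.

```java
import java.util.Map;

class WeightedDuplicateSketch {

    // Hypothetical weights: "doi" and "isbn" mirror the diff; "title" and "pages" are assumptions.
    static final Map<String, Double> FIELD_WEIGHTS = Map.of(
            "doi", 3.0,
            "isbn", 2.5,
            "title", 3.0,
            "pages", 1.0);

    // Hypothetical threshold at or above which two entries count as duplicates.
    static final double THRESHOLD = 0.8;

    // Weighted average of per-field similarities over the fields present in both entries.
    static double score(Map<String, String> a, Map<String, String> b) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<String, Double> field : FIELD_WEIGHTS.entrySet()) {
            String valueA = a.get(field.getKey());
            String valueB = b.get(field.getKey());
            if (valueA == null || valueB == null) {
                continue; // a field missing in either entry is ignored
            }
            // Exact (case-insensitive) match stands in for a real string-similarity measure.
            double similarity = valueA.equalsIgnoreCase(valueB) ? 1.0 : 0.0;
            weightedSum += field.getValue() * similarity;
            totalWeight += field.getValue();
        }
        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }

    static boolean isDuplicate(Map<String, String> a, Map<String, String> b) {
        return score(a, b) >= THRESHOLD;
    }

    public static void main(String[] args) {
        // Two chapters of the same book: shared placeholder ISBN, different title and pages.
        Map<String, String> chapter1 = Map.of("isbn", "0000000000", "title", "Chapter One", "pages", "1--20");
        Map<String, String> chapter2 = Map.of("isbn", "0000000000", "title", "Chapter Two", "pages", "21--40");
        System.out.println(score(chapter1, chapter2)); // the weighted score
        System.out.println(isDuplicate(chapter1, chapter2)); // false with these assumed weights
    }
}
```

With these assumed weights, the two chapters score 2.5 / 6.5 ≈ 0.38, well below the assumed threshold, so a shared ISBN contributes to the score but can no longer mark the entries as duplicates on its own.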
```diff
@@ -525,7 +524,9 @@ void compareOfTwoEntriesWithSameContentAndMixedLineEndingsReportsNoDifferences()
 assertTrue(duplicateChecker.isDuplicate(entryOne, entryTwo, BibDatabaseMode.BIBTEX));
 }

-@Disabled("Book entries can have the same ISBN due to different chapters. The Test fails as crossref identifies both entries as the same.")
+/**
+ * Book entries can have the same ISBN due to different chapters. The Test fails as crossref identifies both entries as the same.
```
Use one space after `*`, as in normal Javadoc style.
Modify the text: the test DOES NOT fail!
- I added a CHANGELOG.md entry
- I changed "closes" to "refs" in the description as there is more to do (Duplicate check during import marks articles from collection as possible duplicates #8885 (comment))
- As is, it improves the situation, so we merge it.
- Further improvements will follow in a separate PR.
Refs #8885
The cause of the problem is that the duplicate check marks entries as duplicates whenever their ISBNs are equal, but that is not true in all cases: for example, different chapters of the same book share one ISBN. See #9769 (comment).
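To make the false positive concrete, here is a minimal sketch of a short-circuit rule of the kind described above, where equal ISBNs immediately mark two entries as duplicates. The class, the method name, the string field keys, and the title-only fallback are hypothetical simplifications; this is not the code that was removed from JabRef.

```java
import java.util.Map;

class IsbnShortCircuitSketch {

    // Hypothetical simplified rule: equal ISBNs decide the result before any other field is compared.
    static boolean isDuplicateWithIsbnShortCircuit(Map<String, String> a, Map<String, String> b) {
        String isbnA = a.get("isbn");
        if (isbnA != null && isbnA.equals(b.get("isbn"))) {
            return true; // the problematic condition: same ISBN means "duplicate"
        }
        // Stand-in for the rest of the check: require equal, non-null titles.
        String titleA = a.get("title");
        return titleA != null && titleA.equals(b.get("title"));
    }

    public static void main(String[] args) {
        // Two different chapters of one book share the book's (placeholder) ISBN.
        Map<String, String> chapter1 = Map.of("isbn", "0000000000", "title", "Chapter One");
        Map<String, String> chapter2 = Map.of("isbn", "0000000000", "title", "Chapter Two");
        // Prints true: a false positive, because the differing titles are never consulted.
        System.out.println(isDuplicateWithIsbnShortCircuit(chapter1, chapter2));
    }
}
```

Because the ISBN check returns early, no amount of evidence from other fields can overrule it, which is why removing the condition (rather than tuning it) fixes the chapter case.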
Instead, I removed the condition and gave the ISBN field a weight. The weight should be revised, as I chose an arbitrary value (2.5) to demonstrate the idea.
Mandatory checks
- CHANGELOG.md: described in a way that is understandable for the average user (if applicable)