-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove dash from illegal characters. #6300
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution. I walked through the linked issues and updated your code accordingly. A question still remains.
Could you also please add a CHANGELOG.md
entry (as indicated in the checklist) and create a pull request for https://github.com/JabRef/user-documentation/blob/master/en/fields/README.md (as asked in the checklist)
src/main/java/org/jabref/logic/bibtexkeypattern/BibtexKeyGenerator.java
Outdated
Show resolved
Hide resolved
@@ -27,7 +27,7 @@ | |||
*/ | |||
public static final String APPENDIX_CHARACTERS = "abcdefghijklmnopqrstuvwxyz"; | |||
private static final Logger LOGGER = LoggerFactory.getLogger(BibtexKeyGenerator.class); | |||
private static final String KEY_ILLEGAL_CHARACTERS = "{}(),\\\"-#~^:'`ʹ"; | |||
private static final String KEY_ILLEGAL_CHARACTERS = "{}(),\\\"#~^:'`ʹ"; | |||
private static final String KEY_UNWANTED_CHARACTERS = "{}(),\\\"-"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why didn't you remove the -
it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the recommendation in the README, I thought it was a mistake to include a dash in illegal signs, not those not recommended.
Some characters should not be used in bibtexkey as they are not compatible or not recommended:
{ } ( ) , \ " - # ~ ^ : '
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@koppor so should i remove dash from unwanted characters and make update on READEME file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove it from illegal, but keep it in unwanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the usage of KEY_UNWANTED_CHARACTERS
.
The code comment reads as follows:
// User doesn't want us to enforce legal characters. We must still look
// for whitespace and some characters such as commas, since these would
// interfere with parsing:
Thus, the KEY_UNWANTED_CHARACTERS
clearly is a subset of KEY_ILLEGAL_CHARACTERS
. Not sure, whether KEY_ILLEGAL_CHARACTERS
makes sense at all. Maybe because of tool interopability.
I see three options:
A) Remove KEY_UNWANTED_CHARACTERS
completely - and also the preference for it.
B) Set KEY_UNWANTED_CHARACTERS
to a minimum - as indicated in the comment. I would propose {}"
since these are the characters causing issues at parsing. I would not add space since this did not cause any issues until now (at list that I am aware of)
C) Just remove -
from KEY_UNWANTED_CHARACTERS
Please do option B). Please rename KEY_UNWANTED_CHARACTERS
to KEY_CHARACTERS_CAUSING_PARSING_ERRORS
. Please add a comment above that this is a subset of KEY_ILLEGAL_CHARACTERS
.
Thanks @koppor for your tips and attention i will do it asap. |
Ok, so unit test was created to correct operation of individual functions. We removed dash from the key because seems to be not illegal but still unwanted. I also check unit tests from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, the test is correct. I would remove all illegal and all unwanted characters from an automatically generated key.
@tobiasdiez The test needs to be adapted as the semantics of If we want to have "beautiful" bibtex keys, we need to add a cleanup formatter removing |
Please note that we have preferences for allowing the unwanted characters there is a checkbox in the preferences |
@Siedlerchr This is what I meant with "prefernces" in option A: (Copied from #6300 (comment)) |
Co-Authored-By: Oliver Kopp <[email protected]>
Co-Authored-By: Oliver Kopp <[email protected]>
*/ | ||
public static final String APPENDIX_CHARACTERS = "abcdefghijklmnopqrstuvwxyz"; | ||
private static final Logger LOGGER = LoggerFactory.getLogger(BibtexKeyGenerator.class); | ||
private static final String KEY_ILLEGAL_CHARACTERS = "{}(),\\\"-#~^:'`ʹ"; | ||
private static final String KEY_UNWANTED_CHARACTERS = "{}(),\\\"-"; | ||
private static final String KEY_ILLEGAL_CHARACTERS = "{}(),=\\\"#%~'`ʹ"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment here referencing https://tex.stackexchange.com/a/408548/9075?
Proposal:
// Source of illegal characters: https://tex.stackexchange.com/a/408548/9075
private static final String KEY_ILLEGAL_CHARACTERS = "{}(),\\\"-#~^:'`ʹ"; | ||
private static final String KEY_UNWANTED_CHARACTERS = "{}(),\\\"-"; | ||
private static final String KEY_ILLEGAL_CHARACTERS = "{}(),=\\\"#%~'`ʹ"; | ||
private static final String KEY_CHARACTERS_CAUSING_PARSING_ERRORS = "{}(),\\\"-"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove -
here, too. Otherwise, the subset property does not hold.
So apparently there are three classes of characters that we don't want to accept:
Right now we have the option to specify the Disallowed characters and toggle if they are included or not. But there is no option to specify Unwanted characters. Proposal:
Alternatively:
This would be slightly more flexible but I couldn't come up with a use case where one really needs a key which contains Disallowed characters. I think the option to disable the "enforce legal characters" came from a time where illegal keys were complained about with a dialog upon loading the library. @dextep @mlep @melTr @JoKalliauer @JabRef/developers what do you think? |
I'm a big fan of the second proposal. But we should point to a help page or introduce a tooltip describing the difference between disallowed and unwanted characters. Maybe also a warning validation for disallowed characters. Let the user decide, if he want's to break bibtex. |
I would suggest
Edit: I removed ' from the unwanted, since it is already a disalowed/illegal character ( #6300 (comment) ) |
@JoKalliauer So you propose to merge illegal and disallowed (into disallowed). - And introduce a new configurable "unwanted". - Sounds good for me. I would also put I would NOT make any difference between BibTeX and BibLaTeX here. Just one merged set of illegal characters for BibTeX and BibLaTeX. |
I see that with my proposal keys such as "Dueck_(1998)" won't be possible any more. On the one hand, when used in the context of Word, this could cause WTF at the user's side. On the other hand, this would simplify our code. |
I also really like the proposal #6300 (comment) from @JoKalliauer. @dextep could you please go ahead and implement it like that. The last point concerning the import, we can ignore for the moment I think. Sorry for the extended discussion concerning this issue. It appeared to be a clearly-defined issue which was good for a first contribution to JabRef, but then surprisingly got more complex. Do you have any questions concerning the implementation? |
Stupid question: What is the difference between illegal and disallowed character? Is there any specification? Do you have to try it with BibTeX,/BibLaTeX/Biber and check which ones (don't) work? |
I'm sorry, but I've been a bit busy lately. I have read your comments and propsial seems to be the best solution. @tobiasdiez no problem i'm glad to be able to participate :) I tried to summarize the changes that should be made:
No code changes required
Edit: Merge illegal
Edit: New
No code changes required
Add unwanred to reported tooltip. Please correct my summary of the changes :) |
@dextep According to https://tex.stackexchange.com/a/408548/110064:
@dextep I assume you meat PS: unwated unicode-characters can be increased to infinity PS2: Are |
The at symbol indicates the start of the bibtex key. Dollar sign may be okay. |
I would like to thank you very much @koppor and @tobiasdiez, it will definitely help me avoid such mistakes in the future ❗ |
@dextep I would suggest you also look into https://lostechies.com/joshuaflanagan/2010/09/03/use-gitk-to-understand-git/ and patiently read it and try to understand it paragraph-by-paragraph. After going through, you'll always have |
Fix unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. This looks very good now. Just a bit of final nitpicking from my side (mostly concerning the tests). Other than that, it's good to go from my side!
src/test/java/org/jabref/logic/bibtexkeypattern/BibtexKeyGeneratorTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/jabref/logic/bibtexkeypattern/BibtexKeyGeneratorTest.java
Outdated
Show resolved
Hide resolved
src/test/java/org/jabref/logic/bibtexkeypattern/BibtexKeyGeneratorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A huge step forward. Thank you for the update.
Two nitpicks are remaining. See below.
List<Character> unwantedChars = new ArrayList<>(); | ||
for (char ch : unwantedCharacters.toCharArray()) { | ||
unwantedChars.add(ch); | ||
} | ||
|
||
StringBuilder newKey = new StringBuilder(); | ||
for (int i = 0; i < key.length(); i++) { | ||
char c = key.charAt(i); | ||
if (KEY_ILLEGAL_CHARACTERS.indexOf(c) == -1) { | ||
if (!DISALLOWED_CHARACTERS.contains(c) && !unwantedChars.contains(c)) { | ||
newKey.append(c); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace the code:
String newKey = key.chars()
.filter(c -> unwantedCharacters.indexOf(c) == -1)
.filter(c -> !DISALLOWED_CHARACTERS.contains((char) c))
.collect(StringBuilder::new,
StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
I came up while thinking as follows (and stack-overflow-ing for String filter characters, ...). Used answer: https://stackoverflow.com/a/20268845/873282
Why not a simple indexOf
thing?
"abcdef".indexOf('c') != -1;
See https://stackoverflow.com/a/506107/873282.
Second reason: Conversion into the list is IMHO more CPU consuming than the indexOf
call. I know, that it is not critical, but I think, the conversion (plus the readble !unwantedChars.contains(c)
) is less readable then a simple (unwantedChars.indexOf(c) >= 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think of it that way, thank you for your review @koppor 👍!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More old code can be deleted 🎉
@@ -82,17 +82,15 @@ public static String removeUnwantedCharacters(String key, String unwantedCharact | |||
for (char ch : unwantedCharacters.toCharArray()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the unwantedChars
variable, do we? Can you please delete that for loop and the variable? --> the code on line 86 directly accesses the method parameter unwantedChararacters
!
Side note: This is an example why similar variable names can be mixed up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, we don't need it, I'm sorry. Every time I get the impression that I didn't notice something. 🐞
Yeah. Side comment: It should have been "Fixes #... Fixes #... Fixes #..." not just "Fix" once. See https://help.github.com/en/enterprise/2.16/user/github/managing-your-work-on-github/closing-issues-using-keywords#closing-multiple-issues for details. |
And anyway, thanks you guys for your patience and tips, I'm learn a lot from you! 🎉 |
@dextep Thank you again for your contribution and your enduring patience! ;) Although sometimes it feels nasty, having to fix all those little code issues, but in the end it contributes to a good code standard which helps others and maybe your "future you" to more easily understand it. |
…ptyLib * upstream/master: (23 commits) Make wrap fields also visible in entry editor (#6315) Add launcher to fix extension import in snap (#6439) add concise message when SaveException happen (#6444) Squashed 'src/main/resources/csl-locales/' changes from 79845b087b..cbb45961b8 Squashed 'src/main/resources/csl-styles/' changes from 906cd6d..270cd32 Addition to the early pull #6438 (#6441) Fix the bug #6421 (#6438) Cleanup dead code (#6436) Fixed entry duplication on file download (#6437) Add workaround for buildSrc issue (#6433) Remove dash from illegal characters. (#6300) Docs: For developers: New architectural decision added to list (#6397) Added a download checkbox to the import dialog (#6381) Bump jaxb-xjc from 2.3.2 to 2.3.3 (#6410) Bump flexmark-ext-gfm-strikethrough from 0.61.20 to 0.61.24 (#6413) Bump byte-buddy-parent from 1.10.9 to 1.10.10 (#6408) Bump flexmark-ext-gfm-tasklist from 0.61.20 to 0.61.24 (#6412) Bump org.beryx.jlink from 2.17.8 to 2.18.0 (#6411) Bump tika-core from 1.24 to 1.24.1 (#6409) Bump flexmark from 0.61.20 to 0.61.24 (#6416) ...
* upstream/master: Addition to the early pull #6438 (#6441) Fix the bug #6421 (#6438) Cleanup dead code (#6436) Fixed entry duplication on file download (#6437) Add workaround for buildSrc issue (#6433) Remove dash from illegal characters. (#6300) Docs: For developers: New architectural decision added to list (#6397)
43566f2 Update molecular-oncology.csl (#6354) 38f2b5f Update san-francisco-estuary-and-watershed-science.csl (#6350) 724cb12 Update australasian-journal-of-philosophy.csl (#6344) df6af86 Fix webpage in-text citation for council-of-science-editors-author-date.csl (#6318) 6900b58 Add month to magazine for Bluebook e0a8148 Update cambridge-university-press-author-date-cambridge-a.csl (#6345) 0823448 Add space & comma before pp (#6343) ff38cd2 Update american-journal-of-respiratory-and-critical-care-medicine.csl (#6341) 8727dfb Update early-music-history.csl (#6323) 1154354 Update boletin-de-pediatria.csl (#6310) f25438e Update pravnik.csl (#6309) db7a4ae Update zoological-journal-of-the-linnean-society.csl (#6304) 6c043a7 Create polygraphia.csl (#6307) 255e00c Create intellect-newgen-books.csl (#6308) 323629f Update historical-materialism.csl (#6300) f62b70d Create european-review-of-international-studies.csl (#6301) dff2698 Update ucl-university-college-harvard.csl (#6298) 0ce09c9 Update sciences-po-ecole-doctorale-note-french.csl (#6290) bdd53ec Update sciences-po-ecole-doctorale-author-date.csl (#6291) efde4d4 Create journal-of-law-medicine-ethics.csl (#6296) 7539b2c Create theses-de-sorbonne-universite.csl (#6295) 905f25a Update biochemical-society-transactions.csl (#6292) a76a3f5 Update smithsonian-institution-scholarly-press author-date and note (#6294) e6b6c6c Create exploration-of-targeted-anti-tumor-therapy.csl (#6276) 024c9c8 Create nys-nydanske-studier.csl (#6331) d9ac8e1 Update vox-sanguinis.csl (#6327) 9a98e92 Remove DOI from Genetics & Molecular Biology git-subtree-dir: buildres/csl/csl-styles git-subtree-split: 43566f2
Fixes #6295, Fixes #6257, Fixes #6252.