Suggestions for changes in caching latex free authors #7301
Also removes the previous cache solution
@@ -286,6 +272,15 @@ public Author getAuthor(int i) {
        return authors;
    }

    public List<Author> getLatexFreeAuthors() {
One could use the same strategy as for the Author class here and introduce an AuthorList latexFree() method. Then one could also remove all the "latexFree" versions of the methods below.
I'll try to implement List<Author> with a caching strategy so that repeated AuthorList.latexFree().latexFree()... calls return the same AuthorList (I am trying to make it immutable anyway).
You might be able to deal with the repeated calls as follows:
public AuthorList latexFree() {
    if (authorsLatexFree == null) {
        authorsLatexFree = AuthorList.of(/* unicode converted... */);
        authorsLatexFree.authorsLatexFree = authorsLatexFree;
    }
    return authorsLatexFree;
}
But honestly, I wouldn't worry too much about it. In normal usage, you get the author list from an entry and then call latexFree once if you want a latex-free version.
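The caching pattern from the pseudocode can be tried out in isolation. This is a minimal sketch, not JabRef code: SimpleAuthor and CachedAuthorList are hypothetical stand-ins for Author and AuthorList, and toLatexFree is a toy converter that only strips braces instead of doing real LaTeX-to-Unicode conversion.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for JabRef's Author: just one name string.
record SimpleAuthor(String name) { }

// Hypothetical immutable author list that lazily caches its latex-free version.
final class CachedAuthorList {
    private final List<SimpleAuthor> authors;
    private CachedAuthorList latexFree; // null until latexFree() is first called

    CachedAuthorList(List<SimpleAuthor> authors) {
        this.authors = List.copyOf(authors);
    }

    List<SimpleAuthor> authors() {
        return authors;
    }

    // Toy converter (assumption): only strips braces, no real LaTeX-to-Unicode handling.
    private static String toLatexFree(String name) {
        return name.replace("{", "").replace("}", "");
    }

    // Not thread-safe: concurrent first calls may each build a separate cached copy.
    CachedAuthorList latexFree() {
        if (latexFree == null) {
            latexFree = new CachedAuthorList(authors.stream()
                    .map(author -> new SimpleAuthor(toLatexFree(author.name())))
                    .collect(Collectors.toList()));
            // A latex-free list is its own latex-free version, so chained calls stop here.
            latexFree.latexFree = latexFree;
        }
        return latexFree;
    }
}
```

With this, list.latexFree().latexFree() returns the same instance as list.latexFree(), which is the behaviour discussed for repeated calls.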
Implementing List<Author> is a good idea but might be a lot of work. For example, you want to make sure that add/remove etc. also update the latex-free cache. Maybe it's enough for now to implement AuthorList.stream().
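The lighter alternative suggested here keeps a wrapped list and only exposes stream(). A minimal sketch, assuming a hypothetical WrappedAuthorList that holds plain name strings rather than JabRef's Author objects:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper that keeps an internal List instead of implementing List itself.
final class WrappedAuthorList {
    private final List<String> authorNames; // assumption: authors reduced to display names

    WrappedAuthorList(List<String> authorNames) {
        this.authorNames = List.copyOf(authorNames);
    }

    // Gives callers streaming access without exposing add/remove.
    Stream<String> stream() {
        return authorNames.stream();
    }

    // Example consumer: formatting becomes a stream pipeline plus a collector.
    String joined(String delimiter) {
        return stream().collect(Collectors.joining(delimiter));
    }
}
```

Because nothing but a stream is handed out, there is no add/remove path that could leave a latex-free cache out of sync.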
"Implementing List<Author> is a good idea but might be a lot of work"
I have to admit I have not looked into it in much detail yet. My "high-level" approach would be to make an unmodifiable list based on java.util.AbstractList. At least the API documentation claims it shouldn't be too hard 😜
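For reference, java.util.AbstractList only requires get(int) and size() for an unmodifiable list; its add, set and remove implementations throw UnsupportedOperationException by default. A minimal sketch with a hypothetical UnmodifiableNameList over plain strings:

```java
import java.util.AbstractList;
import java.util.List;

// Sketch: an unmodifiable List built on java.util.AbstractList.
// Only get(int) and size() are implemented; iterator, equals, hashCode, indexOf etc.
// come from AbstractList, while add/set/remove throw UnsupportedOperationException.
final class UnmodifiableNameList extends AbstractList<String> {
    private final String[] names;

    UnmodifiableNameList(String... names) {
        this.names = names.clone(); // defensive copy so callers cannot mutate it later
    }

    @Override
    public String get(int index) {
        return names[index];
    }

    @Override
    public int size() {
        return names.length;
    }
}
```

Such a class is a full List (it compares equal to List.of with the same elements), so existing List-based code could consume it unchanged.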
"you want to make sure that add/remove etc. also update the latex-free cache"
Regarding immutability
I believe AuthorList should be unmodifiable because:
- It is non-trivial to add/remove authors while dealing with BibTeX. For {Barnes and Noble} and Author, removing the braces would require inserting Noble between Barnes and Author. It is much easier to just re-parse the field.
- There are currently no parts of JabRef that rely on a list of authors being mutable. Barring AuthorListParser, I have replaced all uses with Streams. The mutability required by AuthorListParser can be addressed, but I haven't decided on the best solution that does not come at a performance cost. (I am leaning towards making AuthorListParser an inner class of AuthorList to avoid creating an extra list/array through AuthorList.of.)
- Even if I am wrong, all static methods in AuthorList can (and in my opinion should) be made to work with List<Author>. In the worst case, someone who needs a mutable AuthorList can use any class implementing List and use the static calls instead.
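The last point, static helpers that accept any List<Author>, might look like this. Person and PersonLists are hypothetical names used only for illustration; the point is that the helper works equally with an immutable List.of and a mutable ArrayList.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for JabRef's Author with only first and last name.
record Person(String first, String last) { }

// Static helpers that accept any List<Person>, so callers needing mutability
// can pass an ArrayList while the author list type itself stays immutable.
final class PersonLists {
    private PersonLists() { }

    static String lastNames(List<Person> authors, String delimiter) {
        return authors.stream()
                .map(Person::last)
                .collect(Collectors.joining(delimiter));
    }

    // Example: a caller that builds its list incrementally with an ArrayList.
    static List<Person> mutableCopy(List<Person> authors) {
        return new ArrayList<>(authors);
    }
}
```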
Regarding mutability
Because I think immutability is the right approach, I haven't spent as much thought on this option. I think that using an EnumMap (or anything with an iterator) to store cached AuthorLists that are extended by decorators would allow an easy and less error-prone implementation. Adding/removing elements can be overridden to work primarily with indices.
For example, if a latex-free AuthorList wants to remove a specific Author, first make sure that the Author is latex-free, then find the index of that Author (if it is in the list), and then use an iterator over the cached AuthorLists to remove that specific index. This is not safe for concurrent modifications, but neither is the current implementation.
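The index-based removal described above can be sketched without the decorator layer. This is a hypothetical illustration: CacheStyle and IndexSyncedAuthors are invented names, the cached variants are plain string lists, and toLatexFree is a toy converter that only strips braces.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache keys; the real set of cached variants would be decided later.
enum CacheStyle { ORIGINAL, LATEX_FREE }

// Sketch of a mutable author list whose cached variants stay aligned by index.
final class IndexSyncedAuthors {
    private final Map<CacheStyle, List<String>> views = new EnumMap<>(CacheStyle.class);

    IndexSyncedAuthors(List<String> originalNames) {
        List<String> latexFree = new ArrayList<>();
        for (String name : originalNames) {
            latexFree.add(toLatexFree(name));
        }
        views.put(CacheStyle.ORIGINAL, new ArrayList<>(originalNames));
        views.put(CacheStyle.LATEX_FREE, latexFree);
    }

    // Toy converter (assumption): strips braces only.
    static String toLatexFree(String name) {
        return name.replace("{", "").replace("}", "");
    }

    List<String> view(CacheStyle style) {
        return List.copyOf(views.get(style));
    }

    // Normalize the argument, locate it in the latex-free view, then remove that
    // index from every cached view. Not safe for concurrent modification.
    boolean removeLatexFree(String name) {
        int index = views.get(CacheStyle.LATEX_FREE).indexOf(toLatexFree(name));
        if (index < 0) {
            return false;
        }
        for (List<String> view : views.values()) {
            view.remove(index); // int overload: removes by position in each view
        }
        return true;
    }
}
```

Keeping every variant aligned by position is what makes a single index sufficient; any operation that reorders one view would have to reorder all of them.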
I can add a "draft" implementation of one of these decorators if you'd prefer that to this explanation.
P.S. If you think mutability is the way to go, feel free to tell me, and I'll look into it more.
My first thought would be to have it as a mutable list. This might become handy in the future, although I don't really have a use case right now. But for now I would keep the structure as it is, with a wrapped List<Author> variable. Of course, if you want to add convenient methods like stream() or similar, you are very welcome to do so. I suggest we let it grow organically, and if at some point many of the list methods are implemented, we can also fully implement the interface.
AuthorList -> String methods in various places
The AuthorList -> ? conversion is needed in various places. Take institutional authors, for example, in both
jabref/src/main/java/org/jabref/logic/msbib/MSBibConverter.java
Lines 127 to 131 in 54f27e9
// FIXME: #4152 This is an ugly hack because the latex2unicode formatter kills of all curly braces, so no more corporate author parsing possible
String authorLatexFree = entry.getLatexFreeField(field).orElse("");
if (corporate) {
    authorLatexFree = "{" + authorLatexFree + "}";
}
and
jabref/src/main/java/org/jabref/logic/citationkeypattern/BracketedPattern.java
Lines 497 to 501 in 54f27e9
String lastName = author.getLast()
                        .map(lastPart -> isInstitution(author) ?
                                generateInstitutionKey(lastPart) :
                                LatexToUnicodeAdapter.format(lastPart))
                        .orElse(null);
In both locations, the code should rely on the parsing done by AuthorListParser and only use List<Author>.
I even have an Author.isInstitute() method in one of my local branches ^^
getLatexFreeField could be considered the culprit here.
You can also view #7228 as related to this.
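Since the FIXME above notes that the LaTeX-to-Unicode conversion strips curly braces, institution detection has to happen on the raw name before conversion. A minimal sketch of such a check; InstitutionCheck and its rule (the whole name is wrapped in a single outer brace group, as in {Barnes and Noble}) are hypothetical and not JabRef's actual isInstitution logic:

```java
// Hypothetical helper: decides institution status from the raw BibTeX name,
// i.e. before any latex-free conversion has removed the braces.
final class InstitutionCheck {
    private InstitutionCheck() { }

    // Assumed rule: the name is one brace group, e.g. "{Barnes and Noble}".
    static boolean isInstitution(String rawName) {
        String s = rawName.strip();
        if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
            // The outer group closed before the end, e.g. "{Barnes} and {Noble}".
            if (depth == 0 && i < s.length() - 1) {
                return false;
            }
        }
        return depth == 0;
    }
}
```

With the flag computed up front (for instance stored on the parsed author), callers such as the two snippets above would no longer need to re-add braces after conversion.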
I dunno. Do you think this discussion will lead to anything concrete?
I am happy just implementing public AuthorList latexFree() and killing off quite a few methods 😄
As you might have noticed, I have ideas both for dealing with getLatexFreeField and for implementing List. Perhaps that is again better suited to a different PR, and I can expand on the idea(s) so they can be criticised in a more structured fashion.
Yes, these workarounds should actually be moved to the AuthorListParser. There was a dark time in JabRef's history when there was no Author class and everyone reimplemented different parsing and serialization schemes again and again. Happily, it's now 2021 ;-)
I guess it makes sense to put the "workarounds" for the AuthorList in one central place. Institution handling in particular is important for exporters.
I'll add a follow-up PR moving the formatting code of layout/Authors.java to AuthorList. I believe the two produce different results when using Oxford commas with two authors, which is not ideal.
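One consistent rule for the two-author case is to apply the Oxford comma only from three names upward. A minimal sketch; AuthorJoiner is a hypothetical helper and not the current behaviour of either layout/Authors.java or AuthorList:

```java
import java.util.List;

// Hypothetical helper illustrating one consistent conjunction rule.
final class AuthorJoiner {
    private AuthorJoiner() { }

    // "A", "A and B", "A, B and C" or, with the Oxford comma, "A, B, and C".
    // Two names never get a comma, regardless of the oxfordComma flag.
    static String join(List<String> names, boolean oxfordComma) {
        int n = names.size();
        if (n == 0) {
            return "";
        }
        if (n == 1) {
            return names.get(0);
        }
        if (n == 2) {
            return names.get(0) + " and " + names.get(1);
        }
        String head = String.join(", ", names.subList(0, n - 1));
        return head + (oxfordComma ? ", and " : " and ") + names.get(n - 1);
    }
}
```

Having a single shared implementation like this is what would keep both code paths from diverging on the two-author case.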
Devcall: what's the status here?
@calixtus Basically, I haven't had as much time for this as I had hoped. The remaining "big picture" parts (in my opinion) are to,
# Conflicts: # src/main/java/org/jabref/model/entry/Author.java
Using the extremely scientific approach of literally measuring the time it takes to open a .bib file with 6400+ entries and an AutomatedPersonGroup, it takes me 47 s until the 'AllEntries' count shows up on the current master and 50 s with the changes currently in this branch.
The previous AuthorList cached Strings. The new version caches AuthorLists, which requires changes in all cache tests.
@calixtus @tobiasdiez would it help if I split this into 2-3 PRs? It should make it easier/less tedious to review.
Sorry, I just didn't have this on the radar any more. I've now gone through it and it looks very good. I have one suggestion for the caching...
No worries. I am trying to learn to scope/limit my PRs better. My apologies that it grew this big, and I appreciate that you have stuck with it from the start! ❤️
LGTM! ❤️
Went through your changes. Looks good to me too. Nothing more to say. 😄 Merging now.
This PR helps with caching latex-free last names for PR #7228. It should also improve readability.
The main idea is to cache a latex-free AuthorList rather than a different latex-free String for every use case.
Some suggestions are based on the discussion in #6552 (comment), where:
- nameStyle refers to the possibility of having different name parts in biblatex; it won't be addressed in this PR.
- displayStyle (e.g., getLastOnly) and conjugationStyle (e.g., Natbib) will likely be addressed by streams and collectors, see #7301 (comment).
I'll flesh out the to-do list in the coming days.
- Author
- AuthorList
- Cache Strings used in maintable?
- AuthorList construction
  - AuthorList.of
  - Collector for AuthorList (AuthorListParser)
  - Mark all remaining public constructors as deprecated
- AuthorList -> String methods
  - static String andCoordinatedConjunction (should be done but out of scope for this PR)
  - andCoordinatedConjunction should be using Optional<String> and not String, since an author might be lacking the used value
- Evaluate performance impact
- Avoid creating an intermediate list in Collector
  - bibEntry.getField().split(" and ").collect(...)
- Update /layout/format/Authors.java to take advantage of AuthorList
  - This will have to be done in a follow-up PR. It benefits from other changes to Authors.java and Author.java.
- Tests created for changes (if applicable)
- Screenshots added in PR description (for UI changes)
- Checked documentation: Is the information available and up to date? If not, created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository. (no new functionality)
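The to-do items "Collector for AuthorList" and "Avoid creating an intermediate list in Collector" can be illustrated together. This is a hypothetical sketch: NameList stands in for AuthorList, and the split on " and " is deliberately naive (it would break {Barnes and Noble}, which the real AuthorListParser must keep together). Pattern.splitAsStream avoids the String[] that split() creates; the collector still accumulates into one list internally, which then becomes the result.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for AuthorList holding raw name strings.
record NameList(List<String> names) {
    // Naive separator (assumption): ignores brace groups such as "{Barnes and Noble}".
    private static final Pattern AND = Pattern.compile("\\s+and\\s+");

    NameList {
        // List.copyOf generally does not copy a list that is already unmodifiable.
        names = List.copyOf(names);
    }

    // Collector that builds a NameList from a stream of names.
    static Collector<String, ?, NameList> toNameList() {
        return Collectors.collectingAndThen(Collectors.toUnmodifiableList(), NameList::new);
    }

    // Streams the parts directly instead of first materializing the String[] from split().
    static NameList parse(String field) {
        return AND.splitAsStream(field.strip()).collect(toNameList());
    }
}
```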