Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions for changes in caching latex free authors #7301

Merged

Conversation

k3KAW8Pnf7mkmdSMPHz27
Copy link
Member

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 commented Jan 5, 2021

This PR helps with caching of latex-free last names for PR #7228. It should also improve readability.

The main idea is to cache a latex-free AuthorList rather than a different latex-free String for every different use case.

Some suggestions are based on the discussion in #6552 (comment), where,

  1. nameStyle refers to the possibility of having different name parts in biblatex, it won't be addressed in this PR.
  2. displayStyle (e.g., getLastOnly) and conjugationStyle (e.g., Natbib) will likely be addressed by streams and collectors Suggestions for changes in caching latex free authors #7301 (comment).

I'll flesh out the to-do list in the coming days.

  • Change the previous caching solution for latex-free authors
    • Cache in Author
    • Cache in AuthorList
    • Remove the previous caching?
    • Cache Strings used in maintable?
  • "Modernize" the AuthorList construction
    • Add AuthorList.of
    • Add a Collector for AuthorList
    • Remove all public constructors (requires changes in AuthorListParser)
      • Mark all remaining public constructors as deprecated
  • "Modernize" the various AuthorList -> String methods
    • Draft using static String andCoordinatedConjunction
    • andCoordinatedConjunction should be using Optional<String> and not String since an author might be lacking the used value (should be done but out of scope for this PR)
  • Evaluate performance impact
    • Avoid creating an intermediate list in Collector
  • JavaDoc
    • Make sure that no one will use the stream collector on bibEntry.getField().split(" and ").collect(...)
  • Update /layout/format/Authors.java to take advantage of AuthorList This will have to be done in follow up PR. It benefits from other changes to Authors.java and Author.java.
  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository. (no new functionality)

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title [Early WIP] Suggestions for changes in caching latex free authors [WIP] Draft suggestions for changes in caching latex free authors Jan 5, 2021
@@ -286,6 +272,15 @@ public Author getAuthor(int i) {
return authors;
}

public List<Author> getLatexFreeAuthors() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could use the same strategy as for the Author class here, and introduce a AuthorList latexFree() method. Then one could also remove all the "latexFree" versions of the methods below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to implements List<Author> with a cache strategy so that repeated AuthorList.latexFree().latexFree()... calls returns the same AuthorList (I am trying to make it immutable anyway).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repeated calls you might be able to deal with as follows:

latexFree() {
  if authorsLatexFree == null
      authorsLatexFree = AuthorList.of(unicode converted...)
      authorsLatexFree.authorsLatexFree = authorsLatexFree 
}
return authorsLatexFree;

but honestly, I wouldn't worry too much about it. In the normal usage, you get the author list from an entry and then use latexFree once if you want to have a latexfree version.

Implementing List<Author> is a good idea but might be a lot of work. For example, you want to make sure that add/remove etc update also the latexfree cache. Maybe it's enough for now to implement AuthorList.stream().

Copy link
Member Author

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing List<Author> is a good idea but might be a lot of work

I have to admit I have not looked into it in too much detail yet. My "high-level" approach were to make an unmodifiable list based on util.AbstractList. At least the API documentation claims it shouldn't be too hard 😜

you want to make sure that add/remove etc update also the latexfree cache

Regarding immutability

I believe AuthorList should be unmodifiable because,

  1. It is non-trivial to add/remove authors while dealing with bibtex. {Barnes and Noble} and Author would need to insert Noble between Barnes and Author if the braces are removed. It is much easier to just re-parse it.
  2. There is currently no parts of JabRef that relies on a list of authors being mutable. Barring AuthorListParser I have replaced all uses with Streams. The mutability required by AuthorListParser can be addressed but I haven't decided what is the best solution that does not come at a performance cost. (I am leaning towards making AuthorListParser an inner class of AuthorList to avoid creating an extra list/array through AuthorList.of)
  3. Even if I am wrong, all static methods in AuthorList can (and in my opinion should) be made to work with List<Author>. Worst case scenario, someone that must have AuthorList mutability can use any class extending List and use static calls instead.

Regarding mutability

Because I think immutability is the right approach, I haven't really spent as much thought on this option. I think that using an EnumMap (or anything with an iterator) to store cached lists of AuthorLists that are extended by Decorators would allow an easy and less error prone implementation. Adding/removing elements can be overridden to primarily work with indices.

E.g., if a latex free AuthorList wants to remove a specific Author, first make sure that the Author is latex free and then find the index of that Author (if it is in the list), then use an iterator over the cached AuthorLists to remove that specific index. It is really not safe for concurrent modifications but neither is the current one.

I can add a "draft" implementation of one of these decorators if you'd prefer that to this explanation.

"P.S.", if you think mutability is the way to go, feel free to tell me that, and I'll look into it more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought would be to have it as a mutable list. This might become handy in the future, although I don't really have a use case right now. But for now I would keep the structure like it is right now with a wrapped List<Author> variable. Of course, if you want to add convient methods like stream() or something like this, you are very welcome to do so. I suggest we keep it growing organically, and if at some point many of the list methods are implemented we can also fully implement the interface.

Copy link
Member Author

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AuthorList -> str methods in various places

The AuthorList -> ? is needed in various places. Take institute authors, both in

// FIXME: #4152 This is an ugly hack because the latex2unicode formatter kills of all curly braces, so no more corporate author parsing possible
String authorLatexFree = entry.getLatexFreeField(field).orElse("");
if (corporate) {
authorLatexFree = "{" + authorLatexFree + "}";
}

and

String lastName = author.getLast()
.map(lastPart -> isInstitution(author) ?
generateInstitutionKey(lastPart) :
LatexToUnicodeAdapter.format(lastPart))
.orElse(null);

In both locations they should rely on the parsing of AuthorListParser and only utilize List<Author>.
I even have an Author.isInstitute() method in one of my local branches ^^

getLatexFreeField could be considered the culprit of this.

You can also view #7228 as related to this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno. Do you think this discussion will lead to anything concrete?
I am happy just implementing public AuthorList latexFree() and killing of quite a few methods 😄

As you might have noticed I have ideas both for dealing with getLatexFreeField and implementing List. Perhaps that is again better for a different PR and I can expand on the idea(s) so they can be criticised in a more structure fashion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these workarounds should actually be moved to the AuthorListParser. There was a dark time in JabRef's history, where there was no Author class and everyone reimplemented different parsing and serialization schemes again and again. Happily it's now 2021 ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it makes sense to add the "Workarounds" for the Authorlist in one central place. And especially institution handling is important for exporters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a follow-up PR moving the formatting code of layout/Authors.java to AuthorList. I believe they produce different results when using oxford commas for two authors, which is not ideal.

@calixtus
Copy link
Member

Devcall: whats the status here?

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

@calixtus basically me not having as much time for this as I would have hoped.

The remaining "big picture" parts (in my opinion) are to,

  1. make it harder to use lists of authors without the AuthorList class (to discourage writing more author parsers)
  2. figure out how to minimize the performance impact

# Conflicts:
#	src/main/java/org/jabref/model/entry/Author.java
@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

k3KAW8Pnf7mkmdSMPHz27 commented Mar 10, 2021

Using the extremely scientific approach of literally measuring the time it takes to open a .bib with 6400+ entries and an AutomatedPersonGroup takes me 47s until the 'AllEntries' count shows up in the current master and 50s with the changes currently in this branch.
Perhaps there is nothing to worry about.

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 marked this pull request as ready for review March 11, 2021 18:55
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title [WIP] Draft suggestions for changes in caching latex free authors Suggestions for changes in caching latex free authors Mar 11, 2021
@calixtus calixtus added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers and removed status: changes required Pull requests that are not yet complete labels Mar 13, 2021
@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

@calixtus @tobiasdiez would it help if I split this into 2-3 PRs? It should make it easier/less tedious to review.

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it just didn't had this on the radar any more. I've now went through it and it looks very good. I have one suggestion for the caching...

src/main/java/org/jabref/model/entry/Author.java Outdated Show resolved Hide resolved
src/test/java/org/jabref/model/entry/AuthorListTest.java Outdated Show resolved Hide resolved
@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

Sorry, it just didn't had this on the radar any more

No worries. I am trying to learn to scope/limit my PRs better. My apologies that it grew this big and I appreciate that you have stuck with it from the start! ❤️

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! ❤️

Copy link
Member

@calixtus calixtus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through your changes. Looks good to me too. Nothing more to say. 😄 Merging now.

@calixtus calixtus merged commit 13a4ad5 into JabRef:master Mar 25, 2021
Siedlerchr added a commit that referenced this pull request Mar 28, 2021
* upstream/master: (191 commits)
  Fix for issue 7416: font size of the preferences dialog does not update with the rest of the GUI. (#7509)
  Fix school/instituation is printed twice (#7574)
  Dsiable notarisation until we hae an account for JabRef e.V. (#7572)
  Fix citation keys unintentionally being overwritten on import (#7443)
  Fix AuthentificationPlugin not declared in mergedModule (#7570)
  Suggestions for changes in caching latex free authors (#7301)
  Add simple Unit Tests (#7542)
  Fix drag and drop into empty library (#7555)
  Bump richtextfx from 0.10.4 to 0.10.6 (#7563)
  Bump pdfbox from 2.0.22 to 2.0.23 (#7561)
  Bump org.eclipse.jgit (#7560)
  Bump fontbox from 2.0.22 to 2.0.23 (#7562)
  Bump guava from 30.1-jre to 30.1.1-jre (#7564)
  Bump xmpbox from 2.0.22 to 2.0.23 (#7565)
  Bump hmarr/auto-approve-action from v2.0.0 to v2.1.0 (#7566)
  Add gource (#7193)
  UI: Fix for group icon (#7552)
  Fix for issue 6487: Opening BibTex file (doubleclick) from Folder with spaces not working (#7551)
  add ability to insert arxivId (#7549)
  Fixed missing trigger for linked file operations (#7548)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants