-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mEDRA DOI fetcher implementation. #6641
Conversation
|
||
// mEDRA requires Accept header attribute with desired content type | ||
if (source.toString().contains(Medra.API_URL)) { | ||
this.addHeader("Accept", Medra.CONTENT_TYPE_JSON); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use the application/x-bibtex
header as for the other DOI fetcher well? That would make fetching a lot easier!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a few tests but bibtext type seems to not work as exptected with mEDRA.
Here are the outputs of URLDonwnload.asString():
mEDRA
@article{OLLETT, Andrew_2018, title={S?tav?hana and N?g?rjuna}, ISSN={2507-0347}, url={http://doi.org/10.2143/JIABS.41.0.3285748}, DOI={10.2143/JIABS.41.0.3285748}, number={0}, journal={Journal of the International Association of Buddhist Studies}, publisher={Peeters online journals}, author={OLLETT, Andrew}, year={2018}, pages={421–472} }
INSPIREHEP (similarly for GoogleScholar):
@article{Maggiore:2017jjr,
author = "Maggiore, M. and Campo, D. and Antonini, P. and Lombardi, A. and Manzolaro, M. and Andrighetto, A. and Monetti, A. and Scarpa, D. and Esposito, J. and Silvestrin, L.",
title = "{SPES: A new cyclotron-based facility for research and applications with high-intensity beams}",
doi = "10.1142/S0217732317400107",
journal = "Mod. Phys. Lett. A",
volume = "32",
number = "17",
pages = "1740010",
year = "2017"
}
From what I understand, as a result, BibtexParser is not able to parse mEDRA string and returns no entries, while it works correctly in the other case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this works, but the specific authors of the citation contain umlauts which seem to be not correclty returned:
http://doi.org/10.2143/JIABS.41.0.3285748
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I performed several other tests with random mEDRA DOIs retrieved at https://api.medra.org/doiprefixlist?doiPrefix=10.2143 but only 1 over maybe 50 happened to be correctly parsed in bibtex type. Json type, instead, never failed and the only errors was due to the 500 error code returned by mEDRA.
At this point i believe the bibtex style returned by mEDRA could be the issue, but since this is my first contribution I'm really open to any other advice or suggestion you think i could work on to tackle this issue.
I'm committing some refactoring of the previous code along with WebFetchers class modified to show mEDRA agency in the drop down list of the Web Search pane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the investigation! I think it's fine to use the json as workaround then.
From a users perspective, I don't want to know to whom a DOI actually belongs or to manually try another DOI fetcher.
It would be nice if there was some "magic" in the background. I just enter the DOI (no matter who "owns" it ) and I get my bibtex entry from it.
Some suggestions: Implement the IDBasedFetcher interface, it's meant for fetchers which work based on an ID, e.g DOI, ISBN
The SearchBasedFetcher is meant for fetcher where you can enter a search term e.g. "abcdf"
Scenario:
- User enters doi (e.g. new entry from DOI)
- If the DOI starts with medra prefix number => call medra fetcher and return result
else use the normal DOI results.
We have a similar setup for ISBNFetcher, we first check site a and then site b in case the first didn't return anything.
For parsing DOIs we have the DOI.java class.
- Add a method like getPrefix to return the prefix of the DOI and then you could use that in the above scenario
I hope that makes it a bit more clear now. Feel free to ask if you have any further questions or problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Siedlerchr, I really appreciate your suggestions.
Here are some more checks I've performed.
I've figured out that the DoiFetcher.performSearchById was correctly retrieving the bibtex of mEDRA DOIs, even before I made any changes, but cannot parse it for the reason already mentioned. Thus, JabRef not retrieving mEDRA DOIs metadata was not the issue in #6602.
Furthermore, I have not found an easy way of knowing the agency by prefix through mEDRA api without retrieving a list of DOIs registered for a particular prefix and then checking whether this is empty or not, so I don't think a getPrefix method would help.
The most straightforward way seems to be calling https://doi.org/doiRA/10.2143/10.2143/JCS.2.0.583382 (DOI handbook - 5.2.1) which in turn returns a single object array as follow:
[ { "DOI": "10.2143/JCS.2.0.583382", "RA": "mEDRA" } ]
I've then used this and created a getAgency method on DoiFetcher. This in turn would provide the condition to move processing toward mEDRA fetcher.
Last, as suggested I have implemented IdBasedParserFetcher instead of SearchBasedParserFetcher and overridden IdBasedParserFetcher.performSearchByID within Medra class.
I hope my info are clear and if you think this could be a good way of proceeding I would update the PR.
In this case I should remove SearchBasedParserFetcher from the open PR. Can I do delete it directly from the Files Changed view of this PR? Is it the correct way to do it?
Thank you so much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, then the DoiFetcher getAgency makes more sense.
Just delete/modify your changes locally and push your new commit with the changes, the PR will then be updated automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this issue. Have you tried to use the application/x-bibtex accept header?
src/main/java/org/jabref/logic/importer/SearchBasedParserFetcher.java
Outdated
Show resolved
Hide resolved
* @throws IOException | ||
*/ | ||
public String getAgency(DOI doi) throws JSONException, IOException { | ||
String agency = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to avoid null. Instead you can use Optionals.
https://blog.indrek.io/articles/optionals-in-java-8/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be something like this?
public Optional<String> getAgency(DOI doi) throws JSONException, IOException {
Optional<String> agency = Optional.empty();
URLDownload download = new URLDownload(DOI.AGENCY_RESOLVER + "/" + doi.getDOI());
JSONObject response = new JSONArray(download.asString()).getJSONObject(0);
if (response != null) {
agency = Optional.of(response.optString("RA"));
}
return agency;
}
and the condition in performSearchById would then be
if ("medra".equalsIgnoreCase(getAgency(doi.get()).get()))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to check if the value is present first:
getAgency(...).ifPresent() && getAgency(...).get().equals(medra)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sure, and I also needs to change agency = Optional.of(response.optString("RA"));
to agency = Optional.ofNullable(response.optString("RA"));
since optString could return a null.
In general looks already good! 👍 It would also be nice if you could add a/some tests for the new feature |
Looks good so far, I only noticed one failing test, you have to add the Medra Fetcher to the test array in org.jabref.logic.importer.WebFetchersTest |
Ok, I added a test case to DoiFetcher. Passed. Regarding the test array in org.jabref.logic.importer.WebFetchersTest, if I've correctly understood the logic, I think I should remove Medra Fetcher in the below method, instead of adding. In this way tests are passed.
If you think this could be a proper approach, I will commit. |
* <p> | ||
* It requires "Accept" request Header attribute to be set to desired content-type. | ||
*/ | ||
public class Medra implements SearchBasedParserFetcher, IdBasedParserFetcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need the SearchBasedParserFetcher interface?
I think this is superflous. The IDBasedFetcher should be sufficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I forgot to remove it.
In this case I don't see why i should add Medra Fetcher to org.jabref.logic.importer.WebFetchersTest array test, since it seems that this already loads Medra class to expected array of getIdBasedFetchersReturnsAllFetcherDerivingFromIdBasedFetcher()
and the test is passed.
Am I missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was probaby the reason for the failing test at first. If it passes now it's fine. Nothing to do then.
Please take a look at the checkstyle issues, than it's ready to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I fixed the style issues and wanted to run ./gradlew checkstyleMain checkstyleTest checkstyleJmh
from terminal to test them before committing but i get:
`Starting a Gradle Daemon (subsequent builds will be faster)
> Configure project :
Found module name 'org.jabref'
> Task :compileJava
org.apache.logging.log4j.LoggingException: Unable to create Plugin Service Class org.jabref.gui.logging.plugins.Log4jPlugins
at org.apache.logging.log4j.plugins.processor.PluginProcessor.createSourceFile(PluginProcessor.java:201)
at org.apache.logging.log4j.plugins.processor.PluginProcessor.writeClassFile(PluginProcessor.java:163)
at org.apache.logging.log4j.plugins.processor.PluginProcessor.process(PluginProcessor.java:91)
at org.gradle.api.internal.tasks.compile.processing.DelegatingProcessor.process(DelegatingProcessor.java:62)`
running with --info gives some more details:
> Task :compileJava
Caching disabled for task ':compileJava' because:
Build cache is disabled
Task ':compileJava' is not up-to-date because:
Task has failed previously.
The input changes require a full rebuild for incremental task ':compileJava'.
Full recompilation is required because no incremental change information is available. This is usually caused by clean builds or changing compiler arguments.
Compiling with JDK Java compiler API.
org.apache.logging.log4j.LoggingException: Unable to create Plugin Service Class org.jabref.gui.logging.plugins.Log4jPlugins
at org.apache.logging.log4j.plugins.processor.PluginProcessor.createSourceFile(PluginProcessor.java:201)
at org.apache.logging.log4j.plugins.processor.PluginProcessor.writeClassFile(PluginProcessor.java:163)
Should I commit anyway and see what the tests say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just commit.
Still checkstyle issues. I'll work on that. |
Are you using Intellij? Then you can import the style and let the IDE format your changes and organize the imports |
Nope, Eclipse. |
For the imports you can use ctr+ shift +o will organize the imports |
medra-fetcher # Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work! 👍 Although these checkstyle stuff is sometimes annoying, it helps to have consistency across all files and to to avoid unnecessary conflicts due to different formatting.
I hope you nonetheless had a good first experience of contributing.
So, we now wait for a second developer to review your code and then we can merge.
For external contributors we require the review of at least two developers before we merge the code.
I believe this has been a really great lesson for me since I've gone through several never encountered issues. Thanks a lot for your patience and guiding advices. |
Once your PR is merged in master it will be available in the next automatic build of development version, https://builds.jabref.org/master/ |
I think something went wrong with the merge of master into this branch. Need to investigate manually before merging. General remark: The discussion in this PR seems to be long?
|
Hi @koppor, thanks for the feedback. About comments I inserted a few lines in the code where I thought they could have help. Is this you are referring to? About ADR probably I need some more instructions. Is it something I should do on my own? Thanks again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! The code looks already pretty good, and I've only a few minor remarks.
src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java
Outdated
Show resolved
Hide resolved
- Rename getURLForID to getUrlForIdentifier - Shrink Medra fetcher - Shrink CrossRef fetcher
Reastion: It is the common base of both SearchBasedParserFetcher and IdBasedParserFetcher
Improve json reader
} | ||
if (responseStrBuilder.toString().isBlank()) { | ||
throw new ParseException("Empty input!"); | ||
String inputStr = new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String inputStr = new String(ByteStreams.toByteArray(inputStream), Charsets.UTF_8); | |
String inputStr = new String((inputStream.readAllByte()), Charsets.UTF_8); |
This is possible since java 9
Thanks for your contribution @Mind000 and the quick follow up! We hope you weren't frightened by our reviews ;) And we hope you will continue contributing! |
Shit, saw this only after merge: |
Anything I should do? |
The idea behind our High-level architcutre is to have the model as model package. We are aware this leads to an anemic domain model. Not sure whether downloading things should be part of a model. Think, this is a devcall thing. We should discuss whether we want to "undo" the seperation of model and logic. What are our "limits" in model? @stefan-kolb |
as the build is broken now I would vote to simply change it back to the original, e.g. move the agency fetching back to the fetcher. |
* upstream/master: (243 commits) fix checkstyle mEDRA DOI fetcher implementation. (#6641) Bump bcprov-jdk15on from 1.65.01 to 1.66 (#6676) Bump appleboy/ssh-action from v0.0.6 to v0.1.2 (#6674) Add localisation strings and renamed formatter Bump gittools/actions from v0.9.2 to v0.9.3 (#6675) Skip non-working sourcespy site Let dependabot update github actions Add three formatters to fix new lines in abstract and digits in editors Update src/test/java/org/jabref/logic/importer/fetcher/CollectionOfComputerScienceBibliographiesParserTest.java Fix to imports Make fetcher test more specific by checking each field explicitly Use builder instead of "setField" statements (docs) fix styling Update jpackage notes Fix automerge workflow fix markdown Bump org.beryx.jlink from 2.20.0 to 2.21.0 Bump unirest-java from 3.8.00 to 3.8.06 Fix to dependency on Global ...
What should I do now? |
@Mind000 Just create a new PR where you move the DOI agency getting back to the DOIFetcher class. That is the simplest solution and this would not destroy our architecture constraints |
So, if I got it right, I have to push the same files as the last commit I made, changing only the getAgency location. |
as the code is already merged into master
you fetch and pull the upstream master in your local master
then you create a new branch and then you just move the getAgency back to
DoiFetcher.
Then you commit your changes and push your new branch and create a new PR
Am Mo., 13. Juli 2020 um 09:02 Uhr schrieb Giovanni Caldarola <
[email protected]>:
… So, if I got it right, I have to push the same files as the last commit I
made, changing only the getAgency location.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#6641 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACOFZB6KFRY2YWRD7KQBYTR3KWRRANCNFSM4OJDHZDA>
.
|
For internal referencre: follow-up PR is #6683. Thus, everyhting should be fine now. |
fixes #6602
I added the possibility to retrieve DOI metadata from mEDRA agency .
Some DOIs return 500 code even if they are correctly registered at mEDRA. I'm waiting for the mEDRA team to answer on that.