Add new parser for MathSciNet search #11055

Merged
merged 27 commits into from
Mar 21, 2024
Changes from 13 commits
Commits
b22a6ff
Fix MathSciNet parser
subhramit Mar 19, 2024
8ece060
Merge branch 'main' into fix-for-issue-10996
subhramit Mar 19, 2024
3de80ef
Conformed with stylechecks
subhramit Mar 19, 2024
bff1b11
Merge branch 'main' of https://github.com/JabRef/jabref into fix-for-…
subhramit Mar 19, 2024
3dc1f55
Fix Objects import missing
subhramit Mar 19, 2024
ec1b30e
Run gradle reWriteRun
subhramit Mar 19, 2024
95ac88c
Update BibEntry type set (via constructor)
subhramit Mar 20, 2024
8b6317d
Merge branch 'main' of https://github.com/JabRef/jabref into fix-for-…
subhramit Mar 20, 2024
a04b364
Merge branch 'fix-for-issue-10996' of https://github.com/subhramit/ja…
subhramit Mar 20, 2024
e1e1daa
Apply review changes
subhramit Mar 20, 2024
e6a2ec7
Update value:String to value:Optional<String>
subhramit Mar 20, 2024
4997846
Change value setting to lambda form
subhramit Mar 20, 2024
bd801b6
Update missing Optional.of()
subhramit Mar 20, 2024
021fd2f
Update instanceof pattern matching syntax, removed explicit casts
subhramit Mar 20, 2024
6d70470
Merge branch 'main' of https://github.com/subhramit/jabref into fix-f…
subhramit Mar 20, 2024
dd32e6a
applied second round of review changes
subhramit Mar 21, 2024
d496968
Merge branch 'JabRef:main' into fix-for-issue-10996
subhramit Mar 21, 2024
aafc8aa
Merge branch 'fix-for-issue-10996' of https://github.com/subhramit/ja…
subhramit Mar 21, 2024
4e154bd
Merge branch 'main' of https://github.com/JabRef/jabref
subhramit Mar 21, 2024
9a119ba
Merge branch 'main' into fix-for-issue-10996
subhramit Mar 21, 2024
25afe1c
Merge branch 'JabRef:main' into fix-for-issue-10996
subhramit Mar 21, 2024
a6299eb
Changes as per third review round
subhramit Mar 21, 2024
d1709ca
Merge branch 'main' of https://github.com/JabRef/jabref into fix-for-…
subhramit Mar 21, 2024
4eb516b
Merge branch 'main' of https://github.com/JabRef/jabref
subhramit Mar 21, 2024
cde25ee
Merge branch 'main' into fix-for-issue-10996
subhramit Mar 21, 2024
95b9288
Merge branch 'fix-for-issue-10996' of https://github.com/subhramit/ja…
subhramit Mar 21, 2024
73c165d
Readd bibtex parsing
Siedlerchr Mar 21, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -74,6 +74,7 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv
- We fixed an issue where JabRef could not parse absolute file paths from Zotero exports. [#10959](https://github.com/JabRef/jabref/issues/10959)
- We fixed an issue where an exception occurred when toggling between "Live" or "Locked" in the internal Document Viewer. [#10935](https://github.com/JabRef/jabref/issues/10935)
- Fixed an issue on Windows where the browser extension reported failure to send an entry to JabRef even though it was sent properly. [JabRef-Browser-Extension#493](https://github.com/JabRef/JabRef-Browser-Extension/issues/493)
- We fixed an issue where JabRef would throw an error when using MathSciNet search, as it was unable to parse the fetched JSON correctly. [#10996](https://github.com/JabRef/jabref/issues/10996)

### Removed

127 changes: 113 additions & 14 deletions src/main/java/org/jabref/logic/importer/fetcher/MathSciNet.java
@@ -5,11 +5,14 @@
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.jabref.logic.cleanup.DoiCleanup;
import org.jabref.logic.cleanup.FieldFormatterCleanup;
@@ -23,20 +26,19 @@
import org.jabref.logic.importer.Parser;
import org.jabref.logic.importer.SearchBasedParserFetcher;
import org.jabref.logic.importer.fetcher.transformers.DefaultQueryTransformer;
import org.jabref.logic.importer.fileformat.BibtexParser;
import org.jabref.logic.util.OS;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.AMSField;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.field.UnknownField;
import org.jabref.model.util.DummyFileUpdateMonitor;
import org.jabref.model.entry.types.StandardEntryType;

import kong.unirest.JsonNode;
import kong.unirest.json.JSONArray;
import kong.unirest.json.JSONException;
import kong.unirest.json.JSONObject;
import org.apache.http.client.utils.URIBuilder;
import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
import org.jbibtex.TokenMgrException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@@ -46,7 +48,18 @@
public class MathSciNet implements SearchBasedParserFetcher, EntryBasedParserFetcher, IdBasedParserFetcher {
private static final Logger LOGGER = LoggerFactory.getLogger(MathSciNet.class);
private final ImportFormatPreferences preferences;

// Define the field mappings
private final Map<StandardField, List<String>> fieldMappings = Map.ofEntries(
Map.entry(StandardField.TITLE, List.of("titles", "title")),
Map.entry(StandardField.AUTHOR, List.of("authors")),
Map.entry(StandardField.YEAR, List.of("issue", "issue", "pubYear")),
Map.entry(StandardField.JOURNAL, List.of("issue", "issue", "journal", "shortTitle")),
Map.entry(StandardField.VOLUME, List.of("issue", "issue", "volume")),
Map.entry(StandardField.NUMBER, List.of("issue", "issue", "number")),
Map.entry(StandardField.PAGES, List.of("paging", "paging", "text")),
Map.entry(StandardField.KEYWORDS, List.of("primaryClass")),
Map.entry(StandardField.ISSN, List.of("issue", "issue", "journal", "issn"))
);
public MathSciNet(ImportFormatPreferences preferences) {
this.preferences = Objects.requireNonNull(preferences);
}
@@ -102,34 +115,121 @@ public URL getUrlForIdentifier(String identifier) throws URISyntaxException, Mal
public Parser getParser() {
return inputStream -> {
String response = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(OS.NEWLINE));
BibtexParser bibtexParser = new BibtexParser(preferences, new DummyFileUpdateMonitor());

List<BibEntry> entries = new ArrayList<>();

try {
// Depending on the type of query we might get either a json object or directly a json array
JsonNode node = new JsonNode(response);

if (node.isArray()) {
JSONArray entriesArray = node.getArray();
for (int i = 0; i < entriesArray.length(); i++) {
String bibTexFormat = entriesArray.getJSONObject(i).getString("bib");
entries.addAll(bibtexParser.parseEntries(bibTexFormat));
JSONObject entryObject = entriesArray.getJSONObject(i);
BibEntry bibEntry = jsonItemToBibEntry(entryObject);
entries.add(bibEntry);
}
} else {
var element = node.getObject();
JSONArray entriesArray = element.getJSONObject("all").getJSONArray("results");
for (int i = 0; i < entriesArray.length(); i++) {
String bibTexFormat = entriesArray.getJSONObject(i).getString("bibTexFormat");
entries.addAll(bibtexParser.parseEntries(bibTexFormat));

if (element.has("all")) {
JSONArray entriesArray = element.getJSONObject("all").getJSONArray("results");
for (int i = 0; i < entriesArray.length(); i++) {
JSONObject entryObject = entriesArray.getJSONObject(i);
BibEntry bibEntry = jsonItemToBibEntry(entryObject);
entries.add(bibEntry);
}
} else if (element.has("results")) {
JSONArray entriesArray = element.getJSONArray("results");
for (int i = 0; i < entriesArray.length(); i++) {
JSONObject entryObject = entriesArray.getJSONObject(i);
BibEntry bibEntry = jsonItemToBibEntry(entryObject);
entries.add(bibEntry);
}
}
}
} catch (JSONException | TokenMgrException e) {
} catch (JSONException | ParseException e) {
LOGGER.error("An error occurred while parsing fetched data", e);
throw new ParseException("Error when parsing entry", e);
}
return entries;
};
}

private BibEntry jsonItemToBibEntry(JSONObject item) throws ParseException {
try {
BibEntry entry = new BibEntry(StandardEntryType.Article);
// Set fields based on the mappings
for (Map.Entry<StandardField, List<String>> mapEntry : fieldMappings.entrySet()) {
StandardField field = mapEntry.getKey();
List<String> path = mapEntry.getValue();

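Optional<String> value;
if (field == StandardField.AUTHOR) {
// toAuthors returns a plain String, so wrap it to match the Optional<String> variable
value = Optional.of(toAuthors(item.optJSONArray(path.getFirst())));
} else if (field == StandardField.KEYWORDS) {
value = Optional.of(getKeywords(item.optJSONObject(path.getFirst())));
} else {
// getOrNull already returns Optional<String>; unwrapping it with orElse(null) would not compile
value = getOrNull(item, path);
}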

value.ifPresent(v -> entry.setField(field, v));
}
// Handle articleUrl and mrnumber fields separately
String doi = item.optString("articleUrl", "");
if (!doi.isEmpty()) {
entry.setField(StandardField.DOI, doi);
Review comment (Member):

Try to route this through DOI#parse. If the returned Optional is present, use doi.getNormalized(); otherwise use the doi variable. That way, the http prefix is removed if it is a valid DOI, but the full string is kept if DOI parsing fails.
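
The fallback described in this comment could look roughly like the sketch below. It does not use JabRef's real `DOI` class, so that it compiles on its own: `DoiFieldSketch`, `parseDoi`, `doiFieldValue`, and the regular expression are hypothetical stand-ins, and the pattern is only an approximation of what a real DOI parser accepts.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoiFieldSketch {
    // Assumption: a rough DOI shape, "10.<registrant>/<suffix>"; not JabRef's actual rule set.
    private static final Pattern DOI_PATTERN = Pattern.compile("10\\.\\d{4,9}/\\S+");

    // Hypothetical stand-in for DOI#parse: returns the bare DOI if one is found in the input.
    static Optional<String> parseDoi(String raw) {
        Matcher matcher = DOI_PATTERN.matcher(raw.trim());
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    // Use the normalized DOI when parsing succeeds, otherwise keep the original string untouched.
    static String doiFieldValue(String articleUrl) {
        return parseDoi(articleUrl).orElse(articleUrl);
    }
}
```

With this shape, a value such as `https://doi.org/10.1000/xyz123` is stored without the resolver prefix, while a URL that contains no DOI is stored as received.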

}

String mrNumber = item.optString("mrnumber", "");
if (!mrNumber.isEmpty()) {
entry.setField(StandardField.MR_NUMBER, mrNumber);
}
return entry;
} catch (JSONException exception) {
throw new ParseException("MathSciNet API JSON format has changed", exception);
}
}

private Optional<String> getOrNull(JSONObject item, List<String> keys) {
Object value = item;
for (String key : keys) {
if (value instanceof JSONObject) {
Review comment (Member):

Suggested change
if (value instanceof JSONObject) {
if (value instanceof JSONObject obj) {
....

You can use the new instance of pattern matching syntax, which makes the extra casting step necessary
https://docs.oracle.com/en/java/javase/17/language/pattern-matching-instanceof-operator.html#GUID-843060B5-240C-4F47-A7B0-95C42E5B08A7

Reply (Member Author, @subhramit, Mar 20, 2024):

Hi, I just went through the docs. If I'm not wrong, did you mean that it would make the extra casting step "unnecessary"?

Reply (Member Author):

I removed the casts and followed the new syntax. It is odd that IntelliJ is giving me red squiggly lines and a suggestion to cast it back. Will ignore for now.

Reply (Member Author):

Okay nope, I added the obj syntax but seems like the cast is necessary, else I'm getting this compilation error:
error: cannot find symbol
value = value.opt(key);
^
symbol: method opt(String)
location: variable value of type Object

Reply (Member):

See the examples in the linked docs: you need to use value = obj.get... after the if.

Reply (Member Author):

Got it!
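
For reference, the point of this thread can be sketched as follows: the pattern variable introduced by `instanceof` is already typed, so the lookup goes through it and no cast is needed, while `value` stays declared as `Object`. The sketch uses plain `Map` and `List` instead of the Unirest `JSONObject` and `JSONArray` so that it runs standalone; `PathLookupSketch` and `lookup` are hypothetical names, not code from this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class PathLookupSketch {
    // Walks nested maps and lists along the given keys; list positions are given as numeric strings.
    static Optional<String> lookup(Object root, List<String> keys) {
        Object value = root;
        for (String key : keys) {
            if (value instanceof Map<?, ?> map) {
                // 'map' is already typed, so no ((Map) value) cast is required
                value = map.get(key);
            } else if (value instanceof List<?> list) {
                int index = Integer.parseInt(key);
                value = (index >= 0 && index < list.size()) ? list.get(index) : null;
            } else {
                // Neither a map nor a list (including null): stop descending
                break;
            }
        }
        if (value instanceof String text) {
            return Optional.of(text);
        }
        return Optional.empty();
    }
}
```

A missing key yields `null` on the next step, which fails both `instanceof` checks and ends in an empty `Optional`.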

value = ((JSONObject) value).opt(key);
} else if (value instanceof JSONArray) {
value = ((JSONArray) value).opt(Integer.parseInt(key));
} else {
break;
}
}

if (value instanceof String stringValue) {
return Optional.of(new String(stringValue.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8));
}

return Optional.empty();
}

private String toAuthors(JSONArray authors) {
if (authors == null) {
return "";
}

return IntStream.range(0, authors.length())
.mapToObj(authors::getJSONObject)
.map(author -> {
String name = author.optString("name", "");
return new String(name.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
})
.collect(Collectors.joining(" and "));
}

private String getKeywords(JSONObject primaryClass) {
if (primaryClass == null) {
return "";
}
return primaryClass.optString("description", "");
}

@Override
public void doPostCleanup(BibEntry entry) {
new MoveFieldCleanup(AMSField.FJOURNAL, StandardField.JOURNAL).cleanup(entry);
@@ -142,4 +242,3 @@ public void doPostCleanup(BibEntry entry) {
entry.setCommentsBeforeEntry("");
}
}
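
Both `getOrNull` and `toAuthors` in the diff above pass strings through `new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8)`. A minimal sketch of what that round trip does, assuming the text arrived as UTF-8 bytes that were decoded as ISO-8859-1 somewhere upstream (this is a reading of the code, not something the PR states); `MojibakeSketch` and `redecodeAsUtf8` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class MojibakeSketch {
    // Recovers the original bytes (ISO-8859-1 maps each char 0-255 to one byte) and decodes them as UTF-8.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }
}
```

Under that assumption, the two characters `\u00C3\u00A9` (shown as "Ã©") become the single character `\u00E9` ("é"), and pure ASCII text passes through unchanged. The round trip is not safe for strings that were decoded correctly in the first place: a lone `\u00E9` is not valid UTF-8 on its own and would be replaced by the decoder.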
