Add new parser for MathSciNet search #11055

Merged
merged 27 commits into from
Mar 21, 2024
Changes from 20 commits
b22a6ff
Fix MathSciNet parser
subhramit Mar 19, 2024
8ece060
Merge branch 'main' into fix-for-issue-10996
subhramit Mar 19, 2024
3de80ef
Conformed with stylechecks
subhramit Mar 19, 2024
bff1b11
Merge branch 'main' of https://github.com/JabRef/jabref into fix-for-…
subhramit Mar 19, 2024
3dc1f55
Fix Objects import missing
subhramit Mar 19, 2024
ec1b30e
Run gradle reWriteRun
subhramit Mar 19, 2024
95ac88c
Update BibEntry type set (via constructor)
subhramit Mar 20, 2024
8b6317d
Merge branch 'main' of https://github.com/JabRef/jabref into fix-for-…
subhramit Mar 20, 2024
a04b364
Merge branch 'fix-for-issue-10996' of https://github.com/subhramit/ja…
subhramit Mar 20, 2024
e1e1daa
Apply review changes
subhramit Mar 20, 2024
e6a2ec7
Update value:String to value:Optional<String>
subhramit Mar 20, 2024
4997846
Change value setting to lambda form
subhramit Mar 20, 2024
bd801b6
Update missing Optional.of()
subhramit Mar 20, 2024
021fd2f
Update instanceof pattern matching syntax, removed explicit casts
subhramit Mar 20, 2024
6d70470
Merge branch 'main' of https://github.com/subhramit/jabref into fix-f…
subhramit Mar 20, 2024
dd32e6a
applied second round of review changes
subhramit Mar 21, 2024
d496968
Merge branch 'JabRef:main' into fix-for-issue-10996
subhramit Mar 21, 2024
aafc8aa
Merge branch 'fix-for-issue-10996' of https://github.com/subhramit/ja…
subhramit Mar 21, 2024
4e154bd
Merge branch 'main' of https://github.com/JabRef/jabref
subhramit Mar 21, 2024
9a119ba
Merge branch 'main' into fix-for-issue-10996
subhramit Mar 21, 2024
25afe1c
Merge branch 'JabRef:main' into fix-for-issue-10996
subhramit Mar 21, 2024
a6299eb
Changes as per third review round
subhramit Mar 21, 2024
d1709ca
Merge branch 'main' of https://github.com/JabRef/jabref into fix-for-…
subhramit Mar 21, 2024
4eb516b
Merge branch 'main' of https://github.com/JabRef/jabref
subhramit Mar 21, 2024
cde25ee
Merge branch 'main' into fix-for-issue-10996
subhramit Mar 21, 2024
95b9288
Merge branch 'fix-for-issue-10996' of https://github.com/subhramit/ja…
subhramit Mar 21, 2024
73c165d
Readd bibtex parsing
Siedlerchr Mar 21, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -74,6 +74,7 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv
- We fixed an issue where JabRef could not parse absolute file paths from Zotero exports. [#10959](https://github.com/JabRef/jabref/issues/10959)
- We fixed an issue where an exception occurred when toggling between "Live" or "Locked" in the internal Document Viewer. [#10935](https://github.com/JabRef/jabref/issues/10935)
- Fixed an issue on Windows where the browser extension reported failure to send an entry to JabRef even though it was sent properly. [JabRef-Browser-Extension#493](https://github.com/JabRef/JabRef-Browser-Extension/issues/493)
- We fixed an issue where JabRef would throw an error when using MathSciNet search, as it was unable to parse the fetched JSON correctly. [#10996](https://github.com/JabRef/jabref/issues/10996)

### Removed

143 changes: 130 additions & 13 deletions src/main/java/org/jabref/logic/importer/fetcher/MathSciNet.java
@@ -5,11 +5,14 @@
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.jabref.logic.cleanup.DoiCleanup;
import org.jabref.logic.cleanup.FieldFormatterCleanup;
@@ -23,20 +26,19 @@
import org.jabref.logic.importer.Parser;
import org.jabref.logic.importer.SearchBasedParserFetcher;
import org.jabref.logic.importer.fetcher.transformers.DefaultQueryTransformer;
import org.jabref.logic.importer.fileformat.BibtexParser;
import org.jabref.logic.util.OS;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.AMSField;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.field.UnknownField;
import org.jabref.model.util.DummyFileUpdateMonitor;
import org.jabref.model.entry.types.StandardEntryType;

import kong.unirest.JsonNode;
import kong.unirest.json.JSONArray;
import kong.unirest.json.JSONException;
import kong.unirest.json.JSONObject;
import org.apache.http.client.utils.URIBuilder;
import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
import org.jbibtex.TokenMgrException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@@ -45,6 +47,19 @@
*/
public class MathSciNet implements SearchBasedParserFetcher, EntryBasedParserFetcher, IdBasedParserFetcher {
private static final Logger LOGGER = LoggerFactory.getLogger(MathSciNet.class);

private static final Map<StandardField, List<String>> FIELD_MAPPINGS = Map.of(
StandardField.TITLE, List.of("titles", "title"),
StandardField.AUTHOR, List.of("authors"),
StandardField.YEAR, List.of("issue", "issue", "pubYear"),
StandardField.JOURNAL, List.of("issue", "issue", "journal", "shortTitle"),
StandardField.VOLUME, List.of("issue", "issue", "volume"),
StandardField.NUMBER, List.of("issue", "issue", "number"),
StandardField.PAGES, List.of("paging", "paging", "text"),
StandardField.KEYWORDS, List.of("primaryClass"),
StandardField.ISSN, List.of("issue", "issue", "journal", "issn")
);

private final ImportFormatPreferences preferences;

public MathSciNet(ImportFormatPreferences preferences) {
@@ -102,34 +117,137 @@ public URL getUrlForIdentifier(String identifier) throws URISyntaxException, Mal
public Parser getParser() {
return inputStream -> {
String response = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(OS.NEWLINE));
BibtexParser bibtexParser = new BibtexParser(preferences, new DummyFileUpdateMonitor());

List<BibEntry> entries = new ArrayList<>();

try {
// Depending on the type of query we might get either a json object or directly a json array
JsonNode node = new JsonNode(response);

if (node.isArray()) {
JSONArray entriesArray = node.getArray();
for (int i = 0; i < entriesArray.length(); i++) {
String bibTexFormat = entriesArray.getJSONObject(i).getString("bib");
entries.addAll(bibtexParser.parseEntries(bibTexFormat));
JSONObject entryObject = entriesArray.getJSONObject(i);
BibEntry bibEntry = jsonItemToBibEntry(entryObject);
entries.add(bibEntry);
}
} else {
var element = node.getObject();
JSONArray entriesArray = element.getJSONObject("all").getJSONArray("results");
for (int i = 0; i < entriesArray.length(); i++) {
String bibTexFormat = entriesArray.getJSONObject(i).getString("bibTexFormat");
entries.addAll(bibtexParser.parseEntries(bibTexFormat));

if (element.has("all")) {
JSONArray entriesArray = element.getJSONObject("all").getJSONArray("results");
for (int i = 0; i < entriesArray.length(); i++) {
JSONObject entryObject = entriesArray.getJSONObject(i);
BibEntry bibEntry = jsonItemToBibEntry(entryObject);
entries.add(bibEntry);
}
} else if (element.has("results")) {
JSONArray entriesArray = element.getJSONArray("results");
for (int i = 0; i < entriesArray.length(); i++) {
JSONObject entryObject = entriesArray.getJSONObject(i);
BibEntry bibEntry = jsonItemToBibEntry(entryObject);
entries.add(bibEntry);
}
}
}
} catch (JSONException | TokenMgrException e) {
} catch (JSONException | ParseException e) {
LOGGER.error("An error occurred while parsing fetched data", e);
throw new ParseException("Error when parsing entry", e);
}
return entries;
};
}

private BibEntry jsonItemToBibEntry(JSONObject item) throws ParseException {
try {
BibEntry entry = new BibEntry(StandardEntryType.Article);

// Set the author and keywords field
Optional<String> authors = toAuthors(item.optJSONArray("authors"));
authors.ifPresent(value -> entry.setField(StandardField.AUTHOR, value));

Optional<String> keywords = Optional.ofNullable(getKeywords(item.optJSONObject("primaryClass")));
keywords.ifPresent(value -> entry.setField(StandardField.KEYWORDS, value));

// Set the rest of the fields based on the mappings
for (Map.Entry<StandardField, List<String>> mapEntry : FIELD_MAPPINGS.entrySet()) {
StandardField field = mapEntry.getKey();
List<String> path = mapEntry.getValue();

// Skip author and keywords fields as they are already set
if (field == StandardField.AUTHOR || field == StandardField.KEYWORDS) {
continue;
}
Member review comment: Just remove StandardField.AUTHOR and StandardField.KEYWORDS from FIELD_MAPPINGS. I think this is the only "use" in the code, so they can safely be removed.

Optional<String> value = getOthers(item, path);
value.ifPresent(v -> entry.setField(field, v));
}

// Handle articleUrl and mrnumber fields separately, as they are non-nested properties in the JSON and can be retrieved as Strings directly
String doi = item.optString("articleUrl", "");
if (!doi.isEmpty()) {
entry.setField(StandardField.DOI, doi);
Member review comment: Try to route this through DOI#parse. If that Optional is present, use doi.getNormalized(); otherwise use the doi variable. That way, the http prefix is removed if it is a valid DOI, but the full string is kept in case of a DOI parsing error.
}

String mrNumber = item.optString("mrnumber", "");
if (!mrNumber.isEmpty()) {
entry.setField(StandardField.MR_NUMBER, mrNumber);
}

return entry;
} catch (JSONException exception) {
throw new ParseException("MathSciNet API JSON format has changed", exception);
}
}
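
The reviewer's suggestion in this method, to route articleUrl through DOI#parse and fall back to the raw string, could look roughly like the sketch below. It is a self-contained illustration and not JabRef code: `parseDoi` is a hypothetical, regex-based stand-in for JabRef's DOI#parse, and its pattern is a deliberate simplification of what a real DOI parser accepts.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoiFallbackSketch {
    // Simplified DOI shape (assumption): "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
    private static final Pattern DOI_PATTERN = Pattern.compile("10\\.\\d{4,9}/\\S+");

    /** Hypothetical stand-in for DOI#parse: finds a DOI inside the text, if there is one. */
    public static Optional<String> parseDoi(String text) {
        Matcher matcher = DOI_PATTERN.matcher(text);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    /** Returns the bare DOI when one is found; otherwise keeps the original string unchanged. */
    public static String normalizeOrKeep(String articleUrl) {
        return parseDoi(articleUrl).orElse(articleUrl);
    }

    public static void main(String[] args) {
        // A resolver URL is reduced to the bare DOI.
        System.out.println(normalizeOrKeep("https://doi.org/10.1109/TIT.2020.2977319"));
        // A URL without a DOI is kept as-is.
        System.out.println(normalizeOrKeep("https://example.org/some/article"));
    }
}
```

With this shape the DOI field holds a bare DOI whenever one can be recognised, and nothing is lost when the value is some other kind of URL.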

private Optional<String> toAuthors(JSONArray authors) {
if (authors == null) {
return Optional.empty();
}

String authorsString = IntStream.range(0, authors.length())
.mapToObj(authors::getJSONObject)
.map(author -> {
String name = author.optString("name", "");
return fixStringEncoding(name);
})
.collect(Collectors.joining(" and "));

return Optional.of(authorsString);
}
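
One detail of toAuthors as written: for an empty authors array it returns Optional.of(""), so an empty author field would still be set. A minimal sketch of a variant that treats "no names" as absent is shown below. It assumes the names have already been extracted into a List<String> (a stand-in for the JSONArray of author objects), and skipping blank names is an assumption rather than something the PR specifies.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class AuthorsSketch {
    /** Joins author names with " and "; empty when there is nothing to join. */
    public static Optional<String> toAuthors(List<String> names) {
        if (names == null) {
            return Optional.empty();
        }
        String joined = names.stream()
                             .filter(name -> !name.isBlank()) // skip missing names (assumption)
                             .collect(Collectors.joining(" and "));
        return joined.isEmpty() ? Optional.empty() : Optional.of(joined);
    }

    public static void main(String[] args) {
        System.out.println(toAuthors(List.of("Alderson, Tim L.")));
        System.out.println(toAuthors(List.of()));
    }
}
```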

private String getKeywords(JSONObject primaryClass) {
Member review comment: Please also use the Optional<String> approach you used in toAuthors, for consistency.
if (primaryClass == null) {
return "";
}
return primaryClass.optString("description", "");
}
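
The Optional-returning variant requested in the review comment might be shaped like the sketch below. Because getKeywords currently returns "" rather than null, the Optional.ofNullable(...) wrapper at the call site is never empty. To stay self-contained, the sketch uses a plain Map<String, String> as a stand-in for the JSONObject, and it treats a blank description as absent, which is an assumption.

```java
import java.util.Map;
import java.util.Optional;

public class KeywordsSketch {
    /** Returns the class description, or empty when the object or its description is missing or blank. */
    public static Optional<String> getKeywords(Map<String, String> primaryClass) {
        return Optional.ofNullable(primaryClass)
                       .map(pc -> pc.get("description"))          // a null value yields an empty Optional
                       .filter(description -> !description.isBlank());
    }

    public static void main(String[] args) {
        System.out.println(getKeywords(Map.of("description", "Bounds on codes")));
        System.out.println(getKeywords(null));
    }
}
```

The caller can then use `getKeywords(...).ifPresent(...)` directly, mirroring how the author field is set.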

private Optional<String> getOthers(JSONObject item, List<String> keys) {
Object value = item;
for (String key : keys) {
if (value instanceof JSONObject obj) {
value = obj.opt(key);
} else if (value instanceof JSONArray arr) {
value = arr.opt(Integer.parseInt(key));
} else {
break;
}
}

if (value instanceof String stringValue) {
return Optional.of(fixStringEncoding(stringValue));
} else if (value instanceof Integer intValue) {
return Optional.of(intValue.toString());
}

return Optional.empty();
}
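
To illustrate how a FIELD_MAPPINGS path such as ("issue", "issue", "pubYear") is walked by getOthers, here is a minimal sketch over plain Map and List values, which stand in for JSONObject and JSONArray. The sample data in `main` is made up for illustration. Unlike the method above, this sketch returns empty as soon as the path breaks, and it treats a non-numeric key applied to a list as a broken path instead of letting Integer.parseInt throw; both are design choices of the sketch, not behaviour of the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PathLookupSketch {
    /** Follows keys through nested maps (and numeric keys through lists); empty if the path breaks. */
    public static Optional<String> lookup(Object root, List<String> path) {
        Object value = root;
        for (String key : path) {
            if (value instanceof Map<?, ?> map) {
                value = map.get(key);
            } else if (value instanceof List<?> list) {
                value = elementAt(list, key);
            } else {
                return Optional.empty(); // null or a leaf value reached before the path ended
            }
        }
        if (value instanceof String stringValue) {
            return Optional.of(stringValue);
        } else if (value instanceof Integer intValue) {
            return Optional.of(intValue.toString());
        }
        return Optional.empty();
    }

    private static Object elementAt(List<?> list, String key) {
        try {
            int index = Integer.parseInt(key);
            return (index >= 0 && index < list.size()) ? list.get(index) : null;
        } catch (NumberFormatException e) {
            return null; // non-numeric key applied to a list: treat the path as broken
        }
    }

    public static void main(String[] args) {
        // Made-up sample shaped like the nesting implied by FIELD_MAPPINGS.
        Map<String, Object> issue = Map.of("pubYear", 2020, "volume", "66");
        Map<String, Object> item = Map.of("issue", Map.of("issue", issue));
        System.out.println(lookup(item, List.of("issue", "issue", "pubYear")));
        System.out.println(lookup(item, List.of("issue", "issue", "missing")));
    }
}
```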

// Method to change character set, to fix output string encoding
Member review comment: Convert to full JavaDoc:

/**
 * ... text ...
 */

private String fixStringEncoding(String value) {
return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
}
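
The effect of fixStringEncoding can be seen in a small standalone sketch, which also shows one way the requested JavaDoc could read. It assumes the damage is the usual one: UTF-8 bytes that were decoded as ISO-8859-1 somewhere upstream. The class name and sample string are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class EncodingFixSketch {
    /**
     * Repairs text whose UTF-8 bytes were mistakenly decoded as ISO-8859-1,
     * by re-encoding it as ISO-8859-1 and decoding the bytes as UTF-8.
     */
    public static String fixStringEncoding(String value) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Poincar\u00e9";
        // Simulate the damage: UTF-8 bytes read back as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(fixStringEncoding(garbled).equals(original));
        // Pure ASCII text passes through unchanged.
        System.out.println(fixStringEncoding("Alderson, Tim L."));
    }
}
```

The conversion is only safe on text that is actually mis-decoded: a correctly decoded string containing non-ASCII characters would be damaged by it, so it should be applied only to values known to come from the affected response.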

@Override
public void doPostCleanup(BibEntry entry) {
new MoveFieldCleanup(AMSField.FJOURNAL, StandardField.JOURNAL).cleanup(entry);
@@ -142,4 +260,3 @@ public void doPostCleanup(BibEntry entry) {
entry.setCommentsBeforeEntry("");
}
}

35 changes: 24 additions & 11 deletions src/test/java/org/jabref/logic/importer/fetcher/MathSciNetTest.java
@@ -1,8 +1,7 @@
package org.jabref.logic.importer.fetcher;

import java.util.Collections;
import java.io.InputStream;
import java.util.List;
import java.util.Optional;

import org.jabref.logic.importer.ImportFormatPreferences;
import org.jabref.model.entry.BibEntry;
@@ -22,15 +21,13 @@

@FetcherTest
class MathSciNetTest {

MathSciNet fetcher;
private BibEntry ratiuEntry;

@BeforeEach
void setUp() throws Exception {
ImportFormatPreferences importFormatPreferences = mock(ImportFormatPreferences.class, Answers.RETURNS_DEEP_STUBS);
when(importFormatPreferences.bibEntryPreferences().getKeywordSeparator()).thenReturn(',');

fetcher = new MathSciNet(importFormatPreferences);

ratiuEntry = new BibEntry();
@@ -43,9 +40,9 @@ void setUp() throws Exception {
ratiuEntry.setField(StandardField.YEAR, "2016");
ratiuEntry.setField(StandardField.NUMBER, "3");
ratiuEntry.setField(StandardField.PAGES, "571--589");
ratiuEntry.setField(StandardField.ISSN, "1422-6928,1422-6952");
ratiuEntry.setField(StandardField.KEYWORDS, "76A15 (35A01 35A02 35K61 82D30)");
ratiuEntry.setField(StandardField.MR_NUMBER, "3537908");
ratiuEntry.setField(StandardField.ISSN, "1422-6928, 1422-6952");
ratiuEntry.setField(StandardField.DOI, "10.1007/s00021-016-0250-0");
}

@@ -57,7 +54,7 @@ void searchByEntryFindsEntry() throws Exception {
searchEntry.setField(StandardField.JOURNAL, "fluid");

List<BibEntry> fetchedEntries = fetcher.performSearch(searchEntry);
assertEquals(Collections.singletonList(ratiuEntry), fetchedEntries);
assertEquals(List.of(ratiuEntry), fetchedEntries);
}

@Test
@@ -67,7 +64,7 @@ void searchByIdInEntryFindsEntry() throws Exception {
searchEntry.setField(StandardField.MR_NUMBER, "3537908");

List<BibEntry> fetchedEntries = fetcher.performSearch(searchEntry);
assertEquals(Collections.singletonList(ratiuEntry), fetchedEntries);
assertEquals(List.of(ratiuEntry), fetchedEntries);
}

@Test
@@ -79,9 +76,25 @@ void searchByQueryFindsEntry() throws Exception {
}

@Test
@DisabledOnCIServer("CI server has no subscription to MathSciNet and thus gets 401 response")
void searchByIdFindsEntry() throws Exception {
Optional<BibEntry> fetchedEntry = fetcher.performSearchById("3537908");
assertEquals(Optional.of(ratiuEntry), fetchedEntry);
void getParser() throws Exception {
String fileName = "/importer/fetcher/jsonTest.json";
Member review comment: Java has special folders for resources: src/main/resources, and src/test/resources for tests. See the other importer test files there.
try (InputStream is = MathSciNetTest.class.getResourceAsStream(fileName)) {
List<BibEntry> entries = fetcher.getParser().parseEntries(is);

assertEquals(List.of(
new BibEntry(StandardEntryType.Article)
.withField(StandardField.TITLE, "On the weights of general MDS codes")
.withField(StandardField.AUTHOR, "Alderson, Tim L.")
.withField(StandardField.YEAR, "2020")
.withField(StandardField.JOURNAL, "IEEE Trans. Inform. Theory")
.withField(StandardField.VOLUME, "66")
.withField(StandardField.NUMBER, "9")
.withField(StandardField.PAGES, "5414--5418")
.withField(StandardField.MR_NUMBER, "4158623")
.withField(StandardField.KEYWORDS, "Bounds on codes")
.withField(StandardField.DOI, "https://doi.org/10.1109/TIT.2020.2977319")
.withField(StandardField.ISSN, "0018-9448")
), entries);
}
}
}
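
A note on loading the test JSON from src/test/resources: Class#getResourceAsStream returns null when the resource is not on the classpath, so a misplaced file would likely surface only later as a NullPointerException inside the parser. A small helper along the following lines makes that failure explicit. The helper name and the resource path in `main` are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;

public class ResourceLoadingSketch {
    /** Opens a classpath resource, failing with a clear message instead of returning null. */
    public static InputStream openResource(Class<?> owner, String name) throws FileNotFoundException {
        InputStream stream = owner.getResourceAsStream(name);
        if (stream == null) {
            throw new FileNotFoundException("Classpath resource not found: " + name);
        }
        return stream;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical path; it is expected to be missing when this sketch runs on its own.
        try (InputStream is = openResource(ResourceLoadingSketch.class, "/importer/fetcher/does-not-exist.json")) {
            System.out.println("Resource opened, " + is.available() + " bytes available");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```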