Fix more fetchers (#6790)

koppor authored Aug 26, 2020
1 parent 92161af commit d052170
Showing 14 changed files with 262 additions and 107 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -106,6 +106,7 @@ Note that this project **does not** adhere to [Semantic Versioning](http://semve
- We fixed an issue where percent sign ('%') was not formatted properly by the HTML formatter [#6753](https://github.com/JabRef/jabref/issues/6753)
- We fixed an issue with the [SAO/NASA Astrophysics Data System](https://docs.jabref.org/collect/import-using-online-bibliographic-database/ads) fetcher where `\textbackslash` appeared at the end of the abstract.
- We fixed an issue with the Science Direct fetcher where PDFs could not be downloaded. Fixes [#5860](https://github.com/JabRef/jabref/issues/5860)
- We fixed an issue with the Library of Congress importer.

### Removed

104 changes: 104 additions & 0 deletions docs/adr/0014-separate-URL-creation-to-enable-proper-logging.md
@@ -0,0 +1,104 @@
# Separate URL creation to enable proper logging

## Context and Problem Statement

Fetchers sometimes fail, and the reason for a failure needs to be investigated.

* Claim 1: Knowing the URL that was used to query the fetcher eases debugging
* Claim 2: Logging that URL (instead of inspecting it in the debugger only) eases debugging

How can the URL used for fetching be logged properly?

## Decision Drivers

* Code should be easy to read
* If an exception is thrown anyway, include the URL in the exception message instead of logging it separately (see <https://howtodoinjava.com/best-practices/java-exception-handling-best-practices/#6>)

## Considered Options

* Separate URL creation
* Create URL when logging the URL
* Include URL creation as statement before the stream creation in the try-with-resources block

## Decision Outcome

Chosen option: "Separate URL creation", because it comes out best (see below).

## Pros and Cons of the Options

### Separate URL creation

```java
URL urlForQuery;
try {
    urlForQuery = getURLForQuery(query);
} catch (URISyntaxException | MalformedURLException | FetcherException e) {
    throw new FetcherException(String.format("Search URI %s is malformed", query), e);
}
try (InputStream stream = getUrlDownload(urlForQuery).asInputStream()) {
    ...
} catch (IOException e) {
    throw new FetcherException("A network error occurred while fetching from " + urlForQuery.toString(), e);
} catch (ParseException e) {
    throw new FetcherException("An internal parser error occurred while fetching from " + urlForQuery.toString(), e);
}
```

* Good, because exceptions thrown by the method are caught directly
* Good, because exceptions in different statements belong to different catch blocks
* Good, because the code to determine the URL is written only once
* OK, because "Java by Comparison" does not state anything about it
* Bad, because multiple try/catch statements are required
* Bad, because this style seems to be uncommon among Java coders

### Create URL when logging the URL

The "logging" is done when throwing the exception.

Example code:

```java
try (InputStream stream = getUrlDownload(getURLForQuery(query)).asInputStream()) {
    ...
} catch (URISyntaxException | MalformedURLException | FetcherException e) {
    throw new FetcherException(String.format("Search URI %s is malformed", query), e);
} catch (IOException e) {
    try {
        throw new FetcherException("A network error occurred while fetching from " + getURLForQuery(query), e);
    } catch (URISyntaxException | MalformedURLException uriSyntaxException) {
        // does not happen
        throw new FetcherException("A network error occurred", e);
    }
} catch (ParseException e) {
    try {
        throw new FetcherException("An internal parser error occurred while fetching from " + getURLForQuery(query), e);
    } catch (URISyntaxException | MalformedURLException uriSyntaxException) {
        // does not happen
        throw new FetcherException("An internal parser error occurred", e);
    }
}
```

* Good, because code inside the `try` statement stays the same
* OK, because "Java by Comparison" does not state anything about it
* Bad, because an additional try/catch-block is added to each catch statement
* Bad, because a `throw` statement is needed in the `URISyntaxException` catch block (even though the exception cannot occur at this point), because the compiler otherwise reports a missing `return` statement.
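The compiler rule behind this last point can be shown in isolation. A minimal sketch (class and method names are illustrative, not from the JabRef codebase): every execution path of a non-void method must end in a `return` or a `throw`, even when the programmer knows a branch is unreachable.

```java
public class ReturnPathDemo {

    static String describe(int code) {
        if (code == 0) {
            return "ok";
        }
        // Without this throw (or a return), the compiler rejects the method
        // with "missing return statement", even if callers never pass code != 0.
        throw new IllegalStateException("unexpected code " + code);
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // prints "ok"
    }
}
```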

### Include URL creation as statement before the stream creation in the try-with-resources block

```java
try (URL urlForQuery = getURLForQuery(query); InputStream stream = urlForQuery.asInputStream()) {
    ...
} catch (URISyntaxException | MalformedURLException | FetcherException e) {
    throw new FetcherException(String.format("Search URI %s is malformed", query), e);
} catch (IOException e) {
    throw new FetcherException("A network error occurred while fetching from " + urlForQuery.toString(), e);
} catch (ParseException e) {
    throw new FetcherException("An internal parser error occurred while fetching from " + urlForQuery.toString(), e);
}
```

* Good, because the single try/catch-block can be kept
* Good, because logical flow is kept
* Bad, because it does not compile (`URL` is not an `AutoCloseable`)
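As a side note on why this fails: a resource in a try-with-resources header must implement `java.lang.AutoCloseable`. Making this option compile would require wrapping the URL, as in the following illustrative sketch (the wrapper class is hypothetical and not part of JabRef):

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical wrapper: implements AutoCloseable so a URL can appear
// in a try-with-resources header; close() has nothing to release.
final class CloseableUrl implements AutoCloseable {
    final URL url;

    CloseableUrl(String spec) throws MalformedURLException {
        this.url = new URL(spec);
    }

    @Override
    public void close() {
        // a URL holds no system resource, so nothing to do here
    }
}

public class TryWithResourcesDemo {

    static String hostOf(String spec) throws MalformedURLException {
        try (CloseableUrl wrapped = new CloseableUrl(spec)) {
            return wrapped.url.getHost();
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(hostOf("https://example.org/search?q=test"));
    }
}
```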
SearchBasedParserFetcher.java
@@ -15,9 +15,11 @@

/**
* Provides a convenient interface for search-based fetcher, which follow the usual three-step procedure:
* 1. Open a URL based on the search query
* 2. Parse the response to get a list of {@link BibEntry}
* 3. Post-process fetched entries
* <ol>
* <li>Open a URL based on the search query</li>
* <li>Parse the response to get a list of {@link BibEntry}</li>
* <li>Post-process fetched entries</li>
* </ol>
*/
public interface SearchBasedParserFetcher extends SearchBasedFetcher {

@@ -39,21 +41,14 @@ default List<BibEntry> performSearch(String query) throws FetcherException {
return Collections.emptyList();
}

try (InputStream stream = getUrlDownload(getURLForQuery(query)).asInputStream()) {
List<BibEntry> fetchedEntries = getParser().parseEntries(stream);

// Post-cleanup
fetchedEntries.forEach(this::doPostCleanup);

return fetchedEntries;
} catch (URISyntaxException e) {
throw new FetcherException("Search URI is malformed", e);
} catch (IOException e) {
// TODO: Catch HTTP Response 401/403 errors and report that user has no rights to access resource
throw new FetcherException("A network error occurred", e);
} catch (ParseException e) {
throw new FetcherException("An internal parser error occurred", e);
// ADR-0014
URL urlForQuery;
try {
urlForQuery = getURLForQuery(query);
} catch (URISyntaxException | MalformedURLException | FetcherException e) {
throw new FetcherException(String.format("Search URI crafted from query %s is malformed", query), e);
}
return getBibEntries(urlForQuery);
}

/**
@@ -65,17 +60,25 @@ default List<BibEntry> performSearch(String query) throws FetcherException {
*/
@Override
default List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
try (InputStream stream = getUrlDownload(getComplexQueryURL(complexSearchQuery)).asInputStream()) {
// ADR-0014
URL urlForQuery;
try {
urlForQuery = getComplexQueryURL(complexSearchQuery);
} catch (URISyntaxException | MalformedURLException | FetcherException e) {
throw new FetcherException("Search URI crafted from complex search query is malformed", e);
}
return getBibEntries(urlForQuery);
}

private List<BibEntry> getBibEntries(URL urlForQuery) throws FetcherException {
try (InputStream stream = getUrlDownload(urlForQuery).asInputStream()) {
List<BibEntry> fetchedEntries = getParser().parseEntries(stream);
fetchedEntries.forEach(this::doPostCleanup);
return fetchedEntries;
} catch (URISyntaxException e) {
throw new FetcherException("Search URI is malformed", e);
} catch (IOException e) {
// TODO: Catch HTTP Response 401/403 errors and report that user has no rights to access resource
throw new FetcherException("A network error occurred", e);
throw new FetcherException("A network error occurred while fetching from " + urlForQuery, e);
} catch (ParseException e) {
throw new FetcherException("An internal parser error occurred", e);
throw new FetcherException("An internal parser error occurred while fetching from " + urlForQuery, e);
}
}

@@ -86,15 +89,16 @@ default URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URI

/**
* Performs a cleanup of the fetched entry.
*
* <p>
* Only systematic errors of the fetcher should be corrected here
* (i.e. if information is consistently contained in the wrong field or the wrong format)
but not cosmetic issues which may depend on the user's taste (for example, LaTeX code vs. HTML in the abstract).
*
* <p>
* Try to reuse existing {@link Formatter} for the cleanup. For example,
* {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
*
* <p>
* By default, no cleanup is done.
*
* @param entry the entry to be cleaned-up
*/
default void doPostCleanup(BibEntry entry) {
CollectionOfComputerScienceBibliographiesFetcher.java
@@ -3,6 +3,7 @@
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Arrays;

import org.jabref.logic.formatter.bibtexfields.RemoveDigitsFormatter;
import org.jabref.logic.formatter.bibtexfields.RemoveNewlinesFormatter;
@@ -14,7 +15,10 @@
import org.jabref.logic.importer.SearchBasedParserFetcher;
import org.jabref.model.cleanup.FieldFormatterCleanup;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.Field;
import org.jabref.model.entry.field.FieldFactory;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.field.UnknownField;

import org.apache.http.client.utils.URIBuilder;

@@ -31,10 +35,10 @@ public CollectionOfComputerScienceBibliographiesFetcher(ImportFormatPreferences
@Override
public URL getURLForQuery(String query) throws URISyntaxException, MalformedURLException, FetcherException {
return new URIBuilder(BASIC_SEARCH_URL)
.addParameter("query", query)
.addParameter("sort", "score")
.build()
.toURL();
.addParameter("query", query)
.addParameter("sort", "score")
.build()
.toURL();
}

@Override
@@ -53,5 +57,32 @@ public void doPostCleanup(BibEntry entry) {
new FieldFormatterCleanup(StandardField.ABSTRACT, new ReplaceTabsBySpaceFormater()).cleanup(entry);
new FieldFormatterCleanup(StandardField.ABSTRACT, new RemoveRedundantSpacesFormatter()).cleanup(entry);
new FieldFormatterCleanup(StandardField.EDITOR, new RemoveDigitsFormatter()).cleanup(entry);
// the identifier field is a key-value field
// example: "urn:isbn:978-1-4503-5217-8; doi:10.1145/3129790.3129810; ISI:000505046100032; Scopus 2-s2.0-85037741580"
// thus, key can contain multiple ":"; sometimes value separated by " " instead of ":"
UnknownField identifierField = new UnknownField("identifier");
entry.getField(identifierField)
.stream()
.flatMap(value -> Arrays.stream(value.split("; ")))
.forEach(identifierKeyValue -> {
// check for pattern "Scopus 2-..."
String[] identifierKeyValueSplit = identifierKeyValue.split(" ");
if (identifierKeyValueSplit.length == 1) {
// check for pattern "doi:..."
identifierKeyValueSplit = identifierKeyValue.split(":");
}
int length = identifierKeyValueSplit.length;
if (length < 2) {
return;
}
// in the case "urn:isbn:", just "isbn" is used
String key = identifierKeyValueSplit[length - 2];
String value = identifierKeyValueSplit[length - 1];
Field field = FieldFactory.parseField(key);
if (!entry.hasField(field)) {
entry.setField(field, value);
}
});
entry.clearField(identifierField);
}
}
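The splitting rules described in the comments of `doPostCleanup` above can be exercised in isolation. The sketch below mirrors that logic (split entries on `"; "`, fall back from a space split to a colon split, take the last two segments as key and value); the class and method names are illustrative and not part of JabRef:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdentifierSplitDemo {

    // Parses an identifier string in the style returned by the fetcher, e.g.
    // "urn:isbn:978-1-4503-5217-8; doi:10.1145/3129790.3129810; Scopus 2-s2.0-85037741580"
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String token : raw.split("; ")) {
            // pattern "Scopus 2-..." uses a space separator
            String[] parts = token.split(" ");
            if (parts.length == 1) {
                // pattern "doi:..." (and "urn:isbn:...") uses colons
                parts = token.split(":");
            }
            if (parts.length < 2) {
                continue;
            }
            // for "urn:isbn:...", the second-to-last segment ("isbn") becomes the key
            result.put(parts[parts.length - 2], parts[parts.length - 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("urn:isbn:978-1-4503-5217-8; doi:10.1145/3129790.3129810; ISI:000505046100032; Scopus 2-s2.0-85037741580"));
    }
}
```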