Add testing interface, including a set of capabilities to tests for (#…

…6687) Co-authored-by: Oliver Kopp <[email protected]>
JabRef · Aug 2, 2020 · b506baa · b506baa
1 parent 78aed32
commit b506baa
Show file tree

Hide file tree

Showing 28 changed files with 870 additions and 183 deletions.
diff --git a/config/checkstyle/checkstyle.xml b/config/checkstyle/checkstyle.xml
@@ -12,7 +12,9 @@
 
     <!-- BeforeExecutionFileFilters is required for sources that are based on java14 -->
     <module name="BeforeExecutionExclusionFileFilter">
-      <property name="fileNamePattern" value="AuthorAndsReplacer.java|EntryTypeView.java|MainTableFieldValueFormatter.java|Ordinal.java" />
+        <property
+                name="fileNamePattern"
+                value="AuthorAndsReplacer.java|EntryTypeView.java|IEEE.java|MainTableFieldValueFormatter.java|Ordinal.java"/>
     </module>
 
     <module name="SuppressionFilter">

diff --git a/docs/adr/0012-handle-different-bibEntry-formats-of-fetchers.md b/docs/adr/0012-handle-different-bibEntry-formats-of-fetchers.md
@@ -0,0 +1,56 @@
+# Handle different bibentry formats of fetchers by adding a layer
+
+## Context and Problem Statement
+
+All fetchers (except IDFetchers) in JabRef return BibEntries when fetching entries from their API.
+Some fetchers directly receive BibTeX entries from their API, the other fetchers receive their entries in some kind of exchange format such as JSON or XML and then parse this into BibEntries.
+Currently, all fetchers either return BibEntries in BibTeX or BibLaTeX format.
+This can lead to importing BibEntries of one format in a database of the other format.
+How can this inconsistency between fetchers, and their used formats be addressed?
+
+## Considered Options
+
+* Pass fetchers the format, they have to create entries accordingly (in the correct format).
+* Pass fetchers the format, they have to call a conversion method if necessary (in the correct format).
+* Let the caller handle any format inconsistencies and the conversion.
+* Introduce a new layer between fetchers and caller, such as a FetcherHandler, that manages the conversion
+
+## Decision Outcome
+
+Chosen option: "Introduce a new layer between fetchers and caller, such as a FetcherHandler, that manages the conversion",
+because it can compose all steps required during importing, not only format conversion of fetched entries.
+[As described here (comment)](https://github.com/JabRef/jabref/pull/6687)
+
+## Pros and Cons of the Options
+
+### Introduce a new layer between fetchers and caller, such as a FetcherHandler, that manages the conversion
+
+* Good, because fetchers do not have to think about conversion (Separation of concerns)
+* Good, because no other code that currently relies on fetchers has to do the conversion
+* Good, because this layer can be used for any kind of import to handle all conversion steps (not only format). [As described here (comment)](https://github.com/JabRef/jabref/pull/6687)
+* Good, because this layer can easily be extended if the import procedure changes
+* Bad, because this requires a lot of code changes
+* Bad, because this has to be tested extensively
+
+### Pass fetchers the format, they have to call a conversion method if necessary
+
+* Good, because less code has to be written than with option "Pass fetchers the format, they have to create entries accordingly"
+* Good, because code is already tested
+* Good, because keeps all conversion code centralized (code reuse)
+* Bad, because fetcher first creates the BibEntry in a possibly "wrong" format, this can easily lead to bugs due to e.g. code changes
+* Bad, because adds dependency
+
+### Pass fetchers the format, they have to create entries accordingly
+
+* Good, because fetchers already handle BibEntry creation (in their format of choice). This is part of his responsibility.
+* Good, because fetchers only create BibEntries of the "correct" format. At no point there exists the chance of the wrong format being passed on due to e.g. code changes.
+* Good, because the conversion does not have to take place
+* Bad, because fetcher has to "know" all differences of the formats -> clutters the code.
+* Bad, because this code has to be tested. Conversion already exists.
+
+### Let the caller handle any format inconsistencies and the conversion
+
+* Good, because fetcher code does not have to change
+* Good, because fetcher only has to fetch and does not need to know anything about the formats
+* Bad, because programmers might assume that a certain format is used, e.g. the preferred format (which would not work as the databases that imports the entries does not have to conform to the preferred format)
+* Bad, because at every place where fetchers are used, and the format matters, conversion has to be used, creating more dependencies
diff --git a/src/main/java/org/jabref/logic/importer/EntryBasedParserFetcher.java b/src/main/java/org/jabref/logic/importer/EntryBasedParserFetcher.java
@@ -43,6 +43,7 @@ public interface EntryBasedParserFetcher extends EntryBasedFetcher {
      * {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
      *
      * By default, no cleanup is done.
+     *
      * @param entry the entry to be cleaned-up
      */
     default void doPostCleanup(BibEntry entry) {

diff --git a/src/main/java/org/jabref/logic/importer/IdBasedFetcher.java b/src/main/java/org/jabref/logic/importer/IdBasedFetcher.java
@@ -14,7 +14,6 @@ public interface IdBasedFetcher extends WebFetcher {
      *
      * @param identifier a string which uniquely identifies the item
      * @return a {@link BibEntry} containing the bibliographic information (or an empty optional if no data was found)
-     * @throws FetcherException
      */
     Optional<BibEntry> performSearchById(String identifier) throws FetcherException;
 }
diff --git a/src/main/java/org/jabref/logic/importer/IdBasedParserFetcher.java b/src/main/java/org/jabref/logic/importer/IdBasedParserFetcher.java
@@ -48,6 +48,7 @@ public interface IdBasedParserFetcher extends IdBasedFetcher {
      * {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
      *
      * By default, no cleanup is done.
+     *
      * @param entry the entry to be cleaned-up
      */
     default void doPostCleanup(BibEntry entry) {

diff --git a/src/main/java/org/jabref/logic/importer/ImportCleanup.java b/src/main/java/org/jabref/logic/importer/ImportCleanup.java
@@ -0,0 +1,38 @@
+package org.jabref.logic.importer;
+
+import java.util.Collection;
+
+import org.jabref.logic.cleanup.ConvertToBiblatexCleanup;
+import org.jabref.logic.cleanup.ConvertToBibtexCleanup;
+import org.jabref.model.database.BibDatabaseMode;
+import org.jabref.model.entry.BibEntry;
+
+public class ImportCleanup {
+
+    private final BibDatabaseMode targetBibEntryFormat;
+
+    public ImportCleanup(BibDatabaseMode targetBibEntryFormat) {
+        this.targetBibEntryFormat = targetBibEntryFormat;
+    }
+
+    /**
+     * Performs a format conversion of the given entry into the targeted format.
+     *
+     * @return Returns the cleaned up bibentry to enable usage of doPostCleanup in streams.
+     */
+    public BibEntry doPostCleanup(BibEntry entry) {
+        if (targetBibEntryFormat == BibDatabaseMode.BIBTEX) {
+            new ConvertToBibtexCleanup().cleanup(entry);
+        } else if (targetBibEntryFormat == BibDatabaseMode.BIBLATEX) {
+            new ConvertToBiblatexCleanup().cleanup(entry);
+        }
+        return entry;
+    }
+
+    /**
+     * Performs a format conversion of the given entry collection into the targeted format.
+     */
+    public void doPostCleanup(Collection<BibEntry> entries) {
+        entries.parallelStream().forEach(entry -> doPostCleanup(entry));
+    }
+}
diff --git a/src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java b/src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java
@@ -2,6 +2,7 @@
 
 import java.util.List;
 
+import org.jabref.logic.importer.fetcher.ComplexSearchQuery;
 import org.jabref.model.entry.BibEntry;
 
 /**
@@ -17,4 +18,15 @@ public interface SearchBasedFetcher extends WebFetcher {
      * @return a list of {@link BibEntry}, which are matched by the query (may be empty)
      */
     List<BibEntry> performSearch(String query) throws FetcherException;
+
+    /**
+     * This method is used to send complex queries using fielded search.
+     *
+     * @param complexSearchQuery the search query defining all fielded search parameters
+     * @return a list of {@link BibEntry}, which are matched by the query (may be empty)
+     */
+    default List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
+        // Default Implementation behaves like perform search using the default field as query
+        return performSearch(complexSearchQuery.getDefaultField().orElse(""));
+    }
 }
diff --git a/src/main/java/org/jabref/logic/importer/SearchBasedParserFetcher.java b/src/main/java/org/jabref/logic/importer/SearchBasedParserFetcher.java
@@ -8,6 +8,7 @@
 import java.util.Collections;
 import java.util.List;
 
+import org.jabref.logic.importer.fetcher.ComplexSearchQuery;
 import org.jabref.model.cleanup.Formatter;
 import org.jabref.model.entry.BibEntry;
 import org.jabref.model.strings.StringUtil;
@@ -22,6 +23,7 @@ public interface SearchBasedParserFetcher extends SearchBasedFetcher {
 
     /**
      * Constructs a URL based on the query.
+     *
      * @param query the search query
      */
     URL getURLForQuery(String query) throws URISyntaxException, MalformedURLException, FetcherException;
@@ -31,24 +33,6 @@ public interface SearchBasedParserFetcher extends SearchBasedFetcher {
      */
     Parser getParser();
 
-    /**
-     * Performs a cleanup of the fetched entry.
-     *
-     * Only systematic errors of the fetcher should be corrected here
-     * (i.e. if information is consistently contained in the wrong field or the wrong format)
-     * but not cosmetic issues which may depend on the user's taste (for example, LateX code vs HTML in the abstract).
-     *
-     * Try to reuse existing {@link Formatter} for the cleanup. For example,
-     * {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
-     *
-     * By default, no cleanup is done.
-     *
-     * @param entry the entry to be cleaned-up
-     */
-    default void doPostCleanup(BibEntry entry) {
-        // Do nothing by default
-    }
-
     @Override
     default List<BibEntry> performSearch(String query) throws FetcherException {
         if (StringUtil.isBlank(query)) {
@@ -71,4 +55,49 @@ default List<BibEntry> performSearch(String query) throws FetcherException {
             throw new FetcherException("An internal parser error occurred", e);
         }
     }
+
+    /**
+     * This method is used to send queries with advanced URL parameters.
+     * This method is necessary as the performSearch method does not support certain URL parameters that are used for
+     * fielded search, such as a title, author, or year parameter.
+     *
+     * @param complexSearchQuery the search query defining all fielded search parameters
+     */
+    @Override
+    default List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
+        try (InputStream stream = getUrlDownload(getComplexQueryURL(complexSearchQuery)).asInputStream()) {
+            List<BibEntry> fetchedEntries = getParser().parseEntries(stream);
+            fetchedEntries.forEach(this::doPostCleanup);
+            return fetchedEntries;
+        } catch (URISyntaxException e) {
+            throw new FetcherException("Search URI is malformed", e);
+        } catch (IOException e) {
+            // TODO: Catch HTTP Response 401/403 errors and report that user has no rights to access resource
+            throw new FetcherException("A network error occurred", e);
+        } catch (ParseException e) {
+            throw new FetcherException("An internal parser error occurred", e);
+        }
+    }
+
+    default URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URISyntaxException, MalformedURLException, FetcherException {
+        // Default Implementation behaves like getURLForQuery using the default field as query
+        return this.getURLForQuery(complexSearchQuery.getDefaultField().orElse(""));
+    }
+
+    /**
+     * Performs a cleanup of the fetched entry.
+     *
+     * Only systematic errors of the fetcher should be corrected here
+     * (i.e. if information is consistently contained in the wrong field or the wrong format)
+     * but not cosmetic issues which may depend on the user's taste (for example, LateX code vs HTML in the abstract).
+     *
+     * Try to reuse existing {@link Formatter} for the cleanup. For example,
+     * {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
+     *
+     * By default, no cleanup is done.
+     * @param entry the entry to be cleaned-up
+     */
+    default void doPostCleanup(BibEntry entry) {
+        // Do nothing by default
+    }
 }
diff --git a/src/main/java/org/jabref/logic/importer/fetcher/ArXiv.java b/src/main/java/org/jabref/logic/importer/fetcher/ArXiv.java
@@ -158,7 +158,7 @@ private List<ArXivEntry> searchForEntries(String searchQuery) throws FetcherExce
     }
 
     private List<ArXivEntry> queryApi(String searchQuery, List<ArXivIdentifier> ids, int start, int maxResults)
-        throws FetcherException {
+            throws FetcherException {
         Document result = callApi(searchQuery, ids, start, maxResults);
         List<Node> entries = XMLUtil.asList(result.getElementsByTagName("entry"));
 
@@ -255,6 +255,25 @@ public List<BibEntry> performSearch(String query) throws FetcherException {
                                       .collect(Collectors.toList());
     }
 
+    /**
+     * Constructs a complex query string using the field prefixes specified at https://arxiv.org/help/api/user-manual
+     *
+     * @param complexSearchQuery the search query defining all fielded search parameters
+     * @return A list of entries matching the complex query
+     */
+    @Override
+    public List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
+        List<String> searchTerms = new ArrayList<>();
+        complexSearchQuery.getAuthors().ifPresent(authors -> authors.forEach(author -> searchTerms.add("au:" + author)));
+        complexSearchQuery.getTitlePhrases().ifPresent(title -> searchTerms.add("ti:" + title));
+        complexSearchQuery.getJournal().ifPresent(journal -> searchTerms.add("jr:" + journal));
+        // Since ArXiv API does not support year search, we ignore the year related terms
+        complexSearchQuery.getToYear().ifPresent(year -> searchTerms.add(year.toString()));
+        complexSearchQuery.getDefaultField().ifPresent(defaultField -> searchTerms.add(defaultField));
+        String complexQueryString = String.join(" AND ", searchTerms);
+        return performSearch(complexQueryString);
+    }
+
     @Override
     public Optional<BibEntry> performSearchById(String identifier) throws FetcherException {
         return searchForEntryById(identifier)

diff --git a/src/main/java/org/jabref/logic/importer/fetcher/CiteSeer.java b/src/main/java/org/jabref/logic/importer/fetcher/CiteSeer.java
@@ -59,7 +59,6 @@ public Parser getParser() {
         // So we extract the data string from the <span class="Z3988" title="<data>"></span> tags and pass the content to the COinS parser
         return inputStream -> {
             String response = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(OS.NEWLINE));
-
             List<BibEntry> entries = new ArrayList<>();
             CoinsParser parser = new CoinsParser();
             Pattern pattern = Pattern.compile("<span class=\"Z3988\" title=\"(.*)\"></span>");