Skip to content

Commit

Permalink
Add testing interface, including a set of capabilities to tests for (#…
Browse files Browse the repository at this point in the history
…6687)

Co-authored-by: Oliver Kopp <[email protected]>
  • Loading branch information
DominikVoigt and koppor authored Aug 2, 2020
1 parent 78aed32 commit b506baa
Show file tree
Hide file tree
Showing 28 changed files with 870 additions and 183 deletions.
4 changes: 3 additions & 1 deletion config/checkstyle/checkstyle.xml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,9 @@

<!-- BeforeExecutionFileFilters is required for sources that are based on java14 -->
<module name="BeforeExecutionExclusionFileFilter">
<property name="fileNamePattern" value="AuthorAndsReplacer.java|EntryTypeView.java|MainTableFieldValueFormatter.java|Ordinal.java" />
<property
name="fileNamePattern"
value="AuthorAndsReplacer.java|EntryTypeView.java|IEEE.java|MainTableFieldValueFormatter.java|Ordinal.java"/>
</module>

<module name="SuppressionFilter">
Expand Down
56 changes: 56 additions & 0 deletions docs/adr/0012-handle-different-bibEntry-formats-of-fetchers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Handle different bibentry formats of fetchers by adding a layer

## Context and Problem Statement

All fetchers (except IDFetchers) in JabRef return BibEntries when fetching entries from their API.
Some fetchers directly receive BibTeX entries from their API, the other fetchers receive their entries in some kind of exchange format such as JSON or XML and then parse this into BibEntries.
Currently, all fetchers either return BibEntries in BibTeX or BibLaTeX format.
This can lead to importing BibEntries of one format in a database of the other format.
How can this inconsistency between fetchers, and their used formats be addressed?

## Considered Options

* Pass fetchers the format, they have to create entries accordingly (in the correct format).
* Pass fetchers the format, they have to call a conversion method if necessary (in the correct format).
* Let the caller handle any format inconsistencies and the conversion.
* Introduce a new layer between fetchers and caller, such as a FetcherHandler, that manages the conversion

## Decision Outcome

Chosen option: "Introduce a new layer between fetchers and caller, such as a FetcherHandler, that manages the conversion",
because it can compose all steps required during importing, not only format conversion of fetched entries.
[As described here (comment)](https://github.com/JabRef/jabref/pull/6687)

## Pros and Cons of the Options

### Introduce a new layer between fetchers and caller, such as a FetcherHandler, that manages the conversion

* Good, because fetchers do not have to think about conversion (Separation of concerns)
* Good, because no other code that currently relies on fetchers has to do the conversion
* Good, because this layer can be used for any kind of import to handle all conversion steps (not only format). [As described here (comment)](https://github.com/JabRef/jabref/pull/6687)
* Good, because this layer can easily be extended if the import procedure changes
* Bad, because this requires a lot of code changes
* Bad, because this has to be tested extensively

### Pass fetchers the format, they have to call a conversion method if necessary

* Good, because less code has to be written than with option "Pass fetchers the format, they have to create entries accordingly"
* Good, because code is already tested
* Good, because keeps all conversion code centralized (code reuse)
* Bad, because fetcher first creates the BibEntry in a possibly "wrong" format, this can easily lead to bugs due to e.g. code changes
* Bad, because adds dependency

### Pass fetchers the format, they have to create entries accordingly

* Good, because fetchers already handle BibEntry creation (in their format of choice). This is part of his responsibility.
* Good, because fetchers only create BibEntries of the "correct" format. At no point there exists the chance of the wrong format being passed on due to e.g. code changes.
* Good, because the conversion does not have to take place
* Bad, because fetcher has to "know" all differences of the formats -> clutters the code.
* Bad, because this code has to be tested. Conversion already exists.

### Let the caller handle any format inconsistencies and the conversion

* Good, because fetcher code does not have to change
* Good, because fetcher only has to fetch and does not need to know anything about the formats
* Bad, because programmers might assume that a certain format is used, e.g. the preferred format (which would not work as the databases that imports the entries does not have to conform to the preferred format)
* Bad, because at every place where fetchers are used, and the format matters, conversion has to be used, creating more dependencies
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ public interface EntryBasedParserFetcher extends EntryBasedFetcher {
* {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
*
* By default, no cleanup is done.
*
* @param entry the entry to be cleaned-up
*/
default void doPostCleanup(BibEntry entry) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@ public interface IdBasedFetcher extends WebFetcher {
*
* @param identifier a string which uniquely identifies the item
* @return a {@link BibEntry} containing the bibliographic information (or an empty optional if no data was found)
* @throws FetcherException
*/
Optional<BibEntry> performSearchById(String identifier) throws FetcherException;
}
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ public interface IdBasedParserFetcher extends IdBasedFetcher {
* {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
*
* By default, no cleanup is done.
*
* @param entry the entry to be cleaned-up
*/
default void doPostCleanup(BibEntry entry) {
Expand Down
38 changes: 38 additions & 0 deletions src/main/java/org/jabref/logic/importer/ImportCleanup.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
package org.jabref.logic.importer;

import java.util.Collection;

import org.jabref.logic.cleanup.ConvertToBiblatexCleanup;
import org.jabref.logic.cleanup.ConvertToBibtexCleanup;
import org.jabref.model.database.BibDatabaseMode;
import org.jabref.model.entry.BibEntry;

public class ImportCleanup {

private final BibDatabaseMode targetBibEntryFormat;

public ImportCleanup(BibDatabaseMode targetBibEntryFormat) {
this.targetBibEntryFormat = targetBibEntryFormat;
}

/**
* Performs a format conversion of the given entry into the targeted format.
*
* @return Returns the cleaned up bibentry to enable usage of doPostCleanup in streams.
*/
public BibEntry doPostCleanup(BibEntry entry) {
if (targetBibEntryFormat == BibDatabaseMode.BIBTEX) {
new ConvertToBibtexCleanup().cleanup(entry);
} else if (targetBibEntryFormat == BibDatabaseMode.BIBLATEX) {
new ConvertToBiblatexCleanup().cleanup(entry);
}
return entry;
}

/**
* Performs a format conversion of the given entry collection into the targeted format.
*/
public void doPostCleanup(Collection<BibEntry> entries) {
entries.parallelStream().forEach(entry -> doPostCleanup(entry));
}
}
12 changes: 12 additions & 0 deletions src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import java.util.List;

import org.jabref.logic.importer.fetcher.ComplexSearchQuery;
import org.jabref.model.entry.BibEntry;

/**
Expand All @@ -17,4 +18,15 @@ public interface SearchBasedFetcher extends WebFetcher {
* @return a list of {@link BibEntry}, which are matched by the query (may be empty)
*/
List<BibEntry> performSearch(String query) throws FetcherException;

/**
* This method is used to send complex queries using fielded search.
*
* @param complexSearchQuery the search query defining all fielded search parameters
* @return a list of {@link BibEntry}, which are matched by the query (may be empty)
*/
default List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
// Default Implementation behaves like perform search using the default field as query
return performSearch(complexSearchQuery.getDefaultField().orElse(""));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import java.util.Collections;
import java.util.List;

import org.jabref.logic.importer.fetcher.ComplexSearchQuery;
import org.jabref.model.cleanup.Formatter;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.strings.StringUtil;
Expand All @@ -22,6 +23,7 @@ public interface SearchBasedParserFetcher extends SearchBasedFetcher {

/**
* Constructs a URL based on the query.
*
* @param query the search query
*/
URL getURLForQuery(String query) throws URISyntaxException, MalformedURLException, FetcherException;
Expand All @@ -31,24 +33,6 @@ public interface SearchBasedParserFetcher extends SearchBasedFetcher {
*/
Parser getParser();

/**
* Performs a cleanup of the fetched entry.
*
* Only systematic errors of the fetcher should be corrected here
* (i.e. if information is consistently contained in the wrong field or the wrong format)
* but not cosmetic issues which may depend on the user's taste (for example, LateX code vs HTML in the abstract).
*
* Try to reuse existing {@link Formatter} for the cleanup. For example,
* {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
*
* By default, no cleanup is done.
*
* @param entry the entry to be cleaned-up
*/
default void doPostCleanup(BibEntry entry) {
// Do nothing by default
}

@Override
default List<BibEntry> performSearch(String query) throws FetcherException {
if (StringUtil.isBlank(query)) {
Expand All @@ -71,4 +55,49 @@ default List<BibEntry> performSearch(String query) throws FetcherException {
throw new FetcherException("An internal parser error occurred", e);
}
}

/**
* This method is used to send queries with advanced URL parameters.
* This method is necessary as the performSearch method does not support certain URL parameters that are used for
* fielded search, such as a title, author, or year parameter.
*
* @param complexSearchQuery the search query defining all fielded search parameters
*/
@Override
default List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
try (InputStream stream = getUrlDownload(getComplexQueryURL(complexSearchQuery)).asInputStream()) {
List<BibEntry> fetchedEntries = getParser().parseEntries(stream);
fetchedEntries.forEach(this::doPostCleanup);
return fetchedEntries;
} catch (URISyntaxException e) {
throw new FetcherException("Search URI is malformed", e);
} catch (IOException e) {
// TODO: Catch HTTP Response 401/403 errors and report that user has no rights to access resource
throw new FetcherException("A network error occurred", e);
} catch (ParseException e) {
throw new FetcherException("An internal parser error occurred", e);
}
}

default URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URISyntaxException, MalformedURLException, FetcherException {
// Default Implementation behaves like getURLForQuery using the default field as query
return this.getURLForQuery(complexSearchQuery.getDefaultField().orElse(""));
}

/**
* Performs a cleanup of the fetched entry.
*
* Only systematic errors of the fetcher should be corrected here
* (i.e. if information is consistently contained in the wrong field or the wrong format)
* but not cosmetic issues which may depend on the user's taste (for example, LateX code vs HTML in the abstract).
*
* Try to reuse existing {@link Formatter} for the cleanup. For example,
* {@code new FieldFormatterCleanup(StandardField.TITLE, new RemoveBracesFormatter()).cleanup(entry);}
*
* By default, no cleanup is done.
* @param entry the entry to be cleaned-up
*/
default void doPostCleanup(BibEntry entry) {
// Do nothing by default
}
}
21 changes: 20 additions & 1 deletion src/main/java/org/jabref/logic/importer/fetcher/ArXiv.java
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@ private List<ArXivEntry> searchForEntries(String searchQuery) throws FetcherExce
}

private List<ArXivEntry> queryApi(String searchQuery, List<ArXivIdentifier> ids, int start, int maxResults)
throws FetcherException {
throws FetcherException {
Document result = callApi(searchQuery, ids, start, maxResults);
List<Node> entries = XMLUtil.asList(result.getElementsByTagName("entry"));

Expand Down Expand Up @@ -255,6 +255,25 @@ public List<BibEntry> performSearch(String query) throws FetcherException {
.collect(Collectors.toList());
}

/**
* Constructs a complex query string using the field prefixes specified at https://arxiv.org/help/api/user-manual
*
* @param complexSearchQuery the search query defining all fielded search parameters
* @return A list of entries matching the complex query
*/
@Override
public List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
List<String> searchTerms = new ArrayList<>();
complexSearchQuery.getAuthors().ifPresent(authors -> authors.forEach(author -> searchTerms.add("au:" + author)));
complexSearchQuery.getTitlePhrases().ifPresent(title -> searchTerms.add("ti:" + title));
complexSearchQuery.getJournal().ifPresent(journal -> searchTerms.add("jr:" + journal));
// Since ArXiv API does not support year search, we ignore the year related terms
complexSearchQuery.getToYear().ifPresent(year -> searchTerms.add(year.toString()));
complexSearchQuery.getDefaultField().ifPresent(defaultField -> searchTerms.add(defaultField));
String complexQueryString = String.join(" AND ", searchTerms);
return performSearch(complexQueryString);
}

@Override
public Optional<BibEntry> performSearchById(String identifier) throws FetcherException {
return searchForEntryById(identifier)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,6 @@ public Parser getParser() {
// So we extract the data string from the <span class="Z3988" title="<data>"></span> tags and pass the content to the COinS parser
return inputStream -> {
String response = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(OS.NEWLINE));

List<BibEntry> entries = new ArrayList<>();
CoinsParser parser = new CoinsParser();
Pattern pattern = Pattern.compile("<span class=\"Z3988\" title=\"(.*)\"></span>");
Expand Down
Loading

0 comments on commit b506baa

Please sign in to comment.