Feature/enable lucene query parsing (#6799)
1 parent c1f2d3f, commit dfefd4d
Showing 14 changed files with 319 additions and 41 deletions.
56 changes: 56 additions & 0 deletions
docs/adr/0015-support-an-abstract-query-syntax-for-query-conversion.md
@@ -0,0 +1,56 @@
# Query syntax design

## Context and Problem Statement

All libraries use their own query syntax for advanced search options. To increase usability, users should be able to formulate their (abstract) search queries in a query syntax that can be mapped to the library-specific search queries. To achieve this, the query has to be parsed into an AST.

Which query syntax should be used for the abstract queries?
Which features should the syntax support?

## Considered Options

* Use a simplified syntax that is derived from the [Lucene](https://lucene.apache.org/core/8_6_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html) query syntax
* Formulate our own query syntax

## Decision Outcome

Chosen option: "Use a syntax that is derived from the Lucene query syntax", because it is the only option that is already known and easy to implement.
Furthermore, parsers for Lucene already exist and are tested.
For simplicity, and because the fetchers lack universal capabilities, only basic query features, and therefore only a basic syntax, are supported:

* All terms in the query are whitespace separated and will be ANDed
* Default and certain fielded terms are supported
* Fielded Terms:
  * `author`
  * `title`
  * `journal`
  * `year` (for single year)
  * `year-range` (for range e.g. `year-range:2012-2015`)
* The `journal`, `year`, and `year-range` fields should only be populated once in each query
* Example:
  * `author:"Igor Steinmacher" author:"Christoph Treude" year:2017` will be converted to
  * `author:"Igor Steinmacher" AND author:"Christoph Treude" AND year:2017`

### Positive Consequences

* Already tested
* Well known
* Easy to implement
* Can use an existing parser

## Pros and Cons of the Options

### Use a syntax that is derived from the Lucene query syntax

* Good, because it already exists
* Good, because it is already well known
* Good, because there already exists a [parser for Lucene syntax](https://lucene.apache.org/core/8_0_0/queryparser/org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.html)
* Good, because capabilities of query conversion can easily be extended using the [flexible Lucene framework](https://lucene.apache.org/core/8_0_0/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html)

### Formulate our own query syntax

* Good, because it allows for flexibility
* Bad, because it needs a new parser (it has to be decided whether to use [ANTLR](https://www.antlr.org/), [JavaCC](https://javacc.github.io/javacc/), or [LogicNG](https://github.com/logic-ng/LogicNG))
* Bad, because it has to be tested
* Bad, because the syntax is not well known
* Bad, because an easily extensible syntax requires an appropriate design (high effort)
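
The syntax rules listed under "Decision Outcome" in the ADR can be sketched in plain Java as follows. This is only an illustration and not code from this commit: the class `AbstractQuerySketch`, the method `toExplicitAnd`, and the regular expression are hypothetical, and no Lucene classes are involved. Rejecting a repeated unique field is also an assumption of the sketch. The ADR only says such fields "should only be populated once", while the `QueryParser` Javadoc in this commit says the alphabetically first instance is used.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of this commit: it only illustrates the ADR's rules.
final class AbstractQuerySketch {

    // Fields that the ADR says should be populated at most once per query.
    private static final Set<String> UNIQUE_FIELDS = Set.of("journal", "year", "year-range");

    // A term is an optional "field:" prefix followed by a quoted phrase or a bare word.
    private static final Pattern TERM = Pattern.compile("(?:([A-Za-z-]+):)?(\"[^\"]*\"|\\S+)");

    /** Rewrites whitespace-separated terms into the explicit AND form. */
    static String toExplicitAnd(String query) {
        List<String> terms = new ArrayList<>();
        Set<String> seenUnique = new HashSet<>();
        Matcher matcher = TERM.matcher(query);
        while (matcher.find()) {
            String field = matcher.group(1); // null for default (unfielded) terms
            // Assumption of this sketch: a repeated unique field is an error.
            if (field != null && UNIQUE_FIELDS.contains(field) && !seenUnique.add(field)) {
                throw new IllegalArgumentException("Field used more than once: " + field);
            }
            terms.add(matcher.group());
        }
        return String.join(" AND ", terms);
    }

    public static void main(String[] args) {
        // Prints: author:"Igor Steinmacher" AND author:"Christoph Treude" AND year:2017
        System.out.println(toExplicitAnd("author:\"Igor Steinmacher\" author:\"Christoph Treude\" year:2017"));
    }
}
```

The sketch reproduces the ADR's own example. A query such as `year:2017 year:2018` would be rejected with an `IllegalArgumentException` under the assumption stated above.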
@@ -0,0 +1,53 @@
package org.jabref.logic.importer;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

import org.jabref.logic.importer.fetcher.ComplexSearchQuery;

import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

/**
 * This class converts a query string written in lucene syntax into a complex search query.
 *
 * For simplicity this is limited to fielded data and the boolean AND operator.
 */
public class QueryParser {

    /**
     * Parses the given query string into a complex query using lucene.
     * Note: For unique fields, the alphabetically first instance in the query string is used in the complex query.
     *
     * @param queryString The given query string
     * @return A complex query containing all fields of the query string
     * @throws QueryNodeException Error during parsing
     */
    public Optional<ComplexSearchQuery> parseQueryStringIntoComplexQuery(String queryString) {
        try {
            ComplexSearchQuery.ComplexSearchQueryBuilder builder = ComplexSearchQuery.builder();

            StandardQueryParser parser = new StandardQueryParser();
            Query luceneQuery = parser.parse(queryString, "default");
            Set<Term> terms = new HashSet<>();
            // This implementation collects all terms from the leaves of the query tree independent of the internal boolean structure
            // If further capabilities are required in the future the visitor and ComplexSearchQuery has to be adapted accordingly.
            QueryVisitor visitor = QueryVisitor.termCollector(terms);
            luceneQuery.visit(visitor);

            List<Term> sortedTerms = new ArrayList<>(terms);
            sortedTerms.sort(Comparator.comparing(Term::text));
            builder.terms(sortedTerms);
            return Optional.of(builder.build());
        } catch (QueryNodeException | IllegalStateException ex) {
            return Optional.empty();
        }
    }
}
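
The Javadoc note above says that, for unique fields, the alphabetically first instance is used, and the method sorts the collected terms by text before handing them to the builder. How `ComplexSearchQuery` consumes the sorted list is not shown in this diff. The following is therefore only a sketch of one way "sort, then keep the first value per unique field" could behave. The class `SortedTermsSketch`, the method `groupByField`, and the use of `Map.Entry` pairs in place of Lucene `Term` objects are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; the real ComplexSearchQuery builder is not shown in this diff.
final class SortedTermsSketch {

    // Assumed unique fields, taken from the ADR in this commit.
    private static final Set<String> UNIQUE_FIELDS = Set.of("journal", "year", "year-range");

    /**
     * Sorts (field, text) pairs by text, then groups the texts by field.
     * A unique field keeps only its first (alphabetically smallest) text.
     */
    static Map<String, List<String>> groupByField(List<Map.Entry<String, String>> terms) {
        List<Map.Entry<String, String>> sorted = new ArrayList<>(terms);
        sorted.sort(Map.Entry.comparingByValue()); // mirrors sorting by Term::text

        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> term : sorted) {
            List<String> values = grouped.computeIfAbsent(term.getKey(), key -> new ArrayList<>());
            // A unique field ignores every value after the first one.
            if (values.isEmpty() || !UNIQUE_FIELDS.contains(term.getKey())) {
                values.add(term.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> terms = List.of(
                Map.entry("year", "2018"),
                Map.entry("author", "treude"),
                Map.entry("year", "2017"),
                Map.entry("author", "steinmacher"));
        // Prints: {year=[2017], author=[steinmacher, treude]}
        System.out.println(groupByField(terms));
    }
}
```

Sorting first makes the result independent of the iteration order of the `HashSet` that collects the terms, which is presumably why the commit sorts before building the query.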