It seems that escaped query characters are not treated as escaped when calling queryParser.Parse() #850

Closed
suchoss opened this issue May 8, 2023 · 1 comment

suchoss commented May 8, 2023

I have the following query analyzer:

internal class DefaultQA : Lucene.Net.Analysis.Analyzer
{
    private LuceneVersion _version = Lucene.Net.Util.LuceneVersion.LUCENE_48;
    
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer tokenizer = new Lucene.Net.Analysis.Standard.StandardTokenizer(_version, reader);
        TokenStream result = new StandardFilter(_version, tokenizer);
        result = new LowerCaseFilter(_version, result);
        result = new ASCIIFoldingFilter(result);

        return new TokenStreamComponents(tokenizer, result);
    }
}

When I parse a query containing an & sign, the parser generates an AND query instead, even when the & is escaped as \&:

var qa = new DefaultQA();

var queryParser = new QueryParser(Lucene.Net.Util.LuceneVersion.LUCENE_48, "fieldName", qa)
{
    DefaultOperator = Lucene.Net.QueryParsers.Classic.Operator.AND
};

var query = QueryParser.Escape("more&more"); //this is the query I am trying to "parse"

var searchQuery = queryParser.Parse(query);
var searchQuery2 = queryParser.CreateBooleanQuery("fieldName", query);
var searchQuery3 = queryParser.CreatePhraseQuery("fieldName", query);

The code above gives me the following result:

searchQuery = +fieldName:more +fieldName:more
searchQuery2 = fieldName:more fieldName:more
searchQuery3 = fieldName:"more more"

But I am expecting something like this:

searchQuery = +fieldName:more\&more
searchQuery2 = fieldName:more\&more
searchQuery3 = fieldName:"more\&more"

It seems to me that escaped characters are treated the same way as unescaped ones.

@paulirwin
Contributor

@suchoss Thanks for filing this report. I reproduced this in Java Lucene 4.8 and it produces the same result, so this is not a bug in our project. With modifications to the code below, I also reproduced the same result in Lucene 10.0.0. If you feel this is a bug, please file an issue with the upstream project. Thanks!

package org.example;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

import java.io.Reader;

public class DefaultQA extends Analyzer
{
    private Version _version = Version.LUCENE_48;

    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        Tokenizer tokenizer = new StandardTokenizer(_version, reader);
        TokenStream result = new StandardFilter(_version, tokenizer);
        result = new LowerCaseFilter(_version, result);
        result = new ASCIIFoldingFilter(result);

        return new TokenStreamComponents(tokenizer, result);
    }
}

package org.example;

import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.util.Version;

public class Main {
    public static void main(String[] args) throws ParseException {
        var qa = new DefaultQA();

        var queryParser = new QueryParser(Version.LUCENE_48, "fieldName", qa);
        queryParser.setDefaultOperator(QueryParser.Operator.AND);

        var query = QueryParser.escape("more&more"); //this is the query I am trying to "parse"

        var searchQuery = queryParser.parse(query);
        var searchQuery2 = queryParser.createBooleanQuery("fieldName", query);
        var searchQuery3 = queryParser.createPhraseQuery("fieldName", query);

        System.out.println(searchQuery);
        System.out.println(searchQuery2);
        System.out.println(searchQuery3);
    }
}

output:

+fieldName:more +fieldName:more
fieldName:more fieldName:more
fieldName:"more more"
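
A likely explanation for the identical output in both ports, offered here as an assumption rather than something stated in this thread: the backslash only stops the query parser from reading & as query syntax. The parser then hands the unescaped term text to the analyzer, and a tokenizer that follows Unicode word-break rules, as StandardTokenizer does, treats & as a separator, so two "more" tokens come out. The sketch below models those two stages in plain Java with no Lucene dependency. The class EscapeSketch and its methods are hypothetical stand-ins, the special-character list is assumed to mirror what QueryParser.escape covers, and the tokenizer is a deliberately simplified substitute for StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the two stages; not Lucene code.
final class EscapeSketch {

    // Assumed set of characters that the classic query syntax treats as special.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every special character with a backslash. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Stage 1 (parser): drops escaping backslashes to recover the raw term text. */
    public static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i); // keep the escaped character, drop the backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Stage 2 (analyzer stand-in): splits on anything that is not a letter or digit, lowercases. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String escaped = escape("more&more");
        String termText = unescape(escaped);
        System.out.println(escaped);            // more\&more
        System.out.println(termText);           // more&more
        System.out.println(tokenize(termText)); // [more, more]
    }
}
```

Under this model the escape does its job: the parser sees a single term, more&more. The split into two terms happens afterwards, during analysis, which would explain why escaping makes no difference to the final query. If that is what is happening, the place to change the behavior would be the analyzer rather than the escaping.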
