Searching in Swift? #13

Closed
keehun opened this issue Feb 27, 2016 · 4 comments


keehun commented Feb 27, 2016

Hi,

Been trying to get search to work in Swift. I've spent the last week or so figuring out how to scrape a lot of HTML and have about 220,000 documents in Lucene. Searching with Java works really well. Exactly what I need.

Now I'm trying to bring it to iOS with Swift. I've got the framework set up correctly, and the CLuceneSearchService class initializes fine.

func application(application: UIApplication, didFinishLaunchingWithOptions launchOptions: [NSObject: AnyObject]?) -> Bool {

    // Point the search service at the Lucene index bundled with the app (built externally with Java).
    searchService = CLuceneSearchService(indexPath: NSBundle.mainBundle().resourcePath!.stringByAppendingString("/LuceneIndexes"))
    if (searchService as BRSearchService?) == nil {
        print("unable to initialize search database")
    }

    // Run a test query and log how many hits come back.
    let results: BRSearchResults = searchService.search("piano")
    NSLog("Results: %i", results.count())

    return true
}

When this code runs, it always logs 0 results. I'm not sure whether:

  1. CLuceneSearchService isn't actually reading the imported indexes (I've checked that the path is correct and that the files are recognized as existing). How do I check how many documents CLuceneSearchService is looking at? IndexReader doesn't seem to be available.
  2. I need to recreate the indexes with iOS code rather than Java code, although I thought the whole point was that it wouldn't matter.

Any pointers on why this isn't working? The term "piano" should match many documents (134 hits with the Java code).

Thanks

p.s. Here's the Java code for the search:

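import java.io.IOException;
import java.util.Iterator;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public static void searchIndex(String searchString) throws IOException, ParseException {
    System.out.println("Searching for '" + searchString + "'");
    Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
    IndexReader indexReader = IndexReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Query-time analyzer; it should generally match the one used at index time.
    Analyzer analyzer = new StopAnalyzer();

    QueryParser queryParser = new QueryParser(JSON_WORD_SEARCH, analyzer);
    Query query = queryParser.parse(searchString);
    Hits hits = indexSearcher.search(query);
    System.out.println("Number of hits: " + hits.length());

    // Print the display field of every matching document.
    Iterator<Hit> it = hits.iterator();
    while (it.hasNext()) {
        Hit hit = it.next();
        org.apache.lucene.document.Document document = hit.getDocument();
        String path = document.get(JSON_WORD_DISPLAY);
        System.out.println("Hit: " + path);
    }

    // Release the index files once the search is done.
    indexSearcher.close();
    indexReader.close();
}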
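
The Java search passes the query string through `QueryParser` with a `StopAnalyzer`, so only the terms that analyzer emits are looked up in the index. As a rough, Lucene-free sketch of that kind of chain (split on non-letters, lowercase, drop stop words), assuming a stop-word list modelled on Lucene's English defaults; `StopAnalyzerSketch` is a hypothetical name, not a Lucene class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy, Lucene-free approximation of a StopAnalyzer-style chain:
// split on non-letters, lowercase each token, then drop stop words.
// The stop-word list is modelled on Lucene's English defaults (an assumption).
class StopAnalyzerSketch {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
            "such", "that", "the", "their", "then", "there", "these", "they",
            "this", "to", "was", "will", "with"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Letter-only tokenization: digits and punctuation act as separators.
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator produces one empty token
            }
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [grand, piano, upright, pianos]
        System.out.println(analyze("The Grand Piano, and 2 UPRIGHT pianos!"));
    }
}
```

Under this sketch a query such as "The Piano" reduces to the single term `piano`; whatever analyzer the iOS side uses has to produce the same terms for the same text.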

keehun commented Feb 27, 2016

OK, I think it has to do with the default field names that BRFullTextSearch sets. There's no easy way to change them when using CocoaPods, so I'll recompile it as a dependent project with my own field keys and see if that works.


keehun commented Feb 27, 2016

The problem was actually that the built-in search code was using the wrong analyzer for my indexes. It seems odd that I have to edit the framework itself to change these settings. Can they be changed from my own code, leaving the library as-is? The Snowball analyzer is hard-coded into [CLuceneSearchService defaultAnalyzer], and search: has defaultAnalyzer hard-coded in, so I'm guessing configurability isn't implemented yet. I propose making these settings that can be changed easily.

Once I changed this method to the following:

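- (std::auto_ptr<Analyzer>)analyzerForLanguage:(NSString *)lang {
    // Ignore the requested language and always use a plain StopAnalyzer,
    // matching the analyzer used by the external Java search code.
    std::auto_ptr<Analyzer> stopanalysis(new lucene::analysis::StopAnalyzer());
    return stopanalysis;
}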

it worked perfectly. I also had to change the field keys. That seems like something that should definitely be in a setting/property.
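
Why the wrong analyzer can silently return zero hits is easiest to see in a toy model: index terms are matched exactly, so text must be analyzed the same way at index time and at query time. The sketch below is plain Java, not Lucene or CLucene; `AnalyzerMismatchDemo`, its "strip a trailing s" stemmer (a crude stand-in for Snowball, assumed only for illustration), and the sample documents are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model (not Lucene): index-time and query-time analysis must agree,
// because terms are matched exactly.
class AnalyzerMismatchDemo {
    // Lowercasing letter tokenizer; stop-word removal omitted for brevity.
    static List<String> plainAnalyzer(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Hypothetical stand-in for a stemming analyzer: strips one trailing "s".
    // Real Snowball stemming is far more elaborate.
    static List<String> stemmingAnalyzer(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : plainAnalyzer(text)) {
            boolean plural = term.length() > 3 && term.endsWith("s");
            terms.add(plural ? term.substring(0, term.length() - 1) : term);
        }
        return terms;
    }

    // Builds a term -> document-id inverted index with the given analyzer.
    static Map<String, Set<Integer>> buildIndex(List<String> docs,
                                                Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                index.computeIfAbsent(term, k -> new HashSet<>()).add(id);
            }
        }
        return index;
    }

    // Returns ids of documents matching any query term (OR semantics).
    static Set<Integer> search(Map<String, Set<Integer>> index, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> hits = new HashSet<>();
        for (String term : analyzer.apply(query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Two upright pianos", "A violin sonata");
        Map<String, Set<Integer>> index = buildIndex(docs, AnalyzerMismatchDemo::plainAnalyzer);
        // The index holds "pianos", but the stemming query analyzer looks up "piano".
        System.out.println(search(index, "pianos", AnalyzerMismatchDemo::stemmingAnalyzer)); // []
        // Same analyzer on both sides: document 0 is found.
        System.out.println(search(index, "pianos", AnalyzerMismatchDemo::plainAnalyzer)); // [0]
    }
}
```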

I do get that Lucene and BRFullTextSearch were built to be used both to create and to search documents (which removes the need to set your own field keys or analyzer types). Creating an index elsewhere and importing it into a project wasn't the primary use case, so I understand these design choices.

keehun closed this as completed Feb 27, 2016

msqr (contributor) commented Feb 28, 2016

Hi @keehun,

Some parts of this API have not been exposed for extensibility or configurability yet, as you have discovered. This is due more to time constraints than to oversight. In your case, the defaultAnalyzer method can be overridden by subclassing CLuceneSearchService to return the Analyzer instance you'd like to use.
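
The subclassing approach is essentially a factory-method hook: the service asks an overridable method for its analyzer, and a subclass returns a different one. The sketch below shows that shape in Java as an analogy only; `SearchService`, `PlainSearchService`, `TextAnalyzer` and the toy stemmer are hypothetical and are not BRFullTextSearch's Objective-C API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Java analogy of the subclassing workaround. Every name here is hypothetical;
// it mirrors the shape of overriding CLuceneSearchService's defaultAnalyzer,
// not BRFullTextSearch's real Objective-C API.
class SubclassHookDemo {
    interface TextAnalyzer {
        List<String> analyze(String text);
    }

    // Lowercasing letter tokenizer shared by both analyzers.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    static class SearchService {
        // Factory hook; the base class defaults to a toy stemming analyzer.
        protected TextAnalyzer defaultAnalyzer() {
            return text -> {
                List<String> stemmed = new ArrayList<>();
                for (String term : tokenize(text)) {
                    stemmed.add(term.length() > 3 && term.endsWith("s")
                            ? term.substring(0, term.length() - 1) : term);
                }
                return stemmed;
            };
        }

        // All query analysis goes through the hook, so subclasses control it.
        List<String> queryTerms(String query) {
            return defaultAnalyzer().analyze(query);
        }
    }

    // Subclass that swaps in a non-stemming analyzer to match an external index.
    static class PlainSearchService extends SearchService {
        @Override
        protected TextAnalyzer defaultAnalyzer() {
            return SubclassHookDemo::tokenize;
        }
    }

    public static void main(String[] args) {
        System.out.println(new SearchService().queryTerms("Pianos"));      // [piano]
        System.out.println(new PlainSearchService().queryTerms("Pianos")); // [pianos]
    }
}
```

Because every query goes through the hook, the override changes search behaviour without touching the base class, which is the point of subclassing rather than editing the framework.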

The generalTextFields internal array should be a configurable property on CLuceneSearchService, but as a workaround you can avoid using the kBRSearchFieldNameTitle or kBRSearchFieldNameValue field keys in your BRIndexable implementations. Since you're creating the index outside the app, you could simply use the field names that CLuceneSearchService is written to support by default.
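
Field keys matter for the same reason analyzers do: each field has its own term dictionary, so a search only sees terms stored under the field it names. A minimal Java sketch of a per-field inverted index (not Lucene); the field names `word_search` and `value` are hypothetical placeholders, since the thread doesn't give the actual values of `JSON_WORD_SEARCH` or the library's default keys:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy per-field inverted index (not Lucene): each field has its own term
// dictionary, so a query only sees terms stored under the field it names.
class FieldKeyDemo {
    // field name -> term -> document ids
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    void add(int docId, String field, String text) {
        Map<String, Set<Integer>> terms = fields.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.computeIfAbsent(token.toLowerCase(Locale.ROOT), t -> new HashSet<>()).add(docId);
            }
        }
    }

    Set<Integer> search(String field, String term) {
        return fields.getOrDefault(field, Collections.emptyMap())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Hypothetical field names: the index was built with the app's own key,
        // while the searcher queries a different default key.
        String indexedField = "word_search"; // stand-in for JSON_WORD_SEARCH's value
        String searchedField = "value";      // stand-in for the library's default key
        FieldKeyDemo index = new FieldKeyDemo();
        index.add(0, indexedField, "Grand piano");
        System.out.println(index.search(searchedField, "piano")); // [] (wrong field)
        System.out.println(index.search(indexedField, "piano"));  // [0] (matching field)
    }
}
```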

Thanks for your feedback!

msqr (contributor) commented Feb 28, 2016

@keehun I forgot to mention that you can also set

@property (nonatomic, getter=isStemmingDisabled) BOOL stemmingDisabled;

to YES to turn off stemming. That makes the analyzer behave more like a plain StopAnalyzer, that is, a StandardTokenizer followed by a LowercaseFilter and a StopFilter.
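
As a rough illustration of what such a switch changes, the toy Java analyzer below runs a lowercase filter and a stop filter and applies a stemming stage only when `stemmingDisabled` is false. The class, its abbreviated stop-word list and its one-rule stemmer are hypothetical stand-ins, not CLucene's implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analyzer with a stemming switch, loosely mirroring a stemmingDisabled
// property. Names, the abbreviated stop list and the one-rule stemmer are hypothetical.
class ConfigurableAnalyzer {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    private final boolean stemmingDisabled;

    ConfigurableAnalyzer(boolean stemmingDisabled) {
        this.stemmingDisabled = stemmingDisabled;
    }

    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String term = token.toLowerCase(Locale.ROOT); // lowercase filter
            if (STOP_WORDS.contains(term)) {
                continue;                                 // stop filter
            }
            if (!stemmingDisabled && term.length() > 3 && term.endsWith("s")) {
                term = term.substring(0, term.length() - 1); // toy stemming stage
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "The pianos of the orchestra";
        System.out.println(new ConfigurableAnalyzer(false).analyze(text)); // [piano, orchestra]
        System.out.println(new ConfigurableAnalyzer(true).analyze(text));  // [pianos, orchestra]
    }
}
```

With stemming disabled the output is just the lowercased, stop-filtered tokens, which is the behaviour a plain stop-word analyzer would give.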
