Fix EZP-24553: no results for exact phrases with camelCase words #196

Merged

Conversation

joaoinacio
Contributor

JIRA: https://jira.ez.no/browse/EZP-24553

When performing an exact match for some phrases, no results are returned.

Problem:

After investigation, the problem seems to be caused by the presence of "camelCase" words in the sentence (e.g. eZ).

The wordDelimiter filter in the index analyzer performs splitOnCaseChange and concatenateWords (which causes "eZ Publish Platform" to be indexed as "e / ez / z | publish | platform").
However, the query analyzer does not perform these operations, so the query terms differ from the indexed terms and do not match.
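
To make the mismatch concrete, here is a rough Python simulation. It is not Solr's actual implementation: the function names are made up for illustration, and the position handling (sub-words take successive positions, the concatenated form shares the position of the first sub-word) is an assumption about how the word delimiter filter lays out its tokens.

```python
import re


def split_on_case_change(word):
    """Split a word at case changes, e.g. 'eZ' -> ['e', 'Z'] (simplified)."""
    return re.findall(r"[A-Z]+[a-z0-9]*|[a-z0-9]+", word)


def analyze(text, split_case):
    """Return a set of (term, position) pairs for the given text.

    Assumption: sub-words get successive positions and the concatenated
    form is stored at the position of the first sub-word.
    """
    postings = set()
    position = 0
    for word in text.split():
        parts = split_on_case_change(word) if split_case else [word]
        if len(parts) > 1:
            # concatenateWords: also keep the joined form, e.g. 'ez'
            postings.add(("".join(parts).lower(), position))
        for part in parts:
            postings.add((part.lower(), position))
            position += 1
    return postings


def phrase_matches(postings, query_terms):
    """True if the query terms occur at consecutive positions."""
    starts = [pos for term, pos in postings if term == query_terms[0]]
    return any(
        all((term, start + offset) in postings
            for offset, term in enumerate(query_terms))
        for start in starts
    )


# The query analyzer only lowercases here (no case splitting).
query = [w.lower() for w in "eZ Publish Platform".split()]

split_index = analyze("eZ Publish Platform", split_case=True)
plain_index = analyze("eZ Publish Platform", split_case=False)

print(phrase_matches(split_index, query))  # index splits, query does not
print(phrase_matches(plain_index, query))  # both sides behave the same
```

Under these assumptions, "publish" ends up one position later in the split index than the query expects, so the exact phrase fails even though every query term is present in the index.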

Solution:

Change either the index analyzer or the query analyzer so that both apply the same camelCase splitting behavior.

ping @paulborgermans

@peterkeung
Collaborator

If the alternative means that you modify the index analyzer to NOT split on camel case, then I vote for the alternative. There are definitely times where you want to find something like "BlackBerry" and not have it match "Black Berry". It is worth having to re-index to avoid bad matches when you were explicitly quoting the string.
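
A small illustration of the bad match described above, as a simplified sketch rather than the real analyzer chain: `index_terms` and `contains_phrase` are hypothetical helpers, and the concatenated form is ignored to keep it short.

```python
import re


def index_terms(text, split_on_case_change):
    """Lowercased index terms, optionally split at case changes."""
    terms = []
    for word in text.split():
        if split_on_case_change:
            parts = re.findall(r"[A-Z]+[a-z0-9]*|[a-z0-9]+", word)
            terms.extend(part.lower() for part in parts)
        else:
            terms.append(word.lower())
    return terms


def contains_phrase(terms, phrase):
    """True if the lowercased phrase words appear consecutively in terms."""
    wanted = [w.lower() for w in phrase.split()]
    n = len(wanted)
    return any(terms[i:i + n] == wanted for i in range(len(terms) - n + 1))


doc = "New BlackBerry phone"

# With case splitting at index time, the quoted phrase "Black Berry"
# matches a document that only contains "BlackBerry".
print(contains_phrase(index_terms(doc, True), "Black Berry"))

# Without case splitting, only the exact word matches.
print(contains_phrase(index_terms(doc, False), "Black Berry"))
print(contains_phrase(index_terms(doc, False), "BlackBerry"))
```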

@bgamrat

bgamrat commented Jul 2, 2015

I agree with Peter - it is better to reindex and avoid bad matches.

@andrerom
Contributor

andrerom commented Jul 2, 2015

I think I agree with @peterkeung and @bgamrat, but would like some input from @paulborgermans, he should be back next week afaik.

@joaoinacio
Contributor Author

Thanks for the feedback; indeed, changing the index instead makes perfect sense, so this PR should most likely be updated accordingly.

As noted, waiting for some additional input here from pb in case something was overlooked.
In any case, everything looked good in my tests, apart from the very minor BC break (is it even one?).

@paulborgermans
Contributor

+1, I am ok with the pull request, it should indeed be consistent.

But also be aware that preferences on the analysis steps are up to the particular implementation. So camel case splitting may be turned off for certain customer projects, but active for others.

There is no big "truth" about what analysis should do out of the box or by default.

@joaoinacio joaoinacio force-pushed the EZP-24553_camelcase_exact_matches branch from a52bf0a to 61ced32 Compare July 14, 2015 12:38
@joaoinacio
Contributor Author

Updated w/ changes to index-time.
@paulborgermans: Thanks for the feedback. Indeed, the configuration depends on particular needs and can be modified accordingly; this PR changes the default behavior to be more "natural" for the most common use cases.
@andrerom: Would there be any need for documentation here?

@bdunogier
Member

+1

joaoinacio pushed a commit that referenced this pull request Jul 20, 2015
…ches

Fix EZP-24553: no results for exact phrases with camelCase words
@joaoinacio joaoinacio merged commit 0f16b90 into ezsystems:master Jul 20, 2015
6 participants