[8.x] Expand Full-Text Search capabilities #40583

Closed
wants to merge 1 commit into from


jonnott
Contributor

@jonnott jonnott commented Jan 24, 2022

Following on from @driesvints' excellent work in #40129, this adds selectFullText() and addSelectFullText() methods.

Whilst whereFullText() is useful in itself, an arguably more practical use case for fulltext search queries involves selecting the relevance score in the query, filtering to only results with a relevance score > 0 using a having clause, and ordering the results by relevance.

This is a scenario used in the fulltext search lesson of @reinink's Eloquent Performance Patterns video course, except there the MATCH()..AGAINST() is used in both the select and where clauses, which isn't actually necessary .. see the 'Ordering By Result Relevance' section here https://www.cloudsavvyit.com/10172/how-to-use-full-text-searches-in-mysql/

Example (assuming a fulltext index exists on the 'bio' and 'resume' fields, both strings):

$users = \App\Models\User::addSelectFullText(['bio', 'resume'], 'my previous work', 'relevance', ['mode' => 'boolean'])
    ->having('relevance', '>', 0)
    ->orderBy('relevance', 'desc')
    ->get();

TODO:

  • Postgres version (I'm out of my depth there) .. hoping @tpetry can help
  • Add more tests, specific to selectFullText() and addSelectFullText()

@jonnott jonnott force-pushed the add-fulltext-select branch from f12afb1 to 189d76c Compare January 24, 2022 10:44
@tpetry
Contributor

tpetry commented Jan 24, 2022

This is a scenario used in the fulltext search lesson of @reinink's Eloquent Performance Patterns video course, except there the MATCH()..AGAINST() is used in both the select and where clauses, which isn't actually necessary .. see the 'Ordering By Result Relevance' section here https://www.cloudsavvyit.com/10172/how-to-use-full-text-searches-in-mysql/

If you don't use the MATCH(...) AGAINST(...) clause in the WHERE part, then the database calculates the score for every row. It's calculating a score, but not using an index to filter the rows. Your HAVING clause then filters every row by the fulltext expression, without an index. Your implementation does work, but it will not use an index. I strongly advise against using that solution.

I will look at the PR and its effect on PostgreSQL later. But I don't agree with the actual implementation idea. It makes a simple "order by ranking" logic very complicated, as I have to add a select part (which I don't need) and then sort by that column.

I, personally, would prefer an orderByFullText($columns, $value, $direction = 'desc', array $options = []) method and your addSelectFullText implementation in case the application requires the score. @driesvints What's your opinion?
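
To make the index concern above concrete, here are the two query shapes side by side. This is a minimal sketch in plain PHP, not Laravel's grammar code: the helper names matchAgainst(), havingOnlyQuery() and whereAndOrderQuery() and the users table are hypothetical, and the generated SQL assumes MySQL syntax.

```php
<?php

/**
 * Build a MySQL MATCH ... AGAINST expression with one "?" placeholder.
 * Hypothetical helper for illustration only; not part of Laravel.
 */
function matchAgainst(array $columns, string $mode = 'NATURAL LANGUAGE'): string
{
    $quoted = array_map(fn (string $c): string => "`{$c}`", $columns);

    return sprintf('MATCH (%s) AGAINST (? IN %s MODE)', implode(', ', $quoted), $mode);
}

/** Score in SELECT, filtered via HAVING: the score is computed for every row. */
function havingOnlyQuery(string $table, array $columns): string
{
    $match = matchAgainst($columns);

    return "SELECT *, {$match} AS relevance FROM `{$table}` "
        . 'HAVING relevance > 0 ORDER BY relevance DESC';
}

/**
 * Filter in WHERE so the full-text index can narrow the rows, then sort by rank.
 * The search term has to be bound twice, once per placeholder.
 */
function whereAndOrderQuery(string $table, array $columns): string
{
    $match = matchAgainst($columns);

    return "SELECT * FROM `{$table}` WHERE {$match} ORDER BY {$match} DESC";
}

echo havingOnlyQuery('users', ['bio', 'resume']), PHP_EOL;
echo whereAndOrderQuery('users', ['bio', 'resume']), PHP_EOL;
```

Running EXPLAIN on both generated statements against a table with a fulltext index is a straightforward way to check which one actually uses the index.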

@driesvints
Member

@tpetry my knowledge about full text search doesn't extend that far. I suggest you both work on this PR as you see fit and mark it as ready when you think it's valid for review.

@jonnott
Contributor Author

jonnott commented Jan 24, 2022

If you don't use the MATCH(...) AGAINST(...) clause in the WHERE part, then the database calculates the score for every row.

Great point @tpetry .. and not a consideration I'd realised. How would you see the orderByFullText() operating - would it add the where condition AND the order by clause?

@jonnott
Contributor Author

jonnott commented Jan 24, 2022

@tpetry I guess my main goal in all this is a way of getting the results ordered by relevance. Having the score value available from the select clause is very much secondary (although I could see some uses for it, e.g. displaying a %-age relevance score on a results page).

I'm sure I read somewhere that if there's no other ordering specified on the query and a fulltext MATCH is used, MySQL will order the result by descending relevance by default, but I can't seem to replicate that in manual query tests on my local db. I'm not sure if it's actually true, or just a rumour from something I Googled.

@jonnott
Contributor Author

jonnott commented Jan 24, 2022

@tpetry Looking at the MySQL docs:

When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first as long as the following conditions are met:

There must be no explicit ORDER BY clause.

The search must be performed using a full-text index scan rather than a table scan.

If the query joins tables, the full-text index scan must be the leftmost non-constant table in the join.

So that confirms that needing the relevance score to actually be available as a result table column is really the only reason for a select-based MATCH..AGAINST.

I'm interested as to what you'd think an orderByFullText() would do that whereFullText() doesn't already, in light of the above..

@driesvints
Member

We've fixed the build on 8.x so please rebase, thanks!

@jonnott jonnott force-pushed the add-fulltext-select branch from 189d76c to 7c4d0e0 Compare January 24, 2022 15:15
@tpetry
Contributor

tpetry commented Jan 25, 2022

If you don't use the MATCH(...) AGAINST(...) clause in the WHERE part, then the database calculates the score for every row.

Great point @tpetry .. and not a consideration I'd realised. How would you see the orderByFullText() operating - would it add the where condition AND the order by clause?

Just the ORDER BY.

@tpetry I guess my main goal in all this is a way of getting the results ordered by relevance. Having the score value available from the select clause is very much secondary (although I could see some uses for it, e.g. displaying a %-age relevance score on a results page).

The score is an implementation detail; it's not really a percentage value. A perfect match can score 1.0 or 0.12. There is no meaning in the absolute value. It depends on many factors: DBMS engine, language, text corpus, search term.

When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first as long as the following conditions are met:

There must be no explicit ORDER BY clause.

The search must be performed using a full-text index scan rather than a table scan.

If the query joins tables, the full-text index scan must be the leftmost non-constant table in the join.

I'm interested as to what you'd think an orderByFullText() would do that whereFullText() doesn't already, in light of the above..

As stated in the documentation, automatic ordering will not happen in some cases for joins. But reducing your question to its basic statement, wouldn't this PR be obsolete, since MySQL does the sorting completely automatically? Additionally, manual sorting may be used together with rank-based sorting.
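
Combining manual sorting with rank-based sorting can already be expressed with existing builder methods. The following is a sketch under stated assumptions: a Laravel version that includes whereFullText() (from #40129), a MySQL connection, a fulltext index on bio and resume, and a hypothetical is_verified column. searchUsers() is an illustrative name, and orderByRaw() stands in for the orderByFullText() method proposed in this thread, which does not exist at this point.

```php
<?php

use App\Models\User;
use Illuminate\Database\Eloquent\Collection;

/**
 * Illustrative helper (hypothetical name): full-text filter plus a mixed sort.
 */
function searchUsers(string $term): Collection
{
    return User::query()
        // WHERE MATCH ... AGAINST: lets MySQL use the fulltext index to filter.
        ->whereFullText(['bio', 'resume'], $term)
        // Manual sort first (is_verified is a hypothetical column) ...
        ->orderBy('is_verified', 'desc')
        // ... then rank-based sort via a raw expression (MySQL syntax).
        ->orderByRaw('MATCH (`bio`, `resume`) AGAINST (? IN NATURAL LANGUAGE MODE) DESC', [$term])
        ->get();
}
```

Because an explicit ORDER BY is present, MySQL's automatic relevance ordering no longer applies, which is why the rank expression is repeated in the sort. The MySQL documentation notes that the optimizer recognises identical MATCH() calls in one query and runs the full-text search only once, so the repetition should not double the work.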

@jonnott
Contributor Author

jonnott commented Jan 25, 2022

@tpetry Thanks, and you're right. Now that I've realised the ordering happens automatically in MySQL, and the score value isn't really useful, this whole PR is probably redundant. I'll close!

Sorry for the time-waste @driesvints :(

@jonnott jonnott closed this Jan 25, 2022
@driesvints
Member

@jonnott don't worry about it 👍

@tpetry
Contributor

tpetry commented Jan 25, 2022

It's useful for PostgreSQL, as PostgreSQL does not do any automatic sorting. I will work on a PR next week.
