[8.x] Implement Full-Text Search for MySQL & PostgreSQL #40129

Merged
merged 12 commits into from
Jan 6, 2022

Member

@driesvints driesvints commented Dec 21, 2021

This PR implements natural language full-text search for MySQL and PostgreSQL. The columns searched in the MATCH clause must always be covered by a fulltext index.

Schema::create('articles', function (Blueprint $table) {
    $table->id('id');
    $table->string('title', 200);
    $table->text('body');
    $table->fulltext(['title', 'body']);
});

// Search for "database" in the title and body fulltext index...
$articles = DB::table('articles')->whereFulltext(['title', 'body'], 'database')->get();

// Search for "database" in the title and body fulltext index with boolean mode...
$articles = DB::table('articles')->whereFulltext(['title', 'body'], 'database', ['mode' => 'boolean'])->get();

// Search for "database" in the title and body fulltext index with an expanded query...
$articles = DB::table('articles')->whereFulltext(['title', 'body'], 'database', ['expanded' => true])->get();

More info:

Thanks to @tpetry for also implementing PostgreSQL support! #40229

@driesvints driesvints changed the title Implement Full-Text Searches for MySQL [8.x] Implement Full-Text Searches for MySQL Dec 21, 2021
@driesvints driesvints marked this pull request as ready for review December 21, 2021 20:10
@Wulfheart

Might also be related for SQL Server: https://docs.microsoft.com/en-us/sql/relational-databases/search/full-text-search

@deleugpn
Contributor

I worked on 3 database migrations (EC2 to RDS, RDS to Aurora and Aurora 5.6 to 5.7) on databases ranging from 100GB to 350GB and all 3 migrations were a nightmare because of MySQL fulltext indexes. When using services like AWS DMS to copy data over without downtime, large tables with fulltext indexes easily crash. MySQL 5.7 is very buggy with fulltext indexes and can lead to a lot of deadlocks and semaphore waits that crash after 600 seconds. Even a table with only 2 columns (id and comment) where the comment column is fulltext indexed led to database crashes (Aurora MySQL 5.7 compatible). I don't know how 8.0 behaves now, but I would avoid MySQL fulltext indexes like the plague.

@ostark

ostark commented Dec 21, 2021

@driesvints
Nice addition

@deleugpn
In my experience, MySQL fulltext search is good enough for mid-size datasets, like ~1M rows.

@GrahamCampbell
Member

GrahamCampbell commented Dec 21, 2021

RDS Aurora supports adding indexes only to read-replicas. You should not experience locking, going that route. :P

@tpetry
Contributor

tpetry commented Dec 22, 2021

As promised on Twitter, I would take a look at the implementation for porting it to PostgreSQL. The implementation looks solid but is very MySQL-specific (as expected). For introducing PostgreSQL support, I would have to add entirely new functions, as the current ones are exactly designed to work with MySQL.

I had already been working on a common implementation, but that's not simple. PostgreSQL's fulltext engine is wholly different from the MySQL one:

  • You need to specify a language for fulltext indexes and searches because stemming differs for every language (a common ruleset like MySQL's will not work correctly for any non-English language)
  • There is no boolean or natural language mode because you are given all the options for how matching is done rather than just two modes
    • You can build very complex matching rules by combining text search operators
    • Some functions exist for common search behavior that build text search operators from a string
      • The plainto_tsquery function will transform a search string by and-ing the search terms: plainto_tsquery('english', 'The Fat Rats door') --> must match 'fat' & 'rat' & 'door'
      • The phraseto_tsquery function will transform a search string by needing the phrases to appear in the exact order one after another: phraseto_tsquery('english', 'The Fat Rats door') --> must match 'fat' <-> 'rat' <-> 'door'
      • The websearch_to_tsquery function is mostly like MySQL's boolean search mode, which allows positive/negative operators in the text. But contrary to MySQL the terms are combined with and instead of or: websearch_to_tsquery('english', 'fat +rat -door') --> must match 'fat' & 'rat' & !'door'

These are the most impactful differences, but they still do not cover all the options PostgreSQL provides. The article Fine Tuning Full Text Search with PostgreSQL 12 shows how many options you have to fine-tune PostgreSQL's fulltext search behavior in contrast to MySQL.


My advice would be to make the following changes so that I could work on a very basic PostgreSQL implementation:

  • Remove the expanded and mode parameters from matchAgainst as they will only make sense for MySQL
    • They could be replaced with a single $options array which every implementation could interpret differently: ['expanded' => boolean, 'mode' => 'boolean' | 'natural'] could work for MySQL, while ['language' => 'english', 'query' => 'websearch_to_tsquery'] and countless more options could work for PostgreSQL, for example. That's the implementation I was working on.
  • Transforming multiple strings to a single one should be done by the database driver as PostgreSQL will need an entirely different one (imploding by space and not comma)

I am aware that supporting all PostgreSQL options for fulltext search is beyond the scope of Laravel. With the changes I suggested, I could add them one by one to tpetry/laravel-postgresql-enhanced.
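
To make the idea of a driver-specific $options array more concrete, here is a minimal sketch of how two grammars might interpret the same array differently. The function names `compileMysqlFulltext` and `compilePostgresFulltext` are hypothetical, and the PostgreSQL mode names `plain`, `phrase` and `websearch` are assumptions made for this sketch; it is not the code from this PR.

```php
<?php

// Hypothetical helpers for illustration; these are not Laravel's grammar classes.

function compileMysqlFulltext(array $columns, array $options = []): string
{
    // MySQL takes all indexed columns in a single comma-separated MATCH list.
    $list = implode(', ', array_map(fn (string $c): string => "`{$c}`", $columns));

    $boolean = ($options['mode'] ?? 'natural') === 'boolean';
    $mode = $boolean ? ' in boolean mode' : ' in natural language mode';

    // Query expansion is only valid together with natural language mode.
    $expanded = (! $boolean && ($options['expanded'] ?? false)) ? ' with query expansion' : '';

    return "match ({$list}) against (?{$mode}{$expanded})";
}

function compilePostgresFulltext(array $columns, array $options = []): string
{
    $language = $options['language'] ?? 'english';

    // The language is written into the SQL text here, so only accept simple names.
    if (! preg_match('/^[a-z_]+$/', $language)) {
        throw new InvalidArgumentException("Unsupported language name: {$language}");
    }

    // Assumed mode names mapped to the PostgreSQL query functions described above.
    $functions = [
        'plain' => 'plainto_tsquery',
        'phrase' => 'phraseto_tsquery',
        'websearch' => 'websearch_to_tsquery',
    ];
    $function = $functions[$options['mode'] ?? 'plain'] ?? 'plainto_tsquery';

    // PostgreSQL builds one tsvector per column and concatenates them with ||.
    $vectors = implode(' || ', array_map(
        fn (string $c): string => "to_tsvector('{$language}', \"{$c}\")",
        $columns
    ));

    return "({$vectors}) @@ {$function}('{$language}', ?)";
}

echo compileMysqlFulltext(['title', 'body'], ['mode' => 'boolean']), PHP_EOL;
echo compilePostgresFulltext(['title', 'body'], ['mode' => 'websearch']), PHP_EOL;
```

In both cases the search string stays a bound parameter (`?`); only the options change the surrounding SQL, which is why each driver can own its own interpretation.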

@tpetry
Contributor

tpetry commented Dec 22, 2021

Another question I have: what is the default mode Laravel will use for MySQL fulltext search? It would be best to align the default behavior for MySQL and PostgreSQL so that they behave similarly. The results will not be identical (the search terms are combined with "and" in one engine and "or" in the other).

I am not very experienced with MySQL's fulltext search, and the documentation does not describe exactly how the default mode works. Is the default mode comparable to one of the PostgreSQL string functions I described? I guess the typical Google-style search behaviour, e.g. "laravel database +postgresql", is the default everyone would expect?

@driesvints driesvints marked this pull request as draft December 22, 2021 09:48
@driesvints
Member Author

Placing this in draft while I talk to @tpetry on how to improve this PR.

@driesvints driesvints changed the title [8.x] Implement Full-Text Searches for MySQL [8.x] Implement Full-Text Search for MySQL Dec 22, 2021
@driesvints
Member Author

We simplified the API to whereFulltext so we can implement this on different engines. The extra $options argument can be used for engine-specific features.

@driesvints driesvints changed the title [8.x] Implement Full-Text Search for MySQL [8.x] Implement Full-Text Search for MySQL & PostgreSQL Jan 3, 2022
@driesvints
Member Author

Merged @tpetry's PostgreSQL PR into this one.

@otilor

otilor commented Jan 3, 2022

Hi @driesvints, Redis also supports fulltext search. I'd like to work on that, too. Check out https://redis.com/redis-best-practices/indexing-patterns/full-text-search/

Could we discuss adding it to this PR?

@driesvints
Member Author

driesvints commented Jan 3, 2022

@humaneguy since Redis isn't a direct database engine but a key/value store, I'd rather not mix up things in this PR.

@otilor

otilor commented Jan 3, 2022

@driesvints Ok, great.

@taylorotwell
Member

This is also missing orWhereFulltext.

@taylorotwell taylorotwell marked this pull request as ready for review January 5, 2022 20:38
@driesvints
Member Author

@taylorotwell added orWhereFulltext.

@driesvints
Member Author

I've adjusted the bindings to be back on the query builder. Please note that this now sets $language directly in the query.

A final alternative could be that we check the language option inside the query builder but then it could conflict with other grammars in the future.

@taylorotwell taylorotwell merged commit 26bfb14 into 8.x Jan 6, 2022
@taylorotwell taylorotwell deleted the match-against branch January 6, 2022 15:50
@jonnott
Contributor

jonnott commented Jan 21, 2022

@driesvints @taylorotwell A common querying pattern with fulltext indexes is to use the MATCH .. AGAINST .. AS relevance in the SELECT clause, and then add WHERE relevance > 0.

e.g. as stated within https://www.cloudsavvyit.com/10172/how-to-use-full-text-searches-in-mysql/ ..

"When using MATCH ... AGAINST in a SELECT statement, you don’t need to repeat it in the WHERE clause. You could manually filter the results to include only records with a non-zero relevance score."

SELECT content, MATCH (content) AGAINST ('database engine') AS relevance FROM articles WHERE relevance > 0 ORDER BY relevance DESC

Is there any possibility of somehow making methods to create that kind of query a part of what's built-in to Eloquent?

Otherwise, I think for many use cases where query results need to be ordered by relevance, we'd still end up doing something like:

$query = Post
    ::selectRaw("*,MATCH (words) AGAINST (? in natural language mode) as relevance", [$search_words])
    ->having('relevance', '>', 0)
    ->orderByDesc('relevance');

Something like an addFullTextSelect() method maybe?
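
As a rough sketch of what such a helper might do internally, the function below builds the raw MySQL select expression and its bindings for a relevance column. The name `fulltextRelevanceExpression` and its parameters are hypothetical and not part of Laravel; the result is only meant to be handed to an existing method such as selectRaw().

```php
<?php

/**
 * Hypothetical helper (not part of Laravel): returns a raw MySQL select
 * expression plus its bindings for a full-text relevance column.
 *
 * @return array{0: string, 1: array<int, string>}
 */
function fulltextRelevanceExpression(array $columns, string $search, string $alias = 'relevance'): array
{
    // Identifiers cannot be bound as parameters, so restrict them to safe characters.
    foreach (array_merge($columns, [$alias]) as $identifier) {
        if (! preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $identifier)) {
            throw new InvalidArgumentException("Unsafe identifier: {$identifier}");
        }
    }

    $list = implode(', ', array_map(fn (string $c): string => "`{$c}`", $columns));

    // The search string itself stays a bound parameter.
    return [
        "match ({$list}) against (? in natural language mode) as `{$alias}`",
        [$search],
    ];
}

[$sql, $bindings] = fulltextRelevanceExpression(['words'], 'database engine');

// In a Laravel application this could then be used roughly as:
// Post::select('*')->selectRaw($sql, $bindings)->having('relevance', '>', 0)->orderByDesc('relevance');
echo $sql, PHP_EOL;
```

Keeping the expression and its bindings together means the same pair could also feed an ordering clause, which is where a method like the one proposed here would save the most repetition.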

@driesvints
Member Author

@jonnott I think it's best that you attempt a PR for that.

@tpetry
Contributor

tpetry commented Jan 21, 2022

@jonnott you could start experimenting on an orderByFullText implementation. Keep me in the loop for PostgreSQL specifics.

@jonnott
Contributor

jonnott commented Jan 23, 2022

> @jonnott you could start experimenting on an orderByFullText implementation. Keep me in the loop for PostgreSQL specifics.

@tpetry I have a WIP commit here https://github.com/jonnott/framework/commits/add-fulltext-select which could turn into a draft PR for this. I've added selectFullText() and addSelectFullText() methods. I have no clue about the PostgreSQL side of things, though.

@mshamaseen

Great addition, but what can be sent in the $options argument? I can't find any documentation for that.

@tpetry
Contributor

tpetry commented Sep 28, 2023

Look at the changes of the MySQL grammar. My enhanced PostgreSQL driver also uses the options to add some PG specific stuff.

@mshamaseen

@tpetry Thanks

So:
array{mode:string} for both MySQL and Postgres
array{expanded:bool} for MySQL
array{language:string} for Postgres

It would be great if that could be added to the PHPDoc on the whereFullText method.
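
For illustration, the option keys listed above could be documented with an array-shape annotation along these lines. The function `validateFulltextOptions` is a hypothetical wrapper written for this sketch, not something that exists in Laravel, and the shape only reflects the keys mentioned in this thread.

```php
<?php

/**
 * Hypothetical wrapper showing how the option keys could be documented and checked.
 *
 * @param  array{mode?: string, expanded?: bool, language?: string}  $options
 * @return array{mode?: string, expanded?: bool, language?: string}
 *
 * @throws InvalidArgumentException if an unknown key or a wrongly typed value is passed
 */
function validateFulltextOptions(array $options): array
{
    // Each known key maps to the built-in type check its value must pass.
    $expected = ['mode' => 'is_string', 'expanded' => 'is_bool', 'language' => 'is_string'];

    foreach ($options as $key => $value) {
        if (! isset($expected[$key])) {
            throw new InvalidArgumentException("Unknown full-text option: {$key}");
        }

        if (! $expected[$key]($value)) {
            throw new InvalidArgumentException("Invalid value for full-text option: {$key}");
        }
    }

    return $options;
}

$options = validateFulltextOptions(['mode' => 'boolean', 'expanded' => false]);
```

Static analysers that understand array shapes can then flag a misspelled key at the call site, before the query is ever built.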
