Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Switch back from GIST to GIN #2753

Closed
ara4n opened this issue Jan 4, 2018 · 4 comments
Closed

Switch back from GIST to GIN #2753

ara4n opened this issue Jan 4, 2018 · 4 comments

Comments

@ara4n
Copy link
Member

ara4n commented Jan 4, 2018

Currently we create FTS indexes as GIN in the schema but then convert to GIST as a background job in the main search storage (why isn't this a schema delta script?). We want to try GIN again on matrix.org however, as with faster IO perhaps it will perform better for inserts. If it doesn't, there are some tricks we can try short of upgrading postgres to 9.6 or later

@ara4n
Copy link
Member Author

ara4n commented Jan 4, 2018

Very useful intel about this over at #freenode_postgresql:matrix.org from @RhodiumToad - thanks!

https://riot.im/develop/#/room/#freenode_#postgresql:matrix.org/$15150804542648117JqeFr:matrix.org

then it's worth trying GIN again, but either: turn off fast insert on the index, or set work_mem to a small value around the insert query
the second is less convenient but likely faster
you want work_mem during a GIN index insert to be no more than a few hundred kbytes
the detailed explanation is this: with fast inserts enabled, new values for a GIN index aren't immediately inserted into the index proper but into a "pending list"
when the pending list reaches a threshold size, the whole list is inserted in a batch
the problem is that older pg versions used work_mem for that threshold, which was a very bad idea since work_mem on modern systems is typically very large, leading to long pending lists that must be searched linearly
(and the one insert that hits the threshold has to spend a long time processing the list)
newer pg versions have a separate gin_pending_list_limit setting
also, vacuum processes the pending list, so depending on the workload you can arrange to do that
work_mem can be set per statement or at any level
per-transaction is particularly easy: begin; set local work_mem = ...; do stuff; commit; -- old setting is restored on commit
per-connection it can be set in the connection string or using SET

@ara4n
Copy link
Member Author

ara4n commented Jan 5, 2018

we just did this on matrix.org and it sped up search from 200,000ms to 6.71ms for a rare item (an eth account which appeared in 3 different msgs). so i think we should back out the bg conversion job and restore everyone's synapses to GINs...

@ara4n
Copy link
Member Author

ara4n commented Jan 9, 2018

We're seeing some massive (>1 minute) pauses every N hours with GIN, as predicted above:

2018-01-08 16:44:32.874 GMT [matrix] LOG:  duration: 84165.911 ms  statement: INSERT INTO event_search (event_id, room_id, key, vector, stream_ordering, origin_server_ts) VALUES ('$15154297881052308GgHpJ:matrix.org','!yBVfMfMeyivGcJPbvS:matrix.org','content.body'
2018-01-08 16:44:32.874 GMT [matrix] LOG:  duration: 82618.283 ms  statement: INSERT INTO event_search (event_id, room_id, key, vector, stream_ordering, origin_server_ts) VALUES ('$15154297881052316hpNnS:matrix.org','!gMDwyoyuEhzHNjElAw:matrix.org','content.body'
2018-01-08 16:44:32.874 GMT [matrix] LOG:  duration: 23046.208 ms  statement: INSERT INTO event_search (event_id, room_id, key, vector, stream_ordering, origin_server_ts) VALUES ('$15154297891052324ZUwTS:matrix.org','!pBjZyyNmXwLCBPUAlE:matrix.org','content.body'
2018-01-08 16:44:32.874 GMT [matrix] LOG:  duration: 82105.092 ms  statement: INSERT INTO event_search (event_id, room_id, key, vector, stream_ordering, origin_server_ts) VALUES ('$15154297891052318FwGMB:matrix.org','!PVQRUJMgyQViHjyVgl:matrix.org','content.body'

Looks like we need to set work_mem to a small value at the point of doing the INSER INTO event_search as suggested.

@richvdh
Copy link
Member

richvdh commented Nov 27, 2018

this got fixed by #2769, though as #4219 notes, we are still having problems.

@richvdh richvdh closed this as completed Nov 27, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants