Clean up orphaned real-time records after reindexing. #1192
This PR adds a new internal attribute, `sphinx_updated_at`, to real-time indices, and sets it to the current timestamp whenever a record is inserted or updated. Then, when a reindex happens, any record that has not changed since the reindexing began is removed, as it's not in the known dataset. Most likely it was deleted but the callbacks failed for some reason.

The advantage of this is that running `ts:index` will remove any records that are no longer indexed, keeping index sizes to what they should be and avoiding the bloat that can crop up when records haven't been removed (bulk deletions, for example). This is done without the need to delete all records first (the behaviour of `ts:rebuild`).

This is currently only enabled when `real_time_tidy` is set to true for the appropriate environment in `config/thinking_sphinx.yml`. I see it defaulting to true in the near future, but that might not be until v6.0.
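To illustrate the tidy step, here is a minimal in-memory sketch of the idea rather than the actual implementation in this PR. `FakeRealTimeIndex`, `upsert`, `tidy` and `reindex` are hypothetical names invented for the example, and integer timestamps stand in for real clock values; only `sphinx_updated_at` comes from the change itself.

```ruby
# Hypothetical stand-in for a real-time index. This is NOT Thinking Sphinx's
# API; it only models the timestamp bookkeeping described above.
class FakeRealTimeIndex
  attr_reader :documents

  def initialize
    @documents = {} # id => { sphinx_updated_at: Integer }
  end

  # Insert or replace a document, stamping it with the given time.
  def upsert(id, now)
    @documents[id] = { sphinx_updated_at: now }
  end

  # Remove every document that has not been touched since the reindex began.
  def tidy(started_at)
    @documents.delete_if { |_id, doc| doc[:sphinx_updated_at] < started_at }
  end
end

# Re-insert every record that still exists, then drop whatever was not touched.
def reindex(index, live_ids, started_at)
  live_ids.each { |id| index.upsert(id, started_at) }
  index.tidy(started_at)
  index.documents.keys.sort
end

# Records 1, 2 and 3 were indexed at time 100. Record 2 has since been
# deleted from the database without its callbacks firing.
index = FakeRealTimeIndex.new
[1, 2, 3].each { |id| index.upsert(id, 100) }

# A reindex starting at time 200 only sees records 1 and 3.
remaining = reindex(index, [1, 3], 200)
```

In this sketch the orphaned record 2 keeps its old timestamp, so it is the only one removed, and the index never has to be emptied first.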