Better support for sharding in lifecycle callbacks #1166
Comments
Hi @lunaru, this is a great point, and you're right that the existing solutions aren't ideal. I was thinking about whether a custom IndexSet class could help, or a subclass of If there are ideas for PRs, please do share, because I'd love your experience (especially given I'm not using a sharding approach for any of my apps).

I'm currently wondering about changing IndexSet filtering to accept an instance (much like it accepts a model class, see line 46 of that delta callback file), which would then allow picking and choosing specific shards based on that instance. Other ideas are welcome too!

Also: I guess another way is to use real-time indices rather than deltas. Then you've got a touch more control over the callbacks in the model, and can invoke them a little more specifically. But that is a significant change if you're comfortable with SQL-backed indices. |
@pat thanks for sharing your thoughts. Our idea for solving this is to allow the model to optionally implement Each of the Callback classes would then check If you think this approach is agreeable, we'll take a stab at the PR ASAP. |
Hi @lunaru - sorry for the delay in getting back to you, but I've just merged some changes into the

    class IndexSet < ThinkingSphinx::IndexSet
      private

      def indices
        return super if instances.empty?

        super.select do |index|
          # this block below will return all core and delta indices for the
          # instance's model, filtered by the shard id for the instance.
          instances.any? { |instance| index.name[/_#{instance.shard_id}_$/] }
        end
      end
    end

    # This line would go in an initialiser:
    ThinkingSphinx::Configuration.instance.index_set_class = IndexSet

The |
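For anyone following along, here is a minimal, self-contained sketch of the filtering idea in plain Ruby, so it can be run without Sphinx. Everything in it is a stand-in: `FakeIndex`, `FakeRecord`, `BaseIndexSet` and the `article_<shard>_core` / `article_<shard>_delta` index names are hypothetical and are not part of thinking-sphinx. The regular expression anchors on a `_core` or `_delta` suffix, which is an assumption about naming, so adjust it to match your actual index names.

```ruby
# Stand-ins for illustration only; these are NOT the thinking-sphinx classes.
FakeIndex  = Struct.new(:name)
FakeRecord = Struct.new(:shard_id)

# Plays the role of the parent index set: by default it exposes every index.
class BaseIndexSet
  def initialize(indices, instances = [])
    @indices   = indices
    @instances = instances
  end

  # Public view of whichever indices the (possibly overridden) filter keeps.
  def names
    indices.map(&:name)
  end

  private

  attr_reader :instances

  def indices
    @indices
  end
end

# Narrows the indices to the shard(s) of the given instances.
class ShardedIndexSet < BaseIndexSet
  private

  def indices
    return super if instances.empty?

    super.select do |index|
      # Assumed naming convention: <model>_<shard id>_<core|delta>
      instances.any? do |instance|
        index.name.match?(/_#{instance.shard_id}_(core|delta)\z/)
      end
    end
  end
end

# Three hypothetical shards, each with a core and a delta index.
INDICES = (0..2).flat_map do |shard|
  %w[core delta].map { |suffix| FakeIndex.new("article_#{shard}_#{suffix}") }
end

record = FakeRecord.new(1)

puts BaseIndexSet.new(INDICES, [record]).names.size    # every index is touched
puts ShardedIndexSet.new(INDICES, [record]).names.inspect # only the record's shard
```

With the unfiltered set, a change to one record touches the indices of every shard, which is the fan-out described in the original report. With the filtered set, only the indices for that record's shard are selected.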
@pat awesome, we’ll give this a try this week (hopefully Monday) and let you know. I take it that your refactor is already live on the |
@pat Good news: It looks like this works brilliantly. The only difference with the
Specifically |
I'll go ahead and close this issue and await the next release version |
Thanks for the confirmation, and the note about the initialiser. I'll try to get the new gem release out soon! |
I know I've already noted this in #1173, but wanted also to mention that the newly released v5.0 includes the change discussed here with |
@pat I wanted to create this issue to start this discussion to make sure we've exhausted all of the documented solutions.
Right now, we have a model with enough records that a single index is not practical, so instead, we have something like this:
However, this has the problem that when a record is modified, it ends up creating N delta jobs, one for each index. If N is large enough, this becomes a problem.
It looks like the offending code is here:
thinking-sphinx/lib/thinking_sphinx/active_record/callbacks/delta_callbacks.rb
Line 16 in 1adcf81
So the question is: Is there a better way to do sharding that avoids this landmine? Or conversely, if this is the best way to shard, then would a PR to help solve this problem be welcome?