Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better support for sharding in lifecycle callbacks #1166

Closed
lunaru opened this issue Jun 4, 2020 · 9 comments
Closed

Better support for sharding in lifecycle callbacks #1166

lunaru opened this issue Jun 4, 2020 · 9 comments

Comments

@lunaru
Copy link
Contributor

lunaru commented Jun 4, 2020

@pat I wanted to create this issue to start this discussion to make sure we've exhausted all of the documented solutions.

Right now, we have a model with enough records that a single index is not practical, so instead, we have something like this:

N.times do |i|
  ThinkingSphinx::Index.define :document, name: "document_#{i}", :with => :active_record, :delta => ThinkingSphinx::Deltas::SidekiqDelta do
   
    where "MOD(...) = #{i}"
  end
end

However, this as the problem that when a record is modified, it ends up creating N delta jobs, one for each index. If N is large enough, this becomes a problem.

It looks like the offending code is here:

index.delta_processor.delete index, instance

So the question is: Is there a better way to do sharding that avoids this landmine? Or conversely, if this is the best way to shard, then would a PR to help solve this problem be welcome?

@pat
Copy link
Owner

pat commented Jun 5, 2020

Hi @lunaru

This is a great point, and you're right that the existing solutions aren't ideal. I was thinking about whether a custom IndexSet class could help, or a subclass of ThinkingSphinx::Deltas::SidekiqDelta, but neither is going to provide the ability to filter indices based on the current model instance, just the model.

If there are ideas for PRs, please do share, because I'd love your experience (especially given I'm not using a sharding approach for any of my apps). I'm currently wondering if changing IndexSet filtering to accept an instance (much like it accepts a model class, see line 46 of that delta callback file), and then that would allow picking and choosing specific shards based on that instance. Other ideas are welcome too!

Also: I guess another way is to use real-time indices rather than deltas, and then you've got a touch more control over the callbacks in the model, and can invoke them a little more specifically. But that is a significant change if you're comfortable with SQL-backed indices.

@lunaru
Copy link
Contributor Author

lunaru commented Jun 5, 2020

@pat thanks for sharing your thoughts. The idea that we have to solve this would be to allow the model to optionally implement ts_index_names.

Each of the Callback classes would then check instance.respond_to? :ts_index_names and if it does, it will select the indices against those names to narrow the indices that get update/delete/etc calls.

If you think this approach is agreeable, we'll take a stab at the PR ASAP.

@pat
Copy link
Owner

pat commented Jun 22, 2020

Hi @lunaru - sorry for the delay in getting back to you, but I've just merged some changes into the develop branch that should help solve this issue: I've reworked IndexSet so it accepts specific instances, and that's what the callbacks pass through. Which means that something like the following should help:

class IndexSet < ThinkingSphinx::IndexSet
  private

  def indices
    return super if instances.empty?

    super.select do |index|
      # this block below will return all core and delta indices for the
      # instance's model, filtered by the shard id for the instance.
      instances.any? { |instance| index.name[/_#{instance.shard_id}_$/] }
    end
  end
end

# This line would go in an initialiser:
ThinkingSphinx::Configuration.instance.index_set_class = IndexSet

The shard_id method in the example could be replaced with your MOD logic, and thus returns the integer of the index an instance is tied to. If you'd like to give this a shot and confirm if it works for you, that'd be great! Then I can look into getting a new gem release published.

@lunaru
Copy link
Contributor Author

lunaru commented Jun 22, 2020

@pat awesome, we’ll give this a try this week (hopefully Monday) and let you know. I take it that your refactor is already live on the master branch and we’d just need to pin our gemfile to master to use it?

@pat
Copy link
Owner

pat commented Jun 22, 2020

master is for released gem versions, so it's only in develop for now, but that's considered stable so there shouldn't be any issues using that branch as the source.

@lunaru
Copy link
Contributor Author

lunaru commented Jun 22, 2020

@pat Good news: It looks like this works brilliantly. The only difference with the index_set_class call had to be done differently because it looks like the configuration was getting wiped upon initialization. Here's how we ended up doing it:

class ShardedIndexSet < ThinkingSphinx::IndexSet
  private

  def indices
    return super if instances.empty?

    super.select do |index|
      # this block below will return all core and delta indices for the
      # instance's model, filtered by the shard id for the instance.
      instances.any? { |instance| !instance.respond_to?(:shard_id) || instance.shard_id.blank? || index.name[/_#{instance.shard_id}_/] }
    end
  end
end

# this needs to be set after initializers for the thinking-sphinx gem are run
Rails.application.config.to_prepare do
  ThinkingSphinx::Configuration.instance.index_set_class = ShardedIndexSet
end

Specifically Rails.application.config.to_prepare runs late enough to let us override the default index set class.

@lunaru
Copy link
Contributor Author

lunaru commented Jun 22, 2020

I'll go ahead and close this issue and await the next release version

@lunaru lunaru closed this as completed Jun 22, 2020
@pat
Copy link
Owner

pat commented Jun 23, 2020

Thanks for the confirmation, and the note about the initialiser. I'll try to get the new gem release out soon!

@pat
Copy link
Owner

pat commented Jul 20, 2020

I know I've already noted this in #1173, but wanted also to mention that the newly released v5.0 includes the change discussed here with instances being available in custom IndexSet subclasses :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants