-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tuning for Solr to improve indexing latency #529
Conversation
@defect @roymarantz RFR again? |
// TODO(gabe): if this log is indexed (serialized first) before solr is | ||
// updated with the asset document, this will throw! This can happen when | ||
// creating a new asset then immediately performing some attribute sets | ||
// which create logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an issue/ticket to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @evanelias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this could break things if we create a config asset and then immediately perform sets on it? I think jetpants may do that in some cases but can't recall by memory. What exactly would break / what are the implications / is this a new problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evanelias this isn't new; it's been lurking since inception. Implications are that the tag set will fall (can't find asset to modify in solr index) until solr populates. Not cataclysmic, but definitely not a great consistency model to expose
def defaultMaxConnectionsPerHost = getInt("defaultMaxConnectionsPerHost", 100) | ||
def assetBatchUpdateWindow = getInt("assetBatchUpdateWindowMs", 10) milliseconds | ||
def assetBatchUpdateWindow = getInt("assetBatchUpdateWindowMs", 30) milliseconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where did these magic numbers come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same place the first numbers came from (the ether). its discussed in the description of the PR why these numbers were chosen as a default
@@ -32,8 +32,9 @@ object SolrConfig extends Configurable { | |||
def socketTimeout = getInt("socketTimeout", 1000) | |||
def connectionTimeout = getInt("connectionTimeout", 5000) | |||
def maxTotalConnections = getInt("maxTotalConnections", 100) | |||
def commitWithin = getInt("commitWithinMs", 50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any docs explaining what commitWithinMs is about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, its discussed in bullet 3 in the description of the PR, but also here
fuckingJava.size, | ||
serializer.docType.name.toLowerCase, | ||
SolrConfig.commitWithin)) | ||
// dont explicitly hard commit, let solr figure it out and make docs available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any downsides to doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, because a hard commit isnt actually synchronous because this is happening via actor in background thread
app/collins/solr/SolrUpdater.scala
Outdated
val scheduled = new AtomicBoolean(false) | ||
case object Reindex | ||
|
||
// TODO(gabe): perform batch updating on asset logs as well to prevent as much churn in solr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "asset logs" the same as log: AssetLog
If so is this todo note needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unneeded now
👍 |
@michaeljs1990 wanna give this a once-over before i land? |
also @defect, 👍 ? |
👍 on this. Curious about the jetpants issue mentioned in the summary though but didn't see any open issues about it on that repo. |
@michaeljs1990 if you set @evanelias @roymarantz @defect @bobpattersonjr ill land this in a day unless i hear any vocal objections? |
I just updated the container used in some jetpants integration testing, and wanted to report this made a direct improvement. To improve read after write consistency we've added a read-loop after each write, busy-waiting until we can read back what we just wrote. Even with this fix in place, sometimes we would update the secondary role on an asset from A to B and then search for assets where the secondary role is A, and the asset would be returned. After updating the container, under these circumstances with the RaW busy loop, the asset is no longer returned. Thank you! |
@grahamc glad to hear it! This should be a good first (faux) step towards a truly consistent RW interface :) |
Oh, I cc'ed @evanelias but it seems like @grahamc is already basking in the benefits of this :) |
Fixes #526
This PR has a bunch of fixes for how solr behaves to improve indexing latency for assets after write.
commit()
calls to the Solr server, which reduces the load on solr, given the default 10ms window of batching asset updates (only a handful of assets are reindexed at a time at this interval during mass tag updates).assetBatchUpdateWindowMs
from 10ms to 30ms to catch more assets in a single batch to add to solr.commitWithinMs
tunable (default 50ms) which lets solr decide on the best time to commit updates to the index, instead of us forcing commits (of possibly very small batches of docs). This means that an asset will be available for search inassetBatchUpdateWindowMs+commitWithinMs=80ms default
from update time.These tunables can be twiddled to balance throughput vs search after write latency for new attributes (i.e.
collins.set_attribute!(tag, :foo, :bar)
,collins.find(foo: :bar)
). Using the softAutoCommit feature will improve latency (but not entirely remove it!) when updating assets, then immediately searching for them.@evanelias @bobpattersonjr @defect @roymarantz would love to hear any suggestions! This isnt a perfect fix for the jetpants issue, but its a start.