roachtest: import/tpcc/warehouses=4000/geo failed #71050
Comments
Node 4 was OOM-killed. What's interesting is that prior to being killed there were messages from store_send saying that SST ingestions had been delayed by 35 minutes, and for almost an hour before that the logs show intermittent warnings from KV about unavailable ranges. Something appears to have gone wrong at the KV/storage level, and enough requests queued up that the node was OOM-killed?
This cluster looks sad almost from the get-go: just two minutes into the run, we start seeing replica_write.go's log line about …
Going to look at this one tomorrow morning, moving to GA while I do.
Looking at this with
Memory usage is interesting; it looks like it's blowing up only towards the end. Whatever is going on on n4 is also happening to n6 (though n6 survives). There's also a spike in disk read IOPS. The overload dashboard doesn't show anything I would consider of interest. The import is still going on at that point (all the results are …), so I'm not sure what would've triggered this behavior change. KV is indeed unhappy throughout the import, but I suspect we will see that every time we run this test (?).
Well well well, this is interesting. Thanks to the heap profiler (🎖️ to @knz) we have heap profiles leading right up to the crash, inspected via variations on … (**edit: this is on n6, not n4 - my bad, still interesting**)
We have a burst of profiles at 11:06 and a previous burst at 11:04. They all show the same thing and look basically identical to the last one (11:06:44).
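For context on how such profiles can be produced at all: Go's standard library can dump a heap profile on demand. The sketch below is only the generic runtime/pprof equivalent (with an arbitrary file path), not CockroachDB's own automatic heap profiler.

```go
package sketch

import (
	"os"
	"runtime/pprof"
)

// dumpHeapProfile writes the current heap profile to the given path.
// CockroachDB's built-in heap profiler captures profiles automatically when
// memory usage spikes; this is only the standard-library equivalent, shown
// for illustration.
func dumpHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	// debug=0 writes the binary pprof format, which can be inspected with
	// `go tool pprof <binary> <profile>`.
	return pprof.Lookup("heap").WriteTo(f, 0)
}
```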
Scouring the logs for interesting things happening just after 11:05:30 (right before the spike). This is interesting:
This is very very interesting and obviously relevant:
Uhm, 2493464442 bytes is >2.3GiB? First of all, that's crazy, and second of all, how did this get past the cluster setting …
Ok, I have an idea: cockroach/pkg/kv/kvserver/raft_transport.go, lines 489 to 506 (at 4304289).
This code definitely lets us build batches of any size. What's worse, once it does that, it also doesn't properly release the memory (e.g. line 506 doesn't zero out the slice). That definitely seems pretty bad. I'll whip up a PR.
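To make the suspected failure mode concrete, here is a simplified, hypothetical sketch of the pattern (stand-in types and names, not the actual raft_transport.go code): messages are appended to a reused batch slice with no size cap, and the slice is only truncated, never zeroed, before reuse, so the backing array pins every payload it has ever held.

```go
package sketch

// Simplified stand-ins for the real types; this sketches the pattern only,
// not the code at the raft_transport.go lines linked above.
type raftMessage struct {
	payload []byte // a single message can be tens of MiB
}

type msgBatch struct {
	requests []raftMessage
}

// drainQueueBuggy mirrors the problematic shape described above: every queued
// message is folded into one batch regardless of cumulative size, and after
// sending, the slice is truncated with [:0] but its elements are not cleared.
// The backing array therefore keeps references to every payload it has ever
// held, so none of that memory can be garbage collected.
func drainQueueBuggy(queue <-chan raftMessage, send func(*msgBatch)) {
	b := &msgBatch{}
	for msg := range queue {
		b.requests = append(b.requests, msg) // no cap on total batch size
		if len(queue) == 0 {                 // queue momentarily drained: flush
			send(b)
			b.requests = b.requests[:0] // bug: old elements stay referenced
		}
	}
}
```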
Another few choice quotes from the logs:
34 minutes spent queuing on the semaphore behind this cluster setting: cockroach/pkg/kv/kvserver/store.go, lines 126 to 132 (at 5b38e1e).
Makes you wonder.
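For illustration only, a hypothetical sketch of how a semaphore sized by a cluster setting can end up queueing ingestions for a very long time. The names and shape below are assumptions and do not reproduce the store.go code linked above.

```go
package sketch

import (
	"context"
	"time"
)

// ingestLimiter is a hypothetical concurrency gate; its capacity would come
// from the relevant cluster setting in the real code.
type ingestLimiter struct {
	slots chan struct{}
}

func newIngestLimiter(maxConcurrent int) *ingestLimiter {
	return &ingestLimiter{slots: make(chan struct{}, maxConcurrent)}
}

// acquire blocks until a slot frees up. With a long backlog of SST
// ingestions, a single request can wait here for a long time, which is
// consistent with the "delayed by 34 minutes" log lines quoted above.
func (l *ingestLimiter) acquire(ctx context.Context) (release func(), waited time.Duration, err error) {
	start := time.Now()
	select {
	case l.slots <- struct{}{}:
		return func() { <-l.slots }, time.Since(start), nil
	case <-ctx.Done():
		return nil, time.Since(start), ctx.Err()
	}
}
```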
I was wondering why n6's metrics didn't recover, but now I am realizing it also went belly-up. However, it did so while the debug.zip was being pulled. The last log message is dated 11:06:55, so it really didn't last much longer than n4. Had it managed to handle the large raft command batch, my next question would've been why its RSS didn't come down. But it crashed, so the question is moot; on the recv side, the large allocation would've gone out of scope and been GC'ed.
Standard Go error - we were trying to avoid allocations by recycling a slice but weren't zeroing it out before. The occasional long slice that would reference a ton of memory would then effectively keep that large amount of memory alive forever. Touches cockroachdb#71050. Release note: None
In cockroachdb#71050, we saw evidence of very large (2.3+GiB) Raft messages being sent on the stream, which overwhelmed both the sender and the receiver. Raft messages are batched up before sending and so what must have happened here is that a large number of reasonably-sized messages (up to 64MiB in this case due to the max_size setting) were merged into a giant blob. As of this commit, we apply the max_size chunking on the batching step before sending messages as well. Closes cockroachdb#71050. Release note: None
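As a hedged sketch of what such a fix looks like (reusing the stand-in raftMessage/msgBatch types from the earlier sketch; the 64 MiB constant is an assumption standing in for the setting's value, not the actual patch), the batching loop flushes at a target size and zeroes the slice before reuse:

```go
package sketch

// targetBatchBytes is an assumed value standing in for the relevant max-size
// cluster setting; the real code derives the limit from a setting, which this
// sketch does not reproduce.
const targetBatchBytes = 64 << 20 // 64 MiB

// drainQueueChunked shows the shape of the fix described in the commit
// message above: flush once the accumulated payload size reaches the target,
// so many 64MiB-class messages can no longer be merged into a multi-GiB blob,
// and clear the slice elements after each send so the payloads become
// eligible for garbage collection.
func drainQueueChunked(queue <-chan raftMessage, send func(*msgBatch)) {
	b := &msgBatch{}
	size := 0
	flush := func() {
		if len(b.requests) == 0 {
			return
		}
		send(b)
		for i := range b.requests {
			b.requests[i] = raftMessage{} // drop references so the GC can reclaim payloads
		}
		b.requests = b.requests[:0]
		size = 0
	}
	for msg := range queue {
		b.requests = append(b.requests, msg)
		size += len(msg.payload)
		if size >= targetBatchBytes || len(queue) == 0 {
			flush()
		}
	}
	flush() // flush any remainder once the queue is closed
}
```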
roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-21.2 @ 82e8782453b1cf14460d93b6bf8328a7b2964575:
Same failure on other branches
Haven't investigated the above failure (about to head out on vacation), but it could well be an instance of the same issue, which would then be fixed by #71132 as well. However, a question remains whether we're now more likely than before to create lots of large proposals, and whether that's something that should be looked into.
roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-21.2 @ 3cf3b0ea3b08cb24e4d6b84c6c237b856ce6b411:
Same failure on other branches
71132: kvserver: apply a limit to outgoing raft msg batching r=erikgrinaker a=tbg In #71050, we saw evidence of very large (2.3+GiB) Raft messages being sent on the stream, which overwhelmed both the sender and the receiver. Raft messages are batched up before sending and so what must have happened here is that a large number of reasonably-sized messages (up to 64MiB in this case due to the `max_size` setting) were merged into a giant blob. As of this commit, we apply the `max_size` chunking on the batching step before sending messages as well. Closes #71050. Release note: None Co-authored-by: Tobias Grieger <[email protected]>
This was closed prematurely; the backport is still open.
Backport merged, closing.
roachtest.import/tpcc/warehouses=4000/geo failed with artifacts on release-21.2 @ d1231cff60125b397ccce6c79c9aeea771cdcca4:
Reproduce
See: roachtest README
Same failure on other branches