roachtest: schemachange/index/tpcc/w=100 failed #36024
SHA: https://github.com/cockroachdb/cockroach/commits/b5768aecd39461ab9a54e2e7db059a3fe8b00459 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1191957&tab=buildLog
|
#36011 needs to get into 19.1 |
SHA: https://github.com/cockroachdb/cockroach/commits/d03a34e92d2ee558fb6aedb0709b733a1fab97f4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1207666&tab=buildLog
|
An OOM failure on node 1 |
SHA: https://github.com/cockroachdb/cockroach/commits/1a5eabad4511a3371a6b2809d2bfc29e8aff66a6 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1224702&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/c03c307b1a4100a8a1edd2804bdfaf3903097756 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1229036&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/dd7c697e986fc528da7b12c6c10dcce7f64a486c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1252804&tab=buildLog
|
heap profile on the failure |
@jordanlewis any chance you can take a look at this? It looks like the kv fetcher is using a lot of memory, and I do see a large number of distsql flows. Is this to be expected? |
The most recent failure looks similar to #37108 (comment), where the cluster is already in a bad state in the tpcc ramp period before any schema changes have run. The extremely high latency for the tpcc queries is a symptom:
There are also lots of warnings in the logs about everything being slow: heartbeats, raft processing, etc. |
Also, the heap profile Vivek posted looks extremely similar to the heap profile posted for that same issue, in #37108 (comment). |
36744: storage/rangefeed: use fine-grained locking around Processor, add metrics r=nvanbenschoten a=nvanbenschoten
It is critical for correctness that operations like providing the processor with logical ops or informing the processor of closed timestamp updates be properly synchronized via the raftMu. This ensures that events published on the processors eventC accurately reflect the order of events in the Raft log. However, it is not critical that lifecycle events around starting a Rangefeed processor, stopping a Rangefeed processor, and registering with a Rangefeed processor be synchronized with Raft application. This change exploits this to break up the locking around rangefeed.Processor. Using more fine-grained locking opens up the opportunity to interact with the rangefeed Processor without needing to synchronize with the Range's raft application, which can be very expensive. The next commit uses this improvement to add a new `rangefeed_registrations` metric to `RangeInfo`.

36999: roachtest: add schemachange/bulkingest test r=lucy-zhang a=lucy-zhang
Add a test to index the random `payload` column for the `bulkingest` workload, to test the index backfiller for an index on randomly ordered values relative to the primary key. Release note: None

37169: opt: don't use delegateQuery for ShowZoneConfig r=RaduBerinde a=RaduBerinde
Part of a larger work item to remove `delegateQuery`, as it doesn't work with the optimizer. I could not use the new delegate infrastructure because it doesn't support hiding columns returned by the delegated query. Release note: None

37182: roachtest: provision more cpu for schemachange tpcc tests r=vivekmenezes a=vivekmenezes
related to #36094 #36024 #36321 Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Lucy Zhang <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Vivek Menezes <[email protected]>
SHA: https://github.com/cockroachdb/cockroach/commits/dcd4cc5e37ebbcebbbf0f01670811b176a58bf89 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1288281&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/7b2651400b2003d0a381cba9dbfc0b7bc0dfee00 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1293898&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/923a3b2a6f4a6492883141092280d1041de1381a Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295056&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/cab299a0ef983f8b4ffe5d724e44587d9665d3a3 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295811&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/58c567a325056033b326cb9c4ed9ba490e8956da Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1296592&tab=buildLog
|
Previous two issues addressed by #37701. |
SHA: https://github.com/cockroachdb/cockroach/commits/5f358ed804af05f8c4b404efc4d8a282d8e0916c Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1361643&tab=buildLog
|
Duplicate of #36094 (comment). |
SHA: https://github.com/cockroachdb/cockroach/commits/90841a6559df9d9a4724e1d30490951bbdb811b4 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364443&tab=buildLog
|
SHA: https://github.com/cockroachdb/cockroach/commits/537767ac9daa52b0026bb957d7010e3b88b61071 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364821&tab=buildLog
|
For both of the above. |
SHA: https://github.com/cockroachdb/cockroach/commits/86154ae6ae36e286883d8a6c9a4111966198201d Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1367379&tab=buildLog
|
Fixes cockroachdb#36024. Fixes cockroachdb#36094. 8b5bafb ensured that all transaction state was propagated by DistSender on errors. In doing so, it touched on the fact that DistSender drops all but the first error that it sees. It ensured that even though this was the case, the error metadata from these dropped errors would still be propagated (see `pErr.UpdateTxn(resp.pErr.GetTxn())`). This had an unintended consequence: it became possible for a non-aborting transaction retry error to be updated with an ABORTED transaction proto. This caused confusion in the TxnCoordSender, triggering panics like the ones we see in cockroachdb#36024 and cockroachdb#36094. This change fixes this by being smarter about which errors get dropped when concurrent partial batches each hit an error in DistSender. It does this by prioritizing the most severe errors and merging transaction state into those. In a lot of ways, this is the DistSender equivalent of 574e805, which is why they now share code. Release note: None
38579: kv: prioritize severe errors when merging partial batches in DistSender r=andreimatei a=nvanbenschoten
Fixes #36024. Fixes #36094. 8b5bafb ensured that all transaction state was propagated by `DistSender` on errors. In doing so, it touched on the fact that `DistSender` drops all but the first error that it sees. It ensured that even though this was the case, the error metadata from these dropped errors would still be propagated (see `pErr.UpdateTxn(resp.pErr.GetTxn())`). This had an unintended consequence: it became possible for a non-aborting transaction retry error to be updated with an ABORTED transaction proto. This caused confusion in the `TxnCoordSender`, triggering panics like the ones we see in #36024 and #36094. This change fixes this by being smarter about which errors get dropped when concurrent partial batches each hit an error in `DistSender`. It does this by prioritizing the most severe errors and merging transaction state into those. In a lot of ways, this is the `DistSender` equivalent of 574e805, which is why they now share code.
Co-authored-by: Nathan VanBenschoten <[email protected]>
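For readers skimming the thread, here is a rough, self-contained Go sketch of the idea described in the merge message above: when several partial batches each return an error, keep the most severe one and fold the transaction state from the others into it, rather than keeping whichever error arrived first. This is not the CockroachDB implementation. The types `severity`, `txnState`, and `batchErr`, the function `mergeErrs`, and the particular severity ordering are all hypothetical names and assumptions made for illustration.

```go
package main

import "fmt"

// severity is a hypothetical ranking of error kinds; a higher value is
// treated as more severe. The real ordering in CockroachDB may differ.
type severity int

const (
	retryable severity = iota // transaction could retry
	aborted                   // transaction was aborted
	fatal                     // non-retryable error
)

// txnState is a simplified stand-in for the transaction metadata that an
// error carries back to the coordinator.
type txnState struct {
	epoch     int
	timestamp int64
}

// forward folds another observed state into t, keeping the larger epoch
// and the larger timestamp.
func (t *txnState) forward(o txnState) {
	if o.epoch > t.epoch {
		t.epoch = o.epoch
	}
	if o.timestamp > t.timestamp {
		t.timestamp = o.timestamp
	}
}

// batchErr is a hypothetical error returned by one partial batch.
type batchErr struct {
	sev severity
	msg string
	txn txnState
}

// mergeErrs returns the most severe error in errs, carrying the merged
// transaction state of all errors. On a severity tie the earliest error
// wins. The boolean is false when errs is empty.
func mergeErrs(errs []batchErr) (batchErr, bool) {
	if len(errs) == 0 {
		return batchErr{}, false
	}
	best := errs[0]
	merged := errs[0].txn
	for _, e := range errs[1:] {
		merged.forward(e.txn)
		if e.sev > best.sev {
			best = e
		}
	}
	best.txn = merged
	return best, true
}

func main() {
	errs := []batchErr{
		{sev: retryable, msg: "retry", txn: txnState{epoch: 1, timestamp: 10}},
		{sev: aborted, msg: "aborted", txn: txnState{epoch: 2, timestamp: 5}},
	}
	if e, ok := mergeErrs(errs); ok {
		fmt.Printf("kept %q with txn state %+v\n", e.msg, e.txn)
	}
}
```

The point of the sketch is that the error kind and the transaction state stay consistent: an aborted transaction is reported through the abort error instead of being attached to a milder retry error, which is the mismatch the merge message says confused the `TxnCoordSender`.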
SHA: https://github.com/cockroachdb/cockroach/commits/dfa23c01e4ea39b19ca8b2e5c8a4e7cf9b9445f4
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1189954&tab=buildLog