engineccl: failure in BenchmarkTimeBoundIterate #110299

yuzefovich · 2023-09-09T02:02:53Z

On a seemingly unrelated PR (here):

Bench //pkg/ccl/storageccl/engineccl:engineccl_test] --- FAIL: BenchmarkTimeBoundIterate/LoadFactor=0.50/TimeBoundIterator
[01:17:40][Bench //pkg/ccl/storageccl/engineccl:engineccl_test]     test_fixtures.go:90: using existing fixture "mvcc_data_vv1000023.1-22_100000_100_10_512" in "/home/roach/.cache/crdb-test-fixtures/mvcc_data_vv1000023.1-22_100000_100_10_512"
[01:17:40][Bench //pkg/ccl/storageccl/engineccl:engineccl_test]     bench_test.go:167: expected at least 50000 keys, but got 49960

Jira issue: CRDB-31361

DrewKimball · 2023-09-11T15:43:40Z

This failed here as well.

DrewKimball · 2023-09-12T07:37:00Z

@jbowens this is flaking pretty frequently. Do you think we'll have a fix soon? I can put up a skip if need be.

rafiss · 2023-09-12T15:06:29Z

One more: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Ci_Tests_Bench/11735137?buildTab=log&focusLine=4201&logView=flowAware&linesState=563.568.3394

jbowens · 2023-09-12T15:57:52Z

Skipping in #110451.

I was unable to reproduce this on my gceworker which had previously constructed fixtures for this test. Once I removed the existing fixtures forcing regeneration, it started failing readily.

110150: cli: fix debug pebble commands on encrypted stores r=RaduBerinde a=RaduBerinde

Currently the debug pebble commands only work correctly on an encrypted store if the encrypted store's path is `cockroach-data` or the store directory is passed using `--store` (in addition to being passed to the pebble subcommand itself). What's worse, knowledge of this subtle fact was lost among team members.

The root cause is that we are trying to resolve encryption options using the server config. The difficulty is that there are a bunch of different commands and there is no unified way to obtain the store directory of interest.

To fix this, we create `autoDecryptFS`. This is a `vfs.FS` implementation which is able to automatically detect encrypted paths and use the correct unencrypted FS. It does this by having a list of known encrypted stores (the ones in the `--enterprise-encryption` flag), and looking for any of these paths as ancestors of any path in an operation. This new implementation replaces `swappableFS` and `absoluteFS`.

We also improve the error message when we try to open an encrypted store without setting up the key correctly.

Fixes: #110121

Release note (bug fix): `cockroach debug pebble` commands now work correctly with encrypted stores which don't use the default `cockroach-data` path without having to also pass `--store`.

110173: sql: optimize persistedsqlstats flush size check r=j82w a=j82w

Problem: The `persistedsqlstats` size check to make sure the table is not 1.5x the max size is done on every flush, which is done on every node every 10 minutes by default. This can cause serialization issues as it is over the entire table. The check is unnecessary most of the time, because it should only fail if the compaction job is failing.

Solution:
1. Reduce the check interval to only be done once an hour by default, and make it configurable.
2. The system table is split into 8 shards. Instead of checking the entire table count, limit it to only one shard. This reduces the scope of the check and reduces the chance of serialization issues.

This was previously reverted because of a flaky test because the size check is only done on a single shard. The tests are updated to increase the limit and the number of statements to make sure every shard has data.

Fixes: #109619

Release note (sql change): The persistedsqlstats table max size check is now done once an hour instead of every 10 minutes. This reduces the risk of serialization errors on the statistics tables.

110264: c2c: add region constraints replication test r=msbutler a=msbutler

This patch adds a test that ensures that a replicating tenant's regional constraints are obeyed in the destination cluster. This test serves as an end to end test of the span config replication work tracked in #106823.

This patch also sets the following source system tenant cluster settings in the c2c e2e framework: kv.rangefeed.closed_timestamp_refresh_interval: 200ms, kv.closed_timestamp.side_transport_interval: 50 ms. CDC e2e tests also set these cluster settings.

Informs #109059

Release note: None

110334: roachtest: ensure c2c/shutdown tests start destination tenant with online node r=stevendanna a=msbutler

An earlier patch #110033 introduced a change that starts the destination tenant from any destination node, but did not consider if that node was shut down. If the driver attempts to connect to the shut down node, the roachtest fails. This patch ensures that the tenant is started on a node that will be online.

Fixes #110317

Release note: None

110364: upgrade: remove buggy TTL repair r=rafiss a=ecwall

Fixes #110363

The TTL descriptor repair in FirstUpgradeFromReleasePrecondition incorrectly removes TTL fields from table descriptors after incorrectly comparing the table descriptor's TTL job schedule ID to a set of job IDs. This change removes the repair until tests are properly added.

Release note (bug fix): Remove buggy TTL descriptor repair. Previously, upgrading from 22.2.X to 23.1.9 incorrectly removed TTL storage params from tables (visible via `SHOW CREATE TABLE <ttl-table>;`) while attempting to repair table descriptors. This resulted in the node that attempts to run the TTL job crashing due to a panic caused by the missing TTL storage params. Clusters currently on 22.2.X should NOT be upgraded to 23.1.9 and should be upgraded to 23.1.10 or later directly.

110431: workflows: stale.yml: update action version r=RaduBerinde a=RaduBerinde

The stale bot closes issues as "completed" instead of "not planned". More recent versions have added a configuration setting for this, and it defaults to "not planned". This commit updates the action to the latest version.

Epic: none

Release note: None

110451: engineccl: skip BenchmarkTimeBoundIterate r=RaduBerinde a=jbowens

This benchmark's assertions have recently become flaky.

Epic: none

Informs: #110299

Release note: none

Co-authored-by: Radu Berinde
Co-authored-by: j82w
Co-authored-by: Michael Butler
Co-authored-by: Evan Wall
Co-authored-by: RaduBerinde
Co-authored-by: Jackson Owens

107966: rfc: system.region_liveness r=JeffSwenson a=JeffSwenson

Create draft RFC for the `system.region_liveness` table.

Informs: CC-24173

Release note: none

110608: engineccl: deflake BenchmarkTimeBoundIterate r=RaduBerinde a=jbowens

This benchmark has always contained a bug. When constructing the data set, each batch is created with keys with MVCC timestamps in the range [t,t+d). Previously, construction of the database state used one-indexing, assigning the first batch a MVCC time window of `[d,d+d]`. The time-bound iteration benchmarks assumed zero-indexing, setting a MVCC incremental iterator time window of `[0,n*d]`. This meant that the iterator might not see the sstable corresponding to the final batch, depending on whether it contained any keys with the lowest possible timestamp for the corresponding batch.

It's still unclear to me why this began to fail with some frequency only now.

Epic: none

Close #110299.

Release note: none

110841: testutils: Use correct context in ExpectErrWithTimeout r=miretskiy a=miretskiy

Use correct context in ExpectErrWithTimeout.

Epic: None

Release note: None

110845: testutils: Remove accidentally added empty package r=miretskiy a=miretskiy

Remove accidentally added empty package

Epic: None

Release note: None

Co-authored-by: Jeff
Co-authored-by: Jackson Owens
Co-authored-by: Yevgeniy Miretskiy
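The off-by-one described in the 110608 commit message can be illustrated with a small model. This is only a sketch under stated assumptions, not the benchmark's actual code: the names `window`, `batchWindow`, `sstableVisible`, and `finalBatchVisible` are hypothetical, timestamps are plain integers, and visibility is modeled as a simple overlap check between an sstable's timestamp span and the iterator's inclusive window `[0, n*d]`.

```go
package main

import "fmt"

// window is a half-open timestamp interval [lo, hi). Hypothetical type for illustration.
type window struct{ lo, hi int }

// batchWindow returns the timestamp window of width d for batch i (0 <= i < n).
// With oneIndexed set, the first batch gets [d, 2d), modeling the buggy fixture
// construction; otherwise it gets [0, d).
func batchWindow(i, d int, oneIndexed bool) window {
	if oneIndexed {
		i++
	}
	return window{lo: i * d, hi: i*d + d}
}

// sstableVisible reports whether an sstable whose keys span timestamps
// [minTS, maxTS] overlaps the iterator's inclusive window [start, end].
func sstableVisible(minTS, maxTS, start, end int) bool {
	return minTS <= end && maxTS >= start
}

// finalBatchVisible reports whether an iterator over [0, n*d] sees the final
// batch when the smallest timestamp actually written in that batch is
// offset above the batch window's lower bound (0 <= offset < d).
func finalBatchVisible(n, d, offset int, oneIndexed bool) bool {
	w := batchWindow(n-1, d, oneIndexed)
	minTS, maxTS := w.lo+offset, w.hi-1
	return sstableVisible(minTS, maxTS, 0, n*d)
}

func main() {
	const n, d = 10, 100
	// One-indexed, final batch happens to contain its lowest possible timestamp.
	fmt.Println(finalBatchVisible(n, d, 0, true))
	// One-indexed, final batch lacks that timestamp: the batch is missed.
	fmt.Println(finalBatchVisible(n, d, 1, true))
	// Zero-indexed: the final batch lies inside [0, n*d] regardless.
	fmt.Println(finalBatchVisible(n, d, 1, false))
}
```

In this model the one-indexed final batch is visible only when it happens to contain a key at exactly `n*d`, which matches the intermittent shortfall in the reported key count.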

yuzefovich added the C-test-failure (Broken test, automatically or manually discovered) and T-storage (Storage Team) labels Sep 9, 2023

rafiss added the branch-master (Failures and bugs on the master branch) and release-blocker (Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked) labels Sep 12, 2023

jbowens added a commit to jbowens/cockroach that referenced this issue Sep 12, 2023

engineccl: skip BenchmarkTimeBoundIterate

b7b111c

This benchmark's assertions have recently become flaky. Epic: none Informs: cockroachdb#110299 Release note: none

jbowens mentioned this issue Sep 12, 2023

engineccl: skip BenchmarkTimeBoundIterate #110451

Merged

yuzefovich added the skipped-test label Sep 13, 2023

jbowens mentioned this issue Sep 13, 2023

engineccl: deflake BenchmarkTimeBoundIterate #110608

Merged

jbowens removed the release-blocker (Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked) label Sep 15, 2023

craig bot closed this as completed in fe07ce6 Sep 18, 2023

jbowens added this to [Deprecated] Storage Jun 4, 2024

jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024
