sql: bulk index backfill getting stuck #34212
This problem is occurring because the test uses a small chunk size (100), thereby triggering a large number of SSTs (40) all written at once. We know that RocksDB locks up when this happens. Increasing the chunk size fixes the problem. Leaving this issue open as the main issue tracking throttling of SST writes. |
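Back-of-the-envelope on those numbers (a sketch only; the row count is inferred from the 40-SST figure above, not taken from the test):

package main

import "fmt"

func main() {
	// Illustrative arithmetic only: the backfill emits roughly one SST per
	// chunk, so a small chunk size multiplies the number of SSTs that hit
	// RocksDB at once. The row count is back-derived from the numbers above.
	const rowsToBackfill = 4000 // implied by 40 SSTs x 100 rows per chunk
	const chunkSize = 100       // the test's setting
	fmt.Println(rowsToBackfill/chunkSize, "SSTs written at nearly the same time")
}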
I used […] so that […]. I tried to […] (I'm on latest […]). Anyway, I tried the next best thing: […] Dump attached. It's funny, the dump shows the […] |
Oh, and I also had to put […] |
Maybe we can blame this on Rocks somehow? From the aborted dump: […]
Mind you, this is far from the only goroutine stuck (for 7min), but I think most of the others are waiting for a stopper. I'm now also recalling that on my gceworker I was hitting the 10s fatal error on in-memory stores -- which doesn't sound right -- so it's fairly likely that Rocks is doing something bad at this point. I'll get a core dump. |
BTW, we just need one process. It's really happening all the time. |
$ dlv attach 31579 /tmp/go-build848480181/b001/sql.test
Boo. go-delve/delve#1409. |
Let's try […] and […]. Hooray! Now let's see. |
Hmm, 1833 is probably not the thread, that's the PID... yeah, it's giving me that thread for every goroutine. Well. Maybe better if I take a dump and terminate this thing with […]. Fast forward (I found the thread by picking one that wasn't in […]): […]
There are ~10 more Rocks threads, but they all look like threadpools and background tasks waiting to be pinged. PS I have the core dump (1.3g) and sql.test file around if anyone wants to take a look. |
This does look worrying. The top goroutine is the ingestion, and it's blocked on flushing the memtable because that flush would stall writes due to an overly large L0. That would resolve if compactions took care of L0, but the compaction thread (or one of them, at least) is stuck waiting for IngestExternalFile 💀 🔒 |
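A minimal Go sketch of that wait cycle, purely as a model of the dependency described above (the channel names are made up; this is not RocksDB code):

package main

// Model of the cycle: ingestion needs a memtable flush, the flush is stalled
// until L0 shrinks, L0 only shrinks if compaction runs, and the compaction
// thread is waiting for the ingestion to finish.
func main() {
	ingestDone := make(chan struct{})  // closed when IngestExternalFile returns
	l0Compacted := make(chan struct{}) // closed when compaction has shrunk L0

	// "Ingestion": its memtable flush is stalled behind the overly large L0,
	// so it cannot finish until the compactor makes progress.
	go func() {
		<-l0Compacted
		close(ingestDone) // never reached
	}()

	// "Compaction": waiting for the in-flight ingestion before touching L0.
	go func() {
		<-ingestDone
		close(l0Compacted) // never reached
	}()

	// Running this aborts with "all goroutines are asleep - deadlock!".
	<-ingestDone
}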
cc @petermattis. This deadlock looks.. very obvious. Am I missing something or are we just looking at an obvious bug? |
I haven't looked closely at this and obviously wouldn't say a lurking deadlock is OK, but for unrelated reasons (i.e. foreground traffic p99) we should probably try the no-flush ingest -> non-blocking memtable flush fallback trick anyway, and that might also mean we end up just avoiding this situation? I was going to take a stab at that this week anyway — happy to try it today to see if it is indeed a workaround. |
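Roughly the shape of that fallback, as a hedged sketch: ingestNoFlush, triggerAsyncFlush, and errFlushNeeded are hypothetical stand-ins, not the real engine API. The point is that the flush runs outside the ingestion path, so compactions and foreground writes aren't held up while it happens.

package main

import (
	"errors"
	"fmt"
)

// errFlushNeeded stands in for "the SST overlaps the memtable, so ingestion
// would have to flush first".
var errFlushNeeded = errors.New("memtable flush needed")

var memtableOverlapsSST = true // pretend the memtable holds overlapping keys

// ingestNoFlush is a hypothetical engine call: ingest the SST but refuse to
// block on a memtable flush.
func ingestNoFlush(path string) error {
	if memtableOverlapsSST {
		return errFlushNeeded
	}
	return nil
}

// triggerAsyncFlush is a hypothetical non-blocking memtable flush; it runs
// outside the ingestion path, so background compactions keep making progress.
func triggerAsyncFlush() <-chan struct{} {
	done := make(chan struct{})
	go func() {
		memtableOverlapsSST = false // the flush empties the memtable
		close(done)
	}()
	return done
}

// addSSTable tries the cheap no-flush ingest first and only falls back to a
// background flush when the ingestion actually needs one.
func addSSTable(path string) error {
	err := ingestNoFlush(path)
	if err == nil || !errors.Is(err, errFlushNeeded) {
		return err
	}
	flushed := triggerAsyncFlush()
	<-flushed // a real implementation could retry with backoff instead
	return ingestNoFlush(path)
}

func main() {
	fmt.Println(addSSTable("/tmp/example.sst")) // <nil>
}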
@awoods187 observed a seemingly similar situation doing a restore of TPCC data onto a 30-node cluster. The node was most likely running […]. I've attached a goroutine trace. Unfortunately I failed to grab cores because I didn't point gcore at a larger storage device, and when gcore fails to write the core dump it tears down the process. |
@vivekmenezes I took a stab at the background flush attempt in #34800 if you want to try again with that |
Another wrinkle here is that it doesn't seem like Andy had enabled […] |
FWIW, that setting wouldn't affect a RESTORE since it only controls the index backfill. However, RESTORE does call AddSSTable; it just typically calls it with SSTs that overlap nothing, so I wouldn't expect it to be forced to flush most of the time. |
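In other words, whether an ingestion is forced to flush boils down to a key-range overlap check against what's sitting in the memtable. A tiny sketch with made-up types:

package main

import "fmt"

// span is a half-open [start, end) key range.
type span struct{ start, end string }

// needsFlush mirrors the check being discussed: an ingested SST only forces
// a memtable flush when its key range overlaps keys buffered in the memtable.
func needsFlush(sst, memtable span) bool {
	return sst.start < memtable.end && memtable.start < sst.end
}

func main() {
	memtable := span{"c", "f"} // keys recently written and not yet flushed
	// RESTORE-style SST into a disjoint key space: no flush forced.
	fmt.Println(needsFlush(span{"x", "z"}, memtable)) // false
	// An SST overlapping keys already in the memtable: flush forced.
	fmt.Println(needsFlush(span{"a", "d"}, memtable)) // true
}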
@ajwerner my expectation is that they're the same still, the setting just makes it much more likely for sstable ingestion to block on a compaction. |
Definitely looks like a deadlock in RocksDB. Also looks like this has already been fixed upstream: facebook/rocksdb@9be3e6b

Our version of the code looks like:

// Figure out if we need to flush the memtable first
if (status.ok()) {
  bool need_flush = false;
  status = ingestion_job.NeedsFlush(&need_flush, cfd->GetSuperVersion());
  TEST_SYNC_POINT_CALLBACK("DBImpl::IngestExternalFile:NeedFlush",
                           &need_flush);
  if (status.ok() && need_flush) {
    mutex_.Unlock();
    status = FlushMemTable(cfd, FlushOptions(),
                           FlushReason::kExternalFileIngestion,
                           true /* writes_stopped */);
    mutex_.Lock();
  }
}

Upstream master looks like:

// Figure out if we need to flush the memtable first
if (status.ok()) {
  bool need_flush = false;
  status = ingestion_job.NeedsFlush(&need_flush, cfd->GetSuperVersion());
  TEST_SYNC_POINT_CALLBACK("DBImpl::IngestExternalFile:NeedFlush",
                           &need_flush);
  if (status.ok() && need_flush) {
    FlushOptions flush_opts;
    flush_opts.allow_write_stall = true;
    if (immutable_db_options_.atomic_flush) {
      autovector<ColumnFamilyData*> cfds;
      SelectColumnFamiliesForAtomicFlush(&cfds);
      mutex_.Unlock();
      status = AtomicFlushMemTables(cfds, flush_opts,
                                    FlushReason::kExternalFileIngestion,
                                    true /* writes_stopped */);
    } else {
      mutex_.Unlock();
      status = FlushMemTable(cfd, flush_opts,
                             FlushReason::kExternalFileIngestion,
                             true /* writes_stopped */);
    }
    mutex_.Lock();
  }
}

The atomic flush stuff can be ignored. The important part is allow_write_stall = true: with it, the flush proceeds even when L0 is already large enough to stall writes, instead of waiting inside the ingestion for compactions that can't run. Let me see about backporting this change to our RocksDB branch. |
Fixes cockroachdb#34212

Release note (bug fix): Fix a deadlock that could occur during IMPORT and RESTORE, causing all writes on a node to be stopped.
34818: sql/pgwire: use OIDs when encoding some datum types r=mjibson a=mjibson

Teach pgwire how to encode int and float with the various widths. Do this by saving the oids during SetColumns. Teach cmd/generate-binary to also record oids and use those in tests. Fix varbits to expect the same OID as postgres produces. Sadly, our int2 and int4 types don't yet propagate all the way down and so we still encode them as an int8. This commit is a precursor to supporting that.

Release note: None

34830: c-deps: bump RocksDB to pick up ingest-external-file deadlock r=tbg a=petermattis

Fixes #34212

Release note (bug fix): Fix a deadlock that could occur during IMPORT and RESTORE, causing all writes on a node to be stopped.

Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
confirmed that this is indeed fixed! |
👍 btw @vivekmenezes if you're seeing L0 fill up like that in this test it's a good indication that doing real-world workloads with it is going to potentially really mess up RocksDB. Just a heads up! |
The compactor kicks off at 2 L0 files, so this deadlock was easy to hit long before L0 "fills up" (which I'd say is at 20 files, since that's when the write slowdown kicks in). |
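For reference, a rough model of the thresholds being cited (numbers taken from this comment, not an actual RocksDB options binding):

package main

import "fmt"

// Thresholds as described above: compaction is wanted at 2 L0 files, and the
// write slowdown ("full" for practical purposes) starts at 20.
const (
	l0CompactionTrigger = 2
	l0SlowdownTrigger   = 20
)

// l0State classifies an L0 file count against those thresholds.
func l0State(files int) string {
	switch {
	case files >= l0SlowdownTrigger:
		return "writes slowed"
	case files >= l0CompactionTrigger:
		return "compaction wanted"
	default:
		return "healthy"
	}
}

func main() {
	// The stuck test was ingesting ~40 small SSTs at nearly the same time,
	// far past both thresholds.
	for _, n := range []int{1, 2, 20, 40} {
		fmt.Println(n, "L0 files:", l0State(n))
	}
}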
make stress PKG=./pkg/sql TESTS=TestRaceWithBackfill
gets stuck after patching https://github.com/cockroachdb/cockroach/compare/master...vivekmenezes:schemachanger?expand=1 on master