storage: fatal error on windows related to rocksdb sst files while handling sideload event #37427

Closed
mdiazpsl opened this issue May 9, 2019 · 6 comments · Fixed by #41160
Labels
A-kv Anything in KV that doesn't belong in a more specific category. B-os-windows Issues specific to the Windows OS. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

@mdiazpsl

mdiazpsl commented May 9, 2019

Describe the problem

*
* ERROR: [n1,s1,r2180/1:/Table/2144/1/"CO"/"{Pr…-Sa…}] Reported as error ca555c8fe7bd4a4cb8846ee917ef91e3
*
F190509 17:10:28.696099 163 storage/replica_proposal.go:679  [n1,s1,r2180/1:/Table/2144/1/"CO"/"{Pr…-Sa…}] failed to update store after merging range: IO error: Failed to remove dir: C:\cockroach-v2.1.6.windows-6.2-amd64\cockroach-data\auxiliary\sideloading\r0XXXX\r2183: The system cannot find the file specified.
goroutine 163 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000088100, 0xc000088180, 0x54cd400, 0x1b)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1020 +0xdb
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x5736ea0, 0xc000000004, 0x54cd41d, 0x1b, 0x2a7, 0xc009764000, 0x103)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:876 +0x961
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x3c2cb20, 0xc0095ad7a0, 0x4, 0x2, 0x35269d9, 0x2e, 0xc0097163c8, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2df
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x3c2cb20, 0xc0095ad7a0, 0x1, 0x4, 0x35269d9, 0x2e, 0xc0097163c8, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x93
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(0x3c2cb20, 0xc0095ad7a0, 0x35269d9, 0x2e, 0xc0097163c8, 0x1, 0x1)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:182 +0x85
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleReplicatedEvalResult(0xc00586c500, 0x3c2cb20, 0xc0095ad7a0, 0x0, 0x0, 0x0, 0xc009153c80, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:679 +0x10ec
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleEvalResultRaftMuLocked(0xc00586c500, 0x3c2cb20, 0xc0095ad7a0, 0xc006190840, 0x1, 0x0, 0x0, 0xc009153c80, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:842 +0xb1
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).processRaftCommand(0xc00586c500, 0x3c2cb20, 0xc0095ad7a0, 0xc009593348, 0x8, 0x7, 0x15, 0x100000001, 0x1, 0xc, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:2006 +0x8f2
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc00586c500, 0x3c2cb20, 0xc0095ad830, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:816 +0x13e6
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReady(0xc00586c500, 0x3c2cb20, 0xc0095ad830, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:505 +0x13e
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processReady(0xc00026b800, 0x3c2cb20, 0xc0095ad830, 0x884)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3697 +0x12f
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc00041b180, 0x3c2cb20, 0xc00590c150)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:214 +0x25f
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x3c2cb20, 0xc00590c150)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc00585e640, 0xc0007925a0, 0xc00585e630)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:200 +0xe8
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:193 +0xaf

No idea how to reproduce (except with my app); I just run an app that makes lots of queries.

Environment:

  • CockroachDB v2.1.6.windows-6.2-amd64
  • Server OS: Windows 64
  • Client app: JDBC, org.postgresql:postgresql:42.2.5
@jordanlewis jordanlewis added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 9, 2019
@tim-o
Contributor

tim-o commented May 20, 2019

#37315 looks related. We have an independent report on ZD as well, which I'll link. Should we close #37315 as a duplicate?

@tim-o tim-o added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label May 20, 2019
@mdiazpsl
Author

> #37315 looks related. we have an independent report on ZD as well I'll link. Should we close 37315 as duplicate?

I think it is related; I also do a lot of batch inserts in my process.

@mdiazpsl mdiazpsl reopened this May 20, 2019
@mdiazpsl
Author

I reran my app with SQL logging enabled:
cockroach start --insecure --http-addr=localhost:8081 --listen-addr=localhost --vmodule=exec_log=2
Here are the tree.txt file of my cockroach directory and the full SQL log files.
I hope this helps.
tree.txt
cockroach.MDIAZ.PSLCOL_mdiaz.2019-05-24T15_56_19Z.004720.log
cockroach.MDIAZ.PSLCOL_mdiaz.2019-05-24T15_58_59Z.004720.log

@tim-o
Contributor

tim-o commented May 24, 2019

@mdiazpsl to clarify: are you running a single node when this happens?

@mdiazpsl
Author

Yes, this time it was a single node.

@bdarnell bdarnell added the B-os-windows Issues specific to the Windows OS. label Jul 10, 2019
@kenliu kenliu added the A-kv Anything in KV that doesn't belong in a more specific category. label Jul 10, 2019
@kenliu kenliu added this to the 19.2 milestone Jul 10, 2019
@kenliu kenliu added the O-community Originated from the community label Jul 10, 2019
@darinpp
Contributor

darinpp commented Sep 20, 2019

This is currently blocked by #40918. I wasn't able to reproduce this issue before I hit the #40918 bug.

@knz knz changed the title from "fatal error" to "storage: fatal error on windows related to rocksdb sst files while handling sideload event" Sep 20, 2019
darinpp added a commit to darinpp/cockroach that referenced this issue Sep 27, 2019
RocksDB provides different delete folder methods depending on the
environment (Windows, POSIX, etc.). Unfortunately, the implementations
treat any error returned as an I/O error and convert the error
codes to strings. So on the CockroachDB side we can only distinguish a
serious error (disk corruption) from a harmless error
(file or folder doesn't exist) by parsing the error as a string.
Different platforms return different strings, and in the case of Windows
the string wasn't in the list that we used to recognize folder not found.

Fixes cockroachdb#37819
Fixes cockroachdb#37427

Release justification: bug fix for existing functionality

Release note: None
craig bot pushed a commit that referenced this issue Sep 27, 2019
41160: rocksdb: incorrectly identifying not found folder error on windows r=darinpp a=darinpp

Co-authored-by: Darin <[email protected]>
@craig craig bot closed this as completed in 1005192 Sep 27, 2019