Unexpected crash, no longer able to start #113131

Closed
Kukulkano opened this issue Oct 26, 2023 · 4 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community

Comments

@Kukulkano

Kukulkano commented Oct 26, 2023

Describe the problem

From one moment to the next, the cockroach service stopped and can no longer start. Even a system restart does not bring it back.

To Reproduce

I run it on Kubuntu 22.04 LTS (x64). Sadly, I don't know what I may have done. I was not even in a SQL console window... I use a tool written in Go that uses "github.com/jackc/pgx" to access the database, but I've never seen issues with that.

Additional data / screenshots
In journalctl I only find this, repeated after the usual startup messages. The same appears if I start it manually:

Oct 26 12:18:59 vsdevel cockroach[16504]: *
Oct 26 12:18:59 vsdevel cockroach[16504]: * ERROR: Queued as error ca79310b51184ad4a4137ab9d0cff216
Oct 26 12:18:59 vsdevel cockroach[16504]: *

After a longer search, I found a lot of error messages in ~/cockroach-data/logs/cockroach.log:

cockroach.log

It looks like some strange errors...

Environment:

  • Build Tag: v23.1.11
  • Build Time: 2023/09/27 01:53:43
  • Distribution: CCL
  • Platform: linux amd64 (x86_64-pc-linux-gnu)
  • Go Version: go1.19.10
  • C Compiler: gcc 6.5.0
  • Build Commit ID: 62ad175
  • Build Type: release
  • Enabled Assertions: false
  • Kubuntu 22.04 LTS (x64)

Jira issue: CRDB-32762

@Kukulkano Kukulkano added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 26, 2023
@blathers-crl

blathers-crl bot commented Oct 26, 2023

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I was unable to automatically find someone to ping.

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels Oct 26, 2023
@Kukulkano
Author

Due to lack of time, I was forced to downgrade to CockroachDB v22.2.15.

The older version was not able to start either. Something seemed to be wrong with Pebble. So I think the database got corrupted?

So I was forced to delete ~/cockroach-data/ and rebuild the whole database. Now it is working again. Luckily, it was just a test system (running one instance, no production data).

If you would still like to inspect it, I took a snapshot of the system in the state of this error report, so I can restore a clone if needed...

@yuzefovich
Member

yuzefovich commented Oct 27, 2023

If you squint at the error, you'll see the following:

...
F231026 10:19:31.008409 191 kv/kvserver/store_raft.go:647 ⋮ [T1,n1,s1,r1/1:‹/{Min-System/NodeL…}›,raft] 44 +Wraps: (3) raft closed timestamp regression in cmd: 7d842a04d4457d2d (term: 7, index: 31082); batch state: 1698314902.881773236,0, command: 1698314663.393816002,0, lease: repl=(n1,s1):1 seq=1 start=0,0 exp=1698315132.867573352,0 pro=1698315126.867573352,0, req: <unknown; not leaseholder>, applying at LAI: 279.
F231026 10:19:31.008409 191 kv/kvserver/store_raft.go:647 ⋮ [T1,n1,s1,r1/1:‹/{Min-System/NodeL…}›,raft] 44 +  | Closed timestamp was set by req: <unknown; not leaseholder or not lease request> under lease: <nil>; applied at LAI: 0. Batch idx: 0.
F231026 10:19:31.008409 191 kv/kvserver/store_raft.go:647 ⋮ [T1,n1,s1,r1/1:‹/{Min-System/NodeL…}›,raft] 44 +  | This assertion will fire again on restart; to ignore run with env var COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED=false
...

As described in the error message and in this comment, you should be able to restart CRDB (with some risks) using COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED=false. I'll close this as a dup of #90682; if you need any assistance, please comment on that issue. (Also note that I'm personally not an expert on this part of the system.)
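
Since the original report mentions having to dig through cockroach.log to find the relevant lines, here is a minimal Go sketch of how one might pull that kind of hint out of a log automatically. It is not part of CockroachDB: `findEnvHints` and `hintRe` are hypothetical names, and the pattern is only an assumption based on the wording of the fatal line quoted above ("to ignore run with env var NAME=value"). It also assumes fatal entries begin with `F` at the start of the line, as in the excerpt; journalctl output carries a different prefix and would need adjusting.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// hintRe is an assumed pattern, modelled on the wording of the fatal line
// quoted in this thread: "to ignore run with env var NAME=value".
var hintRe = regexp.MustCompile(`to ignore run with env var ([A-Z0-9_]+)=(\S+)`)

// findEnvHints scans log text line by line and returns each distinct
// NAME=value override suggested by a line that starts with "F"
// (the fatal severity prefix seen in the excerpt above).
func findEnvHints(log string) []string {
	var hints []string
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(log))
	// Log lines can be long; raise the scanner's per-line limit to 1 MiB.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "F") {
			continue // skip non-fatal entries
		}
		m := hintRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		hint := m[1] + "=" + m[2]
		if !seen[hint] {
			seen[hint] = true
			hints = append(hints, hint)
		}
	}
	return hints
}

func main() {
	// The third fatal line from the excerpt above, used as sample input.
	sample := "F231026 10:19:31.008409 191 kv/kvserver/store_raft.go:647 ⋮ [T1,n1,s1,r1/1:‹/{Min-System/NodeL…}›,raft] 44 +  | This assertion will fire again on restart; to ignore run with env var COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED=false\n"
	for _, h := range findEnvHints(sample) {
		fmt.Println(h)
	}
}
```

In practice the input would be the contents of ~/cockroach-data/logs/cockroach.log rather than a string literal. The sketch only reports the hint; whether to actually restart with the assertion disabled is a separate decision, given the risks mentioned above.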

@yuzefovich yuzefovich removed the X-blathers-untriaged blathers was unable to find an owner label Oct 27, 2023
@Kukulkano
Author

Thanks
