Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: merge-logs: change the auto-detection logic to ensure the format is auto-detected separately for each file #73374

Closed
tbg opened this issue Dec 2, 2021 · 7 comments
Labels
A-logging In and around the logging infrastructure. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. no-issue-activity X-stale

Comments

@tbg
Copy link
Member

tbg commented Dec 2, 2021

Describe the problem

cockroach debug merge-logs trips up over cockroach-stderr.log files when they contain a stack trace (or something like that).

To Reproduce

Download this file:
cockroach-stderr.teamcity-3824348-1638343289-69-n7cpu16-geo-0001.ubuntu.2021-12-01T12_33_16Z.014678.log

$ ./cockroach debug merge-logs cockroach-stderr.teamcity-3824348-1638343289-69-n7cpu16-geo-0001.ubuntu.2021-12-01T12_33_16Z.014678.log

ERROR: decoding on line 8: malformed log entry

It also reproduces using this reduced file (which simply truncates lines 9 and above). Even removing the panic line entirely (replacing it by a blank line) causes the issue. Only when line 8 is removed entirely does the problem go away.

I211201 12:33:16.486003 1 util/log/file_sync_buffer.go:238 ⋮ [config]   file created at: 2021/12/01 12:33:16
I211201 12:33:16.486021 1 util/log/file_sync_buffer.go:238 ⋮ [config]   running on machine: ‹teamcity-3824348-1638343289-69-n7cpu16-geo-0001›
I211201 12:33:16.486033 1 util/log/file_sync_buffer.go:238 ⋮ [config]   binary: CockroachDB CCL v22.1.0-alpha.00000000-1667-ga80cfbee82-dirty (x86_64-unknown-linux-gnu, built 2021/12/01 07:20:49, go1.17.3)
I211201 12:33:16.486040 1 util/log/file_sync_buffer.go:238 ⋮ [config]   arguments: [‹./cockroach› ‹start› ‹--insecure› ‹--store› ‹path=/mnt/data1/cockroach,attrs=store1› ‹--log› ‹file-defaults: {dir: 'logs', exit-on-error: false}› ‹--cache=25%› ‹--max-sql-memory=25%› ‹--listen-addr=:26257› ‹--http-addr=:26258› ‹--advertise-addr=10.128.15.205:26257› ‹--locality=cloud=gce,region=us-central1,zone=us-central1-f› ‹--join=35.232.66.205:26257›]
I211201 12:33:16.486071 1 util/log/file_sync_buffer.go:238 ⋮ [config]   log format (utf8=✓): crdb-v2
I211201 12:33:16.486076 1 util/log/file_sync_buffer.go:238 ⋮ [config]   line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid [chan@]file:line redactionmark \[tags\] [counter] msg
I211201 12:33:16.485759 1 util/log/flags.go:202  [-] 1  stderr capture started
panic: child not present in parent [recovered]

The problem does not occur with --format=crdb-v1.

Environment:

  • CockroachDB version: master (log was created using same SHA)

cc @cameronnunez as the resident merge-logs expert.

gz#10792

Jira issue: CRDB-11562

@tbg tbg added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Dec 2, 2021
@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Dec 2, 2021
@dhartunian
Copy link
Collaborator

I think this is related to #66684

@dhartunian
Copy link
Collaborator

This issue appears to also cause problems w/ the logs HTTP endpoint which likely requires deserialization of the JSON logs.

@tbg
Copy link
Member Author

tbg commented Dec 15, 2021

The file I provided doesn't use JSON encoding though? There might be more than one problem, or there's a connection I am not seeing.

@cameronnunez
Copy link
Contributor

cameronnunez commented Dec 15, 2021

Looks like this panic is being improperly encoded for some reason – debug merge-logs should complain here. The panic entry should have a prefix like the other log entries. I'll take a look into it. Regarding the other issue, it is a different one for which there needs to be a separate conversation.

@knz
Copy link
Contributor

knz commented Dec 16, 2021

We just discussed the issue with Cameron.

  • The very root cause of the issue is that a crdb log directory can contain files written in different log formats, and the merge-log utility mistakenly tries to use the same format (auto-detected from the 1st file) to decode all the files.

    The solution to that proximate cause is to ensure the format is auto-detected for each file separately. If we were to do this, we'd auto-detect that the cockroach-stderr.log file has a different format (crdb-v1) than the others and read it with the proper decoder.

  • Cameron then points out that if we want to handle directories containing multiple log formats, it's dubious what the semantics of a single --format flag in the merge-logs utility should be.

    Indeed, if the user specifies --format=X but one of the log files is in format Y, what should we do? Erroring out is a bit disruptive, but silently skipping over the files is disruptive too.

    I (knz) personally think the error should be the better solution. This is because:

    • in the majority of cases, auto-detection (per file) should be sufficient.
    • when a user needs --format it's because they're working on log file snippets obtained by non-standard means, which is a less usual case; in that case IMHO it's OK to assume all the snippets are in the same format.
  • I'm personally not excited about creating a feature that requires (or allows) the user to specify the log format separately for each input file. The UX would be cumbersome.

  • Separately, Cameron reminds us we still don't have a good way to specify the output format of merge-logs. That's true but should be a topic for another issue.

@knz
Copy link
Contributor

knz commented Dec 16, 2021

So the action item here is to change the auto-detection logic to ensure the format is auto-detected separately for each file.

then verify that the auto-detection determines crdb-v1 for the cockroach-stderr.log files.

@cameronnunez cameronnunez changed the title cli: merge-logs: panic in stderr logs causes failure to parse cli: merge-logs: change the auto-detection logic to ensure the format is auto-detected separately for each file Feb 17, 2022
@knz knz added A-logging In and around the logging infrastructure. and removed T-server-and-security DB Server & Security labels Jun 16, 2022
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-logging In and around the logging infrastructure. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. no-issue-activity X-stale
Projects
None yet
Development

No branches or pull requests

4 participants