Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(chunks-inspect): support structured metadata #11506

Merged
merged 2 commits into from
Jul 6, 2024

Conversation

agebhar1
Copy link
Contributor

What this PR does / why we need it:

Support chunk format v4 (aka structured metadata) for chunks-inspect.

Which issue(s) this PR fixes:
Fixes #10767

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@agebhar1 agebhar1 requested a review from a team as a code owner December 18, 2023 09:16
Copy link
Contributor

Trivy scan found the following vulnerabilities:

  • HIGH, Target: docker.io/grafana/loki:main-724a841 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
  • HIGH, Target: docker.io/grafana/loki:main-724a841 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0
    \nTo see more details on these vulnerabilities, and how/where to fix them, please run docker build -t grafana/loki:main-724a841 -f cmd/loki/Dockerfile .
    trivy i grafana/loki:main-724a841 on your branch. If these were not introduced by your PR, please considering fixing them in via a subsequent PR. Thanks!

Comment on lines 129 to 71
c, _ := chunkenc.NewByteChunk(data, 0, 0)
bs := c.Blocks(from, through)
fmt.Println("Blocks: ", len(bs))

pipeline := log.NewNoopPipeline()
for idx, block := range bs {
fmt.Println("Block : ", idx)
fmt.Println("MinTime: ", time.Unix(0, block.MinTime()).UTC())
fmt.Println("MaxTime: ", time.Unix(0, block.MaxTime()).UTC())
fmt.Println("Offset : ", block.Offset())
iter := block.Iterator(context.Background(), pipeline.ForStream(nil))
for iter.Next() {
e := iter.Entry()
fmt.Println(e.Timestamp.UTC(), " ", e.Line)
for _, meta := range e.StructuredMetadata {
fmt.Println("\t", meta.Name, "=", meta.Value)
}
}
}

err := c.Close()
if err != nil {
return nil, err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a reasonable change, but looking at the structure of the rest of the code I think printing of all these details would be done in main.go, it looks to me like process just builds the details of the chunk into something that main.go prints

I haven't been keeping up with structured metadata. If it's separate from a chunk then imo we should either have a separate function to gather and return the structured metadata to main.go or an additional return value in parseLokiChunk for the metadata, perhaps with a parameter to choose whether or not you want to receive info about the structured metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cstyan for your feedback! I will restructure the code and go forward.

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch 2 times, most recently from 79694b3 to 2423fba Compare February 4, 2024 18:25
@pull-request-size pull-request-size bot added size/L and removed size/M labels Feb 4, 2024
@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from 2423fba to d803d44 Compare February 4, 2024 18:30
@agebhar1
Copy link
Contributor Author

agebhar1 commented Feb 4, 2024

Hi @cstyan! I've moved the output to the printFile function. The parseLokiChunk now returns only the chunk data. Is this the right direction?

@cstyan
Copy link
Contributor

cstyan commented Feb 6, 2024

I can do another review sometime this week 👍 for now it looks like there's some linting failures

cmd/chunks-inspect/main.go:30:59: unused-parameter: parameter 'storeBlocks' seems to be unused, consider removing or renaming it as _ (revive)
func printFile(filename string, blockDetails, printLines, storeBlocks bool) {
                                                          ^
cmd/chunks-inspect/time.go:37:36: unnecessary conversion (unconvert)
		v, err := strconv.ParseInt(string(p[0]), 10, 64)
		                                 ^
cmd/chunks-inspect/time.go:44:36: unnecessary conversion (unconvert)
		v, err := strconv.ParseInt(string(p[0]), 10, 64)

@agebhar1
Copy link
Contributor Author

agebhar1 commented Feb 7, 2024

Thanks a lot @cstyan.

I've removed temporarily the option to store the binary (raw) block data. I suggest to add an option to save the original log data to file.

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the diff it seems like there's a lot of code removal that doesn't seem relevant to the change you're wanting to make?

"encoding/binary"
"fmt"
"hash/crc32"
"github.com/grafana/loki/pkg/chunkenc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also have a lint rule that will say that this needs to be at the bottom of the import list with a newline in between it and the stdlib imports.

@cstyan
Copy link
Contributor

cstyan commented Feb 7, 2024

There's still some lint failures

cmd/chunks-inspect/header.go:1: : # github.com/grafana/loki/cmd/chunks-inspect
cmd/chunks-inspect/loki.go:11:19: undefined: chunkenc
cmd/chunks-inspect/loki.go:14:21: undefined: chunkenc
cmd/chunks-inspect/loki.go:45:10: undefined: chunkenc
cmd/chunks-inspect/main.go:73:14: undefined: logql (typecheck)

I think those are viewable if you click details on the checks item here on github?

@agebhar1
Copy link
Contributor Author

agebhar1 commented Feb 8, 2024

I'm sorry @cstyan. Will check and fix it hopefully in a couple of days.

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch 2 times, most recently from 4f6dae9 to 9006fc1 Compare February 10, 2024 18:00
@agebhar1
Copy link
Contributor Author

Hello @cstyan,

I fixed the lint issues. The chunk-inspect tries to use as much as possible from the existing implementation. This should improve it's maintainability for added features like structured metadata. Reusing the implementation comes with some downsides such as the checksum for the blocks are not known anymore and also the raw block data. Instead of saving the block data it's now possible to export the original log data.

I also had a look into reading the header of the chunk with existing implementation but did not found the piece of code yet. I will try with the local storage client.

Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just need to merge the latest main into your branch and then we're good to go

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from 9006fc1 to a471ec9 Compare February 20, 2024 17:30
@agebhar1
Copy link
Contributor Author

Sounds great @cstyan. I re-based the commit to latest main and updated the commit message.

@agebhar1 agebhar1 changed the title [WIP] feat(chunks-inspect): support structured metadata feat(chunks-inspect): support structured metadata Feb 20, 2024
@cstyan
Copy link
Contributor

cstyan commented Feb 21, 2024

sorry @agebhar1 it looks like there was a CI change merged yesterday afternoon NA time so I think we need one more rebase on top of main 🙇

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from a471ec9 to ad92220 Compare February 22, 2024 06:23
@agebhar1
Copy link
Contributor Author

No worries @cstyan. I did the rebase.

@cstyan
Copy link
Contributor

cstyan commented Feb 23, 2024

Thanks for your patience @agebhar1, the build is green but one last thing, I don't think we should be removing the go.sum and go.mod for chunks-inspect. Could you add those back/update the go.mod to use a go 1.20 version?

@agebhar1
Copy link
Contributor Author

Hi @cstyan, if I try to add add the go.mod to the directory cmd/chunks-inspect with

module github.com/grafana/loki/cmd/chunks-inspect

go 1.20

and run go mod tidy I get the following error:

$ go mod tidy
go: finding module for package github.com/grafana/loki/pkg/logql/log
go: finding module for package github.com/golang/snappy
go: finding module for package github.com/grafana/loki/pkg/chunkenc
go: found github.com/golang/snappy in github.com/golang/snappy v0.0.4
go: found github.com/grafana/loki/pkg/chunkenc in github.com/grafana/loki v1.6.1
go: finding module for package github.com/grafana/loki/pkg/logql/log
go: github.com/grafana/loki/cmd/chunks-inspect imports
        github.com/grafana/loki/pkg/logql/log: module github.com/grafana/loki@latest found (v1.6.1), but does not contain package github.com/grafana/loki/pkg/logql/log

All other (loki, logcli, ...) usees the top level module description.

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from ad92220 to dd0bb42 Compare May 16, 2024 17:02
@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from b5fce50 to d65a7c4 Compare June 1, 2024 14:01
@agebhar1
Copy link
Contributor Author

agebhar1 commented Jun 1, 2024

Hi @cstyan! The export of compressed/uncompressed data (-s) is back by copying a bit of the internal code. The output for a v4 chunk contains now the structured metadata per line:

$ ./chunks-inspect -b -l MThjODE0NzYwY2I6MThjODE0NzYwY2M6N2QzYjVmYzU=
Chunks file: MThjODE0NzYwY2I6MThjODE0NzYwY2M6N2QzYjVmYzU=
Metadata length: 257
Data length: 259
UserID: fake
From: 2023-12-19 08:53:07.915000 UTC
Through: 2023-12-19 08:53:07.916000 UTC (1ms)
Labels:
         __name__ = logs
         filename = /mnt/log/structured_metadata.log
         foo = bar
         job = various
         stream = stderr
Format (Version): 4
Encoding: snappy
Found 1 block(s)
Minimum time (from first block): 2023-12-19 08:53:07.915170 UTC
Maximum time (from last block): 2023-12-19 08:53:07.915170 UTC

Block    0: position:       50, original length:    145, minT: 2023-12-19 08:53:07.915170 UTC maxT: 2023-12-19 08:53:07.915170 UTC
2023-12-19 08:53:07.915170 UTC  { "log": "log message\n", "stream": "stderr", "traceID": "0242ac120002", "time": "2019-04-30T02:12:41.8443515Z" }
Structured Metadata:
         traceID = 0242ac120002
Total chunk size of uncompressed data: 140 compressed data: 259 ratio: 0.541

@cstyan
Copy link
Contributor

cstyan commented Jun 5, 2024

@agebhar1 looks like you have some linting failures

I'll let @slim-bean review the chunk parsing changes since he opened that other PR as well.

@agebhar1
Copy link
Contributor Author

agebhar1 commented Jun 6, 2024

@cstyan will fix it

@agebhar1 agebhar1 force-pushed the feature/gh10767-chunks-inspect branch from d65a7c4 to d7ae1b9 Compare June 6, 2024 16:23
@agebhar1
Copy link
Contributor Author

agebhar1 commented Jun 6, 2024

I fixed the lint error @cstyan.

@cstyan
Copy link
Contributor

cstyan commented Jul 5, 2024

sorry, missed this for a while, lets merge and make follow up improvements if needed 👍

@cstyan cstyan closed this Jul 5, 2024
@cstyan cstyan reopened this Jul 5, 2024
@cstyan cstyan merged commit 1834065 into grafana:main Jul 6, 2024
119 checks passed
@agebhar1
Copy link
Contributor Author

agebhar1 commented Jul 6, 2024

That's awesome @cstyan 👍. I'm ready to help for future improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add structured_metadata support to the output of chunks-inspect cli
2 participants