[libbeat] Track frame counts in saved segments in the disk queue #22970

faec · 2020-12-07T21:21:45Z

What does this PR do?

This PR adds the plumbing to keep track of how many frames (events) are stored on disk at a given time, by adding a frameCount field to the segment file header and a frameIndex field to the queuePosition structure stored in the queue state. The initial version of the queue only tracked byte counts and positions, which don't convert easily to frame counts.
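The two new fields can be sketched as plain Go structs. This is a minimal illustration, not the PR's actual definitions. Only `frameCount` and `frameIndex` come from the description above; `version`, `byteIndex`, `segmentID` and the helper `remainingFrames` are hypothetical names chosen for the sketch. The helper shows why the pair is useful: once both the per-segment frame count and the reader's frame index are known, the number of unread frames is a simple subtraction. Byte offsets alone would not give you this.

```go
package main

import "fmt"

// segmentID identifies a segment file (hypothetical type for this sketch).
type segmentID uint64

// segmentHeader sketches a segment file header that stores the number of
// frames (events) in the segment next to its schema version.
type segmentHeader struct {
	version    uint32 // hypothetical schema version field
	frameCount uint32 // number of frames stored in the segment
}

// queuePosition sketches a position in the queue that records the frame
// index alongside the byte offset within the segment.
type queuePosition struct {
	segmentID  segmentID
	byteIndex  uint64 // hypothetical byte offset within the segment
	frameIndex uint64 // index of the next frame within the segment
}

// remainingFrames returns how many frames in a segment have not been read
// yet, given the reader's position in that segment.
func remainingFrames(h segmentHeader, pos queuePosition) uint64 {
	if pos.frameIndex >= uint64(h.frameCount) {
		return 0
	}
	return uint64(h.frameCount) - pos.frameIndex
}

func main() {
	h := segmentHeader{version: 1, frameCount: 10}
	pos := queuePosition{segmentID: 3, byteIndex: 4096, frameIndex: 4}
	fmt.Println(remainingFrames(h, pos)) // 6
}
```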

On startup, the queue now attempts to load the segment header from any preexisting segments. If the segment header has no frame count (either because it's from a previous version or because the segment was not closed cleanly), it attempts to calculate the value manually with a linear scan of the segment's frame headers.

Since the pipeline metrics are not yet accessible to the queue, this PR has no user-visible changes except for a few log messages.

Why is it important?

This is necessary preparation for reporting the "real" active event count for the disk queue, as required by #22602.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~I have made corresponding changes to the documentation~~
~~I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
~~I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.~~

elasticmachine · 2020-12-07T21:21:48Z

Pinging @elastic/integrations (Team:Integrations)

elasticmachine · 2020-12-07T22:38:18Z

💚 Build Succeeded


Build stats

Build Cause: Pull request #22970 updated
Start Time: 2021-05-25T16:48:23.194+0000
Duration: 138 min 20 sec
Commit: 27c5f09

Test stats 🧪

Test	Results
Failed	0
Passed	47538
Skipped	5248
Total	52786

💚 Flaky test report

Tests succeeded.


fearful-symmetry

LGTM, with the caveat that I don't know nearly enough about this as you.

libbeat/publisher/queue/diskqueue/reader_loop.go

libbeat/publisher/queue/diskqueue/segments.go

libbeat/publisher/queue/diskqueue/writer_loop.go

botelastic · 2021-01-17T15:15:53Z

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

botelastic · 2021-03-28T19:16:16Z

Hi!
This PR has been stale for a while and we're going to close it as part of our cleanup procedure.
We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team.
Feel free to re-open this PR if you think it should stay open and is worth rebasing.
Thank you for your contribution!

urso · 2021-05-25T14:29:00Z

libbeat/publisher/queue/diskqueue/core_loop.go

+		// If the segment is still in the writing list, we can't discard it
+		// until the writer loop is done with it, but we can hope that advancing
+		// to the current write position will get us out of our error state.
+		dq.segments.nextReadPosition = segment.byteCount


Given the way readers and writers are coordinated, an error in a segment that is currently being written would likely indicate a bug in the business logic, a bug in the framing done by the writer, or an unexpected race condition. The system will most likely recover once we start a new segment file. In this case we might want to use logger.Criticalf, in order to encourage users to report bugs.

Errors on older, already closed segment files might indicate a broken or invalid segment file, or a bug in the framing. The former is to be expected if the system was not shut down cleanly. Error level would be enough, I think.

urso · 2021-05-25T14:41:06Z

libbeat/publisher/queue/diskqueue/segments.go

+				fullPath := path.Join(pathStr, file.Name())
+				header, err := readSegmentHeaderWithFrameCount(fullPath)
+				if header == nil {
+					logger.Errorf("couldn't load segment file '%v': %v", fullPath, err)


I'd like to encourage us to use structured logging more. For example, the loop could introduce its own logger like logger := logger.With("segment", segmentID(id)). Would we still need the full path if we had the segment ID?

The full path is theoretically redundant since every segment is in the same directory, and the directory depends on the user configuration, but I thought it would be nice for the error to be explicit since I wouldn't expect whoever sees the message to know where the queue path is or how to find it.

urso · 2021-05-25T15:07:26Z

/test

faec added 3 commits December 3, 2020 16:55

segment frame count tracking

ddd29bd

track frame index within segments

2012b5e

fix manual frame counting

257fd0c

faec added the enhancement, libbeat, and Team:Integrations labels Dec 7, 2020

faec requested review from urso and fearful-symmetry December 7, 2020 21:21

botelastic bot added and removed the needs_team label Dec 7, 2020

fearful-symmetry approved these changes Dec 9, 2020

urso reviewed Dec 9, 2020

botelastic bot added the Stalled label Jan 17, 2021

botelastic bot removed the Stalled label Jan 27, 2021

botelastic bot added the Stalled label Feb 26, 2021

botelastic bot closed this Mar 28, 2021

urso reopened this Mar 29, 2021

botelastic bot removed the Stalled label Mar 29, 2021

botelastic bot added the Stalled label Apr 28, 2021

Merge branch 'master' into disk-queue

c2b60df

botelastic bot removed the Stalled label May 13, 2021

faec added 12 commits May 17, 2021 13:44

fixing unused variables

35f4f81

clarify / replace 'nextWriteOffset'

24eec45

(doesn't build yet) checkpoint: making segment offsets absolute instead of relative

2977318

(doesn't build yet) another checkpoint, most build errors resolved

94ce717

(doesn't build yet) checkpoint, all builds except tests

8170aba

everything builds again

39de5a1

tests pass again

9718747

add comments

1f28438

remove old code

72c9319

minimize direct access of queueSegment.header

0d03cd9

remove the queueSegment.header field, keeping only the schema version

7ed87b9

clarify comments

06e2b96

urso reviewed May 25, 2021

Merge branch 'master' into disk-queue

27c5f09

faec mentioned this pull request May 25, 2021

Disk Queue GA meta-issue #22602

Closed

9 tasks

faec merged commit 4b14493 into elastic:master May 25, 2021

faec deleted the disk-queue branch May 25, 2021 19:47

faec added the backport-v7.14.0 label Jun 24, 2021

mergify bot pushed a commit that referenced this pull request Jun 24, 2021

[libbeat] Track frame counts in saved segments in the disk queue (#22970)

(cherry picked from commit 4b14493)

9f5ff30

mergify bot mentioned this pull request Jun 24, 2021

[7.x](backport #22970) [libbeat] Track frame counts in saved segments in the disk queue #26482

Merged

faec mentioned this pull request Jun 24, 2021

[libbeat] Fix encoding and file offset issues in the disk queue #26484

Merged

6 tasks

faec added a commit that referenced this pull request Jun 24, 2021

[libbeat] Track frame counts in saved segments in the disk queue (#22970) (#26482)

(cherry picked from commit 4b14493)

Co-authored-by: Fae Charlton

5bc2ba3
