[receiver/filelog] Threadpooling for maxconcurrency #19448

VihasMakwana · 2023-03-10T20:05:47Z

Description:
This PR changes the core logic of the file consumer.
Major changes:

- makeReaders -> makeReader: Instead of creating all the readers at once, we go through the matched paths and create readers one by one. The known fingerprints and known files also move to the Manager struct and are reset every poll cycle.
- Removing the maxBatches limit: Since reading is now asynchronous, there is no need for such a limit. We could add a new limit such as maxFilesPerCycle, which would cap the number of files matched per cycle. I have also removed the test cases for maxBatches.
- Many new fields were added to Manager, including three locks: pathHashLock guards updates to the path map; readerLock guards the lostReaders list, which is used to detect lost files; knownFilesLock guards updates to m.knownFiles.
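
For readers unfamiliar with the pattern, here is a minimal sketch of a bounded worker pool of the kind the description implies. It is not the code from this PR: the `manager` type, the `poll` method, and the "work" done per path are hypothetical, and only the names `maxConcurrency`, `knownFiles`, and `knownFilesLock` are borrowed from the description. For simplicity the sketch waits for all workers before returning, which may differ from the PR's actual behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical stand-in for the Manager struct in this PR.
type manager struct {
	maxConcurrency int
	knownFilesLock sync.Mutex // guards knownFiles
	knownFiles     []string
}

// poll processes every path using at most maxConcurrency goroutines.
func (m *manager) poll(paths []string) {
	// Reset the per-cycle state, as the description says happens each poll cycle.
	m.knownFilesLock.Lock()
	m.knownFiles = m.knownFiles[:0]
	m.knownFilesLock.Unlock()

	workers := m.maxConcurrency
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker drains the channel
	}

	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				// A real implementation would open the file and build a reader here.
				m.knownFilesLock.Lock()
				m.knownFiles = append(m.knownFiles, p)
				m.knownFilesLock.Unlock()
			}
		}()
	}

	// Hand paths to the workers one by one, then signal that no more are coming.
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
}

func main() {
	m := &manager{maxConcurrency: 2}
	m.poll([]string{"a.log", "b.log", "c.log"})
	fmt.Println(len(m.knownFiles))
}
```

The unbuffered channel means a path is only handed out when a worker is free, so the number of files being processed at once never exceeds the worker count.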

Link to tracking Issue: #18908

Testing: The core functionality remains the same, but the approach is different. I have thus updated the test cases taking the asynchronous behavior into consideration.

Documentation: Updated design.md

linux-foundation-easycla · 2023-03-10T20:05:57Z

The committers listed above are authorized under a signed CLA.

✅ login: vihas-splunk (76d9480, e8bc5f0, e5f14db, fce9251, b2331ec, e73367b, 7738cda, 10124c9b575ceb056e8258903f2982f10ba05bc2)

dmitryax · 2023-03-12T03:45:05Z

@vihas-splunk, please sign the CLA

runforesight · 2023-03-12T03:47:03Z

Foresight Summary

Major Impacts

TestReadRotatingFiles ❌ failed 4 times in 8 runs (50% fail rate).

TestReadRotatingFiles/CopyTruncateTimestamped ❌ failed 4 times in 8 runs (50% fail rate).

build-and-test duration(19 minutes 38 seconds) has decreased 26 minutes 46 seconds compared to main branch avg(46 minutes 24 seconds).

⭕ build-and-test-windows workflow has finished in 10 seconds (31 minutes 33 seconds less than `main` branch avg.) and finished at 7th Apr, 2023.

Job	Failed Steps	Tests
windows-unittest-matrix	- 🔗	N/A	See Details
windows-unittest	- 🔗	N/A	See Details

✅ telemetrygen workflow has finished in 1 minute 12 seconds and finished at 7th Apr, 2023.

Job	Failed Steps	Tests
build-dev	- 🔗	N/A	See Details
publish-latest	- 🔗	N/A	See Details
publish-stable	- 🔗	N/A	See Details

✅ check-links workflow has finished in 1 minute 40 seconds (⚠️ 42 seconds more than `main` branch avg.) and finished at 7th Apr, 2023.

Job	Failed Steps	Tests
changed files	- 🔗	N/A	See Details
check-links	- 🔗	N/A	See Details

✅ changelog workflow has finished in 2 minutes 52 seconds and finished at 7th Apr, 2023.

Job	Failed Steps	Tests
changelog	- 🔗	N/A	See Details

✅ prometheus-compliance-tests workflow has finished in 3 minutes 27 seconds (2 minutes 55 seconds less than `main` branch avg.) and finished at 7th Apr, 2023.

Job	Failed Steps	Tests
prometheus-compliance-tests	- 🔗	N/A	See Details

✅ load-tests workflow has finished in 6 minutes 16 seconds (4 minutes 6 seconds less than `main` branch avg.) and finished at 7th Apr, 2023.

Job	Failed Steps	Tests
setup-environment	- 🔗	N/A	See Details
loadtest (TestIdleMode)	- 🔗	N/A	See Details
loadtest (TestBallastMemory\|TestLog10kDPS)	- 🔗	N/A	See Details
loadtest (TestMetric10kDPS\|TestMetricsFromFile)	- 🔗	N/A	See Details
loadtest (TestMetricResourceProcessor\|TestTrace10kSPS)	- 🔗	N/A	See Details
loadtest (TestTraceNoBackend10kSPS\|TestTrace1kSPSWithAttrs)	- 🔗	N/A	See Details
loadtest (TestTraceBallast1kSPSWithAttrs\|TestTraceBallast1kSPSAddAttrs)	- 🔗	N/A	See Details
loadtest (TestTraceAttributesProcessor)	- 🔗	N/A	See Details

✅ e2e-tests workflow has finished in 12 minutes 44 seconds and finished at 7th Apr, 2023.

Job	Failed Steps	Tests
kubernetes-test (v1.26.0)	- 🔗	N/A	See Details
kubernetes-test (v1.25.3)	- 🔗	N/A	See Details
kubernetes-test (v1.24.7)	- 🔗	N/A	See Details
kubernetes-test (v1.23.13)	- 🔗	N/A	See Details

❌ build-and-test workflow has finished in 19 minutes 38 seconds (26 minutes 46 seconds less than `main` branch avg.) and finished at 7th Apr, 2023. 6 jobs failed.

Job	Failed Steps	Tests
setup-environment	- 🔗	N/A	See Details
check-codeowners	- 🔗	N/A	See Details
lint-matrix (receiver-0)	- 🔗	N/A	See Details
lint-matrix (receiver-1)	- 🔗	N/A	See Details
lint-matrix (processor)	- 🔗	N/A	See Details
lint-matrix (exporter)	- 🔗	N/A	See Details
lint-matrix (extension)	- 🔗	N/A	See Details
lint-matrix (connector)	- 🔗	N/A	See Details
lint-matrix (internal)	- 🔗	N/A	See Details
lint-matrix (other)	Lint 🔗	N/A	See Details
build-examples	- 🔗	N/A	See Details
check-collector-module-version	- 🔗	N/A	See Details
checks	Porto 🔗	N/A	See Details
correctness-metrics	- 🔗	N/A	See Details
correctness-traces	- 🔗	N/A	See Details
integration-tests	- 🔗	N/A	See Details
unittest-matrix (1.20, receiver-0)	Run Unit Tests 🔗	N/A	See Details
unittest-matrix (1.20, receiver-1)	- 🔗	N/A	See Details
unittest-matrix (1.20, processor)	- 🔗	N/A	See Details
unittest-matrix (1.20, exporter)	- 🔗	N/A	See Details
unittest-matrix (1.20, extension)	- 🔗	N/A	See Details
unittest-matrix (1.20, connector)	- 🔗	N/A	See Details
unittest-matrix (1.20, internal)	- 🔗	N/A	See Details
unittest-matrix (1.20, other)	- 🔗	N/A	See Details
unittest-matrix (1.19, receiver-0)	- 🔗	N/A	See Details
unittest-matrix (1.19, receiver-1)	- 🔗	N/A	See Details
unittest-matrix (1.19, processor)	- 🔗	N/A	See Details
unittest-matrix (1.19, exporter)	- 🔗	N/A	See Details
unittest-matrix (1.19, extension)	- 🔗	N/A	See Details
unittest-matrix (1.19, connector)	- 🔗	N/A	See Details
unittest-matrix (1.19, internal)	- 🔗	N/A	See Details
unittest-matrix (1.19, other)	- 🔗	N/A	See Details
unittest (1.20)	Interpret result 🔗	N/A	See Details
unittest (1.19)	Interpret result 🔗	N/A	See Details
lint	Interpret result 🔗	N/A	See Details
cross-compile	- 🔗	N/A	See Details
windows-msi	- 🔗	N/A	See Details
build-package	- 🔗	N/A	See Details
publish-dev	- 🔗	N/A	See Details
publish-check	- 🔗	N/A	See Details
publish-stable	- 🔗	N/A	See Details
rotate-milestone	- 🔗	N/A	See Details

pkg/stanza/fileconsumer/trie.go

djaglowski · 2023-04-04T14:06:34Z

pkg/stanza/fileconsumer/design.md

    1. Each file handle is wrapped into a `Reader` along with some metadata. (See Reader section above)
        - During the creation of a `Reader`, the file's fingerprint is cross referenced with previously known fingerprints.
        - If a file's fingerprint matches one that has recently been seen, then metadata is copied over from the previous iteration of the Reader. Most importantly, the offset is accurately maintained in this way.
        - If a file's fingerprint does not match any recently seen files, then its offset is initialized according to the `start_at` setting.
-9. Detection of Lost Files
+8. Detection of Lost Files


I think the behavior described here has changed in this PR. Moreover, the term "lost files" no longer seems to be an apt description of how we're managing file handles.

What do you think about dropping this terminology and just describing how we hold on to file handles as long as possible, and then read to the end one last time before closing them?

Sounds good to me.
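
As an illustration of the fingerprint cross-referencing described in the design.md excerpt above, here is a minimal sketch. It assumes that a fingerprint is the first bytes of a file and that "matches" means a previously seen fingerprint is a prefix of the new one; the `reader` type and `newReader` function are hypothetical and are not the package's actual API.

```go
package main

import (
	"bytes"
	"fmt"
)

// reader is a hypothetical, stripped-down stand-in for the Reader in design.md.
type reader struct {
	fingerprint []byte
	offset      int64
}

// newReader cross-references fp with previously known readers.
// On a match the old offset is carried over; otherwise the offset is
// initialized according to startAt ("beginning" or "end").
func newReader(fp []byte, size int64, startAt string, known []*reader) *reader {
	// Copy the fingerprint so later changes to fp do not affect the reader.
	r := &reader{fingerprint: append([]byte(nil), fp...)}
	for _, old := range known {
		// Assumption: a file matches when an earlier fingerprint is a prefix of it.
		if len(old.fingerprint) > 0 && bytes.HasPrefix(fp, old.fingerprint) {
			r.offset = old.offset
			return r
		}
	}
	if startAt == "end" {
		r.offset = size
	}
	return r
}

func main() {
	known := []*reader{{fingerprint: []byte("hello"), offset: 5}}
	seen := newReader([]byte("hello world"), 11, "end", known)
	fresh := newReader([]byte("other"), 42, "end", known)
	fmt.Println(seen.offset, fresh.offset)
}
```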

djaglowski · 2023-04-04T14:13:18Z

pkg/stanza/fileconsumer/trie.go

+		// We have reached end of the current path and all the previous characters have matched
+		// Return if current node is leaf and it is not root
+		if node.isLeaf() && node != trie {
+			return r


Isn't r just the last byte?

Suggested change:

-			return r
+			return key
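
To make the point of this review comment concrete, here is a small sketch of a byte trie. It is not the trie.go from this PR: `trieNode`, `put`, and `match` are hypothetical, and only the `node.isLeaf() && node != trie` check is taken from the hunk above. Inside a `for _, r := range key` loop, `r` holds a single byte, so after the loop it can only ever be the last byte examined, while `key` is the whole slice.

```go
package main

import "fmt"

// trieNode is a hypothetical trie node keyed by single bytes.
type trieNode struct {
	children map[byte]*trieNode
}

func (n *trieNode) isLeaf() bool { return len(n.children) == 0 }

// put inserts key into the trie, creating nodes as needed.
func (trie *trieNode) put(key []byte) {
	node := trie
	for _, r := range key {
		if node.children == nil {
			node.children = map[byte]*trieNode{}
		}
		next, ok := node.children[r]
		if !ok {
			next = &trieNode{}
			node.children[r] = next
		}
		node = next
	}
}

// match returns key if walking it ends on a leaf that is not the root,
// and nil otherwise.
func (trie *trieNode) match(key []byte) []byte {
	node := trie
	for _, r := range key { // r is one byte of key per iteration
		next, ok := node.children[r]
		if !ok {
			return nil
		}
		node = next
	}
	// We have reached the end of the key and every byte matched.
	if node.isLeaf() && node != trie {
		return key // returning r here would yield only the final byte
	}
	return nil
}

func main() {
	trie := &trieNode{}
	trie.put([]byte("abc"))
	fmt.Println(string(trie.match([]byte("abc"))))
}
```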

pkg/stanza/fileconsumer/trie.go

pkg/stanza/fileconsumer/util_test.go

Co-authored-by: Daniel Jaglowski <[email protected]>

djaglowski

Hi @VihasMakwana, thanks for your patience on this. I'd still very much like to see this PR through but I've had little time to focus on it.

I will try my best to give a more thorough review later this week, but a couple thoughts at this time:

- This is a significant change to the way this component works and we have many people depending on it in production. As a result, I think we should enable this functionality initially behind a feature gate. This means we should be able to run the component just as it was before these changes, so we'll need to identify the point of departure within the codebase.
- Any small parts of this PR that can be submitted separately could potentially be accepted without a feature gate, since they can be evaluated more carefully in isolation. We can do this before or after the feature gate is added. A couple ideas that immediately come to mind are the test refactoring and potentially a minimal switch to makeReader.

djaglowski · 2023-04-11T19:28:45Z

pkg/stanza/fileconsumer/trie.go

+		// We have reached end of the current path and all the previous characters have matched
+		// Return if current node is leaf and it is not root
+		if node.isLeaf() && node != trie {
+			return r


VihasMakwana · 2023-04-29T17:09:39Z

> Hi @VihasMakwana, thanks for your patience on this. I'd still very much like to see this PR through but I've had little time to focus on it.
>
> I will try my best to give a more thorough review later this week, but a couple thoughts at this time:
>
> This is a significant change to the way this component works and we have many people depending on it in production. As a result, I think we should enable this functionality initially behind a feature gate. This means we should be able to run the component just as it was before these changes, so we'll need to identify the point of departure within the codebase.
>
> Any small parts of this PR that can be submitted separately could potentially be accepted without a feature gate, since they can be evaluated more carefully in isolation. We can do this before or after the feature gate is added. A couple ideas that immediately come to mind are the test refactoring and potentially a minimal switch to makeReader.

Sounds good to me.
A feature gate makes much more sense as this is a significant change.
I'll separate the parts which can be accepted without a feature gate.

dmitryax · 2023-04-30T16:18:41Z

@VihasMakwana, did you run this build to confirm that it resolves the original issue?

Also, it'd be great to highlight what other benefits it provides for the end user. Does it improve logs collection throughput? Can you please add benchmarks?

VihasMakwana · 2023-05-03T05:26:49Z

> @VihasMakwana, did you run this build to confirm that it resolves #17846?
>
> Also, it'd be great to highlight what other benefits it provides for the end user. Does it improve logs collection throughput? Can you please add benchmarks?

Yeah, I did test this out and it resolved the original issue.
I'll add some benchmarks and highlight the benefits.

VihasMakwana · 2023-05-03T05:29:35Z

@djaglowski what I'm thinking is to create some other PRs before finalizing this one,
one to refactor, and one to update the test cases.
I'll put this PR in draft mode till then I guess.
Can you please share your thoughts?

djaglowski · 2023-05-08T20:19:41Z

> @djaglowski what I'm thinking is to create some other PRs before finalizing this one, one to refactor, and one to update the test cases. I'll put this PR in draft mode till then I guess. Can you please share your thoughts?

@VihasMakwana, that sounds great. Thank you again.

github-actions · 2023-05-26T22:03:24Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2023-06-10T05:20:24Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Vihas makwana added 7 commits February 16, 2023 05:20

Make a new reader method

76d9480

Create a new reader and thread pooling

e8bc5f0

Updated go mod

e5f14db

Undo changes

fce9251

Fix the errors

b2331ec

Update rotation and benchmark testcases

e73367b

Merge remote-tracking branch 'origin/main' into threadpooling_for_max…

7738cda

…concurrency

VihasMakwana requested a review from a team March 10, 2023 20:05

VihasMakwana requested a review from djaglowski as a code owner March 10, 2023 20:05

github-actions bot assigned evan-bradley Mar 10, 2023

github-actions bot added pkg/stanza receiver/filelog labels Mar 10, 2023

VihasMakwana marked this pull request as draft March 10, 2023 20:06

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch 2 times, most recently from 944cd17 to 10124c9 Compare March 10, 2023 20:09

djaglowski changed the title ~~Threadpooling for maxconcurrency~~ [receiver/filelog] Threadpooling for maxconcurrency Mar 10, 2023

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch 2 times, most recently from 7700de9 to 046888d Compare March 13, 2023 05:31

VihasMakwana marked this pull request as ready for review March 13, 2023 05:32

github-actions bot assigned TylerHelmuth Mar 13, 2023

Make changes in documentation

f301bb4

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch from 046888d to f301bb4 Compare March 13, 2023 05:52

Vihas Splunk and others added 5 commits March 13, 2023 17:52

fix linting issues

a50ab10

Merge branch 'open-telemetry:main' into threadpooling_for_maxconcurrency

c61c7fd

Fix pipeline errors

5a675e5

Add chloggen entry

a794b9e

Copy the fingerprint instead of referencing

c72df6b

Fix the test cases

74fe33f

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch 2 times, most recently from d980857 to b1fa948 Compare March 25, 2023 16:27

VihasMakwana requested a review from djaglowski March 28, 2023 10:15

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch from b1fa948 to ac96f12 Compare March 31, 2023 06:23

Add trie.go

1c27447

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch from ac96f12 to 1c27447 Compare April 1, 2023 10:00

djaglowski reviewed Apr 4, 2023

View reviewed changes

VihasMakwana and others added 5 commits April 7, 2023 21:10

Update pkg/stanza/fileconsumer/trie.go

fd36a75

Co-authored-by: Daniel Jaglowski <[email protected]>

Update pkg/stanza/fileconsumer/trie.go

a788893

Co-authored-by: Daniel Jaglowski <[email protected]>

Update pkg/stanza/fileconsumer/trie.go

ecd135b

Co-authored-by: Daniel Jaglowski <[email protected]>

Remove redundant variables

eb8b2f9

Refactor buildTestManager

282242c

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch from e7d3380 to a5d6824 Compare April 11, 2023 12:56

VihasMakwana requested a review from djaglowski April 11, 2023 12:56

Increase timeout

4e381b5

VihasMakwana force-pushed the threadpooling_for_maxconcurrency branch 2 times, most recently from 668b4d1 to 4e381b5 Compare April 11, 2023 17:01

djaglowski reviewed Apr 25, 2023

View reviewed changes

This was referenced May 10, 2023

[pkg/stanza/fileconsumer] Refactor makeReaders into makeReader #21734

Merged

[pkg/stanza/fileconsumer] Refactor test cases #21792

Merged

github-actions bot added the Stale label May 26, 2023

github-actions bot closed this Jun 10, 2023
