Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Target Allocator implementation (Part 2 - Target Allocator Image logic) #354

Merged
merged 29 commits into from
Sep 15, 2021

Conversation

alexperez52
Copy link
Contributor

@alexperez52 alexperez52 commented Jul 22, 2021

Description

Follows the changes in #351

This PR addresses issue open-telemetry/wg-prometheus#60. This is part 2 of a 3 part update in which enhancements are presented to the OTel Operator to support the deployment of a Target Allocator. This update contains the logic for the Operator Managed target allocator image which will be used to perform target sharding for the Prometheus scrape targets and expose them to the collector(s) for consumption using http sd config.

The Design document for this part 2 can be found here:
Target Allocator Design Document

Type of change

New feature

Testing

Unit tests were added to test the least connection algorithm flow as well as updating and removing targets.
More testing details can be found on the doc mentioned above.

cc: @Raul9595 @alolita

@alexperez52 alexperez52 requested review from a team and dmitryax July 22, 2021 19:39
@alexperez52
Copy link
Contributor Author

@dashpole @Aneurysm9 @alolita

@alolita
Copy link
Member

alolita commented Jul 26, 2021

@jpkrohling @Aneurysm9 can you please review. Look forward to any feedback you may have. ty!

@jpkrohling
Copy link
Member

I'm planning on review this by the end of this week.

@@ -0,0 +1,128 @@
package allocation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these packages are used outside of this program, correct? Can we move them to cmd/otel-allocator/internal/*? That way we can be confident their scope is contained and not worry about breaking any external packages that have depended on them.

This can be done separately and doesn't need to hold up this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been addressed?

cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/config/config.go Outdated Show resolved Hide resolved
cmd/otel-allocator/config/config.go Outdated Show resolved Hide resolved
cmd/otel-allocator/discovery/discovery.go Outdated Show resolved Hide resolved
cmd/otel-allocator/main.go Outdated Show resolved Hide resolved
@jpkrohling
Copy link
Member

Similar to the other PR: I left a couple of comments in the design doc. Let me know if I should go ahead and review this, or if you want to settle the open points first.

Aneurysm9
Aneurysm9 previously approved these changes Aug 3, 2021
@alolita
Copy link
Member

alolita commented Aug 10, 2021

@alexperez52 @Raul9595 can you rerun the tests to make sure the E2E tests are passing.

cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Show resolved Hide resolved
cmd/otel-allocator/main.go Outdated Show resolved Hide resolved
@alexperez52 alexperez52 requested a review from dashpole August 11, 2021 14:07
Copy link

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall. Mostly suggestions to make the code easier to read.

cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/http.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/http.go Outdated Show resolved Hide resolved
Aneurysm9
Aneurysm9 previously approved these changes Aug 12, 2021
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
@mergify mergify bot dismissed Aneurysm9’s stale review August 13, 2021 14:48

Pull request has been modified.

@alexperez52 alexperez52 requested a review from Aneurysm9 August 13, 2021 14:57
Aneurysm9
Aneurysm9 previously approved these changes Aug 16, 2021
Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that I couldn't complete the review, but will make another pass later today or tomorrow.

@@ -0,0 +1,128 @@
package allocation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been addressed?

cmd/otel-allocator/allocation/allocator.go Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator.go Outdated Show resolved Hide resolved
cmd/otel-allocator/collector/collector.go Outdated Show resolved Hide resolved
cmd/otel-allocator/collector/collector_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/collector/collector_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/collector/collector_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/collector/collector_test.go Outdated Show resolved Hide resolved
@mergify mergify bot dismissed Aneurysm9’s stale review August 18, 2021 19:38

Pull request has been modified.

@jpkrohling
Copy link
Member

I'll review this again soon, but please, try to get a PR fixing the intermittent tests that the part 1 introduced.

@alexperez52
Copy link
Contributor Author

I'll review this again soon, but please, try to get a PR fixing the intermittent tests that the part 1 introduced.

Seems like the PR merged today solved the issue with the e2e tests

cmd/otel-allocator/Dockerfile Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/allocator_test.go Outdated Show resolved Hide resolved
}

// Tests that the delta in number of targets per collector is less than 15% of an even distribution
func TestCollectorBalanceWhenAddingAndRemovingAtRandom(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What advantages is this bringing in comparison to the test TestAddingAndRemovingTargets? Is this exercising a different code path than TestCollectorBalanceWhenAddingTargets when it comes to the balancing itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestAddingAndRemovingTargets tests that targets are successfully added with the expected with the expected key
TestCollectorBalanceWhenAddingTargets tests the the balance on an easily divisible number for the purpose of even allocation
TestCollectorBalanceWhenAddingAndRemovingAtRandom uses a more random approach where targets are both added and removed while verifying that at each step there isnt above a 15% difference in workload from collector to collector

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get the value-add between the second and the third, but assuming there's enough value justifying the added complexity, isn't it covering the third case covering the second? Meaning, do we need the second? And the first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a straight coverage perspective, I think the third test will cover what the first two do as well. But they may still have value in enabling testing and reasoning about each capability somewhat more independently.

I'd like to take another pass at this allocation logic in the future, but I think for now it is a solid starting point.

}
}

func TestCollectorBalanceWhenCollectorsChanged(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the previous comment: what's the difference between this and the previous test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation will do a re allocation from scratch when collector information is modified so this test is for the purposes of verifying that targets were re allocated from scratch. Using a divisible ratio of targets to collectors, its verified that the targets are allocated evenly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been clearer, but do we still need both tests? Isn't one of them covering what the other is doing?

cmd/otel-allocator/allocation/allocator_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/go.mod Show resolved Hide resolved
cmd/otel-allocator/go.mod Outdated Show resolved Hide resolved
cmd/otel-allocator/main.go Outdated Show resolved Hide resolved
cmd/otel-allocator/main.go Outdated Show resolved Hide resolved
cmd/otel-allocator/main.go Outdated Show resolved Hide resolved
@jpkrohling
Copy link
Member

Ping me when this is ready for review again.

@jpkrohling
Copy link
Member

I understand that @Aneurysm9 will be maintaining this code, so, I'll wait for his review before doing a final round myself.

@jpkrohling jpkrohling requested a review from Aneurysm9 September 8, 2021 09:37
@alolita
Copy link
Member

alolita commented Sep 8, 2021

Hi @jpkrohling thanks for taking a look.

@Aneurysm9 can you do a final review and let's get this merged. Ty!!!

@alolita
Copy link
Member

alolita commented Sep 14, 2021

Ping @Aneurysm9 please review so that @jpkrohling can merge. Thx.

Copy link
Member

@Aneurysm9 Aneurysm9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some things I'd like to refactor, particularly in the allocation package and the main() function, but I think this is a solid starting point and provides a usable, and useful, service.

@jpkrohling if you'd like to see that refactoring done before merging this I'd understand, but I also think it could be good to land it as it is to make the capability available and see what happens when people attempt to put it to use.

}

// Tests that the delta in number of targets per collector is less than 15% of an even distribution
func TestCollectorBalanceWhenAddingAndRemovingAtRandom(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a straight coverage perspective, I think the third test will cover what the first two do as well. But they may still have value in enabling testing and reasoning about each capability somewhat more independently.

I'd like to take another pass at this allocation logic in the future, but I think for now it is a solid starting point.

@jpkrohling
Copy link
Member

The e2e tests for Kubernetes 1.19 failed and I just restarted. I'll merge this as soon as the tests pass.

@jpkrohling jpkrohling enabled auto-merge (squash) September 15, 2021 07:02
@jpkrohling jpkrohling merged commit acab291 into open-telemetry:main Sep 15, 2021
shree007 pushed a commit to shree007/opentelemetry-operator that referenced this pull request Dec 12, 2021
…ic) (open-telemetry#354)

* Target Allocation server logic

Co-Authored-By: Alexis Perez <[email protected]>
Co-Authored-By: JBD <[email protected]>

* Added cmd to indicate executable

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Updated discovery manager, collector component and added testing file for collector.go

Updated code to parse config using default Prometheus config and added testing file for collector component.

* Removed unnecessary struct in config.go

* Added load testing

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

* Update cmd/otel-allocator/allocation/allocator.go

* Removed nextCollector and modified locks

* Updated collector.go to reflect new namespace

* Refactored display map logic & updated locking convention

* Updated container port

* Change initialized empty collector to nil collector

Co-authored-by: Anthony Mirabella <[email protected]>

* Updated collector test logic

* Updated allocation files

* Updated allocation import in main.go

* Updated collector & discovery files

* Updated unit tallocator unit tests

* Updated runWatch to prevent panic

* Seperated http logic from allocator logic

* Integrated logr

* Updated collector test to use channels

* Update use of logger and fix error messages

* Update test files

Co-authored-by: Rahul Varma <[email protected]>
Co-authored-by: JBD <[email protected]>
Co-authored-by: Anthony Mirabella <[email protected]>
Co-authored-by: Rahul Varma <[email protected]>
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024
…ic) (open-telemetry#354)

* Target Allocation server logic

Co-Authored-By: Alexis Perez <[email protected]>
Co-Authored-By: JBD <[email protected]>

* Added cmd to indicate executable

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Updated discovery manager, collector component and added testing file for collector.go

Updated code to parse config using default Prometheus config and added testing file for collector component.

* Removed unnecessary struct in config.go

* Added load testing

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

Co-authored-by: Anthony Mirabella <[email protected]>

* Update cmd/otel-allocator/allocation/allocator.go

* Update cmd/otel-allocator/allocation/allocator.go

* Removed nextCollector and modified locks

* Updated collector.go to reflect new namespace

* Refactored display map logic & updated locking convention

* Updated container port

* Change initialized empty collector to nil collector

Co-authored-by: Anthony Mirabella <[email protected]>

* Updated collector test logic

* Updated allocation files

* Updated allocation import in main.go

* Updated collector & discovery files

* Updated unit tallocator unit tests

* Updated runWatch to prevent panic

* Seperated http logic from allocator logic

* Integrated logr

* Updated collector test to use channels

* Update use of logger and fix error messages

* Update test files

Co-authored-by: Rahul Varma <[email protected]>
Co-authored-by: JBD <[email protected]>
Co-authored-by: Anthony Mirabella <[email protected]>
Co-authored-by: Rahul Varma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants