Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle the case when the request to CC /state endpoint returns the async task instead of the result #974

Conversation

aguzovatii
Copy link
Contributor

@aguzovatii aguzovatii commented May 16, 2023

Description

This PR attempts to fix #828.

Currently marked this PR as draft until the change in the API module introduced with the #975 PR is merged.

When the request on CC /state endpoint takes longer than webserver.request.maxBlockTimeMs, then CC will convert the request into an async task and will respond with the taskID instead of the result. The Koperator should handle this case by waiting for CC to complete the task.

The approach is the following:

  1. When getting the state of Cruise Control we can have 2 possible scenarios
    1. Cruise Control returns the actual state, in this case (as before) we just return the state and continue the reconciliation
    2. Cruise Control converts the Status request into an async task and returns just the taskID, then:
      1. We create a new CruiseControlOperation to track the progress of the Status request and requeue the reconciliation for later
      2. Next time the reconciliation is started, we notice that there is already a Status CruiseControlOperation in progress and just request the latest status of this task. If not ready, then we requeue again. This step is repeated until the CC task is completed

Notes:

  • we will have no more than 1 Status CruiseControlOperation in progress for a kafkaClusterReference.
  • I added a default timeout of 5 minutes, meaning that if we get the Status of CC in more than 5 minutes, then we will consider this outdated and retry

Type of Change

  • Bug Fix

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass
  • New code contribution is covered by automated tests
  • All new and existing tests pass

@CLAassistant
Copy link

CLAassistant commented May 16, 2023

CLA assistant check
All committers have signed the CLA.

pkg/scale/scale.go Outdated Show resolved Hide resolved
pkg/scale/scale.go Outdated Show resolved Hide resolved
@aguzovatii aguzovatii force-pushed the handle_async_response_from_cc_state_endpoint branch from a4b68c1 to 9cb5644 Compare May 19, 2023 12:59
@aguzovatii
Copy link
Contributor Author

Hey @bartam1, thank you for your review!
I addressed a few of the easier suggestions

I will be in vacation for a week, so I will not be responsive during this time.
The rest of suggestions I will be able to address starting with 29th of May.

@bartam1
Copy link
Contributor

bartam1 commented Jun 2, 2023

New api tag has been created: https://github.com/banzaicloud/koperator/tree/api/v0.28.1
cc @aguzovatii

@aguzovatii aguzovatii force-pushed the handle_async_response_from_cc_state_endpoint branch from a53d75b to aae0bd7 Compare June 6, 2023 11:53
@aguzovatii aguzovatii marked this pull request as ready for review June 6, 2023 12:02
@aguzovatii aguzovatii requested a review from a team as a code owner June 6, 2023 12:02
@aguzovatii
Copy link
Contributor Author

Thank you, @bartam1!
I upgraded github.com/banzaicloud/koperator/api to v0.28.1 and marked this pr as ready for review, let me know if there are other things to address

panyuenlau
panyuenlau previously approved these changes Jun 7, 2023
Copy link
Member

@panyuenlau panyuenlau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation LGTM.

Just one question (I might be nit-picking here):

I found the naming of this OperationStatus CruiseControlTaskOperation = "status" in #975 is a bit misleading: the CruiseControlTaskOperation type was created to represent the actual CC operation. So when I saw OperationStatus CruiseControlTaskOperation = "status", I would assume there is an actual CC operation called "status", but I checked the CC wiki and source code and I couldn't really find such an operation.

But this could be due to my limited knowledge on cruise control

@panyuenlau panyuenlau requested a review from bartam1 June 7, 2023 15:28
@aguzovatii
Copy link
Contributor Author

Hi @panyuenlau, there is the CC GET state endpoint that is corresponding with the newly added OperationStatus CruiseControlTaskOperation = "status", check CC REST API docs.

I used the status name instead of state just to be consistent with how it was used in Koperator until now (check CruiseControlScaler):

Status(ctx context.Context) (CruiseControlStatus, error)

I don't know the reason why it's status in Koperator, but we can change everywhere from status to state to not create confusion in the future if you think it's better.

@panyuenlau
Copy link
Member

@aguzovatii Thank you for the explanation behind this!

I think it is fine in terms of this PR. We will refactor the naming in a separate PR if our team decide to go that way.

bartam1
bartam1 previously approved these changes Jun 10, 2023
@@ -295,6 +295,21 @@ func (mr *MockCruiseControlScalerMockRecorder) Status(ctx interface{}) *gomock.C
return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Status", reflect.TypeOf((*MockCruiseControlScaler)(nil).Status), ctx)
}

// StatusTask mocks base method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-run make mock-generate to generate this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, generated the file in 193a716

@aguzovatii aguzovatii dismissed stale reviews from bartam1 and panyuenlau via 193a716 June 12, 2023 07:16
panyuenlau
panyuenlau previously approved these changes Jun 12, 2023
…nc task instead of the result

When the request on /state endpoint of CC takes longer than webserver.request.maxBlockTimeMs,
then CC will convert the request into an async task and will respond with the taskID instead
of the result. The Koperator should handle this case by waiting for CC to complete the task.
@aguzovatii aguzovatii force-pushed the handle_async_response_from_cc_state_endpoint branch from 193a716 to 02c7fc3 Compare June 14, 2023 15:22
@panyuenlau panyuenlau merged commit 7be5b82 into banzaicloud:master Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Koperator panics when CruiseControl responds with a progress/async response
5 participants