Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore excluded processes from minimum uptime calculation when doing a rolling bounce #1333

Conversation

ltsampros
Copy link
Contributor

@ltsampros ltsampros commented Aug 25, 2022

Description

Ignore processes that are excluded when calculating the minimum uptime.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Discussion

Testing

Unit tests

Documentation

Follow-up

@ltsampros ltsampros marked this pull request as ready for review August 25, 2022 15:22
@ltsampros ltsampros force-pushed the feature/ltsampros/ignore-excluded-processes-when-bouncing branch from 5181f16 to c02da4a Compare August 25, 2022 16:35
Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the early approval had an outdated view of the code.

addressMap[process.Locality[fdbv1beta2.FDBLocalityInstanceIDKey]] = append(addressMap[process.Locality[fdbv1beta2.FDBLocalityInstanceIDKey]], process.Address)

if process.UptimeSeconds < minimumUptime {
if process.UptimeSeconds < minimumUptime && !process.Excluded {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of checking here that the process is not excluded? If the process is excluded we will continue the loop anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was actually referring to the statement in line 62-64 where we check if the process is excluded and if so we will continue the for loop (which is correct) but in line 68 we check again if the process is not excluded which is a case we can't hit based on the current code.

I think there is still one case we have to solve: If a process was manually excluded (but never included again) we potentially hit the case that len(missingAddress) > 0 is true and we requeue forever until the process is included again or marked as removal. I believe the correct solution to this is to remove the if statement in line 62-64 and having the statement process.UptimeSeconds < minimumUptime && !process.Excluded should be enough for our use case. Could you add a unit test to ensure we handle this case properly (I hope I explained it well enough).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was supposed to be removed. Amended.

@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 5768d98dd4ab77975a63eb387b79a73c2701ee39
  • Duration 2:25:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 5181f169793a8fa9b968002165de9a102e9297e1
  • Duration 2:04:05
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: c02da4a05c0808c17604ebab2b13ebcdc47dbf2c
  • Duration 3:44:32
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@ltsampros
Copy link
Contributor Author

Sorry for the early approval had an outdated view of the code.

Apologies for that, I read the code more after submitting and realised the other approach is better.

@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 11020a6556cc7a1116ce2c7db977f876ff70baeb
  • Duration 1:54:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFYI: You can trigger the e2e pipeline by closing and reopening the PR.

addressMap[process.Locality[fdbv1beta2.FDBLocalityInstanceIDKey]] = append(addressMap[process.Locality[fdbv1beta2.FDBLocalityInstanceIDKey]], process.Address)

if process.UptimeSeconds < minimumUptime {
if process.UptimeSeconds < minimumUptime && !process.Excluded {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was actually referring to the statement in line 62-64 where we check if the process is excluded and if so we will continue the for loop (which is correct) but in line 68 we check again if the process is not excluded which is a case we can't hit based on the current code.

I think there is still one case we have to solve: If a process was manually excluded (but never included again) we potentially hit the case that len(missingAddress) > 0 is true and we requeue forever until the process is included again or marked as removal. I believe the correct solution to this is to remove the if statement in line 62-64 and having the statement process.UptimeSeconds < minimumUptime && !process.Excluded should be enough for our use case. Could you add a unit test to ensure we handle this case properly (I hope I explained it well enough).

@ltsampros ltsampros requested a review from johscheuer August 26, 2022 10:12
@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 09ea09832f88ea3a85fd6ea1ae7642404041a0c3
  • Duration 2:12:25
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@ltsampros ltsampros closed this Aug 26, 2022
@ltsampros ltsampros reopened this Aug 26, 2022
@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 06b885034c3076b978f24764b97f1919322b6ed4
  • Duration 1:52:50
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: c9d5e7bc27fd85ed62f01c4052a626c1d44d0285
  • Duration 1:54:38
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@ltsampros ltsampros changed the title Ignore excluded processes when doing a rolling bounce Ignore excluded processes from minimum uptime calculation when doing a rolling bounce Aug 26, 2022
@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: c9d5e7bc27fd85ed62f01c4052a626c1d44d0285
  • Duration 1:57:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Please remove the comment before merging.

for _, process := range status.Cluster.Processes {
addressMap[process.Locality[fdbv1beta2.FDBLocalityInstanceIDKey]] = append(addressMap[process.Locality[fdbv1beta2.FDBLocalityInstanceIDKey]], process.Address)

if process.UptimeSeconds < minimumUptime {
// comment to trigger PR builder
Copy link
Member

@johscheuer johscheuer Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this comment again?

@ltsampros ltsampros force-pushed the feature/ltsampros/ignore-excluded-processes-when-bouncing branch from c605fcd to fbd6fee Compare August 31, 2022 11:14
@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: c605fcd8ab58a9cc989105e8e1b3df0a273a4e90
  • Duration 1:54:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

@ltsampros ltsampros merged commit 42ffb7b into FoundationDB:main Aug 31, 2022
@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: fbd6fee
  • Duration 1:52:33
  • Result: ❌ FAILED
  • Error: Error while executing command: if $(grep -q -- "--- FAIL:" logs/*.log); then echo "TESTS FAILED SEE THESE LOGS:"; echo ; grep -l -- "--- FAIL:" logs/*.log; exit 1; fi. Reason: exit status 1
  • Build Logs (available for 30 days)
  • Build Artifact (available for 30 days)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants