Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Docker Compose container execution #687

Merged
merged 6 commits into from
Feb 9, 2022
Merged

Improve Docker Compose container execution #687

merged 6 commits into from
Feb 9, 2022

Conversation

endorama
Copy link
Member

@endorama endorama commented Feb 4, 2022

This PR has been extracted from #665.

When elastic-package runs docker containers it relies on WaitForHealthy() to ensure expected containers are running.

This functions has a missing feature: timing out if the condition is not met.

This PR adds this feature, adding a 10 minute timeout after which an error is returned.

To prevent waiting time waiting for a condition that can never be met a check is added for unrecoverable termination condition (container is reported as exited and with an exit code greater than 0) to immediately exit with an error, skipping waiting until timeout expiration.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 4, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-02-09T09:47:30.939+0000

  • Duration: 30 min 25 sec

Test stats 🧪

Test Results
Failed 0
Passed 520
Skipped 0
Total 520

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

continue
}

if containerDescription.State.Status == "exited" && containerDescription.State.ExitCode > 0 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same line have been added in 8563a1c#diff-8019656c843609e3cb15ca57ab96dc41fdadae8215d0ff22d9b989f795aef4a4 so this PR will need to be rebased on top of main once #688 is merged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

`WaitForHealthy()` does not implement any logic to timeout if the
healthy condition is never met.

By adding a timeout we make sure this function ends if the expected
condition cannot be met in a certain (known) time frame.
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticker needs to be stopped, for the rest it LGTM 👍

@@ -275,54 +275,58 @@ func (p *Project) WaitForHealthy(opts CommandOptions) error {
return err
}

timeout := time.After(10 * time.Minute)
ticker := time.NewTicker(1 * time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ticker := time.NewTicker(1 * time.Second)
ticker := time.NewTicker(1 * time.Second)
defer ticker.Stop()

internal/compose/compose.go Show resolved Hide resolved
@@ -275,54 +275,58 @@ func (p *Project) WaitForHealthy(opts CommandOptions) error {
return err
}

timeout := time.After(10 * time.Minute)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest to use context.WithTimeout(), so we can also interrupt calls to docker if they hang, but leveraging this context would require more changes in internal/docker.

Comment on lines 290 to 291
case <-ticker.C:
healthy := true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Do nothing and continue in this case, to have the main part of the loop in the upper level.

Suggested change
case <-ticker.C:
healthy := true
case <-ticker.C:
}
healthy := true

internal/compose/compose.go Show resolved Hide resolved
@@ -275,54 +275,58 @@ func (p *Project) WaitForHealthy(opts CommandOptions) error {
return err
}

timeout := time.After(10 * time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this timeout to const at the top of the file and name it accordingly.

}
select {
case <-timeout:
return errors.New("time out waiting for healthy container")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: timeout waiting...

if err != nil {
return err
}
select {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me challenge this and consider another approach.

We already have "for" loop with a 1-sec delay per iteration. The function can store time.Now() and just compare it if the startTime.Add(timeout) is greater than time.Now. We don't need to add channels/select/ticker. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time searching differences between time.Sleep and time.Tick, and the only thing I found is that a tick tries to guarantee the interval, while sleep is unaware of execution times. There seems to be no difference in performance for the 2 implementations.

In general I prefer the ticker approach as it's behaviour is explicit, but in this case I don't see particular benefits as guaranteeing the interval is not a concern.

I'll rework it with your approach.

Copy link
Member

@jsoriano jsoriano Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the ticker and select approach too. It also works better than time arithmetic when you have multiple sources of events, as may happen if/when we have contexts here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can wait to have contexts here, not a strong opinion 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here as adding a for-select block will introduce another nesting level and make the function hard to read. TBH I would follow the KISS rule, so if it's possible to solve with a single if-clause, I would go for it.

In general, we can consider refactoring of this area, as there is a similar logic in the SIGINT function. It would be great, If we can refactor all parts (SIGINT, timeout, waiter logic, contexts). You have my sword :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By running a quick search I found there are 5 for { loops in the codebase. I think it would make a lot of sense to generalize the approach, so interval/timeout/signals are handled consistently.

Are you ok to keep it for a future PR? I already have a ticker-based Waiter struct implementation to use as a baseline, but I'm blocked on other work streams so I would prefer to push this refactor to the back burner for a bit.

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you ok to keep it for a future PR? I already have a ticker-based Waiter struct implementation to use as a baseline, but I'm blocked on other work streams so I would prefer to push this refactor to the back burner for a bit.

Sure, we can leave it in the backlog, that what I meant. Let's focus on making this PR healthy.

timeout := time.Duration(waitForHealthyTimeout)
startTime := time.Now()
timeoutTime := startTime.Add(timeout)
interval := time.Duration(1 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can push this one to consts too.

@@ -275,20 +277,23 @@ func (p *Project) WaitForHealthy(opts CommandOptions) error {
return err
}

timeout := time.Duration(waitForHealthyTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can push this one as var to the top of the file and name accordingly.

}

if !healthy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about shuffling around with healthy var? I think it broke down system tests. You can look at the original implementation and look for the bug or revert it, and leave the timeout improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bug is obvious, I misinterpreted the logic. healthy must be reinitialized at each loop, otherwise once it become false the logic does not change it's value anymore. I tried to use it to detect the timeout, but I stumbled upon the bug. The current implementation is clearer and uses times instead of a side effect on the healthy value.

healthy := true

logger.Debugf("Wait for healthy containers: %s", strings.Join(containerIDs, ","))
descriptions, err := docker.InspectContainers(containerIDs...)
if err != nil {
return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's leave this empty line

Comment on lines 26 to 30
// waitForHealthyTimeout is the maximum duration for WaitForHealthy().
var waitForHealthyTimeout = time.Duration(10 * time.Minute)

// waitForHealthyInterval is the check interval for WaitForHealthy().
const waitForHealthyInterval = time.Duration(1 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can make those consistent:

const (
   waitForHealthyTimeout = ...
   waitForHealthyInterval = ...
)

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything against for {s, I am ok with leaving them if they make sense 🙂

@@ -23,6 +23,12 @@ import (
"github.com/elastic/elastic-package/internal/signal"
)

// waitForHealthyTimeout is the maximum duration for WaitForHealthy().
var waitForHealthyTimeout = time.Duration(10 * time.Minute)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. No need to convert to time.Duration, a number multiplying a duration is already a duration.

containerIDs := strings.Split(strings.TrimSpace(b.String()), "\n")
for {
for time.Now().Before(timeout) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Break the loop directly with a return so we don't have to remember to check for the timeout again after the loop? (And we are consistent on how we handle other end conditions as SIGINT).

Suggested change
for time.Now().Before(timeout) {
for {
if time.Now().After(timeout) {
return errors.New("timeout waiting for healthy container")
}

@mtojek mtojek requested review from jsoriano and mtojek February 9, 2022 10:04
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jsoriano jsoriano merged commit b5bc909 into elastic:main Feb 9, 2022
@endorama endorama deleted the compose-wait-for-healthy branch February 9, 2022 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants