Check that info.NCPU is not zero #29302

Merged: 2 commits merged into elastic:master on Jan 18, 2022

Tacklebox
Contributor

What does this PR do?

This adds a check for 0 when determining the concurrency limit used by mage during a build.

Why is it important?

`docker info -f '{{ json .}}'` exits with 0 even if there was an error, so the check for `err == nil` doesn't catch errors, and the default int value of 0 is used when trying to read `info.NCPU`. This leads to the confusing behaviour where the build stops and hangs forever with no errors and no indication that anything has gone wrong unless you build with `--verbose`.
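
The failure mode described above can be sketched in Go. This is a minimal illustration under stated assumptions, not the PR's actual diff: `dockerInfo` and `concurrencyLimit` are hypothetical names, and only the `NCPU` field and the `err == nil` check come from the description.

```go
package main

import "fmt"

// dockerInfo is a hypothetical, trimmed-down stand-in for the struct that the
// output of `docker info -f '{{ json .}}'` is decoded into.
type dockerInfo struct {
	NCPU int
}

// concurrencyLimit (hypothetical helper) returns the smaller of the host CPU
// count and the Docker host CPU count, ignoring the Docker value when it is
// unusable.
func concurrencyLimit(hostCPUs int, info dockerInfo, err error) int {
	limit := hostCPUs
	// err == nil alone is not enough: a failed daemon connection still exits
	// with status 0 and leaves NCPU at its zero value.
	if err == nil && info.NCPU != 0 && info.NCPU < limit {
		limit = info.NCPU
	}
	return limit
}

func main() {
	fmt.Println(concurrencyLimit(8, dockerInfo{NCPU: 0}, nil)) // daemon unreachable: prints 8
	fmt.Println(concurrencyLimit(8, dockerInfo{NCPU: 4}, nil)) // daemon reachable: prints 4
}
```

Without the `info.NCPU != 0` guard, the first call would return 0, which is the value the description identifies as the cause of the silent hang.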

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Stop the Docker daemon with `systemctl stop docker` and ensure it is not set to start automatically, either by disabling the socket or by masking the service.
Ensure you haven't exported `MAX_PARALLEL` in your shell.
Confirm that the daemon isn't running by executing `docker info -f '{{ json .}}'` and checking that the output looks like

{"ServerErrors":["Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"]...
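
Output of this shape can also be inspected programmatically. The following is a rough sketch only, not code from this PR: `parseDockerInfo` and the trimmed `dockerInfo` struct are hypothetical, the `ServerErrors` key is taken from the output above, and the `NCPU` JSON key is assumed to match the `info.NCPU` field name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// dockerInfo is a hypothetical subset of the JSON printed by
// `docker info -f '{{ json .}}'`.
type dockerInfo struct {
	NCPU         int      `json:"NCPU"` // assumed key name
	ServerErrors []string `json:"ServerErrors"`
}

// parseDockerInfo (hypothetical helper) decodes the JSON and turns any
// reported server errors into a Go error, since the command's exit status
// does not signal them.
func parseDockerInfo(raw []byte) (dockerInfo, error) {
	var info dockerInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return info, err
	}
	if len(info.ServerErrors) > 0 {
		return info, errors.New("docker: " + strings.Join(info.ServerErrors, "; "))
	}
	return info, nil
}

func main() {
	raw := []byte(`{"ServerErrors":["Cannot connect to the Docker daemon"]}`)
	if _, err := parseDockerInfo(raw); err != nil {
		fmt.Println(err)
	}
}
```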

Attempt to build the Elastic Agent: `DEV=true SNAPSHOT=true PLATFORMS=linux/amd64 TYPES=docker mage package`

On main it will silently hang forever.

On this branch, it prints the error `docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock` when it is unable to run the build.

@Tacklebox Tacklebox added the bug label Dec 6, 2021
@Tacklebox Tacklebox requested a review from a team as a code owner December 6, 2021 20:22
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 6, 2021
@mergify
Contributor

mergify bot commented Dec 6, 2021

This pull request does not have a backport label. Could you fix it @Tacklebox? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Dec 6, 2021
@Tacklebox Tacklebox added the Team:Elastic-Agent Label for the Agent team label Dec 6, 2021
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 6, 2021
@elasticmachine
Collaborator

💔 Build Failed

Build stats

  • Start Time: 2021-12-06T20:22:28.873+0000

  • Duration: 120 min 8 sec

  • Commit: e835ec4

Test stats 🧪

Test Results

  • Failed: 0

  • Passed: 48241

  • Skipped: 4267

  • Total: 52508

Steps errors 4

Sleep
  • Took 0 min 1 sec
  • Description: 5
x-pack/metricbeat-unitTest - mage build unitTest
  • Took 4 min 42 sec
  • Description: mage build unitTest
Google Storage Download
  • Took 1 min 20 sec
Checks if running on a Unix-like node
  • Took 0 min 4 sec

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@botelastic

botelastic bot commented Jan 5, 2022

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jan 5, 2022
Contributor

@fearful-symmetry fearful-symmetry left a comment

change LGTM. We'll have to re-merge with master to check CI

@botelastic botelastic bot removed the Stalled label Jan 6, 2022
@kvch kvch merged commit 10f850e into elastic:master Jan 18, 2022
@kvch kvch added the backport-v8.0.0 Automated backport with mergify label Jan 18, 2022
mergify bot pushed a commit that referenced this pull request Jan 18, 2022
(cherry picked from commit 10f850e)
kvch pushed a commit that referenced this pull request Jan 18, 2022
(cherry picked from commit 10f850e)

Co-authored-by: Maxwell Borden <[email protected]>
yashtewari pushed a commit to build-security/beats that referenced this pull request Jan 30, 2022
Labels
backport-skip Skip notification from the automated backport with mergify backport-v8.0.0 Automated backport with mergify bug Team:Elastic-Agent Label for the Agent team