
More resilient DRA packaging #39332

Merged
merged 1 commit into elastic:main on May 1, 2024

Conversation

dliappis
Contributor

@dliappis dliappis commented May 1, 2024

Proposed commit message

Occasionally packaging steps from the DRA pipeline may get stuck[^1]. This causes a breach of the global pipeline timeout (currently 1hr) and cancels the job.

This commit increases the global timeout to 90min, adds one retry per step and limits the runtime per step to 40min (so that a single stuck step doesn't exhaust the entire global timeout).

Finally, we silence Slack notifications if the retry recovered the step.

In a future PR we will consider also adding a daily DRA build to cover for cases where the retries didn't help and there were no subsequent commits to trigger a new build.
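For illustration, the per-step resiliency settings could look like the following minimal sketch, using Buildkite's documented `timeout_in_minutes` and `retry.automatic` step attributes. The step label and command are hypothetical, and the actual changes in `.buildkite/packaging.pipeline.yml` may differ:

```yaml
steps:
  - label: "Package x86_64"                   # hypothetical step name
    command: ".buildkite/scripts/package.sh"  # hypothetical script path
    timeout_in_minutes: 40  # cap a single stuck step well below the global timeout
    retry:
      automatic:
        - exit_status: "*"  # retry on any failure...
          limit: 1          # ...but only once per step
```

The 90-minute global timeout is not part of the step YAML; it is typically raised in the pipeline's own settings (for this repo, presumably via catalog-info.yaml, which the backports below also touched).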

Tests

Test Buildkite build using this PR targeting main: https://buildkite.com/elastic/beats-packaging-pipeline/builds/119

which eventually succeeded despite a stuck step:

(Screenshot: the Buildkite build, showing the stuck step succeeding after an automatic retry.)

Related issues

Footnotes

[^1]: https://buildkite.com/elastic/beats-packaging-pipeline/builds/114
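For the notification change, one possibility is Buildkite's conditional notifications: a build whose failing step recovers on retry still finishes in the passed state, so notifying only on failed builds keeps Slack quiet. A minimal sketch with a hypothetical channel name; the PR may implement the silencing differently:

```yaml
notify:
  - slack: "#beats-build-notifications"  # hypothetical channel
    if: build.state == "failed"  # a step recovered by a retry leaves the
                                 # build passed, so no message is sent
```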

@dliappis dliappis requested review from alexsapran and pazone May 1, 2024 09:59
@dliappis dliappis self-assigned this May 1, 2024
@dliappis dliappis requested a review from a team as a code owner May 1, 2024 09:59
@botelastic botelastic bot added and then removed the needs_team label (Indicates that the issue/PR needs a Team:* label) May 1, 2024
Contributor

mergify bot commented May 1, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @dliappis? 🙏.
To do so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, add the backport labels for the needed
branches, for example:

  • backport-v8.<minor>.0 is the label to automatically backport to the 8.<minor> branch, where <minor> is the minor version digit

@elasticmachine
Collaborator

💚 Build Succeeded


Build stats

  • Duration: 15 min 42 sec

❕ Flaky test report

No tests were executed to analyse.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@dliappis dliappis added the backport-7.17, backport-v8.13.0, and backport-v8.14.0 labels (automated backports with mergify) May 1, 2024
@dliappis dliappis merged commit 726f6e9 into elastic:main May 1, 2024
12 checks passed
mergify bot pushed a commit that referenced this pull request May 1, 2024
(cherry picked from commit 726f6e9; conflicts in .buildkite/packaging.pipeline.yml and catalog-info.yaml)

mergify bot pushed a commit that referenced this pull request May 1, 2024
(cherry picked from commit 726f6e9; conflicts in .buildkite/packaging.pipeline.yml and catalog-info.yaml)

mergify bot pushed a commit that referenced this pull request May 1, 2024
(cherry picked from commit 726f6e9; conflicts in catalog-info.yaml)

dliappis added a commit that referenced this pull request May 1, 2024
(cherry picked from commit 726f6e9; co-authored by Dimitrios Liappis)

dliappis added a commit that referenced this pull request May 3, 2024
(cherry picked from commit 726f6e9; co-authored by Dimitrios Liappis)

dliappis added a commit that referenced this pull request May 7, 2024
(cherry picked from commit 726f6e9; co-authored by Dimitrios Liappis)
Labels

  • backport-7.17: Automated backport to the 7.17 branch with mergify
  • backport-v8.13.0: Automated backport with mergify
  • backport-v8.14.0: Automated backport with mergify
  • ci
  • Team:Ingest-EngProd
3 participants