
Flake deploymentconfigs with test deployments [Conformance] [It] should run a deployment to completion and then scale to zero #13980

Closed
abutcher opened this issue May 1, 2017 · 16 comments
Labels: component/apps, kind/test-flake, lifecycle/rotten, priority/P1

Comments

@abutcher (Member) commented May 1, 2017

• Failure [26.555 seconds]
deploymentconfigs
/go/src/github.com/openshift/origin/test/extended/deployments/deployments.go:908
  with test deployments [Conformance]
  /go/src/github.com/openshift/origin/test/extended/deployments/deployments.go:309
    should run a deployment to completion and then scale to zero [It]
    /go/src/github.com/openshift/origin/test/extended/deployments/deployments.go:308

    Expected
        <string>: --> pre: Running hook pod ...
        test pre hook executed
        --> pre: Success
        --> Scaling deployment-test-1 to 2
        --> Waiting up to 10m0s for pods in rc deployment-test-1 to become ready
    to contain substring
        <string>: --> Success

    /go/src/github.com/openshift/origin/test/extended/deployments/deployments.go:256

https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/87/consoleFull#-210271339656cbb9a5e4b02b88ae8c2f77

@enj (Contributor) commented May 16, 2017

@bparees (Contributor) commented May 19, 2017

@smarterclayton (Contributor) commented:

This is now the #1 break in the queue and in testing on master.

@mfojtik (Contributor) commented May 29, 2017

@smarterclayton ok, this code does oc logs -f pods/deployer to collect the logs. It stores the output of that command in the out variable when the -f drops (which happens when the deployer pod is gone/complete). It seems like either (see the sketch after the list):

  1. We are missing a buff.sync() somewhere to flush all of the logs into out, or
  2. Something changed in oc logs -f and we no longer get the complete logs from the pod, or we miss the last line of the log output (did something change upstream re: logging?).
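
A minimal sketch of that log-collection pattern, assuming a hypothetical helper; the namespace/pod arguments and error handling are illustrative, not the exact code in test/extended/deployments/deployments.go:

```go
package deployments

import (
	"fmt"
	"os/exec"
	"strings"
)

// deployerLogsContainSuccess follows the deployer pod logs with `oc logs -f`
// and reports whether the captured output contains the "--> Success" marker.
// If the final lines are dropped before the command returns (for example by
// journald rate limiting on the node), the marker never shows up in `out`
// even though the rollout itself completed.
func deployerLogsContainSuccess(namespace, deployerPod string) (bool, string, error) {
	cmd := exec.Command("oc", "logs", "-f", "-n", namespace, "pods/"+deployerPod)
	outBytes, err := cmd.CombinedOutput() // returns once -f drops
	out := string(outBytes)
	if err != nil {
		return false, out, fmt.Errorf("oc logs failed: %v", err)
	}
	return strings.Contains(out, "--> Success"), out, nil
}
```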

@mfojtik (Contributor) commented May 29, 2017

I ran this test in a loop for some time and had no luck reproducing it :/

From the log it also seems that the rollout completed successfully: there is no error in the third RC, and the rollout took a reasonable amount of time as well.

@mfojtik (Contributor) commented May 29, 2017

@Kargakis @smarterclayton ok, the only reasonable way to fix this and unblock the queue is to use the deployment config condition to check for success (I can check NewRcAvailableReason).

I will add a FIXME there for now and will continue investigating.

PR: #14395
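
A minimal sketch of what checking the condition instead of the log text could look like; the types below are simplified stand-ins, not the actual origin apps API, and the reason string mirrors the assumed value of NewRcAvailableReason:

```go
package deployments

// deploymentCondition is a simplified stand-in for the condition entries in a
// DeploymentConfig status; the real types live in the origin apps API.
type deploymentCondition struct {
	Type   string // e.g. "Progressing"
	Status string // "True" / "False"
	Reason string // e.g. "NewReplicationControllerAvailable"
}

// rolloutComplete reports success based on the Progressing condition's reason
// rather than grepping the deployer logs for "--> Success".
func rolloutComplete(conditions []deploymentCondition) bool {
	for _, cond := range conditions {
		if cond.Type == "Progressing" &&
			cond.Status == "True" &&
			cond.Reason == "NewReplicationControllerAvailable" {
			return true
		}
	}
	return false
}
```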

@mfojtik (Contributor) commented May 30, 2017

This is a journald rate-limiter issue; as a band-aid we are going to disable the --> Success check in the logs (we already check the conditions/reason to make sure the rollout completed successfully).

Moving this to P2 just to track this once the rate-limiter is fixed.
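
For context, journald's per-service rate limiting is what drops bursts of deployer log output on the node. A relaxation in /etc/systemd/journald.conf would look roughly like this; the values are illustrative, not what CI actually ran with:

```ini
# /etc/systemd/journald.conf (illustrative values)
[Journal]
# Keep the rate-limit window short and raise the burst size so bursts of
# deployer pod output (including the final "--> Success" line) are not dropped.
RateLimitInterval=1s
RateLimitBurst=10000
```

The change takes effect after restarting the journal daemon (systemctl restart systemd-journald).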

@smarterclayton (Contributor) commented:

Still happening

@bparees (Contributor) commented Aug 31, 2017

> Moving this to P2 just to track this once the rate-limiter is fixed.

@mfojtik was the rate-limiter ever fixed?

@mfojtik (Contributor) commented Aug 31, 2017

@bparees it was tweaked for some tests, but not for the deployment tests... so no, it was not fixed (the original issue is still open).

@tnozicka (Contributor) commented:

@mfojtik close in favor of #17747?

@openshift-bot (Contributor) commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Mar 13, 2018
@openshift-bot (Contributor) commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Apr 12, 2018
@openshift-bot (Contributor) commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
