e2e database pod never comes up #14090

Closed
stevekuznetsov opened this issue May 6, 2017 · 11 comments
Labels: component/apps, kind/test-flake, priority/P2

Comments

@stevekuznetsov
Contributor

Running test/end-to-end/core.sh:15: executing 'oc get -n test pods -l name=database' expecting any result and text 'Running'; re-trying every 0.2s until completion or 60.000s...
FAILURE after 59.904s: test/end-to-end/core.sh:15: executing 'oc get -n test pods -l name=database' expecting any result and text 'Running'; re-trying every 0.2s until completion or 60.000s: the command timed out
Standard output from the command:
Standard error from the command:
No resources found.
... repeated 131 times

Seen in this job. @bparees can't really make heads or tails of this but we have all the pod logs and the master log in the artifacts so hopefully we can determine what went wrong.
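
For anyone reading the log without the harness in front of them, the retry behavior it describes can be approximated by a loop like the one below. This is a minimal sketch, not the harness's actual implementation; the function name `try_until_text` and its argument order are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the real test harness code): re-run a command
# until its combined stdout/stderr contains the expected text, or give up
# once the timeout (whole seconds) has elapsed.
try_until_text() {
  local cmd="$1" expected="$2" timeout="${3:-60}" interval="${4:-0.2}"
  local deadline=$(( SECONDS + timeout ))
  local output=""
  while (( SECONDS < deadline )); do
    # Capture output even when the command itself fails.
    output="$(eval "${cmd}" 2>&1)" || true
    if grep -qF -- "${expected}" <<<"${output}"; then
      return 0
    fi
    sleep "${interval}"
  done
  echo "FAILURE: '${cmd}' never produced '${expected}' within ${timeout}s" >&2
  echo "last output: ${output}" >&2
  return 1
}
```

Under these assumptions, the failing check would correspond to `try_until_text "oc get -n test pods -l name=database" "Running" 60 0.2`. Because every attempt printed only "No resources found.", the loop could never match and ran into the timeout.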

@bparees
Contributor

bparees commented May 8, 2017

@stevekuznetsov the etcd dump is pretty much empty. do we have confidence that the etcd dump normally works?

@knobunc can you comment on these errors as seen in the origin.log:

W0506 05:02:49.178089    2247 docker_sandbox.go:263] Couldn't find network status for test/ruby-sample-build-1-build through plugin: invalid network status for

are they benign? is there anything we (as the build pod owners) should be doing differently to avoid them?

the database deploy log appears to show the deployment stuck running the mid-hook pod, even though the DC in question (from the application-template-stibuild.json template) has no mid hook, and there are no logs for those hooks.

@mfojtik seems like a deploy issue: based on the oc get pods output, the DB pod is never even getting created.

@stevekuznetsov
Contributor Author

@deads2k @sttts did we implement etcdv3 dump for these tests? Or are we still just doing v2/keys?recursive=true?
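
To make the difference concrete: etcd keeps its v2 and v3 keyspaces separate, so a v2-only listing shows nothing that was written through the v3 API. A rough sketch of dumping both is below. The endpoint, the function names, and the output paths are assumptions rather than anything taken from the test scripts, and the TLS flags a secured etcd would need are left out.

```shell
#!/usr/bin/env bash
# Sketch only: endpoint and function names are assumptions, and TLS
# options (client cert, key, CA) are omitted for brevity.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://127.0.0.1:2379}"

# v2 keyspace: the recursive HTTP listing mentioned in the question above.
dump_etcd_v2() {
  curl -s "${ETCD_ENDPOINT}/v2/keys/?recursive=true" > "$1"
}

# v3 keyspace: not visible through the v2 API, so it needs its own dump.
# An empty key combined with --prefix matches every key.
dump_etcd_v3() {
  ETCDCTL_API=3 etcdctl --endpoints="${ETCD_ENDPOINT}" get "" --prefix -w json > "$1"
}
```

Each function takes the output file as its only argument, for example `dump_etcd_v3 /tmp/etcd-v3.json`. Note that the JSON output of `etcdctl` base64-encodes keys and values, so the dump needs decoding before it is readable.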

@deads2k
Contributor

deads2k commented May 8, 2017

> @deads2k @sttts did we implement etcdv3 dump for these tests? Or are we still just doing v2/keys?recursive=true?

I don't remember doing it.

@stevekuznetsov
Contributor Author

I'll add it to the post-rebase tasks

@sttts sttts mentioned this issue May 8, 2017
32 tasks
@deads2k
Contributor

deads2k commented May 8, 2017

> I'll add it to the post-rebase tasks

Not a great place for it. etcd3 has been around for ages.

@stevekuznetsov
Contributor Author

Wasn't it turned on by default with the rebase?

@deads2k
Contributor

deads2k commented May 8, 2017

> Wasn't it turned on by default with the rebase?

I thought that happened back in 1.4 or 1.5 and we rolled it back for unrelated reasons.

@stevekuznetsov
Contributor Author

At the time it was first introduced I also asked for this dump, but it became unnecessary when it was turned off. Whoever pulls the trigger to turn it on owns this issue -- you can't turn it on and walk away without making sure the tests are outputting reasonable sets of debugging artifacts. I'm not particularly interested in arguing about who should or should not own this. If you feel strongly and want to make an issue, triage it to someone else, and make sure they understand that it is a high priority, I have no issue with removing the item from the post-rebase task list.

@deads2k
Contributor

deads2k commented May 8, 2017

An issue already exists: #11837 . It's been pre-existing for 6 months. It isn't a 1.6 rebase blocker.

@stevekuznetsov
Contributor Author

Sure, an issue was made at the time to track this. It hasn't been relevant to the product or the tests since then because we have not been using v3. I don't understand your point. It is valid and important now. I wouldn't call it a release blocker, as test infrastructure is not necessarily ever going to be in that category, but as I said before -- the engineers responsible for turning on v3 by default should also do the right thing for the community and update the test infrastructure so other engineers like @bparees can be effective when investigating test failures.

@mfojtik
Contributor

mfojtik commented Oct 11, 2017

Closing as dupe and due to age.

@mfojtik mfojtik closed this as completed Oct 11, 2017