#2488 Update vendoring to remove engine-api #2492

michelvocks · 2016-09-24T17:01:44Z

This PR is a huge one. Tell me if I should split it up somehow.

What I've done

Changed imports from engine-api to docker/api/type or docker/clients
Updated whole vendoring (not sure if this was a good idea)
docker/docker/pkg/version has been replaced upstream with docker/api/types/versions
Docker upstream docker/pkg/tlsconfig has been updated and does not work anymore with golang 1.5 (only 1.6 and higher is supported). Function "Clone" is only implemented for 1.6 or higher. Therefore, I upgraded godeps master version to golang 1.7.
cluster/event_monitor.go was changed to be compatible with the changed Events-API interface.

What needs to be done?

Some tests are currently failing. Working on this.
Golang needs to be changed to 1.7 in the docs.

This fixes #2488

ping @nishanttotla @dongluochen @allencloud

Signed-off-by: Michel Vocks [email protected]

michelvocks · 2016-09-26T18:53:34Z

Fixed all unit tests. Most CIs are failing because go 1.6 or higher is required (by upstream). PTAL @nishanttotla

nishanttotla · 2016-10-13T20:41:17Z

Ping @michelvocks sorry for the delay on this. We'd like to pick it up again and get it ready soon. Could you please fix the merge conflict?

michelvocks · 2016-10-17T07:38:12Z

@nishanttotla No issue. I've fixed the merge conflict.

dongluochen · 2016-10-18T02:02:11Z

Build is failing. I'm tracking similar problem in #2522.

++ time go build -o /tmp/tmp.1rpt7MLm4W ../..
# github.com/docker/swarm/vendor/github.com/docker/docker/client
../../vendor/github.com/docker/docker/client/hijack.go:144: undefined: "github.com/docker/swarm/vendor/github.com/docker/docker/pkg/tlsconfig".Clone
Build step 'Execute shell' marked build as failure

dongluochen · 2016-10-18T06:46:18Z

github.com/docker/docker/pkg/tlsconfig defines Clone() for go1.6+. Docker/swarm is using go1.5.4. #2522 bumps it to go1.7.1. After that we can use Clone() in github.com/docker/docker/pkg/tlsconfig.

nishanttotla · 2016-10-19T23:42:02Z

@michelvocks sorry for the delay on this, we're updating CI and Go version right now. This is needed to fix some failing tests. I'll ping you when that is done. You'll have to rebase one more time.

dongluochen · 2016-10-20T00:59:13Z

@michelvocks We have updated Go to 1.7.1. Please rebase your PR.

michelvocks · 2016-10-20T09:26:27Z

Thanks. Will do that now.

michelvocks · 2016-10-20T09:50:36Z

I've rebased my PR.

dongluochen · 2016-10-20T21:46:49Z

Something is not right with godep. Godep is using go1.6 in your PR. But I think it's something else causes the failure.

I can run the following command locally with your PR to get the error.

dchen@vm2:~/go/src/github.com/docker/swarm$ ./test/integration/run.sh
Using default tag: latest
...
++ chmod +x /usr/local/bin/docker-compose
++ time go build -o /tmp/tmp.4eCkwdBxZb ../..
# github.com/docker/swarm/vendor/github.com/docker/docker/client
../../vendor/github.com/docker/docker/client/hijack.go:144: undefined: "github.com/docker/swarm/vendor/github.com/docker/docker/pkg/tlsconfig".Clone

michelvocks · 2016-10-21T13:49:48Z

@dongluochen I've set godep to golang 1.6 (like described in my initial PR message). That was before you decided to switch golang to 1.7. Should I change godep to 1.7?

I'm also not able to reproduce the error you have mentioned. Are you sure that you use golang 1.6 or higher?

dongluochen · 2016-10-21T17:48:10Z

@michelvocks Yes please upgrade it to go1.7.1 as docker/docker is already on it.

If you open the Jenkins Details links below you can see the same error.

michelvocks · 2016-10-22T09:46:03Z

@dongluochen I've changed godeps to 1.7 golang version.

The CI jobs are still failing (like you said). I think this is related to the dockerswarm environment image. This image is used by the CI jobs and there is still golang 1.5 installed. I think we have to upgrade this.

dongluochen · 2016-10-24T18:44:01Z

@michelvocks You are right. We have updated dockerswarm environment image.

michelvocks · 2016-10-24T19:04:01Z

Great. Can you restart the CI jobs?

dongluochen · 2016-10-25T00:46:02Z

@michelvocks There is something wrong with this change. I can reproduce the test failure in my machine. It looks like node-0 failure would not move the node to Unhealthy state.

# time="2016-10-24T19:08:09Z" level=debug msg="Engine refresh failed" id="QE24:GSA5:RVLO:VV2Z:2F6Y:OXYK:CLBH:HOQ2:WTNO:M5DQ:6BHP:PYQ4" name=node-0 
# time="2016-10-24T19:08:09Z" level=debug msg="Engine update succeeded" id="2AAU:NJNU:Q4NK:M7W4:BBHU:3MDW:TLGM:LAA4:M3RU:6LJQ:ZH7V:LJTD" name=node-1 
# time="2016-10-24T19:08:09Z" level=debug msg="Start monitoring events" id="QE24:GSA5:RVLO:VV2Z:2F6Y:OXYK:CLBH:HOQ2:WTNO:M5DQ:6BHP:PYQ4" name=node-0 
# time="2016-10-24T19:08:09Z" level=debug msg="Start monitoring events" id="QE24:GSA5:RVLO:VV2Z:2F6Y:OXYK:CLBH:HOQ2:WTNO:M5DQ:6BHP:PYQ4" name=node-0 
# Command "eval docker_swarm info | grep -q 'Unhealthy'" failed 5 times. Output:

michelvocks · 2016-10-25T12:12:18Z

@dongluochen Thanks. I found the bug. It seems like the string comparison in the file cluster/engine.go was not correct anymore:

// dockerclient defines ErrConnectionRefused error. but if http client is from swarm, it's not using
    // dockerclient. We need string matching for these cases. Remove the first character to deal with
    // case sensitive issue.
    // docker/api returns ErrConnectionFailed error, so we check for that as long as dockerclient exists
    if err == dockerclient.ErrConnectionRefused ||
        err == engineapi.ErrConnectionFailed ||
        strings.Contains(err.Error(), "onnection refused") ||
        strings.Contains(err.Error(), "annot connect to the docker engine endpoint") {
        // each connection refused instance may increase failure count so
        // engine can fail fast. Short engine freeze or network failure may result
        // in engine marked as unhealthy. If this causes unnecessary failure, engine
        // can track last error time. Only increase failure count if last error is
        // not too recent, e.g., last error is at least 1 seconds ago.
        e.incFailureCount()
        // update engine error message
        e.setErrMsg(err.Error())
        return
    }

It compares the error message (e.g. cannot connect to the docker engine endpoint) with a hardcoded string. This string (the error message) has been changed within the last API change from "cannot connect to the docker engine endpoint" to "Cannot connect to the Docker daemon at...". Therefore, the compare was always wrong.

Edit: Still have to work for the master CI. Will try to investigate the issue.

michelvocks · 2016-11-23T17:34:02Z

@nishanttotla Welcome! ☺️
Yes, correct. It seems like the master-branch has problems with the changes from the PR #2541.
If you want I can have a look at this but it will take time to figure out what's going on there.

I fixed the issue with the engine-1.9, engine-1.10 and engine-1.11. I also fixed an issue with one of the tests on the master-branch. A new line "Minimum API version" has been added to the output of docker version command which led to a failure for one of the tests. I'm not quite sure why this test is still failing. Manually reproducing this test within an docker development environment works now. Maybe the test environment/image needs an update?

nishanttotla · 2016-11-23T21:52:41Z

@michelvocks actually it looks great! There was some issue with the engine-master tests, so I've restarted the CI.

nishanttotla · 2016-11-23T22:26:38Z

@michelvocks in addition to the API version test and leader election, several other engine-master tests are failing as well. Do you have some idea why?

michelvocks · 2016-11-24T11:47:04Z

@nishanttotla I was able to fix the api version test. I had to remove whitespaces before doing the comparison.
Additionally, I was able to figure out why the other tests are failing. The image affinity scheduling does not work. I'm currently investigating this issue.

…API changes included. Signed-off-by: Michel Vocks <[email protected]>

michelvocks · 2016-11-25T23:12:16Z

@nishanttotla @dongluochen
After another full day of work on this PR I think it is now ready to merge (at least from my side).
I was able to solve all issues related to engine-master especially the leader-election tests.

leader-election tests
replica.go holds an array of localRoutes which should be handled locally. If a request arrives which is not in the list it will be hijacked and routed to the swarm primary manager. Due to the fact that the docker engine 1.13 client now uses first the /_ping endpoint before serving the /info endpoint, the /info request has been always routed to the primary swarm manager. The solution was quite easy, I just added the /_ping endpoint to the localRoutes array. Therefore, the test about checking if this daemon is a replica was always false because the request has been always hijacked and forwarded to the primary swarm manager.

image scheduling affinity
Some tests were failing because the scheduler couldn't apply the image affinity. I found out that the cluster.go file included an error message check which has been changed within docker 1.13 from image xyz not found to repository xyz not found. That check was hindering the scheduler to find the correct node where the image is already existing. The solution was to add another check for the new expected error message.

Docker run with --rm flag
There was one check where a container has been started with the --rm-flag. I found out that the docker daemon always throws the following error when running a simple container error removing container: Error response from daemon: Container 52839ba6b7f16d2ff8a92f19eeb2077205e7d3d2aa04fe1941836387c127ed42 not found. I think this error comes from the fact that the container execution is really fast and the container stops itself. This error is redirected to the client and with docker 1.13 it will be also printed to the user for some reason. The expected test output was before Hello Args and now with the new client it is Hello ArgsERR[0001] error removing container.... For now I just enhanced the test that it can handle this behavior because I'm not sure why error messages now are directly printed to the user.

Tests which are still failing
engine-master 38 strategy binpack
This is a flaky test. If you compare this test with the results from the last runs you will see that it is randomly failing.

wsong · 2016-11-28T17:24:39Z

@michelvocks Thanks for looking into the leader-election tests. I've spent a fair bit of time looking into those too, and I've seen precisely the HTTP hijacking issue you've seen with /info. I actually had a PR out to fix it: #2526

But that PR itself had some issues that were difficult to resolve. I'm glad that you've also seen this issue, though, since it gives us some more data points.

Anyways, the reason why /_ping isn't in localRoutes is because of #2539, where I changed /_ping to forward to the primary so that it could return an error code if the primary was down. I've sent out #2554 to maintain this behavior while avoiding the primary-forwarding behavior that's messing you up here.

nishanttotla · 2016-11-28T17:28:51Z

@michelvocks 🎉 🎉 🎉 great job, thanks a lot! The binpack test has been flaky in the past, and I'm not surprised it's being flaky now.

@wsong we'll review your PR and merge it as well, thanks!

cc @dongluochen

dongluochen · 2016-11-28T21:50:51Z

Thanks @michelvocks.

LGTM.

@wsong We will need to prioritize this PR. We can fix /_ping after this.

michelvocks · 2016-11-29T17:50:20Z

@wsong Good point! Didn't noticed that. 😊
@dongluochen @nishanttotla Welcome! 😇

nishanttotla · 2016-11-29T22:43:18Z

I kicked the tests one more time just to be sure, and it seems like there's a failure on engine-1.11

not ok 5 label affinity in parallel
# (from function `retry' in file ./helpers.bash, line 80,
#  from function `wait_until_reachable' in file ./helpers.bash, line 85,
#  from function `start_docker' in file ./helpers.bash, line 222,
#  from function `start_docker_with_busybox' in file ./helpers.bash, line 181,
#  in test file ./affinities.bats, line 148)
#   `start_docker_with_busybox 2' failed
# Command "docker -H 127.0.0.1:5014 info" failed 15 times. Output: Cannot connect to the Docker daemon. Is the docker daemon running on this host?
# Stopping 9a8c01894077df74d3ba042b15685f1e61b40e42f8ddab3e10cd57ac96e7db5b
# Stopping ea9d6fd34cd76e40effa2cb34328687e71d8b23a88de9e6321f3c876aedb7a77

I can't remember if this was seen before?

It might be another error message mismatch, so we'll see, but otherwise LGTM.

michelvocks · 2016-11-30T11:34:54Z

@nishanttotla This must be a flaky test. The test fails at the beginning with the helper function start_docker_with_busybox 2. This helper function is used in 80% of all tests (and also in the test before and after this test). If you restart the tests I think this one will not fail again.

nishanttotla · 2016-11-30T21:34:07Z

@michelvocks you're right. Most of the failures are flaky at this point. I think we'll merge this PR.

nishanttotla · 2016-11-30T21:36:21Z

Merged 🎉 🎉 🎉

@michelvocks once again thanks a lot for your effort.

@wsong please rebase your other PRs on top of this one and we'll look at them.

wsong · 2016-11-30T21:37:27Z

@michelvocks Thanks for the hard work!

GordonTheTurtle added the status/0-triage label Sep 24, 2016

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from 8d15522 to f315db7 Compare September 26, 2016 18:51

michelvocks changed the title ~~[WIP] #2488 Update vendoring to remove engine-api~~ #2488 Update vendoring to remove engine-api Sep 26, 2016

nishanttotla added this to the 1.2.6 milestone Sep 26, 2016

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from f315db7 to 0d6faa1 Compare September 28, 2016 14:44

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from 0d6faa1 to 704dc98 Compare October 17, 2016 07:36

allencloud added area/API status/1-design-review and removed status/0-triage labels Oct 17, 2016

dongluochen mentioned this pull request Oct 19, 2016

Bump Go to 1.7.1 #2521

Merged

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from 704dc98 to a9fd789 Compare October 20, 2016 09:48

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from a9fd789 to 07d1cec Compare October 22, 2016 09:37

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from 07d1cec to 149c072 Compare October 25, 2016 12:14

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from cd51b3e to a742008 Compare November 24, 2016 11:07

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch 4 times, most recently from ee880c7 to 5bf3bd5 Compare November 25, 2016 21:02

Fixes docker-archive#2488. Updated godep golang version to 1.7. Some …

8c1f769

…API changes included. Signed-off-by: Michel Vocks <[email protected]>

michelvocks force-pushed the 2488-update-vendoring-to-remove-engine-api branch from 5bf3bd5 to 8c1f769 Compare November 25, 2016 22:45

wsong mentioned this pull request Nov 28, 2016

Don't forward _ping requests to the primary #2554

Closed

nishanttotla added the priority/P1 label Nov 28, 2016

nishanttotla added priority/P0 and removed priority/P1 labels Nov 29, 2016

nishanttotla merged commit 5a254ec into docker-archive:master Nov 30, 2016

wsong mentioned this pull request Nov 30, 2016

Make sure to catch connection errors properly when checking engine he… #2550

Closed

nishanttotla mentioned this pull request Dec 1, 2016

Add a node whitelist filter #2551

Merged

This was referenced Dec 28, 2016

docker: Error response from daemon: json: cannot unmarshal object into Go value of type [2]int. #2530

Closed

fix event monitor restart failed #2547

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#2488 Update vendoring to remove engine-api #2492

#2488 Update vendoring to remove engine-api #2492

michelvocks commented Sep 24, 2016 •

edited

Loading

michelvocks commented Sep 26, 2016

nishanttotla commented Oct 13, 2016

michelvocks commented Oct 17, 2016

dongluochen commented Oct 18, 2016

dongluochen commented Oct 18, 2016

nishanttotla commented Oct 19, 2016

dongluochen commented Oct 20, 2016

michelvocks commented Oct 20, 2016 •

edited

Loading

michelvocks commented Oct 20, 2016

dongluochen commented Oct 20, 2016

michelvocks commented Oct 21, 2016

dongluochen commented Oct 21, 2016

michelvocks commented Oct 22, 2016

dongluochen commented Oct 24, 2016

michelvocks commented Oct 24, 2016

dongluochen commented Oct 25, 2016

michelvocks commented Oct 25, 2016 •

edited

Loading

michelvocks commented Nov 23, 2016 •

edited

Loading

nishanttotla commented Nov 23, 2016

nishanttotla commented Nov 23, 2016

michelvocks commented Nov 24, 2016 •

edited

Loading

michelvocks commented Nov 25, 2016 •

edited

Loading

wsong commented Nov 28, 2016

nishanttotla commented Nov 28, 2016

dongluochen commented Nov 28, 2016

michelvocks commented Nov 29, 2016

nishanttotla commented Nov 29, 2016

michelvocks commented Nov 30, 2016

nishanttotla commented Nov 30, 2016

nishanttotla commented Nov 30, 2016

wsong commented Nov 30, 2016

#2488 Update vendoring to remove engine-api #2492

#2488 Update vendoring to remove engine-api #2492

Conversation

michelvocks commented Sep 24, 2016 • edited Loading

michelvocks commented Sep 26, 2016

nishanttotla commented Oct 13, 2016

michelvocks commented Oct 17, 2016

dongluochen commented Oct 18, 2016

dongluochen commented Oct 18, 2016

nishanttotla commented Oct 19, 2016

dongluochen commented Oct 20, 2016

michelvocks commented Oct 20, 2016 • edited Loading

michelvocks commented Oct 20, 2016

dongluochen commented Oct 20, 2016

michelvocks commented Oct 21, 2016

dongluochen commented Oct 21, 2016

michelvocks commented Oct 22, 2016

dongluochen commented Oct 24, 2016

michelvocks commented Oct 24, 2016

dongluochen commented Oct 25, 2016

michelvocks commented Oct 25, 2016 • edited Loading

michelvocks commented Nov 23, 2016 • edited Loading

nishanttotla commented Nov 23, 2016

nishanttotla commented Nov 23, 2016

michelvocks commented Nov 24, 2016 • edited Loading

michelvocks commented Nov 25, 2016 • edited Loading

wsong commented Nov 28, 2016

nishanttotla commented Nov 28, 2016

dongluochen commented Nov 28, 2016

michelvocks commented Nov 29, 2016

nishanttotla commented Nov 29, 2016

michelvocks commented Nov 30, 2016

nishanttotla commented Nov 30, 2016

nishanttotla commented Nov 30, 2016

wsong commented Nov 30, 2016

michelvocks commented Sep 24, 2016 •

edited

Loading

michelvocks commented Oct 20, 2016 •

edited

Loading

michelvocks commented Oct 25, 2016 •

edited

Loading

michelvocks commented Nov 23, 2016 •

edited

Loading

michelvocks commented Nov 24, 2016 •

edited

Loading

michelvocks commented Nov 25, 2016 •

edited

Loading