-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checkpoint/Restore test fixes #11955
Checkpoint/Restore test fixes #11955
Conversation
@edsantiago PTAL |
My only concern is that perhaps the 3-second timeout (w/ 5 retries) is too long. However, the code also isn't waiting for a 'ready' signal from the container (can it?), and the |
ee739d7
to
40ef16c
Compare
Switched to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one suggestion.
LGTM |
40ef16c
to
b33845f
Compare
/lgtm |
/hold |
/hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issues I see are mostly modifying a known good port. Those new ports could be used elsewhere which cause CI flakes. Generate all "new" port numbers using GetRandomPort()
test/e2e/checkpoint_test.go
Outdated
fmt.Fprintf(os.Stderr, "Trying to connect to redis server at localhost:%d", orginalPort) | ||
// Open a network connection to the redis server via initial port mapping | ||
conn, err := net.DialTimeout("tcp4", fmt.Sprintf("localhost:%d", orginalPort), time.Duration(3)*time.Second) | ||
Expect(err).To(BeNil()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expect(err).To(BeNil()) | |
Expect(err).ShouldNot(HaveOccurred()) |
test/e2e/checkpoint_test.go
Outdated
} | ||
fmt.Fprintf(os.Stderr, "Trying to reconnect to redis server at localhost:%d", orginalPort+100) | ||
conn, err = net.DialTimeout("tcp4", fmt.Sprintf("localhost:%d", orginalPort+100), time.Duration(3)*time.Second) | ||
Expect(err).To(BeNil()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expect(err).To(BeNil()) | |
Expect(err).ShouldNot(HaveOccurred()) |
test/e2e/checkpoint_test.go
Outdated
localRunString := getRunString([]string{"-p", "1234:6379", "--rm", redis}) | ||
port, err := utils.GetRandomPort() | ||
Expect(err).ShouldNot(HaveOccurred()) | ||
orginalPort := port + GinkgoParallelProcess() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, you get a random port (that is "safe") and then add something to it?
test/e2e/checkpoint_test.go
Outdated
} | ||
|
||
fmt.Fprintf(os.Stderr, "Trying to connect to redis server at localhost:%d", orginalPort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugging information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, the test was hanging just after this point, so knowing what it was trying to do may also be helpful in the future. Ginkgo doesn't give us this detail otherwise 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. It was not added for debugging just for better feedback during the test to know what is happening.
test/e2e/checkpoint_test.go
Outdated
@@ -958,7 +966,7 @@ var _ = Describe("Podman checkpoint", func() { | |||
Expect(podmanTest.NumberOfContainers()).To(Equal(0)) | |||
|
|||
// Restore container with different port mapping | |||
result = podmanTest.Podman([]string{"container", "restore", "-p", "1235:6379", "-i", fileName}) | |||
result = podmanTest.Podman([]string{"container", "restore", "-p", fmt.Sprintf("%d:6379", orginalPort+100), "-i", fileName}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, adding a number to a "safe" port?
test/e2e/checkpoint_test.go
Outdated
@@ -967,13 +975,12 @@ var _ = Describe("Podman checkpoint", func() { | |||
|
|||
// Open a network connection to the redis server via initial port mapping | |||
// This should fail | |||
conn, err = net.Dial("tcp", "localhost:1234") | |||
conn, err = net.DialTimeout("tcp4", fmt.Sprintf("localhost:%d", orginalPort), time.Duration(3)*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to steal this one. :-)
Oooohhhh, I understand. So the concern is if this test goes south/hangs/leaks, another test might also ask for a port and receive the same one (since it was |
Moving to Fedora 35 showed test failures (time outs) in the test "podman checkpoint and restore container with different port mappings" The test starts a container and maps the internal port 6379 to the local port 1234 ('-p 1234:6379') and then tries to connect to localhost:1234 On Fedora 35 this failed and blocked the test because the container was not yet ready. The test was trying to connect to localhost:1234 but nothing was running there. So the error was not checkpointing related. Before trying to connect to the container the test is now waiting for the container to be ready. Another problem with this test and running ginkgo in parallel was that it was possible that the port was already in use. Now for each run a random port is selected to decrease the chance of collisions. Signed-off-by: Adrian Reber <[email protected]>
b33845f
to
8439a6d
Compare
@jwhonce Thanks for the review. I adapted the PR to fix what you commented on. Besides the debug output which was added to know that the test is trying to connect to the container. I can remove it but I think it can be helpful understanding what is going on. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adrianreber, cevich, jwhonce The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks everyone. |
Moving to Fedora 35 showed test failures (time outs) in the test
"podman checkpoint and restore container with different port mappings"
The test starts a container and maps the internal port 6379 to the local
port 1234 ('-p 1234:6379') and then tries to connect to localhost:1234
On Fedora 35 this failed and blocked the test because the container was
not yet ready. The test was trying to connect to localhost:1234 but
nothing was running there. So the error was not checkpointing related.
Another problem with this test and running ginkgo in parallel was that
it was possible that the port was already in use. Now for each run a
random port is selected to decrease the chance of collisions.
This tries to fix problems seen in #11795
CC: @cevich