Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix runc events error in cgroup v2 #2352

Merged
merged 1 commit into from
May 8, 2020

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented Apr 25, 2020

fix #2351

Because there is no oom notifier in cgroup v2 now.
I think we should use Inotify to watch OOM in memory.events.

Signed-off-by: lifubang [email protected]

@AkihiroSuda
Copy link
Member

Thanks, please update https://github.com/opencontainers/runc/blob/master/tests/integration/events.bats as well

@lifubang lifubang marked this pull request as draft April 25, 2020 06:23
@lifubang lifubang force-pushed the eventsv2 branch 6 times, most recently from b0eba0c to 760cf31 Compare April 25, 2020 16:11
@lifubang lifubang marked this pull request as ready for review April 25, 2020 16:31
@lifubang
Copy link
Member Author

please update https://github.com/opencontainers/runc/blob/master/tests/integration/events.bats as well

enable cgroup v2 tests in events.bats now.

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 25, 2020

LGTM

Approved with PullApprove

@lifubang lifubang marked this pull request as ready for review May 1, 2020 15:32
@lifubang
Copy link
Member Author

lifubang commented May 1, 2020

Can you add a separate patch testing for OOM event? It's pretty easy:

I have add a test now, but not easy, we should disable memory swap, or else it’s very hard to cause oom.
And the meaning of swap value now is different in v1 and v2, it’s quite inconvenient to use. So I have post an issue.

retry 10 1 eval "grep -q 'oom' events.log"
__runc delete -f test_busybox
) &
wait # wait for the above sub shells to finish
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a small timeout in case runc won't ever exit?
timeout 10 wait

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once #2370 merged, I'll update it together with some other changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifubang #2370 is merged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolyshkin timeout and wait can't be used together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right.

I am still concerned that this case might hang forever. Maybe we can run both __runc invocations with a timeout then. Something big, say 1 minute. Obviously, if OOM has not happened within a minute, the test has failed.

Copy link
Member Author

@lifubang lifubang May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Travis CI will return an error if there is no output after 150 seconds.
I met this error before I disable swap.

If you worry about this, I can write a simple C program to ensure causing oom kill.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolyshkin use sh -c 'test=$(dd if=/dev/urandom ibs=5120k)' to ensure oom kill.
Because the memory limit is 4M.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's getting much better and is almost ready. Just a few nits.

@lifubang lifubang force-pushed the eventsv2 branch 2 times, most recently from 0789d76 to 284d9b2 Compare May 5, 2020 09:32
@kolyshkin
Copy link
Contributor

kolyshkin commented May 8, 2020

LGTM, thanks!

Approved with PullApprove

@AkihiroSuda
Copy link
Member

AkihiroSuda commented May 8, 2020

LGTM

Approved with PullApprove

@AkihiroSuda AkihiroSuda merged commit 492cfd8 into opencontainers:master May 8, 2020
@lifubang lifubang deleted the eventsv2 branch May 28, 2020 01:31
AkihiroSuda added a commit to AkihiroSuda/containerd that referenced this pull request Jun 1, 2020
How to test (from opencontainers/runc#2352 (comment)):
  (host)$ sudo swapoff -a
  (host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
  (container)$ sh -c 'VAR=$(seq 1 100000000)'

An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.

Signed-off-by: Akihiro Suda <[email protected]>
fahedouch pushed a commit to fahedouch/containerd that referenced this pull request Aug 7, 2020
How to test (from opencontainers/runc#2352 (comment)):
  (host)$ sudo swapoff -a
  (host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
  (container)$ sh -c 'VAR=$(seq 1 100000000)'

An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.

Signed-off-by: Akihiro Suda <[email protected]>
tussennet pushed a commit to tussennet/containerd that referenced this pull request Sep 11, 2020
How to test (from opencontainers/runc#2352 (comment)):
  (host)$ sudo swapoff -a
  (host)$ sudo ctr run -t --rm --memory-limit $((1024*1024*32)) docker.io/library/alpine:latest foo
  (container)$ sh -c 'VAR=$(seq 1 100000000)'

An event `/tasks/oom {"container_id":"foo"}` will be displayed in `ctr events`.

Signed-off-by: Akihiro Suda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cgroupv2: runc events is broken
3 participants