
Any plans to support podman 3.0.0 #89

Closed
chris-rock opened this issue Feb 14, 2021 · 25 comments

Comments

@chris-rock

Podman 3.0.0 has just landed and I was wondering if nomad is going to support the latest version?

@towe75
Collaborator

towe75 commented Feb 15, 2021

Hi @chris-rock , thank you for reaching out.

Tests with the podman 3.0 RC were successful, but it seems that a last-minute API change is a show-stopper. See containers/podman#9351

There are also various Ubuntu-related setup/upgrade problems, which is made worse by the fact that only 3.0 is available in their repository, see containers/podman#9358 . For now we're waiting for a point release to see if it improves the API situation. Otherwise we can, of course, react with some version checking and if/else API adapters.

@chris-rock
Author

Thank you for the quick feedback. I am using Alpine edge and it already ships the latest podman 3 as a package.

@matejvasek

@chris-rock @towe75 The incompatible change was introduced accidentally. It should be fixed in v3.0.1.

@towe75
Collaborator

towe75 commented Feb 19, 2021

@matejvasek : thank you for keeping us posted!

@chris-rock
Author

That is great. Thank you @matejvasek @towe75. Once podman 3.0.1 is in Alpine, I am going to give it another try...

@towe75 towe75 reopened this Feb 22, 2021
@towe75
Collaborator

towe75 commented Feb 22, 2021

@matejvasek : 3.0.1 is out and I checked whether our plugin "just works". I can see that the podman GH issue is resolved, but it's a long, convoluted thread and I cannot tell whether the API request now works as planned.

Fact is that I still have to adapt our codebase. We're using "/v1.0.0/libpod/containers/%s/wait?condition=%s" but only "/v1.0.0/libpod/containers/%s/wait?condition[]=%s" (note the brackets!) seems to do the job. Should I open another issue, or is this intended behavior of the API? It's a bit annoying to add the brackets even when I only wait on one state.
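For reference, the wait URL in question can be assembled with `net/url`; this is just a sketch of the query shape being discussed (`buildWaitURL` is a hypothetical helper, not plugin code; the endpoint path is the one quoted above):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildWaitURL assembles the libpod wait endpoint for a container,
// using the plain (bracket-free) "condition" key.
func buildWaitURL(name, condition string) string {
	q := url.Values{}
	q.Set("condition", condition)
	return fmt.Sprintf("/v1.0.0/libpod/containers/%s/wait?%s", url.PathEscape(name), q.Encode())
}

func main() {
	fmt.Println(buildWaitURL("festive_poitras", "running"))
	// → /v1.0.0/libpod/containers/festive_poitras/wait?condition=running
}
```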

@matejvasek

@towe75 hmm, let me check, I tried it with something like http://127.0.0.1:1234/v1.0.0/libpod/containers/festive_poitras/wait?condition=running and it worked.

@matejvasek

The brackets break functionality for me: http://127.0.0.1:1234/v1.0.0/libpod/containers/festive_poitras/wait?condition[]=running immediately returns on a non-running container.

@matejvasek

matejvasek commented Feb 22, 2021

How did you figure out to put the brackets there?

@matejvasek

I think that condition[] is discarded == it waits for the default condition, which is stopped.
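That matches how Go's standard query parsing treats the key: "condition[]" is literally a different key than "condition", so a server looking up "condition" sees nothing and falls back to its default. A minimal sketch of that behavior (this demonstrates `net/url`, not podman's actual handler code):

```go
package main

import (
	"fmt"
	"net/url"
)

// lookup parses a raw query string and returns the first value for key.
func lookup(rawQuery, key string) string {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return ""
	}
	return q.Get(key)
}

func main() {
	// The bracketed spelling registers under the literal key "condition[]".
	fmt.Printf("condition   = %q\n", lookup("condition[]=running", "condition"))   // ""
	fmt.Printf("condition[] = %q\n", lookup("condition[]=running", "condition[]")) // "running"
}
```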

@matejvasek

I see the only call is err = c.ContainerWait(timeout, name, "running"), right?

@towe75
Collaborator

towe75 commented Feb 22, 2021

@matejvasek : my bad, sorry. I ran into a combination of:

  • earlier failed tests due to the broken API in 3.0.0
  • an automated build that failed with 3.0.1, but on a different test (I did not compare it before this post, sorry)
  • a misinterpretation of your API spec. It says "Array of string" and my somewhat ancient PHP background translated this to ?condition[], so I gave it a try
  • condition[] returns immediately, which, in turn, lets my flaky test pass

I will improve the unit test and recheck. Thank you again for checking! 🍻

@matejvasek

AFAIK a collection in the URI can be passed as /wait?condition=running&condition=unknown
or /wait?condition=running,unknown.

@matejvasek

How long-running is the container in question? Maybe it starts and ends so fast that the wait cannot detect it.

@matejvasek

IMO the ContainerStart() function should call ContainerWait() in a separate goroutine before calling res, err := c.Post(ctx, fmt.Sprintf("/v1.0.0/libpod/containers/%s/start", name), nil).

@matejvasek

Something like:

```go
// ContainerStart starts a container via id or name
func (c *API) ContainerStart(ctx context.Context, name string) error {
	waitErrChan := make(chan error, 1)
	go func() {
		// wait max 10 seconds for running state
		// TODO: make timeout configurable
		timeout, cancel := context.WithTimeout(ctx, time.Second*10)
		defer cancel()

		waitErrChan <- c.ContainerWait(timeout, name, "running")
	}()

	res, err := c.Post(ctx, fmt.Sprintf("/v1.0.0/libpod/containers/%s/start", name), nil)
	if err != nil {
		return err
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusNoContent {
		body, _ := ioutil.ReadAll(res.Body)
		return fmt.Errorf("unknown error, status code: %d: %s", res.StatusCode, body)
	}

	return <-waitErrChan
}
```

@matejvasek

If the issue is a short-lived container, then you could try adding &interval=1ns to your wait query.

@towe75
Collaborator

towe75 commented Feb 22, 2021

@matejvasek : yes, my problem is indeed related to (very) short-lived test containers.
I like your proposal and it will for sure improve the situation.

@matejvasek

Important: podman uses polling to check state, which is not reliable (and not new to podman v3, AFAIK). For really short-lived containers (--entrypoint '["true"]') this works poorly.

@matejvasek

It's a bug, but not new to podman v3, as far as I can tell.

@matejvasek

@towe75 How short-lived is the container? ms, ns?

@towe75
Collaborator

towe75 commented Feb 24, 2021

@matejvasek : I improved the code slightly via #91 . The real fix is part of the fundamental changes in #80, where we subscribe to the podman event stream instead of polling stats etc.

@chris-rock : sorry for some noise in this issue. You can give it another try with podman 3.0.1 and the latest 0.2.0 plugin release.

@towe75 towe75 closed this as completed Feb 24, 2021
@Procsiab
Contributor

Procsiab commented Mar 4, 2021

Hello, I'm using the 0.2.0 release of the plugin, and Podman version 3.0.1: still, I'm having issues starting a container. The error message I get is the following:
rpc error: code = Unknown desc = failed to start task, could not start container: unknown error, status code: 500: {"cause":"OCI runtime error","message":"the requested cgroup controller cpu is not available: OCI runtime error","response":500}

I call this an issue because, before updating the Podman binary, the very same Nomad job would run without this error on Podman version 2.2.1.
Let me know if I can give you more data and whether it's better to create a separate issue.

@towe75
Collaborator

towe75 commented Mar 4, 2021

@Procsiab : I did not see such behavior in any test setup, sorry. Does the same container/image work if you start it manually? Do you use any "special" flags when you run it without Nomad?

@Procsiab
Contributor

Procsiab commented Mar 4, 2021

Speaking about how I am running the job: it is a 3-container setup for a NextCloud install; I am using the volumes config with selinuxlabel = "z".
Inside the job I have 3 ports that I defined as static and assigned to a custom host_network pointing to a specific NIC.
Then I defined 3 services, to which I assigned the ports; finally, I have the 3 tasks.
For each task, I used only the stanzas

  • driver="podman"
  • env
  • config (with image, volumes, network_mode="slirp4netns" and ports)
  • resources (with cpu and memory).

By "starting manually" you mean trying to run it from the Podman CLI? In that case, I tried a simpler test first:
podman run -it alpine sh
The container starts without problems; as soon as I add the --memory or --cpu-shares flags, I get the following errors (the container IS created, however):

  • cpu-shares

Error: OCI runtime error: the requested cgroup controller cpu is not available

  • memory

Error: OCI runtime error: sync socket closed

After these experiments, I have a good feeling that the new Podman version requires some tweaking to limit CPU and RAM resources in a rootless configuration. Speaking of which: I have a dedicated unprivileged account called nomad under which the Nomad binary runs and from which the Podman containers are started.

EDIT: I found the solution by enabling CPU limit delegation, as explained in the official Podman troubleshooting guide.
However, it's still curious that this was not an issue at all with the previous version I was using (2.2.1).
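For anyone hitting the same error: the fix in the troubleshooting guide amounts to a systemd drop-in that delegates the cpu (and related) cgroup controllers to user sessions, roughly like the following (path and controller list reproduced from memory of that guide, so verify against the current docs):

```ini
# /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=memory pids cpu cpuset
```

After a systemctl daemon-reload and a fresh login, rootless podman should be able to apply --cpu-shares and --memory limits under cgroup v2.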
