f37 rootless: pod create: creating cgroup, already exists #16107

Closed
edsantiago opened this issue Oct 10, 2022 · 31 comments

@edsantiago
Member

on f37:

$ podman pod create
Error: unable to create pod cgroup for pod SHA:
    creating cgroup user.slice/user-libpod_pod_SHA.slice: 
   Unit user-libpod_pod_SHA.slice already exists.

Infinitely repeatable, each time with a different SHA.

podman-4.3.0~rc1-3.fc37.x86_64 on 5.19.12-300.fc37.x86_64

also built from sources, main @ bb0b184

@rhatdan
Member

rhatdan commented Oct 11, 2022

This is a blocker and should not be released until this is fixed.

@mheon FYI

@giuseppe
Member

this looks like a systemd issue:

I've tried the following program on F36, F37 and Rawhide and the issue happens only with F37:

package main

import (
	"context"
	"fmt"
	"os"
	"strconv"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
	"github.com/godbus/dbus/v5"
)

// GetUserConnection returns a user connection to D-Bus.
func GetUserConnection(uid int) (*systemdDbus.Conn, error) {
	return systemdDbus.NewConnection(func() (*dbus.Conn, error) {
		return dbusAuthConnection(uid, dbus.SessionBusPrivate)
	})
}

func dbusAuthConnection(uid int, createBus func(opts ...dbus.ConnOption) (*dbus.Conn, error)) (*dbus.Conn, error) {
	conn, err := createBus()
	if err != nil {
		return nil, err
	}

	methods := []dbus.Auth{dbus.AuthExternal(strconv.Itoa(uid))}

	err = conn.Auth(methods)
	if err != nil {
		conn.Close()
		return nil, err
	}
	if err := conn.Hello(); err != nil {
		conn.Close() // do not leak the connection when the handshake fails
		return nil, err
	}

	return conn, nil
}

func main() {
	conn, err := GetUserConnection(os.Geteuid())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Use the same name for the description and for the unit that is started.
	name := "user-libpod_pod_047d1efebfc678193ffe5a251334be92e62a1aee7ecab2c8c6fc80973e9e51d5.slice"
	properties := []systemdDbus.Property{
		systemdDbus.PropDescription(fmt.Sprintf("cgroup %s", name)),
		systemdDbus.PropWants("user.slice"),
	}
	pMap := map[string]bool{
		"DefaultDependencies": false,
		"MemoryAccounting":    true,
		"CPUAccounting":       true,
		"BlockIOAccounting":   true,
	}

	// Convert the boolean settings into D-Bus properties.
	for k, v := range pMap {
		p := systemdDbus.Property{
			Name:  k,
			Value: dbus.MakeVariant(v),
		}
		properties = append(properties, p)
	}

	// Buffered so that delivering the job result never blocks; the
	// reproducer only looks at the error returned by the call itself.
	ch := make(chan string, 1)
	_, err = conn.StartTransientUnitContext(context.TODO(), name, "replace", properties, ch)
	fmt.Println(err)
}

On F37 it fails with:

Unit user-libpod_pod_047d1efebfc678193ffe5a251334be92e62a1aee7ecab2c8c6fc80973e9e51d5.slice already exists.

@giuseppe
Member

filed a bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2133792

@cdoern
Contributor

cdoern commented Oct 12, 2022

I am going to take a look at this, since I was the last one to mess around with this code. It is possible we create the directory in two places incorrectly.

@cdoern
Contributor

cdoern commented Oct 12, 2022

I would bet that this has to do with the recent split up of the runtime files for freebsd support....

@edsantiago
Member Author

Update: this is new in podman-4.3. podman-4:4.2.1-2.fc37.x86_64 works fine. I am sorry for the misinformation during scrum.

@rhatdan
Member

rhatdan commented Oct 12, 2022

@giuseppe WDYT?

@rhatdan
Member

rhatdan commented Oct 12, 2022

If this is new, then it is a blocker for the release.

@giuseppe
Member

@cdoern can you reproduce the issue with the code example I've posted above?

I'd expect that to work anyway (as it does on F36 and Rawhide), no matter what we do in Podman

@cdoern
Contributor

cdoern commented Oct 12, 2022

@giuseppe I am running F36, and it does work for me there. Is the problem reproducible with podman 4.2 on F37? The major changes made for FreeBSD support, plus my addition of extended systemd support for pods, make me worried that it might actually be our issue.

@giuseppe
Member

I can reproduce the issue on F37 both with podman and the test snippet

@edsantiago
Member Author

The issue is new in podman 4.3. podman 4.2 works fine on f37. This is a regression. No matter whose fault it is, podman 4.3 cannot ship on f37 right now.

@cdoern
Contributor

cdoern commented Oct 12, 2022

The issue is new in podman 4.3. podman 4.2 works fine on f37. This is a regression. No matter whose fault it is, podman 4.3 cannot ship on f37 right now.

Ok, this makes me think it is a podman 4.3 inclusion. Unless a major change in systemd also shipped in that timeframe? I am going to investigate the cgroup creation path this afternoon to see if I can pinpoint it

@giuseppe
Member

giuseppe commented Oct 12, 2022

It seems to work fine on F37 if I downgrade to systemd-251.4-53.fc37. Looking at the changelog, the patch "[PATCH] manager: optionally, do a full preset on first boot" was removed from the latest systemd package, so I tried running systemctl preset-all, and that seems to fix the system:

$ bin/podman pod create
67bb0802368eee78e79ade156756026c3ece2c811ed6144fc3c6f64ae274600d

@cevich
Member

cevich commented Oct 13, 2022

@giuseppe so if I read that right, I could stick that command into our CI as a temporary workaround to get F37 tests passing for now, until the package is fixed (and new images are built)?

@yuwata

yuwata commented Oct 14, 2022

Hi! I am a maintainer of systemd.
Could you open an issue on systemd upstream (https://github.com/systemd/systemd) with:

  1. a minimal reproducer without the Go module,
  2. the debugging log of systemd,
  3. if possible, the result of a bisect between v251.4..v251.5?

Note that the preset-all patch should not be relevant. It was merged into v251.5 and therefore dropped from the spec file; the behavior of that feature should not have changed.

@giuseppe
Member

@yuwata there is already a reproducer that doesn't require Podman attached to the bugzilla.

@yuwata

yuwata commented Oct 14, 2022

Is it possible to reproduce the issue with busctl or gdbus, without using the Go module?
Also, please provide the debugging log with the reproducer. Otherwise, there is almost nothing we can do.

@giuseppe
Member

It works on Rawhide so from what I can see it is related to the specific version on F37.

I will try to get a simpler reproducer, but I don't see how there is nothing you can do since there is already one with minimal dependencies. Have you tried running it on a freshly installed F37 machine?

@giuseppe
Member

I've reinstalled Fedora 37 to look for a better reproducer, but I don't see the issue anymore. I'll update here as soon as I have more details.

@giuseppe
Member

After updating the packages and rebooting, I see the error again on the freshly installed F37 VM.

This is what I captured from busctl:

method call time=1665745962.106018 sender=:1.18 -> destination=org.freedesktop.systemd1 serial=2 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=StartTransientUnit
   string "test-25478.slice"
   string "replace"
   array [
      struct {
         string "Description"
         variant             string "cgroup test-25478.slice"
      }
      struct {
         string "Wants"
         variant             array [
               string "user.slice"
            ]
      }
      struct {
         string "MemoryAccounting"
         variant             boolean true
      }
      struct {
         string "CPUAccounting"
         variant             boolean true
      }
      struct {
         string "BlockIOAccounting"
         variant             boolean true
      }
      struct {
         string "DefaultDependencies"
         variant             boolean false
      }
   ]
   array [
   ]
method call time=1665745962.106175 sender=:1.0 -> destination=org.freedesktop.DBus serial=64 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=GetConnectionCredentials
   string ":1.18"
method return time=1665745962.106180 sender=org.freedesktop.DBus -> destination=:1.0 serial=4294967295 reply_serial=64
   array [
      dict entry(
         string "UnixUserID"
         variant             uint32 1000
      )
      dict entry(
         string "ProcessID"
         variant             uint32 1959
      )
      dict entry(
         string "UnixGroupIDs"
         variant             array [
               uint32 10
               uint32 1000
               uint32 1000
            ]
      )
      dict entry(
         string "LinuxSecurityLabel"
         variant             array of bytes "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023" + \0
      )
   ]
method call time=1665745962.106508 sender=:1.0 -> destination=org.freedesktop.DBus serial=65 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=GetConnectionUnixUser
   string ":1.18"
method return time=1665745962.106524 sender=org.freedesktop.DBus -> destination=:1.0 serial=4294967295 reply_serial=65
   uint32 1000
error time=1665745962.107075 sender=:1.0 -> destination=:1.18 error_name=org.freedesktop.systemd1.UnitExists reply_serial=2
   string "Unit test-25478.slice already exists."
signal time=1665745962.107083 sender=:1.0 -> destination=(null destination) serial=67 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitNew
   string "test.slice"
   object path "/org/freedesktop/systemd1/unit/test_2eslice"
signal time=1665745962.107086 sender=:1.0 -> destination=(null destination) serial=68 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitRemoved
   string "test.slice"
   object path "/org/freedesktop/systemd1/unit/test_2eslice"
signal time=1665745962.107097 sender=:1.0 -> destination=(null destination) serial=69 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitNew
   string "test-25478.slice"
   object path "/org/freedesktop/systemd1/unit/test_2d25478_2eslice"
signal time=1665745962.107100 sender=:1.0 -> destination=(null destination) serial=70 path=/org/freedesktop/systemd1; interface=org.freedesktop.systemd1.Manager; member=UnitRemoved
   string "test-25478.slice"
   object path "/org/freedesktop/systemd1/unit/test_2d25478_2eslice"
signal time=1665745962.107523 sender=org.freedesktop.DBus -> destination=:1.19 serial=4294967295 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string ":1.19"
signal time=1665745962.107529 sender=org.freedesktop.DBus -> destination=(null destination) serial=4294967295 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string ":1.19"
   string ":1.19"
   string ""
signal time=1665745962.107576 sender=org.freedesktop.DBus -> destination=:1.18 serial=4294967295 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string ":1.18"
signal time=1665745962.107580 sender=org.freedesktop.DBus -> destination=(null destination) serial=4294967295 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string ":1.18"
   string ":1.18"
   string ""
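
The object paths in the UnitNew/UnitRemoved signals above are the unit names with every character outside [A-Za-z0-9] replaced by an underscore and its two-digit hex code, which is why test-25478.slice shows up as test_2d25478_2eslice. A minimal Go sketch of that escaping, written from the trace rather than copied from systemd or go-systemd. The helper name unitObjectPath is hypothetical, and the leading-digit and empty-name rules are my understanding of systemd's escaping, not something the trace shows:

```go
package main

import (
	"fmt"
	"strings"
)

const unitPathPrefix = "/org/freedesktop/systemd1/unit/"

// unitObjectPath is a hypothetical helper that mirrors the escaping visible in
// the trace: bytes other than ASCII letters and digits become "_" followed by
// two lowercase hex digits. Assumption: a leading digit is escaped as well,
// and an empty name maps to "_".
func unitObjectPath(name string) string {
	if name == "" {
		return unitPathPrefix + "_"
	}
	var b strings.Builder
	b.WriteString(unitPathPrefix)
	for i := 0; i < len(name); i++ {
		c := name[i]
		isLetter := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
		isDigit := c >= '0' && c <= '9'
		if isLetter || (isDigit && i > 0) {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "_%02x", c)
		}
	}
	return b.String()
}

func main() {
	// Both names appear in the busctl capture.
	fmt.Println(unitObjectPath("test-25478.slice"))
	fmt.Println(unitObjectPath("test.slice"))
}
```

This can help when matching signals in a capture back to the transient unit name that was requested.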

@giuseppe
Member

and the same reproducer with busctl:

NAME=test-$RANDOM.slice
echo Running $NAME
busctl --user call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartTransientUnit 'ssa(sv)a(sa(sv))' "$NAME" "replace" 6 "Description" "s" "slice" "Wants" as 1 "user.slice" "DefaultDependencies" b false MemoryAccounting b true CPUAccounting b true BlockIOAccounting b true 0

On F36 I get:

Running test-20500.slice
o "/org/freedesktop/systemd1/job/44528"

On F37:

Running test-31900.slice
Call failed: Unit test-31900.slice already exists.

keszybz added a commit to keszybz/systemd that referenced this issue Oct 14, 2022
In containers/podman#16107, starting of a transient
slice unit fails because there's a "global" drop-in
/usr/lib/systemd/user/slice.d/10-oomd-per-slice-defaults.conf (provided by
systemd-oomd-defaults package to install some default oomd policy). This means
that the unit_is_pristine() check fails and starting of the unit is forbidden.

It seems pretty clear to me that dropins at any other level than the unit
should be ignored in this check: we now have multiple layers of drop-ins
(for each level of the cgroup path, and also "global" ones for a specific
unit type). If we install a "global" drop-in, we wouldn't be able to start
any transient units of that type, which seems undesired.

In principle we could reject dropins at the unit level, but I don't think that
is useful. The whole reason for drop-ins is that they are "add ons", and there
isn't any particular reason to disallow them for transient units. It would also
make things harder to implement and describe: one place for drop-ins is good,
but another is bad. (And as a corner case: for instantiated units, a drop-in
in the template would be acceptable, but an instance-specific drop-in bad?)

Thus, $subject.

While at it, adjust the message. All the conditions in unit_is_pristine()
essentially mean that it wasn't loaded (e.g. it might be in an error state),
and that it doesn't have a fragment path (now that drop-ins are acceptable).
If there's a job for it, it necessarily must have been loaded. If it is
merged into another unit, it also was loaded and found to be an alias.
Based on the discussion in the bugs, it seems that the current message
is far from obvious ;)

Fixes containers/podman#16107,
https://bugzilla.redhat.com/show_bug.cgi?id=2133792.
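
To make the change in the commit message above concrete, here is a small Go model of the "pristine" check before and after the fix. It is a simplified sketch built only from the description in the commit message: the Unit struct and both functions are hypothetical stand-ins, not systemd's C code, and the real unit_is_pristine() looks at more state (for example the load state):

```go
package main

import "fmt"

// Unit is a hypothetical, reduced model of the state the commit message
// says unit_is_pristine() looks at.
type Unit struct {
	Name         string
	FragmentPath string   // main unit file; empty if none exists
	DropinPaths  []string // drop-ins that apply to the unit, at any level
	HasJob       bool     // a job is queued for the unit
	MergedInto   string   // non-empty if the unit is an alias of another
}

// pristineBefore models the old behavior: any applicable drop-in, even a
// type-wide one, makes the unit count as already existing.
func pristineBefore(u Unit) bool {
	return u.FragmentPath == "" && len(u.DropinPaths) == 0 && !u.HasJob && u.MergedInto == ""
}

// pristineAfter models the fixed behavior: drop-ins are ignored.
func pristineAfter(u Unit) bool {
	return u.FragmentPath == "" && !u.HasJob && u.MergedInto == ""
}

func main() {
	u := Unit{
		Name:        "test-25478.slice",
		DropinPaths: []string{"/usr/lib/systemd/user/slice.d/10-oomd-per-slice-defaults.conf"},
	}
	fmt.Println("before the fix:", pristineBefore(u))
	fmt.Println("after the fix: ", pristineAfter(u))
}
```

In this model a brand-new transient slice is rejected by the old check purely because of the global oomd drop-in, which matches the "already exists" error reported in this issue.
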
@keszybz

keszybz commented Oct 14, 2022

systemd/systemd#25004

keszybz added a commit to keszybz/systemd that referenced this issue Oct 16, 2022
keszybz added a commit to keszybz/systemd that referenced this issue Oct 16, 2022
keszybz added a commit to systemd/systemd-stable that referenced this issue Oct 26, 2022
(cherry picked from commit 1f83244)
@akdev1l

akdev1l commented Oct 26, 2022

This is fixed in F37

$ uname -r
5.19.15-301.fc37.x86_64
[akdev@halifax ~]$ podman pod create test
f656fddd7f4dda70e06bc3e1c65defcafddee4b324b252b8cb6316d0d2dbe001

@vkSegfault

Fresh Fedora 37, fully updated:
$ podman pod create test
Error: unable to create pod cgroup for pod eedfbde3da99856b9f4e363b986ab7b33d0432e8b83ea20ccfe6d7d68f086487: error creating cgroup user.slice/user-libpod_pod_eedfbde3da99856b9f4e363b986ab7b33d0432e8b83ea20ccfe6d7d68f086487.slice: Unit user-libpod_pod_eedfbde3da99856b9f4e363b986ab7b33d0432e8b83ea20ccfe6d7d68f086487.slice already exists.

@akdev1l

akdev1l commented Oct 27, 2022

@vkSegfault what is your kernel version?

I was able to validate above by applying: https://bodhi.fedoraproject.org/updates/FEDORA-2022-1a5b125ac6

perhaps the package isn't yet in the stable repos? (it looks like it should be though)

@vkSegfault

vkSegfault commented Oct 27, 2022

@akdev1l:

$ uname -r
5.19.16-301.fc37.x86_64

It was the same with the previous kernel, 5.19.15-301.

$ sudo dnf upgrade --refresh --advisory=FEDORA-2022-1a5b125ac6
Dependencies resolved.
Nothing to do.
Complete!

So it seems the fix is either incomplete or not yet in the stable repos.

There are plenty of updates in the "Fedora 37 - Test Updates" repo, but I had it turned off, and installing them feels a bit risky: F37 works great for something that has not been officially released as stable yet.

Edit 1:
Someone stated above that the bug only affects Podman 4.3, but in my case:

$ podman --version
podman version 4.2.1

@akdev1l

akdev1l commented Oct 27, 2022

@vkSegfault mm it looks like I was using podman from rawhide repos:

[akdev@halifax ~]$ rpm -q podman
podman-4.3.0~rc1-5.fc38.x86_64
[akdev@halifax ~]$ podman --version
podman version 4.3.0-rc1
[akdev@halifax ~]$ uname -r 
5.19.15-301.fc37.x86_64
[akdev@halifax ~]$ podman pod create testpod
2f442310b82f6f05d2649076dfedb378d34f1b029ed3c67d2a23bac711d2dbad

but I downgraded to 4.2.1 and it also worked - I fully upgraded my system to be latest F37 with no rawhide packages:

$ uname -r
5.19.16-301.fc37.x86_64
$ rpm -q podman
podman-4.2.1-2.fc37.x86_64
$ podman --version
podman version 4.2.1
$ podman pod create test3
5f23957501f00c5fe38f1be62b4e64526ca6c0b125ef4f6c0376085c94064321
$ podman pod ps
POD ID        NAME        STATUS      CREATED         INFRA ID      # OF CONTAINERS
5f23957501f0  test3       Created     43 seconds ago  686e7ebb80b6  1

Works OK for me, so I'm not sure what differs. We need a tie breaker.

@jonnyso

jonnyso commented Nov 3, 2022

Well, don't know if this helps but, same problem for me on F37 Silverblue:

[jonny@fedora Podman]$ rpm -q podman
podman-4.2.1-2.fc37.x86_64
[jonny@fedora Podman]$ podman --version
podman version 4.2.1
[jonny@fedora Podman]$ uname -r
5.19.16-301.fc37.x86_64
[jonny@fedora Podman]$ systemctl --version
systemd 251 (251.6-609.fc37)

@edsantiago
Member Author

systemd 251 (251.6-609.fc37)

I believe the fix is in systemd 251.7; you have 251.6.
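
For anyone scripting this check, here is a hedged Go sketch that parses the first line of systemctl --version and compares it against 251.7. The threshold comes only from the comment above ("I believe"), the assumption that later major versions also carry the fix is mine, and parseSystemdVersion and hasLikelyFix are hypothetical helper names:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSystemdVersion extracts the upstream version from a line such as
// "systemd 251 (251.6-609.fc37)", returning 251 and 6 for that input.
func parseSystemdVersion(line string) (major, minor int, err error) {
	start := strings.Index(line, "(")
	end := strings.Index(line, ")")
	if start < 0 || end < start {
		return 0, 0, fmt.Errorf("no package version in %q", line)
	}
	// "251.6-609.fc37" -> "251.6": drop the distribution release suffix.
	upstream, _, _ := strings.Cut(line[start+1:end], "-")
	majorStr, minorStr, hasMinor := strings.Cut(upstream, ".")
	if major, err = strconv.Atoi(majorStr); err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", line, err)
	}
	if !hasMinor {
		return major, 0, nil
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", line, err)
	}
	return major, minor, nil
}

// hasLikelyFix reports whether the version is at least 251.7.
// Assumption: releases after the 251 series include the fix as well.
func hasLikelyFix(major, minor int) bool {
	return major > 251 || (major == 251 && minor >= 7)
}

func main() {
	major, minor, err := parseSystemdVersion("systemd 251 (251.6-609.fc37)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("systemd %d.%d, fix likely present: %v\n", major, minor, hasLikelyFix(major, minor))
}
```

Distribution packages can backport patches, so a version below the threshold does not prove the fix is missing; the package changelog is the authoritative source.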

@giuseppe
Member

giuseppe commented Nov 4, 2022

I think this issue can be closed since it has been fixed in systemd and there is no action to take in Podman, but please feel free to comment further

@giuseppe giuseppe closed this as completed Nov 4, 2022
keszybz added a commit to systemd/systemd-stable that referenced this issue Nov 4, 2022
(cherry picked from commit 1f83244)
(cherry picked from commit 98a4560)
lnykryn pushed a commit to lnykryn/systemd-rhel8 that referenced this issue Mar 17, 2023
dtardon pushed a commit to dtardon/systemd-rhel8 that referenced this issue Apr 13, 2023
(cherry picked from commit 1f83244641f13a9cb28fdac7e3c17c5446242dfb)

Resolves: #2156620
dtardon pushed a commit to dtardon/systemd-rhel8 that referenced this issue Apr 20, 2023
(cherry picked from commit 1f83244641f13a9cb28fdac7e3c17c5446242dfb)

Resolves: #2156620
dtardon pushed a commit to dtardon/systemd-rhel8 that referenced this issue Apr 20, 2023
In containers/podman#16107, starting of a transient
slice unit fails because there's a "global" drop-in
/usr/lib/systemd/user/slice.d/10-oomd-per-slice-defaults.conf (provided by
systemd-oomd-defaults package to install some default oomd policy). This means
that the unit_is_pristine() check fails and starting of the unit is forbidden.

It seems pretty clear to me that dropins at any other level then the unit
should be ignored in this check: we now have multiple layers of drop-ins
(for each level of the cgroup path, and also "global" ones for a specific
unit type). If we install a "global" drop-in, we wouldn't be able to start
any transient units of that type, which seems undesired.

In principle we could reject dropins at the unit level, but I don't think that
is useful. The whole reason for drop-ins is that they are "add ons", and there
isn't any particular reason to disallow them for transient units. It would also
make things harder to implement and describe: one place for drop-ins is good,
but another is bad. (And as a corner case: for instanciated units, a drop-in
in the template would be acceptable, but a instance-specific drop-in bad?)

Thus, $subject.

While at it, adjust the message. All the conditions in unit_is_pristine()
essentially mean that it wasn't loaded (e.g. it might be in an error state),
and that it doesn't have a fragment path (now that drop-ins are acceptable).
If there's a job for it, it necessarily must have been loaded. If it is
merged into another unit, it also was loaded and found to be an alias.
Based on the discussion in the bugs, it seems that the current message
is far from obvious ;)
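
The effect of the change can be sketched with a simplified model in Go. This is not the systemd source (which is C); the `unit` struct, its field names, and both functions are assumptions made for illustration, covering only the conditions named in the message above: fragment path, pending job, alias merge, and (before the change) drop-ins.

```go
package main

import "fmt"

// unit is a simplified, hypothetical model of the state the commit message
// mentions. The field names are assumptions, not systemd's C struct.
type unit struct {
	FragmentPath string   // path of the main unit file, if any
	DropinPaths  []string // drop-in files that apply to the unit
	HasJob       bool     // a job is queued for the unit
	MergedInto   string   // name of the unit this one aliases, if any
}

// isPristineNew models the check after the change: drop-ins are ignored.
func isPristineNew(u unit) bool {
	return u.FragmentPath == "" && !u.HasJob && u.MergedInto == ""
}

// isPristineOld models the check before the change: any applicable drop-in,
// including a type-wide one, makes the unit count as already existing.
func isPristineOld(u unit) bool {
	return isPristineNew(u) && len(u.DropinPaths) == 0
}

func main() {
	// A brand-new transient slice that only inherits the global oomd drop-in.
	u := unit{DropinPaths: []string{
		"/usr/lib/systemd/user/slice.d/10-oomd-per-slice-defaults.conf",
	}}
	fmt.Println("old check:", isPristineOld(u)) // old check: false
	fmt.Println("new check:", isPristineNew(u)) // new check: true
}
```

Under this model the old check rejects the new slice solely because of the global drop-in, which matches the "already exists" failure reported at the top of this issue.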

Fixes containers/podman#16107,
https://bugzilla.redhat.com/show_bug.cgi?id=2133792.

(cherry picked from commit 1f83244641f13a9cb28fdac7e3c17c5446242dfb)

Resolves: #2156620
dtardon pushed a commit to dtardon/systemd-rhel8 that referenced this issue Jun 15, 2023
dtardon pushed a commit to dtardon/systemd-rhel8 that referenced this issue Jul 13, 2023
systemd-rhel-bot pushed a commit to redhat-plumbers/systemd-rhel8 that referenced this issue Aug 22, 2023
@github-actions github-actions bot added the locked - please file new issue/PR label Sep 11, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 11, 2023
zlind0 pushed a commit to zlind0/systemd-239 that referenced this issue Sep 14, 2024