Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e: podman system reset: unlinkat: directory not empty #17957

Open
edsantiago opened this issue Mar 28, 2023 · 20 comments
Open

e2e: podman system reset: unlinkat: directory not empty #17957

edsantiago opened this issue Mar 28, 2023 · 20 comments
Labels
flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug.

Comments

@edsantiago
Copy link
Member

  podman system reset
...
$ podman [options] system reset -f
time="2023-03-27T14:47:44-05:00" level=error msg="unlinkat /tmp/podman_test2964397936: directory not empty"

podman system reset [It] podman system reset

@edsantiago edsantiago added flakes Flakes from Continuous Integration rootless kind/bug Categorizes issue or PR as related to a bug. labels Mar 28, 2023
@vrothberg
Copy link
Member

I do not see an immediate/obvious bug in the code.

The directory not empty usually happens when another process is writing to the specific directory but the test has it's own graph/run root etc. I also checked the logs and couldn't see any other test using the same directory.

@edsantiago
Copy link
Member Author

I wonder if this could be related to #18057, tmpdir, podman-pause, namespaces?

@vrothberg
Copy link
Member

I wonder if this could be related to #18057, tmpdir, podman-pause, namespaces?

Good thinking! Let's see what happens once it has been resolved.

@github-actions
Copy link

github-actions bot commented May 7, 2023

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Copy link
Member Author

Funny you should mention it, github-bot! It failed again just today. Here are the two instances from the past month:

  • fedora-37 : int podman fedora-37 root host sqlite
    • 04-05-2023 16:58 in Podman build [It] podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 root container sqlite
    • 05-05 15:50 in podman system reset must clean host shared cache

@edsantiago
Copy link
Member Author

It's happening in ginkgov2, even with Serial. debian root. There's one new bit of information, the events subdirectory:

# podman [options] system reset -f
time="2023-05-17T03:15:46Z" level=error msg="unlinkat /tmp/podman_test4235535237/events: directory not empty"

@vrothberg
Copy link
Member

Ha, that is interesting. Random guess: are we writing an event while trying to remove the directory?

@edsantiago
Copy link
Member Author

I instrumented the test, and just saw it trigger:

$ podman [options] build --pull-never --file build/cache/Dockerfilecachewrite build/cache/
STEP 1/3: FROM alpine
STEP 2/3: RUN mkdir /test
--> 11332e28703f
STEP 3/3: RUN --mount=type=cache,target=/test,z echo hello > /test/world
COMMIT
--> fc59322656e6
fc59322656e6317e70afa8a7287db8481f1cacec2e542d3686d45ab270629fc2
Execing /usr/bin/ls -lR /tmp/podman_test1295736800/events
         
/tmp/podman_test1295736800/events:
total 4
-rwx------. 1 some19778dude some19778dude 171 Jun 28 16:22 events.log
-rw-r--r--. 1 some19778dude some19778dude   0 Jun 28 16:22 events.log.lock

$ podman [options] system reset -f
time="2023-06-28T16:22:43-05:00" level=error msg="unlinkat /tmp/podman_test1295736800/events: directory not empty"

Does this help?

@vrothberg
Copy link
Member

Thanks, @edsantiago!

Yes, I think that's helpful. Apologies, I am buried and don't have time to sniff into it at the moment.

@edsantiago
Copy link
Member Author

Another one, f38 root, exactly the same symptoms (events.log size 171, events.log.lock size 0). Would it help to find ... -type f | xargs cat ?

@edsantiago
Copy link
Member Author

xargs cat was easy, so here we go. f38 rootless:

...
COMMIT
--> 77574a3144ff
77574a3144ff0c193c6ecb1b2f7239451e69b36d056bd3bb8734e4633ae5c5d6
Execing /usr/bin/bash -c find /tmp/podman_test2191038988/events -type f | while read f;do echo;ls -l $f;cat $f;done

-rwx------. 1 some14065dude some14065dude 171 Jun 29 13:51 /tmp/podman_test2191038988/events/events.log
{"Name":"77574a3144ff0c193c6ecb1b2f7239451e69b36d056bd3bb8734e4633ae5c5d6","Status":"build","Time":"2023-06-29T13:51:40.199021878-05:00","Type":"image","Attributes":null}
         
-rw-r--r--. 1 some14065dude some14065dude 0 Jun 29 13:51 /tmp/podman_test2191038988/events/events.log.lock

@edsantiago
Copy link
Member Author

Here's the list so far. I removed the rootless label.

  • debian-12 : int podman debian-12 root host sqlite
  • debian-13 : int podman debian-13 rootless host sqlite
    • 08-10 18:00 in Podman build podman system reset must clean host shared cache
  • fedora-37 : int podman fedora-37 root container sqlite
  • fedora-37 : int podman fedora-37 root host sqlite
  • fedora-37 : int podman fedora-37 rootless host boltdb
  • fedora-37 : int podman fedora-37 rootless host sqlite
  • fedora-38 : int podman fedora-38 root container sqlite
  • fedora-38 : int podman fedora-38 root host sqlite
  • fedora-38 : int podman fedora-38 rootless host boltdb
    • 08-14 15:33 in Podman build podman system reset must clean host shared cache
    • 06-29-2023 14:51 in Podman build podman system reset must clean host shared cache
    • 06-07-2023 10:55 in podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 rootless host sqlite
  • rawhide : int podman rawhide root host sqlite
    • 06-08-2023 22:20 in Podman build podman system reset must clean host shared cache

@vrothberg
Copy link
Member

@edsantiago do you still see this flake? I am asking because of the recent storage fixes which may play a role.

@edsantiago
Copy link
Member Author

Last seen Sept 28 but this is an elusive one, infrequent, so I prefer not to close this yet.

  • fedora-38 : int podman fedora-38 rootless host sqlite
    • 09-20-2023 07:49 in Podman build podman system reset must clean host shared cache
  • fedora-39 : int podman fedora-39 rootless host boltdb
    • 09-28 11:54 in Podman build podman system reset must clean host shared cache

Seen in: int podman fedora-38+fedora-39 rootless host boltdb+sqlite

@edsantiago
Copy link
Member Author

New failure just now in rawhide rootless:

           [It] podman system reset must clean host shared cache
...
           STEP 3/3: RUN --mount=type=cache,target=/test,z echo hello > /test/world
           COMMIT
           --> bbe957a263fc
           bbe957a263fc92a1c01e98376df336bd5bb7cbfc0e8a948eb4eaa1b8ab75cc68
           Execing /usr/bin/bash -c find /tmp/podman_test1906333851/events -type f | while read f;do echo;ls -l $f;cat $f;done
         
         
           -rwx------. 1 some30272dude some30272dude 171 Oct 25 10:29 /tmp/podman_test1906333851/events/events.log
           {"Name":"bbe957a263fc92a1c01e98376df336bd5bb7cbfc0e8a948eb4eaa1b8ab75cc68","Status":"build","Time":"2023-10-25T10:29:01.640789818-05:00","Type":"image","Attributes":null}
         
           -rw-r--r--. 1 some30272dude some30272dude 0 Oct 25 10:29 /tmp/podman_test1906333851/events/events.log.lock
           $ podman [options] system reset -f
           time="2023-10-25T10:29:01-05:00" level=error msg="Unable to write image event: \"creating lock file directory: mkdir /tmp/podman_test1906333851/events: no such file or directory\""

Message is completely different -- "lock file ENOENT" instead of "unlinkat ENOTEMPTY" -- but I bet it's the same thing.

@edsantiago
Copy link
Member Author

...and we're back to unlinkat ENOTEMPTY. int f38 root.

@edsantiago
Copy link
Member Author

The story so far

  • debian-13 : int podman debian-13 rootless host sqlite
    • 08-10-2023 18:00 in Podman build podman system reset must clean host shared cache
  • fedora-37 : int podman fedora-37 root host sqlite
    • 07-10-2023 10:19 in Podman build podman system reset must clean host shared cache
  • fedora-37 : int podman fedora-37 rootless host sqlite
    • 08-27-2023 21:41 in Podman build podman system reset must clean host shared cache
    • 06-30-2023 12:57 in Podman build podman system reset must clean host shared cache
    • 06-15-2023 16:51 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 root host boltdb
    • 11-13 16:11 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 root host sqlite
    • 10-31-2023 11:36 in Podman build podman system reset must clean host shared cache
    • 10-26-2023 17:08 in Podman build podman system reset must clean host shared cache
    • 06-29-2023 07:39 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 rootless host boltdb
    • 12-06 15:17 in Podman build podman system reset must clean host shared cache
    • 08-14-2023 15:33 in Podman build podman system reset must clean host shared cache
    • 06-29-2023 14:51 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 rootless host sqlite
    • 09-20-2023 07:49 in Podman build podman system reset must clean host shared cache
    • 06-28-2023 17:23 in Podman build podman system reset must clean host shared cache
    • 06-21-2023 09:14 in Podman build podman system reset must clean host shared cache
  • fedora-39 : int podman fedora-39 rootless host boltdb
    • 09-28-2023 11:54 in Podman build podman system reset must clean host shared cache
  • rawhide : int podman rawhide rootless host sqlite
    • 10-25-2023 11:29 in Podman build podman system reset must clean host shared cache
x x x x x x
int(17) podman(17) fedora-38(10) rootless(12) host(17) sqlite(12)
fedora-37(4) root(5) boltdb(5)
fedora-39(1)
rawhide(1)
debian-13(1)

@edsantiago
Copy link
Member Author

Still happening, on main, even after the system-reset work. rawhide rootless. Could this be related to the new panic, #21279?

@edsantiago
Copy link
Member Author

Ping, still happening

  • fedora-37 : int podman fedora-37 rootless host sqlite
    • 08-27-2023 21:41 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 root host boltdb
    • 11-13-2023 16:11 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 root host sqlite
    • 10-31-2023 11:36 in Podman build podman system reset must clean host shared cache
    • 10-26-2023 17:08 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 rootless host boltdb
    • 02-04 12:38 in podman system reset system reset completely removes container
    • 12-06-2023 15:17 in Podman build podman system reset must clean host shared cache
    • 08-14-2023 15:33 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 rootless host sqlite
    • 09-20-2023 07:49 in Podman build podman system reset must clean host shared cache
  • fedora-39 : int podman fedora-39 root host sqlite
    • 02-07 22:52 in podman system reset podman system reset
  • fedora-39 : int podman fedora-39 rootless host boltdb
    • 09-28-2023 11:54 in Podman build podman system reset must clean host shared cache
  • fedora-39 : int podman fedora-39 rootless host sqlite
    • 02-06 08:21 in Podman build podman system reset must clean host shared cache
    • 01-18 08:40 in podman system reset system reset completely removes container
  • rawhide : int podman rawhide rootless host sqlite
    • 01-17 23:01 in Podman build podman system reset must clean host shared cache
    • 10-25-2023 11:29 in Podman build podman system reset must clean host shared cache
x x x x x x
int(14) podman(14) fedora-38(7) rootless(10) host(14) sqlite(9)
fedora-39(4) root(4) boltdb(5)
rawhide(2)
fedora-37(1)

@edsantiago
Copy link
Member Author

Still happening

  • fedora-38 : int podman fedora-38 root host boltdb
    • 03-04 17:23 in Podman build podman system reset must clean host shared cache
    • 02-27 07:28 in podman system reset system reset completely removes container
    • 02-19 16:18 in podman system reset podman system reset
    • 11-13-2023 16:11 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 root host sqlite
    • 10-31-2023 11:36 in Podman build podman system reset must clean host shared cache
    • 10-26-2023 17:08 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 rootless host boltdb
    • 02-21 12:04 in podman system reset system reset completely removes container
    • 02-04-2024 12:38 in podman system reset system reset completely removes container
    • 12-06-2023 15:17 in Podman build podman system reset must clean host shared cache
  • fedora-38 : int podman fedora-38 rootless host sqlite
    • 09-20-2023 07:49 in Podman build podman system reset must clean host shared cache
  • fedora-39 : int podman fedora-39 root host sqlite
    • 02-07 22:52 in podman system reset podman system reset
  • fedora-39 : int podman fedora-39 rootless host boltdb
    • 09-28-2023 11:54 in Podman build podman system reset must clean host shared cache
  • fedora-39 : int podman fedora-39 rootless host sqlite
    • 02-06-2024 08:21 in Podman build podman system reset must clean host shared cache
    • 01-18-2024 08:40 in podman system reset system reset completely removes container
  • rawhide : int podman rawhide rootless host sqlite
    • 02-14 14:36 in Podman build podman system reset must clean host shared cache
    • 01-17-2024 23:01 in Podman build podman system reset must clean host shared cache
    • 10-25-2023 11:29 in Podman build podman system reset must clean host shared cache
x x x x x x
int(17) podman(17) fedora-38(10) rootless(10) host(17) sqlite(9)
fedora-39(4) root(7) boltdb(8)
rawhide(3)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

2 participants