Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overlay: Diffsize: naive diff when not parent #1691

Merged
merged 2 commits into from
Aug 23, 2023

Conversation

vrothberg
Copy link
Member

If the specified parent is not ID's parent, use the naive diff.

@mtrmac @giuseppe PTAL

@vrothberg
Copy link
Member Author

/hold

@vrothberg
Copy link
Member Author

This reintroduces the perf issue.

@vrothberg
Copy link
Member Author

This reintroduces the perf issue.

False alarm. I used the wrong branch for testing -.-

@rhatdan
Copy link
Member

rhatdan commented Aug 21, 2023

LGTM

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vrothberg
Copy link
Member Author

Does anybody know what's up the Debian job?

[+2491s] kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
[10:42:58] END - [+2492s] total duration since START

@cevich @giuseppe ?

@edsantiago
Copy link
Member

Hypothesis: kill getting called with no args. Very quick check:

$ ack 'kill.*\$' contrib
contrib/cirrus/build_and_test.sh
74:        kill $(lsns -J -t mnt --output-all | jq '.namespaces[]|select(.command=="sleep 1000s").pid')

Root cause is therefore: https://github.com/containers/storage/pull/1535/files#diff-27779cc9489581b15cc0f02985665abb1f2bab23026e256dc72d6823b3d8761b

@edsantiago
Copy link
Member

Suggested fix:

index 8b1f3ad10..24bec3c76 100755
--- a/contrib/cirrus/build_and_test.sh
+++ b/contrib/cirrus/build_and_test.sh
@@ -71,7 +71,10 @@ case $TEST_DRIVER in
         zfs create $zpool/tmp
         TMPDIR="/$zpool/tmp" showrun make STORAGE_DRIVER=$TEST_DRIVER local-test-integration local-test-unit
         # Ensure no datasets are held open prior to `zfs destroy` trap.
-        kill $(lsns -J -t mnt --output-all | jq '.namespaces[]|select(.command=="sleep 1000s").pid')
+        foo=$(lsns -J -t mnt --output-all | jq '.namespaces[]|select(.command=="sleep 1000s").pid')
+        if [[ -n "$foo" ]]; then
+            kill $foo
+        fi
         ;;
     *)
         die "Unknown/Unsupported \$TEST_DRIVER=$TEST_DRIVER (see .cirrus.yml and $(basename $0))"

...with a better name than foo, please. I don't have time right now to figure out this code or what the hell lsns is supposed to be.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

If the specified parent is not ID's parent, use the naive diff.

Signed-off-by: Valentin Rothberg <[email protected]>
Commit 224f0b9 added the kill but it seems to not always be the
case there's process holding the datasets which ultimately leads to the
kill to fail and in turn CI to turn red.

Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Copy link
Member Author

Thanks, @edsantiago !

@@ -71,7 +71,10 @@ case $TEST_DRIVER in
zfs create $zpool/tmp
TMPDIR="/$zpool/tmp" showrun make STORAGE_DRIVER=$TEST_DRIVER local-test-integration local-test-unit
# Ensure no datasets are held open prior to `zfs destroy` trap.
kill $(lsns -J -t mnt --output-all | jq '.namespaces[]|select(.command=="sleep 1000s").pid')
datasets=$(lsns -J -t mnt --output-all | jq '.namespaces[]|select(.command=="sleep 1000s").pid')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“dataset” seems to be a ZFS term, something that is held by a process but is not a process. pids?

(I know ~nothing about ZFS and whether this is the right thing to do, or why it suddenly started breaking after ~4 months.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cevich could you weigh in here please? The solution is obvious and already applied, but it would be really nice to give this code a meaningful comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZFS testing has been a PITA (mostly for me) in this repo. for a long while. However, apparently zfs-testing is actually needed due to a large user-base (I think it's Giuseppe who strongly desires it). I didn't write these script bits, but I would guess they came in when Ubuntu was used in CI. I'm not surprised it's breaking under the much more modern Debian SID. For Ubuntu it was easy/built-in to the distro, but in Debian it's a kind of add-on similar to EPEL (near as I can tell). That part alone smells like trouble to me 😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giuseppe do you have any clues to share on what this datasets is used/needed for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no idea about ZFS, nor desire to keep it :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in any case, ZFS failures for sure do not depend on this change as it affects only the overlay driver

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giuseppe @cevich I interpret these comments as "get rid of that ZFS dataset special case"; but it could also be interpreted as "get rid of all ZFS special cases". I would like to open a cleanup PR to do so. Please clarify which of those I should do (only 3 lines, or all zfs lines). Thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no idea about ZFS, nor desire to keep it :-)

Well darn my bad memory...I'm sure somebody said (strongly) that we need it in CI.
That was in response to my wanting to trash all ZFS testing. This is the only repo. that does it. That's really the only high-level context I can add here 😞

@edsantiago I really don't know/remember anything about these build_and_test.sh lines, I don't think I wrote them(?). I welcome your efforts toward making it cleaner, and if I can help somehow I will. I was just questioning if we really do need this at all, since it's so much of a corner-case in the overall/multi-project CI matrix. I think the answer is still "yes"...just wish I remembered who the strong advocate is 😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone in the community cares enough to open PRs to maintain these other drivers, then it is fine but I don't see the point for us to worry about anything else than overlay. I am not even talking about RHEL support, even upstream the other backends miss too many features that are present in overlay.

So +1 on "get rid of that "get rid of all ZFS special cases" sounds right if that can simplify our life.

@vrothberg vrothberg mentioned this pull request Aug 22, 2023
@giuseppe giuseppe merged commit 6902c2d into containers:main Aug 23, 2023
@vrothberg vrothberg deleted the fix-podman-19467 branch August 23, 2023 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants