Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sudo high cpu usage after terminal is closed #841

Closed
blazp7 opened this issue Apr 19, 2024 · 18 comments · Fixed by #845
Closed

Sudo high cpu usage after terminal is closed #841

blazp7 opened this issue Apr 19, 2024 · 18 comments · Fixed by #845
Labels
bug Something isn't working C-exec Execution component (interfacing with OS)

Comments

@blazp7
Copy link

blazp7 commented Apr 19, 2024

Describe the bug
Frequently the sudo process is left running at 100% on one cpu core, after the terminal is already closed. When i hear the cpu fans i know there is atleast one sudo that i need to kill manually.

To Reproduce
Steps to reproduce the behavior:

  • in terminal for example Alacritty sudo su
  • with nix temporarily download an app nix shell nixpkgs#sysbench
  • kill the terminal (i close it with a Hyprland hotkey)
  • sudo process now starts to run at 100% on one core

Expected behavior
sudo process doesnt torture my cpu.

Environment (please complete the following information):

  • Linux distribution: NixOS 24.05.20240329.d8fe5e6 (Uakari) x86_64
  • sudo-rs 0.2.2

Additional context
I use Hyprland window manager, maybe it has something to do with how it closes apps? Or maybe nix shell causes it.

@blazp7 blazp7 added the bug Something isn't working label Apr 19, 2024
@blazp7 blazp7 changed the title Sudo high cpu usage after teminal is closed Sudo high cpu usage after terminal is closed Apr 19, 2024
@pvdrz
Copy link
Collaborator

pvdrz commented Apr 20, 2024

@blazp7 does this happen with OG sudo as well?

Edit: I can't reproduce this while downloading a large file and killing alacritty on gnome.

@squell
Copy link
Member

squell commented Apr 22, 2024

I remember we fixed this related issue during the Berlin Meeting: #263. We could also have a look at the relevant code where the potential loop might be.

Of course, @pvdrz, if we fix this we are getting really close to your favorite scenario

@squell
Copy link
Member

squell commented Apr 24, 2024

Note: the example you cite is sudo su; is that our su implementation or the "regular" version? Does it also occur with say something like sudo sleep 600 or sudo yes > /dev/null?

@pvdrz
Copy link
Collaborator

pvdrz commented Apr 24, 2024

@squell the only place where something similar could happen would be here: https://github.com/memorysafety/sudo-rs/blob/9e7a4c4fa48ec6e411e5cb3b2ae0a4258227a856/src/exec/event.rs#L222-L243

But that would mean that there aren't any file descriptors being polled or that the polling is failing consistently, which I find very unlikely as all the signal handling is done via sockets that are polled there.

@blazp7
Copy link
Author

blazp7 commented Apr 24, 2024

It seems that just exiting a terminal where sudo su is active causes this issue. I have tried multiple terminal emulators, same issue. I have tried to exit the terminal app with killall instead of hotkey, same issue. So it probably isnt Hyprland or Alacritty at fault. If you guys cant reproduce this problem i would say it has something to do with how NixOS packages the binary and wraps it in a module. Or just how NixOS is.

recording.webm

@pvdrz
Copy link
Collaborator

pvdrz commented Apr 24, 2024

I don't seem to be able to reproduce it on my system:

Screenshot from 2024-04-24 08-42-52

@blazp7
Copy link
Author

blazp7 commented Apr 24, 2024

Note: the example you cite is sudo su; is that our su implementation or the "regular" version? Does it also occur with say something like sudo sleep 600 or sudo yes > /dev/null?

Only sudo-rs, not the regular version. And only sudo su then killing the terminal, i tried the other commands you posted. Edit: nixos uses this su implementation from here

Might or might not be relevant, dont know if this is the default behaviour but someone from nixos matrix chat said:

  • Don't do sudo su. It spawns a sudo process, which then spawns a su process as root, which then spawns a shell.
  • resized

@squell
Copy link
Member

squell commented Apr 24, 2024

(Yes, sudo -s or sudo -i is generally what you should do instead of sudo su, but that shouldn't cause this issue.)

Does it happen with sudo sudo -s as well?

@blazp7
Copy link
Author

blazp7 commented Apr 24, 2024

Does it happen with sudo sudo -s as well?

No. After i close the terminal that has an active sudo sudo -s or sudo bash every sudo process is properly closed as expected. This points to su being the culprit? However when i do su myuser there are no issues.

@squell
Copy link
Member

squell commented Apr 24, 2024

I'm suspecting an interaction between util-linux nixos's su and sudo-rs based on your answers; even though su seems to trigger it, I'd say the problem lies with sudo-rs (since the child process shouldn't be able to influence it in this way).

@squell squell added the C-exec Execution component (interfacing with OS) label Apr 24, 2024
@blazp7
Copy link
Author

blazp7 commented Apr 25, 2024

Alright, well i learned that i should use sudo -s and sudo -i, but if youre going to troubleshoot this interaction just remember that nixos uses this su implementation -> https://github.com/shadow-maint/shadow/blob/master/src/su.c

@squell
Copy link
Member

squell commented Apr 25, 2024

That's very useful info (I didn't know shadow-utils also packaged a su implementation). @pvdrz

@AppleSheeple
Copy link

I have commands showing the same symptoms. They look like:

sudo -- su root -c ip netns exec SOME_NS su someuser -c "'mate-terminal' '--disable-factory'"

Self-built sudo-rs 0.2.2 on Arch.
su from util-linux.

strace shows these two lines repeating:

poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=6, events=POLLIN}], 4, -1) = 1 ([{fd=4, revents=POLLIN|POLLERR|POLLHUP}])
read(4, "", 8192)                       = 0

Never seen this behavior before installing sudo-rs. But It's worth noting that I updated many system packages at the same time. So other variables could be relevant, but I highly doubt it.

Is that the same issue?


I know Rust if you think fiddling with/debuging a specific part of the code would help.

@pvdrz
Copy link
Collaborator

pvdrz commented May 9, 2024

@AppleSheeple thanks for the report!

It seems one of the sockets we're using is being reported as ready constantly (file descriptor 4 in your strace output) when being polled.

I have a hunch this has to do with a socket on the other "side" being closed (we always use socketpair here) and polling socket number 4 returns immediately because it is actually returning an error.

We may be handling this error improperly. I'll try to take a look at this next week.

@AppleSheeple
Copy link

I had some time for a proper initial investigation (didn't get to the code yet).

What's happening is that I run this stuff from a script in a terminal. But I background and detach this run.

So:

./script_with_a_sudo_call_inside.sh &!

fd 4 is not actually a socket. It's actually /dev/tty.
Now, if I don't close the terminal from where this script is running, the issue is not triggered.
But once I do (/dev/pts/<n> associated with stdin/stdout/stderr is deleted), the issue is triggered.
So, in dev logs, you get endless lines of:

src/exec/use_pty/parent.rs:353:24: event Tty(Readable) is ready

@AppleSheeple
Copy link

I gave the code a look, and confirmed that polling here does NOT return an error:
https://github.com/memorysafety/sudo-rs/blob/90e2385b866dde529837da9e059ec0f1e9cd83c6/src/exec/event.rs#L182

So, even if that error wasn't ignored, the issue would still be triggered.

I don't think I will have the time to chase this further.

@AppleSheeple
Copy link

Alright. Had some time to look at this again.

This change fixes the issue for me:

diff --git a/src/exec/use_pty/pipe/mod.rs b/src/exec/use_pty/pipe/mod.rs
index 7f02f06..b6046c7 100644
--- a/src/exec/use_pty/pipe/mod.rs
+++ b/src/exec/use_pty/pipe/mod.rs
@@ -7,6 +7,7 @@ use std::{
 };
 
 use crate::exec::event::{EventHandle, EventRegistry, PollEvent, Process};
+use crate::log::dev_warn;
 
 use self::ring_buffer::RingBuffer;
 
@@ -58,6 +59,19 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         &self.right
     }
 
+    pub(super) fn tty_gone<T: Process>(&mut self, registry: &mut EventRegistry<T>) -> bool {
+        let gone = unsafe {
+            // Check if tty which existed is now gone.
+            // NOTE: errno set is EIO which is not a documented possibility.
+            libc::tcgetsid(self.left.as_raw_fd()) == -1
+        };
+        if gone {
+            dev_warn!("tty gone (closed/detached), ignoring future events");
+            self.ignore_events(registry);
+        }
+        gone
+    }
+
     /// Stop the poll events of this pipe.
     pub(super) fn ignore_events<T: Process>(&mut self, registry: &mut EventRegistry<T>) {
         self.buffer_lr.read_handle.ignore(registry);
@@ -80,6 +94,9 @@ impl<L: Read + Write + AsRawFd, R: Read + Write + AsRawFd> Pipe<L, R> {
         poll_event: PollEvent,
         registry: &mut EventRegistry<T>,
     ) -> io::Result<()> {
+        if self.tty_gone(registry) {
+            return Ok(());
+        }
         match poll_event {
             PollEvent::Readable => self.buffer_lr.read(&mut self.left, registry),
             PollEvent::Writable => self.buffer_rl.write(&mut self.left, registry),

@AppleSheeple
Copy link

Just ran into this on another device where I forgot to install a patched version.
Quite annoying.

@squell squell linked a pull request Jun 21, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working C-exec Execution component (interfacing with OS)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants