Yabai seems to partially stop working after upgrading to Big Sur #714

Closed
4 tasks done
adityavm opened this issue Nov 13, 2020 · 35 comments
Labels
bug Something isn't working

Comments

@adityavm

adityavm commented Nov 13, 2020

I've followed all instructions re: adding yabai to sudoers and adding the --load-sa line to yabairc. However, yabai seems to randomly (but only partially) stop working some time after starting (or restarting) the service.

(I've created a binding for skhd to restart the service, but that's just a temporary relief)

I can make a list of things I've discovered work / don't work:

  • Tiling
  • Signals (handles refreshing Ubersicht)
  • Moving between spaces
  • No commands work (eg. yabai -m query --spaces hangs)

Yabai v3.3.3
MacOS Big Sur 11.0.1

(I have a screen recording showing it go from working to not working in a span of 30s if it helps.)

@pyrho

pyrho commented Nov 13, 2020

I can confirm this behavior on my end too.
When the "lock" occurs, running a command like yabai -m window --focus west will spawn a yabai process that never terminates.

$>ps aux |grep -i yabai | wc -l
31

And this number grows with each command issued by skhd (or others).

/usr/local/var/log/yabai/yabai.err.log contains no data.

Restarting yabai via brew kills all the "zombie" processes. Sometimes that fixes the whole situation; sometimes one or more further restarts are necessary.

I have yet to find a scenario to help reproduce this issue reliably.

Happy to help by providing logs and stuff as needed.

@pyrho

pyrho commented Nov 13, 2020

(For the record, I'm on Big Sur and updated yabai via Homebrew as mentioned in the wiki. I also did the sudoers trick and checked that it works, and I'm using Ubersicht (with yabar) and stackline.)

I've enabled debug logging and reproduced the issue a few times. I don't have one scenario locked down, but under a minute of window focusing and space changing (via the touchpad, not via yabai -m space) usually does the trick.

I've uploaded the debug log in this gist.
This was triggered by spamming focus events in the same space, nothing else.

After the last line in this log, yabai won't answer to any command sent by skhd (ie. yabai -m window etc.) but it's still very much alive (I know because the .out.log file keeps reporting things). No error messages (in the .out nor in the .err log files).

@koekeishiya
Owner

If this happens, please open activity monitor and grab a sample of the yabai process and paste in a gist or something.

@pyrho

pyrho commented Nov 13, 2020

> If this happens, please open activity monitor and grab a sample of the yabai process and paste in a gist or something.

Here you go

Not sure I mentioned it earlier, but this is very easy to reproduce (in my case at least): I can't use yabai for more than a minute before having to restart it.

@stoffeastrom

@pyrho I had a similar issue at first.

rm -rf /Library/ScriptingAdditions
brew services stop yabai
brew services stop skhd
brew uninstall yabai
brew uninstall skhd
brew cleanup
brew install yabai
brew install skhd
brew services start yabai
brew services start skhd

Allowed accessibility in system preferences again.
Now the yabai processes don't pile up any more. However, I can't create a new space.

yabai -m space --create

It doesn't give any errors, but no new space is created.

yabai -m space --destroy

gives me

acting space is the last user-space on the source display and cannot be destroyed.

which makes sense

@adityavm
Author

@stoffeastrom after running all your steps, i don't get an ever expanding list of yabai processes. however, it does still hang. here's a current dump of my yabai processes:

https://gist.github.com/adityavm/f6b2471e65d40462e2dd8c5f1565f6e9

@koekeishiya
Owner

Based on that sample I would say that accessibility permission is messed up. Try to stop the brew service, open system preferences and remove accessibility permissions, reboot the system. Start the brew service and re-approve the accessibility permissions.

I did a clean install of Big Sur on my machine and forgot to back up the previous certificate that was used for signing the binary, but the accessibility preferences don't usually notice this and will be in some wacky state.

@koekeishiya
Owner

I did manage to reproduce the issue although I am unsure as of now what the cause is.

koekeishiya added the bug label Nov 13, 2020

@pyrho

pyrho commented Nov 13, 2020

@stoffeastrom I tried that. Still have the problem (and the growing number of processes once the yabai daemon (?) stops answering back to client requests).

@koekeishiya thanks for looking into it. I did disallow, uninstall, reboot, reinstall, re-allow. Still got the issue.
If it was an accessibility issue, wouldn't it just not work, rather than work for some time and then die?

@koekeishiya
Owner

So I can somewhat reproduce the issue; I can make it hang a single instance if I really spam messages, but if I kill that one instance everything works again. I do not get tons of zombie processes, and the main instance is still responding if I kill that instance.

Tested by opening 4 windows in a space, and running the following infinite loop:

while true; do yabai -m window --focus next || yabai -m window --focus first; done

@pyrho

pyrho commented Nov 13, 2020

@koekeishiya I can confirm that everything gets "unlocked" once I kill the "client" process that caused the issue.

I feel like this (killing the culprit) "unstacks" pending calls too.

edit: the unstacking behavior is confirmed too as all the other processes that were "waiting" have now disappeared.

@koekeishiya
Owner

koekeishiya commented Nov 13, 2020

I don't actually get it; the amount of time it takes before a freeze is absolutely random. I've had it take literally 3 seconds, and now this latest run took closer to 3 minutes of just running the infinite loop I posted above.

Here is a short snippet of the event queue when we post, dequeue and handle, and finish handling a message.
Some of the items in queue pops are from me moving the cursor and other interactivity while this is happening:

POSTING MSG EVENT FROM: 6
items in queue: 3
items in queue: 2
items in queue: 1
items in queue: 0
HANDLING EVENT FROM: 6
FINISHED HANDLING EVENT FROM: 6
POSTING MSG EVENT FROM: 6
items in queue: 1
items in queue: 0
HANDLING EVENT FROM: 6
FINISHED HANDLING EVENT FROM: 6
items in queue: 1
items in queue: 0
POSTING MSG EVENT FROM: 6
items in queue: 1
items in queue: 0
HANDLING EVENT FROM: 6
FINISHED HANDLING EVENT FROM: 6
items in queue: 1
items in queue: 0
POSTING MSG EVENT FROM: 6
items in queue: 1
items in queue: 0
HANDLING EVENT FROM: 6
items in queue: 1
FINISHED HANDLING EVENT FROM: 6
items in queue: 0
items in queue: 1
items in queue: 2
items in queue: 1
items in queue: 0
POSTING MSG EVENT FROM: 6
items in queue: 1
items in queue: 0
HANDLING EVENT FROM: 6
FINISHED HANDLING EVENT FROM: 6

# stops receiving new connections here

# other interactivity shows that yabai itself is responsive here..

So after some random amount of time we appear to simply just stop receiving connection requests from/on the socket.
I am fairly certain that I did not have this issue on Catalina.. not sure where to proceed from here.

@pyrho

pyrho commented Nov 13, 2020

> I am fairly certain that I did not have this issue on Catalina.. not sure where to proceed from here.

Me too !

From looking at the code, when the issue happens, the daemon thread seems to be stuck on the socket_read call.
So I'm thinking maybe the issue is with the "client" part.

Relevant sample snippet:

    2818 Thread_1651892
    + 2818 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2032b47b]
    +   2818 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff2032f950]
    +     2818 socket_connection_handler  (in yabai) + 87  [0x105cef6b7]
    +       2818 socket_read  (in yabai) + 169  [0x105cef2b9]
    +         2818 read  (in libsystem_kernel.dylib) + 10  [0x7fff202fb89e]

The code is here, so the client is accepted by the daemon but it seems like the client then fails to send any data.

(Just throwing some ideas around, maybe it'll light a magic bulb above your head :D)

edit: So if my intuition is right, the issue might be in the client_send_message function.

edit2:

> So after some random amount of time we appear to simply just stop receiving connection requests from/on the socket.

It stops because it's stuck on the read call of the daemon socket, so that infinite while(daemon) loop does not come back to the accept call, so all further client requests wait on their own connect call until the daemon is ready to accept again.

edit3: well I git blame'd client_send_message and nothing has changed since end of 2019, so either there's a behavior change in the big sur libc, or my intuition is shit (probably the latter :P)
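
The stall described in edit2 (one blocking read on an accepted connection keeps the loop from getting back to accept) can be illustrated with a small sketch. This is not the change that went into master; the thread does not show that commit's contents. `read_with_timeout` is a hypothetical helper, not a yabai function, and the timeout value is an arbitrary assumption. It only shows one generic way to keep a silent client from parking the accept loop indefinitely: poll the descriptor with a deadline before reading.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of yabai): wait up to timeout_ms for the
 * connection to become readable before calling recv(), so that a client
 * which connects but never sends cannot block the caller forever.
 *
 * Returns the number of bytes received (0 means the peer closed or
 * half-closed its side), or -1 on timeout or error.
 */
ssize_t read_with_timeout(int fd, char *buf, size_t cap, int timeout_ms)
{
    assert(buf != NULL && cap > 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0) {
        return -1; /* 0: timed out, -1: poll itself failed */
    }
    return recv(fd, buf, cap, 0);
}
```

With a helper along these lines, a connection that stays silent past the deadline can be closed and the loop can return to accept, so later clients are not left waiting in connect.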

@koekeishiya
Owner

I don't see how this is a bug in the yabai code. Am I stupid?

[Screenshot, 2020-11-14 at 00:45:25: two terminal windows, yabai on top and the client below]

client (bottom terminal window) sends message, shuts write pipe from its side, and waits for a response..
yabai (top terminal window) blocks in recv claiming no bytes have been sent.
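
The client behaviour described above (send the message, shut down the write side, then wait for a response) is the usual half-close pattern on a stream socket. A minimal sketch of it follows, assuming nothing about yabai's actual helpers: `write_all` and `send_request` are hypothetical names. The length is passed explicitly rather than taken with strlen, as in the thread's socket_write_bytes(sockfd, message, message_length) call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write every byte, retrying after short writes.
 * Any failed write (including an interrupted one) is treated as an error. */
bool write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n <= 0) {
            return false;
        }
        data += n;
        len -= (size_t)n;
    }
    return true;
}

/* Hypothetical helper: send a request, then half-close the socket.
 * The peer reads the payload followed by EOF; the read side of fd
 * stays open so a reply can still be received afterwards. */
bool send_request(int fd, const char *msg, size_t len)
{
    assert(msg != NULL);

    if (!write_all(fd, msg, len)) {
        return false;
    }
    return shutdown(fd, SHUT_WR) == 0;
}
```

When this pattern works as intended, the receiving side sees the payload and then a zero-length read, which is what makes the "no bytes have been sent" observation in the screenshot surprising.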

@pyrho

pyrho commented Nov 14, 2020

@koekeishiya Could you debug print the value for message and message_length in the client_send_message function? I'm thinking it might have to do with off-by-one errors in the various length computations occurring there.

If you're into that we can rubber duck on zoom or something and maybe I could help you shed some light on that. (disclaimer I'm a typescript dev, my C days are long gone, so don't expect K&R level expertise ^^)

@koekeishiya
Owner

koekeishiya commented Nov 14, 2020

I changed how the listener interacts with the yabai event-loop and it solved the issue for me, although it should have no effect in practice. Can you build the latest commit from master to verify?

koekeishiya added a commit that referenced this issue Nov 14, 2020
@pyrho

pyrho commented Nov 14, 2020

> I changed how the listener interacts with the yabai run-loop and it solved the issue for me, although it should have no effect in practice. Can you build the latest commit from master to verify?

Simple as make all and just overwrite the yabai bin? If so, I'm on it.

@koekeishiya
Owner

If you are using brew, the easiest would be to brew uninstall, and then brew install yabai --HEAD

@pyrho

pyrho commented Nov 14, 2020

Well make all && cp bin/yabai /usr/local/bin/yabai seems to have done the trick.
But I'm sorry to report that I can still reproduce with the while true; do yabai -m window --focus next || yabai -m window --focus first; done command 😢

For good measure I also tried via Homebrew --HEAD (followed by a restart and a re-grant in the accessibility tab): same result, no dice.

@koekeishiya
Owner

This feels so random; it just ran fine for 4min 10sec before it froze. Then the next run lasted 10sec..

@pyrho

pyrho commented Nov 14, 2020

> This feels so random; it just ran fine for 4min 10sec before it froze. Then the next run lasted 10sec..

Yeah, it's very random over here too. That's why I was thinking it was maybe a mis-initialized buffer, so the randomness would be caused by whatever happens to be sitting at that particular memory address.

edit: Now that I know how to build and run, I'll do some more good ol' printf debugging here and there and see what I can find.

@koekeishiya
Owner

Agreed; that does seem like the most plausible reason why this would be happening. Likely the client_send_message function.

@pyrho

pyrho commented Nov 14, 2020

Well...

This patch:

diff --git a/src/yabai.c b/src/yabai.c
index dacee39..87c5f36 100644
--- a/src/yabai.c
+++ b/src/yabai.c
@@ -83,9 +83,12 @@ static int client_send_message(int argc, char **argv)
         *temp++ = '\0';
     }
 
+                fprintf(stderr, "[%s]\n", message);
+                fprintf(stderr, "[%i]\n", message_length);
     if (!socket_write_bytes(sockfd, message, message_length)) {
         error("yabai-msg: failed to send data..\n");
     }
+        error("yabai-msg: SENT!\n");
 
     shutdown(sockfd, SHUT_WR);

Seems to fix the issue for me; I've run it 5 times and no crashes so far.

But it doesn't make sense.

Unless printing shit with a \n in it flushes something and makes shit work.

@pyrho

pyrho commented Nov 14, 2020

Oh wait, error actually exits the program; I thought it was only printing to stderr.
I can confirm that only the error line in the above patch fixes the issue (the two preceding fprintf calls have no effect).

So the issue is that the client never hears back from the daemon.

Maybe it's because the daemon handles the request so fast that it replies before the client has a chance to open the socket to read its reply? Hence the randomness ?

@koekeishiya
Owner

I don't really understand why the randomness is there. I've verified that the client_send_message function properly builds the message. The buffer is not left uninitialised in any way and we do not overwrite the buffer or have any kinds of buffer overflows. The only thing I'd imagine would be that the loop in socket_read does not terminate after successfully reading the message.

@pyrho

pyrho commented Nov 14, 2020

Which function is called by the daemon as a response to a focus event ?

It seems that the client is stuck at the poll call in client_send_message, I'd like to make sure the daemon attempts a response, but I'm not sure where to find it (;

@pyrho

pyrho commented Nov 14, 2020

I might be on to something, I've printed what is received by the client in the while(poll) body.

            if ((byte_count = recv(sockfd, rsp, sizeof(rsp)-1, 0)) <= 0) {
                fprintf(stderr, "IN BREAK: [%s]\n", rsp);
                break;
            }

and it prints what looks like random memory, which does not sound good; thoughts ?

edit: well in all cases (only playing with -m window --focus) byte_count is 0, so I guess I'm seeing the uninitialized memory of the rsp buffer.

And I also guess that the daemon just closes the socket file descriptor instead of sending a message, so then the issue becomes that the daemon fails to close the socket or closes it before poll() is executed on the client side.
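
Two points from the comment above can be checked in isolation with a small sketch. `read_reply` is a hypothetical helper, not yabai's code, though it follows the same poll-then-recv shape as the loop quoted above. First, when recv returns 0 nothing has been written into the buffer, so the buffer should be terminated up front and only treated as text when bytes actually arrived. Second, a stream socket keeps already-written data queued after the peer closes, so a reply sent before the client reaches poll() is still delivered rather than lost.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: collect a reply into out (capacity cap), keeping it
 * NUL-terminated at all times. Returns the number of payload bytes; 0 means
 * the peer closed without sending anything, and out is then an empty string
 * instead of leftover stack memory.
 */
size_t read_reply(int fd, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);

    size_t total = 0;
    out[0] = '\0';

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    while (total < cap - 1 && poll(&pfd, 1, -1) > 0) {
        ssize_t n = recv(fd, out + total, cap - 1 - total, 0);
        if (n <= 0) {
            break; /* EOF or error: recv stored nothing new in out */
        }
        total += (size_t)n;
        out[total] = '\0';
    }
    return total;
}
```

A return value of 0 here is the "daemon just closed the descriptor" case from the comment; a non-zero value covers the case where the daemon wrote a reply and closed before the client started polling.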

@pyrho

pyrho commented Nov 14, 2020

So in the meantime, anyone looking for a work around can apply this patch:

diff --git a/src/yabai.c b/src/yabai.c
index dacee39..f92e215 100644
--- a/src/yabai.c
+++ b/src/yabai.c
@@ -90,6 +90,9 @@ static int client_send_message(int argc, char **argv)
     shutdown(sockfd, SHUT_WR);
 
     int result = EXIT_SUCCESS;
+    socket_close(sockfd);
+    return result; // TEMP HACK
+
     int byte_count = 0;
     char rsp[BUFSIZ];

Build: make all, and then install it by doing cp bin/yabai /usr/local/bin/yabai, followed by brew services restart yabai.

this is a very shitty and temporary workaround that will break anything trying to query yabai (like Ubersicht or Hammerspoon); but will let you use yabai until a proper fix is given to us by the gods of random bugs and late nights.

@koekeishiya
Owner

Should be fixed on master now.

@pyrho

pyrho commented Nov 14, 2020

Fix confirmed on my end, thank you so much ! 💯

@koekeishiya
Owner

Released v3.3.4 with this fix.

@adityavm
Author

@koekeishiya thanks for the fix. confirmed on my end as well. i see you closed this but the moving between spaces still doesn't work

@koekeishiya
Owner

See this issue regarding focusing spaces: #712

Summary: Make sure SIP is disabled for debugging and filesystem. Reinstall the scripting-addition from the latest yabai version and load it as root. Instructions https://github.com/koekeishiya/yabai/wiki/Installing-yabai-(latest-release) have been updated and may be of interest.

@adityavm
Author

@koekeishiya i had done all of that, but it looks like it didn't work because --install-sa and --load-sa silently failed: /Library/ScriptingAdditions didn't exist (maybe i removed it while trying to follow instructions in other issues). it's probably worth adding that as a footnote in the instructions.

i've managed to get it all working again. thanks for getting this fixed up!

@stoffeastrom

@adityavm Sorry I think that was me giving that faulty instruction 😬.

And for others: I finally got it all working too, after @koekeishiya's comment above:

> Summary: Make sure SIP is disabled for debugging and filesystem. Reinstall the scripting-addition from the latest yabai version and load it as root. Instructions koekeishiya/yabai/wiki/Installing-yabai-(latest-release) have been updated and may be of interest.

e.g. I had to run:

csrutil disable --with kext --with dtrace --with nvram --with basesystem

to get it working. Thx all for enabling Big Sur, and an extra shout-out to @koekeishiya ofc 👏

unrevre pushed a commit to unrevre/yabai that referenced this issue Jan 26, 2022
4 participants