
Improve handling of early startup issues #219

Merged: 4 commits, merged Dec 10, 2024


marmarek (Member) commented Dec 2, 2024

Better handle stopping the service while it is still starting up, and improve the handling of Xorg startup errors (a much simpler alternative to #176).

If the service is stopped while Xorg is still starting up (and
gui-agent is still waiting for the Xorg connection in mkghandles), gui-agent
would exit before killing Xorg, and Xorg would keep trying to connect back to
gui-agent forever, delaying the shutdown.

Fix this by moving signal registration earlier, before Xorg startup.
Since ghandles_for_vchan_reinitialize is now set before it's fully
initialized, initialize the x_pid field explicitly and leave all the other
fields zeroed (instead of random stack rubble).
Register a proper signal handler for SIGCHLD, and collect Xorg's
zombie in it.

This has two effects:
1. The main loop can explicitly exit on Xorg termination, not only via
   receiving EOF on the socket.
2. Because SIGCHLD is no longer ignored, accept() in mkghandles will also
   notice an early Xorg exit and not wait indefinitely (it will fail with
   EINTR). For this case, improve the error message.
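A minimal C sketch of the SIGCHLD part, under stated assumptions: `struct handles`, `signal_handles`, `setup_signals_early()` and `spawn_fake_xorg()` are hypothetical names (the agent's real structure is the one published through `ghandles_for_vchan_reinitialize`), and using -1 as the "not yet spawned" `x_pid` value is an assumption. It shows a compound-literal initialization that sets `x_pid` and zeroes every other field, a handler that reaps only Xorg's zombie, and a handler installed without `SA_RESTART`, which is what lets a blocking `accept()` fail with `EINTR`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, trimmed-down stand-in for the agent's handle structure. */
struct handles {
    pid_t x_pid;            /* Xorg's PID; -1 until it has been spawned (assumption) */
    int xserver_listen_fd;
    int display_opened;
};

/* What the signal handlers consult; published before Xorg is started. */
static struct handles *signal_handles;
static volatile sig_atomic_t xorg_exited;

static void sigchld_handler(int sig)
{
    int saved_errno = errno;            /* waitpid() may overwrite errno */
    struct handles *h = signal_handles;
    (void)sig;
    /* Collect Xorg's zombie (and only Xorg's) and remember that it is gone. */
    if (h && h->x_pid > 0 && waitpid(h->x_pid, NULL, WNOHANG) == h->x_pid)
        xorg_exited = 1;
    errno = saved_errno;
}

/* Runs before Xorg is started, so an early stop or Xorg exit is noticed. */
static int setup_signals_early(struct handles *h)
{
    struct sigaction sa;
    assert(h != NULL);
    *h = (struct handles){ .x_pid = -1 };   /* all other fields zeroed */
    signal_handles = h;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESTART: a blocking accept() gets EINTR */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork a child standing in for Xorg. SIGCHLD stays blocked until x_pid is
 * recorded, so the handler cannot miss a child that exits immediately. */
static pid_t spawn_fake_xorg(struct handles *h)
{
    sigset_t block, old;
    pid_t pid;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    pid = fork();
    if (pid == 0)
        _exit(0);               /* the fake "Xorg" exits right away */
    if (pid > 0)
        h->x_pid = pid;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```

Blocking SIGCHLD around `fork()` until `x_pid` is stored closes the window in which the handler could run before it knows which PID to reap.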

There is still a small race at startup if Xorg exits before the
accept() (or listen()) call is reached. Handle this by checking just
before the accept() call. It isn't perfect (there is still a window of a
few instructions in which the exit wouldn't be noticed in time), but it's
good enough for practical purposes.
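The late check and the `EINTR` path can be combined in one accept loop, sketched below. `wait_for_xorg()`, `on_sigchld()`, `spawn_dying_child()` and the message texts are hypothetical, the handler assumes Xorg is the only child process, and the sketch is a plausible shape rather than the actual mkghandles code. Looping back to the flag check after `EINTR` lets one code path report both an exit noticed before `accept()` and one that interrupts it; a comment marks the remaining window.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t xorg_exited = 0;

/* Reaps exited children; this sketch assumes Xorg is the only child. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        xorg_exited = 1;
    errno = saved_errno;
}

static int install_on_sigchld(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESTART, so accept() fails with EINTR */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical stand-in for an Xorg that dies during startup. */
static pid_t spawn_dying_child(long delay_ms)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        _exit(1);
    }
    return pid;
}

/* Wait for Xorg to connect; on failure, describe why in msg. */
static int wait_for_xorg(int listen_fd, char *msg, size_t len)
{
    assert(msg != NULL && len > 0);
    for (;;) {
        /* Late check for an Xorg that already exited: its SIGCHLD was
         * handled before accept(), so nothing would interrupt accept().
         * A signal landing between this check and accept() is still missed. */
        if (xorg_exited) {
            snprintf(msg, len, "Xorg exited before connecting");
            return -1;
        }
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            return fd;
        if (errno != EINTR) {
            snprintf(msg, len, "accept failed: %s", strerror(errno));
            return -1;
        }
        /* EINTR: loop; the check above reports Xorg's exit if that was it. */
    }
}
```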

QubesOS/qubes-issues#8060
Use X's logging function instead of plain perror, to ensure the message
is written to the appropriate Xorg log.
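The X server provides printf-style logging functions such as `ErrorF()`; which one the commit actually uses is not shown here. The sketch below replaces a `perror()` call with a helper that formats the same `what: strerror(errno)` text and hands it to the X logging path. Since the real function is only available inside the server, `fake_ErrorF()` is a hypothetical stand-in that appends to a buffer, and `log_errno()` is likewise an illustrative name.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the X server's ErrorF(const char *fmt, ...).
 * Inside the server the message would land in the Xorg log; here it is
 * appended to a buffer so the behaviour can be checked. */
static char fake_xorg_log[1024];

static void fake_ErrorF(const char *fmt, ...)
{
    size_t used = strlen(fake_xorg_log);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(fake_xorg_log + used, sizeof(fake_xorg_log) - used, fmt, ap);
    va_end(ap);
}

/* perror()-style helper that routes through the X logging function
 * instead of writing to stderr. */
static void log_errno(const char *what)
{
    int err = errno;   /* save errno before any other call can change it */
    assert(what != NULL);
    fake_ErrorF("%s: %s\n", what, strerror(err));
}
```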
qubesos-bot commented Dec 3, 2024

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024121004-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024111705-4.3&flavor=update

  • system_tests_extra

    • TC_00_QVCTest_whonix-workstation-17: test_010_screenshare (failure)
      ... AssertionError: 0 == 0
  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: wait_serial (wait serial expected)
      # wait_serial expected: "echo -e '[Layout]\nLayoutList=us,de' | sud...

Failed tests

4 failures
  • system_tests_extra

    • TC_00_QVCTest_whonix-gateway-17: test_010_screenshare (failure)
      ... AssertionError: 0 == 0

    • TC_00_QVCTest_whonix-workstation-17: test_010_screenshare (failure)
      ... AssertionError: 0 == 0

  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: wait_serial (wait serial expected)
      # wait_serial expected: "echo -e '[Layout]\nLayoutList=us,de' | sud...

    • gui_keyboard_layout: Failed (test died)
      # Test died: command 'test "$(cd ~user;ls e1*)" = "$(qvm-run -p wor...

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/119126#dependencies

2 fixed
  • system_tests_audio@hw1

  • system_tests_basic_vm_qrexec_gui_zfs

    • switch_pool: Failed (test died)
      # Test died: command 'dnf install -y ./zfs-release.rpm' failed at /...

Unstable tests

  • system_tests_audio

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...
  • system_tests_audio@hw1

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...

If Xorg is going to be terminated, do not try to connect to gui-agent
anymore. This avoids an infinite loop when handling SIGTERM and lets Xorg
shut down properly instead.
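A sketch of an Xorg-side retry loop that gives up once termination has been requested. It assumes a flag set from a `SIGTERM` handler (the X server has its own termination handling, which is not reproduced here); `connect_to_agent()`, `try_connect()`, the 100 ms retry delay and the socket path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t terminating = 0;

static void on_sigterm(int sig)
{
    (void)sig;
    terminating = 1;   /* only set a flag; the retry loop checks it */
}

static int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* One connection attempt to a UNIX socket; returns the fd or -1. */
static int try_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;
    close(fd);
    return -1;
}

/* Keep retrying until connected, but stop once termination is requested
 * instead of looping forever. */
static int connect_to_agent(const char *path)
{
    const struct timespec delay = { 0, 100 * 1000000L };   /* 100 ms */
    assert(path != NULL);
    while (!terminating) {
        int fd = try_connect(path);
        if (fd >= 0)
            return fd;
        nanosleep(&delay, NULL);   /* returns early when a signal arrives */
    }
    return -1;
}
```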
marmarek merged commit 7b37186 into QubesOS:main on Dec 10, 2024
1 of 2 checks passed