build-and-deploy: use update-via-pacman.ps1
instead of a home-grown pacman -Syyu
#61
…dated

When `pacman -Syyu` (whose job is to update the packages in the Git for Windows SDK) updates the MSYS2 runtime in a Bash step that uses said MSYS2 runtime, the step will stop and fail.

A better way to run that command is via PowerShell, and an even better way is to use the same script that the `sync` automation of the `git-sdk-*` repositories already uses. So let's do that.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho I think this change is consistently giving us the hang from Both your deploy job for gnutls and my deploy job for openssl got stuck on
And that roughly coincides with my last attempt to debug the hanging deployments for git-for-windows/MINGW-packages#105 that got stuck on
EDIT: It did, but got unstuck by manual intervention. See git-for-windows/MINGW-packages#109 (comment)
Looking into this some more, Maybe
Yes, and it's a child process (marked with an identical command-line) of
The problems continue:
This happens only the first time, though, a subsequent
The stack traces are useless:
Trying to attach
Good idea! I was able to reproduce the hang (or at least a hang, it might be for a different reason) when running the
The output continues only very slowly, though, which may or may not be of relevance here. It feels as if something is putting weights on the cart, so to say, I could imagine that there are lots of waiting for objects with lots of timeouts. Eventually, the execution does reach that last
I also prefixed that
For the record, the relevant processes at that time are, according to

    $ wmic process where "commandline like '%pacm%'" get commandline,parentprocessid,processid
    CommandLine  ParentProcessId  ProcessId
    C:\git-sdk-arm64-full\usr\bin\pacman.exe -Syyu --overwrite=* --noconfirm  2792  10672
    "C:\git-sdk-arm64-full\usr\bin\bash.exe" -lc "strace -fo /tmp/a1 pacman --verbose --debug -S --overwrite=\* --noconfirm mingw-w64-clang-aarch64-git-extra"  5920  5660
    C:\git-sdk-arm64-full\usr\bin\strace.exe -fo C:/Users/self-runner-debug/AppData/Local/Temp/a1 pacman --verbose --debug -S --overwrite=* --noconfirm mingw-w64-clang-aarch64-git-extra  5660  10524
    pacman --verbose --debug -S --overwrite=* --noconfirm mingw-w64-clang-aarch64-git-extra  10524  10624
    pacman --verbose --debug -S --overwrite=* --noconfirm mingw-w64-clang-aarch64-git-extra  10624  10052
    C:\Windows\System32\Wbem\wmic.exe process where "commandline like '%pacm%'" get commandline,parentprocessid,processid  8476  8076

According to

    $ ps
     PID  PPID  PGID  WINPID  TTY       UID     STIME  COMMAND
    2458     1  2458   10624  cons0  197108  22:23:08  /usr/bin/pacman
    2529  2458  2458   10052  cons0  197108  22:23:31  /usr/bin/pacman
    1901  1900  1901   11156  pty1   197108  22:06:16  /usr/bin/bash
    1900     1  1900    2136  ?      197108  22:06:15  /usr/bin/mintty
    2587  1901  2587     568  pty1   197108  22:36:18  /usr/bin/ps
    2118     1  2064   10672  cons0  197108  22:10:44  /usr/bin/pacman
    1840  1839  1840   11040  pty0   197108  21:42:04  /usr/bin/bash
    1839     1  1839   10928  ?      197108  21:42:04  /usr/bin/mintty
    2441     1  2322   10524  cons0  197108  22:23:07  /usr/bin/strace
After
Here is another potentially interesting part of the
This seems to happen with or without
Binutils doesn't support AArch64 Windows yet but LLDB should work if you still need a debugger.
@mati865 thank you for the advice! In this instance, I am still working on the x86_64 MSYS2 runtime, even if it is running in an ARM64 VM: There is no aarch64 MSYS2 runtime as of time of writing.
They might still have a point that LLDB could be helpful.
And I just tested the linked script to check for CHPEv2 DLLs.
Looking at llvm/llvm-project@48feef2, LLDB probably wouldn't know what to do with these DLLs here either.
Oh, sorry. I had forgotten it's still x86_64 when I saw AArch64 libs in the output. I don't know how the emulation works but if it's close enough to real x86_64, gdb should also work if you somehow convince it to find x86_64 libs.
Hmm. Y'all got a good point. For shiggles, I ran a simple program in Sadly, WinDbg seems to have problems attaching, and Visual Studio (while it can attach) produces stacktraces that look quite dubious, highly likely mixing up function symbols within
Visual Studio doesn't really work with DWARF debug information. Supposedly, PDB support in binutils was fixed last year, so you would need to rebuild the library and add
I hope you have better luck than I have trying to track this down. My latest attempt was to backport a change that didn't attempt to even run the What I've had to do to get some sort of reasonable stack traces: attach with windbgx from the store (it says there's a new version available that's not in preview, but I haven't tried that yet). It times out on initial breakin. Then I hit the "break" toolbar button, and do I did just see that I reproduced the hang on x86_64 on windows 11 arm64, but windbgx is giving me the old
In case it helps your investigation, here's what I'm currently seeing on i686:
Where it disappears into kernel32/kernelbase is calling TerminateThread. Examining the stack in windbg shows it passing 0x7a8, which matches the handle in the cygthread. Examining the handle with I was just able to reproduce the first hang I saw, which was in
Previously, looking at the code, with constructs like

    HANDLE close_h;
    if (h)
      {
        close_h = h;
        h = NULL;
        ForceCloseHandle1 (close_h, pinfo_shared_handle);
      }

I was thinking maybe there are expected to be multiple threads doing this, in which case maybe they want something like
Seen the CloseHandle variant a few more times, one time
I tried The backtrace from lldb does not go very far, but from what I can tell trying to symbolicate the backtrace from windbg (which has some extra garbage in it) this also occurs from do_exit -> pinfo::exit -> proc_terminate, calling pinfo::release instead of cygthread::terminate_thread, which are both called from the same loop over chld_procs.
I don't know... I tried that

    diff --git a/winsup/cygwin/local_includes/debug.h b/winsup/cygwin/local_includes/debug.h
    index 858f8d5dcd..59e581dafb 100644
    --- a/winsup/cygwin/local_includes/debug.h
    +++ b/winsup/cygwin/local_includes/debug.h
    @@ -9,7 +9,7 @@ details. */
    #define being_debugged() (IsDebuggerPresent ())
    -#ifndef DEBUGGING
    +#if 1 // #ifndef DEBUGGING
    # define cygbench(s)
    # define ForceCloseHandle CloseHandle
    # define ForceCloseHandle1(h, n) CloseHandle (h)

That forces all the Then I ran my reproducer. It worked once! So I was excited. But the next attempt hung in the very same way, twice. I got it unstuck both times by running these PowerShell statements:

    $p = (Get-CimInstance Win32_Process |`
        Where-Object {$_.CommandLine -like '*pacman*'} |`
        Select-Object CommandLine, ParentProcessId, ProcessId |`
        Group-Object CommandLine |`
        Where-Object {$_.Count -gt 1} |`
        Select-Object -ExpandProperty Group)
    if ($p.Length -eq 2 -and $p[0].processId -eq $p[1].parentProcessId) {
        Stop-Process -Force -Id $p[1].processId
    }

@jeremyd2019 I wonder whether you know what this child process is that is hanging, where it is spawned and what its purpose is?
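For anyone who wants to spot the same parent/child pair from inside the SDK's Bash, here is a minimal sketch that works on `ps` output of the shape posted earlier in this thread. It is an approximation, not a port of the PowerShell statements: `find_stuck_child` is a hypothetical helper name, it compares only the COMMAND column (the PowerShell version groups on the full command line and insists on exactly two matches), and it only prints candidate PIDs without stopping anything.

```shell
# find_stuck_child: read `ps`-style rows on stdin and print the PID of every
# process whose parent is running the very same COMMAND.
# Assumed column layout: PID PPID PGID WINPID TTY UID STIME COMMAND
find_stuck_child() {
  awk '
    # Skip the header (or any other line that does not start with a PID).
    $1 ~ /^[0-9]+$/ {
      cmd[$1] = $8      # COMMAND, keyed by PID
      ppid[$1] = $2     # parent PID, keyed by PID
      order[++n] = $1   # remember the input order
    }
    END {
      for (i = 1; i <= n; i++) {
        p = order[i]
        # A candidate is a child whose parent is listed with the same command.
        if ((ppid[p] in cmd) && cmd[ppid[p]] == cmd[p])
          print p
      }
    }
  '
}

# Usage: ps | find_stuck_child
```

The printed values are Cygwin PIDs, not Windows PIDs, so they would have to be mapped via the WINPID column before being handed to `Stop-Process`.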
I don't know what the child process is for. Perhaps it is the The
Needs git-for-windows/build-extra#539 to be merged first.
The intricacies of updating the Git for Windows SDKs are not quite trivial. For one, you have to make sure to call `pacman` another time if "system packages" such as `msys2-runtime` have been updated. Also, in the end `mingw-w64-git-extra` has to be installed so that that package's post-install script can adjust a couple of system files in accordance with Git for Windows' requirements.

These intricacies are documented and automated in the `update-via-pacman.ps1` script that is run as part of the `sync` GitHub workflows that are installed in the `git-sdk-*` repositories.

It only makes sense to use the same automation when building Git for Windows' packages, so let's do this.
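As a rough illustration of the order of operations described above (and explicitly not the contents of `update-via-pacman.ps1`), the following shell sketch only prints the `pacman` invocations one would expect, without running them. The function name `plan_sdk_update` and its two arguments are hypothetical. The flags are the ones visible in the process listings in the conversation; that the second pass reuses `-Syyu` is an assumption.

```shell
# plan_sdk_update: print, in order, the pacman calls for one SDK update.
#   $1  "yes" if the first pass updated system packages such as msys2-runtime
#       (how that is detected is left out of this sketch)
#   $2  package prefix, e.g. mingw-w64-x86_64 or mingw-w64-clang-aarch64
plan_sdk_update() {
  core_updated=$1
  prefix=$2

  # First pass: refresh the package databases and upgrade everything.
  printf '%s\n' 'pacman -Syyu --overwrite=* --noconfirm'

  # Second pass, needed only when system packages were replaced.
  if [ "$core_updated" = yes ]; then
    printf '%s\n' 'pacman -Syyu --overwrite=* --noconfirm'
  fi

  # Finally install git-extra so that its post-install script can adjust
  # the system files.
  printf '%s\n' "pacman -S --overwrite=* --noconfirm ${prefix}-git-extra"
}

# Usage: plan_sdk_update yes mingw-w64-clang-aarch64
```

Printing the plan instead of executing it keeps the sketch safe to run from Bash, which, per this PR, is exactly where the real update should not be executed.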
This fixes #29