Regression: Bus error when running PublishSingleFile=true .NET 6.0 app on linux-arm (Raspbian) #62273
Comments
Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
|
@VSadov Could you take a look? |
Since this is specific to a particular environment, it could be hard to reproduce. I wonder what additional diagnostics we can get from the user. Perhaps a core dump could be helpful, if available? |
I'll try to work with the user to provide core dump if possible, albeit one wasn't generated on the signal alone it seems, thanks for a tip. |
@VSadov it took some time, but I got the core dump from the user: ArchiSteamFarm-7-0-0-24158-1640748878.dump.zip
It was generated with
I don't know how useful that dump is, as I'm not skilled enough to analyze it, but I hope it will aid debugging and hint at the potential cause of this issue. Thank you in advance for looking into it. |
So far the dump was not very helpful.
The only new information is that the crash could be caused by accessing a misaligned memory location. I am not sure how singlefile packaging could cause that and if it does, why the problem is not more common. |
@janvorli - Jan, maybe you have some ideas how to make progress on this? |
Looking at the dump on my RPI4, it is really a misaligned access:
Unaligned access handling can be set on Linux as described in https://mjmwired.net/kernel/Documentation/arm/mem_alignment. There are three options - the kernel handles it, but prints a warning message, the kernel handles it silently or the kernel generates SIGBUS. I was able to easily repro the crash after issuing this command on my RPI4 (without any docker container) echo 4 | sudo tee /proc/cpu/alignment |
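To make the three behaviours concrete, here is a minimal sketch of how the number written to /proc/cpu/alignment maps to them. It assumes the bit layout described in the kernel document linked above (bit 0 = warn, bit 1 = fixup, bit 2 = SIGBUS); `describe_alignment_mode` is a hypothetical helper written for illustration, not part of the kernel or the runtime.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describe the mode value written to /proc/cpu/alignment.
// Assumed bit layout (from the linked kernel documentation):
//   bit 0 (1) = print a warning, bit 1 (2) = fix up the access,
//   bit 2 (4) = send SIGBUS to the process.
inline std::string describe_alignment_mode(unsigned mode)
{
    if (mode == 0) {
        return "ignored";
    }
    std::string text;
    if (mode & 4u) {
        text = "signal (SIGBUS)";   // the setting that reproduces the crash
    } else if (mode & 2u) {
        text = "fixup";             // kernel emulates the access for the process
    }
    if (mode & 1u) {
        text += text.empty() ? "warn" : "+warn";
    }
    return text;
}
```

With this mapping, the `echo 4` used for the repro corresponds to "signal (SIGBUS)", while values 2 and 3 correspond to the kernel handling the access silently or with a warning.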
Here is the stack trace at the crash:
|
And here the code of the crashing method:
|
I guess the issue is caused by this code: runtime/src/native/corehost/bundle/file_entry.cpp Lines 22 to 23 in bd6a64b
|
@JustArchi the workaround for the issue until it is fixed is to execute the following on the affected devices:
This makes the kernel handle the unaligned accesses and makes apps work fine (only a tiny bit slower due to the trap to the kernel on each unaligned access). |
@VSadov these reads need to be replaced by something like the GET_UNALIGNED_64 functions we have in coreclr: runtime/src/coreclr/pal/inc/pal_endian.h Lines 121 to 126 in bd6a64b
|
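For readers unfamiliar with the pattern, the general idea behind an unaligned-safe read can be sketched as follows. This is not the code from pal_endian.h or the actual fix; `read_unaligned_i64` and `self_test_unaligned_read` are hypothetical names, and the sketch only assumes that the value sits at an arbitrary byte offset in a buffer, as it can in a bundle header.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the runtime's API: read a 64-bit integer from an
// address that may not be 8-byte aligned.
inline int64_t read_unaligned_i64(const void* source)
{
    int64_t value;
    // memcpy places no alignment requirement on its arguments, so the
    // compiler must emit loads that are safe for any address.
    std::memcpy(&value, source, sizeof(value));
    return value;
}

// Round trip: store a value at a deliberately odd offset and read it back.
inline void self_test_unaligned_read()
{
    alignas(8) unsigned char buffer[16] = {};
    const int64_t expected = 0x1122334455667788LL;
    std::memcpy(buffer + 1, &expected, sizeof(expected)); // offset 1 is misaligned
    // The risky pattern would be: *reinterpret_cast<const int64_t*>(buffer + 1)
    assert(read_unaligned_i64(buffer + 1) == expected);
}
```

Dereferencing a cast pointer at such an offset is what can trap on ARM when the kernel is configured to send SIGBUS; copying the bytes into a properly aligned local avoids depending on that setting.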
Right. I just wonder why we did not see this issue before. Is it more common to configure kernels to handle misaligned reads (vs. SIGBUS)? Is it also the case on Apple M1? |
By default, Linux has alignment set to let kernel handle misaligned accesses gracefully. So it only occurs when the setting differs from the default, which I guess is rare. |
I see. We used to read the entire I will make a fix. It looks like it may need to be ported to 6.0 as well. Thanks for getting to the bottom of this!!! |
@janvorli @VSadov thank you a lot guys for getting to the bottom of this, and double thanks for valid workaround @janvorli. I appreciate a lot your time put into this, as I was unable to gather more info myself. Everything you've said so far makes complete sense to me - as I said, I was unable to reproduce it myself, including help of other users, so chance is that it's indeed a bit rare, but still valid, machine environment configuration of affected user that causes that. I'm not sure why that machine differs from all other Raspberry setups, perhaps user is running some other software that affects that CPU alignment setting or something like that. In any case I'm very happy that we've managed to tackle this one down. Thanks again, you're awesome 🎉. |
You are welcome! |
As a side note, based on my further discussion with the affected user, it seems that he has installed That could explain that weird value of He was also apparently running some kind of "modified kernel" which was If you want to confirm or deny the above, it should be enough to install all .NET prereqs such as In any case, I was just wondering why this CPU alignment value doesn't follow the default for OS - this seems to be the reason. The root cause for the issue remains the same, but this hopefully answers why nobody ran into this before, as @VSadov was wondering. |
I'll keep this open to track porting to 6.0 |
Fixed in #63519 |
Description
Hello,
In the original issue JustArchiNET/ArchiSteamFarm#2457 I'm dealing with a regression that causes a single-file published app to crash during initialization with Bus error (so, to the best of my knowledge, the kernel is sending SIGBUS to the process).
This issue did not happen with the .NET 5.0 runtime, therefore I classify it as a regression.
Reproduction Steps
It's very hard for me to give reproduction steps as I'm unable to reproduce this myself. The issue is specific to one user (albeit he claims that he has tried at least 2 different machines and got the same result).
The minimal repro I have right now is cloning my project with git clone https://github.com/JustArchiNET/ArchiSteamFarm.git and checking out commit 876c3324526d0fe6b0a801210b63f663a4eb816c. The minimal build instructions I managed to pull it off with were:
A precompiled build is also available for download: https://github.com/JustArchiNET/ArchiSteamFarm/releases/download/5.2.0.9/ASF-linux-arm.zip
Expected behavior
The app works as previously, initializes properly and executes code.
Actual behavior
The app crashes with Bus error (so, to the best of my knowledge, the kernel is sending SIGBUS to the process). This happens before initialization of my app takes place (before the first line is logged to the console), so it's likely something related to the in-memory decompression process of the single-file app.
I've asked the user to record a run with COREHOST_TRACE=1; this was the output it recorded before crashing:
Sadly not very informative to me.
Regression?
Yes, single-file publish of this particular app worked fine in .NET 5.0. Single-file publish also works fine in .NET 6.0 for all other OS targets (e.g. linux-arm64, linux-x64, win-x64). It's also not reproducible on all linux-arm setups: I didn't receive such an error from other users, and we've tried to reproduce it ourselves.
According to the user, this happens on 2 different machines (albeit similar ones), which decreases the chance of some kind of hardware malfunction or similar.
Known Workarounds
I'd be very happy if you could suggest any. I'm trying various things that come to my mind in the original issue at JustArchiNET/ArchiSteamFarm#2457 and the only thing that actually made it work (at least for now) was PublishSingleFile=false.
Right now I'm testing with the user whether IncludeNativeLibrariesForSelfExtract=true or IncludeAllContentForSelfExtract=true helps with this issue.
Is there any way to force, through an environment variable, the old-style self-extraction method for a single-file published app? One that doesn't involve switches during compilation. If that worked, it'd be a decent enough workaround for me to suggest to users dealing with this issue in our linux-arm builds while this issue is investigated.
Configuration
Host machine: Raspberry Pi linux-arm (raspbian.10-arm, kernel 5.10.63-v8+)
Last working (tested) runtime: .NET 5.0.11.
First not-working (tested) runtime: .NET 6.0.0
The issue is specific to that configuration, I could not reproduce this on my linux-arm64 Raspberry Pi 4 machine.
Other information
Please let me know what else I can provide/do to help narrow this one down. I had no luck reproducing it myself on any of my machines, but I strongly believe this is a regression in regards to .NET 5.0. Perhaps one of you will be able to reproduce this problem by running my app on Raspberry Pi (Raspbian) linux-arm OS and therefore gather more info required to fix the problem.
I'm trying to actively work with the user to provide more info in regards to this, you can find our conversation here: JustArchiNET/ArchiSteamFarm#2457
Thank you in advance for your interest in regards to this issue.