Regression: Bus error when running PublishSingleFile=true .NET 6.0 app on linux-arm (Raspbian) #62273

Closed
JustArchi opened this issue Dec 2, 2021 · 22 comments · Fixed by #63431

@JustArchi
Contributor

JustArchi commented Dec 2, 2021

Description

Hello,

In the original issue JustArchiNET/ArchiSteamFarm#2457, I'm dealing with a regression that causes a single-file published app to crash during initialization with a Bus error (to the best of my knowledge, the kernel sending SIGBUS to the process).

This issue did not happen with .NET 5.0 runtime, therefore I classify it as a regression.

<username>@<hostname>:~/ArchiSteamFarm $ ./ArchiSteamFarm
Bus error

Reproduction Steps

It's very hard for me to give reproduction steps, as I'm unable to reproduce this myself. The issue is specific to one user (although he claims that he has tried at least 2 different machines and got the same result).

The minimal repro I have right now is cloning my project with git clone https://github.com/JustArchiNET/ArchiSteamFarm.git and checking out commit 876c3324526d0fe6b0a801210b63f663a4eb816c. The minimal build instructions I managed to narrow it down to were:

dotnet publish ArchiSteamFarm -c Release -o out -r linux-arm /p:PublishSingleFile=true --self-contained

Precompiled build is also available for download: https://github.com/JustArchiNET/ArchiSteamFarm/releases/download/5.2.0.9/ASF-linux-arm.zip

Expected behavior

The app works as before: it initializes properly and executes code.

Actual behavior

The app crashes with a Bus error (to the best of my knowledge, the kernel sending SIGBUS to the process). This happens before my app initializes (before the first line is logged to the console), so it's likely related to the in-memory decompression process of the single-file app.

I've asked the user to record a trace with COREHOST_TRACE=1; this is the output it recorded before crashing:

Tracing enabled @ Thu Dec  2 10:52:21 2021 GMT
--- Invoked apphost [version: static, commit hash: static] main = {
./ArchiSteamFarm
}
The managed DLL bound to this executable is: 'ArchiSteamFarm.dll'
Detected Single-File app bundle
Using internal fxr
Invoking fx resolver [/home/pi/ArchiS2/] hostfxr_main_bundle_startupinfo
Host path: [/home/pi/ArchiS2/ArchiSteamFarm]
Dotnet path: [/home/pi/ArchiS2/]
App path: [/home/pi/ArchiS2/ArchiSteamFarm.dll]
Bundle Header Offset: [18f0a400]
--- Invoked hostfxr_main_bundle_startupinfo [commit hash: static]
Mapped application bundle
Unmapped application bundle
Single-File bundle details:
DepsJson Offset:[1d938] Size[61fa2b8]
RuntimeConfigJson Offset:[2b0] Size[75b0f0]
.net core 3 compatibility mode: [No]
--- Executing in a native executable mode...
Using dotnet root path [/home/pi/ArchiS2/]
App runtimeconfig.json from [/home/pi/ArchiS2/ArchiSteamFarm.dll]
Runtime config is cfg=/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json dev=/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.dev.json
Attempting to read runtime config: /home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json
Attempting to read dev runtime config: /home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.dev.json
Mapped bundle for [/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json]
Unmapped application bundle
Runtime config [/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json] is valid=[1]
Executing as a self-contained app as per config file [/home/pi/ArchiS2/ArchiSteamFarm.runtimeconfig.json]
Using internal hostpolicy
Reading from host interface version: [0x16041101:124] to initialize policy version: [0x16041101:124]
Mapped application bundle

Sadly, it's not very informative to me.

Regression?

Yes, single-file publish of this particular app worked fine in .NET 5.0. Single-file publish also works fine in .NET 6.0 for all other OS targets (e.g. linux-arm64, linux-x64, win-x64), and it's not even reproducible on all linux-arm setups: I haven't received such an error from other users, and we've tried to reproduce it ourselves without success.

According to the user, this happens on 2 different (albeit similar) machines, which decreases the chance of some kind of hardware malfunction.

Known Workarounds

I'd be very happy if you could suggest any. I'm trying various things that come to mind in the original issue at JustArchiNET/ArchiSteamFarm#2457, and the only thing that has actually made it work (at least for now) is PublishSingleFile=false.

Right now I'm testing with the user if IncludeNativeLibrariesForSelfExtract=true or IncludeAllContentForSelfExtract=true helps with this issue.

Is there any way to force the old-style self-extraction of a single-file published app through an environment variable, i.e. one that doesn't involve switches during compilation? If that worked, it'd be a decent enough workaround for me to suggest to users hitting this issue with our linux-arm builds while it is investigated.

Configuration

Host machine: Raspberry Pi linux-arm (raspbian.10-arm, kernel 5.10.63-v8+)

Last working (tested) runtime: .NET 5.0.11.
First not-working (tested) runtime: .NET 6.0.0

The issue is specific to that configuration, I could not reproduce this on my linux-arm64 Raspberry Pi 4 machine.

Other information

Please let me know what else I can provide or do to help narrow this one down. I had no luck reproducing it on any of my machines, but I strongly believe this is a regression compared to .NET 5.0. Perhaps one of you will be able to reproduce this problem by running my app on Raspberry Pi (Raspbian) linux-arm OS and thereby gather the info required to fix it.

I'm actively working with the user to provide more info about this; you can find our conversation here: JustArchiNET/ArchiSteamFarm#2457

Thank you in advance for your interest in this issue.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-Host untriaged New issue has not been triaged by the area owner labels Dec 2, 2021
@ghost

ghost commented Dec 2, 2021

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov

@agocke agocke added area-Single-File and removed area-Host untriaged New issue has not been triaged by the area owner labels Dec 3, 2021
@ghost

ghost commented Dec 3, 2021

Tagging subscribers to this area: @agocke, @vitek-karas, @VSadov

@agocke agocke added this to the 7.0.0 milestone Dec 3, 2021
@agocke
Member

agocke commented Dec 3, 2021

@VSadov Could you take a look?

@VSadov
Member

VSadov commented Dec 4, 2021

Since this is specific to a particular environment, it could be hard to reproduce. I wonder what additional diagnostics we can get from the user.

Perhaps a core dump could be helpful, if available?

@JustArchi
Contributor Author

I'll try to work with the user to provide a core dump if possible, although it seems one wasn't generated on the signal alone. Thanks for the tip.

@JustArchi
Contributor Author

JustArchi commented Dec 29, 2021

@VSadov it took some time but I got the core dump from the user.

ArchiSteamFarm-7-0-0-24158-1640748878.dump.zip

It was generated with the %e-%s-%u-%g-%p-%t.dump pattern; the 7 in the signal (%s) position proves it's SIGBUS killing the process.

I don't know how useful that dump is as I'm not skilled enough to analyze it, but I hope it'll aid debugging and hint potential cause of this issue. Thank you in advance for looking into it.
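
The dump's file name can be decoded mechanically. Below is a minimal sketch (`core_name_fields` and `parse_core_name` are hypothetical helpers, not part of any tool) for the `%e-%s-%u-%g-%p-%t.dump` pattern; per core(5), those specifiers are the executable (comm) name, signal number, real UID, real GID, PID, and dump time in seconds since the Epoch. Applied to `ArchiSteamFarm-7-0-0-24158-1640748878.dump`, it yields signal 7, which is SIGBUS on Linux for ARM and x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Fields of a core file named with core_pattern "%e-%s-%u-%g-%p-%t.dump".
struct core_name_fields {
    std::string exe;   // %e: executable (comm) name
    long long signal;  // %s: number of the signal that caused the dump
    long long uid;     // %u: real UID
    long long gid;     // %g: real GID
    long long pid;     // %p: PID
    long long time;    // %t: dump time, seconds since the Epoch
};

// Parses the numeric fields from the right, so an executable name that itself
// contains '-' is kept intact. Returns false if the name has a different shape.
bool parse_core_name(std::string name, core_name_fields& out) {
    const std::string suffix = ".dump";
    if (name.size() < suffix.size() ||
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0)
        return false;
    name.erase(name.size() - suffix.size());

    long long nums[5];
    for (int i = 4; i >= 0; --i) {
        const std::size_t dash = name.rfind('-');
        if (dash == std::string::npos) return false;
        const std::string field = name.substr(dash + 1);
        char* end = nullptr;
        nums[i] = std::strtoll(field.c_str(), &end, 10);
        if (field.empty() || *end != '\0') return false;
        name.erase(dash);  // drop "-<field>" and continue leftwards
    }
    if (name.empty()) return false;  // no executable name left
    out = core_name_fields{name, nums[0], nums[1], nums[2], nums[3], nums[4]};
    return true;
}
```

Parsing from the right is a deliberate choice: only the last five dash-separated fields are numeric, while `%e` is free text.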

@VSadov
Member

VSadov commented Jan 3, 2022

So far the dump was not very helpful.

Generic Unix Version 0 UP Free ARM (NT) Thumb-2
Machine Name:
System Uptime: not available
Process Uptime: not available
.............
(5e5e.5e5e): Signal SIGBUS (Bus error) code BUS_ADRALN (Invalid address alignment) at 0xf75dae5b*** WARNING: Unable to verify timestamp for ArchiSteamFarm
ArchiSteamFarm+0x1a10a8:
009340a8 ??       ???

The only new information is that the crash could be caused by accessing a misaligned memory location. I am not sure how single-file packaging could cause that and, if it does, why the problem is not more common.

@VSadov
Member

VSadov commented Jan 3, 2022

@janvorli - Jan, maybe you have some ideas how to make progress on this?

@janvorli
Member

janvorli commented Jan 5, 2022

Looking at the dump on my RPI4, it is really a misaligned access:

Program terminated with signal SIGBUS, Bus error.
#0  0x009340a8 in ?? ()
(gdb) bt
#0  0x009340a8 in ?? ()
#1  0x0093409c in ?? ()

(gdb) disassemble 0x009340a8,+4
Dump of assembler code from 0x9340a8 to 0x9340ac:
=> 0x009340a8:  ldrd    r11, r0, [r0]
End of assembler dump.

(gdb) p/x $r0
$1 = 0xf75dae5b

ldrd instruction requires addresses aligned to 8 bytes.
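
A quick sanity check on the faulting address: masking off the low bits shows how far it sits past an aligned boundary. This is an illustrative helper (`misalignment` and `kFaultAddr` are hypothetical names); since 0xf75dae5b ends in 0x5b, it is 3 bytes past both a 4-byte and an 8-byte boundary, so it is misaligned for `ldrd` whichever of those two requirements applies.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes `addr` lies past the previous multiple of `alignment`.
// `alignment` must be a non-zero power of two (1, 2, 4, 8, ...).
std::uintptr_t misalignment(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}

// Value of r0 at the faulting `ldrd r11, r0, [r0]` in the dump above.
constexpr std::uintptr_t kFaultAddr = 0xf75dae5b;
```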

Unaligned access handling can be configured on Linux as described in https://mjmwired.net/kernel/Documentation/arm/mem_alignment. There are three options: the kernel handles it but prints a warning message, the kernel handles it silently, or the kernel generates SIGBUS.

I was able to easily repro the crash after issuing this command on my RPI4 (without any docker container):

echo 4 | sudo tee /proc/cpu/alignment
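
The value written to /proc/cpu/alignment is a bitmask. As I read the kernel's mem_alignment documentation linked above, bit 0 (1) prints a warning, bit 1 (2) has the kernel fix up the access, and bit 2 (4) sends SIGBUS, with only 0 through 5 supported. The hypothetical `describe_alignment_mode` below just decodes those bits, which makes it easy to see that `4` means "signal only" while `2`, the workaround suggested later in this thread, means "fix up silently".

```cpp
#include <cassert>
#include <string>

// Decodes the user-fault mode written to /proc/cpu/alignment on 32-bit ARM:
//   bit 0 (1): warn   - the kernel logs the unaligned access
//   bit 1 (2): fixup  - the kernel emulates the access and resumes the process
//   bit 2 (4): signal - the kernel sends SIGBUS to the process
std::string describe_alignment_mode(int mode) {
    assert(mode >= 0 && mode <= 5);  // assumed documented range
    if (mode == 0) return "none";
    std::string out;
    auto append = [&out](const char* name) {
        if (!out.empty()) out += '+';
        out += name;
    };
    if (mode & 1) append("warn");
    if (mode & 2) append("fixup");
    if (mode & 4) append("signal");
    return out;
}
```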

@janvorli
Member

janvorli commented Jan 5, 2022

Here is the stack trace at the crash:

(lldb) bt
* thread #1, name = 'ArchiSteamFarm', stop reason = signal SIGBUS: illegal alignment
  * frame #0: 0x005a10a8 ArchiSteamFarm`bundle::file_entry_t::read(reader=0xbeffef38, bundle_major_version=6, force_extraction=false) at file_entry.cpp:22:25
    frame #1: 0x005a1236 ArchiSteamFarm`bundle::manifest_t::read(reader=0xbeffef38, header=0x00a462b8) at manifest.cpp:14:30
    frame #2: 0x005a15ae ArchiSteamFarm`bundle::runner_t::extract(this=0x00a46270) at runner.cpp:32:22
    frame #3: 0x0059cd3e ArchiSteamFarm`corehost_main_init(hostpolicy_init_t&, int, char const**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [inlined] bundle::runner_t::process_manifest_and_extract() at runner.h:42:35
    frame #4: 0x0059cd38 ArchiSteamFarm`corehost_main_init(hostpolicy_init=0x00a46118, argc=1, argv=0xbefff5e4, location="corehost_main") at hostpolicy.cpp:394
    frame #5: 0x0059cecc ArchiSteamFarm`::corehost_main(argc=1, argv=0xbefff5e4) at hostpolicy.cpp:416:14
    frame #6: 0x0058724c ArchiSteamFarm`fx_muxer_t::handle_exec_host_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, host_startup_info_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<known_options, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) at fx_muxer.cpp:146:20
    frame #7: 0x00587198 ArchiSteamFarm`fx_muxer_t::handle_exec_host_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, host_startup_info_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<known_options, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) [inlined] (anonymous namespace)::read_config_and_execute(host_command=<unavailable>, host_info=<unavailable>, app_candidate=""..., opts=0xb6f87868, new_argc=1, new_argv=0xbefff5e4, mode=<unavailable>, is_sdk_command=<unavailable>, out_buffer=<unavailable>, buffer_size=<unavailable>, required_buffer_size=<unavailable>) at fx_muxer.cpp:533
    frame #8: 0x00586fc0 ArchiSteamFarm`fx_muxer_t::handle_exec_host_command(host_command=<unavailable>, host_info=<unavailable>, app_candidate=""..., opts=0xb6f87868, argc=1, argv=0xbefff5e4, argoff=1, mode=apphost, is_sdk_command=<unavailable>, result_buffer=0x00000000, buffer_size=0, required_buffer_size=0x00000000) at fx_muxer.cpp:1018
    frame #9: 0x0058672a ArchiSteamFarm`fx_muxer_t::execute(host_command="", argc=1, argv=0xbefff5e4, host_info=0xbefff2f8, result_buffer=0x00000000, buffer_size=0, required_buffer_size=0x00000000) at fx_muxer.cpp:579:18
    frame #10: 0x005844ae ArchiSteamFarm`::hostfxr_main_bundle_startupinfo(argc=1, argv=0xbefff5e4, host_path=<unavailable>, dotnet_root=0x00a7db08, app_path=0x00a7db30, bundle_header_offset=55914502) at hostfxr.cpp:46:12
    frame #11: 0x005b42c4 ArchiSteamFarm`exe_start(argc=1, argv=0xbefff5e4) at corehost.cpp:207:18
    frame #12: 0x005b44ec ArchiSteamFarm`main(argc=1, argv=0xbefff5e4) at corehost.cpp:301:21
    frame #13: 0xb6bfa718 libc.so.6`__libc_start_main(main=0xbefff5e4, argc=-1227698176, argv=0xb6bfa718, init=<unavailable>, fini=(ArchiSteamFarm`__libc_csu_fini + 1), rtld_fini=(ld-2.28.so`_dl_fini at dl-fini.c:50:20), stack_end=0xbefff5e4) at libc-start.c:308:16
    frame #14: 0x00581034 ArchiSteamFarm`_start + 52

@janvorli
Member

janvorli commented Jan 5, 2022

And here is the code of the crashing method:

(lldb) disass
ArchiSteamFarm`bundle::file_entry_t::read:
    0x5a1080 <+0>:   push   {r4, r5, r6, r7, lr}
    0x5a1082 <+2>:   add    r7, sp, #0xc
    0x5a1084 <+4>:   push.w {r8, r9, r10, r11}
    0x5a1088 <+8>:   sub    sp, #0xc
    0x5a108a <+10>:  str    r3, [sp, #0x4]
    0x5a108c <+12>:  mov    r6, r2
    0x5a108e <+14>:  mov    r4, r0
    0x5a1090 <+16>:  mov    r0, r1
    0x5a1092 <+18>:  movs   r2, #0x8
    0x5a1094 <+20>:  movs   r3, #0x0
    0x5a1096 <+22>:  mov    r5, r1
    0x5a1098 <+24>:  bl     0x5af044                  ; bundle::reader_t::bounds_check at reader.cpp:39
    0x5a109c <+28>:  ldr    r0, [r5, #0x4]
    0x5a109e <+30>:  movs   r2, #0x8
    0x5a10a0 <+32>:  movs   r3, #0x0
    0x5a10a2 <+34>:  add.w  r1, r0, #0x8
    0x5a10a6 <+38>:  str    r1, [r5, #0x4]
->  0x5a10a8 <+40>:  ldrd   r11, r0, [r0]
    0x5a10ac <+44>:  str    r0, [sp, #0x8]
    0x5a10ae <+46>:  mov    r0, r5
    0x5a10b0 <+48>:  bl     0x5af044                  ; bundle::reader_t::bounds_check at reader.cpp:39
    0x5a10b4 <+52>:  ldr    r0, [r5, #0x4]
    0x5a10b6 <+54>:  cmp    r6, #0x6
    0x5a10b8 <+56>:  add.w  r1, r0, #0x8
    0x5a10bc <+60>:  str    r1, [r5, #0x4]
    0x5a10be <+62>:  ldrd   r8, r9, [r0]
    0x5a10c2 <+66>:  blo    0x5a10dc                  ; <+92> at file_entry.cpp:26:33
    0x5a10c4 <+68>:  mov    r0, r5
    0x5a10c6 <+70>:  movs   r2, #0x8
    0x5a10c8 <+72>:  movs   r3, #0x0
    0x5a10ca <+74>:  bl     0x5af044                  ; bundle::reader_t::bounds_check at reader.cpp:39
    0x5a10ce <+78>:  ldr    r0, [r5, #0x4]
    0x5a10d0 <+80>:  add.w  r1, r0, #0x8
    0x5a10d4 <+84>:  str    r1, [r5, #0x4]
    0x5a10d6 <+86>:  ldrd   r6, r10, [r0]
    0x5a10da <+90>:  b      0x5a10e2                  ; <+98> [inlined] bundle::reader_t::read_direct(long long) at file_entry.cpp:30
    0x5a10dc <+92>:  movs   r6, #0x0
    0x5a10de <+94>:  mov.w  r10, #0x0
    0x5a10e2 <+98>:  mov    r0, r5
    0x5a10e4 <+100>: movs   r2, #0x1
    0x5a10e6 <+102>: movs   r3, #0x0
    0x5a10e8 <+104>: bl     0x5af044                  ; bundle::reader_t::bounds_check at reader.cpp:39
    0x5a10ec <+108>: ldr    r0, [r5, #0x4]
    0x5a10ee <+110>: ldr    r2, [sp, #0x8]
    0x5a10f0 <+112>: adds   r1, r0, #0x1
    0x5a10f2 <+114>: str    r1, [r5, #0x4]
    0x5a10f4 <+116>: ldr    r1, [sp, #0x4]
    0x5a10f6 <+118>: ldrb   r0, [r0]
    0x5a10f8 <+120>: strb.w r1, [r4, #0x35]
    0x5a10fc <+124>: adds   r1, r4, #0x4
    0x5a10fe <+126>: str.w  r11, [r4]
    0x5a1102 <+130>: stm.w  r1, {r2, r8, r9}
    0x5a1106 <+134>: movs   r1, #0x0
    0x5a1108 <+136>: mov    r8, r4
    0x5a110a <+138>: strd   r6, r10, [r4, #16]
    0x5a110e <+142>: mov    r6, r4
    0x5a1110 <+144>: strb.w r1, [r4, #0x34]
    0x5a1114 <+148>: str    r1, [r4, #0x20]
    0x5a1116 <+150>: strb   r0, [r4, #0x18]
    0x5a1118 <+152>: strb   r1, [r8, #36]!
    0x5a111c <+156>: subs.w r1, r11, #0x1
    0x5a1120 <+160>: str    r8, [r6, #28]!
    0x5a1124 <+164>: sbcs   r1, r2, #0x0
    0x5a1128 <+168>: blt    0x5a114a                  ; <+202> at file_entry.cpp:36:9
    0x5a112a <+170>: orrs.w r1, r10, r9
    0x5a112e <+174>: bmi    0x5a114a                  ; <+202> at file_entry.cpp:36:9
    0x5a1130 <+176>: cmp    r0, #0x6
    0x5a1132 <+178>: bhs    0x5a114a                  ; <+202> at file_entry.cpp:36:9
    0x5a1134 <+180>: mov    r0, r5
    0x5a1136 <+182>: mov    r1, r6
    0x5a1138 <+184>: bl     0x5af13c                  ; bundle::reader_t::read_path_string at reader.cpp:91
    0x5a113c <+188>: mov    r0, r6
    0x5a113e <+190>: bl     0x5a0324                  ; bundle::dir_utils_t::fixup_path_separator at dir_utils.cpp:98:1
    0x5a1142 <+194>: add    sp, #0xc
    0x5a1144 <+196>: pop.w  {r8, r9, r10, r11}
    0x5a1148 <+200>: pop    {r4, r5, r6, r7, pc}
    0x5a114a <+202>: ldr    r0, [pc, #0x3c]           ; <+264> at new_allocator.h
    0x5a114c <+204>: add    r0, pc
    0x5a114e <+206>: bl     0x5af5a4                  ; trace::error at trace.cpp:121
    0x5a1152 <+210>: ldr    r0, [pc, #0x38]           ; <+268> at new_allocator.h
    0x5a1154 <+212>: add    r0, pc
    0x5a1156 <+214>: bl     0x5af5a4                  ; trace::error at trace.cpp:121
    0x5a115a <+218>: movs   r0, #0x4
    0x5a115c <+220>: blx    0xa1cb90                  ; symbol stub for: __cxa_allocate_exception
    0x5a1160 <+224>: ldr    r1, [pc, #0x2c]           ; <+272> at new_allocator.h
    0x5a1162 <+226>: movw   r2, #0x809f
    0x5a1166 <+230>: movt   r2, #0x8000
    0x5a116a <+234>: add    r1, pc
    0x5a116c <+236>: str    r2, [r0]
    0x5a116e <+238>: movs   r2, #0x0
    0x5a1170 <+240>: blx    0xa1cba0                  ; symbol stub for: __cxa_throw
    0x5a1174 <+244>: mov    r5, r0
    0x5a1176 <+246>: ldr    r0, [r4, #0x1c]
    0x5a1178 <+248>: cmp    r0, r8
    0x5a117a <+250>: it     ne
    0x5a117c <+252>: blne   0x7bbfd0                  ; operator delete at clrhost_nodependencies.cpp:393
    0x5a1180 <+256>: mov    r0, r5
    0x5a1182 <+258>: blx    0xa1c800                  ; symbol stub for: _Unwind_Resume
    0x5a1186 <+262>: nop
    0x5a1188 <+264>: .long  0xffeed6b0                ; unknown opcode
    0x5a118c <+268>: .long  0xffedd62e                ; unknown opcode
    0x5a1190 <+272>: strheq lr, [r7], #-22

@janvorli
Member

janvorli commented Jan 5, 2022

I guess the issue is caused by this code:

fixed_data.offset = *(int64_t*)reader.read_direct(sizeof(int64_t));
fixed_data.size = *(int64_t*)reader.read_direct(sizeof(int64_t));
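
Why would `read_direct` hand back a misaligned pointer in the first place? Assuming manifest entries are stored back to back, each one's size depends on its path, so the next entry's 8-byte fields can start at any offset. The sketch below models that; `entry_size`, `entry_starts`, and the 7-bit length prefix for the path are my assumptions, while the field order (two int64s, a third int64 only for bundle major version 6 and later, one type byte, then the path) follows the disassembly above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes taken by a 7-bit-encoded length prefix (assumed path string format).
std::size_t length_prefix_bytes(std::size_t len) {
    std::size_t n = 1;
    while (len >= 0x80) {
        len >>= 7;
        ++n;
    }
    return n;
}

// Size of one entry: int64 offset, int64 size, int64 compressed size (only for
// bundle major version >= 6), one type byte, then the length-prefixed path.
std::size_t entry_size(const std::string& path, int bundle_major_version) {
    const std::size_t fixed = 8 + 8 + (bundle_major_version >= 6 ? 8 : 0) + 1;
    return fixed + length_prefix_bytes(path.size()) + path.size();
}

// Offset of each entry relative to the first one, assuming no padding.
std::vector<std::size_t> entry_starts(const std::vector<std::string>& paths,
                                      int bundle_major_version) {
    std::vector<std::size_t> starts;
    std::size_t pos = 0;
    for (const std::string& p : paths) {
        starts.push_back(pos);
        pos += entry_size(p, bundle_major_version);
    }
    return starts;
}
```

With an 18-character first path such as ArchiSteamFarm.dll, the second entry starts 44 bytes in under this model, so its first int64 is 4 bytes off an 8-byte boundary even if the first entry was aligned.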

@janvorli
Copy link
Member

janvorli commented Jan 5, 2022

@JustArchi the workaround for the issue until it is fixed is to execute the following on the affected devices:

echo 2 | sudo tee /proc/cpu/alignment

This makes the kernel handle the unaligned accesses, and apps then work fine (only a tiny bit slower due to the trap into the kernel on each unaligned access).

@janvorli
Member

janvorli commented Jan 5, 2022

@VSadov these reads need to be replaced by something like the GET_UNALIGNED_64 functions we have in coreclr:

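#include <cstring> // memcpy

// Reads 8 bytes from a possibly unaligned address. Copying with memcpy avoids
// dereferencing a misaligned UINT64* (coreclr's unsigned 64-bit integer type).
inline UINT64 GET_UNALIGNED_64(const void *pObject)
{
    UINT64 temp;
    memcpy(&temp, pObject, sizeof(temp));
    return temp;
}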

@VSadov
Member

VSadov commented Jan 5, 2022

Right. I just wonder why we did not see this issue before. Is it more common to configure kernels to handle misaligned reads (vs. SIGBUS)? Is it also the case on Apple M1?

@janvorli
Member

janvorli commented Jan 5, 2022

By default, Linux has alignment set to let the kernel handle misaligned accesses gracefully. So it only occurs when the setting differs from the default, which I guess is rare.
Also, this is arm32 only; arm64 doesn't have such a way to set the misalignment handling, so my guess is that it is always handled gracefully there.

@VSadov
Member

VSadov commented Jan 5, 2022

I see. We used to read the entire file_entry_fixed_t here prior to 6.0, which I guess worked because it is a struct and the compiler would do a memcpy. When compression was introduced, the header got versioned fields and can no longer be read all at once. Switching to element-wise reading introduced a regression.

I will make a fix. It looks like it may need to be ported to 6.0 as well.

Thanks for getting to the bottom of this!!!
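
One way to read versioned fields element-wise without the misaligned casts is to copy each field out of the buffer. This is a sketch only: `entry_fixed`, `take`, and `read_entry_fixed` are hypothetical, and the field set and order are inferred from the disassembly, not taken from the host's actual file_entry_fixed_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed part of a manifest entry (field order assumed).
struct entry_fixed {
    std::int64_t offset;
    std::int64_t size;
    std::int64_t compressed_size;  // only present in bundle major version >= 6
    std::uint8_t type;
};

// Copies `n` bytes at the cursor into `dst` and advances the cursor.
// std::memcpy has no alignment requirement on either pointer.
inline void take(const std::uint8_t*& cur, void* dst, std::size_t n) {
    std::memcpy(dst, cur, n);
    cur += n;
}

// Element-wise, version-aware read that never dereferences a cast pointer.
entry_fixed read_entry_fixed(const std::uint8_t*& cur, int bundle_major_version) {
    entry_fixed e{};  // compressed_size stays 0 for older bundles
    take(cur, &e.offset, sizeof e.offset);
    take(cur, &e.size, sizeof e.size);
    if (bundle_major_version >= 6)
        take(cur, &e.compressed_size, sizeof e.compressed_size);
    take(cur, &e.type, sizeof e.type);
    return e;
}
```

Because each field is copied rather than loaded through a typed pointer, this works wherever the cursor lands; for small constant sizes, compilers typically inline the memcpy calls.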

@JustArchi
Contributor Author

@janvorli @VSadov thank you a lot, guys, for getting to the bottom of this, and double thanks for the working workaround, @janvorli. I really appreciate the time you put into this, as I was unable to gather more info myself. Everything you've said so far makes complete sense to me. As I said, I was unable to reproduce it myself, even with the help of other users, so chances are that it's indeed a somewhat rare, but still valid, machine configuration of the affected user that causes this. I'm not sure why that machine differs from all other Raspberry Pi setups; perhaps the user is running some other software that affects the CPU alignment setting.

In any case, I'm very happy that we've managed to track this one down. Thanks again, you're awesome 🎉.

@janvorli
Member

janvorli commented Jan 5, 2022

You are welcome!

@JustArchi
Contributor Author

JustArchi commented Jan 5, 2022

As a side note, based on my further discussion with the affected user, it seems that he installed arm32 libs on an arm64 OS and attempted to run the project built for linux-arm instead of, as he should have, just running the linux-arm64 one.

That could explain the weird value of 4 or 5 in the CPU alignment setting. As you said yourself, this feature isn't available on arm64 machines, so perhaps it's simply fixed to always raise SIGBUS when running 32-bit code.

He was also apparently running some kind of "modified kernel" which was arm64... with an arm32 OS... Don't ask me, but the setup is so awkward that I could totally believe that, somewhere deep down the rabbit hole, CPU alignment handling either didn't exist to begin with, due to the arm64 kernel executing 32-bit code, or was disabled along the way.

If you want to confirm or refute the above, it should be enough to install all .NET prereqs such as libc6:armhf and friends, and then try to execute a single-file published app built for linux-arm on an aarch64 kernel. But I don't know exactly what the user did to achieve that setup, and I'm afraid to ask 😂.

In any case, I was just wondering why this CPU alignment value doesn't follow the OS default, and this seems to be the reason. The root cause of the issue remains the same, but this hopefully answers why nobody ran into this before, as @VSadov was wondering.


@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 6, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 7, 2022
@VSadov
Member

VSadov commented Jan 7, 2022

I'll keep this open to track porting to 6.0

@VSadov VSadov reopened this Jan 7, 2022
@VSadov VSadov modified the milestones: 7.0.0, 6.0.x Jan 7, 2022
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Jan 7, 2022
@VSadov
Member

VSadov commented Feb 15, 2022

Fixed in #63519

@VSadov VSadov closed this as completed Feb 15, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022