
Runtime exception on android #9510

Closed
meold opened this issue Jan 5, 2018 · 24 comments
Labels
arch-arm64; area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI; os-linux: Linux OS (any supported distro)

Comments

@meold

meold commented Jan 5, 2018

Hi,
I'm running into strange behavior with a HelloWorld app compiled for CoreCLR on Android.
First of all, cross-building coreclr and corefx ran into a number of issues, but in the end both built successfully.
After that, I compiled a very simple app with a single "Console.WriteLine("Hello World");" statement in Main. Then I prepared the necessary files in the dist/helloworld folder (as described in this tutorial) and pushed them to the Android device with adb.

When I run
coreuser@Ubuntu16D:~/git/coredroid/apps/helloworld$ adb shell LD_LIBRARY_PATH=/data/local/tmp/coredroid/ /data/local/tmp/coredroid/corerun /data/local/tmp/coredroid/helloworld.dll

I get the known error "Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support." with the following output:

WARNING: linker: /data/local/tmp/coredroid/libcoreclr.so: unused DT entry: type 0xf arg 0x10e2
WARNING: linker: /data/local/tmp/coredroid/libclrjit.so: unused DT entry: type 0xf arg 0xa79
FailFast: Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support.
at System.Environment.FailFast(System.String)
at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
at System.Globalization.GlobalizationMode..cctor()
at System.Globalization.GlobalizationMode.get_Invariant()
at System.Globalization.CultureData.CreateCultureWithInvariantData()
at System.Globalization.CultureData.get_Invariant()
at System.Globalization.CultureData.GetCultureData(System.String, Boolean)
at System.Globalization.CultureInfo.InitializeFromName(System.String, Boolean)
at System.Globalization.CultureInfo..ctor(System.String, Boolean)
at System.Globalization.CultureInfo.Init()
at System.Globalization.CultureInfo..cctor()
at System.Globalization.CultureInfo.get_InvariantCulture()
at System.IO.TextWriter+NullTextWriter..ctor()
at System.IO.TextWriter..cctor()
at System.IO.StreamWriter..ctor(System.IO.Stream, System.Text.Encoding, Int32, Boolean)
at System.Console.CreateOutputWriter(System.IO.Stream)
at System.Threading.LazyInitializer.EnsureInitializedCore[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef, System.Object ByRef, System.Func`1<System.__Canon>)
at System.Threading.LazyInitializer.EnsureInitialized[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef, System.Object ByRef, System.Func`1<System.__Canon>)
at System.Console.WriteLine(System.String)
at ConsoleApp.Program.Main(System.String[])
Aborted

Then I tried adding the CORECLR_GLOBAL_INVARIANT=1 environment variable to the previous command, but got a different runtime error:

WARNING: linker: /data/local/tmp/coredroid/libcoreclr.so: unused DT entry: type 0xf arg 0x10e2
WARNING: linker: /data/local/tmp/coredroid/libclrjit.so: unused DT entry: type 0xf arg 0xa79
Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
at System.IO.StreamWriter..ctor(Stream stream, Encoding encoding, Int32 bufferSize, Boolean leaveOpen)
at System.Console.CreateOutputWriter(Stream outputStream)
at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Object& syncLock, Func`1 valueFactory)
at System.Threading.LazyInitializer.EnsureInitialized[T](T& target, Object& syncLock, Func`1 valueFactory)
at System.Console.WriteLine(String value)
at ConsoleApp.Program.Main(String[] args) in /home/padre/git/coredroid/apps/helloworld/Program.cs:line 9
Aborted
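A small dry-run helper that prints the two invocations described above, so the environment variables stay consistent between runs. It only prints the `adb shell` command line and executes nothing; the function name `coredroid_cmd` is hypothetical, the paths mirror the ones in this report, and it assumes `CORECLR_GLOBAL_INVARIANT=1` is the invariant-mode switch honoured by this runtime build, as the second run suggests.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the adb command that launches
# helloworld.dll with corerun on the device.
DEVICE_DIR=/data/local/tmp/coredroid

coredroid_cmd() {
    # $1: "invariant" adds CORECLR_GLOBAL_INVARIANT=1 (assumed switch); anything else omits it
    env_vars="LD_LIBRARY_PATH=$DEVICE_DIR/"
    if [ "$1" = "invariant" ]; then
        env_vars="$env_vars CORECLR_GLOBAL_INVARIANT=1"
    fi
    printf 'adb shell %s %s/corerun %s/helloworld.dll\n' \
        "$env_vars" "$DEVICE_DIR" "$DEVICE_DIR"
}

coredroid_cmd            # first run: FailFast about the missing ICU package
coredroid_cmd invariant  # second run: NullReferenceException in StreamWriter..ctor
```

Piping the printed line to `sh` (or pasting it) performs the actual run.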

How can I narrow down this error? Is something wrong with my cross-build of coreclr, or is it some other issue?
Any suggestions would be very helpful.
Thanks.

@jkotas
Member

jkotas commented Jan 5, 2018

The NullReferenceException looks like a low-level bug, potentially a JIT bug. To start investigating it, run it under lldb or gdb. The debugger should stop on segfault (that gets translated into the NullReferenceException later). Disassemble the code around the segfault to see what it is segfaulting on exactly or where things might have gone wrong.
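One way to script this from the host, mirroring the lldb session in the reply below: the hypothetical helper `write_lldb_script` writes an lldb command file that can be loaded with `lldb -s <file>`. It assumes an `lldb-server` in platform mode is already running on the device and reachable at localhost:1234 (for example via adb port forwarding), as in that session. After `run`, the debugger should stop on the SIGSEGV, where `disassemble` and `register read` can be used interactively.

```shell
#!/bin/sh
# Hypothetical helper: write an lldb command file that connects to a remote
# Android platform, sets the environment and launches corerun.
# Assumes lldb-server (platform mode) on the device is reachable at localhost:1234.
write_lldb_script() {
    out=$1    # path of the command file to create
    app=$2    # managed assembly to run, e.g. helloworld.dll
    dir=/data/local/tmp/coredroid
    cat > "$out" <<EOF
platform select remote-android
platform connect connect://localhost:1234
target create $dir/corerun
env LD_LIBRARY_PATH=$dir CORECLR_GLOBAL_INVARIANT=1
run $dir/$app
EOF
}
```

Usage: `write_lldb_script corerun.lldb helloworld.dll && lldb -s corerun.lldb`.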

@meold
Author

meold commented Jan 5, 2018

Hi,
I tried debugging with lldb and got the following output:

(lldb) platform connect connect://localhost:1234
Platform: remote-android
Triple: aarch64-*-linux-android
OS Version: 23.0.0 (3.18.22-user-00151-g1af75a9-cI8e7db70)
Kernel: #1 SMP PREEMPT Wed Jun 28 23:22:09 CST 2017
Hostname: localhost
Connected: yes
WorkingDir: /data/local/tmp/coredroid
(lldb) target create /data/local/tmp/coredroid/corerun
Current executable set to '/data/local/tmp/coredroid/corerun' (aarch64).
(lldb) env LD_LIBRARY_PATH=/data/local/tmp/coredroid CORECLR_GLOBAL_INVARIANT=1
(lldb) run /data/local/tmp/coredroid/helloworld.dll
Process 12247 launched: '/home/padre/.lldb/module_cache/remote-android/.cache/509B0993-FE0F-F335-749F-C8A0E9C8A0F1-2B255599/corerun' (aarch64)
(lldb) WARNING: linker: /data/local/tmp/coredroid/libcoreclr.so: unused DT entry: type 0xf arg 0x10e2
WARNING: linker: /data/local/tmp/coredroid/libclrjit.so: unused DT entry: type 0xf arg 0xa79
Process 12247 stopped
* thread #1, name = 'corerun', stop reason = signal SIGSEGV: invalid address (fault address: 0x15)
frame #0: 0x0000007f3d4d5564
-> 0x7f3d4d5564: ldr x9, [x20]
0x7f3d4d5568: ldr x9, [x9, #0x50]
0x7f3d4d556c: ldr x9, [x9]
0x7f3d4d5570: blr x9
process continue
Process 12247 resuming
(lldb)
Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
at System.IO.StreamWriter..ctor(Stream stream, Encoding encoding, Int32 bufferSize, Boolean leaveOpen)
at System.Console.CreateOutputWriter(Stream outputStream)
at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Object& syncLock, Func`1 valueFactory)
at System.Threading.LazyInitializer.EnsureInitialized[T](T& target, Object& syncLock, Func`1 valueFactory)
at System.Console.WriteLine(String value)
at ConsoleApp.Program.Main(String[] args) in /home/padre/git/coredroid/apps/helloworld/Program.cs:line 9
Process 12247 stopped
* thread #1, name = 'corerun', stop reason = signal SIGABRT
frame #0: 0x0000007fb7ecda44 libc.so`tgkill + 8
libc.so`tgkill:
-> 0x7fb7ecda44 <+8>: cmn x0, #0x1, lsl #12 ; =0x1000
0x7fb7ecda48 <+12>: cneg x0, x0, hi
0x7fb7ecda4c <+16>: b.hi 0x7fb7e8721c ; __set_errno_internal
0x7fb7ecda50 <+20>: ret
process continue
Process 12247 resuming
(lldb) Process 12247 stopped
* thread #1, name = 'corerun', stop reason = signal SIGABRT
frame #0: 0x0000007fb7ecda44 libc.so`tgkill + 8
libc.so`tgkill:
-> 0x7fb7ecda44 <+8>: cmn x0, #0x1, lsl #12 ; =0x1000
0x7fb7ecda48 <+12>: cneg x0, x0, hi
0x7fb7ecda4c <+16>: b.hi 0x7fb7e8721c ; __set_errno_internal
0x7fb7ecda50 <+20>: ret

Any suggestions would be very helpful.
Thanks.

@jkotas
Member

jkotas commented Jan 6, 2018

You should disassemble, say, 30 instructions before this one: 0x7f3d4d5564: ldr x9, [x20]. Then we can correlate the disassembly with the StreamWriter..ctor code (https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Extensions/src/System/IO/StreamWriter.cs#L98).
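AArch64 instructions are a fixed 4 bytes wide, so "30 instructions before" the faulting `ldr` is simply the fault PC minus 30 × 4 = 0x78 bytes. The hypothetical helpers below do that arithmetic and print a matching lldb `disassemble` command using its `--start-address` and `--count` options; the PC is the one from the SIGSEGV stop in the session above.

```shell
#!/bin/sh
# AArch64 uses fixed-width 4-byte instructions, so stepping back N
# instructions is a plain subtraction of N*4 bytes.
disasm_start() {
    pc=$1   # faulting PC, e.g. 0x7f3d4d5564
    n=$2    # number of instructions to show before it
    printf '0x%x\n' $(( $pc - $n * 4 ))
}

# Print an lldb command covering the N preceding instructions plus the faulting one.
disasm_cmd() {
    printf 'disassemble --start-address %s --count %d\n' \
        "$(disasm_start "$1" "$2")" $(( $2 + 1 ))
}

disasm_cmd 0x7f3d4d5564 30
# prints: disassemble --start-address 0x7f3d4d54ec --count 31
```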

@meold
Author

meold commented Jan 9, 2018

@jkotas, I finally figured out how to work with lldb and attached some disassembled code. Unfortunately, I'm not very good with assembly, so I don't fully understand what's going on there.
What steps should I take next?
Debug.txt

@jkotas
Member

jkotas commented Jan 9, 2018

The problem is that the stream argument is bogus - it should be a valid pointer and it is 0x15 instead. The code is crashing when trying to call stream.CanWrite. I am including the disassembly fragment with my comments. You would need to do some low level debugging to figure this out.

You can try setting the COMPlus_JitBreak environment variable to break in the JIT at the different spots that lead to the crash, to see where the corrupted register value is coming from. System.IO.StreamWriter..ctor would be a good one to start with, to see whether the stream argument is already corrupted on entry to the method. https://github.com/dotnet/coreclr/issues/8927#issuecomment-276505128 is an example of an investigation like this.

`    0x7f3d4f54d8: stp    x29, x30, [sp, #-0x40]!`
`    0x7f3d4f54dc: stp    x19, x20, [sp, #0x18]`
`    0x7f3d4f54e0: stp    x21, x22, [sp, #0x28]`
`    0x7f3d4f54e4: str    x23, [sp, #0x38]`
`    0x7f3d4f54e8: mov    x29, sp`
`    0x7f3d4f54ec: mov    x19, x0 // x19=this`
`    0x7f3d4f54f0: mov    x20, x1 // x20=stream`
`    0x7f3d4f54f4: mov    x21, x2 // x21=encoding`
`    0x7f3d4f54f8: mov    w22, w3 // w22=bufferSize`
`    0x7f3d4f54fc: mov    w23, w4 // w23=leaveOpen`

`    0x7f3d4f5500: mov    x0, #0x95a0`
`    0x7f3d4f5504: movk   x0, #0x3d10, lsl #16`
`    0x7f3d4f5508: movk   x0, #0x7f, lsl #32`
`    0x7f3d4f550c: mov    w1, #0x38`
`    0x7f3d4f5510: bl     0x7f3d4f17b0`

`    0x7f3d4f5514: mov    x14, #0xe9d8`
`    0x7f3d4f5518: movk   x14, #0x350a, lsl #16`
`    0x7f3d4f551c: movk   x14, #0x7f, lsl #32`
`    0x7f3d4f5520: ldr    x15, [x14]`
`    0x7f3d4f5524: add    x14, x19, #0x8            ; =0x8`
`    0x7f3d4f5528: bl     0x7f3d4f1830`
`    0x7f3d4f552c: mov    x14, #0xf250`
`    0x7f3d4f5530: movk   x14, #0x350a, lsl #16`
`    0x7f3d4f5534: movk   x14, #0x7f, lsl #32`
`    0x7f3d4f5538: ldr    x15, [x14]`
`    0x7f3d4f553c: add    x14, x19, #0x10           ; =0x10`
`    0x7f3d4f5540: bl     0x7f3d4f1830`

`    0x7f3d4f5544: mov    x0, x19`
`    0x7f3d4f5548: bl     0x7f3d34b908 // base.ctor`
`    0x7f3d4f554c: str    xzr, [x19, #0x18]`

`    0x7f3d4f5550: cmp    x20, #0x0 // if (stream == null)`
`    0x7f3d4f5554: b.eq   0x7f3d4f55b8`
`    0x7f3d4f5558: cmp    x21, #0x0 // if (encoding == null)`
`    0x7f3d4f555c: b.eq   0x7f3d4f55b8`
`    0x7f3d4f5560: mov    x0, x20`
`->  0x7f3d4f5564: ldr    x9, [x20]`
`    0x7f3d4f5568: ldr    x9, [x9, #0x50]`
`    0x7f3d4f556c: ldr    x9, [x9]`
`    0x7f3d4f5570: blr    x9        // stream.CanWrite`
`    0x7f3d4f5574: cmp    w0, #0x0                  ; =0x0`
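The `mov`/`movk` triples in this listing build 64-bit constants 16 bits at a time: the `mov` (a MOVZ alias here) writes the low halfword and zeroes the rest, and each `movk ..., lsl #N` replaces only the halfword at bit N. The hypothetical helper `movk_value` below reassembles such a sequence, which can make it easier to match these constants against addresses seen elsewhere in a session. It is a reading aid only; what the resulting addresses point to is not established in this thread.

```shell
#!/bin/sh
# Reassemble a 64-bit constant built by an AArch64 MOVZ/MOVK sequence.
# Arguments: the MOVZ immediate, then "immediate shift" pairs, one per MOVK.
# MOVZ writes imm16 at bit 0 and zeroes the rest; each MOVK replaces only the
# 16-bit field at the given shift and keeps all other bits.
movk_value() {
    value=$(( $1 ))
    shift
    while [ $# -ge 2 ]; do
        imm=$1; pos=$2; shift 2
        value=$(( ($value & ~(0xffff << $pos)) | ($imm << $pos) ))
    done
    printf '0x%x\n' "$value"
}

movk_value 0x95a0 0x3d10 16 0x7f 32   # prints 0x7f3d1095a0 (x0 before "bl 0x7f3d4f17b0")
movk_value 0xe9d8 0x350a 16 0x7f 32   # prints 0x7f350ae9d8 (x14 before the first "ldr x15, [x14]")
```

The same address arithmetic explains the fault itself: `ldr x9, [x20]` has no offset, so the address it dereferences is x20 (the `stream` argument) itself, which is why lldb reported fault address 0x15.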

@RussKeldorph
Contributor

@dotnet/arm64-contrib

@sdmaclea
Contributor

/cc @qmfrederik

@sdmaclea
Contributor

@meold

I think Arm64 for Android is not very different from Arm64 Linux. Arm64 on Linux is usually stable.

I would not start by looking at JIT. I would look at your build process. The tutorial you cite looked incomplete.

It is important to use coreclr and corefx versions that are compatible with each other (the tips of the two repositories are not always compatible). The matching versions can be determined by looking at dependencies.props in both projects: look at the CoreFxCurrentRef and/or CoreClrCurrentRef tags.

I would also start with a coreclr version which passes tests on Linux Arm64.

This coreclr version was stable yesterday: 4e6435be
The corresponding corefx version is 334dd046 (based on updates to dependencies.props in both projects).
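A rough way to read those tags from two local checkouts: the hypothetical helper `dep_ref` pulls the text of a single-line `<Tag>value</Tag>` element out of a props file with `sed`. It assumes the element is not split across lines and that the file is `dependencies.props` at each repository root; the checkout paths in the usage lines are hypothetical.

```shell
#!/bin/sh
# Print the text of the first single-line <Tag>...</Tag> element in a props file.
# Assumes the element's value contains no '<' and the element is not split across lines.
dep_ref() {
    file=$1   # e.g. coreclr/dependencies.props
    tag=$2    # e.g. CoreFxCurrentRef
    sed -n "s:.*<$tag>\([^<]*\)</$tag>.*:\1:p" "$file" | head -n 1
}

# Hypothetical usage from a directory holding both checkouts:
#   dep_ref coreclr/dependencies.props CoreFxCurrentRef
#   dep_ref corefx/dependencies.props CoreClrCurrentRef
```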

/cc @janvorli

@qmfrederik
Contributor

Yeah, that 'tutorial' contains my notes from about a year ago, so though it may still contain useful information, I don't think it's very accurate at this time.

@meold
Author

meold commented Jan 18, 2018

@qmfrederik @jkotas @janvorli, I took coreclr from here and corefx from here. Then I applied everything described in this fix for coreclr. All builds completed successfully.

After that, I transferred all the files to my Android device with adb and ran this command to test my application:
adb shell /data/local/tmp/coredroid/corerun -c /data/local/tmp/coredroid/ /data/local/tmp/coredroid/helloworld.dll

But I get this error: dlopen failed to open the libcoreclr.so with error dlopen failed: library "libgnustl_shared.so" not found
Yet libgnustl_shared.so is in that folder on the device, as this command shows:
yura@ubuntu:~/git/coredroid/dist/helloworld$ adb shell ls /data/local/tmp/coredroid/libgnu*
/data/local/tmp/coredroid/libgnustl_shared.so

Any suggestions would be very helpful.
Thanks.

@qmfrederik
Contributor

@meold You need to set the LD_LIBRARY_PATH variable (my old notes proved to be useful, after all 😄 ):

adb shell LD_LIBRARY_PATH=/data/local/tmp/coredroid/ /data/local/tmp/coredroid/corerun /data/local/tmp/coredroid/helloworld.dll

@meold
Author

meold commented Jan 18, 2018

@qmfrederik I tried this, but nothing happens for N minutes (a very long time)...
(screenshot)

@qmfrederik
Contributor

@meold corefx on Android was not particularly fast last time I tried it, but N minutes probably means the code hangs somewhere.

There are a couple of environment variables you can set that help you keep track of what .NET Core is doing, but your best bet probably is to attach a debugger and get a native stack trace. That should give you an idea of where the code is hanging.

I have some (old) notes at https://github.com/qmfrederik/coredroid#getting-ready-to-debug.

Can you try to run your app via the debugger as described there?

@meold
Copy link
Author

meold commented Jan 21, 2018

@qmfrederik, I attached the debugger, and here are the results:
(screenshot)
What else do I need to do to understand the problem?

@qmfrederik
Contributor

@meold What you can try is this:

  • Launch your program via debugger (like you've done) and let your program run/hang for a while, until the point you're sure it's hanging (give it one minute or so)
  • Then, hit CTRL+C in lldb. This will cause the program to break
  • You should then get a backtrace which shows which method the program is executing (and which hangs). See https://github.com/dotnet/coreclr/issues/8927#issuecomment-272538788 for an example.

@qmfrederik
Contributor

You can also try setting COMPlus_DumpJittedMethods=1 (in lldb: env COMPlus_DumpJittedMethods=1). This should print the methods as they are being JIT'ed, which gives you a sense of progress. See https://github.com/dotnet/coreclr/issues/8927#issuecomment-276505128 for some background.

@meold
Author

meold commented Jan 21, 2018

@qmfrederik, thanks. I have the following output now:
(lldb) platform connect connect://localhost:1234
Platform: remote-android
Triple: aarch64-*-linux-android
OS Version: 27.0.0 (3.10.73-g33ace82f84b)
Kernel: #1 SMP PREEMPT Fri Oct 13 04:41:33 UTC 2017
Hostname: localhost
Connected: yes
WorkingDir: /data/local/tmp/coredroid
(lldb) target create /data/local/tmp/coredroid/corerun
Current executable set to '/data/local/tmp/coredroid/corerun' (aarch64).
(lldb) env LD_LIBRARY_PATH=/data/local/tmp/coredroid
(lldb) env COMPlus_DumpJittedMethods=1
(lldb) run /data/local/tmp/coredroid/helloworld.dll
Process 32118 launched: '/home/yura/.lldb/module_cache/remote-android/.cache/FF7520EE-23E2-261C-9B95-A23522BB5250-7DC75D2F/corerun' (aarch64)
(lldb) ^CProcess 32118 stopped
* thread #1, name = 'corerun', stop reason = signal SIGSTOP
frame #0: 0x000000558983b374 corerun`atexit
corerun`atexit:
-> 0x558983b374 <+0>: mov x1, x0
0x558983b378 <+4>: adrp x2, 58
0x558983b37c <+8>: adrp x0, 0
0x558983b380 <+12>: add x2, x2, #0x180 ; =0x180

But I do not understand what that means.

@qmfrederik
Contributor

@meold Well, I think this is where @jkotas, @sdmaclea or @janvorli take over ;-).
But you can make it easier for them if you copy the console output as text and paste it as text in the GitHub issue, instead of screenshots.

@meold
Author

meold commented Jan 21, 2018

@qmfrederik, OK, I copied the console output as text.
@jkotas, @sdmaclea, @janvorli Can you help me, please, with this problem?

@janvorli
Member

@meold can you please run the bt command in lldb after you break in?
Also, I don't understand one thing - at the beginning, it seemed that things were running for you including managed code until the code reached a point where it needed the globalization. Then you did something that has completely broken things - all of a sudden you couldn't run it without setting the LD_LIBRARY_PATH and it still hangs if you do.
Could you please explain the difference between the process in which you got the executables in the first and second case?

@meold
Author

meold commented Jan 22, 2018

@janvorli, here is the result of the bt command:
(lldb) run /data/local/tmp/coredroid/helloworld.dll
Process 13989 launched: '/home/yura/.lldb/module_cache/remote-android/.cache/FF7520EE-23E2-261C-9B95-A23522BB5250-7DC75D2F/corerun' (aarch64)
(lldb) ^CProcess 13989 stopped
* thread #1, name = 'corerun', stop reason = signal SIGSTOP
frame #0: 0x0000005555cc0244 corerun`__cxa_atexit + 4
corerun`__cxa_atexit:
-> 0x5555cc0244 <+4>: ldr x17, [x16, #0x170]
0x5555cc0248 <+8>: add x16, x16, #0x170 ; =0x170
0x5555cc024c <+12>: br x17
corerun`::_GLOBAL__sub_I_eh_globals.cc():
0x5555cc0250 <+0>: stp x29, x30, [sp, #-0x20]!
(lldb) bt
* thread #1, name = 'corerun', stop reason = signal SIGSTOP
* frame #0: 0x0000005555cc0244 corerun`__cxa_atexit + 4
frame #1: 0x0000005555cc0374 corerun`do_arm64_start + 64
frame #2: 0x0000007fb7eb6068 linker64`__dl__start + 8

@janvorli, in the first case I took coreclr and corefx from the master branch. But I think the coreclr build was not correct (I took libintl.h from the musl-dev package). Later I couldn't even build corefx from the master branch (this is described here), so I moved on to the second case.
In the second case, I took coreclr from here and corefx from here. Then I applied everything described in this fix for coreclr. All builds completed successfully.

@meold
Author

meold commented Jan 22, 2018

@janvorli, and with coreclr and corefx from the master branch I get the following output:
(lldb) run /data/local/tmp/coredroid/helloworld.dll
Process 25208 launched: '/home/yura/.lldb/module_cache/remote-android/.cache/FF7520EE-23E2-261C-9B95-A23522BB5250-7DC75D2F/corerun' (aarch64)
(lldb) ^CProcess 25208 stopped
* thread #1, name = 'corerun', stop reason = signal SIGSTOP
frame #0: 0x000000557a8a9248 corerun`__cxa_atexit + 8
corerun`__cxa_atexit:
-> 0x557a8a9248 <+8>: add x16, x16, #0x170 ; =0x170
0x557a8a924c <+12>: br x17
corerun`::_GLOBAL__sub_I_eh_globals.cc():
0x557a8a9250 <+0>: stp x29, x30, [sp, #-0x20]!
0x557a8a9254 <+4>: adrp x0, 56
bt
* thread #1, name = 'corerun', stop reason = signal SIGSTOP
* frame #0: 0x000000557a8a9248 corerun`__cxa_atexit + 8
frame #1: 0x000000557a8a9374 corerun`do_arm64_start + 64
frame #2: 0x0000007fb7eb6068 linker64`__dl__start + 8

@janvorli
Member

@meold this is strange, since the hang happens during process initialization, before any of our code is executed. Thinking about it more, the fact that it complained about "libgnustl_shared.so" and you had to set LD_LIBRARY_PATH, while this did not happen with your first attempt, may be an indication of the problem. Do you still have the previous binaries that worked (except for the globalization issue)? It would be helpful to compare the dependencies of those with the dependencies of the ones you've just built. You can use the readelf command with the -d option to dump them, e.g. readelf -d corerun
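A sketch of that comparison: reduce each `readelf -d` dump to the sorted list of `NEEDED` libraries and diff the two lists. It assumes the usual GNU readelf line format `(NEEDED)  Shared library: [name]`; the function names and the `old/`, `new/` directories in the usage line are hypothetical.

```shell
#!/bin/sh
# Read `readelf -d` output on stdin and print the DT_NEEDED library names, sorted.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p' | sort
}

# Compare the NEEDED entries of two builds of the same binary.
# In the diff output, "<" lines appear only in the first, ">" lines only in the second.
compare_needed() {
    a=$(mktemp); b=$(mktemp)
    readelf -d "$1" | needed_libs > "$a"
    readelf -d "$2" | needed_libs > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}

# Hypothetical usage: compare_needed old/corerun new/corerun
```

A library such as libgnustl_shared.so showing up only in the new build's list would point at a change in how corerun was linked.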

@BruceForstall
Member

@meold There hasn't been any activity on this issue in a long time. I'm going to close it. If you are still debugging and have additional questions, please re-open.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 19, 2020

8 participants