protobuf crashes at runtime when loading tensor lib. #12794

Closed
jakiechris opened this issue Sep 4, 2017 · 17 comments
Labels
type:support Support issues

Comments

@jakiechris

Hardware: Huawei P7, Android 4.4.2

I tried NDK r12b and r10e, with API levels 9 and 14.
All of them run into this error:

09-04 19:10:47.640 21660-21660/com.zhuxin.ecg.jijian A/libc: Fatal signal 6 (SIGABRT) at 0x0000549c (code=-6), thread 21660 (uxin.ecg.jijian)
09-04 19:10:47.740 162-162/? I/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
09-04 19:10:47.740 162-162/? I/DEBUG: Build fingerprint: 'Huawei/P7-L00/hwp7:4.4.2/HuaweiP7-L00/C17B620:user/ota-rel-keys,release-keys'
09-04 19:10:47.740 162-162/? I/DEBUG: Revision: '0'
09-04 19:10:47.740 162-162/? I/DEBUG: pid: 21660, tid: 21660, name: uxin.ecg.jijian >>> com.zhuxin.ecg.jijian <<<
09-04 19:10:47.740 162-162/? I/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
09-04 19:10:47.880 754-754/? W/View: requestLayout() improperly called by com.android.systemui.statusbar.phone.IconMerger{434090b0 V.E..... ......I. 0,0-270,72 #7f0a0069 app:id/notificationIcons} during layout: running second layout pass
...
09-04 19:10:48.360 162-162/? I/DEBUG: #6 pc 0000274d /system/bin/linker
09-04 19:10:48.360 162-162/? I/DEBUG: #7 pc 00002823 /system/bin/linker
09-04 19:10:48.360 162-162/? I/DEBUG: #8 pc 00002975 /system/bin/linker
09-04 19:10:48.360 162-162/? I/DEBUG: #9 pc 000029e9 /system/bin/linker
09-04 19:10:48.360 162-162/? I/DEBUG: #10 pc 00000f43 /system/bin/linker
09-04 19:10:48.360 162-162/? I/DEBUG: #11 pc 00050ee /system/lib/libdvm.so (dvmLoadNativeCode(char const*, Object*, char**)+182)
09-04 19:10:48.360 162-162/? I/DEBUG: #12 pc 00068885 /system/lib/libdvm.so
09-04 19:10:48.360 162-162/? I/DEBUG: #13 pc 00027ea0 /system/lib/libdvm.so
09-04 19:10:48.360 162-162/? I/DEBUG: #14 pc 0002eef0 /system/lib/libdvm.so (dvmMterpStd(Thread*)+76)
09-04 19:10:48.360 162-162/? I/DEBUG: #15 pc 0002c588 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+184)
09-04 19:10:48.360 162-162/? I/DEBUG: #16 pc 00061595 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+336)
09-04 19:10:48.360 162-162/? I/DEBUG: #17 pc 000615b9 /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20)
09-04 19:10:48.360 162-162/? I/DEBUG: #18 pc 0006cd7d /system/lib/libdvm.so (dvmInitClass+1020)
09-04 19:10:48.360 162-162/? I/DEBUG: #19 pc 0006da87 /system/lib/libdvm.so (dvmResolveMethod+198)
09-04 19:10:48.360 162-162/? I/DEBUG: #20 pc 000234f4 /system/lib/libdvm.so
09-04 19:10:48.360 162-162/? I/DEBUG: #21 pc 0002eef0 /system/lib/libdvm.so (dvmMterpStd(Thread*)+76)
09-04 19:10:48.360 162-162/? I/DEBUG: #22 pc 0002c588 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+184)
09-04 19:10:48.360 162-162/? I/DEBUG: #23 pc 00061879 /system/lib/libdvm.so (dvmInvokeMethod(Object*, Method const*, ArrayObject*, ArrayObject*, ClassObject*, bool)+392)
09-04 19:10:48.360 162-162/? I/DEBUG: #24 pc 00069963 /system/lib/libdvm.so
09-04 19:10:48.360 162-162/? I/DEBUG: #25 pc 00027ea0 /system/lib/libdvm.so
09-04 19:10:48.360 162-162/? I/DEBUG: #26 pc 0002eef0 /system/lib/libdvm.so (dvmMterpStd(Thread*)+76)
09-04 19:10:48.360 162-162/? I/DEBUG: #27 pc 0002c588 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+184)
09-04 19:10:48.360 162-162/? I/DEBUG: #28 pc 00061595 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+336)
09-04 19:10:48.360 162-162/? I/DEBUG: #29 pc 0004ac6b /system/lib/libdvm.so
09-04 19:10:48.360 162-162/? I/DEBUG: #30 pc 0004ed47 /system/lib/libandroid_runtime.so
09-04 19:10:48.360 162-162/? I/DEBUG: #31 pc 0004faef /system/lib/libandroid_runtime.so (android::AndroidRuntime::start(char const*, char const*)+354)
09-04 19:10:48.360 162-162/? I/DEBUG: stack:
09-04 19:10:48.360 162-162/? I/DEBUG: beed50e0 0006be74
09-04 19:10:48.360 162-162/? I/DEBUG: beed50e4 81cfa290
09-04 19:10:48.360 162-162/? I/DEBUG: beed50e8 beed5104 [stack]

09-04 19:10:48.360 162-162/? I/DEBUG: beed50ec 812a0da8 /data/app-lib/com.zhuxin.ecg.jijian-1/libecg_sdk.so (std::unordered_map<std::string, google::protobuf::FieldDescriptorProto_Type, google::protobuf::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, google::protobuf::FieldDescriptorProto_Type> > >::operator+48)

09-04 19:10:48.360 162-162/? I/DEBUG: beed50f0 beed51a0 [stack]
09-04 19:10:48.360 162-162/? I/DEBUG: beed50f4 81cfa290
09-04 19:10:48.360 162-162/? I/DEBUG: beed50f8 81cfa290
09-04 19:10:48.360 162-162/? I/DEBUG: beed50fc 20492111
09-04 19:10:48.360 162-162/? I/DEBUG: beed5100 81cfa290
09-04 19:10:48.360 162-162/? I/DEBUG: beed5104 00000001
09-04 19:10:48.360 162-162/? I/DEBUG: beed5108 00000015
09-04 19:10:48.370 162-162/? I/DEBUG: beed510c 71a0b990
09-04 19:10:48.370 162-162/? I/DEBUG: beed5110 00000001
09-04 19:10:48.370 162-162/? I/DEBUG: beed5114 4007d9b5 /system/lib/libc.so (write+12)
09-04 19:10:48.370 162-162/? I/DEBUG: beed5118 4008e1d8 /system/lib/libc.so
09-04 19:10:48.370 162-162/? I/DEBUG: beed511c 71a0b990
09-04 19:10:48.370 162-162/? I/DEBUG: #00 beed5120 00000006
09-04 19:10:48.370 162-162/? I/DEBUG: beed5124 00000016
09-04 19:10:48.370 162-162/? I/DEBUG: beed5128 0000549c
09-04 19:10:48.370 162-162/? I/DEBUG: beed512c 400b1f0f /system/bin/linker
09-04 19:10:48.370 162-162/? I/DEBUG: beed5130 400b1f0f /system/bin/linker
09-04 19:10:48.370 162-162/? I/DEBUG: beed5134 4005628d /system/lib/libc.so (pthread_kill+52)
09-04 19:10:48.370 162-162/? I/DEBUG: #1 beed5138 00000006
09-04 19:10:48.370 162-162/? I/DEBUG: beed513c 00000000
09-04 19:10:48.370 162-162/? I/DEBUG: beed5140 74a2f24c
09-04 19:10:48.370 162-162/? I/DEBUG: beed5144 400564a1 /system/lib/libc.so (raise+14)
09-04 19:10:48.370 162-162/? I/DEBUG: #2 beed5148 beed5154 [stack]
09-04 19:10:48.370 162-162/? I/DEBUG: beed514c 400551d7 /system/lib/libc.so
09-04 19:10:48.390 162-162/? I/DEBUG: memory near r1:
.....
09-04 19:10:49.350 754-754/? W/View: requestLayout() improperly called by com.android.systemui.statusbar.phone.IconMerger{434090b0 V.E..... ........ 0,0-270,72 #7f0a0069 app:id/notificationIcons} during second layout pass: posting in next frame
09-04 19:10:49.600 658-694/? W/InputDispatcher: channel '43db7980 com.zhuxin.ecg.jijian/com.ikinloop.ecgapplication.ui.activity.MainActivity (server)' ~ Consumer closed input channel or an error occurred. events=0x9
09-04 19:10:49.600 658-694/? E/InputDispatcher: channel '43db7980 com.zhuxin.ecg.jijian/com.ikinloop.ecgapplication.ui.activity.MainActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
09-04 19:10:49.710 362-466/? I/logserver: Object Path:/data/system/dropbox/, mask=0x00000080
09-04 19:10:49.710 362-466/? I/logserver: event->len=48, name=[email protected]
09-04 19:10:49.710 362-466/? I/logserver: process_one_event, can not find this event([email protected])
09-04 19:10:49.710 362-466/? I/logserver: clean_cur_cache:962, system(rm -r /data/log/logcache/3577632/* > /dev/null 2>&1)
09-04 19:10:49.710 658-1213/? W/InputDispatcher: Attempted to unregister already unregistered input channel '43db7980 com.zhuxin.ecg.jijian/com.ikinloop.ecgapplication.ui.activity.MainActivity (server)'
09-04 19:10:49.720 1095-1095/? I/HwLauncher: DynamicIcon onWindowVisibilityChanged 4 - com.android.calendar

@jakiechris
Author

I don't know why this happens. Could somebody please help?

beed50ec 812a0da8 /data/app-lib/com.zhuxin.ecg.jijian-1/libecg_sdk.so (std::unordered_map<std::string, google::protobuf::FieldDescriptorProto_Type, google::protobuf::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, google::protobuf::FieldDescriptorProto_Type> > >::operator+48)

@cy89

cy89 commented Sep 12, 2017

I don't quite understand how you triggered this failure--can you please tell us more about what you built, and what command you issued to get this stack trace?

Please provide details about what platform you are using (operating system, architecture). Also include your TensorFlow version. Also, did you compile from source or install a binary? Make sure you also include the exact command, if possible, to produce the output included in your test case. If you are unsure what to include, see the template displayed when you open a new GitHub issue.

We ask for this in the issue submission template, because it is really difficult to help without that information. Thanks!

@cy89 cy89 added the stat:awaiting response Status - Awaiting response from author label Sep 12, 2017
@jakiechris
Author

@cy89
Thanks for replying, Mr. Young.
I'll answer your questions one by one in a few days. Thank you again!

@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Sep 14, 2017
@aselle aselle added stat:awaiting response Status - Awaiting response from author type:support Support issues labels Sep 21, 2017
@jakiechris
Author

jakiechris commented Nov 20, 2017

@cy89 @aselle

Hi, sorry for taking so long to reply.

Phone:
Huawei/P7-L00/hwp7:4.4.2/HuaweiP7-L00/C17B620

protobuf commit version:
0b059a3

Compiler:
NDK r12b
API level set to 14 (I checked: Android 4.4.2 supports all API levels up to 19)

Crash information:
(std::unordered_map<std::string, google::protobuf::FieldDescriptorProto_Type, google::protobuf::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, google::protobuf::FieldDescriptorProto_Type> > >::operator+48)

I think it may be a protobuf issue rather than a TensorFlow issue, so I also posted it to the protobuf repository on GitHub:

protocolbuffers/protobuf#3922

Any suggestions are highly appreciated.

@jakiechris
Author

Update: without calling any library interface from the app, I tried the following:

  1. Loading only the protobuf lib works well.
  2. Loading both the TensorFlow and protobuf libs crashes.

I assume it is a runtime error in protobuf:
protobuf can be loaded normally on the test phone,
but as soon as any protobuf interface is called, it crashes.

Am I right? When the TensorFlow lib is loaded, the constructors of its static objects run.
Does TensorFlow call any protobuf interfaces in those constructors?
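
For context, here is a minimal C++ sketch of the mechanism this question is about: objects with static storage duration have their constructors run while the library is being loaded, before the app calls any interface explicitly. TensorFlow registers its ops and kernels through static objects of this kind. The names below (Registry, Registrar, RegisteredCount) are hypothetical and only illustrate the pattern; they are not TensorFlow or protobuf APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical registry, for illustration only.
// A function-local static is constructed on first use, which sidesteps
// initialization-order problems between translation units.
std::vector<std::string>& Registry() {
    static std::vector<std::string> names;
    return names;
}

// Hypothetical helper whose constructor performs the registration.
struct Registrar {
    explicit Registrar(const std::string& name) { Registry().push_back(name); }
};

// Namespace-scope objects: their constructors run during program or
// shared-library initialization, i.e. before main() starts or while
// dlopen() / System.loadLibrary() is still in progress.
static Registrar register_first("FirstOp");
static Registrar register_second("SecondOp");

// Number of entries that were registered without any explicit call.
std::size_t RegisteredCount() { return Registry().size(); }
```

If code like this ends up in a shared library, a failure inside any such constructor surfaces as a crash during loading, even when the app never calls into the library.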

@jakiechris
Author

I also tried this toolchain:
Android NDK r10e / API 14 / armeabi-v7a / 32-bit ARM

The crash remains.

@jakiechris
Author

I downgraded protobuf to:
https://github.com/google/protobuf/archive/a428e42072765993ff674fda72863c9f1aa2d268.tar.gz

The crash remains.

The crash information changed to:
11-20 17:54:28.790 182-182/? I/DEBUG: #31 pc 0004faef /system/lib/libandroid_runtime.so (android::AndroidRuntime::start(char const*, char const*)+354)
11-20 17:54:28.790 182-182/? I/DEBUG: stack:
11-20 17:54:28.790 182-182/? I/DEBUG: beec00e0 751be770
11-20 17:54:28.790 182-182/? I/DEBUG: beec00e4 829f9e0c
11-20 17:54:28.790 182-182/? I/DEBUG: beec00e8 821b4c64 /data/app-lib/com.zhuxin.ecg.jijian-1/libecg_sdk.so

This seems to have nothing to do with protobuf, yet my test still shows:
1. Loading only the protobuf lib works well.
2. Loading both the TensorFlow and protobuf libs crashes.

After thinking it over, I see two possible reasons:

  1. When the TensorFlow lib is loaded, some of its static constructors call into protobuf, and that call crashes.
  2. The TensorFlow lib itself crashes at load time, and the protobuf symbol in the stack dump is only the victim's last words, not the cause.

@jakiechris
Author

I also tried protobuf-lite. It crashes and reports:
java.lang.UnsatisfiedLinkError: dlopen failed:

This is the same symptom as loading only the TensorFlow lib without loading protobuf.

@jakiechris
Author

Then I tried a very old compiler:
NDK r9, API level 9

It still crashes:

11-21 15:17:44.940 2970-2970/com.zhuxin.ecg.jijian A/libc: Fatal signal 6 (SIGABRT) at 0x00000b9a (code=-6), thread 2970 (uxin.ecg.jijian)
11-21 15:17:45.050 182-182/? I/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
11-21 15:17:45.050 182-182/? I/DEBUG: Build fingerprint: 'Huawei/P7-L00/hwp7:4.4.2/HuaweiP7-L00/C17B620:user/ota-rel-keys,release-keys'
11-21 15:17:45.050 182-182/? I/DEBUG: Revision: '0'
11-21 15:17:45.050 182-182/? I/DEBUG: pid: 2970, tid: 2970, name: uxin.ecg.jijian >>> com.zhuxin.ecg.jijian <<<
11-21 15:17:45.050 182-182/? I/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
11-21 15:17:45.450 182-182/? I/DEBUG: r0 00000000 r1 00000b9a r2 00000006 r3 00000000
11-21 15:17:45.450 182-182/? I/DEBUG: r4 00000006 r5 00000016 r6 00000b9a r7 0000010c
11-21 15:17:45.450 182-182/? I/DEBUG: r8 0000010b r9 4005c678 sl 00000001 fp 82040428
11-21 15:17:45.450 182-182/? I/DEBUG: ip 40057f0f sp beec0120 lr 4016528d pc 40174238 cpsr 000f0010
11-21 15:17:45.450 182-182/? I/DEBUG: d0 3436646578696673 d1 0000000000000069
11-21 15:17:45.450 182-182/? I/DEBUG: d2 000000000000006e d3 0000000000000074
11-21 15:17:45.450 182-182/? I/DEBUG: d4 4058400000000000 d5 4058400000000000
11-21 15:17:45.450 182-182/? I/DEBUG: d6 4058400000000000 d7 0000000000000000
11-21 15:17:45.450 182-182/? I/DEBUG: d8 0000000044008000 d9 0000000000000000
11-21 15:17:45.450 182-182/? I/DEBUG: d10 0000000000000000 d11 0000000000000000
11-21 15:17:45.450 182-182/? I/DEBUG: d12 0000000000000000 d13 0000000000000000
11-21 15:17:45.450 182-182/? I/DEBUG: d14 0000000000000000 d15 0000000000000000
11-21 15:17:45.450 182-182/? I/DEBUG: d16 0000000000000000 d17 00000006ffffffff
11-21 15:17:45.450 182-182/? I/DEBUG: d18 3fe47ae147ae147b d19 3fd51eb851eb851f
11-21 15:17:45.450 182-182/? I/DEBUG: d20 3fd3333333333333 d21 3fe3333333333333
11-21 15:17:45.450 182-182/? I/DEBUG: d22 3fc3333333333333 d23 3faeb851eb851eb8
11-21 15:17:45.450 182-182/? I/DEBUG: d24 40de898000000000 d25 40de89a0058c0000
11-21 15:17:45.450 182-182/? I/DEBUG: d26 3fe0000000000000 d27 40e0108ffce00000
11-21 15:17:45.450 182-182/? I/DEBUG: d28 40ef400ff4480000 d29 40e01d100abe0000
11-21 15:17:45.450 182-182/? I/DEBUG: d30 40dd4c2013880000 d31 40ed4c1013880000
11-21 15:17:45.450 182-182/? I/DEBUG: scr 60000017
11-21 15:17:45.460 182-182/? I/DEBUG: backtrace:
11-21 15:17:45.460 182-182/? I/DEBUG: #00 pc 00022238 /system/lib/libc.so (tgkill+12)
11-21 15:17:45.460 182-182/? I/DEBUG: #1 pc 00013289 /system/lib/libc.so (pthread_kill+48)
11-21 15:17:45.460 182-182/? I/DEBUG: #2 pc 0001349d /system/lib/libc.so (raise+10)
11-21 15:17:45.460 182-182/? I/DEBUG: #3 pc 000121d3 /system/lib/libc.so
11-21 15:17:45.460 182-182/? I/DEBUG: #4 pc 00021aec /system/lib/libc.so (abort+4)
11-21 15:17:45.460 182-182/? I/DEBUG: #5 pc 00efba04 /data/app-lib/com.zhuxin.ecg.jijian-1/libecg_sdk.so (__check_for_sync8_kernelhelper+68)
11-21 15:17:45.460 182-182/? I/DEBUG: stack:
11-21 15:17:45.460 182-182/? I/DEBUG: beec00e0 751bd678
11-21 15:17:45.460 182-182/? I/DEBUG: beec00e4 82863c64
11-21 15:17:45.460 182-182/? I/DEBUG: beec00e8 8201f294 /data/app-lib/com.zhuxin.ecg.jijian-1/libecg_sdk.so

@jakiechris
Author

The highest GCC version that NDK r9 supports is 4.8,
so I changed every "4.9" in contrib/makefile/ to "4.8".

@jakiechris
Author

Changing the protobuf version changes the crash information:

/data/app-lib/com.zhuxin.ecg.jijian-1/libecg_sdk.so (std::unordered_map<std::string, google::protobuf::FieldDescriptorProto_Type, google::protobuf::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, google::protobuf::FieldDescriptorProto_Type> > >::operator+48)

I am starting to think this may be an issue in the TensorFlow lib. Is it possible that the TensorFlow lib is incompatible with some phone CPUs, since TensorFlow tends to use the GPU or other hardware?

@jakiechris
Author

I compiled tensorflow 4.0 with NDK r9d and GCC 4.8, and it still crashed.

@jakiechris
Author

Could anyone tell me how to enable log tracing in TensorFlow and protobuf?

@jakiechris
Author

I studied the TensorFlow build procedure and found that the sources fall into three groups:
1. pb source files
2. pb_text source files
3. TensorFlow core source files

I collected all of these source files into one IDE project to study them.
Any suggestions on how to learn the TensorFlow core source code would be appreciated.

@jakiechris
Author

I found a way to approach this issue. First, I worked out that the TensorFlow lib consists of:

1. pb source files
2. pb_text source files
3. TensorFlow core source files
4. the protobuf lib

  1. Let the compile step finish normally.
  2. Before running ar on the object files, include only part of them in the archive.
  3. Check whether System.loadLibrary() still crashes.
  4. In the app, comment out all calls into the TensorFlow lib.
  5. Keep adding object files to the TensorFlow lib until the crash reappears, which should locate the object file that causes it.
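
The incremental search in the steps above could be sped up by bisection instead of adding object files one at a time. This is a swapped-in technique, not what was actually run here, and it assumes that the load succeeds with no object files, fails with all of them, and that a single object file is responsible. FirstFailingObject, LoadCheck and FakeLoadsOk are hypothetical names; the callback stands for "archive these files, rebuild the .so, and try System.loadLibrary() on the device".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using ObjectList = std::vector<std::string>;

// Hypothetical callback: returns true when a library built from the given
// object files loads without crashing.
using LoadCheck = std::function<bool(const ObjectList&)>;

// Returns the index of the object file whose addition first makes loading fail.
// Preconditions (assumptions): objs is non-empty, loads_ok({}) is true,
// loads_ok(objs) is false, and one object file causes the failure.
std::size_t FirstFailingObject(const ObjectList& objs, const LoadCheck& loads_ok) {
    std::size_t good = 0;           // a prefix of this length is known to load
    std::size_t bad = objs.size();  // a prefix of this length is known to crash
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        ObjectList prefix(objs.begin(), objs.begin() + mid);
        if (loads_ok(prefix)) {
            good = mid;
        } else {
            bad = mid;
        }
    }
    return bad - 1;  // the last file of the shortest crashing prefix
}

// Stand-in for a real device test: pretends that "c.o" is the culprit.
bool FakeLoadsOk(const ObjectList& prefix) {
    for (const std::string& name : prefix) {
        if (name == "c.o") return false;
    }
    return true;
}
```

Each probe still costs one rebuild and one run on the device, but the number of probes grows with the logarithm of the number of object files rather than linearly.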

Wish me luck, guys.

@jakiechris
Author

jakiechris commented Nov 23, 2017

My project is a shared lib that calls into the TensorFlow lib, and the shared lib is linked into the APK.

I found that even when I hollow out the TensorFlow lib almost completely, it still crashes.
The problem is in how I link it:
target_link_libraries(${PROJECT_NAME} -Wl,--allow-multiple-definition -Wl,--whole-archive tensorflow-core)
This packs the whole TensorFlow static lib into my .so, and the Huawei P7 does not seem to support that.

In contrast, protobuf is linked as:
target_link_libraries(${PROJECT_NAME} protobuf)
It is a big surprise that protobuf is not packed into my shared lib.
That my shared lib loads normally in an Android app is no surprise.
The real surprise is: why does my app run fine on many Android phones when it uses the TensorFlow interfaces?
Is protobuf a default lib shipped with Android, so that every phone just uses its local protobuf lib?

So the question turns out to be:
how do I pack a static lib into a shared lib and link the shared lib into the APK?

@jakiechris
Author

jakiechris commented Nov 23, 2017

Good news:
after several days of hard work, this issue is finally solved.

The solution is:
target_link_libraries(${PROJECT_NAME} "-Wl,--whole-archive" tensorflow-core "-Wl,--no-whole-archive")
target_link_libraries(${PROJECT_NAME} "-Wl,--whole-archive" protobuf "-Wl,--no-whole-archive")

Next I will learn what --whole-archive and --no-whole-archive mean.

Although no one replied, I still thank this TensorFlow discussion forum for giving me the pressure to hold on to what I was looking for.

# Happy Thanksgiving

@jakiechris jakiechris changed the title protobuf crashes at runtime when loading tensor lib. [solved, pls close this issue] protobuf crashes at runtime when loading tensor lib. Nov 23, 2017
@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Nov 29, 2017
@aselle aselle changed the title [solved, pls close this issue] protobuf crashes at runtime when loading tensor lib. protobuf crashes at runtime when loading tensor lib. Nov 29, 2017
@aselle aselle closed this as completed Nov 29, 2017