GDAL 3.5.1 crashes with Signal 4 Illegal Instruction #6150

Closed
erydit opened this issue Aug 2, 2022 · 13 comments

@erydit

erydit commented Aug 2, 2022

After the latest system update, GDAL was upgraded to version 3.5.1 and started to crash.

Some GDAL binaries (gdal-config, for example) still work, but most of them fail with the error "Illegal instruction (core dumped)".

My system:

OS: Manjaro 21.3.6 Ruah
Kernel: x86_64 Linux 5.15.57-2-MANJARO
Uptime: 26m
Packages: 1469
Shell: bash
Resolution: 1920x1080
DE: Xfce4
WM: Xfwm4
WM Theme: Matcha-sea
GTK Theme: Matcha-sea [GTK2]
Icon Theme: Papirus-Maia
Font: Noto Sans 10
Disk: 1,5T / 6,8T (23%)
CPU: AMD FX-8350 Eight-Core @ 8x 4GHz
GPU: AMD RS780 (DRM 2.50.0 / 5.15.57-2-MANJARO, LLVM 14.0.6)
RAM: 1829MiB / 19485MiB

coredumpctl info:
       PID: 2333 (gdalinfo)
       UID: 1000 (rstanislav)
       GID: 1000 (rstanislav)
    Signal: 4 (ILL)
 Timestamp: Tue 2022-08-02 13:33:42 MSK (35min ago)

Command Line: gdalinfo
Executable: /usr/bin/gdalinfo
Control Group: /user.slice/user-1000.slice/session-2.scope
Unit: session-2.scope
Slice: user-1000.slice
Session: 2
Owner UID: 1000 (rstanislav)
Boot ID: 6b3e5ea1666440239afb43c4a246b953
Machine ID: 6313df66156547e292fedaf552861e30
Hostname: rstanislav-ipen
Storage: /var/lib/systemd/coredump/core.gdalinfo.1000.6b3e5ea1666440239afb43c4a246b953.2333.1659436422>
Disk Size: 743.6K
Message: Process 2333 (gdalinfo) of user 1000 dumped core.

            Module linux-vdso.so.1 with build-id 125b0285aa529b1b1396ff79a856cc580f53371b
            Module libgflags.so.2.2 with build-id 7f92dc764545b3d8e058f547553713232168c9db
            Module libprotobuf.so.32 with build-id 86a9fa3c8369df69f01ddb2fb38c14307b91773d
            Module libthrift-0.16.0.so with build-id c3b31b7dd733754897f96dc1b0919b8ea6446c4f
            Module libbz2.so.1.0 with build-id 919597c477c9b2cb9cdbb7745ed6494ac0e6da60
            Module libre2.so.9 with build-id 8eb359a590bc49054f8b88c11baeecdb93ef6de4
            Module libutf8proc.so.2 with build-id d514e62118589d7cc2a0b6651846a918805bdf60
            Module libglog.so.1 with build-id 51ac814e28ba46783fc9b23fa75e765832042c9b
            Module liborc.so with build-id 9e406019264b1360a9a62d63c104f787661b1761
            Module libbrotlienc.so.1 with build-id 74adbc62e4fbb5da9d37b5aa458471f4130862ff
            Module libparquet.so.800 without build-id.
            Module libarrow.so.800 without build-id.
            Module ogr_Parquet.so with build-id 013b214726c92b361e3ac7f2b4f6aa42406f09f8
            Module libtirpc.so.3 with build-id 5bef2adfdee3df283f593b3e2d37b6dac405256a
            Module libbrotlicommon.so.1 with build-id acfd597a977c8087bb6184383daae2e828a9ce42
            Module libresolv.so.2 with build-id 89a368a6ad1b392d126a2a5beb9c2f61ade00279
            Module libkeyutils.so.1 with build-id ac405ddd17be10ce538da3211415ee50c8f8df79
            Module libkrb5support.so.0 with build-id 15f223925ef59dee4379ebbc0fcd14eda9ba81a2
            Module libcom_err.so.2 with build-id 3360a28740ffbbd5a5c0c21d09072445908707e5
            Module libk5crypto.so.3 with build-id cc77a742cb62447a53d98285b41558b8acd92866
            Module libkrb5.so.3 with build-id 371cc767dacb17cb42c9c44b88eebbed5ee9a756
            Module libunistring.so.2 with build-id 617dbf3d3d6f85d6556a7a036e23845e95490158
            Module libgeos.so.3.11.0 with build-id d887b72494bc4a6891c63023aac0affb11fe61bf
            Module libfreexl.so.1 with build-id 47cfde32de9f4388d151493ccad97ce484fe9bc7
            Module librttopo.so.1 with build-id 94fe58373c8c7e89593294f1e7739a64bb0ec255
            Module libminizip.so.1 with build-id 0785202fbd0261af699c78770961bf72fb3fb817
            Module libicudata.so.71 with build-id 4fef196388e678deb881978139e125e20ee2d94d
            Module libicui18n.so.71 with build-id 6fd5c97fd2808ee29958bf809656d5885e7e8963
            Module libnsl.so.3 with build-id 3063b4b800bdbadb6b136951de10ad004b40e22b
            Module libdl.so.2 with build-id 94198b268228074fa9f405bbedbbae94112593ed
            Module libpthread.so.0 with build-id 95ae4f30a6f12ccbff645d30f8e1a3ee23ec7d36
            Module libsnappy.so.1 with build-id 36e3fb247a476fe2f755162644ebcd8ebd5d92cb
            Module libicuuc.so.71 with build-id 633fdc0c5385d916571f6140e7a978ad0630ef55
            Module libltdl.so.7 with build-id c4ee3f1ba09fe34163d71ff336756fbecb6f409f
            Module libbrotlidec.so.1 with build-id 66c54e9301f7e102ecc1d88547e5f0e8a056fe22
            Module libgssapi_krb5.so.2 with build-id 292f1ce32161c0ecc4a287bc8494d5d7c420a03f
            Module libssl.so.1.1 with build-id e6b1f97a5b60b4248c49dfc5b11f53f281b507d0
            Module libpsl.so.5 with build-id 0229a201aaf5652186c9fdc192ebe52baf19d7f1
            Module libssh2.so.1 with build-id a4adfe44cc7ebd295b3b783361acc3dcfcea1d50
            Module libidn2.so.0 with build-id b16e7570b102789b13ff77289762dbfe3f8f46bc
            Module libnghttp2.so.14 with build-id 16f0981d5251b03b11a49236ac403562ee458887
            Module ld-linux-x86-64.so.2 with build-id 0effd0e43efa4468d3c31871c93af0b7f3005673
            Module libgcc_s.so.1 with build-id 0e3de903950e35ae59a5de8c00b1817a4a71ca01
            Module libstdc++.so.6 with build-id a24b312bb5881ceae0ffbed599201690f2a1747b
            Module libm.so.6 with build-id 1b7296ef9fd806e47060788389293c824b09ad72
            Module libjson-c.so.5 with build-id fc75a469bc875da3c642c484a0f8e7bd1fc2e944
            Module libproj.so.25 with build-id 55ca998e2819cf562cc79cb3fb403bc90ce50a65
            Module libgeos_c.so.1 with build-id 02ca7d0a9a84e1b80e418ccaec3142c8069861d7
            Module libexpat.so.1 with build-id 113bb5a3e9ad856801bfcfc029102c9bdc13d67e
            Module libspatialite.so.7 with build-id 51798093f0c4f5f2df6762d80e18363f3fcbff3d
            Module libpcre2-8.so.0 with build-id a0306c1eb7393936ed0fb7328c8bb117726c2adc
            Module libsqlite3.so.0 with build-id 90fb9a043b4a51db25530e16cd543c4b2a9319a9
            Module libgif.so.7 with build-id 6377b63d77aae3d04283a564860190f5a51a5a99
            Module libzstd.so.1 with build-id ab54c2881f53ab314e134f3e08c76d504376dd5d
            Module libpng16.so.16 with build-id 2dc0bce07f199bf983c07a05fb95a6f4af83a9b3
            Module libgeotiff.so.5 with build-id 64dcd1ec09f58f195bb7f63d7f4fd75348c89610
            Module libtiff.so.5 with build-id 31895d2bd133f34f0cdc2d4ac855ed838ec927b6
            Module libjpeg.so.8 with build-id 8e6d3f3e8f438912b561c43b6e7f66e6e5e097d0
            Module libxerces-c-3.2.so with build-id cf4b9cbff052cdffbec867871ff2e40ae0d88c5b
            Module libqhull_r.so.8.0 with build-id 6476f3b5c4a9e38cc1f9b76444212be60836dcfd
            Module libOpenCL.so.1 with build-id fd888c7280e1e95f15313b126285ee5fcb03508c
            Module libblosc.so.1 with build-id 0e46db305d596cfb4284246b63d9fd8e86d8a523
            Module liblz4.so.1 with build-id e63600ab23b2f6997f42fac2fa56e1f02ce159a1
            Module libdeflate.so.0 with build-id 765ed9e5721c5f26143f85ff3bf116efcd7d51f0
            Module liblzma.so.5 with build-id 28b40c7af8098a66af6ee093b6986b91cad7694d
            Module libcrypto.so.1.1 with build-id 7981ea3d69f3c28e46ee312a815af96eab93775c
            Module libcryptopp.so.8 with build-id 4451af8aca2ad19750ecd9cbf78be78bbbfd29f3
            Module libxml2.so.2 with build-id 8cdf00fa954d9a27f2f184c4d354cb14677446ac
            Module libodbcinst.so.2 with build-id e13a94e0e9019b44c7a0bb5bc432214b7e79c5be
            Module libodbc.so.2 with build-id eefddbffcd83155bb48fd3b62c80f79a6fe25b5f
            Module libcurl.so.4 with build-id 8e801dde5d7263a70bb78c67350f5762277ab9c1
            Module libz.so.1 with build-id fefe3219a96d682ec98fcfb78866b8594298b5a2
            Module libc.so.6 with build-id 60df1df31f02a7b23da83e8ef923359885b81492
            Module libgdal.so.31 with build-id 094a8ed87be3e01f48bc4271a70bb622d1510f77
            Module gdalinfo with build-id 0dc2bde2558986c76b49293cb49daa6bf8d1e9d9
            Stack trace of thread 2333:
            #0  0x00007f6f75f13bc3 n/a (libarrow.so.800 + 0x7c2bc3)
            #1  0x00007f6f75f13b2b n/a (libarrow.so.800 + 0x7c2b2b)
            #2  0x00007f6f75f13b58 n/a (libarrow.so.800 + 0x7c2b58)
            #3  0x00007f6f75f1330b _ZN5arrow7compute16BloomFilterMasksC2Ev (libarrow.so.800 + 0x7c230b)
            #4  0x00007f6f7d626f3e n/a (ld-linux-x86-64.so.2 + 0x5f3e)
            #5  0x00007f6f7d62702c n/a (ld-linux-x86-64.so.2 + 0x602c)
            #6  0x00007f6f7c3524d4 _dl_catch_exception (libc.so.6 + 0x1594d4)
            #7  0x00007f6f7d62e097 n/a (ld-linux-x86-64.so.2 + 0xd097)
            #8  0x00007f6f7c35247e _dl_catch_exception (libc.so.6 + 0x15947e)
            #9  0x00007f6f7d62e42d n/a (ld-linux-x86-64.so.2 + 0xd42d)
            #10 0x00007f6f7c28163c n/a (libc.so.6 + 0x8863c)
            #11 0x00007f6f7c35247e _dl_catch_exception (libc.so.6 + 0x15947e)
            #12 0x00007f6f7c352533 _dl_catch_error (libc.so.6 + 0x159533)
            #13 0x00007f6f7c28110f n/a (libc.so.6 + 0x8810f)
            #14 0x00007f6f7c2816f1 dlopen (libc.so.6 + 0x886f1)
            #15 0x00007f6f7c70d4e3 CPLGetSymbol (libgdal.so.31 + 0x3044e3)
            #16 0x00007f6f7cff0685 _ZN17GDALDriverManager15AutoLoadDriversEv (libgdal.so.31 + 0xbe7685)
            #17 0x00007f6f7ccb1637 GDALAllRegister (libgdal.so.31 + 0x8a8637)
            #18 0x0000563eaf81605b n/a (gdalinfo + 0x105b)
            #19 0x00007f6f7c222290 n/a (libc.so.6 + 0x29290)
            #20 0x00007f6f7c22234a __libc_start_main (libc.so.6 + 0x2934a)
            #21 0x0000563eaf816525 n/a (gdalinfo + 0x1525)
            ELF object binary architecture: AMD x86-64

Some Google results suggest that this kind of error can be caused by the CPU not supporting an instruction, but I don't know how to confirm that. Anyway, here is my lscpu output:

lscpu:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
Model name: AMD FX(tm)-8350 Eight-Core Processor
CPU family: 21
Model: 2
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 38%
CPU max MHz: 4000,0000
CPU min MHz: 1400,0000
BogoMIPS: 8003.18
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 128 KiB (8 instances)
L1i: 256 KiB (4 instances)
L2: 8 MiB (4 instances)
L3: 8 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Mitigation; untrained return thunk; SMT vulnerable
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling
Srbds: Not affected
Tsx async abort: Not affected
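
One way to test the "unsupported CPU" theory is to compare the flags the kernel reports against the SIMD extensions a binary may have been compiled for. The following is a minimal sketch, not part of GDAL or Arrow: the helper names and the shortened sample string are hypothetical, and the default list of required flags is only an assumption about what a package might need.

```python
def parse_cpu_flags(text):
    """Return the set of CPU flags from the first 'flags'/'Flags' line in text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_flags(text, required=("sse4_2", "avx", "avx2")):
    """Return the required flags that the CPU does not advertise, in order."""
    present = parse_cpu_flags(text)
    return [flag for flag in required if flag not in present]


# Shortened sample based on the FX-8350 flags listed above (illustrative only).
sample = "Flags: fpu sse sse2 ssse3 fma sse4_1 sse4_2 popcnt aes xsave avx f16c bmi1"
print(missing_flags(sample))  # ['avx2']
```

On a live Linux system the same function can be fed the contents of /proc/cpuinfo. The parser only recognizes the English "flags" label, so localized lscpu output would need to be produced with an English locale first.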

@rouault
Member

rouault commented Aug 2, 2022

Did you build GDAL yourself or use a pre-built package? The crash seems to occur in libarrow.so, which is used by the OGR Parquet driver. If you built GDAL yourself and don't need Parquet support, disable it.
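
The conclusion that libarrow.so is at fault can be read off the coredumpctl stack trace: frame #0 is where the illegal instruction was executed. The sketch below is a hypothetical helper (the regular expression and function names are illustrative and only assume the frame layout shown in the trace above) that extracts the module of the innermost frame.

```python
import re

# Matches frames such as:
#   "#3  0x00007f... _ZN5arrow... (libarrow.so.800 + 0x7c230b)"
FRAME_RE = re.compile(
    r"#(\d+)\s+0x[0-9a-fA-F]+\s+(\S+)\s+\((\S+) \+ 0x[0-9a-fA-F]+\)"
)


def parse_frames(trace):
    """Return (index, symbol, module) tuples for each frame line in the trace."""
    return [(int(i), sym, mod) for i, sym, mod in FRAME_RE.findall(trace)]


def faulting_module(trace):
    """Return the module of the lowest-numbered frame, or None if no frame is found."""
    frames = parse_frames(trace)
    return min(frames)[2] if frames else None


# A few frames copied from the trace in this issue.
trace = """
#0  0x00007f6f75f13bc3 n/a (libarrow.so.800 + 0x7c2bc3)
#3  0x00007f6f75f1330b _ZN5arrow7compute16BloomFilterMasksC2Ev (libarrow.so.800 + 0x7c230b)
#14 0x00007f6f7c2816f1 dlopen (libc.so.6 + 0x886f1)
#15 0x00007f6f7c70d4e3 CPLGetSymbol (libgdal.so.31 + 0x3044e3)
"""
print(faulting_module(trace))  # libarrow.so.800
```

In the full trace, the frames between dlopen and the arrow::compute::BloomFilterMasks constructor belong to the dynamic loader, which suggests the crash happens while the plugin's libraries are being loaded during GDALAllRegister, before any dataset is opened.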

@erydit
Author

erydit commented Aug 2, 2022

I use the pre-built GDAL from the Manjaro repositories. And yes, uninstalling the arrow package almost fixed GDAL's behavior, except for warning messages such as "ERROR 1: libarrow.so.800: cannot open shared object file: No such file or directory".
Thank you, I will try to report this to the arrow maintainer.

@rouault
Member

rouault commented Aug 2, 2022

> would try to report to arrow maintainer.

Probably report it first to whoever is in charge of the Manjaro repository, i.e. the person who built libarrow.

@rouault rouault closed this as completed Aug 2, 2022
@jef-n
Contributor

jef-n commented Aug 3, 2022

Might also be related to snappy (a dependency of arrow): both arrow and snappy use instructions that are not available everywhere. I think arrow checks at runtime whether the instruction is available, but snappy doesn't.

@erydit
Author

erydit commented Aug 3, 2022

> Might also be related to snappy (dependency of arrow) - both arrow and snappy uses instructions that are not available everywhere. I think arrow does check whether the instruction is available at runtime, but snappy doesn't.

The problem is the arrow package: apache/arrow#12681

The solution is to rebuild the arrow package manually with the flag -DARROW_SIMD_LEVEL=SSE4_2 or -DARROW_SIMD_LEVEL=NONE, depending on the instruction set supported by your CPU.
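
As a rough illustration of "depending on the instruction set supported by your CPU", the hypothetical helper below maps a list of CPU flags to one of the two values named above. It is only a sketch: the function name is made up, and it deliberately ignores any other SIMD level that Arrow may accept.

```python
def arrow_simd_cmake_flag(cpu_flags):
    """Return the -DARROW_SIMD_LEVEL=... argument suited to a list of CPU flags.

    Only the two values named in this thread (SSE4_2 and NONE) are considered.
    cpu_flags must be an iterable of flag names, not a single string.
    """
    level = "SSE4_2" if "sse4_2" in set(cpu_flags) else "NONE"
    return f"-DARROW_SIMD_LEVEL={level}"


# A CPU that advertises SSE4.2, like the FX-8350 in this issue.
print(arrow_simd_cmake_flag("fpu sse sse2 ssse3 sse4_1 sse4_2 avx".split()))
# -DARROW_SIMD_LEVEL=SSE4_2

# A CPU without SSE4.2.
print(arrow_simd_cmake_flag("fpu sse sse2".split()))
# -DARROW_SIMD_LEVEL=NONE
```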

@ttencate

ttencate commented Aug 30, 2022

Also ran into this, on Arch Linux (very similar to Manjaro). I filed a bug report for the packagers of the arrow package.

Edit: And another report for the packagers of the gdal package because of the "ERROR 1: libarrow.so.800: cannot open shared object file: No such file or directory" errors, assuming that -DGDAL_USE_ARROW=ON makes arrow a required dependency of that build.

Edit 2: Arch maintainers consider this an upstream issue, so filed #6281 for the error spam.

@Firefishy
Contributor

This "Illegal instruction (core dumped)" error also affects the official Docker image ghcr.io/osgeo/gdal:ubuntu-full-latest, because libarrow requires a much newer CPU generation.

@rouault
Member

rouault commented Feb 4, 2024

> This "Illegal instruction (core dumped)" also affects the official docker image ghcr.io/osgeo/gdal:ubuntu-full-latest due to the libarrow requiring much newer CPU generation.

Does it happen when opening any dataset or just a Feather/Parquet one?

@rouault
Member

rouault commented Feb 4, 2024

According to https://arrow.apache.org/docs/cpp/env_vars.html#envvar-ARROW_USER_SIMD_LEVEL , default builds should only require SSE4.2, which should be available unless you are running very old hardware.

Can you check whether the following returns a non-empty string: cat /proc/cpuinfo|grep sse4_2|head -n 1 ?

@Firefishy
Contributor

CPU flags: flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch cpuid_fault epb pti ibrs ibpb stibp tpr_shadow flexpriority ept vpid tsc_adjust smep erms dtherm ida arat vnmi md_clear

OK:

$ docker run -it --rm ghcr.io/osgeo/gdal:ubuntu-small-latest
root@371a5b0625b7:/# gdal_translate --version
GDAL 3.9.0dev-da8d8118a91b8f04d69ac3c1c6a6cfcfcc9969dd, released 2024/01/26

Fail:

$ docker run -it --rm ghcr.io/osgeo/gdal:ubuntu-full-latest
root@0f98a1ba34ac:/# gdal_translate --version
Illegal instruction (core dumped)
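
When such a check is scripted rather than run interactively, the "Illegal instruction" message corresponds to signal 4 (SIGILL), the same signal shown in the coredumpctl output earlier in this issue. The sketch below is a hypothetical helper that turns an exit status into a signal name; it assumes the usual conventions that Python's subprocess module reports death by signal N as -N and that shells such as bash report it as 128 + N.

```python
import signal


def describe_exit(returncode):
    """Map an exit status to a signal name, or None if no signal is implied.

    Negative values follow the subprocess convention (-N), values above 128
    follow the shell convention (128 + N).
    """
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


print(describe_exit(132))  # SIGILL
print(describe_exit(-4))   # SIGILL
print(describe_exit(0))    # None
```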

@Firefishy
Contributor

Firefishy commented Feb 5, 2024

Another system (Intel(R) Xeon(R) CPU L5630 @ 2.13GHz): flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d

Fail:

$ docker run --rm -it ghcr.io/osgeo/gdal:ubuntu-full-3.8.3
root@6d116f237bc3:/# gdal_translate --version
Illegal instruction (core dumped)

But weirdly OK...

$ docker run --rm -it ghcr.io/osgeo/gdal:ubuntu-full-latest
root@47c04a9c9ce6:/# gdal_translate --version
GDAL 3.9.0dev-a58174fe9b41f71a22d2fb1f27cc7ce0dcefdefa, released 2024/02/04

@rouault
Member

rouault commented Feb 5, 2024

OK, so SSE4.2 is present, but AVX2 is not. Are you sure the issue is with libarrow? We've also identified an issue with libtiledb, which required AVX2. Can you try "docker pull ghcr.io/osgeo/gdal:ubuntu-full-latest" again? I refreshed it a few minutes ago with a tiledb build that no longer requires AVX2. See the discussion at https://lists.osgeo.org/pipermail/gdal-dev/2024-February/058392.html

@Firefishy
Contributor

BINGO! Fixed! You are awesome!

$ docker pull ghcr.io/osgeo/gdal:ubuntu-full-latest
ubuntu-full-latest: Pulling from osgeo/gdal
...
Digest: sha256:f96e8fb499313c5d67e23c4780debc833e4a7ca62ca843e5d44b384038fad247
Status: Downloaded newer image for ghcr.io/osgeo/gdal:ubuntu-full-latest
ghcr.io/osgeo/gdal:ubuntu-full-latest

$ docker run -it --rm ghcr.io/osgeo/gdal:ubuntu-full-latest
root@a7f4e82d690c:/# gdal_translate --version
GDAL 3.9.0dev-24e151d1cb6281973714207afbbce3a59719fa6f, released 2024/02/05

Yes, likely: c4505ed
