Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] parquet reader Segfaults with illegal SIMD instruction #31119

Closed
asfimport opened this issue Feb 11, 2022 · 10 comments
Closed

[C++] parquet reader Segfaults with illegal SIMD instruction #31119

asfimport opened this issue Feb 11, 2022 · 10 comments

Comments

@asfimport
Copy link
Collaborator

asfimport commented Feb 11, 2022

When compiling with -Os (or with release type MinRelSize), and we run parquet tests (in R at least, though I imagine the pyarrow and C++ will have the same issues!) we get a segfault with an illegal opcode on systems that don't have BMI2 available when trying to read parquet files. (It turns out, the github runners for macos don't have BMI2, so this is easily testable there!)

Somehow in the optimization combined with the way our runtime detection code works, the runtime detection we normally use for this fails (though it works just fine with -O2, -O3, etc.).

When diagnosing this, I created a branch + PR that runs our R tests after installing from brew which can reliably cause this to happen: #12364 other test suites that exercise parquet reading would probably have the same issue (or even C++ tests built with -Os.

Here's a coredump:

2491 Thread_829819
+ 2491 thread_start  (in libsystem_pthread.dylib) + 15  [0x7ff801c3e00f]
+   2491 _pthread_start  (in libsystem_pthread.dylib) + 125  [0x7ff801c424f4]
+     2491 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3> >(void*)  (in arrow.so) + 380  [0x109203749]
+       2491 arrow::internal::FnOnce<void ()>::operator()() &&  (in arrow.so) + 26  [0x109201f30]
+         2491 arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::ChunkedArray> >&, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*)::$_4&, unsigned long&, std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl> > >::invoke()  (in arrow.so) + 98  [0x108f125c2]
+           2491 parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*)::$_4::operator()(unsigned long, std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl>) const  (in arrow.so) + 47  [0x108f11ed5]
+             2491 parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::__1::vector<int, std::__1::allocator<int> > const&, parquet::arrow::ColumnReader*, std::__1::shared_ptr<arrow::ChunkedArray>*)  (in arrow.so) + 273  [0x108f0c037]
+               2491 parquet::arrow::ColumnReaderImpl::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*)  (in arrow.so) + 39  [0x108f0733b]
+                 2491 parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long long)  (in arrow.so) + 137  [0x108f0794b]
+                   2491 parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> >::ReadRecords(long long)  (in arrow.so) + 442  [0x108f4f53e]
+                     2491 parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> >::ReadRecordData(long long)  (in arrow.so) + 471  [0x108f50503]
+                       2491 void parquet::internal::standard::DefLevelsToBitmapSimd<false>(short const*, long long, parquet::internal::LevelInfo, parquet::internal::ValidityBitmapInputOutput*)  (in arrow.so) + 250  [0x108fc2a5a]
+                         2491 long long parquet::internal::standard::DefLevelsBatchToBitmap<false>(short const*, long long, long long, parquet::internal::LevelInfo, arrow::internal::FirstTimeBitmapWriter*)  (in arrow.so) + 63  [0x108fc34da]
+                           2491 ???  (in <unknown binary>)  [0x600001354518]
+                             2491 _sigtramp  (in libsystem_platform.dylib) + 29  [0x7ff801c57e2d]
+                               2491 sigactionSegv  (in libR.dylib) + 649  [0x1042598c9]  main.c:625
+                                 2491 Rstd_ReadConsole  (in libR.dylib) + 2042  [0x10435160a]  sys-std.c:1044
+                                   2491 R_SelectEx  (in libR.dylib) + 308  [0x104350854]  sys-std.c:178
+                                     2491 __select  (in libsystem_kernel.dylib) + 10  [0x7ff801c0de4a]

And then a disassembly (where you can see a SHLX that shouldn't be there):

Dump of assembler code from 0x13ac6db00 to 0x13ac6db99ff:
 ...
--Type <RET> for more, q to quit, c to continue without paging--
   0x000000013ac6db82:	mov    $0x8,%ecx
   0x000000013ac6db87:	sub    %rax,%rcx
   0x000000013ac6db8a:	lea    0xf1520b(%rip),%rdi        # 0x13bb82d9c
   0x000000013ac6db91:	movzbl (%rcx,%rdi,1),%edi
   0x000000013ac6db95:	mov    %esi,%ebx
   0x000000013ac6db97:	and    %edi,%ebx
=> 0x000000013ac6db99:	shlx   %rax,%rbx,%rax
   0x000000013ac6db9e:	or     0x18(%r15),%al
   0x000000013ac6dba2:	mov    %al,0x18(%r15)
   0x000000013ac6dba6:	cmp    %rdx,%rcx
   0x000000013ac6dba9:	jg     0x13ac6dbf5
   0x000000013ac6dbab:	mov    %al,(%r14)
   0x000000013ac6dbae:	inc    %r14
   0x000000013ac6dbb1:	shrx   %rcx,%rsi,%rax
   0x000000013ac6dbb6:	mov    %rax,-0x20(%rbp)
   0x000000013ac6dbba:	sub    %rcx,%rdx
   0x000000013ac6dbbd:	mov    %rdx,%rbx
   0x000000013ac6dbc0:	sar    $0x3,%rbx
   0x000000013ac6dbc4:	and    $0x7,%edx
   0x000000013ac6dbc7:	cmp    $0x1,%rdx
   0x000000013ac6dbcb:	sbb    $0xffffffffffffffff,%rbx
   0x000000013ac6dbcf:	lea    -0x20(%rbp),%rsi
   0x000000013ac6dbd3:	mov    %r14,%rdi
...

We discovered this because homebrew alters the default build flags and uses -Os, though we should include a test that tests this in our CI as well (at least as a nightly) to catch it earlier: Homebrew/homebrew-core#94724

Reporter: Jonathan Keane / @jonkeane

Related issues:

Note: This issue was originally created as ARROW-15664. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
Can it be reproduced without Homebrew?

@asfimport
Copy link
Collaborator Author

Jonathan Keane / @jonkeane:
If you want to build with MinSizeRel, you'll need to add
https://issues.apache.org/jira/browse/ARROW-15664#

elseif("${CMAKE_BUILD_TYPE}" STREQUAL "MINSIZEREL")
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${C_FLAGS_MINSIZEREL}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CXX_FLAGS_MINSIZEREL}")

to

elseif("${CMAKE_BUILD_TYPE}" STREQUAL "FASTDEBUG")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${C_FLAGS_FASTDEBUG}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CXX_FLAGS_FASTDEBUG}")
elseif("${CMAKE_BUILD_TYPE}" STREQUAL "RELEASE")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${C_FLAGS_RELEASE}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CXX_FLAGS_RELEASE}")
elseif("${CMAKE_BUILD_TYPE}" STREQUAL "PROFILE_GEN")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${C_FLAGS_PROFILE_GEN}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CXX_FLAGS_PROFILE_GEN}")
elseif("${CMAKE_BUILD_TYPE}" STREQUAL "PROFILE_BUILD")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${C_FLAGS_PROFILE_BUILD}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CXX_FLAGS_PROFILE_BUILD}")
else()
message(FATAL_ERROR "Unknown build type: ${CMAKE_BUILD_TYPE}")
endif()
for that to work

@asfimport
Copy link
Collaborator Author

Jonathan Keane / @jonkeane:

Can it be reproduced without Homebrew?

Yes, if you build arrow with CMAKE_RELEASE_TYPE=MinSizeRel (and possibly even just -Os) you can experience the segfault

@asfimport
Copy link
Collaborator Author

Jonathan Keane / @jonkeane:
#12422 has a crossbow build that exercises -DCMAKE_RELEASE_TYPE=MinSizeRel outside of brew

@asfimport
Copy link
Collaborator Author

Krisztian Szucs / @kszucs:
@jonkeane what's the status of this issue?

@asfimport
Copy link
Collaborator Author

Jonathan Keane / @jonkeane:
We have a solution, though there is concern it will impact performance. @pitrou is taking a look, but we really should merge a fix in before our release. If we don't we'll likely have all of the SIMD optimizations turned off in homebrew.

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
I'm not really taking a look, it needs someone to actually try to diagnose on a M1 machine.

@asfimport
Copy link
Collaborator Author

Jonathan Keane / @jonkeane:
This isn't an M1 issue, it's an issue with x86 + using the wrong implementation on machines that don't have avx512. Kou had a suggestion, that I think is "just" copy/paste the implementations back to .cc and then add the macro suggested. I can attempt to do just that when I'm done with the other stuff I have on my plate — I was mostly hopeful that the copy|pasting and getting the macro right was something that would be obvious for you @pitrou. Alternatively, we could merge the fix from weston as is, and deal with the performance regression separately.

@asfimport
Copy link
Collaborator Author

Jonathan Keane / @jonkeane:
Also, I'm going to close this in favor of ARROW-15678 — they are ultimately the same thing, and we already have a fix(ish) in a PR for that other one.

@asfimport
Copy link
Collaborator Author

Weston Pace / @westonpace:
I'm going to investigate this more today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants