
[Python][FlightRPC] Tests segfault on OSX in conda-forge #36908

Closed
h-vetinari opened this issue Jul 27, 2023 · 36 comments · Fixed by #37624
@h-vetinari (Contributor)

Describe the bug, including details regarding any error messages, version, and platform.

#14832 switched on the test suite within the conda builds, which was itself synced with conda-forge/arrow-cpp-feedstock#875.

Then as of conda-forge/arrow-cpp-feedstock#975 (synced back in #34652), we needed to add

# segfaults on OSX: to investigate ASAP
{% set tests_to_skip = tests_to_skip + " or test_flight" %} # [osx]

This is a pretty serious issue because potentially the whole flight module is broken/unusable on osx. I've raised this point in several discussions around the conda testing here, but never got around to posting an issue.

I've recently retried with arrow 13.0.0 and the segfault remains, already in the first test of the test_flight.py module:

test_feather.py .............................ss......................... [ 27%]
.......ss....x........s.....s..                                          [ 27%]
test_filesystem.py .....                                                 [ 27%]
Fatal Python error: Segmentation fault

Thread 0x0000000109600e00 (most recent call first):
  File "$PREFIX/lib/python3.8/site-packages/pyarrow/tests/test_flight.py", line 1000 in test_flight_client_close
  File "$PREFIX/lib/python3.8/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "$PREFIX/lib/python3.8/site-packages/_pytest/python.py", line 1788 in runtest
  File "$PREFIX/lib/python3.8/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "$PREFIX/lib/python3.8/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "$PREFIX/lib/python3.8/site-packages/_pytest/runner.py", line 341 in from_call
  File "$PREFIX/lib/python3.8/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "$PREFIX/lib/python3.8/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "$PREFIX/lib/python3.8/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "$PREFIX/lib/python3.8/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "$PREFIX/lib/python3.8/site-packages/_pytest/main.py", line 349 in pytest_runtestloop
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "$PREFIX/lib/python3.8/site-packages/_pytest/main.py", line 324 in _main
  File "$PREFIX/lib/python3.8/site-packages/_pytest/main.py", line 270 in wrap_session
  File "$PREFIX/lib/python3.8/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_manager.py", line 112 in _hookexec
  File "$PREFIX/lib/python3.8/site-packages/pluggy/_hooks.py", line 433 in __call__
  File "$PREFIX/lib/python3.8/site-packages/_pytest/config/__init__.py", line 166 in main
  File "$PREFIX/lib/python3.8/site-packages/_pytest/config/__init__.py", line 189 in console_main
  File "$PREFIX/bin/pytest", line 10 in <module>
/Users/runner/miniforge3/conda-bld/apache-arrow_1690235820790/test_tmp/run_test.sh: line 9: 43620 Segmentation fault: 11  pytest -rfEs -k "not (_not_a_real_test or test_cuda or test_cpp_extension_in_python or test_cython_api or test_visit_strings)"

To consider this fixed, it should be possible to remove the skip line quoted above with the conda tests still passing.

Component(s)

C++, FlightRPC, Packaging, Python

h-vetinari changed the title from "[Python] flight tests segfault on OSX" to "[Python] flight tests segfault on OSX in conda-forge" on Jul 27, 2023
@raulcd (Member) commented Jul 27, 2023

I haven't seen this issue on our current CI jobs. From what I've seen we are using a pretty old version of ABSL:

ARROW_ABSL_BUILD_VERSION=20211102.0

I've created a testing PR (#36909) to see whether we can reproduce this issue by upgrading it.

@raulcd (Member) commented Jul 27, 2023

It doesn't seem to be related: for the wheel builds we use vcpkg, and we are already using the newer abseil version:

 The following packages will be built and installed:
  * abseil[core]:arm64-osx -> 20230125.0#1
    abseil[core]:arm64-osx-static-release -> 20230125.0#1

or

   * abseil[core]:x64-osx -> 20230125.0#1
    abseil[core]:amd64-osx-static-release -> 20230125.0#1

kou changed the title from "[Python] flight tests segfault on OSX in conda-forge" to "[Python][FlightRPC] Tests segfault on OSX in conda-forge" on Jul 28, 2023
@h-vetinari (Contributor, Author)

When I was looking into this at the time of conda-forge/arrow-cpp-feedstock#975, I noted:

It seems this is segfaulting in test_flight.py::test_flight_list_flights on osx-64 both for arrow 10 & 11, might be related to conda-forge/grpc-cpp-feedstock#281, which I haven't had time to look into yet...

and

Looking at the source of that test and the utility class ConstantFlightServer, it notes:

See ARROW-4796: this server implementation will segfault if Flight
does not properly hold a reference to the Table object.

@h-vetinari (Contributor, Author)

I think the cautionary comment from the test code is probably still the best hint for now:

See ARROW-4796: this server implementation will segfault if Flight
does not properly hold a reference to the Table object.

Has someone followed up on this lead already? If not, could someone from the Flight side have a look?
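For reference, the server used by that test follows roughly this pattern (a minimal sketch modeled on the ConstantFlightServer in test_flight.py; the table contents are illustrative, not the exact test code):

import pyarrow as pa
import pyarrow.flight as flight

class ConstantFlightServer(flight.FlightServerBase):
    # See ARROW-4796: the server must keep the Table alive for the duration
    # of the stream; if Flight only borrowed the underlying buffers without
    # holding a reference, the data could be freed mid-stream and the server
    # would segfault.
    def __init__(self, location=None, **kwargs):
        super().__init__(location, **kwargs)
        # The only Python-side reference to the served data lives here.
        self._table = pa.table({"a": [1, 2, 3], "b": ["x", "y", "z"]})

    def do_get(self, context, ticket):
        return flight.RecordBatchStream(self._table)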

@kou (Member) commented Aug 11, 2023

@lidavidm Could you take a look at this?

lidavidm self-assigned this on Aug 11, 2023
@lidavidm (Member)

I think that comment is a red herring.

Is this x64 only, or does arm also crash? (I only have an arm mac and I'm reluctant to set up a rosetta-d dev environment on my personal machine, though I suppose I may have no choice)

@h-vetinari (Contributor, Author)

Is this x64 only, or does arm also crash?

We unfortunately don't have CI for osx-arm64, so the tests don't run there. From previous experience I suspect that the segfaults would be present on osx-arm as well.

But if you're on osx-arm already, you can relatively easily run the exact same test suite, by downloading an artefact for pyarrow-tests (newest build for osx-arm64 is this), and then run

conda build --test pyarrow-tests-12.0.1-py310hfbab16f_8_cpu.conda

Obviously you need to have conda-build installed somewhere for that to run, but that should hopefully only be at most an environment creation away. ;-)

@lidavidm (Member)

63 passed, 9 skipped in 106.74s (0:01:46)

Tests appear to pass on aarch64. (I had to run them manually, since, well, the package skips those tests.) I'll try to get a Rosetta-d dev environment next.

@lidavidm (Member)

Ok,

  1. https://github.com/conda-forge/miniforge/issues/165 makes it really easy to have an x64 dev environment on aarch64, that is incredible!
  2. It does crash under Rosetta.
  3. The failure appears to be random, e.g. running an individual test will only sometimes fail.

Backtrace:

* thread #26, stop reason = EXC_BAD_ACCESS (code=1, address=0x9)
  * frame #0: 0x0000000182fdd2b4 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) + 68
    frame #1: 0x00000001830bfe4d libgrpc.33.0.0.dylib`grpc_error_has_clear_grpc_status(absl::lts_20230125::Status) + 125
    frame #2: 0x0000000182eb876c libgrpc.33.0.0.dylib`close_transport_locked(grpc_chttp2_transport*, absl::lts_20230125::Status) + 92
    frame #3: 0x0000000182ebecd3 libgrpc.33.0.0.dylib`read_action_locked(void*, absl::lts_20230125::Status) + 2179
    frame #4: 0x0000000182fec70e libgrpc.33.0.0.dylib`grpc_combiner_continue_exec_ctx() + 158
    frame #5: 0x0000000182ff563e libgrpc.33.0.0.dylib`grpc_core::ExecCtx::Flush() + 78
    frame #6: 0x0000000182fef566 libgrpc.33.0.0.dylib`pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) + 2630
    frame #7: 0x0000000182ff19f6 libgrpc.33.0.0.dylib`pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) + 22
    frame #8: 0x0000000182ff8c73 libgrpc.33.0.0.dylib`grpc_pollset_work(grpc_pollset*, grpc_pollset_worker**, grpc_core::Timestamp) + 19
    frame #9: 0x00000001830aa399 libgrpc.33.0.0.dylib`cq_next(grpc_completion_queue*, gpr_timespec, void*) + 569
    frame #10: 0x0000000133bca4f0 libgrpc++.1.56.2.dylib`grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) + 80
    frame #11: 0x0000000133be7495 libgrpc++.1.56.2.dylib`grpc::Server::SyncRequestThreadManager::PollForWork(void**, bool*) + 101
    frame #12: 0x0000000133bf1ea1 libgrpc++.1.56.2.dylib`grpc::ThreadManager::MainWorkLoop() + 65
    frame #13: 0x0000000133bf2b61 libgrpc++.1.56.2.dylib`grpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) + 17
    frame #14: 0x0000000132a2b1dc libgpr.33.0.0.dylib`grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) + 140
    frame #15: 0x00007ff810b0f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #16: 0x00007ff810b0abd3 libsystem_pthread.dylib`thread_start + 15

Unfortunately this doesn't give much to go off of other than 'gRPC is maybe doing something wrong'...

@lidavidm (Member)

I guess I need a debug build of gRPC under Conda next...

@lidavidm (Member) commented Aug 14, 2023

And it's not related to shutting down the server: after splitting the server off into a separate file and having it serve forever, it still randomly crashes when running a client against it (with the same stack trace as above).
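A reproduction along those lines would look roughly like this (a hypothetical sketch, not the actual scripts used; the two files run as separate processes):

# serve_forever.py -- standalone server that serves until killed
import pyarrow as pa
import pyarrow.flight as flight

class ConstantServer(flight.FlightServerBase):
    def __init__(self, location="grpc://0.0.0.0:5005", **kwargs):
        super().__init__(location, **kwargs)
        self._table = pa.table({"a": list(range(1000))})

    def do_get(self, context, ticket):
        return flight.RecordBatchStream(self._table)

if __name__ == "__main__":
    ConstantServer().serve()

# client_loop.py -- client that repeatedly opens and closes streams
import pyarrow.flight as flight

client = flight.connect("grpc://localhost:5005")
for _ in range(1000):
    # The backtrace above is in gRPC's per-connection transport-teardown
    # path (not server shutdown), so repeatedly opening and closing
    # streams is one way to eventually trigger it.
    client.do_get(flight.Ticket(b"")).read_all()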

@h-vetinari (Contributor, Author)

EXC_BAD_ACCESS reminds me of threading issues around thread_local we saw when trying to switch to shared builds on windows (though the osx builds had been shared for much longer). Not sure if that's something a newer SDK could fix...

I guess I need a debug build of gRPC under Conda next...

There's a way to easily persist and download artefacts from the CI in a conda-forge PR. If you open a PR to the grpc feedstock that switches to debug builds, I can help you switch that on, so then you wouldn't have to build it yourself.

@lidavidm (Member)

Thanks, I'll try that in a bit.

The EXC_BAD_ACCESS is because it's trying to manipulate an Abseil object, soooooo maybe that's also involved...

@h-vetinari (Contributor, Author)

PS. The trick is to add

# TODO: remove after debug
azure:
  store_build_artifacts: true

to conda-forge.yml (in feedstock root) and then let the bot rerender.

@lidavidm (Member)

Let's hope conda-forge/grpc-cpp-feedstock#310 works...

@lidavidm (Member) commented Aug 14, 2023

Ah, it's building gRPC 1.57, not 1.56, even though I based the commit off of 1.56, so I guess I'll also need to rebuild PyArrow, bleh.

@h-vetinari (Contributor, Author)

Wait, I'll create a branch here for 1.56

@lidavidm (Member)

Thanks.

It appears we didn't get debug symbols, but I need to dig deeper / see if I did something dumb along the way...

@lidavidm (Member)

gah, CMAKE_ARGS sets CMAKE_BUILD_TYPE

@lidavidm (Member)

That didn't do it either, what the heck...

@lidavidm (Member)

argh, it looks like we need to package the debug info separately? https://stackoverflow.com/questions/74622676/check-if-mac-executable-has-debug-info

@lidavidm (Member)

And Rosetta doesn't want to build the package locally (I guess there's more finagling needed)

@lidavidm (Member)

I also can't build the package by having Rosetta fire up an x86 shell and trying to invoke conda-build from there...

@lidavidm (Member)

Ok, cleaning everything and then trying again entirely under Rosetta worked... but Conda deleted the working directory, so the artifacts weren't left over to poke at. One more try...

@lidavidm (Member)

Some dumb symlinking later and:

(lldb) bt
* thread #23, stop reason = EXC_BAD_ACCESS (code=1, address=0x9)
  * frame #0: 0x0000000171f836b4 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] int std::__1::__cxx_atomic_fetch_add[abi:v160006]<int>(__a=0x0000000000000009, __delta=4, __order=memory_order_relaxed) at atomic:1014:12 [opt]
    frame #1: 0x0000000171f836b4 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] std::__1::__atomic_base<int, true>::fetch_add[abi:v160006](this=0x0000000000000009, __op=4, __m=memory_order_relaxed) at atomic:1649:17 [opt]
    frame #2: 0x0000000171f836b4 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] absl::lts_20230125::cord_internal::RefcountAndFlags::Increment(this=0x0000000000000009) at cord_internal.h:155:12 [opt]
    frame #3: 0x0000000171f836b4 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] absl::lts_20230125::cord_internal::CordRep::Ref(rep=0x0000000000000001) at cord_internal.h:898:17 [opt]
    frame #4: 0x0000000171f836b4 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] absl::lts_20230125::Cord::InlineRep::InlineRep(this=0x00000003056ad560, src=0x00000003056ad578) at cord.h:1074:17 [opt]
    frame #5: 0x0000000171f836a5 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] absl::lts_20230125::Cord::InlineRep::InlineRep(this=0x00000003056ad560, src=0x00000003056ad578) at cord.h:1072:39 [opt]
    frame #6: 0x0000000171f836a5 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] absl::lts_20230125::Cord::Cord(this=0x00000003056ad560, src=0x00000003056ad578) at cord.h:1250:38 [opt]
    frame #7: 0x0000000171f836a5 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] absl::lts_20230125::Cord::Cord(this=0x00000003056ad560, src=0x00000003056ad578) at cord.h:1250:63 [opt]
    frame #8: 0x0000000171f836a5 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(status=(rep_ = 105553128871169)) at status_helper.cc:280:47 [opt]
    frame #9: 0x000000017207a32d libgrpc.33.0.0.dylib`grpc_error_has_clear_grpc_status(error=(rep_ = 105553128871169)) at error_utils.cc:164:40 [opt]
    frame #10: 0x0000000171e4f85c libgrpc.33.0.0.dylib`close_transport_locked(grpc_chttp2_transport*, absl::lts_20230125::Status) [inlined] end_all_the_calls(t=0x00000001250f9a00, error=(rep_ = 105553128871169)) at chttp2_transport.cc:2469:25 [opt]
    frame #11: 0x0000000171e4f835 libgrpc.33.0.0.dylib`close_transport_locked(t=0x00000001250f9a00, error=(rep_ = 105553128871169)) at chttp2_transport.cc:694:3 [opt]
    frame #12: 0x0000000171e568dd libgrpc.33.0.0.dylib`read_action_locked(tp=0x00000001250f9a00, error=(rep_ = 105553128871169)) at chttp2_transport.cc:2615:5 [opt]
    frame #13: 0x0000000171f94773 libgrpc.33.0.0.dylib`grpc_combiner_continue_exec_ctx() at combiner.cc:231:5 [opt]
    frame #14: 0x0000000171f9d6cd libgrpc.33.0.0.dylib`grpc_core::ExecCtx::Flush(this=0x00000003056afd98) at exec_ctx.cc:75:17 [opt]
    frame #15: 0x0000000171f977da libgrpc.33.0.0.dylib`pollset_work(pollset=0x00000001247768a8, worker_hdl=0x0000000000000000, deadline=<unavailable>) at ev_poll_posix.cc:1093:49 [opt]
    frame #16: 0x0000000171f99f3b libgrpc.33.0.0.dylib`pollset_work(pollset=0x00000001247768a8, worker=0x0000000000000000, deadline=(millis_ = 13003)) at ev_posix.cc:249:7 [opt]
    frame #17: 0x0000000171fa11d3 libgrpc.33.0.0.dylib`grpc_pollset_work(pollset=<unavailable>, worker=<unavailable>, deadline=<unavailable>) at pollset.cc:48:10 [opt]
    frame #18: 0x0000000172062fd9 libgrpc.33.0.0.dylib`cq_next(cq=0x00000001247766e0, deadline=<unavailable>, reserved=<unavailable>) at completion_queue.cc:1040:29 [opt]
    frame #19: 0x0000000170f4f6e0 libgrpc++.1.56.2.dylib`grpc::CompletionQueue::AsyncNextInternal(this=0x0000600002c00d00, tag=0x00000003056afef8, ok=0x00000003056afef7, deadline=(tv_sec = 17, tv_nsec = 2051250, clock_type = GPR_CLOCK_MONOTONIC)) at completion_queue_cc.cc:146:15 [opt]
    frame #20: 0x0000000170f6c6a5 libgrpc++.1.56.2.dylib`grpc::Server::SyncRequestThreadManager::PollForWork(void**, bool*) [inlined] grpc::CompletionQueue::NextStatus grpc::CompletionQueue::AsyncNext<gpr_timespec>(this=<unavailable>, tag=0x00000003056afef8, ok=0x00000003056afef7, deadline=<unavailable>) at completion_queue.h:200:12 [opt]
    frame #21: 0x0000000170f6c697 libgrpc++.1.56.2.dylib`grpc::Server::SyncRequestThreadManager::PollForWork(this=0x0000000124713be0, tag=0x00000003056afef8, ok=0x00000003056afef7) at server_cc.cc:807:25 [opt]
    frame #22: 0x0000000170f771f1 libgrpc++.1.56.2.dylib`grpc::ThreadManager::MainWorkLoop(this=0x0000000124713be0) at thread_manager.cc:155:30 [opt]
    frame #23: 0x0000000170f77fe1 libgrpc++.1.56.2.dylib`grpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) [inlined] grpc::ThreadManager::WorkerThread::Run(this=0x0000600000c054d0) at thread_manager.cc:49:13 [opt]
    frame #24: 0x0000000170f77fd9 libgrpc++.1.56.2.dylib`grpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) [inlined] grpc::ThreadManager::WorkerThread::WorkerThread(this=<unavailable>, th=0x0000600000c054d0)::$_0::operator()(void*) const at thread_manager.cc:41:69 [opt]
    frame #25: 0x0000000170f77fd9 libgrpc++.1.56.2.dylib`grpc::ThreadManager::WorkerThread::WorkerThread(th=0x0000600000c054d0)::$_0::__invoke(void*) at thread_manager.cc:41:7 [opt]
    frame #26: 0x000000012a09c06c libgpr.33.0.0.dylib`grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) [inlined] grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(this=<unavailable>, v=<unavailable>)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::operator()(void*) const at thd.cc:145:11 [opt]
    frame #27: 0x000000012a09bfe4 libgpr.33.0.0.dylib`grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(v=<unavailable>)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) at thd.cc:115:9 [opt]
    frame #28: 0x00007ff810b0f1d3 libsystem_pthread.dylib`_pthread_start + 125
    frame #29: 0x00007ff810b0abd3 libsystem_pthread.dylib`thread_start + 15

@lidavidm (Member)

Looking at the gRPC feedstock, it looks like we never actually run tests...do we know whether conda-forge gRPC works at all on macOS?

Poking around that backtrace with the debugger, it seems that gRPC just has an invalid absl::Cord somehow.

@lidavidm (Member)

Actually, the issue linked before still seems relevant: conda-forge/grpc-cpp-feedstock#281

@lidavidm (Member)

The absl::Cord representation is... complicated, and normally I'd try to bring in rr to figure out what might've happened, but this is macOS. Regardless, I'd chalk this up to gRPC, since it occurs entirely in gRPC code, and I'm not sure how much headway I can make here.

@h-vetinari (Contributor, Author)

Thanks for digging into this so deeply! I can try to check whether we can run the libgrpc test suite in the feedstock.

@lidavidm (Member)

A simple C++ app may also reproduce it.

For posterity, to get a debug build of gRPC:

  • git clean -fdx . in the grpc-cpp-feedstock
  • env PATH=... arch -arch x86_64 /bin/bash (clean your $PATH of any gunk)
  • Edit .scripts/run_osx_build.sh and add --keep-old-work to conda mambabuild
  • Edit the recipe to set CMAKE_BUILD_TYPE and also clean CMAKE_BUILD_TYPE out of CMAKE_ARGS (since something in the conda build setup also sets it there)
  • python3 build-locally.py
  • Symlink ln -s .../grpc-cpp-feedstock/miniforge3/conda-bld/grpc-split_1692042959526/work_moved_libgrpc-1.56.2-h162c7d8_0_osx-64/ back to the original path .../grpc-cpp-feedstock/miniforge3/conda-bld/grpc-split_1692042959526/work
  • Install the resulting package, lldb should be able to find debug symbols now

@s22chan commented Aug 24, 2023

frame #3: 0x0000000171f836b4 libgrpc.33.0.0.dylib`grpc_core::StatusGetChildren(absl::lts_20230125::Status) [inlined] absl::lts_20230125::cord_internal::CordRep::Ref(rep=0x0000000000000001) at cord_internal.h:898:17 [opt]

This is an obvious nullptr(+1) dereference / corruption: the CordRep pointer is 0x1, so incrementing the refcount (at offset 8 within CordRep) touches address 0x9, which is exactly the faulting address reported above.

@h-vetinari (Contributor, Author)

Great news! Like with grpc, the problem turned out to resolve itself through upgrading abseil. :)

While looking at this, I did notice something else that's a bit strange:

                           NOTE                                                                WHY?
   [...]                   vvvv                                                                vvv
WARNING (pyarrow,lib/python3.10/site-packages/pyarrow/libarrow_python_flight.dylib): lib/python3.8/site-packages/pyarrow/libarrow_python.dylib not found in any packages
   INFO (pyarrow,lib/python3.10/site-packages/pyarrow/libarrow_python_flight.dylib): Needed DSO lib/libarrow_flight.1300.dylib found in Users/runner/miniforge3/conda-bld::libarrow-13.0.0-hb1d2c99_4_cpu
   INFO (pyarrow,lib/python3.10/site-packages/pyarrow/libarrow_python_flight.dylib): Needed DSO lib/libarrow.1300.dylib found in Users/runner/miniforge3/conda-bld::libarrow-13.0.0-hb1d2c99_4_cpu
   INFO (pyarrow,lib/python3.10/site-packages/pyarrow/libarrow_python_flight.dylib): Needed DSO lib/libc++.1.dylib found in conda-forge::libcxx-16.0.6-hd57cbcb_0
   INFO (pyarrow,lib/python3.10/site-packages/pyarrow/libarrow_python_flight.dylib): Needed DSO /Applications/Xcode_13.2.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk/usr/lib/libSystem.B.dylib found in $SYSROOT
   INFO (pyarrow,lib/python3.10/site-packages/pyarrow/_hdfs.cpython-310-darwin.so): lib/python3.10/site-packages/pyarrow/libarrow_python.dylib found in this package
   [...]                   ^^^^                                                                ^^^
                           NOTE                                                                Compare

It looks like libarrow_python_flight.dylib somehow gets linked to the wrong arrow_python, but only on osx. I've tried to determine where this might be coming from in the respective CMakeLists.txt, but didn't manage. Of course, it's also not impossible that this is somehow a false positive from conda.

In any case, it seems to not matter too much - the matching libarrow_python and libarrow_python_flight end up in the final environment, and that seems to be enough to run correctly.

@kou (Member) commented Sep 11, 2023

Could you share the URL of the build log?

@h-vetinari (Contributor, Author)

@kou (Member) commented Sep 13, 2023

Thanks, but sorry, I couldn't find the cause of this... rpath or DYLD_LIBRARY_PATH may be related...

@h-vetinari (Contributor, Author)

No worries. From my POV this may well just be a false positive in the build. We run the full test suite without errors, so there's basically zero risk that something is broken as-packaged.

I'm going to close this issue as fixed - the recipe here will be updated in the ongoing sync in #37624.

It's unfortunate that we have ~half a year of builds exposed to this (which is roughly how long we've been on abseil 20230125), but that's just how it is, it seems...

jorisvandenbossche pushed a commit that referenced this issue Sep 28, 2023
Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)

Relevant updates:
* we're not building twice for different protobuf versions anymore
* new abseil version (fixes #36908)
* we've finally upgraded the aws-sdk to 1.11
* the default R versions (on unix) are now 4.2 & 4.3.

Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.
* Closes: #37621

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>