[Python][FlightRPC] Tests segfault on OSX in conda-forge #36908
Comments
I haven't seen this issue on our current CI jobs. From what I've seen we are using a pretty old version of
I've created a testing PR to see if we can reproduce this issue if I try upgrading it: #36909 |
It doesn't seem to be related as for the wheels generation we use vcpkg and we are already using the newer abseil version:
or
|
When I was looking into this at the time of conda-forge/arrow-cpp-feedstock#975, I noted:
and
|
I think the cautionary comment from the test code is probably still the best hint for now:
Did someone follow this lead already? If not, could someone from flight side have a look? |
@lidavidm Could you take a look at this? |
I think that comment is a red herring. Is this x64 only, or does arm also crash? (I only have an arm mac and I'm reluctant to set up a rosetta-d dev environment on my personal machine, though I suppose I may have no choice) |
We unfortunately don't have CI for osx-arm64, so the tests don't run there. From previous experience I suspect that the segfaults would be present on osx-arm as well. But if you're on osx-arm already, you can relatively easily run the exact same test suite, by downloading an artefact for
Obviously you need to have |
Tests appear to pass on aarch64. (I had to run them manually, since, well, the package skips those tests.) I'll try to get a Rosetta-d dev environment next. |
Ok,
Backtrace:
Unfortunately this doesn't give much to go off of other than 'gRPC is maybe doing something wrong'... |
I guess I need a debug build of gRPC under Conda next... |
And it's not related to shutting down the server; splitting the server off into a separate file and having it serve forever, it'll randomly crash when running a client against it (with the same stack trace as above). |
There's a way to easily persist and download artefacts from the CI in a conda-forge PR. If you open a PR to the grpc feedstock that switches to debug builds, I can help you switch that on, so then you wouldn't have to build it yourself. |
Thanks, I'll try that in a bit. The EXC_BAD_ACCESS is because it's trying to manipulate an Abseil object, soooooo maybe that's also involved... |
PS. The trick is to add
to |
Let's hope conda-forge/grpc-cpp-feedstock#310 works... |
Ah, it's building gRPC 1.57, not 1.56, even though I based the commit off of 1.56, so I guess I'll also need to rebuild PyArrow, bleh. |
Wait, I'll create a branch here for 1.56 |
thanks. it appears we didn't get debug symbols, but I need to dig deeper/see if I did something dumb along the way... |
gah, CMAKE_ARGS sets CMAKE_BUILD_TYPE |
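To illustrate the pitfall mentioned here: if the environment's CMAKE_ARGS already carries a CMAKE_BUILD_TYPE definition, a debug build has to replace that entry rather than add a second one next to it. The following is only a minimal sketch, assuming CMAKE_ARGS is a shell-style, whitespace-separated string; the helper name `override_build_type` is hypothetical and this is not the feedstock's actual build script.

```python
import shlex


def override_build_type(cmake_args: str, build_type: str = "Debug") -> str:
    """Return cmake_args with any CMAKE_BUILD_TYPE definition replaced.

    Hypothetical helper for illustration only.
    """
    kept = []
    for arg in shlex.split(cmake_args):
        # Strip the value and an optional type suffix, so that both
        # -DCMAKE_BUILD_TYPE=... and -DCMAKE_BUILD_TYPE:STRING=... match.
        name = arg.split("=", 1)[0].split(":", 1)[0]
        if name == "-DCMAKE_BUILD_TYPE":
            continue  # drop the pre-set build type
        kept.append(arg)
    # Append exactly one definition with the requested build type.
    kept.append(f"-DCMAKE_BUILD_TYPE={build_type}")
    return shlex.join(kept)
```

The returned string could then be exported as CMAKE_ARGS before invoking cmake, so that only a single build type definition reaches the command line.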
That didn't do it either, what the heck... |
argh, it looks like we need to package the debug info separately? https://stackoverflow.com/questions/74622676/check-if-mac-executable-has-debug-info |
And Rosetta doesn't want to build the package locally (I guess there's more finagling needed) |
I also can't build the package by having Rosetta fire up an x86 shell and trying to invoke conda-build from there... |
Ok, cleaning everything then trying again entirely under Rosetta worked...but Conda deleted the working directory so the artifacts weren't left over to poke at, one more try... |
Some dumb symlinking later and:
|
Looking at the gRPC feedstock, it looks like we never actually run tests...do we know whether conda-forge gRPC works at all on macOS? Poking around that backtrace with the debugger, it seems that gRPC just has an invalid absl::Cord somehow. |
Actually, that error linked before seems relevant still: conda-forge/grpc-cpp-feedstock#281 |
The absl::Cord representation is...complicated, and normally I'd try to bring |
Thanks for digging into this that deeply! I can try to check if we can run the libgrpc test suite in the feedstock |
A simple C++ app may also reproduce it. For posterity, to get a debug build of gRPC:
|
this is an obvious nullptr[+1]/corruption. |
Great news! Like with grpc, the problem turned out to resolve itself through upgrading abseil. :) While looking at this, I did notice something else that's a bit strange:
It looks like In any case, it seems to not matter too much - the matching |
Could you share the URL of the build log? |
|
Thanks but sorry. I couldn't find the cause of this... rpath or |
No worries. From my POV this may well just be a false positive in the build. We run the full test suite without errors, so there's basically zero risk that something is broken as-packaged. I'm going to close this issue as fixed - the recipe here will be updated in the ongoing sync in #37624. It's unfortunate that we have ~half a year of builds exposed to this (which is roughly how long we've been on abseil 20230125), but that's just how it is it seems... |
Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)
Relevant updates:
* we're not building twice for different protobuf versions anymore
* new abseil version (fixes #36908)
* we've finally upgraded the aws-sdk to 1.11
* the default R versions (on unix) are now 4.2 & 4.3.
Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.
* Closes: #37621
Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Describe the bug, including details regarding any error messages, version, and platform.
#14832 switched on the test suite within the conda-builds, which itself synced with conda-forge/arrow-cpp-feedstock#875.
Then as of conda-forge/arrow-cpp-feedstock#975 (synced back in #34652), we needed to add
arrow/dev/tasks/conda-recipes/arrow-cpp/meta.yaml
Lines 398 to 399 in 5e1c112
This is a pretty serious issue because potentially the whole flight module is broken/unusable on osx. I've raised this point in several discussions around the conda testing here, but never got around to posting an issue.
I've retried with arrow 13.0.0 recently and the segfault remains, already in the first test of the test_flight.py module.
To fix this, the line quoted above should be removed, and the conda tests still pass.
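When checking whether a test module still segfaults, it can help to run it in a child process so the crash does not take down the calling process. A minimal sketch, assuming a POSIX platform (where a negative return code means the child was killed by a signal); the helper name `run_isolated` is hypothetical and not part of pyarrow or the conda recipe.

```python
import signal
import subprocess
import sys


def run_isolated(cmd):
    """Run cmd in a child process; return (returncode, crashed_with_segv)."""
    proc = subprocess.run(cmd)
    # On POSIX, a return code of -N means the child was terminated by signal N.
    crashed = proc.returncode == -signal.SIGSEGV
    return proc.returncode, crashed
```

For example, `run_isolated([sys.executable, "-m", "pytest", "-x", "test_flight.py"])` would report whether the run ended in SIGSEGV; the path to test_flight.py is a placeholder and depends on where the test suite is installed.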
Component(s)
C++, FlightRPC, Packaging, Python