GH-43487: [Python] Sanitize Python reference handling in UDF implementation #43557

Merged: pitrou merged 1 commit into apache:main from gh43487-incref-udf on Aug 7, 2024

Member

@pitrou pitrou commented Aug 5, 2024

  1. Remove spurious increfs (the function object is already incref'ed at an upper level)
  2. Add unit test with an ephemeral Python function object
  3. Streamline and improve Python reference handling
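As a rough illustration of what item 2 exercises, the sketch below uses a plain dictionary as a stand-in registry. The names `register`, `unregister`, `call` and `register_ephemeral` are hypothetical and are not the pyarrow API or the test added in this PR. It only shows the property that matters: a function object with no other owner must stay alive while registered and must be released once it is unregistered.

```python
import gc
import weakref

# Hypothetical stand-in for a UDF registry; this is not the pyarrow API.
_registry = {}


def register(name, func):
    # The registry keeps exactly one strong reference to the callable.
    _registry[name] = func


def unregister(name):
    # Dropping the entry releases the registry's only reference.
    del _registry[name]


def call(name, *args):
    return _registry[name](*args)


def register_ephemeral(name):
    # The function object is created in this scope only. After we return,
    # the registry holds the sole remaining strong reference to it.
    def add_one(x):
        return x + 1

    register(name, add_one)
    # A weak reference lets us observe the lifetime without extending it.
    return weakref.ref(add_one)


probe = register_ephemeral("add_one")
gc.collect()
alive_while_registered = probe() is not None
result = call("add_one", 41)

unregister("add_one")
gc.collect()
freed_after_unregister = probe() is None
```

If the registration path took an extra reference that nothing released, `freed_after_unregister` would be false; if it took none, the call could reach a freed object.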

github-actions bot commented Aug 5, 2024

⚠️ GitHub issue #43487 has been automatically assigned in GitHub to PR creator.

@pitrou
Member Author

pitrou commented Aug 5, 2024

@github-actions crossbow submit -g python -g wheel

@pitrou
Member Author

pitrou commented Aug 5, 2024

@rtpsw You're one of the people who made significant changes to the UDF implementation, can you perhaps take a look?

@pitrou
Member Author

pitrou commented Aug 5, 2024

@lysnikolaou Are you able to test this PR to see if this solves your issue?

@lysnikolaou
Contributor

@pitrou Thanks for having a look at this.

Confirmed that the whole test suite passes when run under a debug build of 3.13 with this PR.

@pitrou
Member Author

pitrou commented Aug 5, 2024

Thanks @lysnikolaou. By the way, do you build a debug mode Python yourself? Or is there a Conda package available somewhere?

@lysnikolaou
Contributor

> By the way, do you build a debug mode Python yourself?

I'm either building by myself or using pyenv.

@rtpsw
Contributor

rtpsw commented Aug 5, 2024

> @rtpsw You're one of the people who made significant changes to the UDF implementation, can you perhaps take a look?

Apologies, I currently don't have the resources to handle this.

@pitrou pitrou force-pushed the gh43487-incref-udf branch from c22df05 to ac26d0d Compare August 6, 2024 07:57
@pitrou
Member Author

pitrou commented Aug 6, 2024

@github-actions crossbow submit -g python -g wheel

github-actions bot commented Aug 6, 2024

Revision: ac26d0d

Submitted crossbow builds: ursacomputing/crossbow @ actions-522226e29b

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.10-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0-numpy-1.19 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest-numpy-latest GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-20.04-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
wheel-macos-big-sur-cp310-arm64 GitHub Actions
wheel-macos-big-sur-cp311-arm64 GitHub Actions
wheel-macos-big-sur-cp312-arm64 GitHub Actions
wheel-macos-big-sur-cp38-arm64 GitHub Actions
wheel-macos-big-sur-cp39-arm64 GitHub Actions
wheel-macos-catalina-cp310-amd64 GitHub Actions
wheel-macos-catalina-cp311-amd64 GitHub Actions
wheel-macos-catalina-cp312-amd64 GitHub Actions
wheel-macos-catalina-cp38-amd64 GitHub Actions
wheel-macos-catalina-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp38-amd64 GitHub Actions
wheel-manylinux-2-28-cp38-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp38-amd64 GitHub Actions
wheel-manylinux-2014-cp38-arm64 GitHub Actions
wheel-manylinux-2014-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

@pitrou
Member Author

pitrou commented Aug 6, 2024

Our new test-conda-python-3.12-cpython-debug CI job now passes.

@icexelloss
Contributor

@pitrou I can help review as best I can, but please give me a little time. I will try to get to this tomorrow.

Member

@jorisvandenbossche jorisvandenbossche left a comment

I am not very familiar with the UDF code, but I can follow the changes and they look good to me.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Aug 7, 2024
Contributor

@icexelloss icexelloss left a comment

IIUC, with the new changes the refcount is incremented only once, when the UDF is registered. Since the registered function now holds a shared_ptr to an OwnedRef, the refcount will not be decremented until the function is unregistered or the registry goes away.
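The ownership model described in this comment can be approximated in pure Python. This is a toy analogue, not the C++ code from the PR: `OwnedRefModel`, `registry` and `register` are hypothetical names, and the closure over `holder` merely plays the role of the shared owning reference. It assumes standard CPython reference counting as reported by `sys.getrefcount`.

```python
import sys


class OwnedRefModel:
    """Toy analogue of an owning reference: holds one strong reference."""

    def __init__(self, obj):
        self.obj = obj


# Hypothetical registry mapping names to kernels.
registry = {}


def register(name, func):
    # The single new reference to `func` is taken here, at registration.
    holder = OwnedRefModel(func)

    def kernel(*args):
        # Calls go through the shared holder and retain no extra reference.
        return holder.obj(*args)

    registry[name] = kernel


def double(x):
    return 2 * x


base = sys.getrefcount(double)

register("double", double)
after_register = sys.getrefcount(double) - base

outputs = [registry["double"](i) for i in range(3)]
after_calls = sys.getrefcount(double) - base

# Removing the kernel drops the holder, which releases its reference.
del registry["double"]
after_unregister = sys.getrefcount(double) - base
```

Comparing the three deltas shows the intended shape: one reference gained at registration, no change across calls, and the reference given back when the entry is removed.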

@pitrou pitrou merged commit 1f24799 into apache:main Aug 7, 2024
14 checks passed
@pitrou pitrou removed the awaiting merge Awaiting merge label Aug 7, 2024
@pitrou pitrou deleted the gh43487-incref-udf branch August 7, 2024 13:55

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 1f24799.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 23 possible false positives for unstable benchmarks that are known to sometimes produce them.
