Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: update docs to use new APIs #1287

Merged
merged 1 commit into from
Nov 13, 2021
Merged

python: update docs to use new APIs #1287

merged 1 commit into from
Nov 13, 2021

Conversation

houqp
Copy link
Member

@houqp houqp commented Nov 12, 2021

Rationale for this change

The example in our python user doc is not working with the latest binding implementation. Found this while validating #1253

What changes are included in this PR?

  • Added tests
  • Fixed python docs
  • Implemented __str__ for PyExpr

Are there any user-facing changes?

NO

@houqp houqp added this to the 6.0.0 milestone Nov 12, 2021
@houqp houqp added the bug Something isn't working label Nov 12, 2021
@@ -36,6 +38,7 @@
"ScalarUDF",
"column",
"literal",
"functions",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this should be done by importing the functions submodule:

import datafusion
import datafusion.functions

datafusion.functions.abs(datafusion.column("a"))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the compute functions are exposed in pyarrow as well:

>>> import pyarrow
>>> pyarrow.compute.cast
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kszucs/.conda/envs/ibis39/lib/python3.9/site-packages/pyarrow/__init__.py", line 266, in __getattr__
    raise AttributeError(
AttributeError: module 'pyarrow' has no attribute 'compute'
>>> import pyarrow.compute
>>> pyarrow.compute.cast
<function cast at 0x109349160>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, importing datafusion.functions works as expected currently, this change is to make the code in our python doc work again: https://arrow.apache.org/datafusion/python/index.html#how-to-use-it. @kszucs is there any downside in making f = datafusion.functions work on top of import datafusion.functions as f?

@kszucs
Copy link
Member

kszucs commented Nov 12, 2021

is there any downside in making f = datafusion.functions work on top of import datafusion.functions as f?

Not really, it just feels odd for me. If we want to expose the functions as a module then I would stick with import datafusion.functions. Otherwise we could also expose all of the functions as staticmethods on a class e.g. FunctionRegistry (which would make the rust code simpler perhaps) or we could also expose it as a python object instead of an actual module.

If we want to remain backward compatible then we can re-export the functions submodule, column as col and literal as lit symbols.

I don't have a strong opinion, so feel free to merge this PR as is - we can discuss it further after the release.

@@ -36,6 +38,7 @@
"ScalarUDF",
"column",
"literal",
"functions",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the compute functions are exposed in pyarrow as well:

>>> import pyarrow
>>> pyarrow.compute.cast
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kszucs/.conda/envs/ibis39/lib/python3.9/site-packages/pyarrow/__init__.py", line 266, in __getattr__
    raise AttributeError(
AttributeError: module 'pyarrow' has no attribute 'compute'
>>> import pyarrow.compute
>>> pyarrow.compute.cast
<function cast at 0x109349160>

also implement __str__ for PyExpr and updated docs
@houqp
Copy link
Member Author

houqp commented Nov 13, 2021

@kszucs I pushed an update to keep the behavior consistent with pyarrow and updated the user doc to use the new api. Can you take another look?

@houqp houqp changed the title python: fix datafusion.functions access python: update docs to use new APIs Nov 13, 2021
@houqp houqp added documentation Improvements or additions to documentation and removed bug Something isn't working labels Nov 13, 2021
@houqp houqp merged commit b773802 into apache:master Nov 13, 2021
@houqp houqp deleted the qp_python branch November 13, 2021 21:18
@kszucs
Copy link
Member

kszucs commented Nov 13, 2021

A bit late, but LGTM. Thanks @houqp!

@houqp
Copy link
Member Author

houqp commented Nov 14, 2021

Thank you @kszucs for the epic refactor. Time to release it to pypi :)

matthewmturner pushed a commit to matthewmturner/arrow-datafusion that referenced this pull request Nov 16, 2021
also implement __str__ for PyExpr and updated docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants