python: update docs to use new APIs #1287

houqp · 2021-11-12T07:39:51Z

Rationale for this change

The example in our Python user doc does not work with the latest binding implementation. I found this while validating #1253.

What changes are included in this PR?

Added tests
Fixed python docs
Implemented __str__ for PyExpr
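Implementing `__str__` means `str(expr)` and `print(expr)` return a readable rendering instead of the default `<... object at 0x...>` form. The real `PyExpr` is defined in Rust, so the pure-Python class below is only a hypothetical stand-in. `Expr`, `column`, `abs_`, and the `abs(a)` rendering are assumptions for illustration, not DataFusion's actual API or output format.

```python
class Expr:
    """Hypothetical stand-in for the Rust-backed PyExpr."""

    def __init__(self, text: str) -> None:
        # Human-readable rendering of the expression (assumed format).
        self._text = text

    def __str__(self) -> str:
        # Used by str() and print(). Without __str__, Python falls back to
        # __repr__, which by default is "<...Expr object at 0x...>".
        return self._text

    def __repr__(self) -> str:
        return f"Expr({self._text!r})"


def column(name: str) -> Expr:
    # Hypothetical helper that builds a column reference.
    return Expr(name)


def abs_(expr: Expr) -> Expr:
    # Trailing underscore avoids shadowing the builtin abs().
    # The f-string calls str(expr), i.e. the __str__ defined above.
    return Expr(f"abs({expr})")


e = abs_(column("a"))
print(e)        # abs(a)
print(repr(e))  # Expr('abs(a)')
```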

Are there any user-facing changes?

No.

houqp · 2021-11-12T07:41:14Z

python/datafusion/__init__.py

@@ -36,6 +38,7 @@
    "ScalarUDF",
    "column",
    "literal",
+    "functions",


this is the fix

Actually this should be done by importing the functions submodule:

    import datafusion
    import datafusion.functions
    datafusion.functions.abs(datafusion.column("a"))

This is how the compute functions are exposed in pyarrow as well:

    >>> import pyarrow
    >>> pyarrow.compute.cast
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/kszucs/.conda/envs/ibis39/lib/python3.9/site-packages/pyarrow/__init__.py", line 266, in __getattr__
        raise AttributeError(
    AttributeError: module 'pyarrow' has no attribute 'compute'
    >>> import pyarrow.compute
    >>> pyarrow.compute.cast
    <function cast at 0x109349160>
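The pyarrow behaviour in the traceback comes from a general Python rule: a submodule only becomes an attribute of its parent package once it has been imported. The sketch below reproduces that with a throwaway package; `demo_pkg` and its `abs_` function are hypothetical names, not part of DataFusion or pyarrow.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package "demo_pkg" (hypothetical) whose __init__.py does
# NOT import its "functions" submodule, like pyarrow with pyarrow.compute.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "functions.py").write_text("def abs_(x):\n    return abs(x)\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the new directory is picked up

import demo_pkg

# Before the submodule is imported, it is not an attribute of the package.
before = hasattr(demo_pkg, "functions")

# Importing the submodule also binds it as an attribute on the parent package.
import demo_pkg.functions

after = hasattr(demo_pkg, "functions")
print(before, after, demo_pkg.functions.abs_(-3))  # False True 3
```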

Yeah, importing datafusion.functions currently works as expected. This change is to make the code in our Python doc work again: https://arrow.apache.org/datafusion/python/index.html#how-to-use-it. @kszucs is there any downside in making f = datafusion.functions work on top of import datafusion.functions as f?
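For `f = datafusion.functions` to work right after `import datafusion`, the package's `__init__.py` has to import the submodule itself. The sketch below assumes this is done with `from . import functions`; the diff only shows `"functions"` being added to what appears to be the exported-names list, so the exact mechanism in DataFusion is not confirmed here. `reexport_pkg` is a hypothetical stand-in package. The sketch also shows that the two spellings do not conflict: both bind the same module object.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package whose __init__.py eagerly imports its submodule.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "reexport_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(
    "from . import functions\n"
    "__all__ = ['functions']\n"
)
(pkg_dir / "functions.py").write_text("def abs_(x):\n    return abs(x)\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import reexport_pkg

# Works immediately after "import reexport_pkg".
f1 = reexport_pkg.functions

# The pyarrow-style spelling still works and yields the very same object.
import reexport_pkg.functions as f2

print(f1 is f2, f1.abs_(-5))  # True 5
```

The trade-off is that the submodule is loaded on every import of the parent package, whether or not the caller uses it.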

kszucs · 2021-11-12T22:44:58Z

is there any downside in making f = datafusion.functions work on top of import datafusion.functions as f?

Not really, it just feels odd to me. If we want to expose the functions as a module, then I would stick with import datafusion.functions. Otherwise, we could expose all of the functions as staticmethods on a class, e.g. FunctionRegistry (which might make the Rust code simpler), or expose them as a Python object instead of an actual module.
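The `FunctionRegistry` alternative can be sketched in pure Python as a class whose functions are `staticmethod`s, so callers reach them through a class attribute rather than a submodule. Everything here is hypothetical: `FunctionRegistry` is only the name floated in the comment, and `Expr`, `column`, and the string rendering stand in for the Rust-backed types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node standing in for PyExpr."""
    text: str

    def __str__(self) -> str:
        return self.text


class FunctionRegistry:
    """Hypothetical namespace class: functions as staticmethods
    instead of members of a separate submodule."""

    @staticmethod
    def abs(expr: Expr) -> Expr:
        return Expr(f"abs({expr})")

    @staticmethod
    def sqrt(expr: Expr) -> Expr:
        return Expr(f"sqrt({expr})")


def column(name: str) -> Expr:
    return Expr(name)


# A short alias works without any submodule import semantics involved.
f = FunctionRegistry
print(f.abs(column("a")))   # abs(a)
print(f.sqrt(column("b")))  # sqrt(b)
```

One trade-off worth weighing: individual functions can no longer be pulled in with `from ... import abs`, since a class is not a module.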

If we want to remain backward compatible, we can re-export the functions submodule, and re-export column as col and literal as lit.

I don't have a strong opinion, so feel free to merge this PR as is - we can discuss it further after the release.

kszucs · 2021-11-12T10:42:38Z


docs/source/python/index.rst

also implement __str__ for PyExpr and updated docs

houqp · 2021-11-13T08:41:29Z

@kszucs I pushed an update to keep the behavior consistent with pyarrow and updated the user doc to use the new API. Can you take another look?

kszucs · 2021-11-13T22:58:30Z

A bit late, but LGTM. Thanks @houqp!

houqp · 2021-11-14T00:57:35Z

Thank you @kszucs for the epic refactor. Time to release it to PyPI :)


houqp added this to the 6.0.0 milestone Nov 12, 2021

houqp requested review from jimexist, kszucs and jorgecarleitao November 12, 2021 07:39

github-actions bot added the python label Nov 12, 2021

houqp added the bug label Nov 12, 2021

houqp commented Nov 12, 2021


houqp force-pushed the qp_python branch from ea236ac to 8422492 Compare November 12, 2021 07:43

houqp requested a review from alamb November 12, 2021 07:48

kszucs approved these changes Nov 12, 2021


python: update user doc to use new api

c7452ad

also implement __str__ for PyExpr and updated docs

houqp force-pushed the qp_python branch from 8422492 to c7452ad Compare November 13, 2021 08:26

jimexist approved these changes Nov 13, 2021


houqp changed the title ~~python: fix datafusion.functions access~~ python: update docs to use new APIs Nov 13, 2021

houqp added the documentation label and removed the bug label Nov 13, 2021

houqp merged commit b773802 into apache:master Nov 13, 2021

houqp deleted the qp_python branch November 13, 2021 21:18

matthewmturner pushed a commit to matthewmturner/arrow-datafusion that referenced this pull request Nov 16, 2021

python: update user doc to use new APIs (apache#1287)

5efbd43

also implement __str__ for PyExpr and updated docs

houqp mentioned this pull request Nov 18, 2021

function module in python binding is broken in 6.0 #1328

Closed
