Merge pull request #46 from pydiverse/doku
Document verbs and more operators
finn-rudolph authored Jan 2, 2025
2 parents 707ad47 + fb4089c commit 6fad241
Showing 29 changed files with 1,319 additions and 229 deletions.
3 changes: 3 additions & 0 deletions docs/source/conf.py
@@ -55,6 +55,9 @@
"member-order": "bysource",
}

autodoc_class_signature = "separated"
autodoc_default_options = {"exclude-members": "__new__"}

autosectionlabel_prefix_document = True

toc_object_entries_show_parents = "all"
1 change: 0 additions & 1 deletion docs/source/examples.md
@@ -11,7 +11,6 @@ Some examples how to use pydiverse.transform:
* [Best practices / beware the flatfile & embrace working with entities](/examples/best_practices_entities)

```{toctree}
/quickstart
/examples/joining
/examples/aggregations
/examples/window_functions
4 changes: 2 additions & 2 deletions docs/source/examples/aggregations.md
@@ -8,8 +8,8 @@ from pydiverse.transform.extended import *

tbl1 = pdt.Table(dict(a=[1, 1, 2], b=[4, 5, 6]))

-tbl1 >> summarize(sum_a=sum(a), sum_b=sum(b)) >> show()
-tbl1 >> group_by(tbl1.a) >> summarize(sum_b=sum(b)) >> show()
+tbl1 >> summarize(sum_a=a.sum(), sum_b=b.sum()) >> show()
+tbl1 >> group_by(tbl1.a) >> summarize(sum_b=b.sum()) >> show()
```

Typical aggregation functions are `sum()`, `mean()`, `count()`, `min()`, `max()`, `any()`, and `all()`.
21 changes: 19 additions & 2 deletions docs/source/reference/api.rst
@@ -8,10 +8,27 @@ API
verbs
operators/index
targets
types


.. currentmodule:: pydiverse.transform

Table
-----

.. currentmodule:: pydiverse.transform
.. autoclass:: Table
:noindex:

ColExpr
-------

.. autoclass:: ColExpr
:members: dtype
:exclude-members: __new__, __init__

Col
---

.. autoclass:: Col
:no-index:
:members: export
:exclude-members: __new__, __init__
10 changes: 10 additions & 0 deletions docs/source/reference/operators/aggregation.rst
@@ -2,6 +2,16 @@
Aggregation
===========

Aggregation functions accept the keyword arguments ``partition_by`` and ``filter``. The
``partition_by`` argument can only be given when the function is used within ``mutate``.
If ``partition_by`` is given and there is a surrounding ``group_by`` / ``ungroup``, the
``group_by`` is ignored and the value of ``partition_by`` is used.
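
The precedence rule described above can be sketched in plain Python. This is only an illustration of the documented semantics, not the library's implementation; the helper ``windowed_sum`` and the sample rows are hypothetical.

```python
from collections import defaultdict


def windowed_sum(rows, value, group_by=None, partition_by=None):
    """Return one sum per input row, as an aggregation inside mutate would."""
    # Assumption taken from the text: an explicit partition_by replaces
    # the surrounding group_by instead of being combined with it.
    keys = partition_by if partition_by is not None else (group_by or [])
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    # Every row receives the total of the partition it belongs to.
    return [totals[tuple(row[k] for k in keys)] for row in rows]


rows = [
    {"a": 1, "c": "x", "b": 4},
    {"a": 1, "c": "y", "b": 5},
    {"a": 2, "c": "x", "b": 6},
]

by_group = windowed_sum(rows, "b", group_by=["a"])
by_partition = windowed_sum(rows, "b", group_by=["a"], partition_by=["c"])
print(by_group)      # [9, 9, 6]
print(by_partition)  # [10, 5, 10]
```

In the second call the grouping on ``a`` has no effect, because ``partition_by`` is given.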

.. warning::
The ``filter`` argument works similarly to ``Expr.filter`` in polars. In contrast
to polars, however, if all values in a group are ``null`` or the group becomes empty
after filtering, the value of every aggregation function for that group is ``null``, too.
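
A minimal plain-Python sketch of the null rule in the warning above, with ``None`` standing in for ``null``. The function ``filtered_sum`` is hypothetical and only mirrors the documented behaviour; it does not call pydiverse.transform or polars.

```python
def filtered_sum(values, keep):
    """Sum the values whose mask entry is true; None means 'no data'."""
    kept = [v for v, k in zip(values, keep) if k and v is not None]
    # Documented rule: an empty or all-null group aggregates to null,
    # whereas Python's built-in sum over no values would give 0.
    return sum(kept) if kept else None


some = filtered_sum([4, 5, None], [True, True, True])
emptied = filtered_sum([4, 5], [False, False])
all_null = filtered_sum([None, None], [True, True])
print(some, emptied, all_null)  # 9 None None
```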

.. currentmodule:: pydiverse.transform.ColExpr
.. autosummary::
:toctree: _generated/
1 change: 1 addition & 0 deletions docs/source/reference/operators/arithmetic.rst
@@ -10,6 +10,7 @@ Arithmetic

__add__
__floordiv__
__mod__
__mul__
__neg__
__pos__
14 changes: 14 additions & 0 deletions docs/source/reference/operators/conditional_logic.rst
@@ -0,0 +1,14 @@
=================
Conditional Logic
=================

.. currentmodule:: pydiverse.transform

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:

when
coalesce
ColExpr.map
111 changes: 97 additions & 14 deletions docs/source/reference/operators/index.rst
@@ -16,26 +16,109 @@ Column Operations
window
sorting_markers
horizontal_aggregation
conditional_logic
type_conversion


.. currentmodule:: pydiverse.transform
Expression methods
------------------

.. autoclass:: ColExpr
:no-index:
:members: dtype
.. currentmodule:: pydiverse.transform.ColExpr

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:
:nosignatures:

__add__
__and__
__eq__
__floordiv__
__ge__
__gt__
__invert__
__le__
__lt__
__mod__
__mul__
__ne__
__neg__
__or__
__pos__
__pow__
__sub__
__truediv__
__xor__
abs
all
any
ascending
cast
ceil
count
dense_rank
descending
dt.day
dt.day_of_week
dt.day_of_year
dt.hour
dt.microsecond
dt.millisecond
dt.minute
dt.month
dt.second
dt.year
dur.days
dur.hours
dur.microseconds
dur.milliseconds
dur.minutes
dur.seconds
exp
fill_null
floor
is_in
is_inf
is_nan
is_not_inf
is_not_nan
is_not_null
is_null
log
map
max
mean
min
nulls_first
nulls_last
rank
round
shift
str.contains
str.ends_with
str.len
str.lower
str.replace_all
str.slice
str.starts_with
str.strip
str.to_date
str.to_datetime
str.upper
sum

lit
when
Global functions
----------------

.. currentmodule:: pydiverse.transform

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:
:nosignatures:

ColExpr.cast
ColExpr.map
coalesce
count
dense_rank
lit
max
min
rank
row_number
when
13 changes: 13 additions & 0 deletions docs/source/reference/operators/type_conversion.rst
@@ -0,0 +1,13 @@
===============
Type Conversion
===============

.. currentmodule:: pydiverse.transform

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:

lit
ColExpr.cast
28 changes: 28 additions & 0 deletions docs/source/reference/types.rst
@@ -0,0 +1,28 @@
=====
Types
=====

.. currentmodule:: pydiverse.transform
.. autosummary::
:toctree: _generated/
:nosignatures:
:template: autosummary/short_title.rst

Dtype
Bool
Date
Datetime
Decimal
Float
Float32
Float64
Int
Int8
Int16
Int32
Int64
String
Uint8
Uint16
Uint32
Uint64
1 change: 1 addition & 0 deletions docs/source/reference/verbs.rst
@@ -17,6 +17,7 @@ Verbs
filter
full_join
group_by
inner_join
join
left_join
mutate
58 changes: 56 additions & 2 deletions generate_col_ops.py
@@ -15,6 +15,7 @@

COL_EXPR_PATH = "./src/pydiverse/transform/_internal/tree/col_expr.py"
FNS_PATH = "./src/pydiverse/transform/_internal/pipe/functions.py"
API_DOCS_PATH = "./docs/source/reference/operators/index.rst"

NAMESPACES = ["str", "dt", "dur"]

@@ -78,7 +79,8 @@ def generate_fn_decl(
}

annotated_kwargs = "".join(
-f", {kwarg}: {context_kwarg_annotation[kwarg]} | None = None"
+f", {kwarg.name}: {context_kwarg_annotation[kwarg.name]}"
++ f"{'' if kwarg.required else ' | None = None'}"
for kwarg in op.context_kwargs
)
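
The changed expression can be tried in isolation. The sketch below assumes that each entry of `op.context_kwargs` exposes `name` and `required` attributes, as the new lines suggest; the `ContextKwarg` dataclass, the `annotate` helper, and the annotation strings are hypothetical stand-ins, not the project's own definitions.

```python
from dataclasses import dataclass


@dataclass
class ContextKwarg:
    """Hypothetical stand-in for an entry of op.context_kwargs."""

    name: str
    required: bool


def annotate(context_kwargs, annotations):
    # Same f-string logic as in the diff: optional keyword arguments get
    # "| None = None" appended, required ones get no default.
    return "".join(
        f", {kwarg.name}: {annotations[kwarg.name]}"
        + f"{'' if kwarg.required else ' | None = None'}"
        for kwarg in context_kwargs
    )


# Illustrative annotation strings, not the real ones.
annotations = {"arrange": "ColExpr", "partition_by": "ColExpr"}
fragment = annotate(
    [ContextKwarg("arrange", True), ContextKwarg("partition_by", False)],
    annotations,
)
print(fragment)  # , arrange: ColExpr, partition_by: ColExpr | None = None
```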

@@ -116,7 +118,7 @@ def generate_fn_body(
args = add_vararg_star(args)

if op.context_kwargs is not None:
-kwargs = "".join(f", {kwarg}={kwarg}" for kwarg in op.context_kwargs)
+kwargs = "".join(f", {kwarg.name}={kwarg.name}" for kwarg in op.context_kwargs)
else:
kwargs = ""

@@ -246,3 +248,55 @@ def indent(s: str, by: int) -> str:
file.truncate()

os.system(f"ruff format {FNS_PATH}")

with open(API_DOCS_PATH, "r+") as file:
    new_file_contents = ""

    for line in file:
        new_file_contents += line
        if line.startswith("Expression methods"):
            new_file_contents += (
                "------------------\n\n"
                ".. currentmodule:: pydiverse.transform.ColExpr\n\n"
                ".. autosummary::\n"
                " :nosignatures:\n\n "
            )

            new_file_contents += "\n ".join(
                sorted(
                    [
                        op.name
                        for op in ops.__dict__.values()
                        if isinstance(op, Operator) and op.generate_expr_method
                    ]
                    + ["rank", "dense_rank", "map", "cast"]
                )
            )

            new_file_contents += (
                "\n\nGlobal functions\n"
                "----------------\n\n"
                ".. currentmodule:: pydiverse.transform\n\n"
                ".. autosummary::\n"
                " :nosignatures:\n\n "
            )

            new_file_contents += (
                "\n ".join(
                    sorted(
                        [
                            op.name
                            for op in ops.__dict__.values()
                            if isinstance(op, Operator) and not op.generate_expr_method
                        ]
                        + ["when", "lit"]
                    )
                )
                + "\n"
            )

            break

    file.seek(0)
    file.write(new_file_contents)
    file.truncate()
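
The read, rewrite and truncate pattern used in this hunk can be checked on an in-memory file. The sketch below is a simplified illustration: `regenerate_tail` and the sample text are hypothetical, and `io.StringIO` stands in for the file opened with `"r+"`.

```python
import io


def regenerate_tail(file, marker, generated):
    """Keep everything up to and including the marker line, then append
    freshly generated text and drop whatever followed before."""
    kept = ""
    for line in file:
        kept += line
        if line.startswith(marker):
            break
    new_contents = kept + generated
    file.seek(0)
    file.write(new_contents)
    # Without truncate(), leftovers of a longer old file would remain
    # after the newly written text.
    file.truncate()
    return new_contents


doc = io.StringIO(
    "Intro\n\nExpression methods\nstale entry\nanother stale entry, longer\n"
)
result = regenerate_tail(
    doc, "Expression methods", "------------------\n\nabs\nsum\n"
)
print(doc.getvalue())
```

Everything before the marker heading is preserved verbatim, so hand-written parts of the file above that heading survive regeneration.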
5 changes: 2 additions & 3 deletions src/pydiverse/transform/__init__.py
@@ -2,11 +2,10 @@

from ._internal.pipe.pipeable import verb
from ._internal.pipe.table import Table
-from ._internal.tree.col_expr import ColExpr
+from ._internal.tree.col_expr import Col, ColExpr
from .extended import *
from .extended import __all__ as __extended
from .types import *
from .types import __all__ as __types

-__all__ = ["Table", "ColExpr", "verb"]
-# __all__ += __extended + __types
+__all__ = ["Table", "ColExpr", "Col", "verb"] + __extended + __types
8 changes: 1 addition & 7 deletions src/pydiverse/transform/_internal/backend/polars.py
@@ -71,14 +71,8 @@ def export(
lf.name = nd.name
return lf

raise AssertionError

@staticmethod
def export_col(expr: ColExpr, target: Target) -> pl.Series:
if isinstance(target, Polars):
...
elif isinstance(target, Pandas):
...
return lf.collect().to_pandas(use_pyarrow_extension_array=True)

raise AssertionError
