Add documentation for FugueSQL integrations (#523)
* Add documentation for FugueSQL integrations
* Minor nitpick around autodoc obj -> class
1 parent 7b4bc55, commit b58989f
Showing 4 changed files with 69 additions and 17 deletions.
@@ -0,0 +1,44 @@
FugueSQL Integrations
=====================

`FugueSQL <https://fugue-tutorials.readthedocs.io/tutorials/fugue_sql/index.html>`_ is a related project that aims to provide a unified SQL interface for a variety of computing frameworks, including Dask.
Its SQL engine supports a larger set of commands than dask-sql, but this comes at the cost of slower performance on Dask compared to dask-sql.
To offer a "best of both worlds" solution, dask-sql includes several options to integrate with FugueSQL: dask-sql's faster implementation of SQL commands is used when possible, and FugueSQL serves as the fallback when necessary.

dask-sql as a FugueSQL engine
-----------------------------

FugueSQL users unfamiliar with dask-sql can take advantage of its functionality with minimal code changes by passing :class:`dask_sql.integrations.fugue.DaskSQLExecutionEngine` into the ``FugueSQLWorkflow`` used to execute commands.
For more information and sample usage, see `Fugue — dask-sql as a FugueSQL engine <https://fugue-tutorials.readthedocs.io/tutorials/integrations/dasksql.html>`_.
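
As a rough illustration of the paragraph above, the sketch below passes ``DaskSQLExecutionEngine`` to a ``FugueSQLWorkflow``. It assumes a Fugue release in which the workflow accepts an engine in its constructor and can be used as a context manager; newer releases may expose a different entry point. The helper name ``run_with_dask_sql_engine`` and the sample rows are hypothetical and only serve the example.

```python
# Sample rows and their Fugue schema expression (illustrative values only).
ROWS = [[0, "hello"], [1, "world"]]
SCHEMA = "a:int64,b:str"


def run_with_dask_sql_engine(rows=ROWS, schema=SCHEMA):
    """Run one FugueSQL statement with dask-sql as the SQL engine (sketch)."""
    # Imported lazily so this module loads even where fugue/dask-sql are absent.
    from dask_sql.integrations.fugue import DaskSQLExecutionEngine
    from fugue_sql import FugueSQLWorkflow

    # Assumption: this Fugue release accepts the engine in the constructor
    # and supports the context-manager form described in the text.
    with FugueSQLWorkflow(DaskSQLExecutionEngine) as dag:
        # Fugue resolves `df` in the SQL below from the caller's local scope.
        df = dag.df(rows, schema)  # noqa: F841
        # The SELECT is handled by dask-sql; PRINT is a FugueSQL extension.
        dag("SELECT * FROM df WHERE a > 0 PRINT")
```

Calling ``run_with_dask_sql_engine()`` should print the rows where ``a > 0``, provided both libraries are installed and the version assumption holds.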

Using FugueSQL on an existing ``Context``
-----------------------------------------

dask-sql users who want to expand their SQL querying options for an existing ``Context`` can use :func:`dask_sql.integrations.fugue.fsql_dask`, which executes the provided query with FugueSQL, taking the tables in the provided context as input.
The results of the query can optionally be registered to the context:

.. code-block:: python

    import pandas as pd
    from dask_sql import Context
    from dask_sql.integrations.fugue import fsql_dask

    # define a custom prepartition function for FugueSQL
    # (the schema hint tells Fugue the output keeps the input columns)
    # schema: *
    def median(df: pd.DataFrame) -> pd.DataFrame:
        df["y"] = df["y"].median()
        return df.head(1)

    # create a context with some tables
    c = Context()
    ...

    # run a FugueSQL query using the context as input
    query = """
    j = SELECT df1.*, df2.x
        FROM df1 INNER JOIN df2 ON df1.key = df2.key
        PERSIST
    TAKE 5 ROWS PREPARTITION BY x PRESORT key
    TRANSFORM j PREPARTITION BY x USING median
    """
    result = fsql_dask(query, c, register=True)  # results aren't registered by default

    assert "j" in result  # returns a dict of resulting tables
    assert "j" in c.tables  # results are also registered to the context
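
The example above leaves the table setup elided. One way the context might be populated is sketched below, assuming two small pandas DataFrames named ``df1`` and ``df2``. Only the column names (``key``, ``x``, ``y``) are taken from the query; the row values and the helper ``build_example_context`` are hypothetical.

```python
# Hypothetical sample data; only the column names come from the example query.
DF1_ROWS = {"key": [1, 2, 3], "y": [10.0, 20.0, 30.0]}
DF2_ROWS = {"key": [1, 2, 3], "x": ["a", "a", "b"]}


def build_example_context():
    """Create a Context holding tables named df1 and df2 (sketch)."""
    # Imported lazily so this module loads even where dask-sql is absent.
    import pandas as pd
    from dask_sql import Context

    c = Context()
    # create_table accepts pandas as well as Dask DataFrames.
    c.create_table("df1", pd.DataFrame(DF1_ROWS))
    c.create_table("df2", pd.DataFrame(DF2_ROWS))
    return c
```

A context built this way could then be passed as the second argument to ``fsql_dask``.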