add duckdb support #1398

Merged (18 commits) on Sep 25, 2024

@ahuang11 (Collaborator) commented Aug 22, 2024

Closes #1397

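```python
import hvplot.duckdb  # patches DuckDB so relations gain an .hvplot() method

import pandas as pd
import duckdb

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6]
})

# Writing Parquet from pandas requires pyarrow or fastparquet
df.to_parquet('test.parquet')

# from_parquet returns a DuckDBPyRelation, which now has .hvplot()
connection = duckdb.connect(database=':memory:', read_only=False)
connection.from_parquet("test.parquet").hvplot(
    x='x', y='y', kind='scatter'
)
```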

codecov bot commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 94.79167% with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.94%. Comparing base (efeda78) to head (e008752).
Report is 5 commits behind head on main.

| File with missing lines | Patch % | Lines |
|---|---|---|
| hvplot/duckdb.py | 87.50% | 2 Missing ⚠️ |
| hvplot/tests/testpatch.py | 89.47% | 2 Missing ⚠️ |
| hvplot/plotting/core.py | 96.87% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1398      +/-   ##
==========================================
+ Coverage   88.73%   88.94%   +0.20%
==========================================
  Files          51       52       +1
  Lines        7592     7751     +159
==========================================
+ Hits         6737     6894     +157
- Misses        855      857       +2
```
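The Codecov figures are internally consistent, which a few lines of arithmetic can confirm. This is a sketch: the patch size (96 lines) is inferred from the reported percentage and missing-line count rather than stated in the report, and truncating to two decimals is an assumption that happens to reproduce the reported values (rounding would instead give 88.74% for the base and +0.21% for the change).

```python
import math


def trunc2(value):
    """Truncate (not round) a percentage to two decimals."""
    return math.floor(value * 100) / 100


# Patch coverage: the report gives the percentage and the missing-line count;
# the patch size is inferred from those two numbers, not reported directly.
patch_coverage = 94.79167  # percent
patch_missing = 5
patch_lines = round(patch_missing / (1 - patch_coverage / 100))
patch_hits = patch_lines - patch_missing

# Project coverage = hits / lines, on the base (main) and head (this PR).
base_lines, base_hits = 7592, 6737
head_lines, head_hits = 7751, 6894
base_cov = 100 * base_hits / base_lines
head_cov = 100 * head_hits / head_lines

print(patch_hits, "/", patch_lines)                    # 91 / 96
print(trunc2(base_cov), trunc2(head_cov))              # 88.73 88.94
print(trunc2(head_cov - base_cov))                     # 0.2
print(head_lines - base_lines, head_hits - base_hits)  # 159 157
```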


@MarcSkovMadsen (Collaborator)

Is the long-term goal to implement a real DuckDB backend for more efficient usage?

@ahuang11 (Collaborator, Author) commented Aug 22, 2024

Can you elaborate on what you mean by a real DuckDB backend?

This PR doesn't change the converter at all; it only patches duckdb so that DuckDBPyRelation.hvplot() is available.

AFAIK, it does not do DuckDBPyRelation.fetchdf().hvplot().
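For readers unfamiliar with the patching approach described above, here is a minimal sketch of the general monkeypatching pattern. It uses hypothetical stand-in classes (`Relation` for `DuckDBPyRelation`, `Plotter` for hvPlot's plotting accessor) and a hypothetical `patch` helper, so it is not hvPlot's actual implementation. The accessor wraps the relation object itself rather than first converting it to a DataFrame.

```python
class Relation:
    """Hypothetical stand-in for duckdb.DuckDBPyRelation."""

    def __init__(self, rows):
        self.rows = rows


class Plotter:
    """Hypothetical stand-in for hvPlot's tabular plotting accessor."""

    def __init__(self, data):
        self.data = data  # keeps a reference to the wrapped object, no conversion

    def __call__(self, x=None, y=None, kind='line'):
        # A real plotter would build a chart; this one just describes it.
        return f"{kind} plot of {y} vs {x} over {len(self.data.rows)} rows"


def patch(cls, name='hvplot'):
    """Attach a read-only accessor so instances of `cls` expose `.<name>()`."""
    setattr(cls, name, property(lambda self: Plotter(self)))


patch(Relation)
rel = Relation([(1, 4), (2, 5), (3, 6)])
print(rel.hvplot(x='x', y='y', kind='scatter'))  # scatter plot of y vs x over 3 rows
```

Using a property means a fresh accessor is built on each attribute access, so `rel.hvplot` can be called directly and could also expose per-kind methods, in the same spirit as hvPlot's `df.hvplot(...)` / `df.hvplot.scatter(...)` style.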

@MarcSkovMadsen (Collaborator)

Yes.

Is the plan to implement a real HoloViz backend so that only the data needed for the plot is queried, for example when using groupby?
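The idea of querying only the data a plot needs can be illustrated with a small sketch. Python's built-in `sqlite3` stands in for DuckDB here (an assumption made so the snippet runs without extra packages), and the `sales` table with its `region`/`amount` columns is hypothetical. The point is where the aggregation runs: pushing the `GROUP BY` into the database returns one row per group instead of every raw row.

```python
import sqlite3

# sqlite3 stands in for DuckDB (assumption); the engine is not the point here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 10.0), ("A", 5.0), ("B", 7.5), ("B", 2.5), ("C", 1.0)],
)

# Naive approach: fetch every row into Python, then aggregate client-side.
all_rows = con.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in all_rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Pushdown approach: let the database aggregate and fetch one row per group.
pushed = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(len(all_rows), "rows fetched naively;", len(pushed), "rows with pushdown")
print(pushed)  # [('A', 15.0), ('B', 10.0), ('C', 1.0)]
```

Both approaches give the same totals; the difference is how much data crosses from the database into the plotting layer, which matters once tables are large.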

@ahuang11 (Collaborator, Author) commented Aug 28, 2024

I'm not sure I follow; this PR enables calling hvplot on a DuckDBPyRelation, which is already the result of a query:

connection → relation (SELECT * FROM table WHERE ...) → hvplot

```python
connection.execute("SELECT * FROM table WHERE col = 'A'").hvplot()
```

Ah, I see the confusion:

```python
connection.from_parquet("test.parquet").hvplot(
    x='x', y='y', kind='scatter'
)
```

This can be drilled down further:

```python
connection.from_parquet("test.parquet").execute(...).hvplot(
    x='x', y='y', kind='scatter'
)
```

@ahuang11 (Collaborator, Author) commented Aug 28, 2024

I was slightly mistaken, but a small modification makes it work:
https://github.com/holoviz/hvplot/pull/1398/files#diff-a47979cba5da76fbc1aec07d45b4403241da5cfbbbf6b36652f75c52fb644824R360


I'm not sure whether we should add the optimization that subsets the x/y columns ourselves (e.g. relation.select(x, y)) or let users do it themselves.
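If the column-subsetting optimization were automated, one option would be to inspect the column arguments of the plot call (`x`, `y`, `by`, `groupby`) and project only those columns before fetching. The sketch below is under that assumption: `needed_columns` is a hypothetical helper, not part of hvPlot, and it deliberately ignores columns referenced by other options such as `color` or `hover_cols`, which a real implementation would also have to collect.

```python
def needed_columns(x=None, y=None, by=None, groupby=None):
    """Hypothetical helper: collect the column names a plot call references,
    deduplicated and in first-seen order, so they can be projected up front."""
    cols = []
    for arg in (x, y, by, groupby):
        if arg is None:
            continue
        # Each argument may be a single column name or a list of names.
        names = [arg] if isinstance(arg, str) else list(arg)
        for name in names:
            if name not in cols:
                cols.append(name)
    return cols


cols = needed_columns(x='x', y=['y1', 'y2'], by='x')
print(cols)  # ['x', 'y1', 'y2']
# The result could then feed a column-subsetting call such as
# relation.project(", ".join(cols)) before converting to pandas.
```

Keeping first-seen order means the projected relation lists columns in the same order the user named them, which keeps the subset predictable.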

@ahuang11 marked this pull request as ready for review August 28, 2024 20:52
@ahuang11 requested a review from hoxbro August 28, 2024 20:53
hvplot/duckdb.py: 3 review threads (outdated, resolved)
@ahuang11 (Collaborator, Author) commented Aug 28, 2024

Oh, I guess it converts to a pandas object first:

```python
lambda self: hvPlotTabular(self.df())
```

https://github.com/holoviz/hvplot/pull/1398/files#diff-be8c4d86a0e1601a53aca277191c3a5d8e160fc39c2b4d75396c83b6a38ae610R15

Edit: Now it does what you proposed.

hvplot/plotting/__init__.py: review thread (outdated, resolved)
@ahuang11 requested a review from maximlt September 11, 2024 09:19
@maximlt (Member) left a comment

I'm OK with adding DuckDB support. You should probably also update the diagram on the landing page, unless you consider this experimental for now.

doc/index.md: review thread (resolved)
doc/user_guide/Integrations.ipynb: review thread (resolved)
doc/user_guide/Integrations.ipynb: review thread (outdated, resolved)
@maximlt added this to the 0.11.0 milestone Sep 13, 2024
@maximlt (Member) commented Sep 13, 2024

@ahuang11, can you also update this PR based on #1359 now that it has been merged?

@ahuang11 requested a review from maximlt September 13, 2024 21:19
@ahuang11 (Collaborator, Author)

Okay, I think this is ready. I added the diagrams and incorporated the changes from #1359.

doc/assets/diagram.svg: review thread (outdated, resolved)
doc/index.md: review thread (outdated, resolved)
ahuang11 and others added 2 commits September 23, 2024 07:43
@ahuang11 requested a review from maximlt September 23, 2024 14:44
hvplot/plotting/__init__.py: review thread (outdated, resolved)
Co-authored-by: Simon Høxbro Hansen
@maximlt merged commit c689c60 into main Sep 25, 2024
9 checks passed
@maximlt deleted the add_duckdb branch September 25, 2024 16:35
Issues this pull request may close: Implement a duckdb backend
5 participants