Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #7390: push down limit filtering to adapter #7545

Merged
merged 4 commits into from
May 9, 2023
Merged

fix #7390: push down limit filtering to adapter #7545

merged 4 commits into from
May 9, 2023

Conversation

aranke
Copy link
Member

@aranke aranke commented May 8, 2023

resolves #7390

Description

Checklist

@aranke aranke requested a review from a team as a code owner May 8, 2023 13:09
@aranke aranke requested a review from a team May 8, 2023 13:09
@aranke aranke requested review from a team as code owners May 8, 2023 13:09
@aranke aranke requested review from colin-rogers-dbt, stu-k, emmyoop and VersusFacit and removed request for a team May 8, 2023 13:09
@cla-bot cla-bot bot added the cla:yes label May 8, 2023
@aranke aranke requested review from ChenyuLInx and iknox-fa May 8, 2023 13:11
tests/functional/show/test_show.py Outdated Show resolved Hide resolved
tests/functional/show/test_show.py Outdated Show resolved Hide resolved
Copy link
Contributor

@dataders dataders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love a great simpl impl (simple implementation). my first though is one @jtcohen6 already brought up in the source issue

In order to write the subquery in a cross-database-compatible way, we should either
- Use a (Jinja) macro or materialization to template the subquery with limit
- Use an adapter (Python) method — much easier to implement, and probably sufficient

with any mention of LIMIT my "incompatible with TSQL" sense tingles and prompts me to think

how will this work with SELECT TOP N?
Is this supported by pyodbc and MSFT's msodbc18?

However, the use of limit here is semantic. In reality, the change here is to make use of a DB API 2.0 Cursor method, fetchmany() (PEP 249: fetchmany()).

Now my brain is thinking about how our ConnectionManager class methods map to PEP249. As a less-in-the-weeds person, I have a hard time understanding when dbt makes use of all the execute* and fetch* variants of the DB API 2.0.

Also, is there a clear implementation recommendation for non-dbapi2 adapters (e.g. BQ, DB, and (increasingly RS)

core/dbt/adapters/sql/connections.py Show resolved Hide resolved
@aranke
Copy link
Member Author

aranke commented May 8, 2023

@dataders Looking at whether the implementations you mentioned support the Cursor object:

  1. pyodbc
  2. BigQuery
  3. DataBricks
  4. Redshift
  5. msodbc18 – Couldn't find on PyPI, should be supported via pyodbc

@iknox-fa iknox-fa removed their request for review May 8, 2023 21:56
@aranke aranke requested review from mikealfare and a team and removed request for colin-rogers-dbt, emmyoop and VersusFacit May 8, 2023 22:10
@aranke aranke merged commit 078a836 into main May 9, 2023
@aranke aranke deleted the fix_7390 branch May 9, 2023 05:22
@github-actions
Copy link
Contributor

github-actions bot commented May 9, 2023

The backport to 1.5.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.5.latest 1.5.latest
# Navigate to the new working tree
cd .worktrees/backport-1.5.latest
# Create a new branch
git switch --create backport-7545-to-1.5.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 078a83679a2d77ea6da9a8fac113498791361fe8
# Push it to GitHub
git push --set-upstream origin backport-7545-to-1.5.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.5.latest

Then, create a pull request where the base branch is 1.5.latest and the compare/head branch is backport-7545-to-1.5.latest.

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 10, 2023

@aranke This is causing failures in dbt-bigquery, when I run locally and in automated tests:

13:49:25  Compilation Error in model my_model (models/my_model.sql)
13:49:25    execute() got an unexpected keyword argument 'limit'
13:49:25
13:49:25    > in macro create_or_replace_view (macros/materializations/models/view/create_or_replace_view.sql)
13:49:25    > called by macro materialization_view_bigquery (macros/materializations/view.sql)
13:49:25    > called by model my_model (models/my_model.sql)

dbt-bigquery overrides the execute method here. Granted, dbt-bigquery is something we maintain, so we can make this change ourselves — but this could impact other externally-maintained adapters as well. Is there any way to make this change without breaking/modifying the signature of the execute method?

Update: I just saw the PRs for this in dbt-labs/dbt-redshift#435, dbt-labs/dbt-bigquery#707, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-2428] dbt show should include --limit in compiled query
4 participants