I am not sure if this is intentional or if there's something on my end, but I am unable to query tables in the DuckDB file that are created during the dbt run. I am using AWS S3/parquet as my 'data lake'. I'm using `duckdb` v0.10.0 and `dbt-duckdb` v1.7.2. Here's my `profiles.yml`:

The pipeline runs successfully. I get all the expected parquet files in S3. The curious issue is with the DuckDB file.
For example, I have a model that creates a table in a `db.duckdb` file: `db.main.my_model`. I also run a dbt test on this model (it checks that values in a column are unique). This creates a table `db.main_dbt_test__audit.unique_column_id`. I then like to use Python/duckdb to query the tables directly when I'm debugging or exploring, but it errors when querying `db.main.my_model`:
```python
import duckdb

con = duckdb.connect('db.duckdb')
df = con.query(
    """
    select *
    from db.main.my_model
    """
)
```
yields:

```
duckdb.duckdb.HTTPException: HTTP Error: HTTP GET error on 'https://my-bucket.s3.amazonaws.com/pipeline/my_model.parquet' (HTTP 403)
```
However, if I run:
```python
con = duckdb.connect('db.duckdb')
df = con.query(
    """
    select *
    from db.main_dbt_test__audit.unique_column_id
    """
)
```
I am able to see the data. Is there something inherently different about tables created via dbt models? How does the DuckDB file know to point to S3? I think the HTTP 403 is a red herring, because the dbt pipeline runs successfully, which means it is able to connect to AWS. I'd really appreciate any thoughts on this, thanks!
The rub here is that the created DuckDB file won't have the S3 access key/secret present, so it can't read the tables that are backed by files in S3 (the results of tests are just regular DuckDB tables, which is why you can query them). You should be able to read the S3-backed tables by loading the `aws` extension and running `CALL load_aws_credentials();` to load your credentials.
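A minimal sketch of that workaround from Python, assuming your credentials are available via the standard AWS chain (environment variables, `~/.aws/credentials`, etc.):

```python
import duckdb

con = duckdb.connect('db.duckdb')

# Install/load the aws extension, then pull credentials from the default
# AWS chain so reads against S3 are authenticated in this session.
con.execute("INSTALL aws")
con.execute("LOAD aws")
con.execute("CALL load_aws_credentials()")

# Now the S3-backed model table should be readable.
df = con.query("select * from db.main.my_model").df()
```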
To avoid needing to do this on every connection, you can use the new `CREATE SECRET` construct via a dbt macro, and DuckDB will then persist the credentials in the `.duckdb` directory in your home directory: https://duckdb.org/docs/sql/statements/create_secret.html
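A sketch of what that secret could look like, run once from Python (the secret name and credential values here are placeholders; by default a `PERSISTENT` secret is written under `~/.duckdb/`, so later ad-hoc connections to `db.duckdb` can read the S3-backed tables without any setup):

```python
import duckdb

con = duckdb.connect('db.duckdb')

# Placeholder name and credentials -- substitute your own. A PERSISTENT
# secret survives across connections, unlike load_aws_credentials(),
# which only applies to the current session.
con.execute("""
    CREATE PERSISTENT SECRET my_s3_secret (
        TYPE S3,
        KEY_ID 'your-access-key-id',
        SECRET 'your-secret-access-key',
        REGION 'us-east-1'
    )
""")
```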