Add support for new DuckDB Secrets manager #403

Merged on Jul 22, 2024
31 commits
199a1b2 Add add_secret method to creds (guenp, Jun 14, 2024)
824ad79 need this for Python 3.12 (guenp, Jun 14, 2024)
d551db1 put secrets stuff in own module (guenp, Jun 14, 2024)
1672e38 add .env to gitignore (guenp, Jun 14, 2024)
51ed86f rename test_connections to test_credentials (guenp, Jun 14, 2024)
84829e6 convert secrets dict to secrets in post constructor (guenp, Jun 14, 2024)
68394e8 run create secret on cursor init (guenp, Jun 14, 2024)
1716959 create or replace secret if a name is specified (guenp, Jun 14, 2024)
c4dd7b8 remove add_secret method (guenp, Jun 14, 2024)
670274f pre-commit (guenp, Jun 14, 2024)
ec92f03 mypy fixes (guenp, Jun 14, 2024)
5b2be62 prepared statements doesn't work for secrets (guenp, Jun 15, 2024)
a0abb89 Update readme (guenp, Jun 15, 2024)
34f4465 Merge branch 'master' into guenp/secrets (guenp, Jun 15, 2024)
4928807 fix typo in test (guenp, Jun 15, 2024)
1a3ba12 make test more useful (guenp, Jun 15, 2024)
5d9e063 add Azure secret (guenp, Jun 15, 2024)
9325fec fix typo (guenp, Jun 15, 2024)
6be0283 add scope (guenp, Jun 15, 2024)
9f681ab add HF secret (guenp, Jun 15, 2024)
d44e1d6 clean up code, add docstrings (guenp, Jun 15, 2024)
a0f706c Deprecate using settings for secrets, bump duckdb version requirement… (guenp, Jun 17, 2024)
42df1dd formatting, mypy fix (guenp, Jun 17, 2024)
3dd50e7 pass credentials to plugin, get creds in glue plugin (guenp, Jun 21, 2024)
0a1a2f6 Formatting and types (guenp, Jun 25, 2024)
93b4cf9 type fixes (guenp, Jun 25, 2024)
593683b Merge branch 'master' into guenp/secrets (guenp, Jul 15, 2024)
a8350a6 add scope to test (guenp, Jul 15, 2024)
0d71fb6 Let duckdb handle the validation logic (guenp, Jul 16, 2024)
9532524 dataclasses don't like extra kwargs so let's move List[Secret] to a p… (guenp, Jul 16, 2024)
cd0b9b2 formatting and typing (guenp, Jul 16, 2024)
1 change: 1 addition & 0 deletions .gitignore
@@ -79,3 +79,4 @@ target/
.DS_Store
.idea/
.vscode/
.env
29 changes: 23 additions & 6 deletions README.md
@@ -61,7 +61,7 @@ option that will be automatically enabled if you are connecting to a MotherDuck

You can load any supported [DuckDB extensions](https://duckdb.org/docs/extensions/overview) by listing them in
the `extensions` field in your profile. You can also set any additional [DuckDB configuration options](https://duckdb.org/docs/sql/configuration)
via the `settings` field, including options that are supported in any loaded extensions. For example, to be able to connect to S3 and read/write
via the `settings` field, including options that are supported in any loaded extensions. To use the [DuckDB Secrets Manager](https://duckdb.org/docs/configuration/secrets_manager.html), you can use the `secrets` field. For example, to be able to connect to S3 and read/write
Parquet files using an AWS access key and secret, your profile would look something like this:

```
@@ -73,10 +73,11 @@ default:
extensions:
- httpfs
- parquet
settings:
s3_region: my-aws-region
s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
secrets:
- type: s3
region: my-aws-region
key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
secret: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
target: dev
```
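Because this PR deprecates using `settings` for credentials, an existing profile's `s3_*` keys have to move into a `secrets` entry. The README example renames `s3_region` to `region`, `s3_access_key_id` to `key_id` and `s3_secret_access_key` to `secret`. A minimal sketch of that translation, using a hypothetical `split_s3_settings` helper (not part of dbt-duckdb); the `s3_session_token` to `session_token` pair is an assumption based on DuckDB's S3 secret parameters:

```python
from typing import Any, Dict, Optional, Tuple

# Legacy `settings` keys mapped to S3 secret parameters. The first three pairs
# follow the README example; the session-token pair is an assumption.
S3_SETTING_TO_SECRET_KEY = {
    "s3_region": "region",
    "s3_access_key_id": "key_id",
    "s3_secret_access_key": "secret",
    "s3_session_token": "session_token",
}


def split_s3_settings(
    settings: Dict[str, Any],
) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Hypothetical helper: move s3_* credentials out of `settings`.

    Returns the settings that stay under `settings` plus one `secrets`
    entry, or None when no S3 credential keys were present.
    """
    remaining: Dict[str, Any] = {}
    secret: Dict[str, Any] = {"type": "s3"}
    for key, value in settings.items():
        if key in S3_SETTING_TO_SECRET_KEY:
            secret[S3_SETTING_TO_SECRET_KEY[key]] = value
        else:
            remaining[key] = value
    return remaining, (secret if len(secret) > 1 else None)
```

Non-credential settings (for example `threads`) stay where they are, so only the credential keys change location in the profile.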

@@ -107,7 +108,23 @@ to load (so `s3`, `gcs`, `abfs`, etc.) and then an arbitrary set of other key-va
illustrates the usage of this feature to connect to a Localstack instance running S3 from dbt-duckdb [here](https://github.com/jwills/s3-demo).

#### Fetching credentials from context
Instead of specifying the credentials through the settings block, you can also use the use_credential_provider property. If you set this to `aws` (currently the only supported implementation) and you have `boto3` installed in your python environment, we will fetch your AWS credentials using the credential provider chain as described [here](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html). This means that you can use any supported mechanism from AWS to obtain credentials (e.g., web identity tokens).

Instead of specifying the credentials through the settings block, you can also use the `CREDENTIAL_CHAIN` secret provider, which lets DuckDB fetch credentials through the AWS SDK's credential provider chain. This means that you can use any supported mechanism from AWS to obtain credentials (e.g., web identity tokens). You can read more about the secret providers [here](https://duckdb.org/docs/configuration/secrets_manager.html#secret-providers). To use the `CREDENTIAL_CHAIN` provider, set `provider: credential_chain` on the secret entry in the `secrets` field:

```
default:
outputs:
dev:
type: duckdb
path: /tmp/dbt.duckdb
extensions:
- httpfs
- parquet
secrets:
- type: s3
provider: credential_chain
target: dev
```

#### Attaching Additional Databases

86 changes: 33 additions & 53 deletions dbt/adapters/duckdb/credentials.py
@@ -1,8 +1,6 @@
import os
import time
from dataclasses import dataclass
from dataclasses import field
from functools import lru_cache
from typing import Any
from typing import Dict
from typing import List
@@ -14,6 +12,8 @@
from dbt_common.exceptions import DbtRuntimeError

from dbt.adapters.contracts.connection import Credentials
from dbt.adapters.duckdb.secrets import DEFAULT_SECRET_PREFIX
from dbt.adapters.duckdb.secrets import Secret


@dataclass
@@ -96,6 +96,10 @@ class DuckDBCredentials(Credentials):
# (and extensions may add their own pragmas as well)
settings: Optional[Dict[str, Any]] = None

# secrets for connecting to cloud services such as AWS S3, Azure, Cloudflare R2,
# Google Cloud and Hugging Face.
secrets: Optional[List[Dict[str, Any]]] = None

# the root path to use for any external materializations that are specified
# in this dbt project; defaults to "." (the current working directory)
external_root: str = "."
@@ -148,6 +152,10 @@ class DuckDBCredentials(Credentials):
retries: Optional[Retries] = None

def __post_init__(self):
self.settings = self.settings or {}
self.secrets = self.secrets or []
self._secrets = []

# Add MotherDuck plugin if the path is a MotherDuck database
# and plugin was not specified in profile.yml
if self.is_motherduck:
@@ -156,6 +164,29 @@ def __post_init__(self):
if "motherduck" not in [plugin.module for plugin in self.plugins]:
self.plugins.append(PluginConfig(module="motherduck"))

# For backward compatibility, to be deprecated in the future
if self.use_credential_provider:
if self.use_credential_provider == "aws":
self.secrets.append({"type": "s3", "provider": "credential_chain"})
else:
raise ValueError(
"Unsupported value for use_credential_provider: "
+ self.use_credential_provider
)

if self.secrets:
self._secrets = [
Secret.create(
secret_type=secret.pop("type"),
name=secret.pop("name", f"{DEFAULT_SECRET_PREFIX}{num + 1}"),
**secret,
)
for num, secret in enumerate(self.secrets)
]

def secrets_sql(self) -> List[str]:
return [secret.to_sql() for secret in self._secrets]

@property
def is_motherduck(self):
parsed = urlparse(self.path)
@@ -230,54 +261,3 @@ def _connection_keys(self):
"plugins",
"disable_transactions",
)

def load_settings(self) -> Dict[str, str]:
settings = self.settings or {}
if self.use_credential_provider:
if self.use_credential_provider == "aws":
settings.update(
_load_aws_credentials(ttl=_get_ttl_hash(), profile=settings.get("s3_profile")),
)
else:
raise ValueError(
"Unsupported value for use_credential_provider: "
+ self.use_credential_provider
)
return settings


def _get_ttl_hash(seconds=300):
return round(time.time() / seconds)


@lru_cache()
def _load_aws_credentials(ttl=None, profile="default") -> Dict[str, Any]:
"""
Load AWS credentials from the environment.

This function is cached to prevent unnecessary calls to the AWS API.

:param ttl: Time to live for the cache. If None, the cache will not expire.
:return: A dictionary containing the AWS credentials which can be used to configure DuckDB settings.
"""
del ttl # make mypy happy
import boto3.session

session = boto3.session.Session(profile_name=profile)

# use STS to verify that the credentials are valid; we will
# raise a helpful error here if they are not
sts = session.client("sts")
sts.get_caller_identity()

# now extract/return them
aws_creds = session.get_credentials().get_frozen_credentials()

credentials = {
"s3_access_key_id": aws_creds.access_key,
"s3_secret_access_key": aws_creds.secret_key,
"s3_session_token": aws_creds.token,
"s3_region": session.region_name,
}
# only return if value is filled
return {k: v for k, v in credentials.items() if v}
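The new `__post_init__` does two things with the profile's `secrets` list: it maps the legacy `use_credential_provider: aws` onto an S3 secret with the `credential_chain` provider, and it gives every unnamed secret a positional name built from `DEFAULT_SECRET_PREFIX`. A minimal stdlib sketch of that normalization, as a hypothetical `normalize_secrets` function that returns plain dicts instead of `Secret` objects:

```python
from typing import Any, Dict, List, Optional

# Prefix copied from DEFAULT_SECRET_PREFIX in dbt/adapters/duckdb/secrets.py
DEFAULT_SECRET_PREFIX = "_dbt_secret_"


def normalize_secrets(
    secrets: Optional[List[Dict[str, Any]]],
    use_credential_provider: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper approximating the secret handling in __post_init__."""
    # Shallow-copy each entry so the caller's profile dicts are left untouched
    result = [dict(secret) for secret in (secrets or [])]
    # Legacy `use_credential_provider: aws` maps to an S3 credential_chain secret
    if use_credential_provider:
        if use_credential_provider != "aws":
            raise ValueError(
                "Unsupported value for use_credential_provider: "
                + use_credential_provider
            )
        result.append({"type": "s3", "provider": "credential_chain"})
    # Unnamed secrets get positional names: _dbt_secret_1, _dbt_secret_2, ...
    for num, secret in enumerate(result):
        secret.setdefault("name", f"{DEFAULT_SECRET_PREFIX}{num + 1}")
    return result
```

Two design notes. The adapter code calls `secret.pop(...)` on the profile dicts themselves, so after construction `self.secrets` no longer contains `type` or `name`; copying, as the sketch does, avoids that side effect. And since every entry ends up with a name, `to_sql` always emits `CREATE OR REPLACE SECRET`, so re-initializing a cursor replaces the secrets instead of colliding with an existing secret of the same name.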
16 changes: 11 additions & 5 deletions dbt/adapters/duckdb/environments/__init__.py
@@ -201,10 +201,14 @@ def initialize_cursor(
plugins: Optional[Dict[str, BasePlugin]] = None,
registered_df: dict = {},
):
for key, value in creds.load_settings().items():
# Okay to set these as strings because DuckDB will cast them
# to the correct type
cursor.execute(f"SET {key} = '{value}'")
if creds.settings is not None:
for key, value in creds.settings.items():
# Okay to set these as strings because DuckDB will cast them
# to the correct type
cursor.execute(f"SET {key} = '{value}'")

for sql in creds.secrets_sql():
cursor.execute(sql)

# update cursor if something is lost in the copy
# of the parent connection
@@ -229,7 +233,9 @@ def initialize_plugins(cls, creds: DuckDBCredentials) -> Dict[str, BasePlugin]:
for plugin_def in creds.plugins or []:
config = base_config.copy()
config.update(plugin_def.config or {})
plugin = BasePlugin.create(plugin_def.module, config=config, alias=plugin_def.alias)
plugin = BasePlugin.create(
plugin_def.module, config=config, alias=plugin_def.alias, credentials=creds
)
ret[plugin.name] = plugin
return ret
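`initialize_cursor` builds each `SET` statement by interpolating the value between single quotes, and the secret statements are likewise executed as literal SQL (one commit notes that prepared statements do not work for secrets). A value that itself contains a single quote would therefore produce invalid SQL. A minimal sketch of escaping with the standard SQL rule of doubling embedded quotes; `sql_literal` and `set_statements` are hypothetical helpers, not part of dbt-duckdb:

```python
from typing import Any, Dict, List


def sql_literal(value: Any) -> str:
    """Render a value as a single-quoted SQL string literal.

    Embedded single quotes are doubled, the standard SQL escape.
    """
    return "'" + str(value).replace("'", "''") + "'"


def set_statements(settings: Dict[str, Any]) -> List[str]:
    # Hypothetical variant of the SET loop in initialize_cursor that escapes
    # each value instead of interpolating it verbatim between quotes
    return [f"SET {key} = {sql_literal(value)}" for key, value in settings.items()]


print(set_statements({"s3_region": "us-east-1", "threads": 4}))
# ["SET s3_region = 'us-east-1'", "SET threads = '4'"]
```

Values are still sent as strings, matching the existing comment that DuckDB casts them to the correct type; only the quoting changes.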

19 changes: 16 additions & 3 deletions dbt/adapters/duckdb/plugins/__init__.py
@@ -37,6 +37,7 @@ def create(
*,
config: Optional[Dict[str, Any]] = None,
alias: Optional[str] = None,
credentials: Optional[DuckDBCredentials] = None,
) -> "BasePlugin":
"""
Create a plugin from a module name and optional configuration.
@@ -61,22 +62,34 @@ def create(
except ImportError as e:
raise ImportError(f"Unable to import module '{module}': {e}")

if config is None and credentials is not None:
config = credentials.settings

if not hasattr(mod, "Plugin"):
raise ImportError(f"Module '{module}' does not have a Plugin class.")
else:
return mod.Plugin(alias or name, config or {})

def __init__(self, name: str, plugin_config: Dict[str, Any]):
return mod.Plugin(
name=alias or name, plugin_config=config or {}, credentials=credentials
)

def __init__(
self,
name: str,
plugin_config: Dict[str, Any],
credentials: Optional[DuckDBCredentials] = None,
):
"""
Initialize the BasePlugin instance with a name and its configuration.
This method should *not* be overridden by subclasses in general; any
initialization required from the configuration dictionary should be
defined in the `initialize` method.

:param name: A string representing the plugin name.
:param plugin_config: A dictionary representing the plugin configuration.
:param credentials: The DuckDB credentials (optional).
"""
self.name = name
self.creds = credentials
self.initialize(plugin_config)

def initialize(self, plugin_config: Dict[str, Any]):
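With this change, plugins receive the full credentials object as `self.creds`, and `BasePlugin.create` falls back to `credentials.settings` when no plugin config is given. A minimal, self-contained sketch of that contract; `CredentialsStub`, `PluginSketch` and `create_plugin` are hypothetical stand-ins, not the adapter's classes:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CredentialsStub:
    # Hypothetical stand-in for DuckDBCredentials; only `settings` is modeled
    settings: Dict[str, Any] = field(default_factory=dict)


class PluginSketch:
    """Hypothetical stand-in following the new BasePlugin constructor contract."""

    def __init__(
        self,
        name: str,
        plugin_config: Dict[str, Any],
        credentials: Optional[CredentialsStub] = None,
    ):
        self.name = name
        self.creds = credentials  # may be None, so subclasses should check it
        self.initialize(plugin_config)

    def initialize(self, plugin_config: Dict[str, Any]) -> None:
        # A real plugin would build clients here; the sketch just keeps the config
        self.config = plugin_config


def create_plugin(
    name: str,
    config: Optional[Dict[str, Any]] = None,
    alias: Optional[str] = None,
    credentials: Optional[CredentialsStub] = None,
) -> PluginSketch:
    # Same fallback as BasePlugin.create: with no explicit plugin config,
    # reuse the profile-level settings carried by the credentials
    if config is None and credentials is not None:
        config = credentials.settings
    return PluginSketch(
        name=alias or name, plugin_config=config or {}, credentials=credentials
    )


creds = CredentialsStub(settings={"s3_region": "us-east-1"})
plugin = create_plugin("glue", credentials=creds)
print(plugin.name, plugin.config)  # glue {'s3_region': 'us-east-1'}
```

In `initialize_plugins` a config dict copied from `base_config` is always passed, so the settings fallback only applies when `create` is called without one.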
28 changes: 23 additions & 5 deletions dbt/adapters/duckdb/plugins/glue.py
@@ -16,6 +16,7 @@
from . import BasePlugin
from ..utils import TargetConfig
from dbt.adapters.base.column import Column
from dbt.adapters.duckdb.secrets import Secret


class UnsupportedFormatType(Exception):
@@ -263,17 +264,32 @@ def _get_table_def(
return table_def


def _get_glue_client(settings: Dict[str, Any]) -> "GlueClient":
if settings:
return boto3.client(
def _get_glue_client(
settings: Dict[str, Any], secrets: Optional[List[Secret]]
) -> "GlueClient":
client = None
if secrets is not None:
for secret in secrets:
if isinstance(secret, Secret) and "config" == str(secret.provider).lower():
secret_kwargs = secret.secret_kwargs or {}
client = boto3.client(
"glue",
aws_access_key_id=secret_kwargs.get("key_id"),
aws_secret_access_key=secret_kwargs.get("secret"),
aws_session_token=secret_kwargs.get("session_token"),
region_name=secret_kwargs.get("region"),
)
break
# fall back to legacy settings, then to boto3's default credential chain
if client is None and settings:
client = boto3.client(
"glue",
aws_access_key_id=settings.get("s3_access_key_id"),
aws_secret_access_key=settings.get("s3_secret_access_key"),
aws_session_token=settings.get("s3_session_token"),
region_name=settings.get("s3_region"),
)
elif client is None:
return boto3.client("glue")
client = boto3.client("glue")
return client


def create_or_update_table(
@@ -327,7 +343,9 @@ def create_or_update_table(

class Plugin(BasePlugin):
def initialize(self, config: Dict[str, Any]):
self.client = _get_glue_client(config)
# _secrets holds the Secret objects built in DuckDBCredentials.__post_init__
secrets = self.creds._secrets if self.creds is not None else None
self.client = _get_glue_client(config, secrets)
self.database = config.get("glue_database", "default")
self.delimiter = config.get("delimiter", ",")
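To make the Glue credential precedence testable without AWS access, here is a minimal sketch with a hypothetical `glue_client_kwargs` function that returns the keyword arguments for `boto3.client("glue", ...)` instead of building a client; `SecretStub` is a stand-in for `Secret`. The sketch checks, in order, a secret whose provider is `config`, then the legacy `s3_*` settings, and falls through to the next source whenever one is missing:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SecretStub:
    # Hypothetical stand-in for dbt.adapters.duckdb.secrets.Secret
    type: str
    provider: Optional[str] = None
    secret_kwargs: Dict[str, Any] = field(default_factory=dict)


def glue_client_kwargs(
    settings: Optional[Dict[str, Any]], secrets: Optional[List[SecretStub]]
) -> Dict[str, Any]:
    """Return keyword arguments for boto3.client("glue", **kwargs)."""
    # 1. An explicit secret whose provider is `config`
    for secret in secrets or []:
        if str(secret.provider).lower() == "config":
            kw = secret.secret_kwargs
            return {
                "aws_access_key_id": kw.get("key_id"),
                "aws_secret_access_key": kw.get("secret"),
                "aws_session_token": kw.get("session_token"),
                "region_name": kw.get("region"),
            }
    # 2. Legacy s3_* settings
    if settings:
        return {
            "aws_access_key_id": settings.get("s3_access_key_id"),
            "aws_secret_access_key": settings.get("s3_secret_access_key"),
            "aws_session_token": settings.get("s3_session_token"),
            "region_name": settings.get("s3_region"),
        }
    # 3. Nothing explicit: let boto3 resolve credentials on its own
    return {}
```

An empty dict means `boto3.client("glue", **{})`, which uses boto3's own credential resolution; a `credential_chain` secret therefore ends up on that path too.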

56 changes: 56 additions & 0 deletions dbt/adapters/duckdb/secrets.py
@@ -0,0 +1,56 @@
from dataclasses import dataclass
from typing import Any
from typing import Dict
from typing import Optional

from dbt_common.dataclass_schema import dbtClassMixin


DEFAULT_SECRET_PREFIX = "_dbt_secret_"


@dataclass
class Secret(dbtClassMixin):
type: str
persistent: Optional[bool] = False
name: Optional[str] = None
provider: Optional[str] = None
scope: Optional[str] = None
secret_kwargs: Optional[Dict[str, Any]] = None

@classmethod
def create(
cls,
secret_type: str,
persistent: Optional[bool] = None,
name: Optional[str] = None,
provider: Optional[str] = None,
scope: Optional[str] = None,
**kwargs,
):
# Create and return Secret
return cls(
type=secret_type,
persistent=persistent,
name=name,
provider=provider,
scope=scope,
secret_kwargs=kwargs,
)

def to_sql(self) -> str:
name = f" {self.name}" if self.name else ""
or_replace = " OR REPLACE" if name else ""
persistent = " PERSISTENT" if self.persistent is True else ""
tab = " "
params = self.to_dict(omit_none=True)
params.update(params.pop("secret_kwargs", {}))
params_sql = f",\n{tab}".join(
[
f"{key} {value}"
for key, value in params.items()
if value is not None and key not in ["name", "persistent"]
]
)
sql = f"""CREATE{or_replace}{persistent} SECRET{name} (\n{tab}{params_sql}\n)"""
return sql
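For readers without dbt installed, here is a standalone approximation of `Secret.to_sql`: a hypothetical `SecretSketch` dataclass that swaps `dbtClassMixin.to_dict(omit_none=True)` for `dataclasses.asdict` plus an explicit `None` filter. It is a sketch under that assumption, not the adapter's class. It shows that a named secret always renders as `CREATE OR REPLACE`, and that values are emitted unquoted, exactly as in `to_sql` above; values containing characters such as `/` or `+`, common in AWS secret keys, may not parse without quoting.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SecretSketch:
    # Simplified stand-in for Secret: asdict() plus a None filter replaces
    # dbtClassMixin.to_dict(omit_none=True)
    type: str
    persistent: Optional[bool] = False
    name: Optional[str] = None
    provider: Optional[str] = None
    scope: Optional[str] = None
    secret_kwargs: Dict[str, Any] = field(default_factory=dict)

    def to_sql(self) -> str:
        name = f" {self.name}" if self.name else ""
        or_replace = " OR REPLACE" if name else ""
        persistent = " PERSISTENT" if self.persistent is True else ""
        tab = "    "
        params = {k: v for k, v in asdict(self).items() if v is not None}
        # Flatten the free-form kwargs (key_id, secret, region, ...) into params
        params.update(params.pop("secret_kwargs", {}))
        params_sql = f",\n{tab}".join(
            f"{key} {value}"
            for key, value in params.items()
            if value is not None and key not in ("name", "persistent")
        )
        return f"CREATE{or_replace}{persistent} SECRET{name} (\n{tab}{params_sql}\n)"


print(SecretSketch(type="s3", provider="credential_chain", name="_dbt_secret_1").to_sql())
# CREATE OR REPLACE SECRET _dbt_secret_1 (
#     type s3,
#     provider credential_chain
# )
```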
2 changes: 1 addition & 1 deletion setup.py
@@ -39,7 +39,7 @@ def _dbt_duckdb_version():
install_requires=[
"dbt-common>=1,<2",
"dbt-adapters>=1,<2",
"duckdb>=0.7.0,!=0.10.3",
"duckdb>=0.10.0,!=0.10.3",
Collaborator:

I'm wondering when we want to simply require >= 1.0.0? Is this materially different than that?

Collaborator:

I'm also a little mentally fuzzy on how I should do minor/major version updates for adapters like dbt-duckdb, since in theory we are now decoupled from dbt-core versions via the dbt-adapters interface /cc @dbeatty10

chiming in here from adapters, you are totally free to version as you see fit. We generally recommend that major version bumps reflect major changes to the adapter and should therefore be reserved. Minor versions should reflect meaningful changes to behavior.

Beyond versioning we do want to encourage adapters to keep it easy for folks to upgrade and to not surprise people with changes. To do that we want to introduce breaking behavioral changes as opt-in with a warning that the default behavior will change in a future version. We are going to be picking up a project in Q3 to add behavior flags to formalize this process. This should provide a framework for adapters to manage the life cycle of these changes.

jwills (Collaborator), Jul 12, 2024:

thanks Colin, super-helpful context here 🙇

Collaborator, PR author:

Happy to bump to 1.0.0 in a separate PR; I want to make sure there's a commit that still has 0.10.2 included in case someone turns up with that requirement.

Collaborator, PR author:

PR open here!

# add dbt-core to ensure backwards compatibility of installation, this is not a functional dependency
"dbt-core>=1.8.0",
],