Sqlalchemy destination #1734

steinitzu · 2024-08-23T02:00:12Z

Draft Sqlalchemy loader. Basic things work so far.

Description

Mysql generally works.
Sqlite currently only works with dataset name main.
Supports parquet and typed-jsonl file formats
Implements SqlJobClient interface so most/all sql loader tests should run (many already run against mysql)
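
For readers wondering why sqlite only works with the dataset name main: in sqlite, `main` is the schema name of the primary database file, and other schemas only exist once a database is attached. A minimal sketch of how a dataset could be mapped onto an attached database, using only the standard library `sqlite3` module. The helper names `create_dataset`, `drop_dataset` and `list_datasets` are hypothetical and are not taken from this PR.

```python
import sqlite3
from typing import List


def _check_name(name: str) -> str:
    # Schema names cannot be bound as parameters, so validate before formatting.
    if not name.isidentifier():
        raise ValueError(f"invalid dataset name: {name!r}")
    return name


def create_dataset(conn: sqlite3.Connection, name: str, path: str = ":memory:") -> None:
    """Attach a database under the given schema name (hypothetical helper)."""
    conn.execute(f"ATTACH DATABASE ? AS {_check_name(name)}", (path,))


def drop_dataset(conn: sqlite3.Connection, name: str) -> None:
    """Detach the schema; an in-memory database is discarded on detach."""
    conn.execute(f"DETACH DATABASE {_check_name(name)}")


def list_datasets(conn: sqlite3.Connection) -> List[str]:
    # PRAGMA database_list returns rows of (seq, name, file).
    return [row[1] for row in conn.execute("PRAGMA database_list")]


# Autocommit mode, so DETACH is never issued inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
create_dataset(conn, "my_dataset")
conn.execute("CREATE TABLE my_dataset.items (id INTEGER)")
conn.execute("INSERT INTO my_dataset.items VALUES (1)")
```

Tables in the attached schema are then addressed as `my_dataset.items`, the same way a dataset-qualified table name is used on other databases.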

What's left:

Override sql_client methods when sqlite is used. create/drop dataset works differently
Limit query length and number of params sent per statement in load jobs
Test bulk insert
Test with sqlite and mysql (maybe other built in dialects that can run locally postgres/mssql)
Test table reflection and table building logic
Test sqlalchemy 1.4 and 2 in CI
Lots of cleanup
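
The "limit number of params sent per statement" item can be sketched roughly as below. This is an illustration only, not the code in this PR: `chunk_rows`, `max_params` and `max_rows` are hypothetical names, and the real limits depend on the database and driver in use.

```python
from typing import Any, Dict, Iterator, List, Sequence

Row = Dict[str, Any]


def chunk_rows(
    rows: Sequence[Row], n_columns: int, max_params: int, max_rows: int
) -> Iterator[List[Row]]:
    """Yield slices of rows so that one multi-row INSERT stays under both limits.

    Each row contributes n_columns bound parameters, so the number of rows
    per statement is capped by max_params // n_columns as well as by max_rows.
    """
    rows_per_chunk = max(1, min(max_rows, max_params // max(1, n_columns)))
    for start in range(0, len(rows), rows_per_chunk):
        yield list(rows[start : start + rows_per_chunk])


# Example: 10 rows with 3 columns each and at most 7 parameters per statement
# gives 2 rows per chunk.
example_rows = [{"id": i, "name": f"n{i}", "value": i * 2} for i in range(10)]
chunks = list(chunk_rows(example_rows, n_columns=3, max_params=7, max_rows=1000))
```

Each chunk can then be passed to a single insert statement. A row that is wider than `max_params` still goes out alone rather than being dropped.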

Related Issues

Resolves: implement sql alchemy destination #21, Add sqlite as a dlt destination #1627, support MySQL #1329

Additional Context

netlify · 2024-08-23T02:00:27Z

✅ Deploy Preview for dlt-hub-docs canceled.

Name	Link
🔨 Latest commit	`dc4c29c`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/66e48dc764ccee0008ac6e82

rudolfix

LGTM!

I was expecting that we would need some kind of per-dialect settings (i.e. DialectCapabilities) to get optimized inserts, but it seems that is not necessary.

There are a lot of standard tests where this thing should work:

restore state tests
drop command / refresh mode tests
all pipeline tests that do not require merge write disposition

if we enable them we are good.

MERGE write disposition looks complicated, right? We can surely generate all required statements (we have only SELECT / INSERT / DELETE), but creating temporary tables looks like a stretch. Still, we can do the Athena trick and create real tables for temporary data.

A REPLACE mode that uses a staging dataset would also be cool.

maybe those two could go to a separate ticket
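
To make the merge idea above concrete, here is a minimal sketch that uses only DELETE and INSERT against a real (non-temporary) staging table, run on the standard library `sqlite3` module. The function `merge_from_staging` and the table names are hypothetical. This is not dlt's merge job implementation, only an illustration of the general approach.

```python
import sqlite3


def merge_from_staging(conn: sqlite3.Connection, table: str, staging: str, key: str) -> None:
    """Upsert rows from a staging table using only DELETE and INSERT.

    Identifiers are formatted into the SQL directly, so they must come from
    trusted code, never from user input.
    """
    with conn:  # commits on success, rolls back if any statement fails
        # Remove rows that are about to be replaced.
        conn.execute(f"DELETE FROM {table} WHERE {key} IN (SELECT {key} FROM {staging})")
        # Copy everything from staging; both tables must share the same columns.
        conn.execute(f"INSERT INTO {table} SELECT * FROM {staging}")
        # The staging table is a real table, so it has to be dropped explicitly.
        conn.execute(f"DROP TABLE {staging}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.execute("CREATE TABLE items_staging (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items_staging VALUES (?, ?)", [(2, "B"), (3, "c")])
conn.commit()

merge_from_staging(conn, "items", "items_staging", "id")
merged = conn.execute("SELECT id, value FROM items ORDER BY id").fetchall()
```

After the call, row 2 carries the staged value, row 3 is new, and row 1 is untouched.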

dlt/destinations/impl/sqlalchemy/db_api_client.py

rudolfix · 2024-09-03T09:21:48Z

dlt/destinations/impl/sqlalchemy/sqlalchemy_job_client.py

+    def _create_date_time_type(self, sc_t: str, precision: Optional[int]) -> sa.types.TypeEngine:
+        """Use the dialect specific datetime/time type if possible since the generic type doesn't accept precision argument"""
+        precision = precision if precision is not None else self.capabilities.timestamp_precision
+        if sc_t == "timestamp":


we support datetimes without timezones now, see #1492

dlt/destinations/impl/sqlalchemy/db_api_client.py

rudolfix

please merge current devel. all big changes for 1.0 are in. this looks really good!

what about merges? IMO our standard set of sql jobs will work just fine with mysql and sqlite. maybe you could try that out?

dlt/common/destination/reference.py

rudolfix · 2024-09-11T10:57:05Z

dlt/destinations/impl/sqlalchemy/alter_table.py

+from typing import List
+
+import sqlalchemy as sa
+from alembic.runtime.migration import MigrationContext


I hope this is not overkill... alembic deps look quite minimal so probably OK

Yeah would have liked to avoid this, but at least it was less complicated than I expected since alembic is not really used like this normally.

dlt/destinations/impl/sqlalchemy/sqlalchemy_job_client.py

Begin implementing sqlalchemy loader
SQLA load job, factory, schema storage, POC
sqlalchemy tests attempt
Implement SqlJobClient interface
Parquet load, some tests running on mysql
update lockfile
Limit bulk insert chunk size, sqlite create/drop schema, fixes
Generate schema update
Get more tests running with mysql
More tests passing
Fix state, schema restore

remove secrets toml
remove secrets toml
Revert "remove secrets toml"
This reverts commit 7dd189c.
Fix default pipeline name test

rudolfix

pls see comments on engine.dispose

dlt/destinations/impl/sqlalchemy/db_api_client.py

rudolfix · 2024-09-13T08:53:43Z

dlt/destinations/impl/sqlalchemy/db_api_client.py

+            self.engine = credentials.engine
+            self.external_engine = True
+        else:
+            self.engine = sa.create_engine(


could we create the engine when the first connection is created? sometimes the sql client is created just to use a few internal functions and never opens a connection
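
The lazy creation suggested here could look roughly like the sketch below. `LazyEngineClient` and `engine_factory` are hypothetical names. The factory stands in for a call such as `sa.create_engine(...)` so that the example runs without a database driver.

```python
from typing import Any, Callable, Optional


class LazyEngineClient:
    """Hypothetical client that defers engine creation until first use."""

    def __init__(self, engine_factory: Callable[[], Any]) -> None:
        # Store only the recipe; nothing is created in the constructor.
        self._engine_factory = engine_factory
        self._engine: Optional[Any] = None

    @property
    def engine(self) -> Any:
        # Build the engine on first access and reuse it afterwards.
        if self._engine is None:
            self._engine = self._engine_factory()
        return self._engine

    @property
    def has_engine(self) -> bool:
        return self._engine is not None


created = []


def make_engine() -> object:
    # Stand-in for the real engine constructor; records each call.
    created.append("engine")
    return object()


client = LazyEngineClient(make_engine)
```

A client that is only used for its helper functions never touches `client.engine`, so no engine is ever built for it.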

rudolfix

@steinitzu all good
but we do not need

self.engine.dispose()

we have null pool. so it will dispose of external engines which should not happen. WDYT?

steinitzu · 2024-09-14T00:53:07Z

Hmm I have the flag for external engine so it's not closed:
https://github.com/dlt-hub/dlt/blob/sqlalchemy-loader/dlt/destinations/impl/sqlalchemy/db_api_client.py#L115-L117

But yeah not sure if there's any need for it since the connection is closed anyway. Maybe in the special case where you override poolclass?

rudolfix · 2024-09-14T08:01:55Z

heh you are right. I overlooked the external flag. then all good
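
The ownership rule settled in this exchange, dispose only engines the client created itself, can be sketched as follows. `FakeEngine` and `EngineOwner` are hypothetical stand-ins, not the classes in `db_api_client.py`. Only the `external_engine` flag name is taken from the diff above.

```python
class FakeEngine:
    """Stand-in for a database engine that only records dispose() calls."""

    def __init__(self) -> None:
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


class EngineOwner:
    """Hypothetical client that tracks whether it owns its engine."""

    def __init__(self, engine: FakeEngine, external_engine: bool) -> None:
        self.engine = engine
        # True when the caller passed in an engine it manages itself.
        self.external_engine = external_engine

    def close(self) -> None:
        # Never dispose an engine that belongs to the caller.
        if not self.external_engine:
            self.engine.dispose()


own = EngineOwner(FakeEngine(), external_engine=False)
shared_engine = FakeEngine()
borrowed = EngineOwner(shared_engine, external_engine=True)
own.close()
borrowed.close()
```

The caller of `borrowed` keeps full control over `shared_engine` and can keep using it after the client is closed.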

rudolfix

LGTM!

steinitzu force-pushed the sqlalchemy-loader branch 2 times, most recently from aee8ada to 1e7fa6a Compare August 31, 2024 01:45

rudolfix mentioned this pull request Sep 2, 2024

1.0.0 announcement and release notes #1778

Closed

rudolfix reviewed Sep 3, 2024

steinitzu force-pushed the sqlalchemy-loader branch 3 times, most recently from 792f7a1 to b9b317a Compare September 7, 2024 00:43

rudolfix requested changes Sep 11, 2024

steinitzu added 21 commits September 11, 2024 12:15

Support destination name in tests

31d9033

Some job client/sql client tests running on sqlite

e3eaa43

Fix more tests

2973526

ALl sqlite tests passing

8caf2f3

Add sqlalchemy tests in ci

2a30b36

Type errors

e7f56c9

Test sqlalchemy in own workflow

11d52db

Fix tests, type errors

9d37ea6

Fix config

cdeb17d

CI fix

a730a91

Add alembic to handle ALTER TABLE

3326580

FIx workflow

567359d

Install mysqlclient in venv

babcd3c

Mysql service version

9dec1c5

Single fail

3e282ea

mysql healtcheck

0439015

No localhost

61c8355

Remove weaviate

84dc4cf

Change ubuntu version

4bcc425

Debug sqlite version

a9b7e49

steinitzu added 6 commits September 11, 2024 12:21

Revert

e0a0781

Use py datetime in tests

98f8de2

Test on sqlalchemy 1.4 and 2

4f8d8f6

Lint, no cli tests

79631b2

Update lockfile

8068595

Fix test, complex -> json

a9c89a0

steinitzu force-pushed the sqlalchemy-loader branch from b9b317a to 45e386b Compare September 11, 2024 23:33

steinitzu added 2 commits September 11, 2024 19:37

Refactor type mapper

874c871

Update tests destination config

a69d749

steinitzu force-pushed the sqlalchemy-loader branch from 45e386b to a69d749 Compare September 11, 2024 23:37

Fix tests

c25932b

steinitzu force-pushed the sqlalchemy-loader branch from ed60447 to c25932b Compare September 12, 2024 01:10

Ignore sources tests

6c426e6

steinitzu marked this pull request as ready for review September 12, 2024 01:30

steinitzu added 5 commits September 12, 2024 11:30

Fix overriding destination in test pipeline

36b585e

Fix time precision in arrow test

0208c64

Lint

65f6ef7

Fix destination setup in test

6c29071

Fix

eec4e22

rudolfix requested changes Sep 13, 2024

Use nullpool, lazy create engine, close current connection

dc4c29c

rudolfix requested changes Sep 13, 2024

rudolfix approved these changes Sep 14, 2024

rudolfix merged commit 9580baf into devel Sep 14, 2024
61 checks passed

rudolfix deleted the sqlalchemy-loader branch September 14, 2024 08:02
