Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lazy load agate #263

Merged
merged 1 commit into from
Apr 8, 2024
Merged

lazy load agate #263

merged 1 commit into from
Apr 8, 2024

Conversation

dwreeves
Copy link
Contributor

TLDR: lazy-loading agate speeds up the load time of dbt by about 3.5%. Most instances of agate are unnecessary as dbt only uses it in a select few situations, and most instances of agate can be placed behind TYPE_CHECKING.

Already implemented in dbt-adapters and dbt-core.


Note: usually I perform the following check with each adapter. The following code should return an empty list:

import sys
import dbt.adapters.clickhouse.impl
print([i for i in sys.modules if "agate" in i])

However, because dbt-clickhouse has not yet been migrated to the dbt-core 1.8 ecosystem, this check is not successful:

>>> print([i for i in sys.modules if "agate" in i])
['agate.exceptions', 'agate.csv_py3', 'agate.aggregations.base', 'agate.data_types.base', 'agate.data_types.boolean', 'agate.data_types.date', 'agate.data_types.date_time', 'agate.data_types.number', 'agate.data_types.text', 'agate.data_types.time_delta', 'agate.data_types', 'agate.aggregations.all', 'agate.aggregations.any', 'agate.warns', 'agate.utils', 'agate.aggregations.count', 'agate.aggregations.has_nulls', 'agate.aggregations.percentiles', 'agate.aggregations.deciles', 'agate.aggregations.first', 'agate.aggregations.iqr', 'agate.aggregations.median', 'agate.aggregations.mad', 'agate.aggregations.max', 'agate.aggregations.max_length', 'agate.aggregations.max_precision', 'agate.aggregations.sum', 'agate.aggregations.mean', 'agate.aggregations.min', 'agate.aggregations.mode', 'agate.aggregations.quartiles', 'agate.aggregations.quintiles', 'agate.aggregations.variance', 'agate.aggregations.stdev', 'agate.aggregations.summary', 'agate.aggregations', 'agate.mapped_sequence', 'agate.columns', 'agate.computations.base', 'agate.computations.change', 'agate.computations.formula', 'agate.computations.percent', 'agate.computations.percent_change', 'agate.computations.rank', 'agate.computations.percentile_rank', 'agate.computations.slug', 'agate.computations', 'agate.config', 'agate.rows', 'agate.type_tester', 'agate.table.aggregate', 'agate.table.bar_chart', 'agate.table.bins', 'agate.table.column_chart', 'agate.table.compute', 'agate.table.denormalize', 'agate.table.distinct', 'agate.table.exclude', 'agate.table.find', 'agate.table.from_csv', 'agate.fixed', 'agate.table.from_fixed', 'agate.table.from_json', 'agate.table.from_object', 'agate.tableset.aggregate', 'agate.tableset.bar_chart', 'agate.tableset.column_chart', 'agate.tableset.from_csv', 'agate.tableset.from_json', 'agate.tableset.having', 'agate.tableset.line_chart', 'agate.tableset.merge', 'agate.tableset.print_structure', 'agate.tableset.proxy_methods', 'agate.tableset.scatterplot', 'agate.tableset.to_csv', 'agate.tableset.to_json', 'agate.tableset', 'agate.table.group_by', 'agate.table.homogenize', 'agate.table.join', 'agate.table.limit', 'agate.table.line_chart', 'agate.table.merge', 'agate.table.normalize', 'agate.table.order_by', 'agate.table.pivot', 'agate.table.print_bars', 'agate.table.print_html', 'agate.table.print_structure', 'agate.table.print_table', 'agate.table.rename', 'agate.table.scatterplot', 'agate.table.select', 'agate.table.to_csv', 'agate.table.to_json', 'agate.table.where', 'agate.table', 'agate.testcase', 'agate', 'dbt.clients.agate_helper']

So, the actual code change is not blocked by #260, but it won't "work" until #260 is resolved.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@CLAassistant
Copy link

CLAassistant commented Mar 31, 2024

CLA assistant check
All committers have signed the CLA.

@genzgd genzgd merged commit 0e4ef70 into ClickHouse:main Apr 8, 2024
15 of 16 checks passed
@genzgd
Copy link
Contributor

genzgd commented Apr 8, 2024

Thanks for the contribution!

Solovechik added a commit to tekliner/dbt-clickhouse that referenced this pull request May 27, 2024
* fix: SYSTEM SYNC REPLICA for on_cluster_clause (ClickHouse#156)

* fix SYSTEM SYNC REPLICA

* add schema

* Update version and pypi job

* Fix incompatible return type (ClickHouse#162)

* Distributed table materialization (ClickHouse#163)

* distributed table materialization

* fix rebase

* PR fixes

* Bump version

* Tweak PyPI build Python release

* Add space to exchange_tables_atomic macro (ClickHouse#168)

* Add space to exchange_tables_atomic macro

This changes the SYSTEM SYNC REPLICA query to have a space between the
ON CLUSTER clause and the table name.

* Move whitespace to on_cluster_clause

* Fix bad logging/error handling (ClickHouse#170)

* Distributed incremental materialization (ClickHouse#172)

* distributed table materialization

* fix rebase

* PR fixes

* distributed incremental materialization

* fix

* fix

* add insert_distributed_sync to README.md

* add checks on  insert_distributed_sync

* add checks on  insert_distributed_sync

* review fixes

* Update version and tweak docs

* Lw delete set fix (ClickHouse#174)

* Move lightweight delete settings to per query for HTTP stickiness fix

* Minor cleanup and doc updates

* Fix legacy incremental materialization (ClickHouse#178)

* fix: distributed_table materialization issue (ClickHouse#184)

* Bump version and changelog (ClickHouse#185)

* cluster names containing dash characters (ClickHouse#198) (ClickHouse#200)

Co-authored-by: the4thamigo-uk <the4thamigo-uk>

* Add basic error test, fix minor merge conflict (ClickHouse#202)

* Cluster setting and Distributed Table tests (ClickHouse#186)

* added can_on_cluster var in ClickhouseRelation

* add tests for cluster

* fix lint issue

* debug set cluster env variable

* debug test

* debug and add tests

* skip distributed table grant test

* debug workflow

* debug workflow

* debug test

* add tests fro distributed_incremental

* fix zk path error

* fix wrong alias for distributed materializations

update aliase test

* update base on review

* Update version and CHANGELOG, incorporate cluster name fix (ClickHouse#203)

* Release 1 5 0 (ClickHouse#210)

* Initial 1.5.0 commit

* Reorganize basic tests

* Fix lint

* Add case sensitive cache

* Fix s3 bucket bug

* Checkpoint for constraints/contracts

* Fix native column query

* Loosen replication test

* Checkpoint for constraints tests

* Checkpoint for constraints tests

* Add rendering of model level CHECK constraints

* Fix lint

* Reorganize test files

* Add one hooks test

* Fix lint

* Update test and dependency versions. (ClickHouse#211)

* Adjust the wrapper parenthesis around the table materialization sql code (ClickHouse#212)

* Update for 1.5.1 bug fix

* Fix creation of replicated tables when using legacy materialization (ClickHouse#208)

* On cluster sync cleanup

* Bug fixes related to model settings. (ClickHouse#214)

* Add materialization macro for materialized view (ClickHouse#207)

* Add materialization macro for materialized view

* fix isort issues in materialized view test

* Release 1 6 0 (ClickHouse#215)

* Initial dbt 1.6 update

* Add skipped clone test

* Clean up MV PR

* Release 1 6 1 (ClickHouse#217)

* Identifier quoting checkpoint

* Identifier quoting checkpoint

* Fix distributed table local quoting

* Fix issues with deduplication settings

* Release 1 6 2 (ClickHouse#219)

* Limited fix to completely broken `on_schema_change`

* Tweak changelog

* Release 1 7 0 (ClickHouse#220)

* Initial dependency updates for 1.7.x

* Initial dependency updates for 1.7.x

* Correctly warn or error if light weight deletes not available

* Wrap columns_in_query query in select statement (ClickHouse#222)

* Wrap columns_in_query query in select statement

* formatting

* Update changelog

* allows to add a comment in table's or view's metadata

* add settings_section flag as comment for code using settings

* override test sql macro and add limit-placer macro

* update CHANGELOG.md

* fix: use correct schema for MV target tables (ClickHouse#244)

* fix: use correct schema when updating MVs

The existing implementation passes just the name for `target_table`,
which ultimately means that the target schema is not included when the
final SQL is generated. By passing the entire relation object, the
correct target schema will be present in the final SQL.

* update MV tests

Provide a custom schema to make sure that the full target table
name (schema + relation name) is included in the CREATE MATERIALIZED
VIEW statement

* Update changelog

* rename end of query flag

* Bug/223 relationship test with limit (ClickHouse#245)

* add settings_section flag as comment for code using settings

* override test sql macro and add limit-placer macro

* update CHANGELOG.md

* rename end of query flag

* Revert "Bug/223 relationship test with limit (ClickHouse#245)" (ClickHouse#247)

This reverts commit d8afb93.

* always return --end_of_sql when asking for settings

* Add model settings based on materialization type

* support setting clause on view creation

* edit CHANGELOG.md

* Bump version and tweak changelog

* change list syntax to satisfy lint test

* change list syntax to satisfy lint test

* change imports order to satisfy lint test

* Add typing to satisfy lint

* Add snapshot materialization to default settings

* Fix tests - add distributed_table and distributed_incremental materializations

* Fix tests - make sure to call the get_model_settings only when materialization is view

* clean up recent changelog

* Add materialization macro for dictionaries

* address lint issue in dictionary test

* address lint issue with enum

* Fix model settings with custom materialization

* Release 1.7.4 housekeeping (ClickHouse#261)

* Bump black from 23.11.0 to 24.3.0 (ClickHouse#259)

Bumps [black](https://github.com/psf/black) from 23.11.0 to 24.3.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](psf/black@23.11.0...24.3.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Release 1 7 5 (ClickHouse#265)

* Release 1.7.5 housekeeping

* Upgrade setuptools requirement for clickhouse_driver install

* Remove flake8 checks for the moment

* Update workflow actions

* Fix black comma

* fix(clients): add newlines around subquery when retrieving columns to avoid a syntax error (ClickHouse#262)

* Fix lint

* lazy load agate (ClickHouse#263)

* feat: add TTL support (ClickHouse#254)

* Fix lint

* Update table relation after exchange command (ClickHouse#230)

Related to ClickHouse#226

* feat: allow to add connection overrides for dictionaries (ClickHouse#267)

* Housekeeping for 1.7.6 release (ClickHouse#268)

* Revert "allows to add a comment in table's or view's metadata"

* Fix bool_or behavior (ClickHouse#270)

* feat: support column codecs

* Use Column.data_type in ClickHouseAdapter.format_columns

* Always apply query_settings in clickhouse__insert_into macro

* Add ClickHouseColumn.is_low_cardinality

* Update column type test cases for LowCardinality

* Omit empty dictionary connection_overrides from materialization DDL

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Schum <[email protected]>
Co-authored-by: Geoff Genz <[email protected]>
Co-authored-by: Sergey Reshetnikov <[email protected]>
Co-authored-by: gladkikhtutu <[email protected]>
Co-authored-by: Damir Basic Knezevic <[email protected]>
Co-authored-by: Zhenbang <[email protected]>
Co-authored-by: Andy <[email protected]>
Co-authored-by: gfunc <[email protected]>
Co-authored-by: Kristof Szaloki <[email protected]>
Co-authored-by: Steven Reitsma <[email protected]>
Co-authored-by: Rory Sawyer <[email protected]>
Co-authored-by: ptemarvelde <[email protected]>
Co-authored-by: Dmitrii Tcimokha <[email protected]>
Co-authored-by: bentsileviav <[email protected]>
Co-authored-by: Dmitriy Sokolov <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: triou <[email protected]>
Co-authored-by: Daniel Reeves <[email protected]>
Co-authored-by: Cristhian Garcia <[email protected]>
Co-authored-by: Thomas Schmidt <[email protected]>
Co-authored-by: scrawfor <[email protected]>
Co-authored-by: Robin Norgren <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants