Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoting changes from 1.4.0 to 1.4.1 break MSCK REPAIR TABLE pre-hooks in models #459

Open
sacundim opened this issue Mar 17, 2023 · 5 comments
Labels
pkg:dbt-athena Issue affects dbt-athena type:bug Something isn't working as documented

Comments

@sacundim
Copy link

I have a project that makes use of pre_hook in a handful of models to run MSCK REPAIR TABLE on source tables. Example such model:

Just the other day I rebuilt my Docker image for this DBT project for the first time after several weeks, but the project started failing with errors like this (fuller stack trace at the bottom of the ticket):

An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:1: mismatched input 'MSCK'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMENT', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DENY', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'MERGE', 'PREPARE', 'REFRESH', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'TRUNCATE', 'UNLOAD', 'UPDATE', 'USE', 'USING', <query>

I have managed to determine that pinning dbt-athena-community at version 1.4.0 doesn't experience the problem, it happens when I use v1.4.1 instead:

Looking at dbt.log, I see that with adapter v1.4.1, the schema and table name are quoted with double quotes:

18:57:41.706289 [debug] [Thread-4 (]: On model.covid_19_puerto_rico_etl.casos: 
        MSCK REPAIR TABLE "covid19datos_v2_sources"."casos_parquet_v1"

And running that statement in the AWS Athena console gets the same error "mismatched input 'MSCK'" that I show above. Athena uses the Hive-style quoting in MSCK REPAIR TABLE statements, like this one that works:

MSCK REPAIR TABLE `covid19datos_v2_sources`.`casos_parquet_v1`

Full stack trace:

Failed to execute query.
--
Traceback (most recent call last):
File "/root/.local/pipx/venvs/dbt-core/lib/python3.9/site-packages/pyathena/common.py", line 520, in _execute
query_id = retry_api_call(
File "/root/.local/pipx/venvs/dbt-core/lib/python3.9/site-packages/pyathena/util.py", line 68, in retry_api_call
return retry(func, *args, **kwargs)
File "/root/.local/pipx/venvs/dbt-core/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/root/.local/pipx/venvs/dbt-core/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
return fut.result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/root/.local/pipx/venvs/dbt-core/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
File "/root/.local/pipx/venvs/dbt-core/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/root/.local/pipx/venvs/dbt-core/lib/python3.9/site-packages/botocore/client.py", line 960, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:1: mismatched input 'MSCK'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMENT', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DENY', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'MERGE', 'PREPARE', 'REFRESH', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'TRUNCATE', 'UNLOAD', 'UPDATE', 'USE', 'USING', <query>
@sacundim
Copy link
Author

Workaround: explicitly use render_hive() in the pre-hooks:

@sacundim
Copy link
Author

An idea that doesn't solve the fundamental problem but may mitigate most of the instances people will encounter (and get mightily confused by): instead of automatically double quoting all identifiers, maybe check them first against a regexp that accepts only valid identifiers and quote them only if they don't match?

@nicor88
Copy link
Contributor

nicor88 commented Mar 27, 2023

Is it really an issue? as you find out using render_hive() is the key and needs only to be documented IMHO, the same is valid for other commands like vacuum and optimize for iceberg. I would avoid to introduce any regex that replace quoting just for specific scenarions.

@sacundim
Copy link
Author

@nicor88 The problem with "just document it" is that it's a surprising behavior with an obscure cause, which is likely to cause significant grief to unsuspecting users and, as a result, lead to repeated inquiries and bug reports to library maintainers from people who maybe even read the documentation but didn't remember that one sentence.

It's basically XKCD's toaster that stabs users in the face.

@nicor88
Copy link
Contributor

nicor88 commented Mar 27, 2023

@sacundim Well, we could make it clear for users that there are breaking changes in releases notes and adding know issues in the README. Don't people read docs when working with a tool/framework? If they don't they should ;) there are some tools that needs instructions and tools that don't :P - Therefore again: adding a note in Known issues will make the job IMHO.

Also, feel free to propose your solution if you want, contributions are more than welcome:)

@nicor88 nicor88 added the type:bug Something isn't working as documented label Apr 5, 2023
@mikealfare mikealfare added the pkg:dbt-athena Issue affects dbt-athena label Jan 10, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-athena Jan 13, 2025
mikealfare pushed a commit that referenced this issue Jan 13, 2025
* Bump pyodbc from 4.0.32 to 4.0.34

Bumps [pyodbc](https://github.com/mkleehammer/pyodbc) from 4.0.32 to 4.0.34.
- [Release notes](https://github.com/mkleehammer/pyodbc/releases)
- [Commits](mkleehammer/pyodbc@4.0.32...4.0.34)

---
updated-dependencies:
- dependency-name: pyodbc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Add automated changelog yaml from template for bot PR

* Remove newline

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Github Build Bot <[email protected]>
Co-authored-by: leahwicz <[email protected]>
mikealfare pushed a commit that referenced this issue Jan 20, 2025
* convert to reusable nightly tests

* fix triage label workflow

* point to main
mikealfare pushed a commit that referenced this issue Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg:dbt-athena Issue affects dbt-athena type:bug Something isn't working as documented
Projects
None yet
Development

No branches or pull requests

3 participants