[Bug] spark__list_relations_without_caching expects legacy schema field #1048

Open
2 tasks done
JCZuurmond opened this issue Jun 2, 2024 · 2 comments
Labels
bug Something isn't working help_wanted Extra attention is needed

Comments

@JCZuurmond
Collaborator

JCZuurmond commented Jun 2, 2024

Is this a new bug in dbt-spark?

  • I believe this is a new bug in dbt-spark
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

spark__list_relations_without_caching expects the legacy field relation.schema:

{% macro spark__list_relations_without_caching(relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation.schema }} like '*'
  {% endcall %}

  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}

Expected Behavior

spark__list_relations_without_caching should expect relation:

{% macro spark__list_relations_without_caching(relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation }} like '*'
  {% endcall %}

  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}

Steps To Reproduce

N.A.

Relevant log output

No response

Environment

Irrelevant

Additional Context

See Spark SQL migration guide

@JCZuurmond JCZuurmond added bug Something isn't working triage labels Jun 2, 2024
@jtcohen6
Contributor

jtcohen6 commented Jun 3, 2024

Hey @JCZuurmond, good to hear from you!

Here's my understanding of the situation:

  • For consistency across adapters, dbt calls the third-level namespace database, the second-level namespace schema, and the first-level name identifier (also configurable as alias)
  • Historically, SparkSQL had no third-level namespace, and it used the words schema and database interchangeably for the second-level namespace
  • In Spark 3.2, the official names for these became catalog (third-level) and namespace (second-level)

I think the right next step is to support catalog and namespace as official aliases for database and schema, respectively.

  • There's a mechanism to do that within credentials by defining _ALIASES, as dbt-databricks does here
  • We could also define catalog and namespace classmethods on SparkRelation that return database and schema, respectively
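To make the two options above concrete, here is a minimal Python sketch. It is not the dbt-spark implementation: SketchRelation, ALIASES, and translate_aliases are hypothetical stand-ins written for illustration, and the aliases are shown as read-only properties rather than classmethods, which is an assumption about how they would be most naturally exposed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchRelation:
    """Hypothetical stand-in for a relation; not the real SparkRelation."""

    database: Optional[str]  # dbt's third-level namespace
    schema: str              # dbt's second-level namespace
    identifier: str          # dbt's first-level name

    @property
    def catalog(self) -> Optional[str]:
        # Spark 3.2+ name for the third-level namespace.
        return self.database

    @property
    def namespace(self) -> str:
        # Spark 3.2+ name for the second-level namespace.
        return self.schema


# Hypothetical alias table, in the spirit of a credentials-level _ALIASES mapping.
ALIASES = {"catalog": "database", "namespace": "schema"}


def translate_aliases(config: dict) -> dict:
    """Rename alias keys to dbt's canonical names; reject conflicting duplicates."""
    result = {}
    for key, value in config.items():
        canonical = ALIASES.get(key, key)
        if canonical in result:
            raise ValueError(
                f"both an alias and its canonical name were given for {canonical!r}"
            )
        result[canonical] = value
    return result
```

With this shape, existing code that reads relation.schema keeps working, while newer macros could read relation.namespace without caring which name the profile used.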

Is that something you'd be interested in contributing?

@jtcohen6 jtcohen6 added help_wanted Extra attention is needed and removed triage labels Jun 3, 2024
@stegus64

stegus64 commented Sep 2, 2024

This issue is the root cause of this problem: dbt-labs/spark-utils#38

This code does not work any more:

https://github.com/dbt-labs/spark-utils/blob/f792c519e68b64e3411508bfa5f41a02e8646372/macros/maintenance_operation.sql#L4

{% for database in spark__list_schemas('not_used') %}
{% for table in spark__list_relations_without_caching(database[0]) %}

The value returned by spark__list_schemas() is the result of SHOW DATABASES, which contains only a single column named "databaseName", so database[0] is a plain string rather than a relation.

As a result, relation.schema in spark__list_relations_without_caching evaluates to an empty string, which means that

show table extended in {{ relation.schema }} like '*'

causes a syntax error in SQL.
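The failure mode can be reproduced outside dbt with a small Python emulation. This is only a sketch: lenient_attr imitates a template engine that renders a missing attribute as an empty string, and SketchRelation and both render functions are hypothetical names, not part of dbt-spark or spark-utils. The assumption that a relation renders as its schema name is also made purely for illustration.

```python
class SketchRelation:
    """Hypothetical relation-like object that exposes a schema attribute."""

    def __init__(self, schema: str) -> None:
        self.schema = schema

    def __str__(self) -> str:
        # Assumption for this sketch: the relation renders as its schema name.
        return self.schema


def lenient_attr(value: object, name: str) -> str:
    # Emulates lenient template lookup: a missing attribute becomes "".
    return str(getattr(value, name, ""))


def render_with_schema(relation: object) -> str:
    # Mirrors the current macro body, which reads relation.schema.
    return f"show table extended in {lenient_attr(relation, 'schema')} like '*'"


def render_with_relation(relation: object) -> str:
    # Mirrors the proposed macro body, which renders the argument itself.
    return f"show table extended in {relation} like '*'"


# A plain string, as returned in the first column of SHOW DATABASES.
print(render_with_schema("default"))
# The same string rendered directly.
print(render_with_relation("default"))
# A relation-like object works with the current macro body.
print(render_with_schema(SketchRelation("default")))
```

In this emulation the string argument leaves the namespace position empty in the first statement, while the other two statements name the schema.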

I am not sure why .schema was added in this commit #972. For my purpose just changing "relation.schema" to "relation" fixes the issue.

I do not know what other problems such a change might cause.

It seems that #972 is a breaking change.
