Data preview for Iceberg partitioned tables (using Trino) does not work. #26449

Closed
amir-bashir opened this issue Jan 10, 2024 · 3 comments

@amir-bashir


In SQL Lab, previewing a partitioned Iceberg table through the Trino connector fails. The preview goes through three steps:
1 - Superset reads the partition data and shows it below the table list.
2 - It then shows all columns and their data types.
3 - In the last step it runs a Trino query to fetch 100 rows for the preview.

The last step fails in my case. Superset takes the record_count, file_count, total_size and data fields from the partition data and adds these four columns to the WHERE clause of the Trino query. Since these fields are not columns of the table, Trino throws the error shown in the picture below.
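To make the failure mode concrete, here is a minimal sketch of how a preview query ends up filtering on columns the table does not have. It is not Superset's actual code: `build_preview_sql`, `unresolved_columns`, the `schools` column list and the "latest partition" values are all hypothetical stand-ins. Only the four field names come from the report above.

```python
# Hypothetical, simplified stand-in for a preview query builder.
# None of these helpers exist in Superset; they only illustrate the report.

def build_preview_sql(table, table_columns, partition_columns, latest_values, limit=100):
    """Build a SELECT that filters on the 'latest partition' values."""
    predicates = [
        f"{col} = {latest_values[col]!r}" for col in partition_columns
    ]
    lines = ["SELECT", "  " + ", ".join(table_columns), f"FROM {table}"]
    if predicates:
        lines.append("WHERE " + " AND ".join(predicates))
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


def unresolved_columns(table_columns, partition_columns):
    """Return the filter columns that are not real columns of the table."""
    known = set(table_columns)
    return [col for col in partition_columns if col not in known]


# Made-up table columns and values, for illustration only.
table_columns = ["id", "name", "city"]
# The four fields the report says end up in the WHERE clause.
partition_columns = ["record_count", "file_count", "total_size", "data"]
latest_values = {"record_count": 120, "file_count": 1, "total_size": 4096, "data": "placeholder"}

preview_sql = build_preview_sql("schools", table_columns, partition_columns, latest_values)
print(preview_sql)
print("Not in table:", unresolved_columns(table_columns, partition_columns))
```

Every predicate in the generated WHERE clause names a field that is missing from the table's own column list, which is consistent with Trino rejecting the first one it sees (`record_count`).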

How to reproduce the bug

  1. Create a partitioned Iceberg table in Trino.
  2. Open SQL Lab.
  3. Select the catalog, schema and table from the drop-downs.
  4. The preview fails with the error "trino error: line 5:7: Column 'record_count' cannot be resolved".

Expected results

The preview query should run and display the table data.

Actual results

The preview fails with the following error.

image

Environment


  • superset version: 3.0.0
  • python version: 3.9
  • trino version: 428


Superset logs are:

Triggering query_id: 44
2024-01-10 13:13:55,636:INFO:superset.sqllab.commands.execute:Triggering query_id: 44
Query 44: Executing 1 statement(s)
2024-01-10 13:13:55,669:INFO:superset.sql_lab:Query 44: Executing 1 statement(s)
Query 44: Set query to 'running'
2024-01-10 13:13:55,669:INFO:superset.sql_lab:Query 44: Set query to 'running'
2024-01-10 13:13:55,752:DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): trino-dev2.digixt.ae:443
2024-01-10 13:13:55,939:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "POST /v1/statement HTTP/1.1" 200 328
2024-01-10 13:13:55,948:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "GET /v1/statement/queued/20240110_131355_04768_yj9fy/y61c87885b3716c3aeb602e582689e4bda331199b/1 HTTP/1.1" 200 328
2024-01-10 13:13:55,956:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "GET /v1/statement/queued/20240110_131355_04768_yj9fy/y845f4804062c99e367543efad25e818547c46f3b/2 HTTP/1.1" 200 337
2024-01-10 13:13:55,964:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "GET /v1/statement/executing/20240110_131355_04768_yj9fy/ya39456ccf05d76175d3ad158e116c2dc82d675ab/0 HTTP/1.1" 200 535
2024-01-10 13:13:56,042:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "GET /v1/statement/executing/20240110_131355_04768_yj9fy/ybca09ab00285e284f9ac19efa111537387f9f501/1 HTTP/1.1" 200 447
Query 44: Running statement 1 out of 1
2024-01-10 13:13:56,045:INFO:superset.sql_lab:Query 44: Running statement 1 out of 1
2024-01-10 13:13:56,165:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "POST /v1/statement HTTP/1.1" 200 327
2024-01-10 13:13:56,243:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "GET /v1/statement/queued/20240110_131356_04769_yj9fy/y18debc4fc194e74fbff9c77d6936c3c0ef40b10f/1 HTTP/1.1" 200 327
2024-01-10 13:13:56,263:DEBUG:urllib3.connectionpool:https://trino-dev2.digixt.ae:443 "GET /v1/statement/queued/20240110_131356_04769_yj9fy/ybd14d8bd60657510e8663226e660c6f4aa3223b7/2 HTTP/1.1" 200 1242
SupersetErrorsException
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
return f(self, *args, **kwargs)
File "/app/superset/views/base_api.py", line 127, in wraps
raise ex
File "/app/superset/views/base_api.py", line 121, in wraps
duration, response = time_function(f, self, *args, **kwargs)
File "/app/superset/utils/core.py", line 1526, in time_function
response = func(*args, **kwargs)
File "/app/superset/views/base_api.py", line 93, in wraps
return f(self, *args, **kwargs)
File "/app/superset/utils/log.py", line 255, in wrapper
value = f(*args, **kwargs)
File "/app/superset/sqllab/api.py", line 310, in execute_sql_query
command_result: CommandResult = command.run()
File "/app/superset/sqllab/commands/execute.py", line 121, in run
raise ex
File "/app/superset/sqllab/commands/execute.py", line 103, in run
status = self._run_sql_json_exec_from_scratch()
File "/app/superset/sqllab/commands/execute.py", line 161, in _run_sql_json_exec_from_scratch
raise ex
File "/app/superset/sqllab/commands/execute.py", line 156, in _run_sql_json_exec_from_scratch
return self._sql_json_executor.execute(
File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
raise SupersetErrorsException(
superset.exceptions.SupersetErrorsException: [SupersetError(message="trino error: line 5:7: Column 'record_count' cannot be resolved", error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'Trino', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
2024-01-10 13:13:56,778:WARNING:superset.views.base:SupersetErrorsException

Additional context

On the left side, under the table name (schools), Superset shows the latest partition data. It then uses this information to build a SELECT query, which I copied with the copy button and pasted into the query pad.

@jkleinkauff

Same here. This looks like the same problem as #25307.
I see both errors: "partition cannot be resolved" and "column record_count cannot be resolved".

@anandnalya

anandnalya commented Sep 4, 2024

I was able to get this working with the following patch which disables partitioning support for Iceberg:

--- a/site-packages/superset/db_engine_specs/trino.py
+++ b/site-packages/superset/db_engine_specs/trino.py
@@ -445,6 +445,13 @@
         :returns: The indexes
         """
         try:
-            return super().get_indexes(database, inspector, table_name, schema)
+            indexes = super().get_indexes(database, inspector, table_name, schema)
+            # Handle iceberg tables. Even for non-partitioned tables, it returns a value
+            iceberg_cols_ignore = {"record_count", "file_count", "total_size", "data"}
+            if len(indexes) == 1 and indexes[0].get(
+                "name") == "partition" and iceberg_cols_ignore.issubset(
+                set(indexes[0].get("column_names", []))):
+                return []
+            return indexes
         except NoSuchTableError:
             return []
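For anyone who wants to try the check outside of Superset, the condition in the patch can be pulled into a standalone function. This is only a sketch: the function name `drop_iceberg_partition_index` is made up here, and the input shape (a list of dicts with `name` and `column_names` keys) is assumed from how the patch reads `indexes`.

```python
from typing import Any

# The four metadata fields named in the patch above.
ICEBERG_COLS_IGNORE = {"record_count", "file_count", "total_size", "data"}


def drop_iceberg_partition_index(indexes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return [] when the only index is the Iceberg partition metadata entry.

    Mirrors the condition in the patch: exactly one index, named "partition",
    whose column names include all four metadata fields. Anything else is
    returned unchanged.
    """
    if (
        len(indexes) == 1
        and indexes[0].get("name") == "partition"
        and ICEBERG_COLS_IGNORE.issubset(indexes[0].get("column_names", []))
    ):
        return []
    return indexes


# Hypothetical inputs, shaped like what the patch expects.
iceberg_like = [
    {"name": "partition", "column_names": ["record_count", "file_count", "total_size", "data"]}
]
hive_like = [{"name": "partition", "column_names": ["ds"]}]

print(drop_iceberg_partition_index(iceberg_like))
print(drop_iceberg_partition_index(hive_like))
```

Because the check requires all four names to be present, an index whose partition columns are real table columns (like the hypothetical `ds` above) is left alone.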

@rusackas rusackas added the data:connect:trino Related to Trino label Sep 4, 2024
@rusackas
Member

rusackas commented Sep 4, 2024

Closing this in favor of #25307... but if you think the above diff fixes the issue, maybe it can be generalized a bit to (safely) address the issue on Iceberg and/or other data sources having similar issues?
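One possible direction for a more general check, sketched here purely as an illustration and not proposed by anyone in this thread: instead of matching Iceberg's field names, keep only the partition columns that actually exist on the table. The function name `keep_resolvable_partition_columns` is hypothetical, and it assumes the table's column names are available at the point where the indexes are read, which may not hold in Superset.

```python
from typing import Any


def keep_resolvable_partition_columns(
    indexes: list[dict[str, Any]], table_columns: list[str]
) -> list[dict[str, Any]]:
    """Drop index columns that are not real columns of the table.

    Indexes left with no columns are removed entirely, so no filter
    would be built from them. Input dicts are not mutated.
    """
    known = set(table_columns)
    cleaned = []
    for index in indexes:
        columns = [c for c in index.get("column_names", []) if c in known]
        if columns:
            cleaned.append({**index, "column_names": columns})
    return cleaned


# Hypothetical inputs for illustration.
table_columns = ["id", "name", "ds"]
iceberg_like = [
    {"name": "partition", "column_names": ["record_count", "file_count", "total_size", "data"]}
]
hive_like = [{"name": "partition", "column_names": ["ds"]}]

print(keep_resolvable_partition_columns(iceberg_like, table_columns))
print(keep_resolvable_partition_columns(hive_like, table_columns))
```

The trade-off is that a table which really has a column named, say, `data` would still pass the filter, so this is not a complete answer on its own.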

@rusackas rusackas closed this as completed Sep 4, 2024