Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big Number with Trendline coudn't work with PostgreSQL's date field #20155

Closed
3 tasks done
ensky opened this issue May 23, 2022 · 18 comments · Fixed by #20273
Closed
3 tasks done

Big Number with Trendline coudn't work with PostgreSQL's date field #20155

ensky opened this issue May 23, 2022 · 18 comments · Fixed by #20273
Labels
#bug Bug report data:connect:postgres Related to Postgres data:connect Namespace | Anything related to db connections / integrations preset:cares Preset cares about this issue v1.5

Comments

@ensky
Copy link
Contributor

ensky commented May 23, 2022

Big Number with Trendline coudn't work with PostgreSQL's date field.

How to reproduce the bug

  1. Create a table with a column which type is "date" in postgresql
  2. Add that database and table as dataset to Superset
  3. Try to create a Big Number with Trendline chart with that dataset
  4. It shows
Unexpected error
Error: '+08'

which +08 is the timezone about your pgsql settings

Expected results

The chart should be draw successfully

Actual results

Unexpected error
Error: '+08'

Screenshots

image

Additional Error Log

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.8/site-packages/flask_appbuilder/security/decorators.py", line 190, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/utils/log.py", line 245, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/views/core.py", line 2351, in results
    return self.results_exec(key)
  File "/app/superset/views/core.py", line 2419, in results_exec
    obj = _deserialize_results_payload(
  File "/app/superset/views/utils.py", line 625, in _deserialize_results_payload
    df = result_set.SupersetResultSet.convert_table_to_df(pa_table)
  File "/app/superset/result_set.py", line 177, in convert_table_to_df
    return table.to_pandas(integer_object_nulls=True)
  File "pyarrow/array.pxi", line 757, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1748, in pyarrow.lib.Table._to_pandas
  File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 789, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 733, in _reconstruct_block
    dtype = make_datetimetz(item['timezone'])
  File "/usr/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 758, in make_datetimetz
    tz = pa.lib.string_to_tzinfo(tz)
  File "pyarrow/types.pxi", line 1927, in pyarrow.lib.string_to_tzinfo
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "/usr/local/lib/python3.8/site-packages/pytz/__init__.py", line 188, in timezone
    raise UnknownTimeZoneError(zone)
pytz.exceptions.UnknownTimeZoneError: '+08'
2022-05-23 03:27:32,051:ERROR:superset.views.base:'+08'

Environment

  • browser type and version: any browser
  • superset version: 1.5.0
  • python version: 3.8.12
  • any feature flags active:
    • VERSIONED_EXPORT
    • ENABLE_TEMPLATE_PROCESSING

Checklist

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

  • This issue doesn't exists in Superset-v1.4.0

The SQL in the chart is

SELECT DATE_TRUNC('day', publish_date) AS __timestamp,
       count(*) AS count
FROM
  (SELECT publish_date
   FROM software.package_file) AS virtual_table
GROUP BY DATE_TRUNC('day', publish_date)
LIMIT 50000;

The error part seems to be the DATE_TRUNC function, it returns something like "2022-05-23 12:00:00+08". Which format seems to be errorly processed by pyarrow project.

Accourding to pyarrow project, the timezone format seems to be like +08:00 format, maybe it's the root cuase? https://github.com/apache/arrow/blob/apache-arrow-5.0.0/python/pyarrow/types.pxi#L1915

@ensky ensky added the #bug Bug report label May 23, 2022
@villebro
Copy link
Member

@ensky what is the datatype of publish_date in the software.package_file table?

@villebro villebro added data:connect:postgres Related to Postgres data:connect Namespace | Anything related to db connections / integrations v1.5 labels May 23, 2022
@ensky
Copy link
Contributor Author

ensky commented May 23, 2022

@villebro It's date type

SELECT 
   table_name, 
   column_name, 
   data_type 
FROM 
   information_schema.columns
WHERE 
   table_name = 'package_file' and column_name = 'publish_date';

image

@villebro
Copy link
Member

I'm unable to repro on master:

image

I unfortunately don't have an instance of 1.5.0 running right now, but I can start one if needed.

Are you using a cache to store results? And can you check if you're able to view samples for the dataset?

@ensky
Copy link
Contributor Author

ensky commented May 23, 2022

@villebro I can reproduce it in SQL Lab with your reproduce step with query

SELECT DATE_TRUNC('day', publish_date),
       count(*) AS count
FROM
  (SELECT '2021-01-01'::date as publish_date) AS virtual_table
GROUP BY DATE_TRUNC('day', publish_date)
LIMIT 50000;

image

pgsql version
PostgreSQL 12.7 (Debian 12.7-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

image

@rusackas rusackas added the preset:cares Preset cares about this issue label May 25, 2022
@kamalkeshavani-aiinside
Copy link
Contributor

kamalkeshavani-aiinside commented Jun 1, 2022

I am facing same issue with all Time-series charts(echarts and legacy both). The issue seems to be handling of Date datatype in Postgres, casting such column to Timestamp solves the issue.

@uwine-fr
Copy link

uwine-fr commented Jun 1, 2022

It looks like an issue that popped somewhere else, and that was fixed by upgrading psycopg2 to 2.9.1 in requirements.txt

@michael-s-molina
Copy link
Member

@ensky I'm not able to reproduce the error with the query you provided:

SELECT DATE_TRUNC('day', publish_date),
       count(*) AS count
FROM
  (SELECT '2021-01-01'::date as publish_date) AS virtual_table
GROUP BY DATE_TRUNC('day', publish_date)
LIMIT 50000;

Executed tests:

  • master on Postgres 12
  • master on Postgres 12.11
  • 1.5.1rc1 on Postgres 12
  • 1.5.1rc1 on Postgres 12.11

@kamalkeshavani-aiinside
Copy link
Contributor

I am facing same issue with all Time-series charts(echarts and legacy both). The issue seems to be handling of Date datatype in Postgres, casting such column to Timestamp solves the issue.

Seems like issue is something else. Local docker setup doesn't show issue, but kubernetes setup shows issue. I have older version of postgres in k8s, so that may be the issue.

@ensky
Copy link
Contributor Author

ensky commented Jun 2, 2022

but kubernetes setup shows issue

True. Our environment is located on the k8s indeed.

@michael-s-molina
Copy link
Member

What's the Postgres version in k8s?

@ensky
Copy link
Contributor Author

ensky commented Jun 2, 2022

@michael-s-molina As mentioned above, it's 12.7
#20155 (comment)

@michael-s-molina
Copy link
Member

michael-s-molina commented Jun 2, 2022

I misunderstood 😄 , I thought that when @kamalkeshavani-aiinside commented:

I have older version of postgres in k8s

He was already considering the previous comment and that we tested on versions 12 and 12.11.

@fmos
Copy link

fmos commented Jun 2, 2022

I am facing same issue with all Time-series charts

Me too. For me, this issue was introduced with the upgrade to 1.5.0. Unfortunately, a downgrade does not work ("Databases" is empty after downgrade).

Maybe the FutureWarning at the top of my (redacted) stack trace contains a valuable hint for one of you:

"Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use pd.Series(values).dt.tz_localize(None) instead."

Jun 02 15:09:25 demeter gunicorn[1697193]: /srv/superset/local/lib/python3.8/site-packages/superset/common/query_context_processor.py:186: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.
Jun 02 15:09:25 demeter gunicorn[1697193]:   result = query_context.datasource.query(query_object.to_dict())
Jun 02 15:09:25 demeter gunicorn[1697193]: Query SELECT DATE_TRUNC('year', entry_date) AS __timestamp,
Jun 02 15:09:25 demeter gunicorn[1697193]:        sum(our_fees) AS "SUM(our_fees)"
Jun 02 15:09:25 demeter gunicorn[1697193]: FROM
Jun 02 15:09:25 demeter gunicorn[1697193]:   (SELECT i.date AS invoice_date,
Jun 02 15:09:25 demeter gunicorn[1697193]:           e.date AS entry_date,
...
Jun 02 15:09:25 demeter gunicorn[1697193]: WHERE entry_date >= TO_TIMESTAMP('2012-01-01 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')
Jun 02 15:09:25 demeter gunicorn[1697193]:   AND entry_date < TO_TIMESTAMP('2022-01-01 00:00:00.000000', 'YYYY-MM-DD HH24:MI:SS.US')
Jun 02 15:09:25 demeter gunicorn[1697193]:   AND number = 970
Jun 02 15:09:25 demeter gunicorn[1697193]:   AND subnumber = 1
Jun 02 15:09:25 demeter gunicorn[1697193]: GROUP BY DATE_TRUNC('year', entry_date)
Jun 02 15:09:25 demeter gunicorn[1697193]: ORDER BY "SUM(our_fees)" DESC
Jun 02 15:09:25 demeter gunicorn[1697193]: LIMIT 10000 on schema public failed
Jun 02 15:09:25 demeter gunicorn[1697193]: Traceback (most recent call last):
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/superset/connectors/sqla/models.py", line 1719, in query
Jun 02 15:09:25 demeter gunicorn[1697193]:     df = self.database.get_df(sql, self.schema, mutator=assign_column_label)
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/superset/models/core.py", line 453, in get_df
Jun 02 15:09:25 demeter gunicorn[1697193]:     df = result_set.to_pandas_df()
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/superset/result_set.py", line 202, in to_pandas_df
Jun 02 15:09:25 demeter gunicorn[1697193]:     return self.convert_table_to_df(self.table)
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/superset/result_set.py", line 177, in convert_table_to_df
Jun 02 15:09:25 demeter gunicorn[1697193]:     return table.to_pandas(integer_object_nulls=True)
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "pyarrow/array.pxi", line 757, in pyarrow.lib._PandasConvertible.to_pandas
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "pyarrow/table.pxi", line 1748, in pyarrow.lib.Table._to_pandas
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 789, in table_to_blockmanager
Jun 02 15:09:25 demeter gunicorn[1697193]:     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in _table_to_blocks
Jun 02 15:09:25 demeter gunicorn[1697193]:     return [_reconstruct_block(item, columns, extension_columns)
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1130, in <listcomp>
Jun 02 15:09:25 demeter gunicorn[1697193]:     return [_reconstruct_block(item, columns, extension_columns)
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 733, in _reconstruct_block                                                                         Jun 02 15:09:25 demeter gunicorn[1697193]:     dtype = make_datetimetz(item['timezone'])
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/srv/superset/local/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 758, in make_datetimetz
Jun 02 15:09:25 demeter gunicorn[1697193]:     tz = pa.lib.string_to_tzinfo(tz)
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "pyarrow/types.pxi", line 1927, in pyarrow.lib.string_to_tzinfo                                                                                                                     Jun 02 15:09:25 demeter gunicorn[1697193]:   File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
Jun 02 15:09:25 demeter gunicorn[1697193]:   File "/usr/lib/python3/dist-packages/pytz/__init__.py", line 181, in timezone
Jun 02 15:09:25 demeter gunicorn[1697193]:     raise UnknownTimeZoneError(zone)
Jun 02 15:09:25 demeter gunicorn[1697193]: pytz.exceptions.UnknownTimeZoneError: '+01' 
Jun 02 15:09:25 demeter gunicorn[1697193]: 192.168.0.121 - - [02/Jun/2022:15:09:25 +0200] "POST /api/v1/chart/data?form_data=%7B%22slice_id%22%3A211%7D&dashboard_id=13&force HTTP/1.0" 400 27 "https://[...]" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
Jun 02 15:09:25 demeter gunicorn[1697193]: [2022-06-02 15:09:25 +0200] [1697193] [DEBUG] Closing connection.

fmos added a commit to fmos/superset that referenced this issue Jun 2, 2022
As noted by @ensky in apache#20155 the format returned by Postgres' DATE_TRUNC is not compatible (any more?). This is a quick fix that works for me.
@ensky
Copy link
Contributor Author

ensky commented Jun 5, 2022

It looks like an issue that popped somewhere else, and that was fixed by upgrading psycopg2 to 2.9.1 in requirements.txt

I can confirm that with psycopg2=2.9.1 the problem is solved.

So it seems to be the problem about k8s setup example located in https://superset.apache.org/docs/installation/running-on-kubernetes , and which instructed to pip install psycopg2==2.8.5 in dependency section.

Anyone may change it to pip install psycopg2==2.9.1 then this problem is solved.

ensky added a commit to ensky/superset that referenced this issue Jun 5, 2022
@michael-s-molina
Copy link
Member

So it seems to be the problem about k8s setup example located in https://superset.apache.org/docs/installation/running-on-kubernetes , and which instructed to pip install psycopg2==2.8.5 in dependency section.

Thanks for reporting this @ensky and also for submitting a PR to update the guide 🙂

@fmos
Copy link

fmos commented Jul 4, 2022

I can confirm that with psycopg2=2.9.1 the problem is solved.

Anyone may change it to pip install psycopg2==2.9.1 then this problem is solved.

I cannot confirm this resolution. Upgrading superset to 1.5.1 via pip. The problem persists for me. pip even upgrades to psycopg2==2.9.3. Also, manually downgrading to 2.9.1 does not resolve the issue. The patch in 0b49ca6 still works.

michael-s-molina pushed a commit that referenced this issue Aug 30, 2022
(cherry picked from commit 77e326f)
@kenho811
Copy link

This is really useful.

Setting psycopg2==2.9.3 works for me (Using superset 2.0.0 Docker Image)

@fmos
Copy link

fmos commented Sep 21, 2022

Perhaps this is an issue with my PostgreSQL version (14). Anyway, c61ac76 is an updated patch to fix the timezone issue if (like me) the psycopg2 upgrade does not resolve it for you. This is better than my previous patch, because it is also compatible with using Prophet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug Bug report data:connect:postgres Related to Postgres data:connect Namespace | Anything related to db connections / integrations preset:cares Preset cares about this issue v1.5
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants