
Filter-box charts broken with 0.31 with Druid datasources #7074

Closed · 3 tasks done
elukey opened this issue Mar 20, 2019 · 10 comments

elukey (Contributor) commented Mar 20, 2019

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included them here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Superset version

0.31.0rc18

Filter-box charts with Druid datasources worked fine up to 0.26/0.28rc7, but after upgrading to 0.31 I always get the following stack trace:

Mar 20 18:31:22 analytics-tool1004 superset[6854]: 2019-03-20 18:31:22,456:ERROR:root:list index out of range
Mar 20 18:31:22 analytics-tool1004 superset[6854]: Traceback (most recent call last):
Mar 20 18:31:22 analytics-tool1004 superset[6854]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 410, in get_df_payload
Mar 20 18:31:22 analytics-tool1004 superset[6854]:     df = self.get_df(query_obj)
Mar 20 18:31:22 analytics-tool1004 superset[6854]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 213, in get_df
Mar 20 18:31:22 analytics-tool1004 superset[6854]:     self.results = self.datasource.query(query_obj)
Mar 20 18:31:22 analytics-tool1004 superset[6854]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/druid/models.py", line 1287, in query
Mar 20 18:31:22 analytics-tool1004 superset[6854]:     client=client, query_obj=query_obj, phase=2)
Mar 20 18:31:22 analytics-tool1004 superset[6854]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/druid/models.py", line 883, in get_query_str
Mar 20 18:31:22 analytics-tool1004 superset[6854]:     return self.run_query(client=client, phase=phase, **query_obj)
Mar 20 18:31:22 analytics-tool1004 superset[6854]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/druid/models.py", line 1150, in run_query
Mar 20 18:31:22 analytics-tool1004 superset[6854]:     order_by = list(qry['aggregations'].keys())[0]
Mar 20 18:31:22 analytics-tool1004 superset[6854]: IndexError: list index out of range

If I inspect qry I can see that 'aggregations' is an OrderedDict() with no elements in it, which explains the list index out of range. I tried to find relevant changes in the Python file but didn't spot any :(
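
Just to illustrate where it blows up: the failing line assumes there is always at least one aggregation to order by. A guard like the following (my own sketch, not an actual Superset patch) would avoid the IndexError, even though the generated query would still carry no aggregations:

    # Hypothetical guard around the failing line in run_query (sketch only);
    # the current code does list(qry['aggregations'].keys())[0] unconditionally.
    aggregations = qry.get('aggregations') or {}
    if aggregations:
        order_by = list(aggregations.keys())[0]
    else:
        # Nothing to order by when no aggregations were built for the query.
        order_by = None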

elukey changed the title from "Filter-box charts broken with 0.29/0.31 with Druid datasources" to "Filter-box charts broken with 0.31 with Druid datasources" on Mar 20, 2019
mistercrunch (Member) commented

Can you try to cherry-pick #7066 (SHA is b210742ad24d01ca05bc58ca3342c90e301fe073) and confirm it fixes things? I don't have access to a Druid cluster to test...

BTW, we now recommend using pydruid's SQLAlchemy dialect to connect to Druid. We'll likely deprecate the whole native Druid connector in favor of the SQLA Druid connector in the next few months.
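
For reference, a minimal sketch of what that can look like through SQLAlchemy (the broker host below is a placeholder, and the exact druid:// URI form should be double-checked against the pydruid docs):

    # Sketch: querying Druid via pydruid's SQLAlchemy dialect (pip install pydruid).
    # The broker host/port and datasource name are placeholders.
    from sqlalchemy import create_engine, text

    # The dialect registers the "druid://" scheme and talks Druid SQL through
    # the broker's /druid/v2/sql/ endpoint.
    engine = create_engine("druid://druid-broker.example.org:8082/druid/v2/sql/")

    with engine.connect() as conn:
        for row in conn.execute(text("SELECT COUNT(*) FROM my_datasource")):
            print(row)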

elukey (Contributor, Author) commented Mar 21, 2019

@mistercrunch I cherry-picked the patch on top of the release--0.31 branch and now I get this:

Mar 21 07:43:10 analytics-tool1004 superset[27042]: 2019-03-21 07:43:10,755:DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): druid1003.eqiad.wmnet
Mar 21 07:43:10 analytics-tool1004 superset[27042]: 2019-03-21 07:43:10,759:DEBUG:urllib3.connectionpool:http://druid1003.eqiad.wmnet:8082 "GET /status HTTP/1.1" 200 None
Mar 21 07:43:10 analytics-tool1004 superset[27042]: 2019-03-21 07:43:10,760:INFO:root:Running two-phase topn query for dimension [project]
Mar 21 07:43:10 analytics-tool1004 superset[27042]: 2019-03-21 07:43:10,774:ERROR:root:HTTP Error 500: Internal Server Error
Mar 21 07:43:10 analytics-tool1004 superset[27042]:  Druid Error: Internal Server Error
Mar 21 07:43:10 analytics-tool1004 superset[27042]:  Query is: {
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "aggregations": [],
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "dataSource": "redacted",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "dimension": "redacted",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "filter": {
Mar 21 07:43:10 analytics-tool1004 superset[27042]:         "dimension": "redacted",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:         "type": "selector",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:         "value": "redacted"
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     },
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "granularity": "all",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "intervals": "2019-03-19T00:00:00+00:00/2019-03-20T00:00:00+00:00",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "metric": null,
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "postAggregations": [],
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "queryType": "topN",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "threshold": 1000
Mar 21 07:43:10 analytics-tool1004 superset[27042]: }
Mar 21 07:43:10 analytics-tool1004 superset[27042]: Traceback (most recent call last):
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pydruid/client.py", line 488, in _post
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     res = urllib.request.urlopen(req)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     return opener.open(url, data, timeout)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/usr/lib/python3.7/urllib/request.py", line 531, in open
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     response = meth(req, response)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     'http', request, response, code, msg, hdrs)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/usr/lib/python3.7/urllib/request.py", line 569, in error
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     return self._call_chain(*args)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     result = func(*args)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     raise HTTPError(req.full_url, code, msg, hdrs, fp)
Mar 21 07:43:10 analytics-tool1004 superset[27042]: urllib.error.HTTPError: HTTP Error 500: Internal Server Error
Mar 21 07:43:10 analytics-tool1004 superset[27042]: During handling of the above exception, another exception occurred:
Mar 21 07:43:10 analytics-tool1004 superset[27042]: Traceback (most recent call last):
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 410, in get_df_payload
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     df = self.get_df(query_obj)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 213, in get_df
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     self.results = self.datasource.query(query_obj)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/druid/models.py", line 1286, in query
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     client=client, query_obj=query_obj, phase=2)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/druid/models.py", line 883, in get_query_str
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     return self.run_query(client=client, phase=phase, **query_obj)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/connectors/druid/models.py", line 1158, in run_query
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     client.topn(**pre_qry)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pydruid/client.py", line 123, in topn
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     return self._post(query)
Mar 21 07:43:10 analytics-tool1004 superset[27042]:   File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pydruid/client.py", line 508, in _post
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     separators=(',', ': '))))
Mar 21 07:43:10 analytics-tool1004 superset[27042]: OSError: HTTP Error 500: Internal Server Error
Mar 21 07:43:10 analytics-tool1004 superset[27042]:  Druid Error: Internal Server Error
Mar 21 07:43:10 analytics-tool1004 superset[27042]:  Query is: {
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "aggregations": [],
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "dataSource": "redacted",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "dimension": "redacted",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "filter": {
Mar 21 07:43:10 analytics-tool1004 superset[27042]:         "dimension": "redacted",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:         "type": "selector",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:         "value": "redacted"
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     },
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "granularity": "all",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "intervals": "2019-03-19T00:00:00+00:00/2019-03-20T00:00:00+00:00",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "metric": null,
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "postAggregations": [],
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "queryType": "topN",
Mar 21 07:43:10 analytics-tool1004 superset[27042]:     "threshold": 1000
Mar 21 07:43:10 analytics-tool1004 superset[27042]: }

I have redacted some details of the Druid query that I thought were not needed, but if something is missing I'll provide it. It seems that Druid now returns a 500, and that the aggregations parameter is still empty (even though we have metrics configured in the Druid datasources panel).

I am going to check how to switch from the Druid connector to pydruid's SQLAlchemy dialect; is that covered in the docs?

elukey (Contributor, Author) commented Mar 21, 2019

Ah no, sorry, I was wrong: I checked the Druid logs and this is the actual issue:

2019-03-21T07:43:10,765 ERROR io.druid.server.QueryResource: Exception handling request: {class=io.druid.server.QueryResource, exceptionType=class com.fasterxml.jackson.databind.JsonMappingException, exceptionMessage=Instantiation of [simple type, class io.druid.query.topn.TopNQuery] value failed: must specify a metric, exception=com.fasterxml.jackson.databind.JsonMappingException: Instantiation of [simple type, class io.druid.query.topn.TopNQuery] value failed: must specify a metric, query=unparseable query, peer=2620:0:861:106:10:64:36:116}
com.fasterxml.jackson.databind.JsonMappingException: Instantiation of [simple type, class io.druid.query.topn.TopNQuery] value failed: must specify a metric

The metric parameter is indeed null in the Druid call...
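
For comparison, a topN issued directly through pydruid has to name the metric to sort on, which is exactly what is missing here; a minimal sketch (datasource, dimension and metric names are made up):

    # Sketch of a valid pydruid topN; all names below are made up.
    from pydruid.client import PyDruid
    from pydruid.utils.aggregators import longsum

    client = PyDruid("http://druid-broker.example.org:8082", "druid/v2")

    # Druid rejects a topN whose "metric" is null, which is what the
    # Superset-generated query above is sending.
    result = client.topn(
        datasource="my_datasource",
        granularity="all",
        intervals="2019-03-19/2019-03-20",
        dimension="project",
        metric="events",                       # must name a real metric
        aggregations={"events": longsum("event_count")},
        threshold=10,
    )
    print(result.result)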

elukey (Contributor, Author) commented Mar 21, 2019

I have re-imported the db, ran superset db upgrade again, and now the issue is fixed!

Would it be possible to get it backported to 0.31 before the release?

elukey (Contributor, Author) commented Mar 21, 2019

Another regression I have noticed in the Filter Box is that in 0.31.0, instead of returning "No data" as it does on 0.26, I get the following:

Traceback (most recent call last):
  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/views/base.py", line 114, in wraps
    return f(self, *args, **kwargs)
  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/views/core.py", line 1229, in explore_json
    samples=samples,
  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/views/core.py", line 1160, in generate_json
    payload = viz_obj.get_payload()
  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 375, in get_payload
    payload['data'] = self.get_data(df)
  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 1821, in get_data
    df = df.sort_values(metric, ascending=flt.get('asc'))
  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 4421, in sort_values
    stacklevel=stacklevel)
  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 1382, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'count'

count in this case is the sort metric of the first filter added.

EDIT: might be related to #7019?

mistercrunch (Member) commented

That won't be easy to debug without being able to reproduce it. The solution would involve understanding why the metric column is missing from the dataframe.

  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 1821, in get_data
    df = df.sort_values(metric, ascending=flt.get('asc'))

elukey (Contributor, Author) commented Mar 22, 2019

@mistercrunch thanks a lot for following up. In this case df is empty (metric is populated), and it is a legitimate case since the filter box is not configured correctly. The main issue I was seeing is that rather than showing an empty result in the filter box, or something more meaningful like "no data returned" (as happens on 0.26), it errors out with a KeyError exception, which is really cryptic for Superset users to parse.
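
Something along these lines in the filter-box data path would restore the old "No data" behaviour instead of the KeyError (my own sketch, not the actual viz.py code):

    # Hypothetical guard: skip sorting when the dataframe is empty or the
    # sort metric never made it into the result columns.
    import pandas as pd

    def sort_filter_values(df: pd.DataFrame, metric: str, ascending: bool) -> pd.DataFrame:
        if df.empty or metric not in df.columns:
            return df  # caller can then render "No data" instead of crashing
        return df.sort_values(metric, ascending=ascending)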

elukey (Contributor, Author) commented Apr 1, 2019

Any comment? :)

elukey (Contributor, Author) commented May 2, 2019

Closing since the main issue has been resolved; the other one is very minor and can probably be skipped for the moment (it doesn't really impair the usage of Superset).

elukey closed this as completed May 2, 2019
WChCh commented Aug 28, 2019

That won't be easy to debug without being able to reproduce. The solution would be around trying to understand why the metric column is missing from the dataframe.

  File "/srv/deployment/analytics/superset/venv/lib/python3.7/site-packages/superset/viz.py", line 1821, in get_data
    df = df.sort_values(metric, ascending=flt.get('asc'))

@mistercrunch hi, I got the same problem when there is no data in my table. I hope it can be fixed, thanks!
