-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Google BigQuery Support #945
Comments
@rlanda take a look at druid support if you want an example on how this could be done |
Someone has any news about that? |
Yeah we're looking at this as well. |
Hi guys, any news on this? Thanks. |
Take a look at this thread. Some progress has been made on getting sqlalchemy to work with big query, but seems like it's not ready yet. There are also these projects: |
There's 2 approaches here, either going through SqlAlchemy (which may or may not be possible, depending on how messed up the GSQL dialect is), the thing is we only need for a subset of the dialect to work (no need for DML joins, ...). The other option is to create a brand new Supserset connector for it. I've done some work to make the A solid Python programmer familiar with Flask/Django should be able to implement this a matter of days, perhaps a week or two. I'd be open to mentor someone through the process. I'm sure we could host the engineer at Airbnb for a week while working on this task. Who's up for the challenge!? |
I would be open to working on this, if this has not been done. I'm familiar with Python and am using superset at my company internally to display various information from redshift (But would like to move us to BQ) |
@joshuacano I'd be happy helping you out while working on this. I see you are in SF and we love to host at Airbnb so we could host you for coding sessions or for a few days while you crank that out. I can also help improving and clarifying the connector interface while working on this so that it's more straightforward to add the next connectors. |
Wow it would be real honor to meet you @mistercrunch. I'd love to come over for a few coding sessions. Maybe I can get familiar with the code base over the next few days and ping you on gitter once I have an idea of where to get started. Does that sound good? |
@joshwalters @mistercrunch thats great, i tried to tackle this issue on my own, but i believe having a spark connector instead of a bigquery is a much neater solution, since you can connect spark to bigquery and much more, hadoop or elastic search. |
@Bamieh , what do you use to connect spark to big query? |
@mistercrunch I've started a bit of work on this, Not sure if you use Gitter, but I Wrote you some questions on there to just point me in the right direction. Thanks! |
[copied from gitter to keep the conversation here] |
So two main approaches:
The pyhive project does both these things on top of both Presto and Hive and can be a source or inspiration here.
This has been discussed here: |
So perhaps the interim solution for me is to go down the connector route, and then depending on the discussion in the issue referenced above, it should be fairly trivial to just turn off the bigquery connector section once that is done. However, it does seem like you might prefer option #1 (and it might be a more long term solution), so I'll defer to you on what direction I should head. |
I'd go with solution 1 starting by forking that repo and moving it forward to where it needs to be. This would become a largeish PR against that repo. |
BTW Druid added support for SQL and we'll be going down the same route eventually. |
Any updates on this feature? |
@vincentwongso I think this is currently blocked on my PR googleapis/google-cloud-python#2921 It's got a bit more work required before it can merge. Specifically, query parameters, |
If anyone needs this feature ASAP, you can try the interface that I've implemented to use BigQuery in Superset: https://github.com/mxmzdlv/pybigquery It is basic, but querying and query generation works fine. There is no support for Record type, so you'll have to flatten your schema using views. You will also need to include this snippet of code in superset_config.py for Superset's Time Grain to work. I haven't created a PR as I am not sure if it is a complete engine spec. Some issues that I've encountered while developing the interface:
So I had to use this hack to make it work. Not sure if there is a better way. |
thank you @mxmzdlv its working.. but I want to know about Time Grain.. How to activate it? |
@alamhanz You'll need to create superset_config.py and put this code in it. Make sure that you have wrapt module installed (pip install wrapt) and superset_config.py is in your PYTHONPATH. Otherwise you can just add BQEngineSpec class from the snippet to |
Thank you.. It is working when I put the BQEngineSpec Script on superset/superset/db_engine_specs.py |
But, I can't use SQL LAB. |
@alamhanz I think this is because your Celery is not configured. Try disabling |
@mxmzdlv What do you mean by "Celery is not configured"? Then how can I disabling Allow Run Async?? |
@alamhanz check out the installation docs around SQL Lab:
|
As contributed by @mxmzdlv on issue apache#945
@tswast @darylerwin @mistercrunch @mxmzdlv @Bamieh @Bamieh Idid not try using docker for running this. I have only heard about docker. Have never practically used it or worked with it. It would be great if you could point me to some resources on it so that I can learn the same. Now I can see tables and get their count and other things. Please refer below image for the same. But when I am trying to run the same query from sql lab I am not able to generate any result. It just keeps on running and timesout. The same is happening If I try to query from the default 'main' DB that gets installed for visualising World Bank Data and other dashboards. I get the following error
Currently I am creating dashboards using a manual process (Upload csv). Please help me resolve this error. Also is there a way to share the dashboard made on superset to others who dont have superset (even python) installed on them. Can I share it privately within the organisation I work in? |
@swamy16 seems to be an issue with calling signal(signal.SIGALRM) — SIGALRM is not available on Windows. https://github.com/apache/incubator-superset/blob/d533ce09676921ec2f72e1032bbca0d2a37f65b6/superset/utils.py#L508 I think the easiest way to handle this would be to run Superset in a Docker / Vagrant or just install Virtualbox with Linux on it and run Superset from there? |
@mxmzdlv thanks for the help. I shall try installing superset in virtualbox. also is it possible to share the dashboards I create on superset with others in my organisation who don't have superset or python installed? |
@swamy16 is that a typo in the alarm and perhaps related to this windows signal python issue? Basically it is timing out? Is the dataset too big? |
sorry it is a typo from my end. The actual error is SIGALRM as mentioned by @mxmzdlv Basically I am able to add tables to superset and am able to visualize them in dashboards (Be it tables from bigquery, or the default main db "world bank dataset" ) but I am not able to query them (even the main db tables from superset sql lab) As you can see the highlighted part in the above image "tsukuyomi" is a dataset in my bigquery and "2017_agency_mapping" is a table. I am able to create a slice view using tables from my bigquery.
But technically shudn't this error also happen when I am creating a slice. Since it is effectively running the same query. Please let em know if there is a way to solve this error in windows. |
I wouldn't recommend running on Windows since Superset isn't tested against it. You will hit problems like this one. Even if you'd fix this issue there could be regressions in the future since no one uses Windows to run Superset in production. |
* Chinese page * Using react-intl-universal to improve multi language in react page * Using react-intl-universal to improve multi language in react page * react_intl_universal * change * change * change * change * change * change * change * merge * multiple page in js * merge * merge * merge * merge * Js Translations * JS Translation * JS Translations * Js translation * JS translations * JS translations * Js translaion * JS en Translation * JS Translation * upgrade document Fixing the damn build (apache#3179) * Fixing the build * Going deeper [bugfix] only filterable columns should show up in FilterBox list (apache#3105) * [bugfix] only filterable columns should show up in FilterBox list * Touchups Datasource cannot be empty (apache#3035) add title description to model view (apache#3045) * add title description to model view * add missing import Add 'show/hide totals' option to pivot table vis (apache#3101) [bugfix] numeric value for date fields in table viz (apache#3036) Bug was present only when using the NOT GROUPED BY option fixes apache#3027 fix hive.fetch_logs (apache#2968) add Zalando to the list of organizations (apache#3171) docs: fixup installation examples code indentation (apache#3169) [bugfix] fix bar order (apache#3180) [bugfix] visualize flow error: 'Metric x is not valid' (apache#3181) The metric name in the frontend doesn't match the one generated on the backend. It turns out the explore view will default to the first metric so specifying one isn't needed. Fix the segment interval for pulling metadata (apache#3174) The end of the interval would be on the truncated today date, which means that you will exclude today. If your realtime ingestion job runs shorter than a day, the metadata cannot be pulled from the druid cluster. Bump cryptography to 1.9 (apache#3065) As 1.7.2 doesn't compile here with openssl 1.1.0f Escaping the user's SQL in the explore view (apache#3186) * Escaping the user's SQL in the explore view When executing SQL from SQL Lab, we use a lower level API to the database which doesn't require escaping the SQL. When going through the explore view, the stack chain leading to the same method may need escaping depending on how the DBAPI driver is written, and that is the case for Presto (and perhaps other drivers). * Using regex to avoid doubling doubles [sqllab] improve Hive support (apache#3187) * [sqllab] improve Hive support * Fix "Transport not open" bug * Getting progress bar to show * Bump pyhive to 0.4.0 * Getting [Track Job] button to show * Fix testzz Add BigQuery engine specifications (apache#3193) As contributed by @mxmzdlv on issue apache#945 [bugfix] fix merge conflict that broke Hive support (apache#3196) Adding 'apache' to docs (apache#3194) [druid] Allow custom druid postaggregators (apache#3146) * [druid] Allow custom druid postaggregators Also, fix the postaggregation for approxHistogram quantiles so it adds the dependent field and that can show up in the graphs/tables. In general, postAggregators add significant power, we should probably support including custom postAggregators. Plywood has standard postAggregators here, and a customAggregator escape hatch that allows you to define custom postAggregators. This commit adds a similar capability for Superset and a additional field/fields/fieldName breakdown of the typical naming for dependent aggregations, which should make it significantly easier to develop approxHistogram and custom postAggregation-required dashboards. * [druid] Minor style cleanup in tests file. * [druid] Apply code review suggestions * break out CustomPostAggregator into separate class. This just cleans up the creation of the postaggregator a little bit. * minor style issues. * move the function around so the git diff is more readable add combine config for metrics in pivot table (apache#3086) * add combine config for metrics in pivot table * change method to stack/unstack * update backendSync Autofocus search input in VizTypeControl modal onEnter (apache#2929) Speed up JS build time (apache#3203) Also bumping a few related libs JS Translation JS translations js translation fix issue 3204 (apache#3205) [bugfix] capture Hive job_id pre-url transformation (apache#3213) js translation fix issue 3204 (apache#3205) [bugfix] capture Hive job_id pre-url transformation (apache#3213) [docs] update url in CONTRIBUTING.md (apache#3212) [sqllab/cosmetics] add margin-top for labels in query history (apache#3222) [explore] nvd3 sort values in rich tooltip (apache#3197) [sqllab] fix UI shows 'The query returned no results' momentarily (apache#3214) this is visible when running async queries between the fetching and success state as the rows are getting cached in the component [explore] DatasourceControl to pick datasource in modal (apache#3210) * [explore] DatasourceControl to pick datasource in modal Makes it easier to change datasource, also makes it such that the list of all datasources doesn't need to be loaded upfront. * Adding more metadata * Js translation * js tran * js trans * js trans * js tran * js trans * js trans * js tran * js translation * js trans * js translation * try load language pack async * Backend translations things * create language pack inside common data * performance improvement for js i18n. - js bundle should not contain localized content - we populate translation content from server-side, in boostrap.common.language_pack - in client-side, use promise to wrap around translation content. text will be translated after translation content arrived/parsed. - fix linting * fix Timer unit test * 1. add global hook for all tests, to make translation pack avaialble before each test starts. 2. fix unit test for Timer component 3. remove noused method get_locale, and modules 4. fix page reload after user change page language * parse and build i18n dictionary as a module * fix sync-backend task, which should run without DOM
Just to make sure...one can now use BigQuery via SQLAlchemy in Superset? Not seeing any updates in the documentation: https://github.com/apache/incubator-superset yet |
Yes .. it just hasnt made it to the published page yet...
BigQuery pip install pybigquery bigquery://
…On Thu, May 24, 2018 at 9:03 AM DJB ***@***.***> wrote:
Just to make sure...one can now use BigQuery via SQLAlchemy in Superset?
Not seeing any updates in the documentation:
https://github.com/apache/incubator-superset yet
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#945 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AX24-eY8U3oVYJqnTvRCMgM1FmNdZ6mvks5t1q-tgaJpZM4JkTQB>
.
--
Daryl Erwin | Big Data Developer
*M.* +1 519.748.7953 | *BBM PIN.* 5D5F0375
*E.* [email protected] | *W. *bbm.com
Follow us on: Facebook <https://www.facebook.com/bbm/> | Twitter
<https://twitter.com/BBM> | Instagram
<https://www.instagram.com/discoverbbm/> | LinkedIn
<https://www.linkedin.com/company/discoverbbm> | YouTube
<https://www.youtube.com/bbm> | Vidio <https://www.vidio.com/@bbm>
*BBM used under license by Creative Media Works Pte Ltd **(Co. Regn. No.
201609444E)*
This e-mail is intended only for named addressee(s) and may contain
confidential and/or privileged information. If you are not the named
addressee or have received this e-mail in error, please notify the sender
immediately. The unauthourised disclosure, use or reproduction of this
email's content is prohibited. Unless expressly agreed, no reliance on this
email may be made.
|
Great! Thanks for the quick reply :) If I can add my 2 cents. Would be great to see an example of how to do the complete set-up in steps and perhaps an example covering the querying of a public BQ Dataset: https://cloud.google.com/bigquery/public-data/ It would help to explain the difference between running query in Standard SQL vs Legacy SQL. |
@darylerwin @dirkjohannesbosman I just updated the docs @ http://superset.apache.org/installation.html#database-dependencies |
@dirkjohannesbosman |
Great, thanks guys for the reply and the docs update 👍 |
How do you add BigQuery tables in the My BQ connection works and I can make queries in SQL Lab. I'm using |
Hi @amodig Not sure I understood your question correctly. Let me know if this helps. |
@mistercrunch I dont want to be salty but can you lock this thread since it has been closed and resolved, any issues related to this can be opened as new issues. |
@swamy16 Ah, NVM, I noticed a GCP Service Account permission issue with SQL Lab. The connection credentials didn't have enough permission to query BQ. I put the dataset name into the |
What is the status of this issue? Is somebody using Superset with BigQuery in production?
I'm evaluating Superset for my next project... |
@mistercrunch There is a button under participants to lock conversation, but i believe it needs certain privilages |
Oh good, I had not discovered that feature. |
Currently sqlalchemy does not support a bigquery dialect. Is there any way that BigQuery sources can be added to Caravel, apart from extending sqlalchemy to support BigQuery?
The text was updated successfully, but these errors were encountered: