Add Ability to Save Results from Previous Queries Using S3 #1797

Closed
jefffeng opened this issue Dec 8, 2016 · 10 comments
Labels: airbnb, enhancement:request, sqllab

Comments

@jefffeng

jefffeng commented Dec 8, 2016

Currently, we do not provide a way to save query results outside of CTAS (Create Table As). We should add a feature that allows a user to view/download the results of a previous query even after a new query is run, using S3 as the backend for holding those results.

@jefffeng added the airbnb, enhancement:request, and sqllab labels Dec 8, 2016
@rwsmith

rwsmith commented Feb 10, 2017

Is anyone assigned this issue?

@bkyryliuk
Member

It was partially implemented here: #2113

@jfeng15

jfeng15 commented Feb 18, 2017

@bkyryliuk - it sounds like Ben did the work to implement the S3 cache. What's the remaining gap? Actually viewing/downloading the data?

@mistercrunch
Member

Notice: this issue has been closed because it has been inactive for 429 days. Feel free to comment and request for this issue to be reopened.

@qyoz

qyoz commented May 16, 2018

@mistercrunch I'd like to ask for this to be reopened.

Our use case is showing query results from today together with a "snapshot" of yesterday's data.

So like @jfeng15 asked, what's the remaining gap that needs to be developed?
Is the S3 cache working? I couldn't find any examples in the documentation.

@mistercrunch
Member

The S3 cache works as a place to hold SQL Lab results when the database is operating in async mode.

In async mode, the query is executed on a Superset worker, and the worker updates the state of the query and flushes the results to the "results backend", which can be S3, Redis, or another cache backend. It's documented in the installation docs.
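To make the flow above concrete, here is a minimal sketch of the pattern: a worker runs the query, serializes the result set into a key-value "results backend", records the key on the query state, and the client later fetches the results by that key. Everything here is hypothetical and stdlib-only: `InMemoryResultsBackend` stands in for S3/Redis, the worker runs synchronously (no Celery), and none of the names are Superset's actual API (in Superset itself the backend is a cache object configured via `RESULTS_BACKEND` in `superset_config.py`).

```python
import json
import sqlite3
import uuid


class InMemoryResultsBackend:
    """Hypothetical stand-in for a key-value results backend (S3, Redis, ...).

    Only get/set are modeled; a real backend would persist the bytes remotely.
    """

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def run_query_async(conn, sql, backend, query_state):
    """Hypothetical worker task: run the query, flush results, update state."""
    key = str(uuid.uuid4())
    query_state["status"] = "running"
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    # Serialize the result set and flush it to the backend under a unique key
    payload = json.dumps({"columns": columns, "data": rows})
    backend.set(key, payload.encode("utf-8"))
    # Record where the results live so the client can find them later
    query_state.update(status="success", results_key=key)
    return key


def fetch_results(backend, query_state):
    """Client side: look up results by the key the worker recorded."""
    raw = backend.get(query_state["results_key"])
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, feature_x INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    backend = InMemoryResultsBackend()
    state = {}
    run_query_async(conn, "SELECT COUNT(*) AS n FROM users WHERE feature_x = 1", backend, state)
    print(fetch_results(backend, state))  # {'columns': ['n'], 'data': [[2]]}
```

Because results are addressed by key rather than by "the latest query", older result sets stay retrievable as long as the backend keeps them, which is the property the original feature request was after.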

@qyoz

qyoz commented May 16, 2018

So is this the right issue for what we're asking, or should I open a feature request?
This isn't about running in async mode; the use case here is saving the result and being able to slice it.

@mistercrunch
Member

Why not create a summary table (CREATE TABLE AS) and then slice/dice the result set in that summary table?
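For reference, a CTAS summary table looks roughly like this. The sketch uses SQLite through Python's `sqlite3` module only because it is self-contained; the `user_config` and `feature_summary` tables are hypothetical, and the exact CTAS syntax varies slightly between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_config (user_id INTEGER, feature TEXT, enabled INTEGER)")
conn.executemany(
    "INSERT INTO user_config VALUES (?, ?, ?)",
    [(1, "x", 1), (2, "x", 0), (3, "x", 1), (1, "y", 1)],
)

# CTAS: materialize the aggregated result set as its own table,
# which can then be registered and sliced like any other table
conn.execute(
    """
    CREATE TABLE feature_summary AS
    SELECT feature, SUM(enabled) AS enabled_users, COUNT(*) AS total_users
    FROM user_config
    GROUP BY feature
    """
)

rows = conn.execute("SELECT * FROM feature_summary ORDER BY feature").fetchall()
print(rows)  # [('x', 2, 3), ('y', 1, 1)]
```

Note that each CTAS run creates a new physical table, which is the bloat concern raised in the reply below.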

@qyoz
Copy link

qyoz commented May 17, 2018

Because on a large scale, this would be abusive.
Here is the use case in a little more detail.
We have a database which stores user configurations. We want to know how many users have feature X enabled at this point in time, but we also want to compare it with how many users had feature X enabled yesterday (and a week ago, etc).
This isn't an event database; it's closer to a metadata database, so essentially, we want to query snapshots of its state.

If we did this using CTAS, the database would become bloated very quickly.
We could also create dedicated snapshots of this DB, but for large instances this can become a costly operation.

My thought was to have a "save result" feature where you could slice on a saved results set.
This way we could show and compare today's data with the saved results.

@mistercrunch
Member

Seems like data engineering / data preparation would be best in this case. You'd ETL your user config data into a data warehouse while keeping track of the history of changes.

While Superset has caching features, there's no chart "snapshotting" feature that would save the cache as of a point in time for a specific chart.

I don't see that coming anytime soon, so your best bet is ETL.
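One possible shape for that ETL, sketched under assumptions: a scheduled job appends a dated aggregate of the live config table into a history table, so "today vs. yesterday" becomes a plain query. SQLite stands in for the warehouse here, and the `user_config` / `feature_history` tables, the `snapshot_feature_counts` helper, and the dates are all hypothetical illustrations rather than anything Superset provides.

```python
import sqlite3
from datetime import date


def snapshot_feature_counts(conn, snapshot_date):
    """Hypothetical ETL step: append per-feature enabled counts for one day."""
    conn.execute(
        """
        INSERT INTO feature_history (snapshot_date, feature, enabled_users)
        SELECT ?, feature, SUM(enabled)
        FROM user_config
        GROUP BY feature
        """,
        (snapshot_date.isoformat(),),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_config (user_id INTEGER, feature TEXT, enabled INTEGER)")
conn.execute(
    "CREATE TABLE feature_history (snapshot_date TEXT, feature TEXT, enabled_users INTEGER)"
)

conn.executemany("INSERT INTO user_config VALUES (?, ?, ?)", [(1, "x", 1), (2, "x", 0)])
snapshot_feature_counts(conn, date(2018, 5, 16))

# The next day user 2 enables feature x; the live table only holds current state
conn.execute("UPDATE user_config SET enabled = 1 WHERE user_id = 2 AND feature = 'x'")
snapshot_feature_counts(conn, date(2018, 5, 17))

# The history table keeps one small row per feature per day, ready to chart
history = conn.execute(
    "SELECT snapshot_date, enabled_users FROM feature_history "
    "WHERE feature = 'x' ORDER BY snapshot_date"
).fetchall()
print(history)  # [('2018-05-16', 1), ('2018-05-17', 2)]
```

Storing only the aggregate per day (rather than a full CTAS copy of the source) keeps the history table small, which addresses the bloat concern while still allowing comparisons across dates.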


6 participants