Add Ability to Save Results from Previous Queries Using S3 #1797

jefffeng · 2016-12-08T00:08:04Z

Currently, we do not provide a way to save query results outside of CTAS (Create Table As). We should add a feature that allows a user to view/download the results of a previous query even after a new query is run, using S3 as the backend for holding those results.

rwsmith · 2017-02-10T19:40:25Z

Is anyone assigned this issue?

bkyryliuk · 2017-02-10T22:20:36Z

It was partially implemented here: #2113

jfeng15 · 2017-02-18T08:40:03Z

@bkyryliuk - it sounds like Ben did the work to implement the S3 cache. What's the remaining gap? Actually viewing/downloading the data?

mistercrunch · 2018-04-23T15:31:14Z

Notice: this issue has been closed because it has been inactive for 429 days. Feel free to comment and request for this issue to be reopened.

qyoz · 2018-05-16T11:30:51Z

@mistercrunch I'd like to ask for this to be reopened.

Our use case is showing query results from today together with a "snapshot" of yesterday's data.

So like @jfeng15 asked, what's the remaining gap that needs to be developed?
Is the S3 cache working? I couldn't find any examples in the documentation.

mistercrunch · 2018-05-16T14:44:00Z

The S3 cache works as a place to hold SQL Lab results when the database is operating in async mode.

In async mode, the query is executed on a Superset worker, and the worker updates the state of the query and flushes the results to the "results backend", which can be S3, Redis, or whatever else. It's documented in the installation docs.
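
For illustration, the async flow described above can be sketched roughly as follows. This is not Superset's code: `InMemoryResultsBackend`, `run_query_on_worker`, and `fetch_results` are hypothetical names, a plain dict stands in for S3 or Redis, and SQLite stands in for the queried database.

```python
import json
import sqlite3
import uuid


class InMemoryResultsBackend:
    """Hypothetical stand-in for a results backend such as S3 or Redis."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def run_query_on_worker(conn, sql, query_state, backend):
    """Run `sql`, track its state, and flush the rows to the backend.

    Returns the key under which the results were stored, or None on failure.
    """
    query_state["status"] = "running"
    try:
        cursor = conn.execute(sql)
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    except sqlite3.Error as exc:
        query_state["status"] = "failed"
        query_state["error"] = str(exc)
        return None

    # Flush the serialized result set so it outlives the worker's memory.
    key = str(uuid.uuid4())
    backend.set(key, json.dumps({"columns": columns, "data": rows}))
    query_state["results_key"] = key
    query_state["status"] = "success"
    return key


def fetch_results(backend, key):
    """Load a previously flushed result set, or None if the key is unknown."""
    payload = backend.get(key)
    return json.loads(payload) if payload is not None else None
```

The point of the sketch is only that the results are keyed and stored outside the process that ran the query, so a client can poll the query state and fetch the rows later.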

qyoz · 2018-05-16T16:19:14Z

So is this the right issue for what we're asking, or should I open a feature request?
This isn't about running in async mode; the use case here is saving the result and being able to slice it.

mistercrunch · 2018-05-16T18:24:02Z

Why not create a summary table (CREATE TABLE AS) and then slice and dice the result set in that summary table?

qyoz · 2018-05-17T10:23:38Z

Because on a large scale, this would be abusive.
Here is the use case in a little more detail.
We have a database which stores user configurations. We want to know how many users have feature X enabled at this point in time, but we also want to compare it with how many users had feature X enabled yesterday (and a week ago, etc).
This isn't an event database, it's closer to a metadata database, so essentially we want to query snapshots of its state.

If we did this using CTAS, the database would become bloated very quickly.
We could also create dedicated snapshots of this DB, but for large instances this can become a costly operation.

My thought was to have a "save result" feature where you could slice on a saved result set.
This way we could show and compare today's data with the saved results.

mistercrunch · 2018-05-17T16:40:51Z

Seems like data engineering / data preparation would be best in this case. You'd ETL your user conf data into a data warehouse, while keeping track of the history/changes.

While Superset has caching features, there's no chart "snapshotting" feature that would keep the cache saved as of a point in time for a specific chart.

I don't see that coming anytime soon, so your best bet is ETL.
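
As a rough sketch of the ETL approach suggested above, a daily job could record one aggregated row per feature per day instead of copying whole tables. Everything here is an assumption made for illustration: the `user_config` and `feature_snapshot` table names, their columns, and the `take_snapshot` and `compare` helpers are hypothetical, and SQLite stands in for both the source database and the warehouse.

```python
import sqlite3
from datetime import date


def take_snapshot(source, warehouse, feature, snapshot_date):
    """Count users with `feature` enabled and record it for `snapshot_date`."""
    (count,) = source.execute(
        "SELECT COUNT(*) FROM user_config WHERE feature = ? AND enabled = 1",
        (feature,),
    ).fetchone()

    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS feature_snapshot ("
        "snapshot_date TEXT, feature TEXT, enabled_users INTEGER, "
        "PRIMARY KEY (snapshot_date, feature))"
    )
    # INSERT OR REPLACE makes a re-run for the same day overwrite, not duplicate.
    warehouse.execute(
        "INSERT OR REPLACE INTO feature_snapshot VALUES (?, ?, ?)",
        (snapshot_date.isoformat(), feature, count),
    )
    warehouse.commit()
    return count


def compare(warehouse, feature, day_a, day_b):
    """Return (count on day_a, count on day_b); None where no snapshot exists."""
    rows = dict(
        warehouse.execute(
            "SELECT snapshot_date, enabled_users FROM feature_snapshot "
            "WHERE feature = ? AND snapshot_date IN (?, ?)",
            (feature, day_a.isoformat(), day_b.isoformat()),
        ).fetchall()
    )
    return rows.get(day_a.isoformat()), rows.get(day_b.isoformat())
```

Because only one small row is stored per feature per day, the history table grows slowly, and it can be queried and charted like any other table.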

jefffeng added the airbnb, enhancement:request, and sqllab labels Dec 8, 2016

jefffeng mentioned this issue Dec 16, 2016: Ability to see results from all SQL queries from user, not just latest SQL query #1548 (closed)

mistercrunch closed this as completed Apr 23, 2018
