-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow users to import CSV as datasource #381
Comments
It should probably be done at the database level, maybe an upload icon in the database list view. pandas has some utility functions that make that trivial, first load the csv in a dataframe, then upload it to the db. |
+1, would love this feature. |
This would be a very handy feature for a data mining system. But would be curious how could the upload CSV file / data be saved, Will the data always be read the CSV file and parsed / during dashboard / graph generation? Or user has to select an existing database to save data on the CSV file? This would require the database user to have insert or even create table permission on the data source, which is not necessary on current design. Any thoughts here? |
I'd be willing to have a go at this. Something like:
Additional bonus is this could make replicating/debugging others' problems easier. @mistercrunch any thoughts? |
For all those wanting to use CSV in the interim, I had success using a csv2sqlite script, as detailed here: https://github.com/FOIA-data-hackathon/MuckRock-Caravel |
@mistercrunch, any updates on this issue? There is a workaround via miserlou but I was hoping to make a contribution. @andrewhn, did you make any progress with this? If so, could I see your code? |
@andrewhn @SalehHindi Has anyone given this a go yet? We also think this would be a great feature, but would be keen to hear of any new approaches that did/didn't work. |
For the record, I would suggest to anyone who wants to tackle this to the following pandas method, and expose as much as is possible/reasonable from their api in the upload form: Basically you'd have a form with sensible defaults and options based on the pandas api. |
@mistercrunch, thanks for laying that out. I think I'll make an attempt. |
I've made a start on this on a csv-import branch on my fork. The basic functionality is in place but needs some testing and additional validation on the fields presented by the new form. A button to import CSV has been added on the 'sources->database' page that brings up a new form exposing most of the pandas api. The CSV is added as a new table to an existing database. It can then be added like any other table on the 'sources->tables' page. |
@axitkhurana I was swamped with finals week last week so I didn't get to it so go for it. |
Nice one @Ryan4815, is there any update on your solution? |
@axitkhurana @Simeon-ayo I believe that @SalehHindi is going to add some tests to the branch and get it prepped for a merge request. |
Pandas is quite memory hungry. I can't load a sparse 1GB csv file on my 16GB system due to MemoryError. Plot.ly offers a tutorial on how to convert a CSV to SQLite chunk by chunk to avoid eating all the memory. https://plot.ly/python/big-data-analytics-with-pandas-and-sqlite/. It's probably useful for superset to use a similar conversion step so an arbitrarily sized csv can be converted. If speed is an issue as pandas.read_csv is single-threaded, an alternative is |
Nice catch @mratsim and thanks for the link. |
Any updates on this feature |
any updates of this feature |
import pandas as pd def df2sqlite(dataframe, db_name = "import.sqlite", tbl_name = "import"):
df2sqlite(pdsites, db_name="sites_4g.db", tbl_name = "sites_data_4g") import sqlite3 Create your connection.cnx = sqlite3.connect('sites_4g.db') df = pd.read_sql_query("SELECT * FROM sites_data_4g", cnx) ####Then go to superset data sources-databases and input 'sqlite///sites_4g.db' |
Update: I'm now working on this issue, continuing from @SalehHindi 's last commit to the csv-import branch. When I run the code, I don't see any "Add CSV Table to Database" button. Can you tell what I might be doing wrong? |
hey @timifasuba....the method I use is using pandas to transform the
data...all this is done on the server (back end), then I put the data in
sqlite as a .db file...see below code that if you replicate you will easily
import any csv, after you do this....then go to superset data
sources-databases and input 'sqlite///sites_4g.db' (in my example below I
created a db with name sites_4g.db)
import pandas as pd
pdsites = pd.read_csv("site_data.csv")
pdsites.columns
def df2sqlite(dataframe, db_name = "import.sqlite", tbl_name = "import"):
import sqlite3
conn=sqlite3.connect(db_name)
cur = conn.cursor()
wildcards = ','.join(['?'] * len(dataframe.columns))
data = [tuple(x) for x in dataframe.values]
cur.execute("drop table if exists %s" % tbl_name)
col_str = '"' + '","'.join(dataframe.columns) + '"'
cur.execute("create table %s (%s)" % (tbl_name, col_str))
cur.executemany("insert into %s values(%s)" % (tbl_name, wildcards),
data)
conn.commit()
conn.close()
df2sqlite(pdsites, db_name="sites_4g.db", tbl_name = "sites_data_4g")
import sqlite3
import pandas as pd
# Create your connection.
cnx = sqlite3.connect('sites_4g.db')
df = pd.read_sql_query("SELECT * FROM sites_data_4g", cnx)
df.head(5)
####Then go to superset data sources-databases and input
'sqlite///sites_4g.db'
…On Thu, Aug 24, 2017 at 3:16 AM, timifasubaa ***@***.***> wrote:
Update: I'm now working on this issue, continuing from @SalehHindi
<https://github.com/salehhindi> 's last commit to the csv-import branch.
When I run the code, I don't see any "Add CSV Table to Database" button.
Can you tell what I might be doing wrong?
You can look at the code I'm running on my fork (https://github.com/
timifasubaa/incubator-superset) with branch name import_csv.
Also, please post a snapshot of the new flow (e.g. the page with the new
button e.t.c.) .
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#381 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AXH0HO5ENCIDMrIAA5EFyzbQ3QmpfshHks5sbMDNgaJpZM4ILd2G>
.
|
Hey @hillaryhitch, @timifasubaa, thanks for the comment. I just started a new job so this fell off my radar but I will push up my tests/updates/screenshots for this feature tonight after work so people can start using this. |
Notice: this issue has been closed because it has been inactive for 230 days. Feel free to comment and request for this issue to be reopened. |
Would love to work on this after March 5, if this feature is not available yet @mistercrunch |
I would really like this feature please :) |
Hi there,
Is there any plan to add support for uploading CSV data as well as data source ?
Maybe using sqlite3:
http://stackoverflow.com/questions/2580497/database-on-the-fly-with-scripting-languages
Thanks
The text was updated successfully, but these errors were encountered: