[feature] HDFS interface #2468

steremma · 2017-03-24T16:21:04Z

I have started work on a new datasource targeting semi-structured data that resides on a distributed file system (HDFS/S3), in formats such as:

csv
json
xml

As a first step, I am designing a Spark API that imitates the functionality offered by the existing datasources.

I would be interested to know if people would appreciate the addition of such a feature.
I also have some ideas regarding the implementation: specifically, I am adding the functionality in a new module under connectors/.

Any input is valuable at this point!

mistercrunch · 2017-03-26T16:33:57Z

You'll need an engine that can deserialize, aggregate & filter data as you implement the interface, and Superset isn't the place to do that. Depending on your serde you may want to write a deserializer for Presto or something like Apache Drill.

mistercrunch · 2018-04-23T15:29:14Z

Notice: this issue has been closed because it has been inactive for 392 days. Feel free to comment and request for this issue to be reopened.

steremma changed the title ~~proposed feature: HDFS interface~~ [feature] HDFS interface Mar 25, 2017

rhunwicks mentioned this issue Aug 16, 2017: Create a PandasDatasource #3302 (Closed, 1 task)

mistercrunch closed this as completed Apr 23, 2018
