
[feature] HDFS interface #2468

Closed
steremma opened this issue Mar 24, 2017 · 2 comments

Comments


steremma commented Mar 24, 2017

I have started working on a new datasource that targets semi-structured data residing on a distributed file system (HDFS/S3), in formats such as:

  • csv
  • json
  • xml

My approach so far is to design a Spark API that imitates the functionality offered by the existing datasources.

I would be interested to know whether people would appreciate the addition of such a feature.
I also have some ideas about the implementation; specifically, I am currently adding the functionality in a new module under connectors/.

Any input is valuable at this point!

steremma changed the title from "proposed feature: HDFS interface" to "[feature] HDFS interface" on Mar 25, 2017
mistercrunch (Member) commented Mar 26, 2017

To implement this interface, you'll need an engine that can deserialize, aggregate, and filter the data, and Superset isn't the place to do that. Depending on your serde (serializer/deserializer), you may want to write a deserializer for Presto or for something like Apache Drill.

mistercrunch (Member) commented

Notice: this issue has been closed because it has been inactive for 392 days. Feel free to comment and request that this issue be reopened.
