Over time in Data Engineering, we have seen a recurring need for a robust API on top of the existing frameworks, one that lets us declaratively provide commonly used parameters for reading and writing data sources with a standard set of features.
We have built the SourceLoader and SinkWriter APIs, which any project can import and use whenever it needs to load or save a data source.
It is also easy to add a new feature to an existing framework.
Supported Frameworks |
---|
Apache Spark |
Apache Beam |
- A builder-like pattern makes it easy to add or remove features related to reading or writing a data source.
- Most features relate to one of the three major components of an IO operation:
  - Reader (example: DataFrameReader in Spark)
  - Writer (example: DataFrameWriter in Spark)
  - Data Collection (example: DataFrame in Spark, PCollection in Beam)
- The SupportedFeatures interface houses all the reader/writer features.
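
The builder-like idea described above can be sketched roughly as follows. This is only an illustration and not the library's actual API: `BuilderSketch`, `FakeReader`, `Feature`, `ReaderFeature`, `CollectionFeature`, `Loader` and the `limit` option are all hypothetical names, and plain Scala collections stand in for Spark or Beam types so the snippet runs without either framework. It assumes a feature acts either on the reader before loading or on the data collection after loading.

```scala
object BuilderSketch {
  // Hypothetical stand-in for a data collection (DataFrame / PCollection).
  type Rows = Seq[Map[String, String]]

  // Hypothetical stand-in for a framework reader such as DataFrameReader.
  final case class FakeReader(options: Map[String, String] = Map.empty) {
    def option(key: String, value: String): FakeReader =
      copy(options = options + (key -> value))

    // Returns fixed sample rows; honours an assumed "limit" option.
    def load(): Rows = {
      val all: Rows = Seq(
        Map("id" -> "1", "name" -> "a"),
        Map("id" -> "2", "name" -> "b")
      )
      options.get("limit").map(n => all.take(n.toInt)).getOrElse(all)
    }
  }

  // A feature targets one component of the IO operation.
  sealed trait Feature
  final case class ReaderFeature(run: FakeReader => FakeReader) extends Feature
  final case class CollectionFeature(run: Rows => Rows) extends Feature

  // Immutable builder: each withFeature call returns a new Loader.
  final case class Loader(features: Vector[Feature] = Vector.empty) {
    def withFeature(f: Feature): Loader = copy(features = features :+ f)

    def load(): Rows = {
      // Apply reader features first, in the order they were added.
      val reader = features
        .collect { case ReaderFeature(run) => run }
        .foldLeft(FakeReader())((r, run) => run(r))
      // Then apply collection features to the loaded rows.
      features
        .collect { case CollectionFeature(run) => run }
        .foldLeft(reader.load())((rows, run) => run(rows))
    }
  }

  // Usage: features are declared as values and added or removed freely.
  val limitOne: Feature = ReaderFeature(r => r.option("limit", "1"))
  val dropName: Feature = CollectionFeature(rows => rows.map(row => row - "name"))

  val rows: Rows = Loader().withFeature(limitOne).withFeature(dropName).load()
  val unfiltered: Rows = Loader().load()
}
```

Because each feature is an ordinary value, removing one is just a matter of leaving out the corresponding `withFeature` call; a writer-side feature could be modelled the same way with a third case class.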
Please refer to our Supported Features guide.
Please go through the Wiki to understand how to use the library.
The project is built using sbt.
See the CONTRIBUTING file for how to help out.