This is a list of summer projects for OSS incubators and research
projects affiliated with CROSS. If you have any questions, please visit our Gitter channel:
This is a proposal of GSoC projects for
LiveHD.
Popper is a workflow execution
engine based on GitHub Actions.
This is a list of ideas for projects related to Popper:
| Title | Transparently run workflows in a Kubernetes cluster |
| Mentor(s) | Ivo Jimenez |
| Skills | Python (strong), Go (basic), Kubernetes (basic) |
| Description | Given a kubeconfig file, allow Popper users to execute a workflow in a Kubernetes cluster. This involves implementing a module that resembles a container engine (popper run --engine kubernetes), with the difference that containers are deployed in the cluster. |
| Link | https://github.com/systemslab/popper |
| Difficulty | High |
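As a rough illustration of the Kubernetes idea above, one piece of such an engine would translate a workflow step into a Pod manifest. The sketch below only builds the manifest as a plain dictionary. The step layout (`id`, `image`, `args`) and the `popper-` name prefix are hypothetical and not taken from Popper's code base; submitting the manifest to a cluster is left out.

```python
def step_to_pod_manifest(step, namespace="default"):
    """Translate one workflow step into a Kubernetes Pod manifest (plain dict).

    `step` is a hypothetical dict with the keys 'id', 'image' and,
    optionally, 'args'; it is not Popper's internal representation.
    """
    container = {"name": step["id"], "image": step["image"]}
    if step.get("args"):
        container["args"] = list(step["args"])
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"popper-{step['id']}", "namespace": namespace},
        "spec": {
            # A step runs once, so the Pod should not be restarted.
            "restartPolicy": "Never",
            "containers": [container],
        },
    }


step = {"id": "build", "image": "alpine:3.11", "args": ["echo", "hello"]}
manifest = step_to_pod_manifest(step)
```

A real engine would also need to handle volumes for the workspace, log streaming, and cleanup of finished Pods.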
| Title | Workflow Viewer and Editor |
| Mentor(s) | Ivo Jimenez |
| Skills | GUI development (strong), JavaScript (strong) |
| Description | Create a JavaScript-based viewer of Popper pipelines that allows users to visually explore a pipeline and its contents, similar to GitHub's built-in workflow editor (see live example here). As part of this project, action and workflow catalogs will also be implemented, allowing users to search an existing list of workflows and actions. |
| Link | https://github.com/blkswanio/blackswan |
| Difficulty | Medium |
The Skyhook Data Management project extends object storage in the cloud with data management functionality. Skyhook enables storing and querying database tables in Ceph distributed object storage, and supports multiple data formats including Google Flatbuffers and Apache Arrow as well as text and scientific file formats. Skyhook partitions and formats data as objects, and uses Ceph's object class extension mechanism to develop custom read/write and processing methods that can be executed directly within storage.
| Title | Compaction of formatted database partitions within objects |
| Mentor(s) | Jeff LeFevre |
| Skills | C++ |
| Description | This project will develop object class methods that merge (or conversely split) formatted data partitions within an object. Self-contained partitions are written (appended) to objects, so over time an object may contain a sequence of independent formatted data structures. A compaction request will invoke a method that iterates over these data structures, combining (or splitting) them into a single larger data structure representing the complete data partition. In essence, this method performs a read-modify-write operation on an object's local data. |
| Link | https://github.com/uccross/skyhookdm-ceph/issues/33 |
| Difficulty | High |
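The read-modify-write pattern of compaction can be sketched in a few lines. This is a simplified model, not Skyhook code: an object is represented as a Python list of partitions, each partition is a list of rows, and the function names are hypothetical. The actual project targets C++ object class methods operating on formatted data such as Flatbuffers or Arrow.

```python
def compact(obj_partitions):
    """Merge all partitions stored in one object into a single partition."""
    merged = []
    for part in obj_partitions:   # read each independent data structure
        merged.extend(part)       # modify: combine rows in append order
    return [merged]               # write back: the object holds one partition


def split(obj_partitions, max_rows):
    """Converse operation: re-chunk the rows into partitions of at most max_rows."""
    rows = [row for part in obj_partitions for row in part]
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


# Three partitions appended to the same object over time.
obj = [[(1, "a")], [(2, "b"), (3, "c")], [(4, "d")]]
compacted = compact(obj)
resplit = split(compacted, max_rows=2)
```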
| Title | Database statistics collection on partitioned data |
| Mentor(s) | Jeff LeFevre |
| Skills | C++ |
| Description | This project will develop object-class methods to compute data statistics (histograms) for each object and store them in a queryable format within each storage server's local RocksDB, then write client code to accumulate all the object-local statistics into global statistics for a given database table. |
| Link | https://github.com/uccross/skyhookdm-ceph/issues/77 |
| Difficulty | High |
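To make the local-then-global accumulation concrete, here is a minimal sketch that assumes equi-width histograms over integer values. The bucket scheme and function names are assumptions for illustration; storage in RocksDB and the C++ object-class interface are omitted.

```python
from collections import Counter


def local_histogram(values, bucket_width):
    """Per-object statistics: count values per bucket, keyed by bucket lower bound."""
    return Counter((v // bucket_width) * bucket_width for v in values)


def global_histogram(local_hists):
    """Client side: sum the object-local histograms into table-wide statistics."""
    total = Counter()
    for hist in local_hists:
        total.update(hist)
    return total


# Column values held by two objects of the same table.
object_a = [1, 4, 12, 17]
object_b = [3, 11, 25]
hists = [local_histogram(obj, 10) for obj in (object_a, object_b)]
table_hist = global_histogram(hists)
```

Because bucket counts are additive, the global histogram can be computed without rereading any object data, provided all objects use the same bucket boundaries.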
| Title | Extend current aggregations to include sort/groupby for database partitions |
| Mentor(s) | Jeff LeFevre |
| Skills | C++ |
| Description | We have developed methods (C++) for data management including data processing and indexing. This project will develop object-class methods to sort/group query result sets. This requires extending the current code (select/project/basic aggregations: min/max/sum/count) to support groupby and/or orderby. |
| Link | https://github.com/uccross/skyhookdm-ceph/issues/23 |
| Difficulty | Medium |
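One common way to support groupby over partitioned data is to aggregate each partition locally and merge the partial results afterwards. The sketch below shows this for a grouped sum followed by an ordering on the group key. It is an assumption about how the work could be divided, not a description of the existing Skyhook code, and the function names are hypothetical.

```python
def partial_group_sum(rows, key_idx, val_idx):
    """Storage side: grouped sum over the rows of a single partition."""
    out = {}
    for row in rows:
        out[row[key_idx]] = out.get(row[key_idx], 0) + row[val_idx]
    return out


def merge_and_order(partials):
    """Client side: merge the partial sums, then order the result by group key."""
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return sorted(merged.items())


part1 = [("east", 5), ("west", 2), ("east", 1)]
part2 = [("west", 7), ("north", 3)]
result = merge_and_order([partial_group_sum(p, 0, 1) for p in (part1, part2)])
```

This works for min, max, sum, and count because their partial results can be combined; aggregates such as a median would need a different strategy.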
Inconsistent relational databases are those that violate one or more integrity constraints defined over their schema. We are developing CAvSAT, which aims to be a scalable and comprehensive system for query answering over inconsistent databases.
| Title | CAvSAT (Consistent Answers via Satisfiability Solving) |
| Mentors | Akhil Dixit, Phokion G. Kolaitis |
| Skills | Java and SQL (required), Node.js and React (optional) |
| Description | In general, computing consistent answers over inconsistent databases is an intractable problem. However, for certain classes of SQL queries and integrity constraints, there is an efficient algorithm to compute consistent answers. The first task of this project is for the student to become familiar with the literature and implement this algorithm. Second, the student will conduct experiments with both synthetic and real-world datasets to compare the performance of this algorithm with other methods. If the student is interested in front-end or full-stack development, they may work on developing some of CAvSAT's user interface components using technologies such as React and write code that connects to CAvSAT's backend via RESTful APIs. |
| Papers for Reference | https://dl.acm.org/doi/10.1145/3299869.3300095, https://link.springer.com/chapter/10.1007/978-3-030-24258-9_8, https://dl.acm.org/doi/10.1145/303976.303983, https://dl.acm.org/doi/10.1145/3068334 |
| Project Link | https://github.com/uccross/cavsat |
| Difficulty | Medium |
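For readers new to the topic, the notion of a consistent answer can be illustrated with a brute-force sketch for a single key constraint: a repair keeps exactly one tuple from each group of tuples sharing a key, and an answer is consistent if the query returns it on every repair. This is neither the efficient algorithm mentioned in the description nor CAvSAT's SAT-based approach; it enumerates all repairs and is exponential in the number of conflicting groups. The table and query are made up for illustration.

```python
from itertools import product


def repairs(rows, key):
    """Yield every repair: one tuple chosen from each group sharing a key value."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)


def consistent_answers(rows, key, query):
    """Answers that the query returns on every repair of the database."""
    answer_sets = [set(query(repair)) for repair in repairs(rows, key)]
    return set.intersection(*answer_sets)


# 'alice' violates the key constraint on name: she appears in two departments.
employees = [("alice", "sales"), ("alice", "hr"), ("bob", "sales")]


def in_sales(db):
    return [name for name, dept in db if dept == "sales"]


result = consistent_answers(employees, key=lambda row: row[0], query=in_sales)
```

Here only `bob` is a consistent answer, since one of the two repairs places `alice` in `hr`.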