Re-implement Study View's data binning algorithm using SQL (instead of Java) #117
Labels
cBioPortal
Difficulty: Medium
enhancement
GSoC-2025
GSoC 2025 Candidate Projects
Java
Size: Medium (175h)
SQL
Background:
cBioPortal is an open-source platform that provides a web interface for exploring, visualizing, and analyzing cancer genomics data, and it has grown to be widely used by researchers and clinicians worldwide. The current interface provides comprehensive tools for individual patient data exploration, including mutations, copy number variations, and clinical information, as well as cohort exploration, analytics, and cohort comparisons.
The endpoints that drive the histogram charts on the cBioPortal Study View calculate data bins and return them to the frontend for display. To do this, they must fetch the underlying data from the database and run it through custom binning logic written in Java. This does not perform well for large data sets. The binning should instead be done in the database query, so that we don't have to return voluminous data and hold it in web server memory. ClickHouse, the new database we are adopting, provides functions to do this.
Goal:
Optimize the cBioPortal Study View data binning algorithm by replacing the existing logic written in Java and re-implementing it so that the heavy lifting is performed by the database instead.
Approach:
We believe this is possible using the roundDown function of ClickHouse (our new OLAP database).
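As a rough illustration of the idea, the sketch below mirrors in plain Java what ClickHouse's roundDown(num, arr) is documented to do (map a value to the largest array element that does not exceed it, or to the lowest element when the value is below all of them) and then counts values per bin. It also shows the general shape of a query that would push the same grouping into the database. The class name, the table `clinical_values`, and the column `attribute_value` are hypothetical names chosen for this example, not the actual cBioPortal schema or code, and the real Study View binning rules have more cases than this sketch covers.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BinningSketch {

    // Mirrors the documented behavior of ClickHouse roundDown(num, arr):
    // returns the largest boundary <= value, or the lowest boundary
    // when the value is smaller than every boundary.
    // Assumes sortedBoundaries is non-empty and sorted ascending.
    public static double roundDown(double value, double[] sortedBoundaries) {
        double result = sortedBoundaries[0];
        for (double boundary : sortedBoundaries) {
            if (boundary <= value) {
                result = boundary;
            } else {
                break;
            }
        }
        return result;
    }

    // In-memory equivalent of "GROUP BY roundDown(value, boundaries)":
    // maps each bin start to the number of values that fall into that bin.
    public static Map<Double, Long> countBins(double[] values, double[] boundaries) {
        double[] sorted = boundaries.clone();
        Arrays.sort(sorted);
        Map<Double, Long> counts = new TreeMap<>();
        for (double value : values) {
            counts.merge(roundDown(value, sorted), 1L, Long::sum);
        }
        return counts;
    }

    // Builds the shape of a ClickHouse query that does the binning in the database.
    // The table and column names passed in are hypothetical; a real implementation
    // should not concatenate untrusted identifiers into SQL like this.
    public static String buildQuery(String table, String column, double[] boundaries) {
        String arrayLiteral = Arrays.stream(boundaries)
                .mapToObj(b -> Double.toString(b))
                .collect(Collectors.joining(", ", "[", "]"));
        return "SELECT roundDown(" + column + ", " + arrayLiteral + ") AS bin_start, "
                + "count() AS sample_count "
                + "FROM " + table + " "
                + "GROUP BY bin_start ORDER BY bin_start";
    }

    public static void main(String[] args) {
        double[] boundaries = {0, 10, 20, 50};
        double[] values = {3, 12, 18, 25, 49, 50, 71, -4};

        // prints {0.0=2, 10.0=2, 20.0=2, 50.0=2}
        System.out.println(countBins(values, boundaries));

        // Hypothetical table and column names, for illustration only.
        System.out.println(buildQuery("clinical_values", "attribute_value", boundaries));
    }
}
```

With the grouping done in the query, the web server would receive one row per bin rather than one row per underlying value, which is the memory and transfer saving described in the Background section.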
This project requires:
Possible mentors:
@alisman