Re-implement Study View's data binning algorithm using SQL (instead of Java) #117

alisman · 2025-02-04T17:40:54Z

Background:
cBioPortal is an open-source platform that provides a web interface for exploring, visualizing, and analyzing cancer genomics data, and it has grown to be widely used by researchers and clinicians worldwide. The current interface provides comprehensive tools for individual patient data exploration, including mutations, copy number variations, and clinical information, as well as cohort exploration, analytics, and cohort comparisons.

The endpoints that drive the histogram charts on the cBioPortal Study View calculate data bins and return them to the frontend for display. To do this, they must fetch the underlying data from the database and run it through custom binning logic written in Java. This performs poorly on large data sets, because all of the underlying data has to be returned from the database and kept in web server memory. The binning should instead be done in the database query, so that we no longer have to return that voluminous data. ClickHouse, the new database we are adopting, provides functions to do this.

Goal:
Optimize the cBioPortal Study View data binning algorithm by re-implementing the existing Java logic in SQL, so that the heavy lifting is performed by the database instead of the web server.

Approach:
We believe this is possible using the roundDown function of ClickHouse (our new OLAP database).

This project requires:

1. Understanding the specific requirements of binning in cBioPortal (e.g. custom bin definitions).
2. Meeting these requirements using roundDown.
3. If step 2 proves infeasible, we may resort to ClickHouse's user-defined functions.
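
As a rough illustration of the idea, here is a minimal sketch of binning with ClickHouse's roundDown(num, arr), which rounds a number down to the nearest element of a sorted array of bin lower bounds. The bin boundaries and sample values below are made up for the example and are not taken from cBioPortal. A real query would read from the actual clinical or genomic data tables and would need to handle cBioPortal's custom bin definitions.

```sql
-- Sketch only: hypothetical bin lower bounds and inline sample values.
-- roundDown maps each value to the largest array element that is <= the value.
SELECT
    roundDown(value, [0, 10, 20, 30, 40]) AS bin_start,  -- lower bound of the bin
    count() AS sample_count                              -- number of values in the bin
FROM
(
    -- Stand-in for the real data source; arrayJoin expands the array into rows.
    SELECT arrayJoin([3, 12, 17, 25, 38, 41]) AS value
)
GROUP BY bin_start
ORDER BY bin_start;
```

With this approach only one row per bin is returned to the web server rather than one row per data point. Note that roundDown returns the lowest array element for values below it, so out-of-range values would need separate handling if they must not fall into the first bin.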

Possible mentors:
@alisman

alisman added GSoC-2025 GSoC 2025 Candidate Projects Size: Medium (175h) Difficulty: Medium labels Feb 4, 2025

ao508 added enhancement cBioPortal Java SQL labels Feb 6, 2025

alisman assigned alisman and unassigned alisman Feb 6, 2025
