Re-implement Study View's data binning algorithm using SQL (instead of Java) #117

Open
alisman opened this issue Feb 4, 2025 · 0 comments

Background:
cBioPortal is an open-source platform that provides a web interface for exploring, visualizing, and analyzing cancer genomics data, and it has grown to be widely used by researchers and clinicians worldwide. The current interface provides comprehensive tools for individual patient data exploration (mutations, copy number variations, and clinical information) as well as cohort exploration, analytics, and cohort comparisons.

The endpoints that drive the histogram charts on the cBioPortal Study View calculate data bins and return them to the frontend for display. To do this, they must fetch the underlying data from the database and run it through custom binning logic written in Java. This performs poorly on large data sets, because every underlying value has to be transferred to the web server and held in its memory before it can be binned. The binning should instead be done in the database query, so that only the bins and their counts are returned. ClickHouse, the new database we are adopting, provides functions to do this.

Goal:
Optimize the cBioPortal Study View data binning algorithm by re-implementing the existing Java logic in SQL, so that the heavy lifting is performed by the database instead of the web server.

Approach:
We believe this is possible using the `roundDown` function of ClickHouse (our new OLAP database).

This project requires:

  1. Understanding the specific requirements of binning in cBioPortal (e.g. custom bin definitions).
  2. Meeting these requirements using `roundDown`.
  3. If step 2 proves infeasible, we may resort to ClickHouse's User Defined Functions.
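
To make the idea concrete, here is a minimal sketch in Python of how `roundDown`-style binning behaves. It is not the existing Java logic and not a proposed implementation. `round_down` emulates the documented semantics of ClickHouse's `roundDown(x, arr)`: a value is mapped to the largest array element that does not exceed it, and values below the lowest element are mapped to the lowest element. The helper `build_binning_sql`, the table name `clinical_data`, and the column name `attr_value` are hypothetical and only illustrate what such a query might look like.

```python
from bisect import bisect_right
from collections import Counter


def round_down(x, boundaries):
    """Emulate ClickHouse roundDown(x, arr).

    Returns the largest boundary <= x. If x is below every boundary,
    the lowest boundary is returned.
    """
    arr = sorted(boundaries)
    i = bisect_right(arr, x)  # number of boundaries <= x
    return arr[max(i - 1, 0)]


def bin_counts(values, boundaries):
    """Count values per bin, keyed by the bin's lower boundary."""
    counts = Counter(round_down(v, boundaries) for v in values)
    return dict(sorted(counts.items()))


def build_binning_sql(table, column, boundaries):
    """Build a query that bins inside the database (hypothetical helper).

    The table and column names passed in are assumptions for illustration.
    """
    arr = ", ".join(str(b) for b in sorted(boundaries))
    return (
        f"SELECT roundDown({column}, [{arr}]) AS bin_start, count() AS n "
        f"FROM {table} GROUP BY bin_start ORDER BY bin_start"
    )


if __name__ == "__main__":
    ages = [5, 12, 25, 37, 61, 70]
    print(bin_counts(ages, [0, 20, 40, 60]))
    print(build_binning_sql("clinical_data", "attr_value", [0, 20, 40, 60]))
```

Note that values below the lowest boundary are folded into the first bin, and empty bins do not appear in the result at all. Whether that matches the behavior required for custom bin definitions is part of what step 1 needs to establish.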

Possible mentors:
@alisman
