-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify clinical data binning related SQL #10823
Conversation
b34da51
to
a3624e0
Compare
@@ -310,17 +310,33 @@ public List<ClinicalDataBin> calculateStaticDataBins( | |||
List<String> filteredUniqueSampleKeys, | |||
List<String> filteredUniquePatientKeys | |||
) { | |||
return NewClinicalDataBinUtil.calculateStaticDataBins( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to revert the legacy implementation for the legacy endpoint. We can't use NewClinicalDataBinUtil.calculateDynamicDataBins
for legacy endpoint anymore since the underlying logic changed significantly.
<select id="getPatientClinicalDataCountsForBinning"> | ||
SELECT | ||
attribute_name as attributeId, | ||
if(attribute_value='', 'NA', attribute_value) AS value, | ||
count(value) as count | ||
FROM clinical_data_derived | ||
<include refid="patientClinicalDataFromStudyViewFilter" /> | ||
AND type = 'patient' | ||
GROUP BY attribute_name, value | ||
</select> | ||
|
||
<select id="getSampleClinicalDataCountsForBinning"> | ||
SELECT | ||
attribute_name as attributeId, | ||
if(attribute_value='', 'NA', attribute_value) AS value, | ||
count(value) as count | ||
FROM clinical_data_derived | ||
<include refid="sampleClinicalDataFromStudyViewFilter" /> | ||
AND type = 'sample' | ||
GROUP BY attribute_name, value | ||
</select> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These queries are very similar to getClinicalDataCounts
query. We may be able to directly use that one instead of adding new SQL.
cbioportal/src/main/resources/org/cbioportal/persistence/mybatisclickhouse/StudyViewMapper.xml
Lines 66 to 74 in b436b2b
<select id="getClinicalDataCounts" resultType="org.cbioportal.model.ClinicalDataCount"> | |
<include refid="getCategoricalClinicalDataCountsQuerySample"> | |
<property name="table_name_prefix" value="sample"/> | |
</include> | |
UNION ALL | |
<include refid="getCategoricalClinicalDataCountsQueryPatient"> | |
<property name="table_name_prefix" value="patient"/> | |
</include> | |
</select> |
@@ -26,6 +26,26 @@ public ClinicalDataBinner( | |||
this.dataBinner = dataBinner; | |||
} | |||
|
|||
// TODO move this to a utility class? | |||
public List<ClinicalData> convertCountsToData(List<ClinicalDataCount> clinicalDataCounts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be private
/** | ||
* Calculate number of clinical data marked actually as "NA", "NAN", or "N/A" | ||
*/ | ||
public Long countNAs(List<Binnable> clinicalData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cast to Long and then later it grabs just the intValue(). Do you need the wrapper class?
bin.setCount(countNAs(clinicalData).intValue());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we don't need a wrapper here. For now, just keeping it consistent with the legacy countNAs
method.
public Long countNAs(List<Binnable> clinicalData, ClinicalDataType clinicalDataType, List<String> ids) { |
We can revisit this once we completely get rid of the legacy functionality.
4b73039
to
1c383c6
Compare
@@ -168,10 +171,13 @@ | |||
</select> | |||
|
|||
|
|||
<sql id="getCategoricalClinicalDataCountsQuerySample"> | |||
<sql id="getClinicalDataCountsQuerySample"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed because this SQL returns both categorical and numeric
@@ -198,10 +204,13 @@ | |||
value | |||
</sql> | |||
|
|||
<sql id="getCategoricalClinicalDataCountsQueryPatient"> | |||
<sql id="getClinicalDataCountsQueryPatient"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed because this SQL returns both categorical and numeric
</include> | ||
</if> | ||
<if test="dataFilterValue.start != null || dataFilterValue.end != null"> | ||
AND match(attribute_value, '^[\d\.]+$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this ignores edge cases like <18
, >90
, etc. there is a separate ticket for this cBioPortal/rfc80-team#32
00157cc
to
c4ae8d0
Compare
List<DataBin> dataBins, | ||
ClinicalDataType clinicalDataType, | ||
List<Binnable> clinicalData, | ||
List<String> ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you improve the variable name "ids" to something more decsriptive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is now only used by the legacy endpoint. The new one doesn't need to deal with sample/patient ids anymore. We should probably add comments to the methods that are now only needed for the legacy endpoint so that we can remove them later once we completely migrate to the new implementation.
src/main/java/org/cbioportal/web/columnar/util/NewClinicalDataBinUtil.java
Show resolved
Hide resolved
9e0d230
to
b2dba31
Compare
Quality Gate passedIssues Measures |
Looks like most of the sonar issues are around the deprecated fn. except for this one. https://sonarcloud.io/project/issues?resolved=false&sinceLeakPeriod=true&pullRequest=10823&id=cBioPortal_cbioportal&open=AZCdbAVZBEMFXTbxsIZU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* simplify clinical data binning related SQL * fix numericalClinicalDataCountFilter
clinical_data_derived
table instead of fetching all the actual clinical data