Proposal: Review of segmentation results quality across various multi-organ segmentation models #1351
Comments
As part of your review it would be great to define criteria for reviewing segmentations, ideally with respect to use cases of clinical or research interest. Example use cases could be tracking loss of muscle mass over time or collecting population statistics of bone shapes. In the context of these use cases, variables like segmentation resolution, Dice scores, true/false positives/negatives, robustness with respect to noise or anatomical variations, and other metrics of segmentation quality could then be objectively compared across segmentation methods. I'm going to add myself to the project because I'm very interested in these topics :)
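A minimal sketch of how one such overlap metric might be computed, assuming binary masks of the same organ from two methods already resampled to a common grid (the variable names are hypothetical):

```python
import numpy as np

def dice_score(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    # Convention: two empty masks agree perfectly.
    return 2.0 * intersection / denom if denom else 1.0

# Hypothetical usage: same-organ masks produced by two different models.
# print(dice_score(liver_mask_model_a, liver_mask_model_b))
```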
@pieper I think we will start with collecting the set of cases segmented by each method, identifying cases that have obvious failures of different kinds (based on TotalSegmentator results for NLST in IDC), segmenting those failed cases with other methods, and summarizing qualitative observations by Ron/David/any other radiologist. IMHO, quantitative measures are secondary and can be done independently; they don't necessarily capture expert opinion.
For the sake of this specific project, I would like to limit the scope to NLST, at least as the initial step, just to make it manageable. But I agree in principle, absolutely.
Yes, the examples I suggested would be the kinds of things I'd be interested in knowing whether the NLST segmentation results could be used to assess. E.g., are the results accurate enough to measure something like changes in muscle mass with respect to age, and how do the results compare with what's expected from the literature? I think such an assessment would help establish the value of large-population segmentation.
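A minimal sketch of the kind of derived measurement this implies, assuming a binary muscle mask saved as NIfTI (the file name is hypothetical; per-structure masks in the style of TotalSegmentator output are assumed):

```python
import numpy as np
import nibabel as nib

# Hypothetical file name; any binary muscle mask exported by a model would do.
mask_img = nib.load("autochthon_left.nii.gz")
mask = np.asarray(mask_img.dataobj) > 0

# Physical volume of a single voxel in mm^3, from the header spacings.
voxel_volume_mm3 = float(np.prod(mask_img.header.get_zooms()[:3]))

muscle_volume_ml = mask.sum() * voxel_volume_mm3 / 1000.0
print(f"Muscle volume: {muscle_volume_ml:.1f} mL")
```

Tracking such volumes across timepoints, and against age, is the sort of population-level measurement that would depend on the segmentation quality reviewed here.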
I agree, and I will include that as an aspirational goal. But let's see how far we can get at the PW!
@fedorov Has this been converted to a project yet?
Not yet, sorry! I have a deadline tomorrow and haven't gotten to it yet.
Project Description
I will put together a complete project description, since @rkikinis and I definitely plan to work on this, but I do not have time for a complete proposal right now, and I want to make this visible to other participants.
Participants
Background
When initially released, TotalSegmentator was perceived to produce results superior to the state of the art at the time.
Over time, some of the deficiencies in the segmentations produced by TotalSegmentator have been identified. Further, new multi-organ segmentation models have been introduced.
Objective
Review segmentation results from publicly available multi-organ segmentation models for a sample of images from the IDC NLST collection, noting any problems.
Approach
For any model to be evaluated, we need to have segmentation results available for the specific set of CT images from NLST.
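Pulling identical input series for every model could look something like the sketch below. This assumes the idc-index Python package and its download_from_selection method; the series UID is a placeholder, not an actual review case:

```python
# Sketch only: fetch one specific NLST CT series from IDC so that all
# models under review segment identical inputs.
# Assumes the idc-index package (pip install idc-index).
from idc_index import index

client = index.IDCClient()

# Placeholder SeriesInstanceUID; real review cases would be selected from
# TotalSegmentator failures observed on NLST in IDC.
series_uid = "1.2.840.0.0.0"

client.download_from_selection(
    seriesInstanceUID=series_uid,
    downloadDir="./nlst_review_cases",
)
```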
At this moment, we have those for
We hope to also obtain segmentations of those selected cases for the following models (given the list of PW participants):
We plan to use the SegmentationVerification extension, developed by @cpinter earlier at PW41, for the review (see https://projectweek.na-mic.org/PW41_2024_MIT/Projects/SegmentationVerificationModuleForFinalizingMultiLabelAiSegmentations/).
We plan to summarize the results of the review in a publicly available document.
@pieper or anyone else - any other methods we should consider?