
Proposal: Review of segmentation results quality across various multi-organ segmentation models #1351

Open
fedorov opened this issue Jan 15, 2025 · 6 comments


@fedorov
Member

fedorov commented Jan 15, 2025

Project Description

I will put together a complete project description, since @rkikinis and I definitely plan to work on this, but I do not have time for a complete proposal right now, and I want to make this visible to other participants.

Participants

  • Andrey Fedorov (technical support) @fedorov
  • Ron Kikinis (radiology expertise) @rkikinis
  • David Clunie (radiology expertise) @dclunie (David, hope you don't mind that I enlisted you here!)
  • Steve Pieper (technical) @pieper

Background

When initially released, TotalSegmentator was perceived to produce results superior to the state of the art at the time.

Over time, some of the deficiencies in the segmentations produced by TotalSegmentator have been identified. Further, new multi-organ segmentation models have been introduced.

Objective

Review segmentation results for a sample of images from the IDC NLST collection across the publicly available multi-organ segmentation models, noting any problems.

Approach

For any model to be evaluated, we need to have segmentation results available for the specific set of CT images from NLST.

At the moment, we have those for:

  • TotalSegmentator v1
  • MOOSE

We hope to also obtain segmentations of those selected cases for the following models (given the list of participants at the PW):

We plan to use the SegmentationVerification extension developed by @cpinter at PW41 for the review (see https://projectweek.na-mic.org/PW41_2024_MIT/Projects/SegmentationVerificationModuleForFinalizingMultiLabelAiSegmentations/).

We plan to summarize the results of the review in a publicly available document.

@pieper or anyone else - any other methods we should consider?

@pieper
Contributor

pieper commented Jan 15, 2025

> @pieper or anyone else - any other methods we should consider?

As part of your review it would be great to define criteria for reviewing segmentations, ideally with respect to use cases of clinical or research interest.

Example use cases could be tracking loss of muscle mass over time or collecting population statistics of bone shapes.

In the context of these use cases, variables like segmentation resolution, Dice scores, true/false positive/negatives, robustness with respect to noise or anatomical variations, and other metrics of segmentation quality could then be objectively compared across segmentation methods.
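As a rough sketch of how such metrics might be computed, the snippet below compares a predicted binary mask against a reference mask and reports the Dice coefficient together with the raw true/false positive/negative counts mentioned above. This is only an illustration, assuming per-organ binary NumPy masks already resampled to a common grid; `dice_and_confusion` is a hypothetical helper name, not part of any of the tools discussed here.

```python
import numpy as np

def dice_and_confusion(pred: np.ndarray, truth: np.ndarray):
    """Compare a predicted binary mask against a reference binary mask.

    Returns the Dice coefficient and the confusion counts
    (true/false positives/negatives) as a dict.
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    # Dice = 2*TP / (2*TP + FP + FN); define empty-vs-empty as perfect overlap
    dice = 2.0 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    return dice, {"tp": int(tp), "fp": int(fp), "fn": int(fn), "tn": int(tn)}

# Toy 2x2 example: one overlapping voxel, one predicted-only, one truth-only
pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
dice, counts = dice_and_confusion(pred, truth)
# dice = 2*1 / (2*1 + 1 + 1) = 0.5
```

In practice these per-organ numbers would be aggregated across the selected NLST cases so that the methods can be compared on the same footing.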

I'm going to add myself to the project because I'm very interested in these topics :)

@fedorov
Member Author

fedorov commented Jan 15, 2025

@pieper I think we will start with collecting the set of cases segmented by each method, identifying cases that have obvious failures of different kinds (based on TotalSegmentator results for NLST in IDC), segmenting those failed cases with other methods, and summarizing qualitative observations by Ron/David/any other radiologist. IMHO, quantitative measures are secondary and can be done independently - they don't necessarily capture expert opinion.

> As part of your review it would be great to define criteria for reviewing segmentations, ideally with respect to use cases of clinical or research interest.
>
> Example use cases could be tracking loss of muscle mass over time or collecting population statistics of bone shapes.

For the sake of this specific project, I would like to limit the scope to NLST, at least as the initial step - just to make it manageable. But I agree in principle, absolutely.

@pieper
Contributor

pieper commented Jan 15, 2025

Yes, the examples I suggested are the kinds of things I'd be interested in knowing whether the NLST segmentation results could be used to assess. E.g., are the results accurate enough to measure something like changes in muscle mass with respect to age, and how do the results compare with what's expected from the literature? I think such an assessment would help establish the value of large-population segmentation.

@fedorov
Member Author

fedorov commented Jan 15, 2025

I agree, and I will include that as an aspirational goal. But let's see how far we can get at the PW!

@sjh26
Contributor

sjh26 commented Jan 22, 2025

@fedorov Has this been converted to a project yet?

@fedorov
Member Author

fedorov commented Jan 22, 2025

Not yet, sorry! I have a deadline tomorrow and haven't gotten to it yet.
