Proposal: Review of segmentation results quality across various multi-organ segmentation models #1351

fedorov · 2025-01-15T14:59:03Z

Project Description

I will put together a complete project description, since @rkikinis and I definitely plan to work on this, but I do not have time for a complete proposal right now, and I want to make this visible to other participants.

Participants

Andrey Fedorov (technical support) @fedorov
Ron Kikinis (radiology expertise) @rkikinis
David Clunie (radiology expertise) @dclunie (David, hope you don't mind that I enlisted you here!)
Steve Pieper (technical) @pieper

Background

When initially released, TotalSegmentator was perceived to produce superior results, in comparison to the state-of-the-art at the time, anyway.

Over time, some of the deficiencies in the segmentations produced by TotalSegmentator have been identified. Further, new multi-organ segmentation models have been introduced.

Objective

Review segmentation results for a sample of images from IDC NLST collection, noting the problems, across the publicly available multi-organ segmentation models.

Approach

For any model to be evaluated, we need to have segmentation results available for the specific set of CT images from NLST.

At this moment, we have those for

TotalSegmentator v1
MOOSE

We are hopeful to also get segmentations for those selected specific cases for the following (given the list of participants in PW):

OMAS (@aTamaz will you help?)
AutoSeg3D (@diazandr3s can you help?)

We plan to use the SegmentationVerification extension developed by @cpinter in the PW41 earlier for the review (see https://projectweek.na-mic.org/PW41_2024_MIT/Projects/SegmentationVerificationModuleForFinalizingMultiLabelAiSegmentations/).

We plan to summarize the results of the review in a publicly available document.

@pieper or anyone else - any other methods we should consider?

pieper · 2025-01-15T15:11:54Z

@pieper or anyone else - any other methods we should consider?

As part of your review it would be great to define criteria for reviewing segmentations, ideally with respect to use cases of clinical or research interest.

Example use cases could be tracking loss of muscle mass over time or collecting population statistics of bone shapes.

In the context of these use cases, variables like segmentation resolution, Dice scores, true/false positive/negatives, robustness with respect to noise or anatomical variations, and other metrics of segmentation quality could then be objectively compared across segmentation methods.

I'm going to add myself to the project because I'm very interested these topics : )

fedorov · 2025-01-15T15:18:23Z

@pieper I think we will start with collecting the set of cases segmented by each method, identifying cases that have obvious failures of different kind (based on TotalSegmentator results for NLST in IDC), segmenting those failed cases with other methods, and summarizing qualitative observations by Ron/David/any other radiologist. IMHO, quantitative measures are secondary, and can be done independently - they don't necessarily capture expert opinion.

As part of your review it would be great to define criteria for reviewing segmentations, ideally with respect to use cases of clinical or research interest.

Example use cases could be tracking loss of muscle mass over time or collecting population statistics of bone shapes.

For the sake of this specific project, I would like to limit the scope to NLST, at least as the initial step - just to make it manageable. But I agree in principle, absolutely.

pieper · 2025-01-15T15:27:14Z

Yes, the examples I suggested would be the kinds of things I'd be interested in knowing if the NLST segmentation results could be used to assess. E.g. are the results accurate enough to measure something like changes in muscle mass with respect to age and how do the results compare with what's expected from the literature. I think such assessment would help establish the value of large population segmentation.

fedorov · 2025-01-15T15:29:09Z

I agree, and I will include that as an aspirational goal. But let's see how far we can get at the PW!

sjh26 · 2025-01-22T20:42:29Z

@fedorov Has this been converted to a project yet?

fedorov · 2025-01-22T20:51:23Z

Not yet, sorry! I have a deadline tomorrow, and didn't get to this yet.

fedorov · 2025-01-24T18:37:07Z

Superseded by #1393.

fedorov added proposal event:PW42_2025_GranCanaria labels Jan 15, 2025

fedorov assigned sjh26 Jan 15, 2025

fedorov closed this as completed Jan 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Review of segmentation results quality across various multi-organ segmentation models #1351

Proposal: Review of segmentation results quality across various multi-organ segmentation models #1351

fedorov commented Jan 15, 2025 •

edited by pieper

Loading

pieper commented Jan 15, 2025

fedorov commented Jan 15, 2025 •

edited

Loading

pieper commented Jan 15, 2025

fedorov commented Jan 15, 2025

sjh26 commented Jan 22, 2025

fedorov commented Jan 22, 2025

fedorov commented Jan 24, 2025

Proposal: Review of segmentation results quality across various multi-organ segmentation models #1351

Proposal: Review of segmentation results quality across various multi-organ segmentation models #1351

Comments

fedorov commented Jan 15, 2025 • edited by pieper Loading

Project Description

Participants

Background

Objective

Approach

pieper commented Jan 15, 2025

fedorov commented Jan 15, 2025 • edited Loading

pieper commented Jan 15, 2025

fedorov commented Jan 15, 2025

sjh26 commented Jan 22, 2025

fedorov commented Jan 22, 2025

fedorov commented Jan 24, 2025

fedorov commented Jan 15, 2025 •

edited by pieper

Loading

fedorov commented Jan 15, 2025 •

edited

Loading