NA-MIC Project Weeks

Evaluating concordance of AI-based anatomy segmentation models

Key Investigators

Project Description

Quantitative analysis of large-scale medical imaging datasets can be streamlined using automated segmentation. However, the growing number of AI-based methods for anatomical segmentation raises a central challenge: choosing among functionally similar models is difficult because of the absence of ground truth data for representative samples and the practical obstacles to comparing segmentation results (inconsistent structure naming, non-uniform formats, and complexity of visualization). Our work alleviates these issues by evaluating six open-source segmentation models—TotalSegmentator 1.5 and 2.6, Auto3DSeg, Moose, MultiTalent, and OMAS—on a sample of CT scans from the publicly available National Lung Screening Trial (NLST) dataset. We analyzed 31 anatomical structures—lungs, vertebrae, ribs, and heart—after harmonizing the segmentation results into a consistent representation. To support visual comparison, we developed open-source tools in 3D Slicer that automate loading, structure-wise inspection, and comparison across models. For quantitative comparison, we evaluated consensus segmentations per structure and assessed model agreement using the Dice similarity coefficient and volume differences. Preliminary results show excellent agreement for some structures (e.g., lungs) but not all (e.g., some models produce invalid vertebra or rib segmentations). Only one model, Moose, segmented the costovertebral joints (the rib-to-spine connections). Overall, this work assists model evaluation in the absence of ground truth, ultimately enabling informed model selection.
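The agreement metrics described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual evaluation code: `dice`, `volume_difference`, and `majority_vote` are hypothetical helper names, and the consensus here is a simple per-voxel majority vote over binary masks, one plausible way to form a consensus segmentation when no ground truth exists.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def volume_difference(a: np.ndarray, b: np.ndarray,
                      voxel_volume_mm3: float = 1.0) -> float:
    """Absolute volume difference between two binary masks, in mm^3."""
    return abs(int(a.astype(bool).sum()) - int(b.astype(bool).sum())) * voxel_volume_mm3

def majority_vote(masks: list) -> np.ndarray:
    """Consensus mask: voxels selected by more than half of the models."""
    stack = np.stack([m.astype(bool) for m in masks])
    return stack.sum(axis=0) > (len(masks) / 2)
```

In practice, each model's output for a given structure would be resampled onto a common voxel grid before these per-structure comparisons are made.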

Objective

This project builds upon the previous work “Review of segmentation results quality across various multi-organ segmentation models”, conducted during the last Project Week in Gran Canaria. The goal is to systematically evaluate and compare the segmentations produced by six publicly available multi-organ segmentation models by identifying areas of agreement and disagreement across anatomical structures in our dataset, for which ground truth segmentations are unavailable.

During this Project Week, we will improve upon and extend the previous analysis by broadening the scope of comparison and by engaging with users of the evaluated models as well as with the model developers.

Approach and Plan

  1. Discuss the current analysis results with developers and interested community members
    • discuss problematic results
    • learn about other observed or potential errors we should investigate
    • discuss approaches for selecting a representative test data sample from NLST
    • collect feedback, observations, and suggestions for improvement
  2. Improve and extend the analysis based on this discussion
  3. Secondary: improve the Slicer Segmentation Verification module extension used for visual comparison
    • revisit loading of DICOM SEG as a Segmentation node and identify optimizations that speed up node creation (the load process needs further profiling first)
    • discuss newly developed features designed to simplify comparison of results from different models
    • discuss possible performance improvements when populating a Segmentation node loaded from DICOM SEG
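As a starting point for the profiling mentioned above, the load call can be wrapped with Python's built-in `cProfile`. The helper below is a generic sketch; `profile_call` is a hypothetical name, and inside Slicer one would pass the actual DICOM SEG load routine (e.g., a `DICOMUtils` series load) as the callable.

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report of hot spots)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
    stats.print_stats(20)  # top 20 entries by cumulative time
    return result, buf.getvalue()
```

The report highlights which functions dominate the load time, which is the information needed before deciding where an optimization of Segmentation node creation would pay off.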

Progress and Next Steps

Current Status of our Analysis:

The Slicer Segmentation Verification Module Extension currently:

Illustrations

An interactive poster with a summary of the results and the current state of the project can be found at the following link: https://www.dropbox.com/scl/fi/c84sm9djytyi80jk2ixfa/giebeler.lena.pptx?rlkey=g3sf82zuv5fgmuog0an3dsy96&dl=0

Current Status of the Slicer Segmentation Verification Module Extension:

Background and References