LOTUS: Learning to Optimize Task-based US representations
MICCAI 2023 (Oral Presentation)
Yordanka Velikova1
Mohammad Farid Azampour1,2
Walter Simson3
Vanessa Gonzalez Duque1,4
Nassir Navab1
1Technical University of Munich
2Sharif University of Technology
3Stanford University
4Ecole Centrale Nantes
[Paper]
[Code]
[Bibtex]

Abstract

Anatomical segmentation of organs in ultrasound images is essential to many clinical applications, particularly for diagnosis and monitoring. Existing deep neural networks require a large amount of labeled data for training in order to achieve clinically acceptable performance. Yet, in ultrasound, due to characteristic properties such as speckle and clutter, it is challenging to obtain accurate segmentation boundaries, and precise pixel-wise labeling of images is highly dependent on the expertise of physicians. In contrast, CT scans have higher resolution and improved contrast, easing organ identification. In this paper, we propose a novel approach for learning to optimize task-based ultrasound image representations. Given annotated CT segmentation maps as a simulation medium, we model acoustic propagation through tissue via ray-casting to generate ultrasound training data. Our ultrasound simulator is fully differentiable and learns to optimize the parameters for generating physics-based ultrasound images guided by the downstream segmentation task. In addition, we train an image adaptation network between real and simulated images to achieve simultaneous image synthesis and automatic segmentation on US images in an end-to-end training setting. The proposed method is evaluated on aorta and vessel segmentation tasks and shows promising quantitative results. Furthermore, we also present qualitative results of optimized image representations on other organs.


Video


Differentiable Ultrasound Renderer

Differentiable Ultrasound Renderer Pipeline

The Differentiable Ultrasound Renderer pipeline builds on the mathematical foundations of ray tracing and ultrasound echo generation. The renderer takes a 2D label map of tissue labels as input; each tissue label is assigned five parameters that describe ultrasound-specific tissue characteristics and control the rendering: an attenuation coefficient, an acoustic impedance, and three parameters defining the speckle distribution. Modeling ultrasound waves as rays that start at the transducer and propagate through the media according to physical laws, the renderer generates three sub-maps: an attenuation map, a reflection map, and a scatter map.
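The per-tissue parameterization and the three sub-maps can be sketched in PyTorch as below. This is a minimal illustration, not the paper's implementation: rays are assumed to travel straight down the image columns, the speckle model is reduced to a Gaussian, and all names (`DiffUSRenderer`, `scatter`, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class DiffUSRenderer(nn.Module):
    """Sketch of a differentiable ray-based US renderer.

    Each tissue label carries five learnable parameters: an attenuation
    coefficient, an acoustic impedance, and three speckle-distribution
    parameters (here simplified to mean / std / density of a Gaussian).
    Because everything below is built from differentiable tensor ops,
    gradients from a downstream task loss can reach these parameters.
    """

    def __init__(self, num_labels: int):
        super().__init__()
        self.attenuation = nn.Parameter(torch.rand(num_labels))
        self.impedance = nn.Parameter(torch.rand(num_labels) + 0.5)
        self.scatter = nn.Parameter(torch.rand(num_labels, 3))

    def forward(self, label_map: torch.Tensor) -> torch.Tensor:
        # label_map: (H, W) integer tissue labels; rays run along dim 0.
        alpha = self.attenuation[label_map]   # per-pixel attenuation
        z = self.impedance[label_map]         # per-pixel acoustic impedance

        # Attenuation map: energy remaining after cumulative absorption
        # along each ray (Beer-Lambert-style decay).
        attenuation_map = torch.exp(-torch.cumsum(alpha, dim=0))

        # Reflection map: interface reflectivity ((z2 - z1) / (z2 + z1))^2
        # between consecutive depths along each ray.
        z1, z2 = z[:-1], z[1:]
        refl = ((z2 - z1) / (z2 + z1 + 1e-8)) ** 2
        reflection_map = torch.cat([refl, torch.zeros_like(refl[:1])], dim=0)

        # Scatter map: speckle drawn from the per-tissue distribution
        # (simplified to mean + std * noise here).
        mu, sigma = self.scatter[label_map, 0], self.scatter[label_map, 1]
        scatter_map = mu + sigma * torch.randn_like(mu)

        # Combine the sub-maps into a US-like image (simplified mixing).
        return attenuation_map * (reflection_map + scatter_map)
```

Indexing the parameter vectors with the label map is what makes the rendering differentiable with respect to the tissue parameters, so the downstream segmentation loss can refine them.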

Optimized Image Representations



We run LOTUS for five segmentation tasks. The training data is generated on the fly, and a separate segmentation network is trained for each task. Initially the images are rendered with the renderer's default parameters, which are then iteratively refined during training to produce optimized images for each specific task. The spine, kidney, and liver progressively brighten, vessels darken, and surrounding smaller organs fade, creating a homogeneous background and enhancing the primary organ's visibility in the ultrasound simulations.

End-to-End Training Pipeline

End-to-End Training Pipeline
During training, US-like images are rendered from CT label maps and used as training data. The ultrasound renderer learns to optimize its parameters based on the downstream segmentation task. At the same time, we train an unpaired, unsupervised image style-transfer network between real and rendered images to achieve simultaneous image synthesis and automatic segmentation of US images in an end-to-end training setting.
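The core of the end-to-end loop, rendering on the fly and letting the segmentation loss update the renderer's tissue parameters, can be sketched as follows. Both networks are toy stand-ins (a per-tissue brightness lookup and a 1x1 convolution), not the paper's architectures, and the unpaired style-transfer branch is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class ToyRenderer(nn.Module):
    """Stand-in renderer: one learnable brightness per tissue label."""
    def __init__(self, num_labels: int):
        super().__init__()
        self.intensity = nn.Parameter(torch.rand(num_labels))

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        # (H, W) integer labels -> (1, 1, H, W) rendered image
        return self.intensity[labels].unsqueeze(0).unsqueeze(0)

renderer = ToyRenderer(num_labels=3)
seg_net = nn.Conv2d(1, 3, kernel_size=1)   # trivial per-pixel classifier
opt = torch.optim.Adam(
    list(renderer.parameters()) + list(seg_net.parameters()), lr=1e-2
)

ct_labels = torch.randint(0, 3, (16, 16))  # stand-in CT label map
losses = []
for _ in range(50):
    opt.zero_grad()
    sim = renderer(ct_labels)              # render training image on the fly
    loss = F.cross_entropy(seg_net(sim), ct_labels.unsqueeze(0))
    loss.backward()                        # gradients flow into the renderer
    opt.step()
    losses.append(loss.item())
```

The key point is that the optimizer updates the renderer's parameters and the segmentation network jointly, so the rendered appearance drifts toward whatever makes the downstream task easiest.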

Results

Results during inference

Results during inference: real US images are first passed through the image adaptation network and then through the segmentation network. The real US images are translated to the optimized US image appearance learned during training for the vessel segmentation task.
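The two-stage inference path, adaptation first and segmentation second, amounts to a simple composition. The modules below are untrained placeholders standing in for the trained networks; their names and shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Placeholders for the trained networks (illustrative, not the paper's models):
adapter = nn.Identity()          # image adaptation: real US -> optimized appearance
seg_net = nn.Conv2d(1, 2, 1)     # segmentation head (background / vessel)

def segment(real_us: torch.Tensor) -> torch.Tensor:
    """Inference: translate the real US image, then segment it."""
    with torch.no_grad():
        optimized = adapter(real_us)     # optimized US appearance
        logits = seg_net(optimized)      # (N, 2, H, W) class scores
        return logits.argmax(dim=1)      # (N, H, W) per-pixel labels

mask = segment(torch.rand(1, 1, 64, 64))
```

Because the adaptation network maps real images into the same optimized appearance the segmentation network was trained on, no real-image labels are needed at any point.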


Webpage template from source code.