Lung Nodule Localization

June 2018

Suman Gunnala, Anil Luthra, Tony Reina, Kyle Shannon


Computed tomography (CT) has recently replaced conventional X-ray as the primary screening tool for lung cancer because it has been shown to reduce mortality by as much as 20% in high-risk patients. Unfortunately, CT screening has a high false positive rate (FPR). At least one 'lung nodule' is detected in half of all CT scans, but only 10% of these 'nodules' are in fact cancerous. The goal of this project was to improve both the localization and the classification of lung nodules using deep learning methods on the LUNA16 dataset.

tensorflow amazon-aws cnn-unet

Lung Nodule Localization — Inference through CT scan showing 3 axial planes

Overview

This master's thesis project developed deep learning pipelines for automated lung nodule detection and classification from CT scan data, targeting the high false positive rate of CT-based lung cancer screening. Using the LUNA16 dataset, we built a two-stage approach: U-Net architectures localize candidate nodules, and a classification model then filters out false positives.

Technical Approach

Our methodology consisted of three primary components:

Data preprocessing pipeline to normalize and segment CT scan volumes, extracting regions of interest while maintaining spatial context
3D U-Net architecture for nodule detection and localization across axial, coronal, and sagittal planes
CNN-based classification model for false positive reduction, trained on labeled nodule candidates
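
The preprocessing component listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the Hounsfield-unit window of -1000 to 400, the 32-voxel patch size, and the function names `normalize_hu` and `extract_patch` are all assumptions chosen for the example.

```python
import numpy as np

# Assumed Hounsfield-unit (HU) window; illustrative values, not the project's.
HU_MIN, HU_MAX = -1000.0, 400.0


def normalize_hu(volume, hu_min=HU_MIN, hu_max=HU_MAX):
    """Clip a CT volume to [hu_min, hu_max] and rescale it to [0, 1]."""
    clipped = np.clip(volume.astype(np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


def extract_patch(volume, center, size=32):
    """Cut a cubic patch of side `size` around `center` (z, y, x voxel indices).

    The volume is zero-padded first so that patches near the border
    keep their full shape.
    """
    half = size // 2
    padded = np.pad(volume, half, mode="constant", constant_values=0)
    # After padding by `half`, original index c sits at c + half, so the
    # window starting at c - half in the original starts at c in `padded`.
    z, y, x = (int(c) for c in center)
    return padded[z:z + size, y:y + size, x:x + size]
```

Normalizing before patch extraction means the zero padding corresponds to the lowest intensity in the window, so border patches are filled with air-like values rather than an arbitrary constant.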

Results

The models achieved competitive performance on the LUNA16 challenge benchmarks. Our U-Net-based detector demonstrated strong sensitivity for nodule localization, while the classification pipeline substantially reduced false positives. The Free-Response Receiver Operating Characteristic (FROC) curves, which plot detection sensitivity against the average number of false positives per scan, showed improvements over baseline methods, particularly for smaller nodules, which are the most challenging to detect.
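
To make the FROC metric concrete, the sketch below computes sensitivity as a function of false positives per scan from a list of scored candidates. It is a simplified illustration rather than the project's evaluation code: it assumes every true-positive candidate matches a distinct nodule, ignores tied scores, and uses hypothetical function names. The LUNA16 challenge summarizes such a curve by averaging sensitivity at seven operating points between 1/8 and 8 false positives per scan.

```python
import numpy as np


def froc_points(scores, is_nodule, n_scans, n_nodules):
    """Return (false positives per scan, sensitivity) at each score threshold."""
    scores = np.asarray(scores, dtype=float)
    is_nodule = np.asarray(is_nodule, dtype=bool)
    order = np.argsort(-scores)                  # highest-scoring candidates first
    hits = np.cumsum(is_nodule[order])           # true positives kept so far
    false_alarms = np.cumsum(~is_nodule[order])  # false positives kept so far
    return false_alarms / n_scans, hits / n_nodules


def sensitivity_at(fp_per_scan, sensitivity, fp_rate):
    """Best sensitivity reachable with at most `fp_rate` false positives per scan."""
    allowed = fp_per_scan <= fp_rate
    return float(sensitivity[allowed].max()) if allowed.any() else 0.0
```

For example, calling `froc_points` on five candidates from two scans and then `sensitivity_at(fps, sens, 1.0)` gives the fraction of nodules found when one false positive per scan is tolerated.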

Technical Implementation

TensorFlow and Keras for deep learning model development and training
AWS EC2 P2 instances with NVIDIA K80 GPUs for model training
Custom data pipeline for processing large medical imaging datasets (DICOM format)
Extensive data augmentation including rotations, translations, and intensity transforms
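
The augmentation step in the list above could look something like the following NumPy sketch. The specific choices are assumptions made for illustration: rotations are restricted to multiples of 90 degrees in the axial plane, translations are a few voxels per axis, the intensity transform is a small additive offset on patches already scaled to [0, 1], and the function name `augment` is hypothetical.

```python
import numpy as np


def augment(patch, rng, max_shift=2, max_intensity_shift=0.05):
    """Return a randomly rotated, shifted, and intensity-adjusted copy of a 3D patch."""
    # Rotation: a random multiple of 90 degrees in the axial (y, x) plane.
    out = np.rot90(patch, k=int(rng.integers(0, 4)), axes=(1, 2))
    # Translation: shift each axis by up to `max_shift` voxels. np.roll wraps
    # voxels around the edges, which is acceptable only for small shifts.
    shifts = rng.integers(-max_shift, max_shift + 1, size=3)
    out = np.roll(out, shift=tuple(int(s) for s in shifts), axis=(0, 1, 2))
    # Intensity: add a small constant offset, then keep values in [0, 1].
    offset = rng.uniform(-max_intensity_shift, max_intensity_shift)
    return np.clip(out + offset, 0.0, 1.0)
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the augmentations reproducible between training runs.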


This project was completed as a Master's thesis in Data Science & Engineering at UC San Diego, demonstrating the application of deep learning to critical healthcare challenges in early cancer detection.