Lung Nodule Localization
Computed tomography (CT) has recently replaced conventional X-ray as the primary screening tool for lung cancer because it has been shown to reduce mortality by as much as 20% in high-risk patients. Unfortunately, CT screening has a high false positive rate (FPR): at least one 'lung nodule' is detected in half of all CT scans, but only 10% of these 'nodules' are in fact cancerous. The goal of this project was to improve both the localization and the classification of lung nodules using deep learning methods on the LUNA16 dataset.

Overview
This master's thesis project developed deep learning pipelines for automated lung nodule detection and classification in CT scans. To address the high false positive rate of lung cancer screening, we built a two-stage approach on the LUNA16 dataset: U-Net architectures for nodule localization, followed by classification models for false positive reduction.

Technical Approach
Our methodology consisted of three primary components:
- Data preprocessing pipeline to normalize and segment CT scan volumes, extracting regions of interest while maintaining spatial context
- 3D U-Net architecture for nodule detection and localization across axial, coronal, and sagittal planes
- CNN-based classification model for false positive reduction, trained on labeled nodule candidates
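
The preprocessing step above can be illustrated with a minimal NumPy sketch. It is not the thesis code: the Hounsfield-unit window (-1000 to 400), the 32-voxel patch size, and the function names `normalize_hu` and `extract_patch` are all assumptions chosen for illustration. The sketch clips and rescales intensities, then cuts a fixed-size cube around a candidate location so the surrounding spatial context is kept.

```python
import numpy as np

# Assumed Hounsfield-unit window; the values actually used in the thesis are not stated.
HU_MIN, HU_MAX = -1000.0, 400.0


def normalize_hu(volume, hu_min=HU_MIN, hu_max=HU_MAX):
    """Clip a CT volume to [hu_min, hu_max] and rescale it to [0, 1]."""
    clipped = np.clip(volume.astype(np.float32), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


def extract_patch(volume, center, size=32, pad_value=0.0):
    """Cut a size^3 cube around `center` (z, y, x), padding at the volume borders."""
    half = size // 2
    # Pad every axis so cubes near an edge still have the full shape.
    padded = np.pad(volume, half, mode="constant", constant_values=pad_value)
    # After padding, original index c sits at c + half, so the cube starts at c.
    z, y, x = (int(c) for c in center)
    return padded[z:z + size, y:y + size, x:x + size]


# Usage on a synthetic volume: "air" everywhere, one bright voxel.
volume = np.full((40, 64, 64), -1000.0)
volume[20, 30, 30] = 400.0
patch = extract_patch(normalize_hu(volume), center=(20, 30, 30), size=32)
corner = extract_patch(normalize_hu(volume), center=(0, 0, 0), size=32)
```

Padding before slicing keeps every patch the same shape, which is convenient when candidates near the edge of a scan are batched together for training.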

Results
The models achieved competitive performance on the LUNA16 challenge benchmarks. Our U-Net-based detector demonstrated strong sensitivity for nodule localization, while the classification pipeline significantly reduced false positives. The Free-Response Receiver Operating Characteristic (FROC) curves, which plot sensitivity against the average number of false positives per scan, showed improvements over baseline methods, particularly for smaller nodules, which are the most challenging to detect.
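
To make the FROC metric concrete, here is a simplified sketch of how such a curve and a LUNA16-style summary score could be computed. It is an illustration under stated assumptions, not the official evaluation script: it assumes each candidate has already been labeled as a hit or a miss, that each true nodule is hit by at most one candidate, and it skips the interpolation and bootstrapping a full evaluation would use. The function names are hypothetical. The seven operating points (1/8 to 8 false positives per scan) are those used by the LUNA16 challenge.

```python
# False-positives-per-scan operating points averaged in the LUNA16 score.
LUNA16_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def froc_points(candidates, n_nodules, n_scans):
    """Return (false positives per scan, sensitivity) after each ranked candidate.

    candidates: iterable of (score, is_true_positive) pairs.
    Assumes every true nodule is matched by at most one candidate.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    tp = fp = 0
    points = []
    for _score, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_scans, tp / n_nodules))
    return points


def sensitivity_at(points, fp_rate):
    """Highest sensitivity reached without exceeding fp_rate false positives per scan."""
    return max((sens for fps, sens in points if fps <= fp_rate), default=0.0)


def froc_score(candidates, n_nodules, n_scans):
    """Average sensitivity over the seven LUNA16 operating points."""
    points = froc_points(candidates, n_nodules, n_scans)
    total = sum(sensitivity_at(points, rate) for rate in LUNA16_FP_RATES)
    return total / len(LUNA16_FP_RATES)


# Toy example: 4 candidates over 2 scans containing 2 true nodules.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
score = froc_score(toy, n_nodules=2, n_scans=2)
```

In the toy example, sensitivity is 0.5 at the two strictest operating points and 1.0 at the remaining five, so the averaged score is 6/7.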

Technical Implementation
- TensorFlow and Keras for deep learning model development and training
- AWS EC2 P2 instances with NVIDIA K80 GPUs for model training
- Custom data pipeline for processing large medical imaging datasets (DICOM format)
- Extensive data augmentation including rotations, translations, and intensity transforms
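
The augmentation listed above can be sketched with NumPy alone. This is an illustrative stand-in, not the thesis pipeline: rotations are restricted to multiples of 90 degrees to avoid interpolation, translations are whole-voxel shifts with zero fill, and the parameter ranges and the names `shift_with_padding` and `augment` are assumptions. Input patches are assumed to be cubic and already scaled to [0, 1].

```python
import numpy as np


def shift_with_padding(patch, shift, pad_value=0.0):
    """Translate a 3D patch by integer voxel offsets, filling vacated voxels."""
    out = np.full_like(patch, pad_value)
    src, dst = [], []
    for s, n in zip(shift, patch.shape):
        s = max(-n, min(n, int(s)))  # clamp so the slices stay valid
        if s >= 0:
            src.append(slice(0, n - s))
            dst.append(slice(s, n))
        else:
            src.append(slice(-s, n))
            dst.append(slice(0, n + s))
    out[tuple(dst)] = patch[tuple(src)]
    return out


def augment(patch, rng, max_shift=4, intensity_scale=0.1, intensity_shift=0.05):
    """Apply a random rotation, translation, and intensity change to a cubic patch."""
    # Rotation: a random multiple of 90 degrees in a randomly chosen plane.
    axes = [(0, 1), (0, 2), (1, 2)][int(rng.integers(3))]
    out = np.rot90(patch, k=int(rng.integers(4)), axes=axes)
    # Translation: independent integer shift along each axis.
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    out = shift_with_padding(out, shift)
    # Intensity: small random gain and offset, clipped back to [0, 1].
    scale = 1.0 + rng.uniform(-intensity_scale, intensity_scale)
    offset = rng.uniform(-intensity_shift, intensity_shift)
    return np.clip(out * scale + offset, 0.0, 1.0)


# Usage on a random cubic patch.
rng = np.random.default_rng(0)
patch = rng.random((16, 16, 16))
augmented = augment(patch, rng)
```

When the target is a segmentation mask, the same rotation and shift would need to be applied to the mask as well, while the intensity change applies to the image only.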

This project was completed as a Master's thesis in Data Science & Engineering at UC San Diego, demonstrating the application of deep learning to critical healthcare challenges in early cancer detection.
