Project Overview

Key Takeaways from Preparing a 3D CT-Scan Dataset: Strategies for Efficiently Managing and Reducing Large Medical Imaging Data

While taking part in the RSNA 2023 Abdominal Trauma Detection competition on Kaggle, we had to manage a very large collection of CT-scan data. Our primary objective was to use this dataset to build a machine learning model for abdominal trauma detection. Making sense of that much data looked daunting at first, but with steady support from colleagues and mentors we learned a great deal about processing 3D medical imaging data.

The First Step: Resizing and Dynamic Interpolation

The CT scans were provided in DICOM, the standard file format for medical imaging. A significant challenge was that the scans varied in depth (the number of slices), even though every slice had the same 512x512 height and width. To feed the scans into our detection pipeline as uniform 3D volumes, we had to resolve this first. Our goal was to build a 128x128x128 cube for each CT scan. We did this by dynamically selecting a target number of slices from each scan and interpolating between slices in the correct sequence. The process was intricate: it required careful attention to voxel spacing and other metadata in the scan files so that the original slice order was preserved.
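As an illustration of the idea above, here is a minimal sketch of turning a variable-depth stack of slices into a fixed-size cube. It is not the exact code we used. It assumes the slices are already loaded as 2D NumPy arrays and that a z position is available for each one (in a real pipeline this would typically come from DICOM metadata such as ImagePositionPatient). The function names resample_axis and to_cube are hypothetical, and plain linear interpolation stands in for whatever interpolation a production pipeline would choose.

```python
import numpy as np


def resample_axis(volume, new_len, axis):
    """Linearly interpolate `volume` to `new_len` samples along `axis`."""
    old_len = volume.shape[axis]
    # Fractional source index for every output sample (endpoints aligned).
    pos = np.linspace(0, old_len - 1, new_len)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, old_len - 1)
    # Interpolation weights, reshaped so they broadcast along `axis`.
    shape = [1] * volume.ndim
    shape[axis] = new_len
    w = (pos - lo).reshape(shape)
    a = np.take(volume, lo, axis=axis)
    b = np.take(volume, hi, axis=axis)
    return a * (1 - w) + b * w


def to_cube(slices, z_positions, size=128):
    """Sort slices by z position, stack them, and resize to size^3."""
    # Sorting by physical position preserves the original anatomical order,
    # regardless of the order in which the files were read.
    order = np.argsort(z_positions)
    volume = np.stack([slices[i] for i in order]).astype(np.float32)
    # Resample depth, height, and width in turn (separable interpolation).
    for axis in range(3):
        volume = resample_axis(volume, size, axis)
    return volume.astype(np.float32)
```

Calling to_cube(slices, z_positions) returns an array of shape (128, 128, 128) whatever the original number of slices. Note that this sketch does no smoothing before downsampling from 512 to 128 in-plane, so fine detail may alias; that is a simplification made for brevity.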

The Second Step: Intensity Projection

After this first step, the dataset had shrunk to approximately 15.93 GB, which was still a lot to handle with our limited computational resources. At this point we came across Maximum Intensity Projection, a technique that collapses a 3D volume into a 2D image by casting parallel rays along a chosen axis and keeping the highest voxel intensity each ray encounters. In theory, the resulting 2D image retains essential information from the entire scan. We tried Maximum Intensity Projection first, but given the shape and dimensions of our data it produced predominantly black images. We then experimented with Minimum Intensity Projection, which follows the same principle but keeps the lowest intensity along the chosen axis. This alternative yielded excellent results with this dataset.
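Both projections reduce to a single reduction along one axis of the volume. The sketch below shows that under the assumption that the scan is held as a 3D NumPy array; the function name intensity_projection and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def intensity_projection(volume, axis=0, mode="min"):
    """Collapse a 3D volume to a 2D image along `axis`.

    mode="max" keeps the brightest voxel along each ray (maximum
    intensity projection); mode="min" keeps the darkest voxel
    (minimum intensity projection).
    """
    if mode == "max":
        return volume.max(axis=axis)
    if mode == "min":
        return volume.min(axis=axis)
    raise ValueError("mode must be 'min' or 'max'")
```

For a 128x128x128 cube, intensity_projection(cube, axis=0, mode="min") gives a single 128x128 image, so each scan shrinks by a factor of 128 in storage. Which axis to project along is a design choice that depends on the anatomy of interest.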

Conclusion

By carrying out these steps carefully, we prepared the dataset for training while keeping resource use low and minimizing the loss of critical detail. Exploring these data manipulation techniques was a rewarding and enlightening experience. We are grateful to our employer, Apurba Technologies Ltd, and to our teammate, Mutasim Billah, for their guidance and collaboration throughout. The project reflects our commitment to overcoming data challenges in medical imaging in pursuit of better healthcare solutions.