Proceedings Volume 12033

Medical Imaging 2022: Computer-Aided Diagnosis


Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 2 May 2022
Contents: 17 Sessions, 132 Papers, 86 Presentations
Conference: SPIE Medical Imaging 2022
Volume Number: 12033

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 12033
  • Novel Applications
  • COVID-19
  • Breast I
  • Translation of CAD-AI Methods to Clinical Practice: Are We There Yet?: Joint Session with Conferences 12033 and 12035
  • Deep Learning I
  • Translation of CAD-AI Methods to Clinical Practice: Are We There Yet?: Joint Session with Conferences 12033 and 12035
  • Deep Learning I
  • Detection
  • Breast II
  • Deep Learning II
  • Neurology
  • Head and Neck, Musculoskeletal
  • Radiomics, Radiogenomics, Multi-omics
  • Poster Session
  • Lung
  • Abdomen
  • Eye, Retina
  • Segmentation
Front Matter: Volume 12033
This PDF file contains the front matter associated with SPIE Proceedings Volume 12033, including the Title Page, Copyright information, Table of Contents, and Conference Committee listings.
Novel Applications
A feasibility study of computer-aided diagnosis with DECT Bayesian reconstruction for polyp classification
Dual-energy computed tomography (DECT) has emerged as a promising imaging modality for clinical diagnosis, expanding the applications of CT imaging through its capability to acquire two datasets, one at high and one at low energy. With Bayesian reconstruction directly from the projection measurements at the two energies, the energy-independent densities of two basis materials (e.g., bone and soft tissue) of the scanned object are obtained. This work investigated the feasibility of computer-aided diagnosis with DECT Bayesian reconstruction (CADxDE) for polyp classification. Specifically, the reconstructed density images are multiplied by the corresponding mass attenuation coefficients at n selected energies to generate a series of pseudo-single-energy CT images. Given the augmented n-energy CT images, we proposed a convolutional neural network (CNN) based CADx model to differentiate malignant from benign polyps by recognizing material features at different energies. A dataset of 63 polyp masses from 59 patients was used to verify our CADxDE model. The classification results showed that the area under the receiver operating characteristic curve (AUC) can be improved by 12.17% with CADxDE over conventional single-energy data alone. This feasibility study indicates that computer-aided diagnosis with DECT Bayesian reconstruction is a promising approach for improving clinical classification performance.
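As a rough illustration of the pseudo-single-energy synthesis step described above, the sketch below combines two basis-material density maps with tabulated mass attenuation coefficients. The coefficient values, energies, and image shapes are placeholders, not those used in the paper.

```python
import numpy as np

# Hypothetical mass attenuation coefficients (cm^2/g) for two basis
# materials at a few selected energies (keV); placeholder values only.
MU = {
    50:  {"soft_tissue": 0.227, "bone": 0.424},
    70:  {"soft_tissue": 0.193, "bone": 0.274},
    100: {"soft_tissue": 0.171, "bone": 0.186},
}

def pseudo_single_energy_images(rho_soft, rho_bone, energies):
    """Synthesize pseudo-single-energy CT images from basis-material
    density maps (g/cm^3) reconstructed from dual-energy projections."""
    images = {}
    for e in energies:
        mu = MU[e]
        # Linear attenuation = sum over materials of density * mass attenuation.
        images[e] = rho_soft * mu["soft_tissue"] + rho_bone * mu["bone"]
    return images

if __name__ == "__main__":
    rho_soft = np.random.rand(64, 64)        # stand-in density maps
    rho_bone = 0.1 * np.random.rand(64, 64)
    stack = pseudo_single_energy_images(rho_soft, rho_bone, [50, 70, 100])
    print({e: img.shape for e, img in stack.items()})
```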
Early detection of oesophageal cancer through colour contrast enhancement for data augmentation
Xiaohong Gao, Stephen Taylor, Wei Pang, et al.
While white light imaging (WLI) endoscopy is the gold standard for screening and detecting oesophageal squamous cell cancer (SCC), early signs of SCC are often missed (1 in 4) because of the subtle changes at its early onset. This study first enhances the colour contrast of each of over 600 WLI images and their accompanying narrow band images (NBI) by applying the CIE colour appearance model CIECAM02. These augmented data, together with the original images, are then used to train a deep learning based system to classify low grade dysplasia (LGD), high grade dysplasia (HGD) and SCC. As a result, the average colour difference (∆E) measured using CIEL*a*b* between suspected regions and their normal neighbours increased from 11.60 to 14.46 for WLI and from 17.52 to 32.53 for NBI. When training the deep learning system with the added contrast-enhanced WLI images, the sensitivity, specificity and accuracy for LGD increase by 10.87%, 4.95% and 6.76% respectively. When training with both enhanced WLI and NBI images, these measures for LGD increase by 14.83%, 4.89% and 7.97% respectively, the largest increase among the three classes of SCC, HGD and LGD. On average, the sensitivity, specificity and accuracy for classification of SCC, HGD and LGD are 88.26%, 94.44% and 92.63% respectively, comparable to or exceeding existing published work.
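A minimal sketch of the colour-difference measurement reported above (ΔE in CIEL*a*b* between a suspected region and its normal neighbour), using scikit-image. The region coordinates are arbitrary, and the CIECAM02-based enhancement itself is not shown.

```python
import numpy as np
from skimage import color

def mean_delta_e(rgb_image, region_a, region_b):
    """Mean CIE76 colour difference (Delta E in CIEL*a*b*) between two
    rectangular regions of an RGB endoscopy frame.
    region = (row_start, row_stop, col_start, col_stop)."""
    lab = color.rgb2lab(rgb_image)
    a = lab[region_a[0]:region_a[1], region_a[2]:region_a[3]].reshape(-1, 3)
    b = lab[region_b[0]:region_b[1], region_b[2]:region_b[3]].reshape(-1, 3)
    # Compare the mean Lab values of the two regions.
    return float(color.deltaE_cie76(a.mean(axis=0), b.mean(axis=0)))

if __name__ == "__main__":
    frame = np.random.rand(256, 256, 3)       # stand-in WLI frame
    suspect = (60, 120, 60, 120)              # hypothetical lesion box
    neighbour = (150, 210, 150, 210)          # hypothetical normal box
    print("Delta E:", mean_delta_e(frame, suspect, neighbour))
```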
Real-time esophagus achalasia detection method for esophagoscopy assistance
This paper presents an automated real-time esophagus achalasia (achalasia) detection method for esophagoscopy assistance. Achalasia is a well-recognized primary esophageal motor disorder of unknown etiology. To diagnose achalasia, endoscopic evaluation of the esophagus and stomach is recommended to ensure that there is no malignancy causing the disease or esophageal squamous cell carcinoma complicating achalasia. However, esophagoscopy has low sensitivity in early-stage achalasia; only about half of patients with early-stage achalasia can be identified. Thus, a quantitative detection system for real-time esophagoscopy video is required to assist the diagnosis of achalasia. This paper presents the use of a convolutional neural network (CNN) to detect all achalasia frames in esophagoscopy videos. The features of achalasia cannot be easily distinguished. To better extract features from esophagoscopy frames, we introduce dense pooling connections and dilated convolutions in the CNN. We trained and evaluated our network with an original dataset extracted from several esophagoscopy videos of achalasia patients. Furthermore, we developed a real-time achalasia detection computer-aided diagnosis (CAD) system with the trained network. The CAD system can process each frame of the input esophagoscopy videos with only 0.1 milliseconds of delay. The real-time achalasia detection system achieved 0.872 accuracy and a 0.943 AUC score on our dataset.
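The abstract mentions dense pooling connections and dilated convolutions; below is a generic PyTorch sketch of a dilated convolution block of the kind such a network might use, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class DilatedConvBlock(nn.Module):
    """A simple block of parallel dilated convolutions whose outputs are
    concatenated, enlarging the receptive field without extra pooling."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])

    def forward(self, x):
        return torch.cat([branch(x) for branch in self.branches], dim=1)

if __name__ == "__main__":
    x = torch.randn(1, 3, 224, 224)     # stand-in esophagoscopy frame
    block = DilatedConvBlock(3, 16)
    print(block(x).shape)               # (1, 48, 224, 224)
```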
COVID-19
Virtual vs. reality: external validation of COVID-19 classifiers using XCAT phantoms for chest computed tomography
Research studies of artificial intelligence models in medical imaging have been hampered by poor generalization. This problem has been especially concerning over the last year with numerous applications of deep learning for COVID-19 diagnosis. Virtual imaging trials (VITs) could provide a solution for objective evaluation of these models. In this work, utilizing VITs, we created the CVIT-COVID dataset including 180 virtually imaged computed tomography (CT) images from simulated COVID-19 and normal phantom models under different COVID-19 morphologies and imaging properties. We evaluated the performance of an open-source, deep-learning model from the University of Waterloo trained with multi-institutional data and an in-house model trained with the open clinical dataset called MosMed. We further validated the models' performance against open clinical data of 305 CT images to understand virtual vs. real clinical data performance. The open-source model was published with nearly perfect performance on the original Waterloo dataset but showed a consistent performance drop in external testing on another clinical dataset (AUC=0.77) and our simulated CVIT-COVID dataset (AUC=0.55). The in-house model achieved an AUC of 0.87 when testing on the internal test set (MosMed test set). However, performance dropped to an AUC of 0.65 and 0.69 when evaluated on the clinical and our simulated CVIT-COVID datasets, respectively. The VIT framework offered control over imaging conditions, allowing us to show there was no change in performance as CT exposure was changed from 28.5 to 57 mAs. The VIT framework also provided voxel-level ground truth, revealing that the performance of the in-house model was much higher for diffuse COVID-19 infection of <2.65% lung volume (AUC=0.87) than for focal disease of <2.65% volume (AUC=0.52). The virtual imaging framework enabled these uniquely rigorous analyses of model performance, which would be impracticable with real patients.
Detecting COVID-19 from respiratory sound recordings with transformers
Idil Aytekin, Onat Dalmaz, Haydar Ankishan, et al.
Auscultation is an established technique in clinical assessment of symptoms for respiratory disorders. Auscultation is safe and inexpensive, but requires expertise to diagnose a disease using a stethoscope during hospital or office visits. However, some clinical scenarios require continuous monitoring and automated analysis of respiratory sounds to pre-screen and monitor diseases, such as the rapidly spreading COVID-19. Recent studies suggest that audio recordings of bodily sounds captured by mobile devices might carry features helpful to distinguish patients with COVID-19 from healthy controls. Here, we propose a novel deep learning technique to automatically detect COVID-19 patients based on brief audio recordings of their cough and breathing sounds. The proposed technique first extracts spectrogram features of respiratory recordings, and then classifies disease state via a hierarchical vision transformer architecture. Demonstrations are provided on a crowdsourced database of respiratory sounds from COVID-19 patients and healthy controls. The proposed transformer model is compared against alternative methods based on state-of-the-art convolutional and transformer architectures, as well as traditional machine-learning classifiers. Our results indicate that the proposed model achieves performance on par with or superior to competing methods. In particular, the proposed technique can distinguish COVID-19 patients from healthy subjects with over 94% AUC.
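A hedged sketch of the spectrogram-extraction front end described above, using librosa to compute a log-mel spectrogram from a short cough or breathing recording; the parameter values and the synthetic waveform are illustrative only.

```python
import numpy as np
import librosa

def log_mel_spectrogram(waveform, sample_rate, n_mels=128):
    """Compute a log-scaled mel spectrogram suitable as 2D input to a
    vision-transformer-style classifier."""
    mel = librosa.feature.melspectrogram(
        y=waveform, sr=sample_rate,
        n_fft=1024, hop_length=256, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 3.0, 3 * sr, endpoint=False)
    # Stand-in 3-second recording (a chirp) instead of a real cough sample.
    y = np.sin(2 * np.pi * (200 + 100 * t) * t).astype(np.float32)
    spec = log_mel_spectrogram(y, sr)
    print(spec.shape)   # (n_mels, time_frames)
```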
Clinical outcome prediction in COVID-19 using self-supervised vision transformer representations
Automated analysis of chest imaging in coronavirus disease (COVID-19) has mostly been performed on smaller datasets, leading to overfitting and poor generalizability. Training deep neural networks on large datasets requires data labels, which are not always available and can be expensive to obtain. Self-supervision is being increasingly used in various medical imaging tasks to leverage large amounts of unlabeled data during pretraining. Our proposed approach pretrains a vision transformer to perform two self-supervision tasks - image reconstruction and contrastive learning - on a chest X-ray (CXR) dataset. In the process, we generate more robust image embeddings. The reconstruction module models visual semantics within the lung fields by reconstructing the input image through a mechanism which mimics denoising and autoencoding. On the other hand, the contrastive learning module learns the concept of similarity between two texture representations. After pretraining, the vision transformer is used as a feature extractor for a clinical outcome prediction task on our target dataset. The pretraining multi-kaggle dataset comprises 27,499 CXR scans while our target dataset contains 530 images. Specifically, our framework predicts ventilation and mortality outcomes for COVID-19 positive patients using baseline CXR. We compare our method against a baseline approach using pretrained ResNet50 features. Experimental results demonstrate that our proposed approach outperforms the supervised method.
Multi-stage investigation of deep neural networks for COVID-19 B-line feature detection in simulated and in vivo ultrasound images
COVID-19 is a highly infectious disease with high morbidity and mortality, requiring tools to support rapid triage and risk stratification. In response, deep learning has demonstrated great potential to quickly and autonomously detect COVID-19 features in lung ultrasound B-mode images. However, no previous work considers the application of these deep learning models to signal processing stages that occur prior to traditional ultrasound B-mode image formation. Considering the multiple signal processing stages required to achieve ultrasound B-mode images, our research objective is to investigate the most appropriate stage for our deep learning approach to COVID-19 B-line feature detection, starting with raw channel data received by an ultrasound transducer. Results demonstrate that for our given training and testing configuration, the maximum Dice similarity coefficient (DSC) was produced by B-mode images (DSC = 0.996) when compared with three alternative image formation stages that can serve as network inputs: (1) raw in-phase and quadrature (IQ) data before beamforming, (2) beamformed IQ data, (3) envelope detected IQ data. The best-performing simulation-trained network was tested on in vivo B-mode images of COVID-19 patients, ultimately achieving 76% accuracy to detect the same (82% of cases) or more (18% of cases) B-line features when compared to B-line feature detection by human observers interpreting B-mode images. Results are promising to proceed with future COVID-19 B-line feature detection using ultrasound B-mode images as the input to deep learning models.
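For reference, a small sketch of the Dice similarity coefficient used as the evaluation metric above, computed between a predicted and a ground-truth binary B-line mask (the masks here are synthetic).

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

if __name__ == "__main__":
    gt = np.zeros((128, 128), dtype=bool)
    gt[40:80, 60:70] = True                 # stand-in B-line region
    pred = np.zeros_like(gt)
    pred[45:85, 60:70] = True               # slightly shifted prediction
    print(f"DSC = {dice_coefficient(pred, gt):.3f}")
```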
Automated classification method of COVID-19 cases from chest CT volumes using 2D and 3D hybrid CNN for anisotropic volumes
This paper proposes an automated method for classifying chest CT volumes based on the likelihood of COVID-19. Novel coronavirus disease 2019 (COVID-19) has spread across the world, causing a large number of infected patients and deaths. The sudden increase in the number of COVID-19 patients causes a manpower shortage in medical institutions. Computer-aided diagnosis (CAD) systems provide quick and quantitative diagnosis results. A CAD system for COVID-19 enables an efficient diagnosis workflow and helps to reduce such manpower shortages. This paper proposes an automated classification method of chest CT volumes for COVID-19 diagnosis assistance. We propose a COVID-19 classification convolutional neural network (CNN) with 2D/3D hybrid feature extraction flows. The 2D/3D hybrid feature extraction flows are designed to effectively extract image features from anisotropic volumes such as chest CT volumes. The flows extract image features on three mutually perpendicular planes in CT volumes and then combine the features to perform classification. The classification accuracy of the proposed method was evaluated using a dataset that contains 1288 CT volumes. The average classification accuracy was 83.3%, higher than that of a classification CNN without the 2D/3D hybrid feature extraction flows.
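A minimal sketch of the idea of extracting image features on three mutually perpendicular planes of an anisotropic CT volume; the slice selection and array shapes are illustrative, not the authors' exact design.

```python
import numpy as np

def central_orthogonal_slices(volume):
    """Return the central axial, coronal, and sagittal slices of a CT
    volume stored as (z, y, x); each plane would feed a 2D feature
    extractor whose outputs are later combined for classification."""
    z, y, x = volume.shape
    axial = volume[z // 2, :, :]
    coronal = volume[:, y // 2, :]
    sagittal = volume[:, :, x // 2]
    return axial, coronal, sagittal

if __name__ == "__main__":
    vol = np.random.rand(40, 256, 256)    # anisotropic stand-in chest CT
    for name, s in zip(("axial", "coronal", "sagittal"),
                       central_orthogonal_slices(vol)):
        print(name, s.shape)
```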
Multiparameter analysis of vascular remodeling in post-acute sequelae of COVID-19
The COVID-19 infection, a current worldwide health concern, manifests as an alveolar-interstitial pneumonia with unknown long-term evolution. It is also associated with vascular dysfunction and shows vascular remodeling with a changed balance between small- and large-caliber vessels. In this study, we question the existence of residual vascular alteration in post-acute sequelae of COVID-19 (PASC) by investigating possible associations between vascular remodeling biomarkers extracted from CT and functional, radiological and morphological parameters. The vascular biomarkers used are the ratio of blood volume in vessels with cross-sectional area below 5 mm² to that in vessels with cross-sectional area below 50 mm² (BV5/BV50), an index of local peripheral vascular density, and a peripheral composite vascular remodeling index, both measured in the antero-postero-lateral lung periphery (excluding the mediastinal region). As a functional parameter, diffusing capacity of the lung for carbon monoxide (DLCO) is a measure depending on the vascular perfusion and the amount of interstitial thickening, a decreased DLCO value suggesting altered vascular perfusion. Imaging biomarkers can be extracted from the analysis of perfusion lung scintigraphy or CT scans; some of them are included in our study. Radiological features include CT attenuation as a measure of persistent ground glass opacity and of changes suggestive of fibrosis, such as reticulations. As an additional morphological parameter, lung deformation observed between inspiration/expiration maneuvers may be suggestive of the presence of reticulations inducing lung stiffness and breathing deficiency. The investigation of associations between vascular remodeling biomarkers obtained from CT and the above functional, radiological and morphological parameters revealed moderate to strong correlations, highlighting the ability to capture the persistence of vascular alterations in PASC in relation with the development of fibrotic patterns, which is a promising direction for future research.
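A hedged sketch of the BV5/BV50 biomarker described above: given per-vessel-segment cross-sectional areas and blood volumes (which in practice would come from a CT vessel segmentation), it sums the volumes below the two area thresholds and takes their ratio.

```python
import numpy as np

def bv5_bv50_ratio(cross_section_area_mm2, blood_volume_mm3):
    """Ratio of blood volume in vessels with cross-sectional area < 5 mm^2
    to blood volume in vessels with cross-sectional area < 50 mm^2."""
    area = np.asarray(cross_section_area_mm2, dtype=float)
    vol = np.asarray(blood_volume_mm3, dtype=float)
    bv5 = vol[area < 5.0].sum()
    bv50 = vol[area < 50.0].sum()
    return bv5 / bv50 if bv50 > 0 else np.nan

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    areas = rng.uniform(0.5, 60.0, size=500)        # stand-in segment areas
    volumes = areas * rng.uniform(2.0, 10.0, 500)   # stand-in segment volumes
    print(f"BV5/BV50 = {bv5_bv50_ratio(areas, volumes):.3f}")
```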
Breast I
Analyzing GAN artifacts for simulating mammograms: application towards finding mammographically-occult cancer
Generative adversarial networks (GANs) can synthesize various feasible-looking images. We showed that a GAN, specifically a conditional GAN (CGAN), can simulate breast mammograms with normal, healthy appearances and can help detect mammographically-occult (MO) cancer. However, like other GANs, CGANs can suffer from various artifacts, e.g., checkerboard artifacts, that may impact the quality of the final synthesized image as well as the performance of detecting MO cancer. In this study, we explored the types of GAN artifacts that exist in mammogram simulations and their effect on MO cancer detection. We first trained a CGAN using full-field digital mammograms (FFDMs) of 1366 women with normal/healthy breasts. Then, we tested the trained CGAN on an independent MO cancer dataset of 333 women with dense breasts (97 MO cancer). We trained a convolutional neural network (CNN) on the MO cancer dataset, where real and simulated mammograms were fused, to identify women with MO cancer. We then randomly sampled 50 normal controls and found 11 and 7 cases with checkerboard and nipple artifacts, respectively. The mean and standard deviation of the trained CNN score for the cases with checkerboard and nipple artifacts were low, 0.236 ± 0.227 with [min, max] = [0.017, 0.761] and 0.069 ± 0.069 with [min, max] = [0.003, 0.213], respectively, showing the minimal effect of GAN artifacts on MO cancer detection.
Deep curriculum learning in task space for multi-class based mammography diagnosis
Jun Luo, Dooman Arefan, Margarita Zuley M.D., et al.
Mammography is used as a standard screening procedure for patients at risk of breast cancer. Over the past decade, deep learning techniques have been shown to reach near-human performance in a number of tasks, and their application to mammography is one of the topics on which medical researchers concentrate most. In this work, we propose an end-to-end Curriculum Learning (CL) strategy in task space for classifying the three categories of Full-Field Digital Mammography (FFDM), namely Malignant, Negative, and False recall. Specifically, our method treats this three-class classification as the “harder” task in terms of CL, and creates an “easier” sub-task of classifying False recall against the combined group of Negative and Malignant. We introduce a loss scheduler to dynamically weight the contribution of the losses from the two tasks throughout the entire training process. We conduct experiments on an FFDM dataset of 1,709 images using 5-fold cross validation. The results show that our curriculum learning strategy can boost the performance for classifying the three categories of FFDM compared to baseline model-training strategies.
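A small sketch of a loss scheduler of the kind described above, linearly shifting weight from the easier binary sub-task to the harder three-class task over training; the linear schedule and the function names are assumptions, not the authors' exact scheme.

```python
import torch
import torch.nn.functional as F

def curriculum_weight(epoch, total_epochs):
    """Fraction of the loss assigned to the harder (3-class) task,
    increasing linearly from 0 to 1 over training (assumed schedule)."""
    return min(1.0, epoch / max(1, total_epochs - 1))

def combined_loss(logits_hard, targets_hard, logits_easy, targets_easy,
                  epoch, total_epochs):
    """Weighted sum of the 3-class ('hard') and binary ('easy') losses."""
    w = curriculum_weight(epoch, total_epochs)
    loss_hard = F.cross_entropy(logits_hard, targets_hard)
    loss_easy = F.cross_entropy(logits_easy, targets_easy)
    return w * loss_hard + (1.0 - w) * loss_easy

if __name__ == "__main__":
    logits3 = torch.randn(8, 3)          # malignant / negative / false recall
    y3 = torch.randint(0, 3, (8,))
    logits2 = torch.randn(8, 2)          # false recall vs. the rest
    y2 = (y3 == 2).long()
    for epoch in (0, 5, 9):
        loss = combined_loss(logits3, y3, logits2, y2, epoch, 10)
        print(epoch, float(loss))
```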
Improving lesion detection algorithm in digital breast tomosynthesis leveraging ensemble cross-validation models with multi-depth levels
We report an improved algorithm for detecting biopsy-proven breast lesions on digital breast tomosynthesis (DBT) where the positive samples available in the training set were limited. Instead of using a large-scale in-house dataset, our original algorithm used false positive findings (FPs) from non-biopsied (actionable) images to tackle the problem of a limited number of trainable samples. In this study, we further improved our algorithm by fusing multiple weak lesion detection models with an ensemble approach. We used cross-validation (CV) to develop multiple lesion detection models. We first constructed baseline detection algorithms by varying the depth levels (medium and large) of the convolutional layers in the YOLOv5 algorithm using biopsied samples. We detected actionable FPs in non-biopsied images using a medium baseline model. We fine-tuned the baseline algorithms using the identified actionable FPs and the biopsied samples. For lesion detection, we processed the DBT volume slice-by-slice, then combined the estimated lesions of each slice along the depth of the DBT volume using volumetric morphological closing. Using 5-fold CV, we developed different multi-depth detection models for each depth level. Finally, we developed an ensemble algorithm by combining CV models with different depth levels. Our new algorithm achieved a mean sensitivity of 0.84 per DBT volume on the independent validation set from the DBTex challenge, close to that of one of the top-performing algorithms, which utilized large in-house data. These results show that our ensemble approach over different CV models is useful for improving the performance of lesion detection algorithms.
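A hedged sketch of the slice-wise aggregation step mentioned above: per-slice detection masks are stacked into a volume and combined with a morphological closing along the depth axis. The threshold-free binary masks and the structuring-element size are illustrative only.

```python
import numpy as np
from scipy import ndimage

def combine_slice_detections(slice_masks, depth_extent=3):
    """Stack per-slice binary detection masks (list of 2D arrays ordered by
    slice index) and close small gaps along the DBT depth direction."""
    volume = np.stack(slice_masks, axis=0).astype(bool)
    # Structuring element spans only the depth axis.
    structure = np.ones((depth_extent, 1, 1), dtype=bool)
    return ndimage.binary_closing(volume, structure=structure)

if __name__ == "__main__":
    masks = [np.zeros((64, 64), dtype=bool) for _ in range(10)]
    for z in (3, 5, 6):                  # detection missing on slice 4
        masks[z][30:40, 30:40] = True
    closed = combine_slice_detections(masks)
    print("slice 4 filled:", bool(closed[4, 35, 35]))
```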
Improving the performance of computer-aided classification of breast lesions using a new feature fusion method
Computer-Aided Diagnosis (CAD) schemes used to classify suspicious breast lesions typically include machine learning classifiers that are trained using features computed from either the segmented lesions or fixed regions of interest (ROIs) covering the lesions. Both methods have advantages and disadvantages. In this study, we investigate a new approach to train a machine learning classifier that fuses image features computed from both the segmented lesions and the fixed ROIs. We assembled a dataset with 2,000 mammograms. Centered on the lesion, an ROI is extracted from each image. Among them, 1,000 ROIs depict verified malignant lesions and the rest depict benign lesions. An adaptive multilayer region growing algorithm is applied to segment suspicious lesions. Several sets of statistical features, texture features based on GLRLM, GLDM and GLCM, wavelet-transformed features and shape-based features are computed from the original ROI and the segmented lesion, respectively. Three support vector machines (SVMs) are trained using features computed from the original ROIs, the segmented lesions, and the fusion of both, respectively, using a 10-fold cross-validation method embedded with a feature reduction method, namely a random projection algorithm. Applying the area under the ROC curve (AUC) as an evaluation index, our results reveal no significant difference between AUC values computed using classification scores generated by the two SVMs trained with features from the original ROIs or the segmented lesions. However, utilizing the fused features, the AUC of the SVM increases by more than 10% (p<0.05). This study demonstrates that image features computed using the segmented lesions and the fixed ROIs contain complementary discriminatory information. Thus, fusing these features can significantly improve CAD performance.
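The pipeline described above (feature fusion, random-projection dimensionality reduction, SVM, 10-fold cross-validation, AUC) can be sketched roughly with scikit-learn as below; the synthetic feature matrices stand in for the ROI-based and segmentation-based descriptors, and the component counts are arbitrary.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
# Stand-ins for features computed from fixed ROIs and from segmented lesions.
roi_features = rng.normal(size=(n, 60))
lesion_features = rng.normal(size=(n, 60))
y = rng.integers(0, 2, size=n)                 # 1 = malignant, 0 = benign

fused = np.hstack([roi_features, lesion_features])   # feature fusion

model = make_pipeline(
    StandardScaler(),
    GaussianRandomProjection(n_components=30, random_state=0),
    SVC(kernel="rbf"),
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_predict(model, fused, y, cv=cv, method="decision_function")
print(f"AUC (fused features): {roc_auc_score(y, scores):.3f}")
```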
Early prediction of metastasis in women with locally advanced breast cancer
Simona Rabinovici-Cohen, Tal Tlusty, Xosé M. Fernández, et al.
Women with locally advanced breast cancer are generally given neoadjuvant chemotherapy (NAC), in which chemotherapy and, optionally, targeted treatment are administered prior to surgery. In current clinical practice, prior to the start of NAC, it is not possible to accurately predict whether the patient is likely to encounter metastasis after treatment. Metastasis (or distant recurrence) is the development of secondary malignant growths at a distance from the primary site of cancer. We explore the use of tumor thickness features computed from MRI imaging to predict the risk of post-treatment metastasis. We performed a retrospective study on a cohort of 1738 patients who were administered NAC. Of these patients, 551 had magnetic resonance imaging (MRI) before the treatment started. We analyzed the multimodal data using deep learning and classical machine learning algorithms to increase the set of discriminating features. Our results demonstrate the ability to predict metastasis prior to the initiation of NAC treatment using each modality alone. We then show the significant improvement achieved by combining the clinical and MRI modalities, as measured by the AUC, sensitivity, and specificity. The overall combined model achieved 0.747 AUC and 0.379 specificity at a sensitivity operating point of 0.99. We also use interpretability methods to explain the models and identify important clinical features for the early prediction of metastasis.
Translation of CAD-AI Methods to Clinical Practice: Are We There Yet?: Joint Session with Conferences 12033 and 12035
Effect of computerized decision support on diagnostic accuracy and intra-observer variability in multi-institutional observer performance study for bladder cancer treatment response assessment in CT urography
We have previously developed a computerized decision support system for bladder cancer treatment response assessment (CDSS-T) in CT urography (CTU). In this work, we conducted an observer study to evaluate the diagnostic accuracy and intra-observer variability with and without the CDSS-T system. One hundred fifty-seven pre- and post-treatment lesion pairs were identified in pre- and post-chemotherapy CTU scans of 123 patients. Forty lesion pairs had T0 stage (complete response) after chemotherapy. Multi-disciplinary observers from 4 different institutions participated in reading the lesion pairs, including 5 abdominal radiologists, 4 radiology residents, 5 oncologists, 1 urologist, and 1 medical student. Each observer provided estimates of the T0 likelihood after treatment without and then with the CDSS-T aid for each lesion. To assess the intra-observer variability, 51 cases were evaluated two times – the original and a repeated evaluation. The average area under the curve (AUC) of the 16 observers for estimation of T0 disease after treatment increased from 0.73 without CDSS-T to 0.77 with CDSS-T (p = 0.003). For the evaluation with CDSS-T, the average AUC performance for different institutions was similar. The performance with CDSS-T was improved significantly, and the AUC standard deviations were slightly smaller, showing a potential trend toward more accurate and uniform performance with CDSS-T. There was no significant difference between the original and repeated evaluations. This study demonstrated that our CDSS-T system has the potential to improve treatment response assessment of physicians from different specialties and institutions, and reduce the inter- and intra-observer variabilities of the assessments.
Deep Learning I
Deep hybrid convolutional wavelet networks: application to predicting response to chemoradiation in rectal cancers via MRI
Amir Reza Sadri, Thomas DeSilvio, Prathyush Chirra, et al.
With the increasing promise of radiomics and deep learning approaches in capturing subtle patterns associated with disease response on routine MRI, there is an opportunity to more closely combine components from both approaches within a single architecture. We present a novel approach to integrating multi-scale, multi-oriented wavelet networks (WN) into a convolutional neural network (CNN) architecture, termed a deep hybrid convolutional wavelet network (DHCWN). The proposed model comprises wavelet neurons (wavelons) that use the shift and scale parameters of a mother wavelet function as its building units. Whereas the activation functions in a typical CNN are fixed and monotonic (e.g., ReLU), the activation functions of the proposed DHCWN are wavelet functions that are flexible and significantly more stable during optimization. The proposed DHCWN was evaluated using a multi-institutional cohort of 153 pre-treatment rectal cancer MRI scans to predict pathologic response to neoadjuvant chemoradiation. When compared to typical CNN and multilayer wavelet perceptron (DWN-MLP) architectures, in both 2D and 3D, our novel DHCWN yielded significantly better performance in predicting pathologic complete response (achieving a maximum accuracy of 91.23% and a maximum AUC of 0.79) across multi-institutional discovery and hold-out validation cohorts. Interpretability evaluation of all three architectures via Grad-CAM and Shapley visualizations revealed that DHCWNs best captured the complex texture patterns within tumor regions on MRI associated with pathologic complete response. The proposed DHCWN thus offers a significantly more extensible, interpretable, and integrated solution for characterizing predictive signatures via routine imaging data.
Translation of CAD-AI Methods to Clinical Practice: Are We There Yet?: Joint Session with Conferences 12033 and 12035
Ensembling mitigates scanner effects in deep learning medical image segmentation with deep-U-Nets
Machine learning algorithms tend to perform better within the setting wherein they are trained, a phenomenon known as the domain effect. Deep learning-based medical image segmentation algorithms are often trained using data acquired from specific scanners; however, these algorithms are expected to accurately segment anatomy in images acquired from scanners different from the ones used to obtain their training images. In this work, we present evidence of a scanner- and magnet-strength-specific domain effect for a deep U-Net trained to segment spinal canals on axial MR images. The trained network performs better on new data from the same scanner and worse on data from other scanners, demonstrating a scanner-specific domain effect. We then construct ensembles of the U-Nets, in which each U-Net in the ensemble differs from the others only in initialization. Finally, we demonstrate that these U-Net ensembles reduce the differential between in-domain and out-of-domain performance, thereby mitigating the domain effect associated with single U-Nets. Our study evidences the importance of developing software robust to scanner-specific domain effects to handle scanner bias in deep learning.
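A minimal sketch of the ensembling step described above: several independently initialized segmentation networks produce per-pixel probabilities that are averaged before thresholding. The toy probability maps stand in for the outputs of trained U-Nets.

```python
import numpy as np

def ensemble_segmentation(prob_maps, threshold=0.5):
    """Average per-pixel foreground probabilities from several models
    (each trained from a different random initialization) and threshold."""
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return mean_prob >= threshold, mean_prob

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in probability maps from five U-Nets for one axial MR slice.
    members = [np.clip(rng.normal(0.5, 0.2, size=(128, 128)), 0, 1)
               for _ in range(5)]
    mask, mean_prob = ensemble_segmentation(members)
    print("segmented pixels:", int(mask.sum()))
```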
Deep Learning I
Open-world active learning for echocardiography view classification
Ghada Zamzmi, Tochi Oguguo, Sivaramakrishnan Rajaraman, et al.
Existing works for automated echocardiography view classification are designed under the assumption that the classes (views) in the testing set must be similar to those that appear in the training set (closed-world classification). This assumption may be too strict for real-world environments that are open and often contain unseen examples (views), thereby drastically weakening the robustness of classical classification approaches. In this work, we developed an open-world active learning approach for echocardiography view classification, where the network classifies images of known views into their respective classes and identifies images of unknown views. Then, a clustering approach is used to cluster the unknown views into various groups to be labeled by an echocardiologist. Finally, the newly labeled samples are added to the initial set of known views and used to update the classification network. This process of actively labeling unknown clusters and integrating them into the classification model significantly increases the efficiency of data labeling and the robustness of the classifier. Our results using an echocardiography dataset containing known and unknown views showed the superiority of the proposed approach as compared to closed-world view classification approaches.
Applying a novel two-stage deep-learning model to improve accuracy in detecting retinal fundus images
Applications of artificial intelligence (AI) in medical imaging informatics have attracted broad research interest. In ophthalmology, for example, automated analysis of retinal fundus photography helps diagnose and monitor illnesses like glaucoma, diabetic retinopathy, hypertensive retinopathy, and cancer. However, building a robust AI model requires a large and diverse dataset for training and validation. While a large number of fundus photos are available online, collecting them to create a clean, well-structured dataset is a difficult and manually intensive process. In this work, we propose a two-stage deep-learning system to automatically identify clean retinal fundus images and delete images with severe artifacts. In the two stages, two transfer-learning models based on the ResNet-50 architecture, pre-trained using ImageNet data, are built with increased threshold values on the softmax output to reduce false positives. The first-stage classifier identifies “easy” images, and the remaining “difficult” (or undetermined) images are further identified by the second-stage classifier. Using the Google Search Engine, we initially retrieve 1,227 retinal fundus images. Using this two-stage deep-learning model yields a positive predictive value (PPV) of 98.56% for the target class, compared to a single-stage model with a PPV of 95.74%. The two-stage model reduces the false positives for the retinal fundus image class by two-thirds. The PPV over all classes increases from 91.9% to 96.6% without compromising the number of images classified by the model. The superior performance of this two-stage model indicates that building an optimal training dataset can play an important role in increasing the performance of deep-learning models.
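A rough sketch of the two-stage cascade logic described above: a first classifier accepts only confident ("easy") predictions via a raised softmax threshold, and the remaining images are deferred to a second classifier. The classifiers here are placeholders, not the trained ResNet-50 models, and the threshold value is an assumption.

```python
import numpy as np

def two_stage_classify(probs_stage1, probs_stage2, threshold1=0.95):
    """probs_stage1/2: (n_images, n_classes) softmax outputs of the two
    classifiers. Stage 1 decides only when its top probability exceeds the
    raised threshold; undecided images fall through to stage 2."""
    top1 = probs_stage1.max(axis=1)
    decided_by_stage1 = top1 >= threshold1
    labels = np.where(decided_by_stage1,
                      probs_stage1.argmax(axis=1),
                      probs_stage2.argmax(axis=1))
    return labels, decided_by_stage1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p1 = rng.dirichlet(alpha=[1, 1, 1], size=10)  # stand-in stage-1 softmax
    p2 = rng.dirichlet(alpha=[1, 1, 1], size=10)  # stand-in stage-2 softmax
    labels, easy = two_stage_classify(p1, p2)
    print("decided by stage 1:", int(easy.sum()), "labels:", labels)
```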
Deciphering deep ensembles for lung nodule analysis
Empirical ensembles of deep convolutional neural network (DNN) models have been shown to outperform individual DNN models, as seen in the last few ImageNet challenges. Several studies have also shown that ensemble DNNs are robust against out-of-sample data, making them an ideal approach for machine learning-enabled medical imaging tasks. In this work, we analyze deep ensembles for the task of classifying true and false lung nodule candidates in computed tomography (CT) volumes. Six ImageNet-pretrained DNN models with minimal modifications for lung nodule classification were used to generate 63 ensemble DNNs using all possible combinations of DNN models. The checkpoint predictions during the training of the DNN models were used as a surrogate to understand the training trajectories each model took to arrive at the finalized model. The predictions from each checkpoint across the six DNN models were projected to a 2-dimensional space using the uniform manifold approximation and projection (UMAP) method. The output scores from these six models were compared using a rank-biased overlap measure that places larger weights on top-scoring candidates and can handle arbitrarily sized lists of candidates. Both analyses indicate that diversity in the training process leads to diversity of the scores for the same training and test samples. The competition performance metric (CPM) from the free-response operating characteristic curve shows that as the number of DNN models in each ensemble increases, the CPM increases from an average of 0.750 to 0.79.
Stroke lesion localization in 3D MRI datasets with deep reinforcement learning
The efficacy of stroke treatments is highly time-sensitive, and any computer-aided diagnosis support method that can accelerate diagnosis and treatment initiation may improve patient outcomes. Within this context, lesion identification in MRI datasets can be time consuming and challenging, even for trained clinicians. Automatic lesion localization can expedite diagnosis by flagging datasets and corresponding regions of interest for further assessment. In this work, we propose a deep reinforcement learning agent to localize acute ischemic stroke lesions in MRI images. To this end, we adapt novel techniques from the computer vision domain to medical image analysis, allowing the agent to sequentially localize multiple lesions in a single dataset. The proposed method was developed and evaluated using a database consisting of fluid attenuated inversion recovery (FLAIR) MRI datasets from 466 ischemic stroke patients acquired at multiple centers. 372 patients were used for training while 94 patients (20% of available data) were employed for testing. Furthermore, the model was tested using 58 datasets from an out-of-distribution test set to investigate the generalization error in more detail. The model achieved a Dice score of 0.45 on the hold-out test set and 0.43 on images from the out-of-distribution test set. In conclusion, we apply deep reinforcement learning to the clinically well-motivated task of localizing multiple ischemic stroke lesions in MRI images, and achieve promising results validated on a large and heterogeneous collection of datasets.
Segmentation of multiple myeloma plasma cells in microscopy images with noisy labels
Álvaro García Faura, Dejan Štepec, Tomaž Martinčič, et al.
A key component towards an improved cancer diagnosis is the development of computer-assisted tools. In this article, we present the solution that won the SegPC-2021 competition for the segmentation of multiple myeloma plasma cells in microscopy images. The labels in the competition dataset were generated semi-automatically and presented noise. To deal with it, new labels were generated from existing ones, heavy image augmentation was carried out and predictions were combined by a custom ensemble strategy. These techniques, along with state-of-the-art feature extractors and instance segmentation architectures, resulted in a mean Intersection-over-Union of 0.9389 on the SegPC-2021 final test set.
Detection
Exploring directed network connectivity in complex systems using large-scale augmented Granger causality (lsAGC)
Unveiling causal relationships among time-series in multivariate observational data is a challenging research topic. Such data may be represented by graphs, where nodes represent time-series and edges represent directed causal influence scores between them. If the number of nodes exceeds the number of temporal observations, conventional methods, such as standard Granger causality, are of limited value, because estimating the free parameters of time-series predictors leads to under-determined problems. A typical example of this situation is functional Magnetic Resonance Imaging (fMRI), where the number of nodal observations is large, usually ranging from 10² to 10⁵ time-series, while the number of temporal observations is low, usually less than 10³. Hence, innovative approaches are required to address the challenges arising from such data sets. Recently, we have proposed the large-scale Augmented Granger Causality (lsAGC) algorithm, which is based on augmenting a dimensionality-reduced representation of the system’s state-space by supplementing data from the conditional source time-series taken from the original input space. Here, we apply lsAGC to synthetic fMRI data with known ground truth and compare its performance to state-of-the-art methods leveraging the benefits of information-theoretic metrics. Our results suggest that the proposed lsAGC method significantly outperforms existing methods, both in diagnostic accuracy (Area Under the Receiver Operating Characteristic, AUROC = 0.894 vs. [0.727, 0.762] for competing methods, p < 10⁻⁹) and computation time (0.7 sec vs. [9.7, 4.8×10³] sec for competing methods), demonstrating the potential of lsAGC for large-scale observations in neuroimaging studies of the human brain.
Integrating zonal priors and pathomic MRI biomarkers for improved aggressive prostate cancer detection on MRI
Automated detection of aggressive prostate cancer on Magnetic Resonance Imaging (MRI) can help guide targeted biopsies and reduce unnecessary invasive biopsies. However, automated methods of prostate cancer detection often have a sensitivity-specificity trade-off (high sensitivity with low specificity or vice-versa), making them unsuitable for clinical use. Here, we study the utility of integrating prior information about the zonal distribution of prostate cancers with a radiology-pathology fusion model in reliably identifying aggressive and indolent prostate cancers on MRI. Our approach has two steps: 1) training a radiology-pathology fusion model that learns pathomic MRI biomarkers (MRI features correlated with pathology features) and uses them to selectively identify aggressive and indolent cancers, and 2) post-processing the predictions using zonal priors in a novel optimized Bayes’ decision framework. We compare this approach with other approaches that incorporate zonal priors during training. We use a cohort of 74 radical prostatectomy patients as our training set, and two cohorts of 30 radical prostatectomy patients and 53 biopsy patients as our test sets. Our rad-path-zonal fusion-approach achieves cancer lesion-level sensitivities of 0.77±0.29 and 0.79±0.38, and specificities of 0.79±0.23 and 0.62±0.27 on the two test sets respectively, compared to baseline sensitivities of 0.91±0.27 and 0.94±0.21 and specificities of 0.39±0.33 and 0.14±0.19, verifying its utility in achieving balance between sensitivity and specificity of lesion detection.
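The abstract above does not detail the optimized Bayes' decision framework, so the sketch below shows only a generic prior-shift correction that rescales a model's cancer probability by zone-specific prior odds; the prior values and the assumption that the model is calibrated to a known training prevalence are hypothetical.

```python
import numpy as np

def adjust_for_zonal_prior(p_model, zone_prior, train_prior=0.5):
    """Re-weight a model's cancer probability with a zone-specific prior.
    Assumes the model is effectively calibrated to `train_prior`; posterior
    odds are rescaled by the ratio of prior odds (standard prior-shift
    correction), used here only as a stand-in for the paper's optimized
    Bayes' decision framework."""
    p_model = np.clip(np.asarray(p_model, dtype=float), 1e-6, 1 - 1e-6)
    odds = p_model / (1 - p_model)
    prior_ratio = (zone_prior / (1 - zone_prior)) / (train_prior / (1 - train_prior))
    adj_odds = odds * prior_ratio
    return adj_odds / (1 + adj_odds)

if __name__ == "__main__":
    p = np.array([0.30, 0.55, 0.80])          # model outputs for 3 lesions
    # Hypothetical priors: aggressive cancer assumed more common peripherally.
    print("peripheral zone:", adjust_for_zonal_prior(p, zone_prior=0.7))
    print("transition zone:", adjust_for_zonal_prior(p, zone_prior=0.3))
```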
Renal tumor analysis using multi-phase abdominal CT images
Contrast-enhanced multi-slice 3D CT images enable highly accurate analysis, diagnosis, and treatment of the kidney, providing detailed information about blood vessels, organs, and lesions. In this work, abdominal multiphase contrast-enhanced CT images are used to analyze renal tumors of clear cell renal cell carcinoma (ccRCC), chromophobe renal cell carcinoma (chRCC), papillary renal cell carcinoma (pRCC), angiomyolipoma, and oncocytoma. By analyzing the changes across contrast phases, features of kidney tumors are extracted and used to classify them with high accuracy.
Deep learning based multiple sclerosis lesion detection utilizing synthetic data generation and soft attention mechanism
In this work we focus on identifying healthy brain slices vs. brain slices with multiple sclerosis (MS) lesions. MS is an autoimmune, demyelinating disease characterized by inflammatory lesions in the central nervous system. MRI is commonly used for the diagnosis of MS and enables accurate detection and classification of lesions for early diagnosis and treatment. Visual attention mechanisms may be beneficial for the detection of MS brain lesions, as they tend to be small. The attention mechanism prevents overfitting to the background when the amount of data is limited. In addition, enough data is necessary for training a successful machine learning algorithm for medical image analysis. Data with insufficient variability leads to poor classification performance. This is problematic in medical imaging, where abnormal findings are uncommon and data labeling requires expensive expert time. In this work, we suggest a new network architecture, based on Y-net and EfficientNet models, with attention layers to improve the network performance and reduce overfitting. Furthermore, the attention layers allow extraction of lesion locations. In addition, we show an innovative regularization scheme on the attention weight mask to make it focus on the lesions while letting it search in different areas. Finally, we explore an option to add synthetic lesions in the training process. Based on recent work, we generate artificial lesions in healthy brain MRI scans to augment our training data. Our system achieves 91% accuracy in identifying cases that contain lesions (vs. healthy cases), more than a 13% improvement over an equivalent system without the attention mechanism and the added data.
Breast II
Association between DCE MRI background parenchymal enhancement and mammographic texture features
Natalie Baughan, Lindsay Douglas, Maya Ballard, et al.
Background parenchymal enhancement (BPE) from dynamic contrast-enhanced (DCE) MRI exams has been found in recent studies to be an indicator of breast cancer risk. To further understand the current framework of metrics to evaluate risk, we evaluated the association between human-engineered radiomic texture features calculated from mammograms and radiologist BPE ratings from corresponding DCE-MRI exams. This study included 100 unilaterally affected patients who had undergone both mammographic and DCE-MR breast imaging. BPE levels were obtained from the radiology report and included four categories with the following numbers of patients: 14 minimal, 56 mild, 27 moderate, and 3 marked. All mammograms (12-bit quantization and 70-micron pixels) had been acquired with a Hologic Lorad Selenia system and were retrospectively collected under an IRB-approved protocol. A 512x512-pixel region of interest was selected in the central region behind the nipple on the mammogram of the unaffected breast, and texture analysis was conducted to extract 45 features. Kendall’s tau-b and a two-sample t-test were used to evaluate relationships between mammographic texture and MRI BPE levels in five selected radiomic features. BPE categories were grouped into low (minimal/mild) and high (moderate/marked) for the t-test. The Kendall test results indicated statistically significant correlations in all selected texture features after Holm-Bonferroni multiple-comparisons correction. Two-sample t-test results found statistically significant differences between the high and low BPE categories for the selected texture feature of GLCM Sum Variance after Holm-Bonferroni multiple-comparisons correction. These results indicate a significant association between coarse, low spatial frequency mammographic patterns and increased BPE.
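A sketch of the statistical workflow reported above, using SciPy for Kendall's tau-b and the two-sample t-test and statsmodels for the Holm-Bonferroni correction; the feature values and BPE labels are synthetic.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n = 100
# Ordinal BPE ratings: 0=minimal, 1=mild, 2=moderate, 3=marked (synthetic).
bpe = rng.choice([0, 1, 2, 3], size=n, p=[0.14, 0.56, 0.27, 0.03])
# Five stand-in mammographic texture features, weakly related to BPE.
features = rng.normal(size=(n, 5)) + 0.3 * bpe[:, None]

# Kendall's tau-b between each texture feature and the ordinal BPE level.
p_values = []
for j in range(features.shape[1]):
    tau, p = stats.kendalltau(features[:, j], bpe)   # tau-b by default
    p_values.append(p)
reject, p_holm, _, _ = multipletests(p_values, method="holm")
print("Holm-adjusted p-values:", np.round(p_holm, 4))

# Two-sample t-test between low (minimal/mild) and high (moderate/marked) BPE.
low, high = features[bpe <= 1, 0], features[bpe >= 2, 0]
print("t-test for feature 0:", stats.ttest_ind(low, high))
```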
Effect of different molecular subtype reference standards in AI training: implications for DCE-MRI radiomics of breast cancers
A single breast cancer lesion can have different luminal molecular subtyping when using either immunohistochemical (IHC) staining alone or the St. Gallen criteria that include Ki-67. This may impact artificial intelligence/computer-aided diagnosis (AI/CADx) for determining molecular subtype from medical images. We investigated this using 28 radiomic features extracted from DCE-MR images of 877 unique lesions segmented by a fuzzy c-means method, for three groups of lesions: (1) Luminal A lesions by both reference standards (“agreement”), (2) lesions that were Luminal A by IHC and Luminal B by St. Gallen (“disagreement”), and (3) Luminal B lesions by both reference standards (“agreement”). The Kruskal-Wallis (KW) test for statistically significant differences among the groups of lesions was sequentially followed by the Mann-Whitney U test to determine pair-wise statistical differences between groups for the features found relevant by the KW test. Classification of lesions as Luminal A or Luminal B using all available radiomic features was conducted using three sets of lesions: (1) lesions with IHC-alone molecular subtyping, (2) lesions with St. Gallen molecular subtyping, and (3) agreement lesions. Five-fold cross-validation using a stepwise feature selection/linear discriminant analysis classifier classified lesions in each set, with performance measured by the area under the receiver operating characteristic curve (AUC). Six features (sphericity, irregularity, surface area to volume ratio, variance of radial gradient histogram, sum average, and volume of most enhancing voxels) were significantly different among the three groups of lesions, with mixed differences between the disagreement group and the two agreement luminal groups. When using agreement lesions, more features were selected for classification and the AUC was significantly higher (P < 0.003) than when using lesions subtyped by either reference standard. The results suggest that disagreement between reference standards may impact the development of medical imaging AI/CADx methods for determining molecular subtype.
Mammary duct detection using self-supervised encoders
Shannon Doyle, Francesco Dal Canton, Jelle Wesseling, et al.
Ductal Carcinoma in Situ (DCIS) constitutes 20–25% of all diagnosed breast cancers and is a well known potential precursor for invasive breast cancer [1]. The gold standard method for diagnosing DCIS involves the detection of calcifications and abnormal cell proliferation in mammary ducts in Hematoxylin and Eosin (H&E) stained whole-slide images (WSIs). Automatic duct detection may facilitate this task as well as downstream applications that currently require tedious, manual annotation of ducts. Examples of these are grading of DCIS lesions [2] and prediction of local recurrence of DCIS [3]. Several methods have been developed for object detection in the field of deep learning. Such models are typically initialised using ImageNet transfer-learning features, as the limited availability of annotated medical images has hindered the creation of domain-specific encoders. Novel techniques such as self-supervised learning (SSL) promise to overcome this problem by utilising unlabelled data to learn feature encoders. SSL encoders trained on unlabelled ImageNet have demonstrated SSL’s capacity to produce meaningful representations, scoring higher than supervised features on the ImageNet 1% classification task [4]. In the domain of histopathology, feature encoders (Histo encoders) have been developed [5, 6]. In classification experiments with linear regression, frozen features of these encoders outperformed those of ImageNet encoders. However, when models initialised with histopathology and ImageNet encoders were fine-tuned on the same classification tasks, there were no differences in performance between the encoders [5, 6]. Furthermore, the transferability of SSL encodings to object detection is poorly understood [4]. These findings show that more research is needed to develop training strategies for SSL encoders that can enhance performance in relevant downstream tasks. In our study, we investigated whether current state-of-the-art SSL methods can provide model initialisations that outperform ImageNet pre-training on the task of duct detection in WSIs of breast tissue resections. We compared the performance of these SSL-based histopathology encodings (Histo-SSL) with ImageNet pre-training (supervised and self-supervised) and training from scratch. Additionally, we compared the performance of our Histo-SSL encodings with published Histo encoders by Ciga [5] and Mormont [6] on the same task.
WRDet: a breast cancer detector for full-field digital mammograms
Yen Nhi Truong Vu, Brent Mombourquette, Thomas Paul Matthews, et al.
Regular breast screening with mammography allows for early detection of cancer and reduces breast cancer mortality. However, significant false positive and false negative rates leave opportunities for improving diagnostic accuracy. Computer-aided detection (CAD) software has been available to radiologists for decades to address these issues. However, traditional CAD products have failed to improve interpretation of full-field digital mammography (FFDM) images in clinical practice due to low sensitivity and a large number of false positives per image. Use of deep learning models has shown promise in improving the performance of radiologists. Unfortunately, they still produce a large number of false positives per image at reasonable sensitivities. In this work, we propose a simple and intuitive two-stage detection framework, named WRDet. WRDet consists of two stages: a region proposal network that has been optimized to enhance sensitivity and a second-stage patch classifier that boosts specificity. We highlight different rules for matching predicted proposals and ground truth boxes that are commonly used in the mammography CAD literature and compare these rules in light of the high variability in quality of ground truth annotations of mammography datasets. We additionally propose a new criterion to match predicted proposals with loose bounding box annotations that is useful for two-stage CAD systems like WRDet. Using the common CAD matching criterion that considers a prediction a true positive if its center falls within the ground truth annotation, our system achieves an overall sensitivity of 81.3% and 89.4% at 0.25 and 1 false positive mark per image, respectively. For the task of mass detection, we achieve a sensitivity of 85.3% and 92% at 0.25 and 1 false positive mark per image, respectively. We also compare our results with selected models reported in the literature using different matching criteria. Our results demonstrate the possibility of a CAD system that could be beneficial in improving the accuracy of screening mammography worldwide.
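The common CAD matching criterion mentioned above (a prediction counts as a true positive if its center falls inside the ground-truth box) can be sketched as follows; boxes are assumed to be in (x_min, y_min, x_max, y_max) pixel coordinates, and the example values are arbitrary.

```python
def center_match(pred_box, gt_box):
    """True if the center of the predicted box lies inside the ground-truth
    annotation, the matching rule commonly used in mammography CAD."""
    cx = (pred_box[0] + pred_box[2]) / 2.0
    cy = (pred_box[1] + pred_box[3]) / 2.0
    return gt_box[0] <= cx <= gt_box[2] and gt_box[1] <= cy <= gt_box[3]

if __name__ == "__main__":
    gt = (100, 100, 200, 220)                       # hypothetical annotation
    print(center_match((120, 140, 230, 280), gt))   # center inside -> True
    print(center_match((300, 300, 400, 400), gt))   # center outside -> False
```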
Computer-aided detection for architectural distortion: a comparison of digital breast tomosynthesis and digital mammography
Architectural distortion (AD) is one of the abnormal breast signs in digital breast tomosynthesis (DBT) and digital mammography (DM). It is hard to detect because of its subtle appearance and intensity similar to the surrounding tissue. Since DBT is a three-dimensional imaging modality, it can address the problem of tissue superimposition in DM and thereby reduce false positives and false negatives. Several clinical studies have confirmed that radiologists can detect more ADs in DBT than in DM; these conclusions are based on subjective experience. To explore whether an engineering model and the experience of radiologists are consistent in AD detection tasks, this study compared the AD detection performance of a deep-learning-based computer-aided detection (CADe) model in DBT and DM images of the same group of cases. 394 DBT volumes and their corresponding DM images were collected retrospectively from 99 breast cancer screening cases. Among them, 203 DBT volumes and DM images contained ADs and the remaining 191 were a negative group without any AD. Ten-fold cross-validation was used to train and evaluate the models, and the mean true positive fraction (MTPF) was used as the figure of merit. The results showed that the CADe model achieved significantly better detection performance in DBT than in DM (MTPF: 0.7026±0.0394 for DBT vs. 0.5870±0.0407 for DM, p=0.002). Qualitative analysis illustrated that DBT indeed has the ability to overcome tissue superimposition and shows more details of breast tissue. It helped the CADe model detect more ADs, which is consistent with clinical experience.
Deep Learning II
Addressing imaging accessibility by cross-modality transfer learning
Zhiyang Zheng, Yi Su, Kewei Chen, et al.
Multi-modality images usually exist for diagnosis/prognosis of a disease, such as Alzheimer’s Disease (AD), but with different levels of accessibility and accuracy. MRI is used in the standard of care, thus having high accessibility to patients. On the other hand, imaging of pathologic hallmarks of AD such as amyloid-PET and tau-PET has low accessibility due to cost and other practical constraints, even though they are expected to provide higher diagnostic/prognostic accuracy than standard clinical MRI. We proposed Cross-Modality Transfer Learning (CMTL) for accurate diagnosis/prognosis based on standard imaging modality with high accessibility (mod_HA), with a novel training strategy of using not only data of mod_HA but also knowledge transferred from the model based on advanced imaging modality with low accessibility (mod_LA). We applied CMTL to predict conversion of individuals with Mild Cognitive Impairment (MCI) to AD using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets, demonstrating improved performance of the MRI (mod_HA)-based model by leveraging the knowledge transferred from the model based on tau-PET (mod_LA).
Incorporate radiograph-reading behavior and knowledge into deep reinforcement learning for lesion localization
We design an intelligent tool that imitates radiologists' reading behavior and knowledge for lesion localization on radiographs using deep reinforcement learning (DRL). We formulate a human visual search behavior, i.e., 'first searching for, then focusing on' a region of interest (ROI), as a Markov Decision Process (MDP). The state of the MDP is represented by a candidate ROI’s imaging features and the historical actions of lesion searching. The imaging features are extracted by a convolutional neural network that is pre-trained for disease classification. The state transition in the MDP is achieved by Q-learning. Specifically, we design a Q network to simulate a radiologist's succession of saccades and fixations, iteratively choosing the next ROI of the radiograph to pay attention to while reading images. Guided by a novel rewarding scheme, our algorithm learns to iteratively zoom in for a close-up assessment of the potential abnormal sub-regions, until the termination condition is met. We train and test our model with 80% and 20% of the ChestX-ray8 dataset with pathologically confirmed bounding boxes (B-Boxes), respectively. The localization accuracy is measured at different thresholds of intersection over union (IoU) between the DRL-generated and the ground truth B-Box. The proposed method achieves accuracies of 0.996, 0.412, 0.667, and 0.650 at a threshold of 0.1 for cardiomegaly, mass, pneumonia, and pneumothorax, respectively. While our study is a preliminary work, it demonstrates a potential approach and the feasibility of incorporating human expert experience into a lesion detection-based machine learning task. Further investigation and evaluation of our method will be needed in future work.
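Localization accuracy above is measured by intersection over union (IoU) between the DRL-generated and ground-truth bounding boxes; a small reference implementation follows, with boxes as (x_min, y_min, x_max, y_max) and arbitrary example coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes
    given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

if __name__ == "__main__":
    predicted = (120, 130, 260, 280)        # hypothetical DRL output
    ground_truth = (100, 100, 250, 260)     # hypothetical expert B-Box
    print(f"IoU = {iou(predicted, ground_truth):.3f}")
```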
Evaluating the sensitivity of deep learning to inter-reader variations in lesion delineations on bi-parametric MRI in identifying clinically significant prostate cancer
Deep learning based convolutional neural networks (CNNs) for prostate cancer (PCa) risk stratification employ radiologist-delineated regions of interest (ROIs) on MRI. These ROIs contain the reader's interpretation of the region of PCa. Variations in reader annotations change the features that are extracted from the ROIs, which may in turn affect the classification performance of CNNs. In this study, we sought to analyze the effect of variations in inter-reader delineations of PCa ROIs on the training of CNNs with regard to distinguishing clinically significant (csPCa) and insignificant PCa (ciPCa). We employed 180 patient studies (n=274 lesions) from 3 cohorts who underwent 3T multi-parametric MRI followed by MRI-targeted biopsy and/or radical prostatectomy. ISUP Gleason grade groups (GGG) obtained from pathology were used to determine csPCa (GGG≥2) and ciPCa (GGG=1). 5 experienced radiologists, each with over 5 years of experience in prostate imaging, delineated PCa ROIs on bi-parametric MRI (bpMRI, including T2-weighted (T2W) and diffusion-weighted (DWI) sequences) within the training set (n1=160 lesions) using targeted biopsy locations. Patches were extracted using the ROIs, which were then used to train individual CNNs (N1-N5) using the SqueezeNet architecture. The average volume of reader-delineated ROIs used for training varied greatly, ranging between 1106 and 2107 mm³ across all readers. The resulting networks showed no significant difference in classification performance (AUC = 0.82 ± 0.02), indicating that they were relatively robust to inter-reader variations in ROIs. These models were evaluated on independent test sets (n2=85 lesions, n3=29 lesions) where ROIs were obtained by co-registration of MRI with post-surgical pathology, unaffected by inter-reader variations in ROIs. Network performance across D2 and D3 was 0.80 ± 0.02 and 0.62 ± 0.03, respectively. The CNN predictions were moderately consistent, with ICC(2,1) scores across D2 and D3 of 0.74 and 0.54, respectively. Higher agreement in ROI overlap produced higher correlation in predictions on external test sets (R = 0.89, p < 0.05). Furthermore, higher average ROI volume produced greater AUC scores on D3, indicating that comprehensive ROIs may provide more features for DL networks to use in classification. Inter-reader variations in ROIs on MRI may influence the reliability and generalizability of CNNs trained for PCa risk stratification.
Categorization of tumor-derived cells from lung cancer with compact deep learning
Yongjian Yu, Jue Wang
Cell characterization is key to researching the medical signaling of cancer-derived cells in peripheral blood samples under the high-resolution fluorescence microscope. The task has been challenging for traditional image processing and machine learning techniques due to imaging artifacts, noise, debris, de-focusing, shallow depth of field, and high variability in cell morphotypes and fluorescence. We present a compact deep learning method that combines cell component segmentation and grouping with guided feature learning for categorizing circulating tumor cells from lung cancer liquid biopsy. The method demonstrates promising performance with a small training dataset. It is effective, efficient, and valuable in low-cost clinical applications. Characterization of cancer-derived cells could provide vital insights into cancer metastasis and contribute to the development of novel targeted therapies.
Bayesian uncertainty estimation for detection of long-tail and unseen conditions in abdominal images
Deep supervised learning provides an effective approach for developing robust models for various computer-aided diagnosis tasks. However, the underlying assumption is that the frequency of the samples between the different classes of the training dataset is similar or balanced. In real-world medical data, the positive classes often occur too infrequently to satisfy this assumption. Thus, there is an unmet need for deep learning systems that can automatically identify and adapt to the real-world conditions of imbalanced data. In this paper, we propose a novel Bayesian deep ensemble learning framework to address the problem of representation learning for long-tailed and out-of-distribution samples in medical images. By estimating the relative uncertainties of the input data, our framework is able to adapt to the imbalanced data for learning generalizable classifiers. To evaluate the framework, we trained and tested it on two public medical imaging datasets with different imbalance ratios and imaging modalities. Our results on the semantic segmentation of high-resolution CT and MRI images achieved a recall of 0.93, which represents a 3% relative improvement over previous state-of-the-art ensemble GANs in the handling of the associated long-tailed data and detection of out-of-distribution samples.
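The following sketch illustrates one common way of turning ensemble outputs into an uncertainty score, by averaging softmax probabilities over ensemble members and computing the predictive entropy; the ensemble logits and the flagging threshold below are synthetic placeholders, not the paper's Bayesian framework.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predictive_entropy(logits_per_member):
    """logits_per_member: (n_members, n_samples, n_classes) raw outputs of an ensemble."""
    probs = softmax(logits_per_member, axis=-1).mean(axis=0)      # average over ensemble members
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)          # entropy per sample

# Toy example: 5 ensemble members, 3 samples, 4 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3, 4))
uncertainty = predictive_entropy(logits)
flag_ood = uncertainty > 1.2   # placeholder threshold for flagging long-tail / unseen cases
print(uncertainty, flag_ood)
```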
DG-GRU: dynamic graph based gated recurrent unit for age and gender prediction using brain imaging
Deep learning has revolutionized neuroimaging studies of psychiatric and neurological disorders. The difference between brain age and chronological age is a useful biomarker for identifying neurological disorders. Furthermore, delineating both age and gender is important for the study of illnesses exhibiting phenotypic differences across these attributes. In this paper, we focus on the prediction of age and gender from brain connectome data, which is a further step toward full automation of disease prediction. We model the connectomes as brain graphs. Data are collected as functional MRI (fMRI) signals, and the graphs are created by binarizing the correlations among the fMRI signals at the brain parcels considered as nodes. Such a graph represents the neurobiological functional connectivity. We further differentiate between static and dynamic connectivity. The former is constructed from the correlation of the overall signal at the nodes, while the latter is modeled as a sequence of brain graphs constructed over sequential time periods. Our hypothesis is that leveraging information from both the static and dynamic functional connectivity is beneficial to the task at hand. Our main contribution lies in the combination of our proposed novel input data representation and the proposed recurrent graph-based deep learning model. The proposed Dynamic Graph-based Gated Recurrent Unit (DG-GRU) comprises a mechanism to process both types of connectivity. In addition, it can easily be incorporated into any deep neural model. We present a thorough analysis of the model on two publicly available datasets, HCP and ABIDE, across two tasks to demonstrate its superiority.
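A minimal sketch of how static and dynamic brain graphs might be constructed from parcel-level fMRI signals (sliding-window correlation followed by binarization) is shown below; the window length, stride, and threshold are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def dynamic_graphs(timeseries, window=50, stride=25, threshold=0.5):
    """timeseries: (n_timepoints, n_parcels). Returns a list of binary adjacency matrices."""
    graphs = []
    for start in range(0, timeseries.shape[0] - window + 1, stride):
        segment = timeseries[start:start + window]
        corr = np.corrcoef(segment, rowvar=False)       # parcel-by-parcel correlation
        adj = (np.abs(corr) > threshold).astype(np.uint8)
        np.fill_diagonal(adj, 0)                        # no self-loops
        graphs.append(adj)
    return graphs

# Synthetic fMRI: 300 time points, 90 parcels.
rng = np.random.default_rng(0)
fmri = rng.normal(size=(300, 90))
static_adj = (np.abs(np.corrcoef(fmri, rowvar=False)) > 0.5).astype(np.uint8)  # static connectivity
dyn_adjs = dynamic_graphs(fmri)                                                # graph sequence for the GRU
print(len(dyn_adjs), dyn_adjs[0].shape)
```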
Neurology
Effects of feature type and multiple scanners on brain metastasis stereotactic radiosurgery outcome prediction
David DeVries, Frank Lagerwaard, Jaap Zindler, et al.
The prediction of brain metastasis (BM) response to stereotactic radiosurgery could assist clinicians when choosing BM treatments. This study investigates the prediction of in-field progression, out-of-field progression, and 1-year overall survival (OS) endpoints using a machine learning classifier. It also investigates the effect of feature type and magnetic resonance imaging (MRI) scanner variability on classifier performance. The study data set consisted of n = 110 BMs across 91 patients for which endpoints, seven clinical features, and MRI scans were available. 635 radiomic features were extracted from the MRI for the BM region-of-interest (ROI) and a 5mm BM ROI dilation. A 1000-iteration bootstrap experimental design was used with a random forest classifier to provide area under the receiver operating characteristic curve (AUC) estimates. This experimental design was used for multiple endpoints, groups of features, and data partitioning by scanner model. In-field progression, out-of-field progression, and 1-year OS were predicted with respective AUC estimates of 0.70, 0.57 and 0.66. For all endpoints, clinical and/or radiomic features from the BM ROI provided optimal performance. MR scanner variability was found to decrease classifier AUC in general, though pre-processing methods were found to counteract this effect for some scanner models. This study shows that in-field progression, out-of-field progression, and 1-year OS can all be predicted to some degree, with in-field progression being predicted most accurately. The effects of scanner variability indicate that more diverse data sets and robust methods to account for scanner variability are required before clinical translation.
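A minimal sketch of the bootstrap AUC estimation procedure with a random forest classifier is given below, using synthetic features and far fewer iterations than the 1000-iteration design described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Bootstrap estimate of classifier AUC on synthetic stand-ins for the
# clinical/radiomic features and a binary endpoint (e.g. 1-year overall survival).
rng = np.random.default_rng(0)
X = rng.normal(size=(110, 20))
y = rng.integers(0, 2, size=110)

aucs = []
for _ in range(100):
    idx = rng.integers(0, len(y), size=len(y))          # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(len(y)), idx)          # out-of-bag samples used for testing
    if len(np.unique(y[oob])) < 2:
        continue                                        # AUC undefined with a single class
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[idx], y[idx])
    aucs.append(roc_auc_score(y[oob], clf.predict_proba(X[oob])[:, 1]))

print(f"AUC estimate: {np.mean(aucs):.2f} +/- {np.std(aucs):.2f}")
```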
Unsupervised anomaly detection in 3D brain MRI using deep learning with multi-task brain age prediction
Lesion detection in brain Magnetic Resonance Images (MRIs) remains a challenging task. MRIs are typically read and interpreted by domain experts, which is a tedious and time-consuming process. Recently, unsupervised anomaly detection (UAD) in brain MRI with deep learning has shown promising results in providing a quick, initial assessment. So far, these methods rely only on the visual appearance of healthy brain anatomy for anomaly detection. Another biomarker for abnormal brain development is the deviation between the brain age and the chronological age, which is unexplored in combination with UAD. We propose deep learning for UAD in 3D brain MRI considering additional age information. We analyze the value of age information during training, as an additional anomaly score, and systematically study several architecture concepts. Based on our analysis, we propose a novel deep learning approach for UAD with multi-task age prediction. We use clinical T1-weighted MRIs of 1735 healthy subjects and the publicly available BraTS 2019 data set for our study. Our novel approach significantly improves UAD performance with an AUC of 92.60%, compared to an AUC of 84.37% using previous approaches without age information.
A fully convolutional neural network for explainable classification of attention deficit hyperactivity disorder
Attention deficit/hyperactivity disorder (ADHD) is characterized by symptoms of inattention, hyperactivity, and impulsivity, and affects an estimated 10.2% of children and adolescents in the United States. However, correct diagnosis of the condition can be challenging, with failure rates up to 20%. Machine learning models making use of magnetic resonance imaging (MRI) have the potential to serve as a clinical decision support system to aid in the diagnosis of ADHD in youth and to improve diagnostic validity. The purpose of this study was to develop and evaluate an explainable deep learning model for automatic ADHD classification. 254 T1-weighted brain MRI datasets of youths aged 9-11 were obtained from the Adolescent Brain Cognitive Development (ABCD) Study, and the Child Behaviour Checklist DSM-Oriented ADHD Scale was used to partition subjects into ADHD and non-ADHD groups. A fully convolutional neural network (CNN) adapted from a state-of-the-art adult brain age regression model was trained to distinguish between neurologically normal children and children with ADHD. Saliency voxel attribution maps were generated to identify brain regions relevant to the classification task. The proposed model achieved an accuracy of 71.1%, sensitivity of 68.4%, and specificity of 73.7%. Saliency maps highlighted the orbitofrontal cortex, entorhinal cortex, and amygdala as important regions for the classification, which is consistent with previous literature linking these regions to significant structural differences in youth with ADHD. To the best of our knowledge, this is the first study applying artificial intelligence explainability methods such as saliency maps to the classification of ADHD using a deep learning model. The proposed deep learning classification model has the potential to aid clinical diagnosis of ADHD while providing interpretable results.
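As a rough illustration of how voxel-level saliency attribution maps can be obtained, the sketch below computes vanilla input gradients for a tiny 3D CNN on a random volume; the model and data are placeholders, not the fully convolutional network or ABCD data used in the study.

```python
import torch
import torch.nn as nn

# Minimal gradient-based saliency sketch (vanilla gradients) for a 3D CNN classifier.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2),
)
model.eval()

volume = torch.randn(1, 1, 64, 64, 64, requires_grad=True)   # stand-in for a T1-weighted MRI volume
logits = model(volume)
logits[0, 1].backward()                                       # gradient of the "ADHD" class logit

saliency = volume.grad.abs().squeeze()                        # voxel-wise attribution map
print(saliency.shape, saliency.max())
```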
PET image harmonization using smoothing-CycleGAN
Alzheimer’s disease (AD) is the most common cause of dementia. It is characterized by irreversible memory loss and degradation of cognitive skills. Amyloid PET imaging has been used in the diagnosis of AD to measure the amyloid burden in the brain. It is quantified by the Standard Uptake Value Ratio (SUVR). However, there is great variability in SUVR measurements when different scanner models are used. Therefore, standardization and harmonization is required for quantitative assessments of amyloid PET scans in a multi-center or longitudinal study. Conventionally, PET image harmonization has been tackled either by standardization protocols at the time of image reconstruction, or by applying a smoothing function to bring PET images to a common resolution using phantom data. In this work, we propose an automatic approach that aims to match the data distribution of PET images through unsupervised learning. To that end, we propose Smoothing-CycleGAN, a modified cycleGAN that uses a 3D smoothing kernel to learn the optimum Point Spread Function (PSF) for bringing PET images into a common spatial resolution. We validate our approach using two sets of datasets, and we analyze the SUVR agreement before and after PET image harmonization. Our results show that the PSF of PET images that have different spatial resolutions can be estimated automatically using Smoothing-cycleGAN, which results in better SUVR agreement after image translation.
Lesion-preserving unpaired image-to-image translation between MRI and CT from ischemic stroke patients
Depending on the application, multiple imaging modalities are available for diagnosis in the clinical routine. As a result, repositories of patient scans often contain mixed modalities. This poses a challenge for image analysis methods, which require special modifications to work with multiple modalities. This is especially critical for deep learning-based methods, which require large amounts of data. Within this context, a typical example is follow-up imaging in acute ischemic stroke patients, which is an important step in determining potential complications from the evolution of a lesion. In this study, we addressed the mixed-modality issue by translating unpaired images between two of the most relevant follow-up stroke modalities, namely non-contrast computed tomography (NCCT) and fluid-attenuated inversion recovery (FLAIR) MRI. For the translation, we use the widely used cycle-consistent generative adversarial network (CycleGAN). To preserve stroke lesions after translation, we implemented and tested two modifications to regularize them: (1) we use manual segmentations of the stroke lesions as an attention channel when training the discriminator networks, and (2) we use an additional gradient-consistency loss to preserve the structural morphology. For the evaluation of the proposed method, 238 NCCT and 244 FLAIR scans from acute ischemic stroke patients were available. Our method showed a considerable improvement over the original CycleGAN. More precisely, it is capable of translating images between NCCT and FLAIR while preserving the stroke lesion's shape, location, and modality-specific intensity (average Kullback-Leibler divergence improved from 2,365 to 396). Our proposed method has the potential to increase the amount of available data for existing and future applications while conserving original patient features and ground truth labels.
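A minimal sketch of a gradient-consistency term of the kind described in modification (2) is shown below, using finite-difference image gradients and an L1 penalty; the exact formulation and weighting used by the authors may differ.

```python
import torch
import torch.nn.functional as F

def image_gradients(img):
    """Finite-difference gradients of a batch of 2D images, shape (B, 1, H, W)."""
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return dx, dy

def gradient_consistency_loss(source, translated):
    """Penalize differences in edge structure between the input and the translated image."""
    sdx, sdy = image_gradients(source)
    tdx, tdy = image_gradients(translated)
    return F.l1_loss(tdx, sdx) + F.l1_loss(tdy, sdy)

# Toy usage: a random NCCT-like input and its (here also random) FLAIR-like translation.
src = torch.rand(2, 1, 128, 128)
fake = torch.rand(2, 1, 128, 128)
print(gradient_consistency_loss(src, fake).item())
```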
Head and Neck, Musculoskeletal
A CT-based radiomics model for predicting feeding tube insertion in oropharyngeal cancer
Patients with oropharyngeal cancer (OPC) treated with chemoradiation suffer treatment-related toxicities which can lead to nutritional deficiencies and weight loss. As a result, many of these patients will require supportive care interventions, such as a feeding tube. We aimed to develop a machine learning model to predict feeding tube insertion in patients with OPC (n=343). A total of 116 patients (34%) required a feeding tube. Primary gross tumor volumes were contoured on planning CT images for patients prior to treatment. PyRadiomics was used to compute 1212 radiomic features from these volumes on the original and filtered images. The dataset was split into independent training (n=244) and testing (n=99) datasets. LASSO feature selection was applied to select the optimal features to predict feeding tube insertion. Support vector machine (SVM) and random forest (RF) classifiers were built using the selected features on the training dataset. The machine learning models' performance was assessed on the testing dataset using the AUC. Through feature selection, seven predictive features were selected: one original texture feature, two filtered first-order features, three filtered texture features, and one clinical feature. The top-performing classifier was the RF model, which achieved an AUC of 0.69 [95% CI: 0.57-0.80] in the testing dataset. To the best of our knowledge, this is the first study to use radiomics to predict feeding tube insertion. This model could assist physicians in identifying patients who may benefit from prophylactic feeding tube insertion, ultimately improving quality of life for patients with OPC.
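A hedged sketch of the LASSO-select-then-classify pipeline is shown below with synthetic stand-ins for the radiomic features and outcome labels; the dimensions loosely mirror the description above, and the fallback branch exists only to keep the toy example runnable.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(343, 1212))                 # synthetic radiomic features per patient
y = rng.integers(0, 2, size=343)                 # feeding tube inserted (1) or not (0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=99, random_state=0, stratify=y)

# LASSO keeps only features with non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X_tr, y_tr)
selected = np.flatnonzero(lasso.coef_ != 0)
if len(selected) == 0:
    selected = np.arange(X.shape[1])             # fallback for this purely random toy data
print(f"{len(selected)} features selected")

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[:, selected], y_tr)
print("test AUC:", roc_auc_score(y_te, rf.predict_proba(X_te[:, selected])[:, 1]))
```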
A deep network ensemble for segmentation of cervical spinal cord and neural foramina
The automated interpretation of spinal imaging using machine learning has emerged as a promising method for standardizing the assessment and diagnosis of numerous spinal column pathologies. While magnetic resonance images (MRIs) of the lumbar spine have been extensively studied in this context, the cervical spine remains vastly understudied. Our objective was to develop a method for automatically delineating cervical spinal cord and neural foramina on axial MRIs using machine learning. In this study, we train a state-of-the-art algorithm, namely a multiresolution ensemble of deep U-Nets, to delineate cervical spinal cord and neural foramina on 50 axial T2-weighted MRI-series segmented by a team of expert clinicians. We then evaluate algorithm performance against two independent human raters using 50 separate MRI-series. Dice coefficients, Hausdorff coefficients, and average surface distances (ASDs) were computed for this final set between the algorithm and each rater, and between raters, in order to evaluate algorithm performance for each segmentation task. The resulting cervical cord Dice coefficients were 0.76 (auto vs human, average) and 0.87 (human vs human), and the cervical foramina Dice coefficients were 0.57 (auto vs human, average) and 0.59 (human vs human). Hausdorff coefficients and ASDs reflected similar results. We conclude that the algorithm achieved a higher degree of consistency with human raters for cervical cord than for cervical foramina, and that cervical foramina are challenging to segment accurately for both humans and machine. Further technical development in machine learning is necessary to accurately segment the highly anatomically variable neural foramina of the human spine.
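For reference, the Dice coefficient used to compare algorithm and rater segmentations can be computed as in the short sketch below; the random masks are placeholders for actual cord or foramina segmentations.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary segmentation masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy example: compare an "algorithm" mask against a "rater" mask on one axial slice.
rng = np.random.default_rng(0)
auto_mask = rng.random((256, 256)) > 0.7
rater_mask = rng.random((256, 256)) > 0.7
print(f"Dice = {dice_coefficient(auto_mask, rater_mask):.2f}")
```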
BAPGAN: GAN-based bone age progression of femur and phalange x-ray images
Shinji Nakazawa, Changhee Han, Joe Hasei, et al.
Convolutional Neural Networks play a key role in bone age assessment for investigating endocrinology, genetic, and growth disorders under various modalities and body regions. However, no researcher has tackled bone age progression/regression despite its valuable potential applications: bone-related disease diagnosis, clinical knowledge acquisition, and museum education. Therefore, we propose Bone Age Progression Generative Adversarial Network (BAPGAN) to progress/regress both femur/phalange X-ray images while preserving identity and realism. We exhaustively confirm the BAPGAN's clinical potential via Fréchet Inception Distance, Visual Turing Test by two expert orthopedists, and t-Distributed Stochastic Neighbor Embedding.
A novel CNN+LSTM approach to classify frontal chest x-rays for spine fractures
Larissa C. Schudlo, Yiting Xie, Kirstin Small, et al.
Spine fractures are serious injuries that are often missed on standard chest x-rays, especially the frontal view. Automatic detection using artificial intelligence could help reduce these missed findings. To detect fractures, radiologists compare the morphological characteristics within the ordered vertebrae sequence. To this end, we designed a time-distributed CNN+LSTM model with a ResNet-inspired backbone to classify a sequence of vertebral image patches extracted from a frontal chest x-ray. While the CNN component encapsulates the spatial characteristics of the vertebrae, the time-distributed aspect ensures the sequential information of each image is explicitly preserved and better exploited by the LSTM. A Dense UNet was trained to automatically detect vertebrae locations for patch extraction. For this retrospective research-only study, a comparative analysis was performed using six random partitions of 1250 images, and the AUCs for each partition were averaged. Using manual vertebrae detection to extract vertebral image patches from the x-rays, the proposed method achieved an average AUC of 83.17 ± 3.8%. Using the Dense UNet for automatic vertebrae detection in the pipeline, the proposed method achieved an average AUC of 79.3 ± 1.8% in differentiating x-rays with and without vertebral fractures. This is an average improvement of 6.0% relative to the same CNN+LSTM architecture that is not time-distributed. Using a model ensemble, an AUC of 82.7 ± 3.7% was achieved. These findings suggest the importance of exploiting both sequential and spatial information for fracture detection.
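A minimal PyTorch sketch of a time-distributed CNN followed by an LSTM over the ordered vertebra sequence is given below; the tiny convolutional backbone is a placeholder, not the ResNet-inspired encoder or Dense UNet pipeline described above.

```python
import torch
import torch.nn as nn

class TimeDistributedCNNLSTM(nn.Module):
    """Sketch: a small CNN applied to each vertebral patch in a sequence,
    followed by an LSTM over the sequence and a fracture/no-fracture classifier."""
    def __init__(self, n_classes=2, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                       # x: (batch, seq_len, 1, H, W) vertebral patches
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))       # apply the CNN to every patch ("time-distributed")
        feats = feats.view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)          # LSTM aggregates the ordered vertebra sequence
        return self.head(h_n[-1])

patches = torch.randn(4, 12, 1, 64, 64)         # e.g. 12 vertebral patches per chest x-ray
print(TimeDistributedCNNLSTM()(patches).shape)  # -> torch.Size([4, 2])
```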
Efficient endoscopic frame informativeness assessment by reusing the encoder of the primary CAD task
The majority of the encouraging experimental results published on AI-based endoscopic Computer-Aided Detection (CAD) systems have not yet been reproduced in clinical settings, mainly due to the highly curated datasets used throughout the experimental phase of the research. In a realistic clinical environment, these necessary high image-quality standards cannot be guaranteed, and the CAD system performance may degrade. While several studies have previously presented impressive outcomes with Frame Informativeness Assessment (FIA) algorithms, the current state of the art implies sequential use of FIA and CAD systems, affecting the time performance of both algorithms. Since these algorithms are often trained on similar datasets, we hypothesise that part of the learned feature representations can be leveraged for both systems, enabling a more efficient implementation. This paper explores this case for early Barrett cancer detection by integrating the FIA algorithm within the CAD system. Sharing the weights between the two tasks reduces the number of parameters from 16 to 11 million and the number of floating-point operations from 502 to 452 million. Due to the lower complexity of the architecture, the proposed model leads to an inference time up to 2 times faster than the state-of-the-art sequential implementation while retaining the classification performance.
Radiomics, Radiogenomics, Multi-omics
Multi-modal learning with missing data for cancer diagnosis using histopathological and genomic data
Multi-modal learning (e.g., integrating pathological images with genomic features) tends to improve the accuracy of cancer diagnosis and prognosis compared to learning with a single modality. However, missing data is a common problem in clinical practice, i.e., not every patient has all modalities available. Most previous works directly discarded samples with missing modalities, which may lose the information in these data and increase the likelihood of overfitting. In this work, we generalize multi-modal learning in cancer diagnosis with the capacity of dealing with missing data using histological images and genomic data. Our integrated model can utilize all available data from patients with both complete and partial modalities. The experiments on the public TCGA-GBM and TCGA-LGG datasets show that the data with missing modalities can contribute to multi-modal learning, which improves the model performance in grade classification of glioma cancer.
Uncertainty estimation in classification of MGMT using radiogenomics for glioblastoma patients
W. Farzana, Z. A. Shboul, A. Temtam, et al.
Glioblastoma Multiforme (GBM) is one of the most malignant brain tumors among all high-grade brain cancers. Temozolomide (TMZ) is the first-line chemotherapeutic regimen for glioblastoma patients. The methylation status of the O6-methylguanine-DNA-methyltransferase (MGMT) gene is a prognostic biomarker for tumor sensitivity to TMZ chemotherapy. However, the standardized procedure for assessing the methylation status of MGMT is an invasive surgical biopsy, and accuracy is susceptible to the resection sample and the heterogeneity of the tumor. Recently, radio-genomics, which associates radiological image phenotypes with genetic or molecular mutations, has shown promise in the non-invasive assessment of radiotherapeutic treatment. This study proposes a machine-learning framework for MGMT classification with uncertainty analysis utilizing imaging features extracted from multimodal magnetic resonance imaging (mMRI). The imaging features include conventional texture, volumetric, and sophisticated fractal and multi-resolution fractal texture features. The proposed method is evaluated on the publicly available BraTS-TCIA-GBM pre-operative scans and TCGA datasets with 114 patients. The experiment with 10-fold cross-validation suggests that the fractal and multi-resolution fractal texture features offer improved prediction of MGMT status. The uncertainty analysis using an ensemble of Stochastic Gradient Langevin Boosting models along with multi-resolution fractal features offers an accuracy of 71.74% and an area under the curve of 0.76. Finally, the analysis shows that our proposed method with uncertainty analysis offers improved predictive performance when compared with well-known methods in the literature.
Iterative ComBat methods for harmonization of radiomic features
Hannah Horng, Apurva Singh, Bardia Yousefi, et al.
Background: ComBat is a promising harmonization method for radiomic features, but it cannot harmonize simultaneously by multiple batch effects and shows reduced performance in the setting of bimodal distributions and unknown clinical/batch variables. In this study, we develop and evaluate two iterative ComBat approaches (Nested and Nested+GMM ComBat) to address these limitations and improve radiomic feature harmonization performance. Methods: In Nested ComBat, radiomic features are sequentially harmonized by multiple batch effects with order determined by the permutation associated with the smallest number of features with statistically significant differences due to batch effects. In Nested+GMM ComBat, a Gaussian mixture model is used to identify a scan grouping associated with a latent variable from the observed feature distributions to be added as a batch effect to Nested ComBat. These approaches were used to harmonize differences associated with contrast enhancement, spatial resolution due to reconstruction kernel, and manufacturer in radiomic datasets generated by using CapTK and PyRadiomics to extract features from lung CT datasets (Lung3 and Radiogenomics). Differences due to batch effects in the original data and data harmonized with standard ComBat, Nested ComBat, and Nested+GMM ComBat were assessed. Results: Nested ComBat exhibits similar or better performance compared to standard ComBat, likely due to bimodal feature distributions. Nested+GMM ComBat successfully harmonized features with bimodal distributions and in most cases showed superior harmonization performance when compared to Nested and standard ComBat. Conclusions: Our findings show that Nested ComBat can harmonize by multiple batch effects and that Nested+GMM ComBat can improve harmonization of bimodal features.
4D radiomics in dynamic contrast-enhanced MRI: prediction of pathological complete response and systemic recurrence in triple-negative breast cancer
Marco Caballo, Wendelien B. G. Sanderink, Luyi Han, et al.
We developed a four-dimensional (4D) radiomics approach for the analysis of breast cancer on dynamic contrast-enhanced (DCE) MRI scans. This approach quantifies 348 features related to kinetics, enhancement heterogeneity, and time-dependent textural variation in 4D (3D over time) from the tumors and the peritumoral regions, leveraging both spatial and temporal image information. The potential of these features was studied for two clinical applications: the prediction of pathological complete response (pCR) to neoadjuvant chemotherapy (NAC), and of systemic recurrence (SR) in triple-negative (TN) breast cancers. For this, 72 pretreatment images of TN cancers (19 achieving pCR, 14 recurrence events), retrieved from a publicly available dataset (The Cancer Imaging Archive, Duke-Breast-Cancer-MRI dataset), were used. For both clinical problems, radiomic features were extracted from each case and used to develop a machine learning logistic regression model for outcome prediction. The model was trained and validated in a supervised leave-one-out cross-validation fashion, with the input feature space reduced through statistical analysis and forward selection to prevent overfitting. The model was tested using the area under the receiver operating characteristic (ROC) curve (AUC), and statistical significance was assessed using the associated 95% confidence interval estimated through bootstrapping. The model achieved an AUC of 0.80 and 0.86, respectively, for pCR and SR prediction. Both AUC values were statistically significant (p<0.05, adjusted for repeated testing). In conclusion, the developed approach could quantify relevant imaging biomarkers from TN breast cancers in pretreatment DCE-MRI images. These biomarkers were promising in the prediction of pCR to NAC and SR.
Visualization and unsupervised clustering of emphysema progression using t-SNE analysis of longitudinal CT images and SNPs
Chronic obstructive pulmonary disease (COPD) is predicted to become the third leading cause of death worldwide by 2030. A longitudinal study using CT scans of COPD is useful to assess the changes in structural abnormalities. In this study, we performed visualization and unsupervised clustering of emphysema progression using t-distributed stochastic neighbor embedding (t-SNE) analysis of longitudinal CT images, smoking history, and SNPs. The procedure of this analysis is as follows: (1) automatic segmentation of lung lobes using 3D U-Net, (2) quantitative image analysis of emphysema progression in lung lobes, and (3) visualization and unsupervised clustering of emphysema progression using t-SNE. Nine explanatory variables were used for the clustering: genotypes at two SNPs (rs13180 and rs3923564), smoking history (smoking years, number of cigarettes per day, pack-years), and LAV distribution (LAV size and density in the upper lobes, and LAV size and density in the lower lobes). The objective variable was emphysema progression, defined as the annual change in low attenuation volume (LAV%/year) using linear regression. The nine-dimensional space was transformed to a two-dimensional space by t-SNE and divided into three clusters by a Gaussian mixture model. This method was applied to 37 smokers with 68.2 pack-years and 97 past smokers with 51.1 pack-years. The results demonstrated that this method could be effective for quantitative assessment of emphysema progression using SNPs, smoking history, and imaging features.
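A minimal sketch of the embedding-and-clustering step (t-SNE on the nine explanatory variables followed by a three-component Gaussian mixture) is given below, using synthetic stand-ins for the SNP, smoking-history, and LAV features.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture

# 134 subjects x 9 explanatory variables (synthetic placeholders).
rng = np.random.default_rng(0)
features = rng.normal(size=(134, 9))

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)   # 9D -> 2D
clusters = GaussianMixture(n_components=3, random_state=0).fit_predict(embedding)          # 3 clusters
print(embedding.shape, np.bincount(clusters))
```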
Poster Session
Placenta accreta spectrum and hysterectomy prediction using MRI radiomic features
In women with placenta accreta spectrum (PAS), patient management may involve cesarean hysterectomy at delivery. Magnetic resonance imaging (MRI) has been used for further evaluation of PAS and surgical planning. This work tackles two prediction problems: predicting the presence of PAS and predicting hysterectomy using MR images of pregnant patients. First, we extracted approximately 2,500 radiomic features from MR images with two regions of interest: the placenta and the uterus. In addition to analyzing the two regions of interest, we dilated the placenta and uterus masks by 5, 10, 15, and 20 mm to gain insights from the myometrium, where the uterus and placenta overlap in the case of PAS. The study cohort includes 241 pregnant women. Of these women, 89 underwent hysterectomy while 152 did not; 141 had suspected PAS and 100 did not. We obtained an accuracy of 0.88 for predicting hysterectomy and an accuracy of 0.92 for classifying suspected PAS. Once further validated, this radiomic analysis tool could be useful for aiding clinicians in decision making on the care of pregnant women.
Multi-class prediction for improving intestine segmentation on non-fecal-tagged CT volume
This paper proposes an intestine segmentation method for CT volumes based on a multi-class prediction of intestinal content materials (ICMs). Mechanical intestinal obstruction and ileus (non-mechanical intestinal obstruction) are diseases that disrupt the movement of ICMs. Clinicians locate the obstruction point, where the movement of intestinal contents is blocked, on CT volumes; however, this is difficult for non-expert clinicians. We have studied a CADe system that presents obstruction candidates to users based on segmentation of the intestines on CT volumes. In our previous method, the generation of incorrect shortcuts in the segmentation results was partly reduced by introducing distance maps. However, incorrect shortcuts still remained between regions filled with air. This paper proposes an improved intestine segmentation method for CT volumes. We introduce a multi-class segmentation of ICMs (air, liquid, and feces). Reduction of incorrect shortcut generation is specifically applied to air regions. Experiments using 110 CT volumes showed that our proposed method reduced incorrect shortcuts. The rates of segmented regions that were analyzed as running through the intestine were 59.6% and 62.4% for the previous and proposed methods, respectively. This result partly implies that our proposed method reduced the production of incorrect shortcuts.
Identifying an optimal machine learning generated image marker to predict survival of gastric cancer patients
Computer-aided detection and/or diagnosis (CAD) schemes typically include machine learning classifiers trained using handcrafted features. The objective of this study is to investigate the feasibility of identifying and applying a new quantitative imaging marker to predict the survival of gastric cancer patients. A retrospective dataset including CT images of 403 patients was assembled. Among them, 162 patients had more than 5-year survival. A CAD scheme was applied to segment gastric tumors depicted in multiple CT image slices. After gray-level normalization of each segmented tumor region to reduce image value fluctuation, we used a special feature selection library of the publicly available PyRadiomics software to compute 103 features. To identify an optimal approach to predict patient survival, we investigated two logistic regression model (LRM)-generated imaging markers. The first fuses image features computed from one CT slice, and the second fuses the weighted average image features computed from multiple CT slices. The two LRMs were trained and tested using a leave-one-case-out cross-validation method. Using the LRM-generated prediction scores, receiver operating characteristic (ROC) curves were computed, and the area under the ROC curve (AUC) was used as an index to evaluate performance in predicting patients' survival. Study results show that the case prediction-based AUC values are 0.70 and 0.72 for the two LRM-generated image markers fused with image features computed from a single CT slice and multiple CT slices, respectively. This study demonstrates that (1) radiomics features computed from CT images carry valuable discriminatory information to predict the survival of gastric cancer patients and (2) fusion of quasi-3D image features yields higher prediction accuracy than using simple 2D image features.
Prediction of CD3 T-cell infiltration status in colorectal liver metastases: a radiomics-based imaging biomarker
Colorectal cancer (CRC) continues to be a leading cause of cancer-related death in the developed world due to metastatic progression of the disease. In an effort to improve the understanding of tumor biology and developing prognostic tools, it was found that CD3+ tumor infiltrating lymphocytes (TIL) had a very strong prognostic value in primary CRC as well as in colorectal liver metastases (CLM). Quantification of TILs remains labor intensive and requires tissue samples, hence being of limited use in the pre-operative period or in the context of non-operable disease. Computed tomography (CT) images however are widely available for patients with CLM. In this study, we propose a pipeline to predict CD3 T-cell infiltration in CLM from pre-operative CT images. Radiomic features were extracted from 58 automatically segmented CLM lesions. Subsequently, dimensionality reduction was performed by training an autoencoder (AE) on the full feature set. We then used AE bottleneck embeddings to predict CD3 T-cell density, stratified into two categories: CD3hi and CD3low. For this, we implemented a 1D convolutional neural network (1D-CNN) and compared its performance against five machine learning models using 5-fold cross-validation. Results showed that the proposed 1D-CNN outperformed the other trained models achieving a mean accuracy of 0.69 (standard deviation [SD], 0.01) and a mean area under the receiver operating curve (AUROC) of 0.75 (SD, 0.02) on the validation set. Our findings demonstrate a relationship between CT radiomic features and CD3 tumor infiltration status with the potential of noninvasively determining CD3 status from preoperative CT images.
Improving prostate cancer triage with GAN-based synthetically generated prostate ADC MRI
Alvaro Fernandez-Quilez, Omer Parvez, Trygve Eftestøl, et al.
Tumor classification into clinically significant (cS, Gleason score ≥ 7) or non-clinically significant (ncS, Gleason score < 7) plays a crucial role in the patient management of prostate cancer (PCa), allowing to triage those patients who might benefit from an active surveillance approach from those that require immediate action in the form of further testing or treatment. Despite this, the current diagnostic pathway of PCa is substantially hampered by over-diagnosis of ncS lesions and under-detection of cS ones. Magnetic Resonance Imaging (MRI) has proven to be helpful in the stratification of tumors, but it relies on specialized training and experience. Despite the promise shown by deep learning (DL) methods, they are data-hungry approaches and rely on the availability of large amounts of annotated data. Standard augmentation techniques such as image translation have become the default option to increase variability and data availability. However, the correlation between transformed data and the original data limits the amount of information they provide. Generative Adversarial Networks (GANs) present an alternative to classic augmentation techniques by creating synthetic samples. In this paper, we explore a conditional GAN (cGAN) architecture and a deep convolutional one (DCGAN) to generate synthetic apparent diffusion coefficient (ADC) prostate MRI. We then compare classic augmentation techniques with our GAN-based approach in a prostate cancer triage (tumor classification) setting. We show that by adding synthetic ADC prostate MRI we are able to improve the final classification AUC of cS vs. ncS tumors compared to classic augmentation.
One class to rule them all: detection and classification of prostate tumors presence in bi-parametric MRI based on auto-encoders
Alvaro Fernandez-Quilez, Habib Ullah, Trygve Eftestøl, et al.
Prostate Cancer (PCa) is the fifth leading cause of death and the second most commonly diagnosed cancer among men worldwide. Current diagnostic practices suffer from a substantial overdiagnosis of indolent tumors. Deep Learning (DL) holds promise for automating prostate MRI analysis and enabling computer-assisted systems able to improve current practices. Nevertheless, large amounts of annotated data are commonly required for DL systems to succeed. On the other hand, an experienced clinician is typically able to discern between a normal (no lesion) and an abnormal (contains PCa lesions) case after seeing only a few normal cases, ultimately reducing the amount of data required to detect abnormal cases. This work exploits such an ability by making use of normal cases at training time and learning their distribution through auto-encoder-based architectures. We propose a threshold approach based on the interquartile range to discriminate between normal and abnormal cases at evaluation time, quantified through the area under the curve (AUC). Furthermore, we show the ability of our method to detect lesions in those cases deemed abnormal in an unsupervised way in T2w and apparent diffusion coefficient (ADC) MRI modalities.
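The interquartile-range decision rule described above can be illustrated with the short sketch below, where the per-case reconstruction errors are synthetic placeholders for the errors an auto-encoder trained only on normal prostate MRI would produce.

```python
import numpy as np

# IQR-based decision rule on per-case reconstruction errors.
rng = np.random.default_rng(0)
train_errors = rng.gamma(shape=2.0, scale=1.0, size=200)     # reconstruction errors on normal training cases
test_errors = rng.gamma(shape=2.0, scale=1.5, size=50)       # errors on unseen evaluation cases

q1, q3 = np.percentile(train_errors, [25, 75])
threshold = q3 + 1.5 * (q3 - q1)                              # classic IQR-based outlier cut-off
is_abnormal = test_errors > threshold                         # flagged as potentially containing a lesion
print(f"threshold = {threshold:.2f}, abnormal cases: {int(is_abnormal.sum())}/{len(test_errors)}")
```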
Hepatic artery segmentation with 3D convolutional neural networks
Farina Kock, Grzegorz Chlebus, Felix Thielke, et al.
The segmentation of liver vessels is a crucial task for liver surgical planning. In selective internal radiation therapy, a catheter has to be placed into the hepatic artery, injecting radioactive beads to internally destroy tumor tissue. Based on a set of 146 abdominal CT datasets with expert segmentations, we trained three-level 3D U-Nets with loss-sensitive re-weighting. They are evaluated with respect to different measures including the Dice coefficient and the mutual skeleton coverage. The best model incorporates a masked loss for the liver area, which achieves a mean Dice coefficient of 0.56, a sensitivity of 0.69 and a precision of 0.66.
Colorectal polyp classification using confidence-calibrated convolutional neural networks
Computer-Aided Diagnosis (CADx) systems for in-vivo characterization of Colorectal Polyps (CRPs), which are precursor lesions of Colorectal Cancer (CRC), can assist clinicians with diagnosis and better-informed decision-making during colonoscopy procedures. Current deep learning-based state-of-the-art solutions achieve a high classification performance but lack measures to increase the reliability of such systems. In this paper, the reliability of a Convolutional Neural Network (CNN) for characterization of CRPs is specifically addressed by confidence calibration. Well-calibrated models produce classification-confidence scores that reflect the actual correctness likelihood of the model, thereby supporting reliable predictions through trustworthy and informative confidence scores. Two recently proposed trainable calibration methods are explored for CRP classification to calibrate the confidence of the proposed CNN. We show that the confidence-calibration error can be decreased by 33.86% (−0.01648 ± 0.01085), 48.33% (−0.04415 ± 0.01731), 50.57% (−0.11423 ± 0.00680), 61.68% (−0.01553 ± 0.00204) and 48.27% (−0.22074 ± 0.08652) for the Expected Calibration Error (ECE), Average Calibration Error (ACE), Maximum Calibration Error (MCE), Over-Confidence Error (OE) and Cumulative Calibration Error (CUMU), respectively. Moreover, the absolute difference between the average entropy and the expected entropy was considerably reduced by 32.00% (−0.04374 ± 0.01238) on average. Furthermore, even a slightly improved classification performance is observed compared to the uncalibrated equivalent. The obtained results show that the proposed model for CRP classification with confidence calibration produces better-calibrated predictions without sacrificing classification performance. This work shows promising points of engagement towards obtaining reliable and well-calibrated CADx systems for in-vivo polyp characterization, to assist clinicians during colonoscopy procedures.
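For reference, the Expected Calibration Error, one of the calibration metrics reported above, can be computed as in the sketch below; the simulated confidences and labels are placeholders, not outputs of the proposed CNN.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between confidence and accuracy over confidence bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy usage with simulated predictions from a slightly over-confident model.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)                          # predicted-class confidence scores
correct = (rng.uniform(size=1000) < conf * 0.9).astype(float)    # 1 if the prediction was correct
print(f"ECE = {expected_calibration_error(conf, correct):.4f}")
```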
Fully automated longitudinal tracking and in-depth analysis of the entire tumor burden: unlocking the complexity
We present a novel approach for handling complex information of lesion segmentation in CT follow-up studies. The backbone of our approach is the computation of a longitudinal tumor tree. We perform deep learning based segmentation of all lesions for each time point in CT follow-up studies. Subsequently, follow-up images are registered to establish correspondence between the studies and trace tumors among time points, yielding tree-like relations. The tumor tree encodes the complexity of the individual disease progression. In addition, we present novel descriptive statistics and tools for correlating tumor volumes and RECIST diameters to analyze significance of various markers.
Learning to triage by learning to reconstruct: a generative self-supervised approach for prostate cancer based on axial T2w MRI
Alvaro Fernandez-Quilez, Trygve Eftestøl, Svein Reidar Kjosavik, et al.
Prostate cancer (PCa) is the second most commonly diagnosed cancer worldwide among men. Despite this, its current diagnostic pathway is substantially hampered by over-diagnosis of indolent lesions and under-detection of aggressive ones. Imaging techniques like magnetic resonance imaging (MRI) have proven to add value to current diagnostic practices, but they rely on specialized training and can be time-intensive. Deep learning (DL) has arisen as an alternative to automate tasks such as MRI analysis. Nevertheless, its success relies on large amounts of annotated data, which are rarely available in the medical domain. Existing work tackling data scarcity commonly relies on ImageNet pre-training, which is sub-optimal due to the existing gap between the training domain and the task domain. We propose a generative self-supervised learning (SSL) approach to alleviate such issues. We show that by making use of an auto-encoder architecture, applying different patch-level transformations such as pixel intensity or occlusion transformations to T2w MRI slices, and then trying to recover the original T2w slice, we are able to learn robust, domain-specific medical visual representations. Furthermore, we show the usefulness of our approach by using the representations as an initialization method for a downstream PCa lesion classification task. We then show how our method outperforms ImageNet initialization and how the performance gap increases as the amount of available labeled data decreases. Furthermore, we provide a detailed sensitivity analysis of the different pixel manipulation transformations and their effect on the downstream task performance.
Intrinsic subtype classification of breast lesions on mammograms by contrastive learning
Chisako Muramatsu, Mikinao Oiwa, Tomonori Kawasaki, et al.
Periodic breast cancer screening with mammography is considered effective in decreasing breast cancer mortality. Once a cancer is found, the best treatment is selected based on the characteristics of the cancer. In this study, we investigated a method to classify breast cancer lesions into four molecular subtypes to assist diagnosis and treatment planning. Because of the limited number of samples and imbalanced subtypes, the lesions were classified based on the similarities of samples using contrastive learning. A convolutional neural network (CNN) was trained by a self-supervised method using paired views of the same lesions with a contrastive loss. The subtype was determined by a k-nearest neighbor classifier using deep features obtained by the trained network. The proposed model was tested on 385 cases using 4-fold cross-validation. The results are compared with CNN models without and with pretraining. The results indicate the potential usefulness of the proposed method. The computerized subtype classification may support prompt treatment planning and proper patient care.
Machine learning algorithm for classification of breast ultrasound images
Jennie Karlsson, Jennifer Ramkull, Ida Arvidsson, et al.
Breast cancer is the most common type of cancer globally. Early detection is important for reducing the morbidity and mortality of breast cancer. The aim of this study was to evaluate the performance of different machine learning models to classify malignant or benign lesions on breast ultrasound images. Three different convolutional neural network approaches were implemented: (a) Simple convolutional neural network, (b) transfer learning using pre-trained InceptionV3, ResNet50V2, VGG19 and Xception and (c) deep feature networks based on combinations of the four transfer networks in (b). The data consisted of two breast ultrasound image data sets: (1) an open, single-vendor, data set collected by Cairo University at Baheya Hospital, Egypt, consisting of 437 benign lesions and 210 malignant lesions, where 10% was set to be a test set and the rest was used for training and validation (development) and (2) An in-house, multi-vendor data set collected at Unilabs Mammography Unit, Skåne University Hospital, Sweden, consisting of 13 benign lesions and 265 malignant lesions, was used as an external test set. Both test sets were used for evaluating the networks. The performance measures used were area under the receiver operating characteristic curve (AUC), sensitivity, specificity and weighted accuracy. Holdout, i.e. the splitting of the development data into training and validation data sets just once, was used to find a model with as good performance as possible. 10-fold cross-validation was also performed to provide uncertainty estimates. For the transfer networks which were obtained with holdout, Gradient-weighted Class Activation Mapping was used to generate heat maps indicating which part of the image contributed to the network’s decision. For 10-fold cross-validation it was possible to achieve a mean AUC of 92% and mean sensitivity of 95% for the transfer network based on Xception when testing on the first data set. When testing on the second data set it was possible to obtain a mean AUC of 75% and mean sensitivity of 86% for the combination of ResNet50V2 and Xception.
DBNet: a dual-branch network for breast cancer classification in ultrasound images
Computer-aided diagnosis has been widely used for breast ultrasound images, and many deep learning-based models have emerged. However, the datasets used for breast ultrasound classification face the problem of category imbalance, which limits the accuracy of breast cancer classification. In this work, we propose a novel dual-branch network (DBNet) to alleviate the imbalance problem and improve classification accuracy. DBNet is constructed from a conventional learning branch and a re-balancing branch in parallel, which take universal sampling data and reversed sampling data as inputs, respectively. Both branches adopt ResNet-18 to extract features and share all the weights except for the last residual block. Additionally, both branches use the same classifier and share all its weights. The cross-entropy loss of each branch is calculated using the output logits and the corresponding ground-truth labels. The total loss of DBNet is designed as a linear weighted sum of the two branches' losses. To evaluate the performance of DBNet, we conduct breast cancer classification on a dataset composed of 6309 ultrasound images with malignant nodules and 3527 ultrasound images with benign nodules. Furthermore, ResNet-18 and the bilateral-branch network (BBN) are utilized as baselines. The results demonstrate that DBNet yields an accuracy of 0.854, which outperforms ResNet-18 and BBN by 2.7% and 1.1%, respectively.
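A minimal sketch of the loss combination used by a dual-branch design of this kind, where the total loss is a weighted sum of the two branches' cross-entropy terms, is shown below; the weighting factor is a placeholder, not necessarily the schedule used by DBNet.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def dual_branch_loss(logits_conv, labels_conv, logits_rebal, labels_rebal, alpha=0.5):
    """Linear weighted sum of the conventional-learning and re-balancing branch losses."""
    return alpha * criterion(logits_conv, labels_conv) + (1.0 - alpha) * criterion(logits_rebal, labels_rebal)

# Toy batch: 8 images, 2 classes (benign / malignant nodule), one batch per branch sampler.
logits_a, logits_b = torch.randn(8, 2), torch.randn(8, 2)
labels_a, labels_b = torch.randint(0, 2, (8,)), torch.randint(0, 2, (8,))
print(dual_branch_loss(logits_a, labels_a, logits_b, labels_b).item())
```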
Deep-learning characterization and quantification of COVID-19 pneumonia lesions from chest CT images
D. Bermejo-Peláez, R. San José Estépar, M. Fernández-Velilla, et al.
A relevant percentage of COVID-19 patients present bilateral pneumonia. Disease progression and healing is characterized by the presence of different parenchymal lesion patterns. Artificial intelligence algorithms have been developed to identify and assess the related lesions and properly segment affected lungs, however very little attention has been paid to automatic lesion subtyping. In this work we present artificial intelligence algorithms based on CNN to automatically identify and quantify COVID-19 pneumonia patterns. A Dense-efficient CNN architecture is presented to automatically segment the different lesion subtypes. The proposed technique has been independently tested in a multicentric cohort of 100 patients, showing Dice coefficients of 0.988±0.01 for ground glass opacities, 0.948±0.05 for consolidations, and 0.999±0.0003 for healthy tissue with respect to radiologist’s reference segmentations, and high correlations with respect to radiologist severity visual scores.
Hybrid transformer for lesion segmentation on adaptive optics retinal images
Adaptive optics (AO) retinal imaging has enabled the visualization of cellular-level changes in the living human eye. However, imaging tissue-level lesions with such high resolution introduces unique challenges. At a fine spatial scale, intralesion features can resemble cells, effectively serving as camouflage and making it difficult to delineate the boundary of lesions. The size discrepancy between the tissue-level lesions and retinal cells is also highly variable, ranging from a difference of several-fold to greater than an order-of-magnitude. Here, we introduce a hybrid-transformer based on the combination of a convolutional LinkNet and a fully axial attention transformer network to consider both local and global image features, which excels at identifying tissue-level lesions within a cellular landscape. After training the hybrid transformer on 489 manually-annotated AO images, accurate lesion segmentation was achieved on a separate test dataset consisting of 75 AO images for validation. The segmentation accuracy achieved using the hybrid transformer was superior to the use of convolutional neural networks alone (U-Net and LinkNet) or transformer-based networks alone (AxialDeepLab and Medical Transformer) (p<0.05). These experimental results demonstrate that the combination of convolution and transformer networks are an efficient way to utilize both local and global image features for the purpose of lesion segmentation in medical imaging and may be important for computer-aided diagnosis that relies on accurate lesion segmentation.
Annotation and segmentation of diabetic retinopathy lesions: an explainable AI application
One of the leading causes of irreversible vision loss is Diabetic Retinopathy (DR). The International Clinical Diabetic Retinopathy scale (ICDRS) provides grading criteria for DR. Deep Convolutional Neural Networks (DCNNs) achieve high performance in DR grading in terms of classification evaluation metrics; however, these metrics are not sufficient for evaluation. The eXplainable Artificial Intelligence (XAI) methodology provides insight into the decisions made by networks by producing sparse, generic heat maps highlighting the most critical DR features. However, XAI alone cannot satisfy clinical criteria due to the lack of explanation of the number and types of lesions. Hence, we propose a computational toolbox that provides lesion-based explanations according to grading system criteria for determining severity levels. According to the ICDRS, DR has 10 major lesions and 4 severity levels. Experienced clinicians annotated 143 DR fundus images, and we developed a toolbox containing 9 lesion-specific segmentation networks. The networks detect lesions at high annotation resolution and then compute the DR severity grade according to the ICDRS. The network employed in this study is an optimized version of the Holistically Nested Edge Detection Network (HEDNet). Using this model, lesions such as hard exudates (Ex), cotton wool spots (CWS), microaneurysms (MA), intraretinal haemorrhages (IHE) and vitreous preretinal haemorrhages (VPHE) were properly detected, but the prediction of lesions such as venous beading (VB), neovascularization (NV), intraretinal microvascular abnormalities (IRMA) and fibrous proliferation (FP) had low specificity. Consequently, this affects the value of the grading, which uses the segmented masks of all contributing lesions.
Human-in-the-loop deep learning retinal image classification with customized loss function
Suhev Shakya, Mariana Vasquez, Yiyang Wang, et al.
Age-related Macular Degeneration (AMD) is a significant health burden that can lead to irreversible vision loss in the elderly population. Accurately classifying Optical Coherence Tomography (OCT) images is vital in computer-aided diagnosis (CAD) of AMD. Most CAD studies focus on improving classification results but ignore the fact that a classifier may predict a correct image label for the wrong reasons, i.e., the classifier provided a correct prediction label but looked at the wrong region. We propose a human-in-the-loop OCT image classification scheme that allows users to provide feedback on model explanation during the training process to address this limitation. We innovatively integrate a custom loss function with our expert’s annotation of the OCT images along with the model’s explanation. The model learns both to classify the images and the explanations (ground truth) using this loss. Our results indicate that the proposed method improves the model explanation accuracy over the baseline model by 85% while maintaining a high classification accuracy of over 95%.
Blood vessel segmentation in en-face OCTA images: a frequency based method
Anna Breger, Felix Goldbach, Bianca S. Gerendas, et al.
Optical coherence tomography angiography (OCTA) is a novel noninvasive imaging modality for visualization of retinal blood flow in the human retina. Using specific OCTA imaging biomarkers for the identification of pathologies, automated image segmentations of the blood vessels can improve subsequent analysis and diagnosis. We present a novel method for vessel density identification based on frequency representations of the image, in particular using so-called Gabor filter banks. The algorithm is evaluated qualitatively and quantitatively on an in-house OCTA data set of 10 eyes acquired with a Cirrus HD-OCT device. Qualitatively, the segmentation outcomes received very good visual evaluation feedback from experts. Quantitatively, we compared the resulting vessel density values with the automated in-built values provided by the device; the results underline the visual evaluation. Furthermore, the FAZ identification substep was evaluated against manual annotations from two expert graders, showing that our results coincide well both visually and quantitatively. Lastly, we suggest the computation of adaptive local vessel density maps that allow straightforward analysis of retinal blood flow in a local manner.
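The frequency-based vessel enhancement described above can be approximated with a small Gabor filter bank, as sketched below; the frequencies, number of orientations, and max-over-filters pooling are illustrative choices rather than the parameters used in the paper.

```python
import numpy as np
from skimage.filters import gabor

def gabor_vessel_response(image, frequencies=(0.1, 0.2, 0.3), n_orientations=8):
    """Maximum Gabor magnitude response over a small filter bank.

    image: 2D float array (en-face OCTA slab); parameter values are
    illustrative, not those used in the paper.
    """
    response = np.zeros_like(image, dtype=float)
    for f in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            real, imag = gabor(image, frequency=f, theta=theta)
            magnitude = np.hypot(real, imag)
            response = np.maximum(response, magnitude)
    return response

# A binary vessel map could then be obtained by thresholding `response`
# (e.g. with Otsu's method) before computing local vessel density maps.
```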
Deep learning-based longitudinal CT registration for anatomy variation assessment during radiotherapy
Yabo Fu, Yang Lei, Zhen Tian, et al.
In proton therapy, quality assurance (QA) CT is often acquired along the treatment course to evaluate the dosimetric change caused by patient anatomy variation and, if needed, replan the treatment on the new anatomy, particularly for Head-and-Neck (HN) cancer, which often involves many organs-at-risk (OARs) in close proximity to the targets and has a high replan rate of around 45.6% after week 4. For this purpose, it is required to contour the OARs on all the acquired QA CT sets for dose-volume-histogram analysis and to deform the QA CT to the planning CT to evaluate the anatomy variation and the accumulated dose over the treatment course. To facilitate this process, in this study, we propose a deep learning-based method for groupwise HN QA CT deformable image registration that performs mutual deformation between the planning CT and QA CTs in a single shot. A total of 30 patients' datasets with one planning CT and three QA CTs throughout the treatment were collected. The network was trained to register the CT images in both directions, namely registering the planning CT to each QA CT and each QA CT to the planning CT. The proposed mutual image registration framework can greatly improve the image registration accuracy compared to the initial rigid image registration. The mean absolute error (MAE) and structural similarity index (SSIM) were calculated to evaluate the performance of the trained network. On average, the MAE was 133±29 HU and 88±15 HU for the rigid and the proposed registration, respectively. The SSIM was on average 0.92±0.01 and 0.94±0.01 for the rigid and the proposed registration, respectively.
Detecting aggressive papillary thyroid carcinoma using hyperspectral imaging and radiomic features
Ka'Toria Leitch, Martin Halicek, Maysam Shahedi, et al.
Hyperspectral imaging (HSI) and radiomics have the potential to improve the accuracy of tumor malignancy prediction and assessment. In this work, we extracted radiomic features of fresh surgical papillary thyroid carcinoma (PTC) specimens that were imaged with HSI. A total of 107 unique radiomic features were extracted. This study includes 72 ex-vivo tissue specimens from 44 patients with pathology-confirmed PTC. With the dilated hyperspectral images, the shape feature of least axis length was able to predict tumor aggressiveness with high accuracy. The HSI-based radiomic method may provide a useful tool to aid oncologists in identifying tumors with intermediate to high risk and in clinical decision making.
Linked psychopathology-specific factors and individual structural brain abnormalities in schizophrenia
Numerous studies have demonstrated substantial inter-individual symptom heterogeneity among patients with schizophrenia, which seriously affects the quantification of diagnosis and treatment schema. Normative modeling is a statistical approach that offers quantitative measurements of abnormal deviations under inter-individual heterogeneity. Here, we explored the individual-specific associations between morphologic deviations from normative ranges of brain structure and specific symptomatology structure on three different dimensions, after removing general disease effects. Specifically, we employed an exploratory bi-factor model for the PANSS scale and built normative models for two cortical measurements: cortical area and thickness. Significant correlations among different cortical measurements and latent symptom groups were observed, which could provide evidence to understand the pathophysiology of schizophrenia symptoms.
Fusion of clinical phenotypic and multi-modal MRI for acute bilirubin encephalopathy classification
Because of the non-specificity of acute bilirubin encephalopathy (ABE), accurate classification based on structural MRI alone is difficult. Due to the complexity of the diagnosis, multi-modality fusion has been widely studied in recent years. Most current medical image classification studies fuse only image data of different modalities; phenotypic features that may carry useful information are usually excluded from the model. In this paper, a multi-modal fusion strategy for classifying ABE is proposed, combining different MRI modalities with clinical phenotypic data. The baseline consists of three individual paths for training the different MRI modalities, i.e., T1, T2, and T2-FLAIR. The feature maps from the different paths are concatenated to form multi-modality image features. The phenotypic inputs are encoded into a two-dimensional vector to prevent the loss of information. A Text-CNN is applied as the feature extractor for the clinical phenotype, and the extracted text feature map is concatenated with the multi-modality image feature map along the channel dimension. The resulting MRI-phenotypic feature map is sent to a fully connected layer. We trained/tested (80%/20%) the approach on a database containing 800 patients' data. Each sample is composed of three-modality 3D brain MRI and the corresponding clinical phenotype data. Different comparative experiments were designed to explore the fusion strategy. The results demonstrate that the proposed method achieves an accuracy of 0.78, a sensitivity of 0.46, and a specificity of 0.99, which outperforms models using MRI or the clinical phenotype as input alone. Our work suggests that the fusion of clinical phenotype data and image data can improve the performance of ABE classification.
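A minimal sketch of this fusion idea is given below, with per-modality image features and an encoded phenotype vector concatenated before a classification head. The dimensions and the MLP phenotype encoder are illustrative assumptions; the paper uses a Text-CNN and concatenates feature maps along the channel dimension.

```python
import torch
import torch.nn as nn

class PhenotypeImageFusion(nn.Module):
    """Concatenate per-modality image features with encoded phenotype features.

    The encoders here are placeholders and the dimensions are illustrative;
    the per-modality image features are assumed to come from three upstream
    image paths (T1, T2, T2-FLAIR).
    """
    def __init__(self, img_feat_dim=256, pheno_in_dim=32, pheno_feat_dim=64, n_classes=2):
        super().__init__()
        self.pheno_encoder = nn.Sequential(
            nn.Linear(pheno_in_dim, 128), nn.ReLU(), nn.Linear(128, pheno_feat_dim))
        self.classifier = nn.Sequential(
            nn.Linear(3 * img_feat_dim + pheno_feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes))

    def forward(self, feat_t1, feat_t2, feat_flair, phenotype):
        pheno_feat = self.pheno_encoder(phenotype)
        # Fuse image and phenotype features by concatenation.
        fused = torch.cat([feat_t1, feat_t2, feat_flair, pheno_feat], dim=1)
        return self.classifier(fused)
```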
Segmentation of aorta and main pulmonary artery of non-contrast CT images using U-Net for chronic thromboembolic pulmonary hypertension: evaluation of robustness to contacts with blood vessels
Enlargement of the pulmonary artery is a morphological abnormality of pulmonary hypertension patients. Diameters of the aorta and main pulmonary artery (MPA) are useful for predicting the presence of pulmonary hypertension. A major problem in the automatic segmentation of the aorta and MPA from non-contrast CT images is the invisible boundary caused by contact with blood vessels. In this study, we applied U-Net to the segmentation of the aorta and MPA from non-contrast CT images for normal and chronic thromboembolic pulmonary hypertension (CTEPH) cases and evaluated the robustness to contacts between blood vessels. Our segmentation approach consists of three steps: (1) detection of the trachea branch point, (2) cropping of a region of interest centered on the trachea branch point, and (3) segmentation of the aorta and MPA using U-Net. Segmentation performance was compared among seven methods: 2D U-Net, 2D U-Net with a pre-trained VGG-16 encoder, 2D U-Net with a pre-trained VGG-19 encoder, 2D Attention U-Net, 3D U-Net, an ensemble of these, and our conventional method. The aorta and MPA segmentation methods based on these U-Nets achieved higher performance than the conventional method. The contact boundaries of blood vessels caused lower performance compared with the non-contact boundaries, whereas the mean boundary distances were below about one pixel.
Similarity-based uncertainty scores for computer-aided diagnosis
Exploring ways to apply deep learning in high-stakes fields like medicine is an emerging research area. In particular, there is a significant amount of research in applying deep learning to medical image classification. The NIH/NCI Lung Image Database Consortium (LIDC) data set allows these techniques to be tested and applied on lung nodule data. It incorporates multiple nodule ratings, including the degree of spiculation, a visual characteristic radiologists consider when diagnosing nodule malignancy. Our ultimate motivation is to improve resource allocation during the diagnostic process. We aim to flag ambiguous cases that may require more time or more opinions from radiologists. Specifically, using LIDC images, we propose to show a correlation between radiologist semantic disagreement on spiculation ratings and cases with a high level of uncertainty based on our novel methodology, hence flagging "hard" to diagnose cases and assisting radiologists in prioritizing their reviews. Our results show that we can implement meaningful uncertainty scores by clustering image features extracted from a Siamese Convolutional Neural Network (SCNN). We found that the nodule images which fell under the highest 33% of our uncertainty scores captured more than 50% of the data with low and no radiologist agreement on spiculation. Moreover, our results flag more images in the spiculated (rather than not spiculated) category, that is, images collocated with spiculated images in the feature space, suggesting that we may be capturing important disagreements.
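One simple way to turn feature-space geometry into an uncertainty score, in the spirit of the clustering approach above, is sketched below; the margin-based score and the choice of k-means are illustrative, not the authors' exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def uncertainty_scores(features, n_clusters=5, random_state=0):
    """Uncertainty from the feature-space geometry of SCNN embeddings.

    A nodule embedding that lies nearly equidistant from its two closest
    cluster centroids is treated as more ambiguous than one that sits deep
    inside a single cluster.
    """
    km = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    km.fit(features)
    dists = np.sort(km.transform(features), axis=1)  # distances to all centroids
    margin = dists[:, 1] - dists[:, 0]                # small margin = ambiguous
    return 1.0 / (1.0 + margin)                       # higher = more uncertain

# Cases in the top tertile of these scores could then be flagged for
# additional radiologist review.
```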
A deep learning approach for COVID-19 screening and localization on chest x-ray images
Chest X-ray (CXR) images have a high potential in the monitoring and examination of various lung diseases, including COVID-19. However, the screening of a large number of patients with a diagnostic hypothesis of COVID-19 poses a major challenge for physicians. In this paper, we propose a deep learning-based approach that can simultaneously suggest a diagnosis and localize lung opacity areas in CXR images. We used a public dataset containing 5,639 posteroanterior CXR images. Due to unbalanced classes (69.2% of the images are COVID-19 positive), data augmentation was applied only to images belonging to the normal category. We split the dataset into training and test sets at a 90:10 ratio. For the classification task, we applied 5-fold cross-validation to the training set. The EfficientNetB4 architecture was used to perform this classification. For the detection task, we used a YOLOv5 model pre-trained on the COCO dataset. Evaluations were based on accuracy and area under the ROC curve (AUROC) for the classification task and mean average precision (mAP) for the detection task. The classification task achieved an average accuracy of 0.83 ± 0.01 (95% CI [0.81, 0.84]) and AUC of 0.88 ± 0.02 (95% CI [0.85, 0.89]) over 5 folds on the test dataset. The best result was reached in fold 3 (0.84 and 0.89 of accuracy and AUC, respectively). Positive results were evaluated by the opacity detector, which achieved a mAP of 59.51%. Thus, the good performance and rapid diagnostic prediction make the system a promising means to assist radiologists in decision-making tasks.
A comparison of feature selection methods for the development of a prognostic radiogenomic biomarker in non-small cell lung cancer patients
This study aims at comparing methods for selecting optimal radiomic and gene expression features to develop a radiogenomic phenotype that will be used to predict overall survival in non-small cell lung cancer (NSCLC) patients. Baseline CT images of 85 NSCLC patients (male/female: 58/27, event: death, adenocarcinoma/squamous cell carcinoma/unspecified: 41/32/12, in stages I/II/III/unspecified: 39/25/12/9) with gene expression profile (microarray data) of 33 genes were used from the NSCLC-Radiomics Genomics dataset, publicly available from the National Cancer Institute's Cancer Imaging Archive (TCIA). The 33 genes were selected on the basis that they represent three major co-expression patterns ("signatures") in the dataset. These included the histology, neuroendocrine (NE) and pulmonary surfactant systems (PSS) signature genes. ITKSNAP was used for 3D tumor volume segmentation from CT scans. Radiomic features (n=102) were extracted from the 3D tumor volume using the CaPTk software. The first approach performs the feature selection in two steps: intra-modal feature selection (select features within the radiomic and genomic modalities such that the features are not highly correlated with each other and do not have a skewed distribution, have a positive Mean Decrease in Accuracy (MDA) value and maximize the AUC in the prediction of overall survival) and inter-modal feature selection (select features that are not highly correlated with features from other modalities). The second approach builds upon the standard and widely used Principal Component Analysis but tries to improve on its poor performance for survival analysis by doing consensus clustering to determine the optimal number of feature clusters within the radiomic and genomic modalities. For each of the clusters, the first principal component is calculated and used as the representative feature for that highly correlative cluster. The third approach provides a supervised take on feature selection by fitting a Cox regression with lasso regularization on the radiomic and genomic features to obtain a correlation between the individual features and the overall survival outcome. The features which have the highest correlation with the outcome are selected. Consensus clustering with a 10% cutoff for minimum change in the cumulative distribution function is used to calculate the optimal multi-modal phenotypes from the optimal multi-modal features determined from these three approaches. The multi-modal phenotypes were combined with clinical factors of histology, stage and sex in five-fold cross-validated multivariate Cox proportional hazards models (200 iterations) of overall survival. In addition to the cross-validated c-statistics, we also built a model on the complete dataset, for each of the approaches, to evaluate the Kaplan-Meier performance in separating participants above versus below the median prognostic score. The first approach gives a survival prediction performance (0.61, [0.55,0.63]) that is comparable to the third approach (0.61, [0.56,0.65]). The second approach results in a model that has a comparably lower prognostic performance (0.54, [0.48,0.60]). All three approaches result in models that improve on the prognostic performance of the model built using only clinical covariates (0.53, [0.50,0.59]). This preliminary study aims to draw comparisons between the various methods used to select optimal features from multi-modal descriptors of tumor regions.
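The second approach above (one representative component per cluster of correlated features) can be sketched as follows; k-means is used here in place of consensus clustering for brevity, so the code is an illustrative approximation rather than a reproduction of the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster_representative_features(X, n_clusters):
    """One representative feature per cluster of correlated features.

    X: (n_samples, n_features) matrix of radiomic or genomic features.
    Features (columns) are standardized and clustered; the first principal
    component of each cluster is kept as its representative.
    """
    Xz = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    # Cluster the features by treating each standardized feature vector
    # (a column of Xz) as a point.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Xz.T)
    reps = []
    for c in range(n_clusters):
        Xc = Xz[:, labels == c]
        reps.append(PCA(n_components=1).fit_transform(Xc).ravel())
    return np.column_stack(reps), labels
```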
Fusion of multiple deep convolutional neural networks (DCNNs) for improved segmentation of lung nodules in CT images
We are developing DCNN-based methods for the segmentation of lung nodules of a wide variety of sizes, shapes, and margins. In this study, we developed several fusion methods for hybridizing multiple DCNNs with different structures to improve segmentation accuracy. Two groups of fusion methods that combined the output information from our previously designed shallow and deep U-shape based deep learning models (U-DL) were compared. Group-1 denotes the late fusion (LF) methods that concatenated the feature maps output from the last layers (before a sigmoid activation layer) of the two U-DLs. Group-2 denotes early fusion (EF) methods that combined multi-scale output information from the encoders and decoders, or decoders alone, of the two U-DLs. A set of 883 cases from the LIDC-IDRI database which contained lung nodules manually marked by at least two radiologists was selected for this study. We split the data into 683 training and validation and 200 independent test cases. The multiple DCNNs with the fusion method were trained simultaneously and end-to-end. The baseline LF-1 method using pre-defined thresholds achieved an average DICE coefficient of 0.718±0.159. The newly developed LF method using Squeeze and Excitation Attention Block (SEAB) followed by a sigmoid activation layer (LF-4), and two Convolutional Block Attention Modules (CBAM) based EF methods that combined multiscale information from the decoders alone (EF-2) or from both the encoders and decoders (EF-3) of the two U-DLs, achieved significantly (p<0.05) better performance with average DICE coefficients of 0.745±0.135, 0.745±0.142, and 0.747±0.142, respectively.
Avalanche decision schemes to improve pediatric rib fracture detection
Rib fractures are a sentinel injury for physical abuse in young children. When rib fractures are detected in young children, 80-100% of the time it is the result of child abuse. Rib fractures can be challenging to detect on pediatric radiographs given that they can be non-displaced, incomplete, superimposed over other structures, or oriented obliquely with respect to the detector. This work presents our efforts to develop an object detection method for rib fracture detection on pediatric chest radiographs. We propose a method entitled “avalanche decision” motivated by the reality that pediatric patients with rib fractures commonly present with multiple fractures; in our dataset, 76% of patients with fractures had more than one fracture. This approach is applied at inference and uses a decision threshold that decreases as a function of the number of proposals that clear the current threshold. These contributions were added to two leading single stage detectors: RetinaNet and YOLOv5. These methods were trained and tested with our curated dataset of 704 pediatric chest radiographs, for which pediatric radiologists labeled fracture locations and achieved an expert reader-to-reader F2 score of 0.76. Comparing base RetinaNet to RetinaNet+Avalanche yielded F2 scores of 0.55 and 0.65, respectively. F2 scores of base YOLOv5 and YOLOv5+Avalanche were 0.58 and 0.65, respectively. The proposed avalanche inferencing approaches provide increased recall and F2 scores over the standalone models.
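A minimal sketch of an avalanche-style decision rule is shown below: detector proposals are visited in descending confidence order, and each accepted proposal lowers the threshold for the next. The multiplicative decay schedule and all numeric values are illustrative assumptions, not the authors' exact scheme.

```python
def avalanche_select(scores, t0=0.5, decay=0.9, t_min=0.05):
    """Avalanche-style proposal selection (illustrative parameter values).

    Proposals are visited in descending confidence order; every time a
    proposal clears the current threshold, the threshold is lowered
    (multiplied by `decay`), reflecting the prior that rib fractures
    rarely occur in isolation.
    """
    accepted = []
    threshold = t0
    for idx, score in sorted(enumerate(scores), key=lambda p: -p[1]):
        if score >= threshold:
            accepted.append(idx)
            threshold = max(threshold * decay, t_min)
        else:
            break  # remaining proposals have even lower scores
    return accepted

# Example: detector confidences for one radiograph.
print(avalanche_select([0.62, 0.48, 0.40, 0.20, 0.07]))  # -> [0, 1]
```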
Identifying sinus invasion in meningioma patients before surgery with deep learning
Meningioma is the most common intracranial non-malignant tumor and is often closely associated with the major venous sinuses. It has been recognized by neurosurgeons that meningioma should be treated with different surgical options depending on the status of sinus invasion. Therefore, it is necessary to accurately identify the venous sinus invasion status of meningioma patients before surgery; however, appropriate techniques are still lacking. Our study aimed to construct a deep learning model for accurate determination of sinus invasion before surgery. In this study, we collected multi-modal imaging data and clinical information for a total of 1048 meningioma patients from two hospitals. ResNet-50 with a dual attention mechanism was used on the preprocessed T1c and T2WI data, respectively, and the final model was generated by combining the two unimodal models. The classification performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC). The results indicate that the multimodal fusion classification model shows good performance in predicting meningioma sinus invasion. Further analysis of patients with different WHO gradings indicated that our model has the best classification ability for WHO grade 1 in an independent validation cohort (AUC 0.84). This study shows that deep learning is a reliable method for predicting sinus invasion in patients with meningioma before surgery.
Interpretable learning approaches in structural MRI: 3D-ResNet fused attention for autism spectrum disorder classification
Are there abnormal findings in the structural magnetic resonance imaging (sMRI) of patients with autism spectrum disorder (ASD)? Although a few brain regions have been implicated in the pathophysiologic mechanism of the disorder, no gold standard for sMRI-based diagnosis has been established in the academic community. Recently, powerful deep learning algorithms have been widely studied and applied, providing an opportunity to explore the structural brain abnormalities of ASD through visualization of deep learning models. In this paper, a 3D-ResNet with an attention subnet for ASD classification is proposed. The model combines residual modules with an attention subnet that masks regions relevant or irrelevant to the classification during feature extraction. The model was trained and tested on sMRI from the Autism Brain Imaging Data Exchange (ABIDE). The result of 5-fold cross-validation shows an accuracy of 75%. Grad-CAM was further applied to display the regions emphasized by the model during classification, and class activation maps of multiple slices of representative sMRI scans were visualized. The results show highly related signals in regions near the hippocampus, corpus callosum, thalamus, and amygdala, which may confirm some previous hypotheses. The work is not limited to ASD classification but also attempts to explore anatomic abnormalities with a promising visualization-based deep learning approach.
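The Grad-CAM visualization mentioned above can be computed generically with forward and backward hooks, as in the simplified sketch below for a 3D CNN; this is a standard Grad-CAM formulation for illustration, not the authors' code, and the layer and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, volume, target_layer, class_idx=None):
    """Minimal Grad-CAM for a 3D CNN.

    model:        a CNN returning class logits
    volume:       input tensor of shape (1, C, D, H, W)
    target_layer: the convolutional module whose activations are visualized
    """
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    logits = model(volume)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()

    # Global-average-pooled gradients weight the activation channels.
    weights = grads["g"].mean(dim=(2, 3, 4), keepdim=True)
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=volume.shape[2:], mode="trilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()
```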
Deep learning with vessel surface meshes for intracranial aneurysm detection
It is important that unruptured intracranial aneurysms (UIAs) are detected early for rupture risk and treatment assessment. Radiologists usually visually diagnose UIAs on Time-of-Flight Magnetic Resonance Angiographs (TOF-MRAs) or contrast-enhanced Computed Tomography Angiographs (CTAs). Several automatic UIA detection methods using voxel-based deep learning techniques have been developed, but are limited to a single modality. We propose modality-independent UIA detection by deep learning using mesh surface representations of brain vasculature. Vessels from a training set of 90 brain TOF-MRAs with UIAs were automatically segmented and converted to triangular surface meshes. Vertices and corresponding edges on the surface meshes were labelled as either vessel or aneurysm. A mesh convolutional neural network was trained using the labeled vessel surface meshes as input with a weighted cross-entropy loss function. The network was a U-Net style architecture with convolutional and pooling layers, which operates on mesh edges. The trained network predicted edges on vessel surface meshes, which corresponded to UIAs in a test set of 10 TOF-MRAs and a separate test set of 10 CTAs. UIAs were detected in the test MRAs with an average sensitivity of 65% and an average false positive count/scan of 1.8 and in the test CTAs, with a sensitivity of 65% and a false positive count of 4.1. Using vessel surface meshes it is possible to detect UIAs in TOF-MRAs and CTAs with comparable performance to state-of-the-art UIA detection algorithms. This may aid radiologists in automatic UIA detection without requiring the same image modality or protocol for follow-up imaging.
A study on 3D classical versus GAN-based augmentation for MRI brain image to predict the diagnosis of dementia with Lewy bodies and Alzheimer's disease in a European multi-center study
Petter Minne, Alvaro Fernandez-Quilez, Dag Aarsland, et al.
Every year around 10 million people are diagnosed with dementia worldwide. Higher life expectancy and population growth could inflate this number even further in the near future. Alzheimer's disease (AD) is one of the primary and most frequently diagnosed dementia diseases in elderly subjects. On the other hand, dementia with Lewy bodies (DLB) is the third most common cause of dementia. A timely and accurate diagnosis of dementia is critical for patients' management and treatment. However, its diagnosis is often challenging due to overlapping symptoms between the different forms of the disease. Deep learning (DL) combined with magnetic resonance imaging (MRI) has shown potential for improving the diagnostic accuracy of several neurodegenerative diseases. Despite this, DL methods rely heavily on the availability of annotated data. Classic augmentation techniques such as translation are commonly used to increase data availability. In addition, synthetic samples obtained through generative adversarial networks (GAN) are becoming an alternative to classic augmentation. Such techniques are well known and explored for 2D images, but little is known about their effects in a 3D setting. In this work, we explore the effects of 3D classic augmentation and 3D GAN-based augmentation to classify between AD, DLB and control subjects.
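A minimal sketch of the 3D classic augmentation discussed above (random flips and small translations of an MRI volume) might look like the following; the transform set and shift range are illustrative assumptions.

```python
import numpy as np

def classic_augment_3d(volume, rng=None, max_shift=5):
    """Classic 3D augmentation: random flip plus integer translations.

    volume: 3D numpy array (an MRI volume).
    """
    rng = rng or np.random.default_rng()
    out = volume.copy()
    # Random flip along the first axis (assumed sagittal here).
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)
    # Random translation along each axis, implemented with np.roll and
    # zero-filling of the wrapped-around slab.
    for axis in range(3):
        shift = int(rng.integers(-max_shift, max_shift + 1))
        out = np.roll(out, shift, axis=axis)
        if shift > 0:
            out[(slice(None),) * axis + (slice(0, shift),)] = 0
        elif shift < 0:
            out[(slice(None),) * axis + (slice(shift, None),)] = 0
    return out
```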
Image transformers for classifying acute lymphoblastic leukemia
Priscilla Cho, Sajal Dash, Aristeides Tsaris, et al.
Cancer is the leading cause of death by disease in American children. Each year, nearly 16,000 children in the United States and over 300,000 children globally are diagnosed with cancer. Leukemia is a form of blood cancer that originates in the bone marrow and accounts for one-third of pediatric cancers. This disease occurs when the bone marrow contains 20% or more immature white blood cell blasts. Acute lymphoblastic leukemia is the most prevalent leukemia type found in children, with half of all annual cases in the U.S. diagnosed for subjects under 20 years of age. To diagnose acute lymphoblastic leukemia, pathologists often conduct a morphological bone marrow assessment. This assessment determines whether the immature white blood cell blasts in bone marrow display the correct morphological characteristics, such as size and appearance of nuclei. Pathologists also use immunophenotyping via multi-channel flow cytometry to test whether certain antigens are present on the surface of blast cells; the antigens are used to identify the cell lineage of acute lymphoblastic leukemia. These manual processes require well-trained personnel and medical professionals, thus being costly in time and expenses. Computerized decision support via machine learning can accelerate the diagnosis process and reduce the cost. Training a reliable classification model to distinguish between mature and immature white blood cells is essential to the decision support system. Here, we adopted the Vision Transformer model to classify white blood cells. The Vision Transformer achieved superb classification performance compared to state-of-the-art convolutional neural networks while requiring less computational resources for training. Additionally, the latent self-attention architecture provided attention maps for a given image, offering clues as to which portion(s) of the image were significant in decision-making. We applied the Vision Transformer model and a convolutional neural network model to an acute lymphoblastic leukemia classification dataset of 12,528 samples and achieved accuracies of 88.4% and 86.2%, respectively.
Deep ensemble models with multiscale lung-focused patches for pneumonia classification on chest x-ray
Yoon Jo Kim, Jinseo An, Helen Hong
Recently, deep learning-based pneumonia classification has shown excellent performance on chest X-ray (CXR) images, but when analyzing classification results through visualization such as Grad-CAM, deep learning models show a limitation of classifying by observing regions outside the lungs. To overcome this limitation, we propose a deep ensemble model with multiscale lung-focused patches for the classification of pneumonia. First, Contrast Limited Adaptive Histogram Equalization is applied to appropriately increase the local contrast while maintaining important features. Second, lung segmentation and generation of multiscale lung-focused patches are performed to prevent the pneumonia diagnosis from relying on information outside the lung region. Third, we use a classification network with a Convolutional Block Attention Module to make the model focus on meaningful regions, and ensemble single models trained on large-, middle- and small-sized patches, respectively. For the evaluation of the proposed classification method, the model was trained on 5,216 pediatric CXRs and tested on 624 images. The deep ensemble model trained on large- and middle-sized patches showed the best performance, with an accuracy of 92%, a 15-percentage-point improvement over the original single model.
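The CLAHE preprocessing step above can be applied with OpenCV as sketched below; the clip limit and tile grid size are illustrative values, not necessarily those used in the paper.

```python
import cv2

def preprocess_cxr(image_u8, clip_limit=2.0, tile_grid=(8, 8)):
    """Contrast Limited Adaptive Histogram Equalization of a chest radiograph.

    image_u8: single-channel 8-bit image (numpy array of dtype uint8).
    """
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return clahe.apply(image_u8)
```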
Prediction of TNM stage in head and neck cancer using hybrid machine learning systems and radiomics features
The tumor, node, metastasis (TNM) staging system enables clinicians to describe the spread of head-and-neck squamous-cell-carcinoma (HNSCC) cancer in a specific manner to assist with the assessment of disease status, prognosis, and management. This study aims to predict TNM staging for HNSCC cancer via Hybrid Machine Learning Systems (HMLSs) and radiomics features. In our study, 408 patients from the Cancer Imaging Archive (TCIA) database were included in a multi-center setting. PET images were registered to CT, enhanced, and cropped. We created 9 sets including CT-only, PET-only, and 7 PET-CT fusion sets. Next, 215 radiomics features were extracted from the HNSCC tumors segmented by the physician, via our standardized SERA radiomics package. We employed multiple HMLSs, including 16 feature-extraction algorithms (FEAs) and 9 feature-selection algorithms (FSAs) linked with 8 classifiers optimized by a grid-search approach, with model training, fine-tuning, and selection (5-fold cross-validation; 319 patients), followed by external testing of the selected models (89 patients). Datasets were normalized by the z-score technique, with accuracy reported to compare models. We first applied datasets with all features to classifiers only; an accuracy of 0.69 ± 0.06 was achieved via PET applied to a Random Forest classifier (RFC); the external-testing performance (~0.62) confirmed this finding. Subsequently, we employed FSAs/FEAs prior to the application of classifiers. We achieved an accuracy of 0.70 ± 0.03 for Curvelet transform (fusion) + Correlation-based Feature Selection (FSA) + K-Nearest Neighbor (classifier), and 0.70 ± 0.05 for PET + LASSO (FSA) + RFC (classifier). Accuracy in external testing (0.65 and 0.64) also confirmed these findings. Other HMLSs, applied on some fused datasets, also resulted in close performances. We demonstrate that classifiers or HMLSs linked with PET-only and PET-CT fusion datasets provided relatively modest accuracy in predicting TNM stage. Meanwhile, the combination of PET and RFC enabled good prediction of TNM in HNSCC.
Radiomic texture feature descriptor to distinguish recurrent brain tumor from radiation necrosis using multimodal MRI
M. S. Sadique, A. Temtam, E. Lappinen, et al.
Despite aggressive multimodal treatment with chemo-radiation therapy and surgical resection, Glioblastoma Multiforme (GBM) may recur, which is known as recurrent brain tumor (rBT). There are several instances where benign and malignant pathologies might appear very similar on radiographic imaging. One such illustration is radiation necrosis (RN), a moderately benign impact of radiation treatment, which is visually almost indistinguishable from rBT on structural magnetic resonance imaging (MRI). There is hence a need for identification of reliable non-invasive quantitative measurements on routinely acquired brain MRI scans: pre-contrast T1-weighted (T1), post-contrast T1-weighted (T1Gd), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR) that can accurately distinguish rBT from RN. In this work, sophisticated radiomic texture features are used to distinguish rBT from RN on multimodal MRI for disease characterization. First, stochastic multiresolution radiomic descriptors that capture voxel-level textural and structural heterogeneity, as well as intensity and histogram features, are extracted. Subsequently, these features are used in a machine learning setting to distinguish rBT from RN using the four MRI sequences, with 155 imaging slices for 30 GBM cases (12 RN, 18 rBT). To reduce bias in accuracy estimation, our model is evaluated using leave-one-out cross-validation (LOOCV) and stratified 5-fold cross-validation with a Random Forest classifier. Our model offers a mean accuracy of 0.967 ± 0.180 for LOOCV and 0.933 ± 0.082 for stratified 5-fold cross-validation using multiresolution texture features for discrimination of rBT from RN in this study. Our findings suggest that sophisticated texture features may offer better discrimination between rBT and RN on MRI compared to other works in the literature.
Co-occurring diseases heavily influence the performance of weakly supervised learning models for classification of chest CT
Fakrul Islam Tushar, Vincent M. D'Anniballe, Geoffrey D. Rubin, et al.
Despite the potential of weakly supervised learning to automatically annotate massive amounts of data, little is known about its limitations for use in computer-aided diagnosis (CAD). For CT specifically, interpreting the performance of CAD algorithms can be challenging given the large number of co-occurring diseases. This paper examines the effect of co-occurring diseases when training classification models by weakly supervised learning, specifically by comparing multi-label and multiple binary classifiers using the same training data. Our results demonstrated that the binary model outperformed the multi-label classification in every disease category in terms of AUC. However, this performance was heavily influenced by co-occurring diseases in the binary model, suggesting it did not always learn the correct appearance of the specific disease. For example, binary classification of lung nodules resulted in an AUC of less than 0.65 when there were no other co-occurring diseases, but when lung nodules co-occurred with emphysema, the performance approached an AUC of 0.80. We hope this paper reveals the complexity of interpreting disease classification performance in weakly supervised models and encourages researchers to examine the effect of co-occurring diseases on classification performance in the future.
Spotlight scheme: enhancing medical image classification with lesion location information
Jing Ni, Qilei Chen, Ping Liu, et al.
Medical image classification, aiming to categorize images according to the underlying lesion conditions, has been widely used in computer-aided diagnosis. Previously, most models were obtained via transfer learning, where the backbone model is designed for and trained on generic image datasets, resulting in a lack of model interpretability. While adding lesion location information introduces domain-specific knowledge during transfer learning and thus helps mitigate the problem, it may introduce new complications. Many of the existing models are rather complex, containing multiple disjoint CNN streams. In addition, they are mainly geared towards a specific task, lacking adaptability across tasks. In this paper, we present a simple and generic approach, named the Spotlight Scheme, to leverage the knowledge of lesion locations in image classification. In particular, in addition to the whole-image classification stream, we add a spotlighted image stream by blacking out the non-suspicious regions. We then introduce a hybrid two-stage intermediate fusion module, namely, shallow tutoring and deep ensemble, to enhance the image classification performance. The shallow tutoring module allows the whole-image classification stream to focus on the lesion area with the help of the spotlight stream. This module can be placed in any backbone architecture multiple times, and thus penetrates the entire feature extraction procedure. At a later point, a deep ensemble network is adopted to aggregate the two streams and learn a joint representation. The experimental results show state-of-the-art or competitive performance on three medical tasks: retinopathy of prematurity, glaucoma, and colorectal polyps. In addition, we demonstrate the robustness of our scheme by showing that it consistently achieves promising results with different backbone architectures and model configurations.
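The spotlighting operation above (blacking out non-suspicious regions) reduces to a simple element-wise mask, as sketched below; array shapes are illustrative.

```python
import numpy as np

def spotlight(image, lesion_mask):
    """Black out non-suspicious regions so only the lesion area remains visible.

    image:       (H, W, C) array
    lesion_mask: (H, W) boolean or {0, 1} array marking the suspicious region
    """
    mask = (lesion_mask > 0).astype(image.dtype)[..., None]
    return image * mask

# During training, both the whole image and spotlight(image, mask) would be
# fed into their respective streams of the two-stream network.
```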
A vector representation of local image contrast patterns for lesion classification
Weiguo Cao, Marc J. Pomeroy, Yongfeng Gao, et al.
Quantitative description of lesion image heterogeneity is a major task for computer-aided diagnosis of lesions, and it has remained a challenging task because the heterogeneity is associated with the local image contrast patterns of each voxel. This work explores a novel vector representation of the local image contrast patterns of each voxel and learns features from the local vector field across all voxels in the lesion volume. We generate a matrix from the first ring of voxels surrounding each voxel in the image and perform a Karhunen-Loève transformation on this matrix. Using the eigenvectors associated with the largest three eigenvalues, we then generate a series of textures based on a vector representation of this matrix. Using an in-house dataset, experiments were performed to classify colorectal polyps using the learnt features and a Random Forest classifier to differentiate malignant from benign lesions. The outcomes show a dramatic improvement in lesion classification compared with seven existing classification methods (e.g., LBP, Haralick, VGG16) that learn features from the original intensity image.
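A rough sketch of this first-ring contrast representation is given below: each interior voxel's 26 neighbor differences form a local contrast vector, and a Karhunen-Loève transform (eigen-decomposition of their covariance) projects these vectors onto the leading eigenvectors. The ring definition and the number of components kept are simplified assumptions, not the authors' exact construction.

```python
import numpy as np

def local_contrast_vectors(volume, n_components=3):
    """Project first-ring local contrast vectors onto leading KL eigenvectors.

    volume: 3D numpy array; returns an array of shape
    (D-2, H-2, W-2, n_components).
    """
    D, H, W = volume.shape
    offsets = [(dz, dy, dx)
               for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dz, dy, dx) != (0, 0, 0)]
    center = volume[1:-1, 1:-1, 1:-1]
    # Local contrast: each of the 26 first-ring neighbors minus the center voxel.
    rings = np.stack(
        [volume[1 + dz:D - 1 + dz, 1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx] - center
         for dz, dy, dx in offsets], axis=-1)
    X = rings.reshape(-1, len(offsets))
    # Karhunen-Loeve transform: eigen-decomposition of the covariance matrix.
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return (X @ top).reshape(D - 2, H - 2, W - 2, n_components)
```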
Fusion of handcrafted and deep transfer learning features to improve performance of breast lesion classification
Computer-aided detection and/or diagnosis schemes typically include machine learning classifiers trained using either handcrafted features or deep learning model-generated automated features. The objective of this study is to investigate a new method to effectively select optimal feature vectors from an extremely large automated feature pool, and the feasibility of improving the performance of a machine learning classifier trained using the fused handcrafted and automated feature sets. We assembled a retrospective image dataset involving 1,535 mammograms in which 740 and 795 images depict malignant and benign lesions, respectively. For each image, a region of interest (ROI) around the center of the lesion is extracted. First, 40 handcrafted features are computed. Two automated feature sets are extracted from a VGG16 network pretrained on the ImageNet dataset. The first automated feature set is extracted using pseudo-color images created by stacking the original image, a bilateral-filtered image, and a histogram-equalized image. The second automated feature set is created by stacking the original image in three channels. Two fused feature sets are then created by fusing the handcrafted feature set with each automated feature set, respectively. Five linear support vector machines are then trained using a 10-fold cross-validation method. The classification accuracy and AUC of the SVMs trained using the fused feature sets are significantly better than those trained using handcrafted or automated features alone (p<0.05). Study results demonstrate that handcrafted and automated features contain complementary information, so that fusing them creates classifiers with improved performance in classifying breast lesions as malignant or benign.
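A minimal sketch of fusing handcrafted and pretrained-network features before a linear SVM is shown below; the pooled-VGG16 feature extractor, dimensions, and cross-validation call are illustrative and not the authors' exact pipeline (the `weights` argument assumes a recent torchvision version).

```python
import numpy as np
import torch
from torchvision.models import vgg16
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Pre-trained VGG16 convolutional backbone (classifier head removed).
backbone = vgg16(weights="IMAGENET1K_V1").features.eval()

def deep_features(roi_batch):
    """roi_batch: (N, 3, 224, 224) float tensor, e.g. original / bilateral-
    filtered / histogram-equalized channels stacked as a pseudo-color image."""
    with torch.no_grad():
        maps = backbone(roi_batch)            # (N, 512, 7, 7)
        return maps.mean(dim=(2, 3)).numpy()  # global-average-pooled features

def evaluate_fused(handcrafted, rois, labels):
    """Fuse handcrafted (N, 40) and automated features, then score a linear SVM."""
    fused = np.hstack([handcrafted, deep_features(rois)])
    svm = LinearSVC(C=1.0, max_iter=10000)
    return cross_val_score(svm, fused, labels, cv=10, scoring="roc_auc")
```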
Cardiovascular disease and all-cause mortality risk prediction from abdominal CT using deep learning
Daniel C. Elton, Andy Chen, Perry J. Pickhardt, et al.
Cardiovascular disease is the number one cause of mortality worldwide. Risk prediction can help incentivize lifestyle changes and inform targeted preventative treatment. In this work we explore utilizing a convolutional neural network (CNN) to predict cardiovascular disease risk from abdominal CT scans taken for routine CT colonography in otherwise healthy patients aged 50-65. We find that adding a variational autoencoder (VAE) to the CNN classifier improves its accuracy for five-year survival prediction (AUC 0.787 vs. 0.768). In four-fold cross-validation we obtain an average AUC of 0.787 for predicting five-year survival and an AUC of 0.767 for predicting cardiovascular disease. For five-year survival prediction our model is significantly better than the Framingham Risk Score (AUC 0.688) and of nearly equivalent performance to the method demonstrated in Pickhardt et al. (AUC 0.789), which utilized a combination of five CT-derived biomarkers.
Neurovascular bundles segmentation on MRI via hierarchical object activation network
Yang Lei, Tonghe Wang, Justin Roper, et al.
Sexual dysfunction after radiotherapy for prostate cancer remains an important adverse late toxicity that has been correlated with the radiation dose to the neurovascular bundles (NVBs). Currently, NVBs are not contoured as an organ-at-risk in standard-of-care radiotherapy since they are not clearly distinguishable on planning CT images. As a result, dose to the NVBs is not optimized during treatment planning. Recently, MR images with superior soft tissue contrast have made NVB contouring feasible. In this study, we aim to develop a deep learning-based method for automated segmentation of NVBs on MR images. Our proposed method, named hierarchical object activation network, consists of four subnetworks, i.e., feature extractor, fully convolutional one-stage object detector (FCOS), hierarchical block and mask module. The feature extractor is used to select the informative features from MRI. The FCOS then locates a volume of interest (VOI) for the left and right NVBs. The hierarchical block enhances the feature contrast around the NVB boundary and maintains its spatial continuity. The mask module then segments the NVBs from the refined feature map within the VOI. A three-fold cross-validation study was performed using 30 patient cases to evaluate the network performance. The left and right NVBs were segmented and compared with physician contours using several segmentation metrics. The Dice similarity coefficient (DSC) and mean surface distance (MSD) are as follows: (left) 0.72, 1.64 mm, and (right) 0.72, 1.84 mm. These results demonstrate the feasibility and efficacy of our proposed method for NVB segmentation from prostate MRI, which can be further used to spare NVBs during proton and photon radiotherapy.
Evaluation of the impact of physical adversarial attacks on deep learning models for classifying covid cases
The SARS-CoV-2 (COVID-19) disease rapidly spread worldwide, thus increasing the need to create new strategies to fight it. Several researchers in different fields have attempted to develop methods to identify it early and mitigate its effects. The Deep Learning (DL) approach, such as Convolutional Neural Networks (CNNs), has been increasingly used in COVID-19 diagnosis. These models are intended to support decision-making and perform well in detecting patient status early. Although DL models have good accuracy to support diagnosis, they are vulnerable to adversarial attacks. These attacks are new methods to make DL models biased by adding small perturbations to the original image. This paper investigates the impact of adversarial attacks on DL models for classifying X-ray images of COVID-19 cases. We focused on the Fast Gradient Sign Method (FGSM) attack, which adds a perturbation matrix to the test images, producing crafted images. We conducted experiments analyzing model performance both attack-free and under attack. The following CNN models were selected: DenseNet201, ResNet-50V2, MobileNetV2, NasNet and VGG16. In the attack-free environment, we reached a precision of around 99%. Under attack, our results revealed that all models suffer a performance reduction; the most affected was MobileNet, whose performance dropped from 98.61% to 67.73%. However, the VGG16 network proved to be the least affected by the attacks. Our findings show that DL models for COVID-19 are vulnerable to adversarial examples. FGSM was capable of fooling the models, resulting in a significant reduction in DL performance.
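FGSM itself is compact: the adversarial image is the original plus epsilon times the sign of the loss gradient with respect to the input. The sketch below is the standard formulation; the epsilon value is illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.01):
    """Fast Gradient Sign Method: x_adv = x + epsilon * sign(grad_x loss)."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Perturb each pixel by epsilon in the direction that increases the loss.
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0, 1).detach()
```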
Contrastive learning meets transfer learning: a case study in medical image analysis
Yuzhe Lu, Aadarsh Jha, Ruining Deng, et al.
Annotated medical images are typically more rare than labeled natural images, since they are limited by domain knowledge and privacy constraints. Recent advances in transfer and contrastive learning have provided effective solutions to tackle such issues from different perspectives. The state-of-the-art transfer learning (e.g., Big Transfer (BiT)) and contrastive learning (e.g., Simple Siamese Contrastive Learning (SimSiam)) approaches have been investigated independently, without considering the complementary nature of such techniques. It would be appealing to accelerate contrastive learning with transfer learning, given that slow convergence speed is a critical limitation of modern contrastive learning approaches. In this paper, we investigate the feasibility of aligning BiT with SimSiam. From empirical analyses, the difference in normalization techniques (Group Norm in BiT vs. Batch Norm in SimSiam) is a key hurdle in adapting BiT to SimSiam. When combining BiT with SimSiam, we evaluated the performance of using BiT, SimSiam, and BiT+SimSiam on the CIFAR-10 and HAM10000 datasets. The results suggest that the BiT models accelerate the convergence speed of SimSiam. When used together, the combined model gives superior performance over both of its counterparts. We hope this study will motivate researchers to revisit the task of aggregating big pretrained models with contrastive learning models for image analysis.
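For reference, the SimSiam objective being accelerated here is the symmetric negative cosine similarity with stop-gradient, sketched below in its standard form; plugging a BiT backbone in as the encoder, as in the paper's setting, is left implicit.

```python
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    """Symmetric negative cosine similarity with stop-gradient (SimSiam).

    p1, p2: predictor outputs for the two augmented views
    z1, z2: projector outputs for the two views (gradients are stopped)
    """
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```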
Automated segmentation of pediatric brain tumors based on multi-parametric MRI and deep learning
Rachel Madhogarhia, Anahita Fathi Kazerooni, Sherjeel Arif, et al.
Background: Brain tumors are the most common solid tumors in children, and high-grade brain tumors are the leading cause of cancer-related death among all childhood cancers. Tumor segmentation is essential for surgical planning, treatment planning, and radiomics studies, but manual segmentation is time-consuming and has high inter-operator variability. This paper presents a deep learning-based method for automated segmentation of pediatric brain tumors based on multi-parametric MRI. Methods: Multi-parametric MRI (MP-MRI) scans (T1, T1w-Gd, T2, and FLAIR) of 167 pediatric patients with de novo brain tumors, including a variety of tumor subtypes, were processed and manually segmented according to Response Assessment in Pediatric Neuro-Oncology (RAPNO) guidelines into five tumor subregions, i.e., enhancing tumor (ET), non-enhancing tumor (NET), cystic core (CC), cystic reactive (CR), and peritumoral edema (ED). Segmentations were revised and approved by experienced neuroradiologists and used as the ground truth (GT). The cohort was split into training (n=134) and independent test (n=33) subsets. DeepMedic, a three-dimensional convolutional neural network, was trained with 7-fold cross-validation, and the model's parameters were tuned. Finally, the network was evaluated on the withheld test cohort, and its performance was assessed in comparison with GT segmentations. Results: The Dice similarity score (mean+/-SD) was 0.71+/-0.28 for the whole tumor (union of all five subregions), 0.66+/-0.30 for ET, 0.31+/-0.26 for NET, 0.34+/-0.38 for CC, 0.55+/-0.50 for CR, and 0.32+/-0.42 for ED. Conclusion: This model displayed good performance on segmentation of the whole tumor region of pediatric brain tumors and can facilitate detection of abnormal regions for further clinical measurements.
Automatic polyp detection using SmartEndo-Net based on fusion feature pyramid network with mix-up edges
Colonoscopy is essential for examining colorectal polyps or cancer. Colonoscopy has allowed for a reduction in the incidence and mortality of colorectal cancer through the detection and removal of polyps. However, the missed polyp rate during colonoscopy has been reported as approximately 24%, and intra- and inter-observer variability in polyp detection rates among endoscopists has been an issue. In this paper, we propose a real-time deep learning-based colorectal polyp detection system called SmartEndo-Net. To extract the polyp information, ResNet-50 is used as the backbone. To enable high-level feature fusion, extra mix-up edges are added at all levels of the fusion feature pyramid network (FPN). Fusion features are fed to a class and box network to produce object class and bounding box predictions. SmartEndo-Net is compared with Yolo-V3, SSD, and Faster R-CNN. SmartEndo-Net recorded a sensitivity of 92.17%, which was 7.96, 6.78, and 10.05 percentage points higher than Yolo-V3, SSD, and Faster R-CNN, respectively. SmartEndo-Net showed stable detection results regardless of polyp size, shape, and surrounding structures.
Early prediction of the Alzheimer’s disease risk using Tau-PET and machine learning
Lujia Wang, Zhiyang Zheng, Yi Su, et al.
Alzheimer’s Disease (AD) is a devastating neurodegenerative disease. Recent advances in tau-positron emission tomography (PET) imaging allow quantitating and mapping out the regional distribution of one important hallmark of AD across the brain. There is a need to develop machine learning (ML) algorithms to interrogate the utility of this new imaging modality. While there are some recent studies showing promise of using ML to differentiate AD patients from normal controls (NC) based on tau-PET images, there is limited work investigating whether tau-PET, with the help of ML, can facilitate predicting the risk of conversion to AD while an individual is still at the early Mild Cognitive Impairment (MCI) stage. We developed an early AD risk predictor for subjects with MCI based on tau-PET and ML. Our ML algorithms achieved good accuracy in predicting the risk of conversion to AD for a given MCI subject. Important features contributing to the prediction are consistent with literature reports of tau-susceptible regions. This work demonstrated the feasibility of developing an early AD risk predictor for subjects with MCI based on tau-PET and ML.
AI-human interactive pipeline with feedback to accelerate medical image annotation
Youngwon Choi, Marlena Garcia, Steven S. Raman, et al.
We propose an AI-human interactive pipeline to accelerate medical image annotation of large data sets. This pipeline continuously iterates on three steps. First, an AI system provides initial automated annotations to image analysts. Second, the analysts edit the annotations. Third, the AI system is upgraded with analysts’ feedback, thus enabling more efficient annotation. To develop this pipeline, we propose an AI system and upgraded workflow that is focused on reducing the annotation time while maintaining accuracy. We demonstrated the ability of the feedback loop to accelerate the task of prostate MRI segmentation. With the initial iterations on small batch sizes, the annotation time was reduced substantially.
High-resolution MR imaging using self-supervised parallel network
One general practice in acquiring magnetic resonance (MR) images for radiotherapy is to acquire longitudinally coarse slices while keeping the in-plane spatial resolution high, in order to shorten the scan time and ensure sufficient body coverage. The purpose of this work is to develop a deep learning-based method for synthesizing longitudinal high-resolution (HR) MR images using parallel trained cycle-consistent generative adversarial networks (CycleGANs) with self-supervision. The parallel CycleGANs independently predict HR MR images in two planes along the directions that are orthogonal to the longitudinal MR scan direction. These predicted images are fused to generate the final synthetic HR MR images. MR images in the multimodal brain tumor segmentation challenge 2020 (BraTS2020) dataset were processed to investigate the performance of the proposed workflow with qualitative evaluations (visual inspection of image appearance) and quantitative evaluations (normalized mean absolute error (NMAE), peak signal-to-noise ratio (PSNR) and structural similarity index measurement (SSIM)). Preliminary results show that the proposed method can generate HR MR images visually indistinguishable from the ground-truth MR images. Quantitative evaluations show that the calculated metrics of synthetic HR MR images are all enhanced for the T1, T1CE, T2 and FLAIR images. The feasibility of the proposed method to synthesize HR MR images using self-supervised parallel CycleGANs is demonstrated, and its potential usage in common clinical practices of radiotherapy is expected.
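The quantitative metrics listed above can be computed per slice as sketched below; the NMAE normalization and data-range handling are illustrative choices, not necessarily those used in the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_synthesis(hr_true, hr_pred):
    """NMAE, PSNR and SSIM between ground-truth and synthetic HR MR slices.

    Both inputs are 2D float arrays of matching shape.
    """
    data_range = hr_true.max() - hr_true.min()
    nmae = np.abs(hr_true - hr_pred).mean() / (data_range + 1e-8)
    psnr = peak_signal_noise_ratio(hr_true, hr_pred, data_range=data_range)
    ssim = structural_similarity(hr_true, hr_pred, data_range=data_range)
    return nmae, psnr, ssim
```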
Drug response prediction using deep neural network trained by adaptive resampling of histopathological images
Tomoharu Kiyuna, Noriko Motoi, Hiroshi Yoshida, et al.
Recent advances in programmed death-1 (PD-1) and programmed death-ligand 1 (PD-L1) immune checkpoint inhibitors (ICIs) have revolutionized clinical practice in lung cancer treatment. The PD-L1 immunohistochemistry (IHC) test is a widely used biomarker for ICI responders in lung cancer. However, the accuracy of the PD-L1 IHC test for selecting ICI responders is unsatisfactory, especially due to its low specificity. Therefore, methods which could effectively predict the efficacy of ICI are crucial for patient selection. In this article, we apply a deep neural network (DNN) to predict responders to anti-PD1 blockade on the basis of histopathological images of hematoxylin and eosin (H&E) stained tissue. The main difficulty in training a DNN for responder prediction is the inability to accurately label at the image-patch level. We employed a semi-supervised multi-instance learning (MIL) framework with adaptive positive patch selection in each region-of-interest (ROI). Using a dataset of 250 whole slide images (WSIs) of non-small cell lung cancer (111 responders and 139 non-responders), we train a DNN-based MIL classifier on a case-level partition of the dataset (150 WSIs) and obtain an area under the curve (AUC) of 0.773 for the test dataset (50 WSIs), which outperforms the PD-L1 IHC test (AUC=0.636). These results suggest that a DNN model can be used to assist clinicians in making treatment-planning decisions. We also confirm that the locations of adaptively selected positive patches can give valuable insights into the histological features associated with drug response.
Predicting hematoma expansion after spontaneous intracranial hemorrhage through a radiomics based model
Purpose: Intracranial hemorrhage (ICH) is characterized as bleeding into the brain tissue, intracranial space, and ventricles and is the second most disabling form of stroke. Hematoma expansion (HE) following ICH has been correlated with significant neurological decline and death. For early detection of patients at risk, prediction models were developed to predict whether hematoma due to ICH will expand. This study aimed to explore the feasibility of HE prediction using a radiomic approach to help clinicians better stratify HE patients and tailor intensive therapies timely and effectively. Materials and Methods: Two hundred ICH patients with known hematoma evolution were enrolled in this study. An open-source Python package was utilized for the extraction of radiomic features from both non-contrast computed tomography (NCCT) and magnetic resonance imaging (MRI) scans through characterization algorithms. A total of 99 radiomic features were extracted, and different features were selected as network inputs for the NCCT and MR models. Seven supervised classifiers (Support Vector Machine, Naïve Bayes, Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbor and Multilayer Perceptron) were used to build the models. A training:testing split of 80:20 and 20 iterations of Monte Carlo cross-validation were performed to prevent overfitting and assess the variability of the networks, respectively. The models were fed training datasets from which they learned to classify the data based on pre-determined radiomic categories. Results: The highest sensitivity among the NCCT classifier models was seen with the support vector machine (SVM) and logistic regression (LR) models, at 72 ± 0.3% and 73 ± 0.5%, respectively. The MRI classifier models had the highest sensitivity of 68 ± 0.5% and 72 ± 0.5% for the SVM and LR models, respectively. Conclusions: This study indicates that the NCCT radiomics model is a better predictor of HE and that SVM and LR classifiers are better predictors of HE due to their more cautious approach, indicated by a higher sensitivity metric.
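The evaluation protocol above (an 80:20 split repeated 20 times, i.e. Monte Carlo cross-validation) can be sketched with scikit-learn's ShuffleSplit as follows; the SVM classifier and sensitivity metric stand in for any of the seven models compared, and all parameter values are illustrative.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import SVC
from sklearn.metrics import recall_score

def monte_carlo_sensitivity(X, y, n_iterations=20, test_size=0.2):
    """Mean and standard deviation of sensitivity over repeated random splits.

    Sensitivity is recall for the positive (hematoma-expansion) class.
    """
    splitter = ShuffleSplit(n_splits=n_iterations, test_size=test_size, random_state=0)
    sens = []
    for train_idx, test_idx in splitter.split(X):
        clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
        sens.append(recall_score(y[test_idx], clf.predict(X[test_idx])))
    return np.mean(sens), np.std(sens)
```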
A radiomics approach to distinguish non-contrast enhancing tumor from vasogenic edema on multi-parametric pre-treatment MRI scans for glioblastoma tumors
Glioblastoma (GBM) is a highly aggressive tumor with a heterogeneous tumor micro-environment that extends beyond the visible tumor margin and is known to play a substantial role in GBM recurrence. For instance, it is often difficult to distinguish infiltrating non-contrast enhancing tumor (nCET) from peritumoral edema due to their confounding appearances on T2w/FLAIR MRI scans. Thus, nCET is often left unresected and contributes to over 90% of recurrences in GBM tumors that occur within 2 cm of the resected tumor margin. Histopathologically, infiltrative nCET has high cellularity compared to vasogenic edema. This work explores the hypothesis that these histopathological changes may be reflected on routine imaging as subtle micro-architectural texture differences that can be captured via radiomic features, allowing nCET to be differentiated from vasogenic edema on routine pre-treatment MRI scans (Gd-T1w, T2w, FLAIR). Our radiomic analysis involved registering the preoperative MRI sequences of GBM patients from two institutions to a healthy MNI atlas. In the absence of histopathological confirmation, the 'ground truth' for the nCET region of interest (ROI) on pre-treatment scans was defined as the site of future recurrence (as established on a post-treatment scan with histopathologically-confirmed recurrence). Similarly, the ROI for vasogenic edema was defined as a region far from the site of recurrence, within the FLAIR/T2w hyperintense edema on pre-treatment MRI. For every nCET and vasogenic edema ROI, a total of 316 3D radiomic features (e.g., Haralick, Laws, Gabor) were extracted from every MRI sequence. Feature pruning was conducted on the features' statistics (median, variance, skewness, kurtosis), and a sequential feed-forward classification scheme employing a support vector machine classifier was applied. The FLAIR sequence yielded the highest accuracy in distinguishing nCET from vasogenic edema ROIs, with accuracies of 91.3% and 78.5% on the training (n = 25) and test (n = 30) sets, respectively. Additionally, combining radiomic features from all three MRI sequences yielded accuracies of 92.3% and 89.3% on the training and test sets, respectively. These results show that our radiomic approach may allow for reliable distinction of nCET from vasogenic edema on routine MRI scans, and thus may aid in improving the treatment management of GBM tumors.
Normalization of MRI signal intensity in polycystic kidney disease and the effect on radiomic features
Linnea E. Kremer, Natalie Perri, Eliza Sorber, et al.
This study investigated the impact of reference-tissue normalization on radiomic texture features extracted from magnetic resonance images (MRI) of non-cystic kidney parenchyma in patients with autosomal dominant polycystic kidney disease (ADPKD). Image normalization has been shown to improve the robustness of features and disease classification. Texture analysis is a promising technique to differentiate between the PKD1 and PKD2 variants of ADPKD, which differ in progression and patient outcomes. Regions of interest (ROIs) were placed on the liver and psoas muscle, and Z-score image normalization was performed separately based on the two different ROI placements. This pilot study included 7 PKD1 and 8 PKD2 patients (29 kidney images in total). Right and left kidneys were manually segmented on the single coronal image for each individual kidney that contained the renal artery, and a thresholding tool was used to exclude cysts from the pixels used for feature extraction. Feature extraction was performed using the open-source platform Pyradiomics on the original images and on the two variants of normalized images. Intraclass correlation coefficients (ICCs) were calculated to compare the reliability of features across the normalized images. A linear discriminant analysis (LDA) classifier was used to merge the top three performing reliable texture features for PKD1 versus PKD2 classification based on receiver operating characteristic (ROC) analysis. Seventeen of the 93 features demonstrated good-to-excellent reliability between normalization approaches. Psoas muscle-normalized images yielded the highest area under the ROC curve (AUC) value of 0.74 (0.53-0.89). Image normalization impacts MRI-based texture features and the classification of PKD1 versus PKD2, and should be further explored.
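The reference-tissue Z-score normalization described above can be summarized in a few lines of NumPy. This is an illustrative sketch, not the study's implementation; the image array and the psoas-muscle ROI mask are synthetic placeholders.

```python
# Minimal sketch of reference-tissue Z-score normalization of an MR image.
import numpy as np

def zscore_normalize(image, reference_mask):
    """Normalize an MR image using the mean/std of a reference-tissue ROI
    (e.g., liver or psoas muscle) so that kidney texture features are
    computed on a common intensity scale."""
    ref = image[reference_mask > 0]
    return (image - ref.mean()) / (ref.std() + 1e-8)

# toy example: a 2D "image" with a small psoas-muscle ROI mask
image = np.random.normal(300.0, 50.0, size=(256, 256))
psoas_mask = np.zeros_like(image)
psoas_mask[100:120, 60:80] = 1
normalized = zscore_normalize(image, psoas_mask)
print(normalized[psoas_mask > 0].mean(), normalized[psoas_mask > 0].std())
```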
Multi-modality classification between myxofibrosarcoma and myxoma using radiomics and machine learning models
Myxofibrosarcoma is a rare, malignant myxoid soft tissue tumor. It can be challenging to distinguish from a benign myxoma in clinical practice because of the overlap in imaging and histologic features between the two entities. Some previous works have used radiomic features of T1-weighted images to differentiate myxoid tumors, but few have used multi-modality data. In this project, we collect a dataset containing 20 myxomas and 20 myxofibrosarcomas, each with a T1-weighted image, a T2-weighted image, and clinical features. Radiomic features from the multi-modality images and the clinical features are used to train multiple machine learning models. Our experimental results show that the prediction accuracy using multi-modality features surpasses the results from a single modality. The radiomic features Gray Level Variance and Gray Level Non-uniformity Normalized, extracted from the Gray Level Run Length Matrix (GLRLM) of the T2 images, and age are the top three features selected by the least absolute shrinkage and selection operator (LASSO) feature reduction model.
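The following is a hedged sketch of LASSO-based feature selection over a combined radiomic-plus-clinical feature matrix, in the spirit of the approach above; the feature matrix, labels, and feature names are synthetic placeholders, not the study's data or code.

```python
# Sketch of LASSO feature reduction on combined T1/T2 radiomic and clinical features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))                # 40 cases, 50 combined features (placeholder)
y = rng.integers(0, 2, size=40)              # 0 = myxoma, 1 = myxofibrosarcoma (placeholder)
feature_names = [f"feature_{i}" for i in range(50)]

X_scaled = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_scaled, y)

# features that survive the L1 penalty, ranked by absolute coefficient
selected = [(name, coef) for name, coef in zip(feature_names, lasso.coef_)
            if abs(coef) > 1e-6]
top3 = sorted(selected, key=lambda t: abs(t[1]), reverse=True)[:3]
print(top3)
```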
Automated kidney segmentation by mask R-CNN in T2-weighted magnetic resonance imaging
Manu Goyal, Junyu Guo, Lauren Hinojosa, et al.
Despite the recent advances of deep learning algorithms in medical imaging, automatic segmentation algorithms for the kidneys in Magnetic Resonance Imaging (MRI) examinations are lacking. Automated segmentation of the kidneys in MRI can enable several clinical applications and the use of radiomics and machine learning analysis of renal disease. In this work, we propose the application of a Mask R-CNN for the automatic segmentation of the kidneys in coronal T2-weighted single-shot fast spin echo MRI. We propose morphological operations as a post-processing step to further improve the performance of the Mask R-CNN for this task. With 5-fold cross-validation, the proposed Mask R-CNN was trained and validated on 70 and 10 MRI exams, respectively, and then evaluated on the remaining 20 exams in each fold. Our proposed method achieved a Dice score of 0.905 and an Intersection over Union of 0.828.
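A minimal sketch of the kind of morphological post-processing and overlap metrics mentioned above is shown below. It is an assumption-laden illustration (close small gaps, keep the two largest components, then compute Dice and IoU), not the authors' implementation; the toy volumes stand in for network output and manual ground truth.

```python
# Sketch: morphological post-processing of a binary kidney mask plus Dice/IoU.
import numpy as np
from scipy import ndimage

def postprocess(mask, keep_components=2):
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 3, 3)))
    labeled, n = ndimage.label(mask)
    if n <= keep_components:
        return mask.astype(np.uint8)
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    keep = np.argsort(sizes)[-keep_components:] + 1   # e.g., the two kidneys
    return np.isin(labeled, keep).astype(np.uint8)

def dice_iou(pred, gt):
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-8)
    return dice, iou

# toy volumes standing in for Mask R-CNN output and the manual ground truth
pred = np.zeros((32, 64, 64), dtype=np.uint8); pred[10:20, 10:30, 10:30] = 1
gt = np.zeros_like(pred); gt[11:21, 12:32, 12:32] = 1
print(dice_iou(postprocess(pred), gt))
```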
Multi-institutional evaluation of a deep learning model for fully automated detection of aortic aneurysms in contrast and non-contrast CT
Yiting Xie, Benedikt Graf, Parisa Farzam, et al.
We developed and validated a research-only deep learning (DL) based automatic algorithm to detect thoracic and abdominal aortic aneurysms on contrast and non-contrast CT images and compared its performance with assessments obtained from retrospective radiology reports. The DL algorithm was developed using 556 CT scans. Manual annotations of aorta centerlines and cross-sectional aorta boundaries were created to train the algorithm. Aorta segmentation and aneurysm detection performance were evaluated on 2263 retrospective CT scans (154 thoracic and 176 abdominal aneurysms). Evaluation was performed by comparing the automatically detected aneurysm status to the aneurysm status reported in the radiology reports, and the AUC was reported. In addition, a quantitative evaluation was performed to compare the automatically measured aortic diameters to manual diameters on a subset of 59 CT scans, using the Pearson correlation coefficient. For aneurysm detection, the AUC was 0.95 for thoracic aneurysm detection (95% confidence region [0.93, 0.97]) and 0.94 for abdominal aneurysm detection (95% confidence region [0.92, 0.96]). For aortic diameter measurement, the Pearson correlation coefficient was 0.973 (p<0.001).
Multi-channel medical image segmentation method in Hessian domain
Medical image segmentation is an important technique in surgical navigation, tumor quantification, computer-aided diagnosis and detection, etc. Image contrast among different tissues and modalities plays an important role in segmentation accuracy and remains a great challenge. In this paper, we propose an image segmentation method that merges multiple channels defined in the Hessian domain. Each channel has its own properties and is sensitive only to a few specific tissues, making it an essential complement to the other channels. The contrast between two neighboring tissues in the merged result provides more sensitive boundary information than the original images. Moreover, we note that weak boundaries are a major barrier in image segmentation. An unsupervised local segmentation scheme is proposed to address this challenge by dividing the whole volume into small, mutually overlapping patches. Our method is tested on five different organs across two major modalities and three noise levels, and yields very promising results. In comparison, our method shows clear superiority over SLIC and level sets, two state-of-the-art methods, producing more accurate contours and boundaries.
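To make the idea of Hessian-domain channels concrete, the sketch below builds eigenvalue maps of the image Hessian with scikit-image and stacks them with the original image as a multi-channel input. The specific channel definitions and the test image are assumptions for illustration and do not reproduce the paper's construction.

```python
# Illustrative sketch: Hessian-eigenvalue channels stacked with the original image.
import numpy as np
from skimage import data
from skimage.feature import hessian_matrix, hessian_matrix_eigvals

image = data.camera().astype(float)          # placeholder for a medical slice

def hessian_channels(img, sigma=2.0):
    # second-order derivatives at the given scale
    H = hessian_matrix(img, sigma=sigma, order="rc")
    eigvals = hessian_matrix_eigvals(H)       # shape (2, H, W), sorted eigenvalues
    # each eigenvalue map responds to different local structures (blobs vs. ridges)
    return eigvals[0], eigvals[1]

ch_a, ch_b = hessian_channels(image)
merged = np.stack([image, ch_a, ch_b], axis=-1)   # multi-channel representation
print(merged.shape)
```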
Performance evaluation of lightweight convolutional neural networks on retinal lesion segmentation
M. Siebert, P. Rostalski
In addition to the recent development of deep learning-based automatic detection systems for diabetic retinopathy (DR), efforts are being made to integrate those systems into mobile detection devices running on the edge, which requires lightweight algorithms. Moreover, to enable clinical deployment it is important to enhance the transparency of deep learning systems, which are usually black-box models and hence give no insight into their reasoning. By providing precise segmentation masks for lesions related to the severity of DR, a good intuition about the decision making of the diagnosing system can be given. Hence, to enable transparent mobile DR detection devices that simultaneously segment disease-related lesions and run on the edge, lightweight models capable of producing fine-grained segmentation masks are required, which conflicts with the generally high complexity of fully convolutional architectures used for image segmentation. In this paper, we evaluate both the runtime and the segmentation performance of several lightweight fully convolutional networks for DR-related lesion segmentation and assess their potential to extend mobile DR-grading systems for improved transparency. To this end, the U2-Net is downscaled to reduce the computational load by reducing feature size and applying depthwise separable convolutions, and is evaluated using deep model ensembling as well as single- and multi-task inference to improve performance and further reduce memory cost. Experimental results using the U2-Net-S† ensemble show good segmentation performance while maintaining a small memory footprint as well as reasonable inference speed, and thus indicate a promising first step towards a holistic mobile diagnostic system providing both precise lesion segmentation and DR grading.
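The depthwise separable convolution used for downscaling is a standard building block; a minimal PyTorch sketch is given below. The block layout (3x3 depthwise followed by 1x1 pointwise with BatchNorm and ReLU) is a common convention and an assumption here, not necessarily the paper's exact configuration.

```python
# Minimal PyTorch sketch of a depthwise separable convolution block.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution,
    replacing a standard 3x3 convolution at a fraction of the parameters."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 64, 128, 128)              # e.g., an intermediate feature map
block = DepthwiseSeparableConv(64, 128)
print(block(x).shape)                          # torch.Size([1, 128, 128, 128])
```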
CNN-based tumor progression prediction after thermal ablation with CT imaging
Local tumor progression (LTP) after ablation treatment of colorectal liver metastases (CRLM) has a detrimental impact on the outcome for patients with advanced colorectal cancer. The ability to predict or even identify LTP at the earliest opportunity is critical to personalise follow-up and subsequent treatment. We present a study of 79 patients (120 lesions) with CRLM who underwent thermal ablation treatment, in which a multi-channel model was developed to identify patients with LTP from baseline and restaging computed tomography (CT) scans. The study made use of transfer learning strategies in association with 3-fold cross-validation. The area under the receiver operating characteristic curve was found to be 0.72 (95% confidence interval [CI]: 0.64-0.79), demonstrating that the model was able to generate prognostic features from the CT images.
Lung
Resampling and harmonization for mitigation of heterogeneity in imaging parameters: a comparative study
Apurva Singh, Hannah Horng, Leonid Roshkovan, et al.
We compare techniques for addressing heterogeneity in image physical dimensions and acquisition parameters, and how these methods affect the predictive performance of radiomic features. We further combine radiomic signatures with established clinical prognostic factors to predict progression-free survival (PFS) in stage 4 NSCLC patients undergoing first-line immunotherapy. Our study includes 124 stage 4 NSCLC patients treated with pembrolizumab (monotherapy: 30.65%, combination therapy: 69.35%). The CaPTk software was used to extract radiomic features (n=102) from 3D tumor volumes segmented from lung CT scans with ITK-SNAP. The ability of the following approaches to mitigate heterogeneity in image physical dimensions (voxel spacing parameters) and acquisition parameters (contrast enhancement and CT reconstruction kernel) was evaluated: resampling the images (to minimum/maximum voxel spacing parameters), harmonization of radiomic features using a nested ComBat technique (taking voxel spacing and/or image acquisition parameters as batch variables), or a combination of resampling the images to the minimum voxel spacing parameters and applying nested harmonization by image acquisition parameters. Two radiomic phenotypes were identified using unsupervised hierarchical clustering of the radiomic features derived from each of these scenarios. Established prognostic factors, including PD-L1 expression, ECOG status, BMI, and smoking status, were combined with the radiomic phenotypes in five-fold cross-validated multivariate Cox proportional hazards models (200 iterations) of progression-free survival. A Cox model based only on clinical factors had a c-statistic (mean, 95% CI) of 0.53 [0.50, 0.57], which increased to 0.62 [0.55, 0.64] upon the addition of radiomic phenotypes derived from images that had been resampled to the minimum voxel spacing and harmonized by image acquisition parameters. In addition to the cross-validated c-statistics, we also built a model on the complete dataset of features corresponding to each of the approaches to evaluate the Kaplan-Meier performance in separating patients above versus below the median prognostic score. This preliminary study aims to draw comparisons between the various techniques used to address the issue of reproducibility in radiomic features derived from medical images with heterogeneous parameters.
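A small sketch of the survival-modeling step (clinical factors plus a radiomic phenotype label in a Cox proportional hazards model) is given below using the lifelines package. The toy dataframe, column names, and the use of lifelines are all assumptions for illustration; the abstract does not state which implementation was used.

```python
# Sketch: Cox proportional hazards model combining clinical factors with a
# radiomic phenotype label (assumes lifelines is installed).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "pfs_months": rng.exponential(10, size=124),         # placeholder PFS times
    "progressed": rng.integers(0, 2, size=124),          # event indicator
    "ecog": rng.integers(0, 2, size=124),                # placeholder clinical factors
    "bmi": rng.normal(26, 4, size=124),
    "radiomic_phenotype": rng.integers(0, 2, size=124),  # cluster label from radiomics
})

cph = CoxPHFitter()
cph.fit(df, duration_col="pfs_months", event_col="progressed")
print(cph.concordance_index_)                             # c-statistic on the fit data
```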
Optimization of imaging parameters of an investigational photon-counting CT prototype for lung lesion radiomics
The aim of this study was to evaluate and optimize the imaging parameters of a new dual-source photon-counting CT (PCCT) scanner (NAEOTOM Alpha, Siemens Healthineers) for lung lesion radiomics using virtual imaging trials. Virtual patients (XCAT phantoms) were modeled at three BMIs (22%, 52%, and 88%), with three lesions in each phantom. The lesions were modeled with varying spiculation levels (low, medium, high). A scanner-specific CT simulator (DukeSim), set up to model the NAEOTOM Alpha scanner, was used to simulate imaging of the virtual patients under varying radiation dose (5.7 to 17.1 mGy) and reconstruction parameters (matrix sizes of 512x512 and 1024x1024, kernels of Bl56, Br56, and Qr56, and slice thicknesses of 0.4 to 3.0 mm). A morphological snakes segmentation method was used to segment the lesions in the reconstructed images. The segmented masks were used to calculate morphological radiomic features across all acquired images. The original phantoms were also run through the same radiomics software to serve as ground-truth measurements. The radiomic features were found to be most dependent on slice thickness and least dependent on dose level. Increasing the dose from 5.7 mGy to 17.1 mGy improved the accuracy of the radiomic measurements by at most 2.0%. The Qr56 kernel, 0.34 mm in-plane pixel size, and 0.4 mm slice thickness yielded the most accurate measurements of morphological features (e.g., error of 6.7 ± 5.6% vs. 11.8 ± 9.6% for mesh volume).
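For readers unfamiliar with morphological snakes, the sketch below shows one common variant (morphological Chan-Vese) available in scikit-image applied to a toy lesion-like blob. The toy slice, initialization, and parameters are assumptions; the abstract does not specify which snakes variant or implementation was used.

```python
# Hedged sketch of a morphological-snakes segmentation step on a synthetic slice.
import numpy as np
from skimage.segmentation import morphological_chan_vese, disk_level_set

# toy "lesion": a bright blob in a noisy low-attenuation background
slice_hu = np.random.normal(-800, 30, size=(128, 128))
yy, xx = np.mgrid[:128, :128]
slice_hu[(yy - 64) ** 2 + (xx - 64) ** 2 < 15 ** 2] = 20   # nodule-like blob

init = disk_level_set(slice_hu.shape, center=(64, 64), radius=25)
mask = morphological_chan_vese(slice_hu, 50,               # 50 iterations
                               init_level_set=init, smoothing=2)
print(mask.sum(), "segmented pixels")
```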
Prediction of lung CT scores of systemic sclerosis by cascaded regression neural networks
Visually scoring lung involvement in systemic sclerosis (SSc) from CT scans plays an important role in monitoring progression, but its labor intensiveness hinders practical application. We therefore propose an automatic scoring framework that consists of two cascaded deep regression neural networks. The first (3D) network predicts the craniocaudal position of five anatomically defined scoring levels on the 3D CT scans. The second (2D) network receives the resulting 2D axial slices and predicts the scores, which represent the extent of SSc disease. CT scans from 227 patients were used for both networks: 180 scans were split into four groups with an equal number of samples to perform four-fold cross-validation, and an additional set of 47 scans constituted a separate testing dataset. Two experts scored all CT data in consensus and, to obtain inter-observer variabilities, they also independently scored 16 patients from the testing dataset. To alleviate the imbalance in training labels for the second network, we introduced a balanced sampling technique, and to increase the diversity of the training samples, synthetic data were generated mimicking ground glass and reticulation patterns. The four-fold cross-validation results showed that our proposed score prediction network achieved an average MAE of 5.90, 4.66 and 4.49%, and a weighted kappa of 0.66, 0.58 and 0.65, for total score (TOT), ground glass (GG) and reticular pattern (RET), respectively. Our network performed slightly worse than the best human observer on TOT and GG prediction, but it has competitive performance on RET prediction and has the potential to be an objective alternative for the visual scoring of SSc in CT thorax studies.
Can deep learning model undergo the same process as a human radiologist when determining malignancy of pulmonary nodules?
When multiple radiologists make radiological decisions based on CT scans, inter-reader variability often exists, and thus different conclusions can be reached from viewing an identical scan. Predicting lung nodule malignancy is a prime example of a radiological decision where inter-reader variability can exist, which may originate from the variety of radiologic features, themselves subject to inter-reader variability, that a radiologist can consider when predicting the malignancy of nodules (1). Radiologists predict whether a nodule on chest CT is malignant or benign by observing radiologic features. Although a deep learning model can be trained directly on 3-dimensional images, more accurate prediction may be achieved by extracting radiologic features. The purpose of this paper is to investigate how a deep learning model can be trained to predict malignancy with regard to extracting relevant radiologic features, and to compare the extent of agreement between human readers, and between human readers and the deep learning model, for malignancy prediction of lung nodules.
Abdomen
A graph-theoretic algorithm for small bowel path tracking in CT scans
We present a novel graph-theoretic method for small bowel path tracking. It is formulated as finding the minimum cost path between given start and end nodes on a graph constructed based on bowel wall detection. We observed that a trivial solution with many shortcuts is easily produced even with the wall detection, where the tracked path penetrates indistinct walls around the contact between different parts of the small bowel. Thus, we propose to include must-pass nodes in finding the path to better cover the entire course of the small bowel. The proposed method does not entail training with ground-truth paths, while the previous methods do. We acquired ground-truth paths that are fully connected from the start to the end of the small bowel for 10 abdominal CT scans, which enables evaluation of path tracking for the entire course of the small bowel. The proposed method showed clear improvements in terms of several metrics compared to the baseline method. The maximum length of the path that is tracked without an error per scan, by the proposed method, is above 800 mm on average.
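To illustrate the idea of a minimum cost path forced through must-pass nodes, the sketch below chains shortest paths between consecutive waypoints on a toy weighted graph with networkx. This is a conceptual illustration under the assumption that the must-pass nodes are already ordered; it is not the paper's graph construction or optimization.

```python
# Conceptual sketch: minimum cost path constrained through must-pass nodes.
import networkx as nx

def path_through_waypoints(G, start, end, must_pass, weight="weight"):
    """Concatenate shortest paths start -> m1 -> m2 -> ... -> end."""
    waypoints = [start] + list(must_pass) + [end]
    full_path = [start]
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        segment = nx.shortest_path(G, a, b, weight=weight)
        full_path.extend(segment[1:])          # avoid duplicating the joint node
    return full_path

# toy weighted grid standing in for the bowel-wall-based graph
G = nx.grid_2d_graph(5, 5)
for u, v in G.edges:
    G.edges[u, v]["weight"] = 1.0
print(path_through_waypoints(G, (0, 0), (4, 4), must_pass=[(2, 0), (2, 4)]))
```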
Lymph node detection in T2 MRI with transformers
Identification of lymph nodes (LNs) in T2 Magnetic Resonance Imaging (MRI) is an important step performed by radiologists during the assessment of lymphoproliferative diseases. The size of the nodes plays a crucial role in their staging, and radiologists sometimes use an additional contrast sequence such as diffusion weighted imaging (DWI) for confirmation. However, lymph nodes have diverse appearances in T2 MRI scans, making them difficult to stage for metastasis. Furthermore, radiologists often miss smaller metastatic lymph nodes over the course of a busy day. To deal with these issues, we propose to use the DEtection TRansformer (DETR) network to localize suspicious metastatic lymph nodes for staging in challenging T2 MRI scans acquired with different scanners and exam protocols. False positives (FP) were reduced through a bounding box fusion technique, and a precision of 65.41% and a sensitivity of 91.66% at 4 FP per image were achieved. To the best of our knowledge, our results improve upon the current state of the art for lymph node detection in T2 MRI scans.
CT radiomics to predict early hepatic recurrence after resection for intrahepatic cholangiocarcinoma
Intrahepatic cholangiocarcinoma (IHC) is an aggressive liver cancer with a five-year survival rate of less than 10%. Surgery is the only curative treatment. However, most patients die of disease recurrence, with more than 50% recurring within 2 years, and the liver is the most common site of recurrence. Recurrence in the liver within a short period after surgery is common and eventually leads to death. Currently, there is no way to assess the risk of early recurrence or death in these patients. Methods to predict these risks would help physicians select the best treatment plan for individual patients; patients at high risk of recurrence could be treated early or at the time of surgery with chemotherapy or radiation. Such changes in patient management would greatly impact patients' prospects of survival. The objective of the present study is to identify preoperative computed tomography (CT)-based quantitative imaging predictors of early hepatic recurrence. Two hundred fifty-four texture features were extracted from the tumor and the future liver remnant (FLR) on CT, along with tumor size. With features selected using the minimum redundancy maximum relevance method and an AdaBoost classifier, we obtained an area under the receiver operating characteristic curve of 0.78 using 3-fold cross-validation for a cohort of 139 patients with IHC.
Universal lesion detection in CT scans using neural network ensembles
In clinical practice, radiologists rely on lesion size when distinguishing metastatic from non-metastatic lesions. A prerequisite for lesion sizing is their detection, as it promotes the downstream assessment of tumor spread. However, lesions vary in size and appearance in CT scans, and radiologists often miss small lesions during a busy clinical day. To overcome these challenges, we propose the use of state-of-the-art detection neural networks to flag suspicious lesions present in the NIH DeepLesion dataset for sizing. Additionally, we incorporate a bounding box fusion technique to minimize false positives (FP) and improve detection accuracy. Finally, to resemble clinical usage, we constructed an ensemble of the best detection models to localize lesions for sizing with a precision of 65.17% and a sensitivity of 91.67% at 4 FP per image. Our results improve upon or maintain the performance of current state-of-the-art methods for lesion detection in challenging CT scans.
Unsupervised optical small bowel ischemia detection in a preclinical model using convolutional variational autoencoders
Mesenteric ischemia or infarction involves a wide spectrum of disease and is known as a complex disorder with a high mortality rate. Bowel ischemia is caused by insufficient blood flow to the intestine, and surgical intervention is the definitive treatment to remove non-viable tissue and restore blood flow to viable tissue. Current clinical practice primarily relies on the individual surgeon's visual inspection and clinical experience, which can be subjective and unreproducible. Therefore, a more consistent and objective method is required to improve surgical performance and clinical outcomes. In this work, we present a new optical method combined with unsupervised learning using convolutional variational autoencoders to enable quantitative and objective assessment of tissue perfusion. We integrated the multimodal optical imaging technologies of color RGB and non-invasive, dye-free laser speckle contrast imaging (LSCI) into a handheld device, observed normal small bowel tissue to train a generative autoencoder deep neural network pipeline, and finally tested small bowel ischemia detection through preclinical rodent studies.
Self-supervised U-Net for segmenting flat and sessile polyps
Debayan Bhattacharya, Christian Betz, Dennis Eggert, et al.
Colorectal cancer (CRC) poses a great risk to public health. It is the third most common cause of cancer in the US. The development of colorectal polyps is one of the earliest signs of cancer, and early detection and resection of polyps can increase the survival rate to 90%. Manual inspection can cause misdetections because polyps vary in color, shape, size, texture and appearance. To this end, Computer-Aided Diagnosis (CADx) systems have been proposed that detect polyps by processing colonoscopic videos. Such a system acts as a secondary check to help clinicians reduce misdetections so that polyps may be resected before they transform into cancer. Despite the prominence of CADx solutions, the miss rate of polyps remains between 6% and 27%, and sessile and flat polyps with a diameter of less than 10 mm are more likely to be undetected. Convolutional Neural Networks (CNNs) have shown promising results in polyp segmentation. However, these works follow a supervised approach and are limited by the size of the dataset; it has been observed that smaller datasets reduce the segmentation accuracy of ResUNet++. Self-supervision is a strong alternative to fully supervised learning, especially in medical image analysis, since it redresses the limitations posed by small annotated datasets. From the self-supervised approach proposed by Jamaludin et al., it is evident that pretraining a network with a proxy task helps in extracting meaningful representations from the underlying data, which can then be used to improve the performance of the final downstream supervised task. In summary, we train a U-Net to inpaint randomly dropped-out pixels in the image as a proxy task, using the Kvasir-SEG dataset for pretraining. This is followed by supervised training on the limited Kvasir-Sessile dataset. Our experimental results demonstrate that, with a limited annotated dataset and a larger unlabeled dataset, a self-supervised approach is a better alternative to a fully supervised approach. Specifically, our self-supervised U-Net performs better than five segmentation models that were trained in a supervised manner on the Kvasir-Sessile dataset.
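The inpainting proxy task described above (drop random pixels, train the network to reconstruct them) can be sketched in a few lines of PyTorch. The tiny stand-in network, the batch, and the 25% drop rate are placeholders; the actual architecture and dataset follow the paper.

```python
# Minimal sketch of a single self-supervised inpainting training step.
import torch
import torch.nn as nn

unet = nn.Sequential(                      # tiny stand-in for a U-Net
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(unet.parameters(), lr=1e-4)
criterion = nn.MSELoss()

images = torch.rand(4, 3, 128, 128)        # placeholder colonoscopy batch
drop = (torch.rand(4, 1, 128, 128) < 0.25).float()   # randomly dropped pixels
corrupted = images * (1.0 - drop)

reconstruction = unet(corrupted)
loss = criterion(reconstruction * drop, images * drop)  # loss on dropped pixels only
loss.backward()
optimizer.step()
print(float(loss))
```

After pretraining with this proxy task, the encoder weights would be reused and the whole network fine-tuned on the labeled segmentation data.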
Eye, Retina
Automation of ischemic myocardial scar detection in cardiac magnetic resonance imaging of the left ventricle using machine learning
Michael H. Udin, Ciprian N. Ionita, Saraswati Pokharel, et al.
Purpose: Machine learning techniques can be applied to cardiac magnetic resonance imaging (CMR) scans in order to differentiate patients with and without ischemic myocardial scarring (IMS). However, processing the image data in CMR scans requires manual work that takes a significant amount of time and expertise. We propose to develop and test an AI method to automatically identify IMS in CMR scans to streamline processing and reduce time costs. Materials and Methods: CMR scans from 170 patients (138 with IMS and 32 without IMS, as identified by a clinical expert) were processed using a multistep automatic image data selection algorithm. This algorithm consisted of cropping, circle detection, and supervised machine learning to isolate focused left ventricle image data. We used a ResNet-50 convolutional neural network to evaluate manual vs. automatic selection of left ventricle image data by calculating accuracy, sensitivity, specificity, F1 score, and area under the receiver operating characteristic curve (AUROC). Results: The accuracy, sensitivity, specificity, F1 score, and AUROC were 80.6%, 85.6%, 73.7%, 83.0%, and 0.837, respectively, when identifying IMS using manually selected left ventricle image data. With automatic selection of left ventricle image data, the same parameters were 78.5%, 86.0%, 70.7%, 79.7%, and 0.848, respectively. Conclusion: Our proposed automatic image data selection algorithm provides a promising alternative to manual selection when there are time and expertise limitations. Automatic image data selection may also prove to be an important and necessary step toward the integration of machine learning diagnosis and prognosis in clinical workflows.
Device specific SD-OCT retinal layer segmentation using cycle-generative adversarial networks in patients with AMD
Souvick Mukherjee, Tharindu De Silva, Gopal Jayakar, et al.
Purpose: Spectral Domain Optical Coherence Tomography (SD-OCT) is a widely utilized imaging modality in retina clinics to inspect the integrity of retinal layers in patients with age-related macular degeneration. Spectralis and Cirrus are two of the most widely used SD-OCT devices. Due to the stark difference in intensities and signal-to-noise ratios between the images captured by the two instruments, a model trained on images from one instrument performs poorly on images from the other. Methods: In this work, we explore the performance on Cirrus images of an algorithm trained on images obtained from the Heidelberg Spectralis device. Utilizing a dataset containing Heidelberg images and Cirrus images, we address the problem of accurately segmenting images in one domain with an algorithm developed in another domain. In our approach, we use an unpaired CycleGAN-based domain adaptation network to transform the Cirrus volumes to the Spectralis domain before using our trained segmentation network. Results: We show that the intensity distribution shifts towards the Spectralis domain when we domain-adapt Cirrus images to Spectralis images. Our results show that the segmentation model performs significantly better on the domain-translated volumes (Total Retinal Volume Error: 0.17±0.27 mm3, RPEDC Volume Error: 0.047±0.05 mm3) compared to the raw volumes (Total Retinal Volume Error: 0.26±0.36 mm3, RPEDC Volume Error: 0.13±0.15 mm3) from the Cirrus domain, and that such domain adaptation approaches are feasible solutions. Conclusions: Both our qualitative and quantitative results show that a CycleGAN domain adaptation network can be used as an efficient technique to perform unpaired domain adaptation between SD-OCT images generated by different devices. We show that a 3D segmentation model trained on Spectralis volumes performs better on domain-adapted Cirrus volumes than on raw Cirrus volumes.
Semi-supervised learning approach for automatic detection of hyperreflective foci in SD-OCT imaging
Tharindu De Silva, Kristina Hess, Cameron Duic, et al.
Purpose: This work investigates a semi-supervised approach for automatic detection of hyperreflective foci (HRF) in spectral-domain optical coherence tomography (SD-OCT) imaging. Starting with a limited annotated data set containing HRFs, we aim to build a larger data set and then a more robust detection model. Methods: A Faster R-CNN object detection model was trained in a semi-supervised manner whereby high-confidence detections from the current iteration are added to the training set in subsequent iterations after manual verification. With each iteration, the size of the training set is increased by including additional model-detected cases. We expect the model to become more accurate and robust as the number of training iterations increases. We performed experiments on a data set consisting of over 170,000 SD-OCT B-scans. The models were tested on a data set consisting of 30 patients (3,630 B-scans). Results: Across iterations the model performance improved, with the final model yielding precision=0.56, recall=0.99, and F1-score=0.71. As the number of training examples increases, the model detects cases with more confidence. The high false positive rate is associated with additional detections that capture instances of elevated reflectivity which, upon review, were found to represent questionable cases rather than definitive HRFs due to confounding factors. Conclusion: We demonstrate that, by starting with a small data set of HRFs, we are able to search for occurrences of other HRFs in the data set in a semi-supervised fashion. This method provides an objective, time- and cost-effective alternative to laborious manual inspection of B-scans for HRF occurrences.
Retinal layer segmentation for age-related macular degeneration patients with 3D-UNet
Souvick Mukherjee, Tharindu De Silva, Gopal Jayakar, et al.
Purpose: Spectral Domain Optical Coherence Tomography (SD-OCT) images are a series of B-scans which capture the volume of the retina and reveal structural information. Diseases of the outer retina cause changes to the retinal layers which are evident on SD-OCT images, revealing disease etiology and risk factors for disease progression. Quantitative thickness information of the retinal layers provides disease-relevant data that reveal important aspects of disease pathogenesis. Manually labeling these layers is extremely laborious, time consuming and costly. Recently, deep learning algorithms have been used to automate the segmentation process. While retinal volumes are inherently 3-dimensional, state-of-the-art segmentation approaches have been limited in their utilization of the 3-dimensional nature of the structural information. Methods: In this work, we train a 3D-UNet using 150 retinal volumes and test using 191 retinal volumes from a held-out test set (with AMD severity grades ranging from no disease through the intermediate stages to advanced disease, including presence of geographic atrophy). The 3D deep features learned by the model capture spatial information simultaneously from all three volumetric dimensions. Since, unlike the ground truth, the output of the 3D-UNet is not single-pixel wide, we perform a column-wise probabilistic maximum operation to obtain single-pixel-wide layers for quantitative evaluation. Results: We compare our results to the publicly available OCT Explorer and deep learning-based 2D-UNet algorithms and observe a low error, within 3.11 pixels of the ground-truth locations (for some of the most challenging, advanced-stage AMD eyes with AMD severity scores of 9 and 10). Conclusion: Our results show, both qualitatively and quantitatively, that there is a significant advantage to extracting and utilizing 3D features over the traditionally used OCT Explorer or 2D-UNet.
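The column-wise probabilistic maximum described above reduces a per-pixel probability map to a one-pixel-wide layer boundary; a minimal NumPy sketch is shown below. The array shapes are illustrative and do not correspond to the study's actual volumes.

```python
# Sketch: column-wise probabilistic maximum to obtain a single-pixel-wide layer.
import numpy as np

prob = np.random.rand(496, 512)              # per-pixel layer probability, one B-scan
boundary_rows = np.argmax(prob, axis=0)       # one row index per A-scan (column)

layer_mask = np.zeros_like(prob, dtype=np.uint8)
layer_mask[boundary_rows, np.arange(prob.shape[1])] = 1   # single-pixel layer
print(layer_mask.sum(axis=0).max())           # exactly 1 pixel per column
```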
Synthesis of annotated pathological retinal OCT data with pathology-induced deformations
In this work, a generative adversarial network (GAN)-based pipeline for the generation of realistic retinal optical coherence tomography (OCT) images with available pathological structures and ground truth anatomical and pathological annotations is established. The emphasis of the proposed image generation approach lies especially on the simulation of the pathology-induced deformations of the retinal layers around a pathological structure. Our experiments demonstrate the realistic appearance of the images as well as their applicability for the training of neural networks.
Segmentation
CVT-Vnet: a convolutional-transformer model for head and neck multi-organ segmentation
In this work, we propose a convolutional vision transformer V-net (CVT-Vnet) for multi-organ segmentation in 3-dimensional CT images of head and neck cancer patients for radiotherapy treatment planning. The organs include the brainstem, chiasm, mandible, optic nerves (left and right), parotid glands (left and right), and submandibular glands (left and right). The proposed CVT-Vnet has a U-shaped encoder-decoder architecture. A CVT is first deployed as the encoder to encourage global characteristics while still preserving precise local details, and a convolutional decoder is utilized to assemble the segmentation from the features learned by the CVT. We evaluated the network using a dataset of 32 patients undergoing radiotherapy treatment. We present a quantitative evaluation of the performance of our proposed CVT-Vnet in terms of segmentation volume similarity (Dice score, sensitivity, precision, and absolute percentage volume difference (AVD)) and surface similarity (Hausdorff distance (HD), mean surface distance (MSD), and residual mean square distance (RMSD)), using the physicians' manual contours as the ground truth. The volume similarities averaged over all organs were 0.79 for Dice score, 0.83 for sensitivity, and 0.78 for precision. The average surface similarities were 13.41 mm HD, 0.39 mm MSD, and 1.01 mm RMSD. The proposed network performed significantly better than Vnet and DV-net, two state-of-the-art methods. The proposed CVT-Vnet can be a promising tool for multi-organ delineation in head and neck radiotherapy treatment planning.
Taking full advantage of uncertainty estimation: an uncertainty-assisted two-stage pipeline for multi-organ segmentation
Zhou Zheng, Masahiro Oda, Kazunari Misawa, et al.
This paper proposes an uncertainty-assisted two-stage pipeline for multi-organ (liver, spleen, and stomach) segmentation. Deep learning methods, especially convolutional neural networks (CNNs), have been widely applied to multi-organ segmentation in abdominal CT images. However, most models for multi-organ segmentation ignore uncertainty analysis and do not explore the role of uncertainty information in improving segmentation accuracy. In the first stage of our approach, we analyze the effects of test-time dropout (TTD), test-time augmentation (TTA), and their combination (TTA + TTD) on segmentation accuracy, and we obtain the corresponding uncertainty information at both the voxel and structure levels. For the second stage, we propose a novel uncertainty-guided interactive unified level set framework to further and efficiently optimize the segmentation accuracy. Our experiments were done on a locally collected dataset containing 400 CT cases. The validation results showed that: 1) utilizing and combining TTD and TTA could improve the baseline performance; TTA, in particular, could boost the baseline performance by a large margin (Dice: 79.66% to 90.95%, ASD: 5.92 to 1.28 voxels), outperforming the state-of-the-art methods, and its corresponding aleatoric uncertainty provided a better uncertainty estimate than TTD-based epistemic uncertainty and contributed to reducing mis-segmentation; 2) compared with mainstream interactive algorithms, the proposed level set framework obtained competitive results while requiring about 76.78% fewer user-interaction scribbles.
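A minimal sketch of test-time augmentation with a voxel-level uncertainty map is shown below: predictions are averaged over flipped inputs and their variance is used as an uncertainty estimate. The stand-in model and the flip-only augmentation set are assumptions for illustration; the paper's actual augmentations and uncertainty definitions may differ.

```python
# Illustrative TTA sketch: averaged predictions plus a variance-based uncertainty map.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv3d(1, 4, 3, padding=1), nn.Softmax(dim=1))
volume = torch.rand(1, 1, 32, 64, 64)         # placeholder abdominal CT crop

flips = [[], [2], [3], [4]]                    # identity + flips along each spatial axis
preds = []
with torch.no_grad():
    for dims in flips:
        aug = torch.flip(volume, dims) if dims else volume
        out = model(aug)
        preds.append(torch.flip(out, dims) if dims else out)   # undo the flip

stack = torch.stack(preds)                     # (n_aug, N, C, D, H, W)
mean_prob = stack.mean(dim=0)                  # averaged segmentation probabilities
uncertainty = stack.var(dim=0).sum(dim=1)      # voxel-level uncertainty map
print(mean_prob.shape, uncertainty.shape)
```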
Towards a device-independent deep learning approach for the automated segmentation of sonographic fetal brain structures: a multi-center and multi-device validation
Quality assessment of prenatal ultrasonography is essential for the screening of fetal central nervous system (CNS) anomalies. The interpretation of fetal brain structures is highly subjective, expertise-driven, and requires years of training experience, limiting quality prenatal care for all pregnant mothers. With recent advancements in Artificial Intelligence (AI), computer-assisted diagnosis has shown promising results, being able to provide expert-level diagnosis in a matter of seconds, and therefore has the potential to improve access to quality and standardized care for all. Specifically, with the advent of deep learning (DL), assistance in precise anatomy identification through semantic segmentation, essential for the reliable assessment of growth and neurodevelopment and for the detection of structural abnormalities, has been proposed. However, existing works only identify certain structures (e.g., cavum septum pellucidum [CSP], lateral ventricles [LV], cerebellum) from one of the axial views (transventricular [TV], transcerebellar [TC]), limiting the scope for a thorough anatomical assessment as per the practice guidelines necessary for the screening of CNS anomalies. Further, existing works do not analyze the generalizability of these DL algorithms across images from multiple ultrasound devices and centers, thus limiting their real-world clinical impact. In this study, we propose a deep learning (DL)-based segmentation framework for the automated segmentation of 10 key fetal brain structures from 2 axial planes in 2D fetal brain USG images. We developed a custom U-Net variant that uses an Inception-v4 block as the feature extractor and leverages custom domain-specific data augmentation. Quantitatively, the mean (10 structures; test sets 1/2/3/4) Dice coefficients were 0.827, 0.802, 0.731, and 0.783. Irrespective of the USG device/center, the DL segmentations were qualitatively comparable to their manual segmentations. The proposed DL system offered promising and generalizable performance (multi-center, multi-device) and also presents evidence, via UMAP analysis, of device-induced variation in image quality (a challenge to generalizability). Its clinical translation can assist a wide range of users across settings to deliver standardized, quality prenatal examinations.
Transformation-consistent semi-supervised learning for prostate CT radiotherapy
Yichao Li, Mohamed S. Elmahdy, Michael S. Lew, et al.
Deep supervised models often require a large amount of labelled data, which is difficult to obtain in the medical domain. Therefore, semi-supervised learning (SSL) has been an active area of research due to its promise to minimize training costs by leveraging unlabelled data. Previous research has shown that SSL is especially effective in low labelled-data regimes; we show that this outperformance can be extended to high-data regimes by applying Stochastic Weight Averaging (SWA), which incurs zero additional training cost. Our model was trained on a prostate CT dataset and achieved improvements of 0.12 mm, 0.14 mm, 0.32 mm, and 0.14 mm for the prostate, seminal vesicles, rectum, and bladder, respectively, in terms of median test-set mean surface distance (MSD) compared to the supervised baseline in our high-data regime.
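For readers unfamiliar with SWA, the following sketch shows how it can be bolted onto an existing PyTorch training loop via torch.optim.swa_utils; the stand-in model, data, learning rates, and the epoch at which averaging starts are placeholders, not the paper's training setup.

```python
# Sketch: adding Stochastic Weight Averaging to a standard training loop.
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 4)                       # stand-in for the segmentation model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
swa_model = AveragedModel(model)               # running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.005)

loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 4)),
                    batch_size=8)
loss_fn = nn.MSELoss()

swa_start = 5
for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:                     # start averaging late in training
        swa_model.update_parameters(model)
        swa_scheduler.step()

# the averaged weights are used at test time, at zero extra training cost
print(swa_model(torch.randn(2, 10)).shape)
```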
Deep pancreas segmentation through quantification of pancreatic uncertainty on abdominal CT images
Accurate segmentation of the pancreas on abdominal CT images is a prerequisite for understanding the shape of the pancreas in pancreatic cancer diagnosis, surgery, and treatment planning. However, pancreas segmentation is very challenging due to the high within- and between-patient variability of the pancreas and its poor contrast with surrounding organs. In addition, uncertain areas arising from variability in the location and morphology of the pancreas can lead to over-segmentation or under-segmentation. Therefore, the purpose of this study is to improve the performance of pancreas segmentation by increasing the level of confidence in areas with high uncertainty through a multi-scale prediction network (MP-Net). First, the pancreas is localized using U-Net-based 2D segmentation networks on the three orthogonal planes, combined through majority voting. Second, the localized pancreas is segmented using a 2D MP-Net that accounts for pancreatic uncertainty from the multi-scale prediction results. The average F1-score, recall, and precision of the proposed pancreas segmentation method were 78.60%, 78.44%, and 79.72%, respectively. Our deep pancreas segmentation can be used to reduce intra- and inter-patient variation in understanding the shape of the pancreas, which is helpful for cancer diagnosis, surgery, and treatment planning.
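The majority vote across the three orthogonal-plane predictions used for localization amounts to a simple voxel-wise vote; a minimal NumPy sketch is given below. The random binary volumes are placeholders standing in for the axial, coronal, and sagittal network outputs after resampling to a common grid.

```python
# Sketch: voxel-wise majority vote over axial, coronal, and sagittal predictions.
import numpy as np

pred_axial = np.random.rand(64, 128, 128) > 0.5     # placeholder plane-wise predictions
pred_coronal = np.random.rand(64, 128, 128) > 0.5
pred_sagittal = np.random.rand(64, 128, 128) > 0.5

votes = pred_axial.astype(np.uint8) + pred_coronal + pred_sagittal
localized = votes >= 2                               # keep voxels flagged by at least two planes
print(localized.mean())
```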