Proceedings Volume 10950

Medical Imaging 2019: Computer-Aided Diagnosis


Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 17 June 2019
Contents: 33 Sessions, 154 Papers, 57 Presentations
Conference: SPIE Medical Imaging 2019
Volume Number: 10950

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 10950
  • Breast I
  • Brain
  • Breast II
  • Breast III and Heart
  • Lung I
  • Abdomen
  • Multiorgan and Colon
  • Lung II
  • Radiomics I
  • Keynote Session
  • Lung III
  • Vascular and Radiomics II
  • Eyes and New Approaches
  • Radiomics III and Oncology
  • Poster Session: Bone
  • Poster Session: Brain
  • Poster Session: Breast
  • Poster Session: Cell
  • Poster Session: Colon
  • Poster Session: Eyes
  • Poster Session: Head
  • Poster Session: Heart
  • Poster Session: Kidneys
  • Poster Session: Liver
  • Poster Session: Lung
  • Poster Session: Lymph Nodes and Thyroid
  • Poster Session: Prostate
  • Poster Session: Radiomics
  • Poster Session: Skin
  • Poster Session: Staging
  • Poster Session: Vascular
  • Poster Session: New Approaches
Front Matter: Volume 10950
This PDF file contains the front matter associated with SPIE Proceedings Volume 10950, including the Title Page, Copyright information, Table of Contents, Introduction, and Author and Conference Committee lists.
Breast I
Vendor-independent soft tissue lesion detection using weakly supervised and unsupervised adversarial domain adaptation
Joris van Vugt, Elena Marchiori, Ritse Mann, et al.
Computer-aided detection aims to improve breast cancer screening programs by helping radiologists to evaluate digital mammography (DM) exams. DM exams are generated by devices from different vendors, with diverse characteristics between and even within vendors. Physical properties of these devices and postprocessing of the images can greatly influence the resulting mammogram. As a result, a deep learning model trained on data from one vendor cannot readily be applied to data from another vendor. This paper investigates the use of tailored transfer learning methods based on adversarial learning to tackle this problem. We consider a database of DM exams (mostly bilateral and two views) generated by Hologic and Siemens devices. We analyze two transfer learning settings: 1) unsupervised transfer, where Hologic data with soft lesion annotations at pixel level and unlabelled Siemens data are used to annotate images in the latter set; 2) weakly supervised transfer, where exam-level labels for images from the Siemens system are available. We propose tailored variants of recent state-of-the-art transfer learning methods that take into account the class imbalance and incorporate the knowledge provided by the exam-level annotations. Experimental results indicate the beneficial effect of transfer learning in both settings. Notably, at 0.02 false positives per image, we achieve a sensitivity of 0.37, compared to 0.30 for a baseline with no transfer. The results also indicate that using exam-level annotations gives an additional increase in sensitivity.
Detecting mammographically-occult cancer in women with dense breasts using deep convolutional neural network and Radon cumulative distribution transform
We previously introduced the Radon Cumulative Distribution Transform (RCDT) as a novel image transformation to highlight the subtle differences between left and right mammograms in order to detect mammographically-occult (MO) cancer in women with dense breasts and negative screening mammograms. This study developed deep convolutional neural networks (CNNs) as classifiers for estimating the probability of having MO cancer. We acquired screening mammograms of 333 women (97 with unilateral MO cancer) with dense breasts and at least two consecutive mammograms and used the immediate prior mammograms, which radiologists interpreted as negative. We divided our dataset into a training, a validation, and a test set with ratios of 0.72:0.08:0.2. We applied the RCDT to the left and right mammograms of each view. We applied the inverse Radon transform to represent the resulting RCDT images in the image domain. We then fine-tuned a VGG16 network pretrained on ImageNet using the resulting images for each view. Using the same images, we also developed a traditional classifier using handcrafted features for each view. The CNNs achieved areas under the receiver operating characteristic curve (AUC) of 0.74 and 0.69 for the CC and MLO views, respectively. The traditional classifiers based on handcrafted features achieved AUCs of 0.5 and 0.64 for the CC and MLO views, respectively. We averaged the scores from the top three classifiers and achieved an AUC of 0.81 on the test set. In conclusion, we showed that inverse-Radon-transformed RCDT images hold information to detect MO cancer and that deep CNNs can learn such information.
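For readers unfamiliar with the transform, the following is a minimal sketch of the Radon-CDT idea using scikit-image's radon/iradon; the per-angle CDF matching shown here is an illustrative simplification and not the authors' implementation.

```python
# Simplified Radon-CDT sketch (not the authors' code). Assumes NumPy and
# scikit-image; inputs are 2D float arrays (e.g., left and right mammograms).
import numpy as np
from skimage.transform import radon, iradon

def radon_cdt(reference, target, angles=np.arange(0.0, 180.0, 1.0)):
    """Per-angle transport (displacement) map between two images' projections."""
    r_ref = radon(reference, theta=angles, circle=False)
    r_tgt = radon(target, theta=angles, circle=False)
    n = r_ref.shape[0]
    grid = np.arange(n, dtype=float)
    out = np.zeros_like(r_ref)
    for k in range(r_ref.shape[1]):
        # Normalize each projection to a positive "density".
        p_ref = r_ref[:, k] - r_ref[:, k].min() + 1e-8
        p_tgt = r_tgt[:, k] - r_tgt[:, k].min() + 1e-8
        cdf_ref = np.cumsum(p_ref) / p_ref.sum()
        cdf_tgt = np.cumsum(p_tgt) / p_tgt.sum()
        # Transport map f with CDF_tgt(f(x)) = CDF_ref(x); store f(x) - x.
        f = np.interp(cdf_ref, cdf_tgt, grid)
        out[:, k] = f - grid
    return out, angles

def rcdt_image(reference, target):
    """Back-project the RCDT 'sinogram' to inspect differences in image space."""
    sino, angles = radon_cdt(reference, target)
    return iradon(sino, theta=angles, circle=False)
```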
Reducing overfitting of a deep learning breast mass detection algorithm in mammography using synthetic images
Kenny H. Cha, Nicholas Petrick, Aria Pezeshk, et al.
We evaluated whether using synthetic mammograms for training data augmentation may reduce the effects of overfitting and increase the performance of a deep learning algorithm for breast mass detection. Synthetic mammograms were generated using a combination of an in-silico random breast generation algorithm and x-ray transport simulation. In-silico breast phantoms containing masses were modeled across the four BI-RADS breast density categories, and the masses were modeled with different sizes, shapes and margins. A Monte Carlo-based x-ray transport simulation code, MC-GPU, was used to project the 3D phantoms into realistic synthetic mammograms. A training data set of 2,000 mammograms with 2,522 masses was generated and used to augment a data set of real mammograms for training. The data set of real mammograms included all the masses in the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) and consisted of 1,112 mammograms (1,198 masses) for training, 120 mammograms (120 masses) for validation, and 361 mammograms (378 masses) for testing. We used Faster R-CNN as our deep learning network, with a ResNet-101 architecture pre-trained on ImageNet. We compared the detection performance when the network was trained using only the CBIS-DDSM training images and when subsets of the training set were augmented with 250, 500, 1,000 and 2,000 synthetic mammograms. FROC analysis was performed to compare performance with and without the synthetic mammograms. Our study showed that enlarging the training data with synthetic mammograms shows promise in reducing overfitting, and that the inclusion of the synthetic images for training increased the performance of the deep learning algorithm for mass detection on mammograms.
Deep learning for identifying breast cancer malignancy and false recalls: a robustness study on training strategy
Identification of malignancy and false recalls (women who are recalled in screening for additional workup, but later proven benign) in screening mammography has significant clinical value for accurate diagnosis of breast cancer. Deep learning methods have recently shown success in the area of medical imaging classification. However, there are a multitude of different training strategies that can significantly impact the overall model performance for a specific classification task. In this study, we aimed to investigate the impact of training strategy on classification of digital mammograms by performing a robustness analysis of deep learning models to distinguish malignancy and false-recall from normal (benign) findings. Specifically, we employed several pre-training strategies including transfer learning with medical and non-medical datasets, layer freezing, and varied network structure on both binary and three-class classification tasks of digital mammography images. We found that, overall, deep learning models appear to be robust to some modifications of network structure and pre-training strategy that we tested for mammogram-specific classification tasks. However, for specific classification tasks, some training strategies offer performance gains. The most notable performance gains in our experiments involved residual network models.
Evaluating deep learning techniques for dynamic contrast-enhanced MRI in the diagnosis of breast cancer
Deep learning has shown promise in the field of computer vision for image recognition. We evaluated two deep transfer learning techniques (feature extraction and fine-tuning) in the diagnosis of breast cancer compared to a lesion-based radiomics computer-aided diagnosis (CAD) method. The dataset included a total of 2006 breast lesions (1506 malignant and 500 benign) that were imaged with dynamic contrast-enhanced MRI. Pre-contrast, first post-contrast, and second post-contrast timepoint images for each lesion were combined to form an RGB image, which subsequently served as input to a VGG19 convolutional neural network (CNN) pre-trained on the ImageNet database. The first transfer learning technique was feature extraction, conducted by extracting feature output from each of the five max-pooling layers in the trained CNN, average-pooling the features, performing feature reduction, and merging the CNN features with a support vector machine for the classification of malignant and benign lesions. The second transfer learning method used a 64% training, 16% validation, and 20% testing dataset split in the fine-tuning of the final fully connected layers of the pretrained VGG19 to classify the images as malignant or benign. The performance of each of the three CAD methods was evaluated using receiver operating characteristic (ROC) analysis with the area under the ROC curve (AUC) as the performance metric in the task of distinguishing between malignant and benign lesions. The performance of the radiomics CAD (AUC = 0.90) was significantly better than that of CNN feature extraction (AUC = 0.84; p < 0.0001); however, we failed to show a significant difference with the fine-tuning method (AUC = 0.86; p = 0.1251), and thus we conclude that transfer learning shows potential as a comparable computer-aided diagnosis technique.
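A minimal sketch of the feature-extraction pathway described above, assuming TensorFlow/Keras and scikit-learn; the specific layer names, PCA size, and SVM settings are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch: extract average-pooled features from the five VGG19 max-pooling
# layers and feed them to an SVM. Assumes TensorFlow/Keras and scikit-learn.
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.models import Model
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
pool_layers = [f"block{i}_pool" for i in range(1, 6)]  # the five max-pooling layers
extractor = Model(inputs=base.input,
                  outputs=[GlobalAveragePooling2D()(base.get_layer(n).output)
                           for n in pool_layers])

def cnn_features(rgb_batch):
    """rgb_batch: (N, 224, 224, 3) array built from the three DCE timepoints."""
    outs = extractor.predict(preprocess_input(rgb_batch.astype("float32")))
    return np.concatenate(outs, axis=1)  # pooled features from all five levels

# Feature reduction followed by an SVM for malignant vs. benign classification.
clf = make_pipeline(StandardScaler(), PCA(n_components=50), SVC(probability=True))
# clf.fit(cnn_features(train_images), train_labels)
```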
Brain
Registration based detection and quantification of intracranial aneurysm growth
Žiga Bizjak, Tim Jerman, Boštjan Likar, et al.
As growing aneurysms are very likely to rupture, features to detect and quantify growth are needed in order to assess rupture risk. So far, cross-sectional features like maximum dome size were used; however, independent analysis of baseline and follow-up aneurysm shapes may bias these features and thereby conceal the often subtle changes of aneurysm morphology. We propose to detect and quantify aneurysm growth using shape coregistration, composed of globally optimal rigid registration followed by non-rigid warping of the baseline mesh to the follow-up mesh. An aneurysm isolation algorithm is used to constrain the registration to the parent vessels and to the aneurysm dome in the rigid and non-rigid registration steps, respectively. Based on the analysis of the obtained deformation field, two novel morphologic features were proposed, namely the relative differential surface area and the median path length, normalized by maximum dome size. The morphological features were extracted and studied on a CTA image dataset of 20 patients, each containing one unruptured intracranial saccular aneurysm (maximal dome diameters ranged from 1.4 to 12.2 mm). For a baseline performance comparison, five cross-sectional features were also extracted and their relative change computed. The two novel registration-based features performed best, as demonstrated by the lowest p-values (<0.003) obtained by the Mann-Whitney U-test and the highest area under the curve (>0.89) obtained from ROC analysis. The proposed differential features are inherently longitudinal, taking into consideration baseline and follow-up aneurysm shape information at once, and seem to enable an interventional neuroradiologist to better differentiate between low- and high-rupture-risk aneurysms.
Reliability of computer-aided diagnosis tools with multi-center MR datasets: impact of training protocol
M. Bento, R. Souza, M. Salluzzi, et al.
Computer-aided diagnosis (CAD) tools using MR images have been widely developed for disease burden quantification, patient diagnosis and follow-up. Newer CAD tools, based on machine learning techniques, often require large and heterogeneous data-sets to provide accurate and generalizable results, and multi-center MR imaging data-sets are commonly used. Typically, collection of these data-sets requires adherence to an appropriate experimental protocol in order to assure that findings are due to a pathology and not due to variability in image quality or acquisition parameters across scanners and/or imaging centers. We compared different experimental training protocols used with a representative CAD tool (in this work, designed to identify Alzheimer's disease (AD) patients from normal control (NC) subjects) using public multi-center data-sets. We examined: 1) subsets of the data-set that were acquired on the same scanner (simulating a single-site homogeneous data-set), 2) a traditional cross-validation framework (i.e., randomly splitting the data-set into training and testing sets irrespective of centre), and 3) a site-wise cross-validation framework, in which training and testing data were differentiated by center using a leave-one-center-out-per-iteration method. Results achieved with the homogeneous data-set, traditional cross-validation and site-wise cross-validation differed (p = 0.0005): 100.0% (i.e., no misclassifications), 99.6% and 97.3% accuracy rates, respectively, even when the same image data-set, features and classifier were used. The lowest accuracy was observed with site-wise cross-validation, the only protocol with no site-wise contamination between training and testing samples.
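A minimal sketch of the site-wise (leave-one-center-out) protocol using scikit-learn's LeaveOneGroupOut; the feature matrix, labels, and site IDs below are placeholders.

```python
# Illustrative leave-one-center-out cross-validation with scikit-learn.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

X = np.random.rand(200, 30)                # placeholder image-derived features
y = np.random.randint(0, 2, size=200)      # placeholder AD (1) vs. NC (0) labels
sites = np.random.randint(0, 5, size=200)  # placeholder acquisition-center IDs

logo = LeaveOneGroupOut()                  # each fold holds out one whole center
scores = cross_val_score(SVC(kernel="linear"), X, y,
                         groups=sites, cv=logo, scoring="accuracy")
print("accuracy per held-out center:", scores)
```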
Automatic multi-modality segmentation of gross tumor volume for head and neck cancer radiotherapy using 3D U-Net
Zhe Guo, Ning Guo, Kuang Gong, et al.
Accurate delineation of gross tumor volume (GTV) is essential for head and neck cancer radiotherapy. Complexity of morphology and potential image artifacts usually cause inaccurate manual delineation and interobserver variability. Manual delineation is also time consuming. Motivated by the recent success of deep learning methods in natural and medical image processing, we propose an automatic GTV segmentation approach based on a 3D U-Net. One innovative feature of our proposed method is that PET/CT multi-modality images are integrated in the segmentation network. 175 patients with GTV manually drawn by physicians are included in this study. Based on results from 5-fold cross validation, our proposed method achieves a Dice score of 0.82±0.07, which is better than the model using PET images only (0.79±0.09). In conclusion, automatic GTV segmentation is successfully applied to head and neck cancer patients using a deep learning network and multi-modality images, which brings unique benefits for radiation therapy planning.
Automatic strategy for extraction of anthropometric measurements for the diagnostic and evaluation of deformational plagiocephaly from infant’s head models
Bruno Oliveira, Helena R. Torres, Fernando Veloso, et al.
Deformational Plagiocephaly (DP) refers to an asymmetrical distortion of an infant's skull resulting from external forces applied over time. The diagnosis of this condition is performed using asymmetry indexes that are estimated from specific anatomical landmarks, which are manually defined on head models acquired using laser scans. However, this manual identification is susceptible to intra-/inter-observer variability and is also time-consuming. Therefore, automatic strategies for the identification of the landmarks and, consequently, extraction of asymmetry indexes are needed. A novel pipeline to automatically identify these landmarks on 3D head models and to estimate the relevant cranial asymmetry indexes is proposed. A template database is created and then aligned with the unlabelled patient through an iterative closest point (ICP) strategy. Here, an initial rigid alignment followed by an affine one is applied to remove global misalignments between each template and the patient. Next, a non-rigid alignment is used to deform the template information to the patient-specific shape. The final position of each landmark is computed as a locally weighted average of all candidate results. From the identified landmarks, a head coordinate system is automatically estimated and later used to estimate cranial asymmetry indexes. The proposed framework was evaluated on 15 synthetic infant head models. Overall, the results demonstrated the accuracy of the identification strategy, with a mean average distance of 2.8±0.6 mm between the identified landmarks and the ground truth. Moreover, for the estimation of cranial asymmetry indexes, a performance comparable to the inter-observer variability was achieved.
Radiomics of the lesion habitat on pre-treatment MRI predicts response to chemo-radiation therapy in Glioblastoma
Ruchika Verma, Ramon Correa, Virginia Hill, et al.
Glioblastoma (GBM) is a highly aggressive brain tumor with a median survival of 15 months. Unfortunately, chemo-radiation therapy (CRT), the standard-of-care treatment for GBM, fails in over 40% of the patients within 6 months of treatment, likely on account of the highly infiltrative and heterogeneous nature of the disease. Consequently, there is a need to differentiate patients who might be at high risk of poor outcome due to treatment failure from those who may respond favorably to CRT treatment. In this work, we analyzed the lesion heterogeneity on clinical multi-parametric MRI (MP-MRI) by interrogating radiomic features from the "lesion habitat" (comprising enhancing tumor, necrotic core, and T2/FLAIR hyperintensities), to determine whether we could non-invasively stratify patients into low-risk and high-risk categories based on their progression-free survival (PFS). We employed a total of 124 pre-treatment MP-MRI scans (Gadolinium (Gd)-enhanced T1w, T2w, FLAIR) and dichotomized high-risk from low-risk patients using the median PFS. Of the 124 scans, 90 studies were used for training and 34 studies were used for independent validation. For each MRI scan, the necrotic core, enhancing tumor, and edematous sub-compartments were annotated by an expert. Thereafter, a total of 1008 radiomic descriptors (e.g. Haralick, Laws energy, CoLlAGe) were extracted from every sub-compartment across all three MRI protocols (Gd-T1w, T2w, FLAIR). The top 5 most discriminatory radiomic features (p ≤ 0.05) were selected from the training cohort using a one-way analysis of variance (ANOVA) test after removing multi-collinearity, for each sub-compartment across all three MRI protocols. A linear discriminant analysis (LDA) classifier was employed individually for each sub-compartment, using the top 5 features selected from the training cohort. These features were then used to differentiate between high-risk and low-risk groups in the independent validation set. We further concatenated the top 5 radiomic features from each sub-compartment to evaluate the combined impact of the tumor habitat in predicting patient outcome. The best accuracy, 72.2% on the training cohort and 73.3% on the independent validation cohort, was obtained from the enhancing tumor sub-compartment in distinguishing the low-risk from the high-risk group. Interestingly, when the top features from the 3 sub-compartments were combined for risk stratification, an accuracy of 82.3% was obtained on the independent validation cohort (N=34). Our preliminary results suggest that radiomic features from the tumor habitat might be reflective of tumor heterogeneity and could potentially differentiate high-risk from low-risk groups in predicting patients' response to treatment.
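A minimal sketch of the per-sub-compartment feature selection and classification step, assuming scikit-learn; the placeholder arrays stand in for the radiomic matrix and risk labels, and the multi-collinearity removal mentioned above is omitted for brevity.

```python
# Sketch: select the top 5 features by one-way ANOVA F-test, then classify
# high- vs. low-risk patients with LDA. Assumes scikit-learn.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

X_train = np.random.rand(90, 1008)        # placeholder radiomic descriptors
y_train = np.random.randint(0, 2, 90)     # placeholder median-PFS risk labels
X_val = np.random.rand(34, 1008)          # placeholder validation features

model = make_pipeline(SelectKBest(f_classif, k=5), LinearDiscriminantAnalysis())
model.fit(X_train, y_train)
risk_group = model.predict(X_val)         # predicted high- vs. low-risk group
```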
Modeling normal brain asymmetry in MR images applied to anomaly detection without segmentation and data annotation
While the human brain presents natural structural asymmetries between the left and right hemispheres in MR images, most neurological diseases are associated with abnormal brain asymmetries. Due to the great variety of such anomalies, we present a framework to model normal structural brain asymmetry from control subjects only, independent of the neurological disease. The model dispenses with data annotation by exploiting generative deep neural networks and one-class classifiers. We also propose a patch-based model to localize volumes of interest with reduced background sizes around selected brain structures and a one-class classifier based on an optimum-path forest. This makes the framework independent of segmentation, which may fail, especially in abnormal images, or may not be available for a given structure. We first validate the method on the detection of abnormal hippocampal asymmetry using distinct groups of epilepsy patients and testing controls. The results of validation using the original feature space and a two-dimensional space based on non-linear projection show the potential to extend the framework to abnormal asymmetry detection in other parts of the brain and to develop intelligent and interactive virtual environments. For instance, the approach can be used for screening, inspection, and annotation of the detected anomaly type, allowing the development of CADx systems.
Breast II
Response monitoring of breast cancer on DCE-MRI using convolutional neural network-generated seed points and constrained volume growing
Bas H. M. van der Velden, Bob D. de Vos, Claudette E. Loo, et al.
Response of breast cancer to neoadjuvant chemotherapy (NAC) can be monitored using the change in visible tumor on magnetic resonance imaging (MRI). In our current workflow, seed points are manually placed in areas of enhancement likely to contain cancer. A constrained volume growing method uses these manually placed seed points as input and generates a tumor segmentation. This method is rigorously validated using complete pathological embedding. In this study, we propose to exploit deep learning for fast and automatic seed point detection, replacing manual seed point placement in our existing and well-validated workflow. The seed point generator was developed in early breast cancer patients with pathology-proven segmentations (N=100), operated shortly after MRI. It consisted of an ensemble of three independently trained fully convolutional dilated neural networks that classified breast voxels as tumor or non-tumor. Subsequently, local maxima were used as seed points for volume growing in patients receiving NAC (N=10). The percentage of tumor volume change was evaluated against semi-automatic segmentations. The primary cancer was localized in 95% of the tumors at the cost of 0.9 false positives per patient. False positives included focally enhancing regions of unknown origin and parts of the intramammary blood vessels. Volume growing from the seed points showed a median tumor volume decrease of 70% (interquartile range: 50%–77%), comparable to the semi-automatic segmentations (median: 70%, interquartile range: 23%–76%). To conclude, a fast and automatic seed point generator was developed, fully automating a well-validated semi-automatic workflow for response monitoring of breast cancer to neoadjuvant chemotherapy.
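A minimal sketch of converting an ensemble's voxel-wise tumor-probability map into seed points via local maxima, assuming SciPy; the threshold and neighborhood size are illustrative, not the authors' values.

```python
# Sketch: local maxima of a 3D probability map become seed points for the
# existing constrained volume-growing step. Assumes NumPy and SciPy.
import numpy as np
from scipy.ndimage import maximum_filter

def seed_points(prob_map, min_prob=0.5, neighborhood=5):
    """Return (z, y, x) coordinates of local maxima above min_prob."""
    local_max = maximum_filter(prob_map, size=neighborhood) == prob_map
    return np.argwhere(local_max & (prob_map >= min_prob))

# Example: prob_map could be the voxel-wise average of the three dilated
# fully convolutional networks in the ensemble.
# seeds = seed_points(prob_map)
```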
Multiview mammographic mass detection based on a single shot detection system
Detection of suspicious breast cancer lesions in screening mammography images is an important step in the downstream diagnosis of breast cancer. A trained radiologist can usually take advantage of the multi-view correlation of suspicious lesions to locate abnormalities. In this work, we investigate the feasibility of using a random image pair of the same breast from the same exam for the detection of suspicious lesions. We present a novel approach that utilizes a single shot detection system inspired by You Only Look Once (YOLO) v1 to simultaneously process a primary detection view and a secondary view for the localization of lesions in the primary detection view. We used a combination of screening exams from Duke University Hospital and OPTIMAM to conduct our experiments. The Duke dataset includes 850 positive cases and around 10,000 negative cases. The OPTIMAM dataset includes around 350 cases. We observed a consistent left shift of the Free-Response Receiver Operating Characteristic (FROC) curve for the multi-view detection model compared to the single-view detection model. This result is promising for future development of automated lesion detection systems for modern full-field digital mammography (FFDM).
A deep learning method for volumetric breast density estimation from processed full field digital mammograms
Doiriel Vanegas C., Mahlet A. Birhanu, Nico Karssemeijer, et al.
Breast density is an important factor in breast cancer screening. Methods exist to measure the volume of dense breast tissue from 2D mammograms. However, these methods can only be applied to raw mammograms. Breast density classification methods that have been developed for processed mammograms are commonly based on radiologists' Breast Imaging Reporting and Data System (BI-RADS) annotations. Unfortunately, such labels are subjective and may introduce personal bias and inter-reader discrepancy. In order to avoid such limitations, this paper presents a method for estimation of percent dense tissue volume (PDV) from processed full-field digital mammograms (FFDM) using a deep learning approach. A convolutional neural network (CNN) was implemented to carry out a regression task of estimating PDV, using density measurements on raw FFDM as ground truth. The dataset used for training, validation, and testing (Set A) includes over 2000 clinical cases from 3 different vendors. Our results show a high correlation of the predicted PDV with the raw measurements, with a Spearman's correlation coefficient of r=0.925. The CNN was also tested on an independent set of 97 clinical cases (Set B) for which PDV measurements from FFDM and MRI were available. CNN predictions on Set B showed a high correlation with both raw FFDM and MRI data (r=0.897 and r=0.903, respectively). Set B had radiologist-annotated BI-RADS labels, which agreed with the estimated values to a high degree, showing the ability of our CNN to distinguish between different BI-RADS categories comparably to methods applied to raw mammograms.
Breast density follow-up decision support system using deep convolutional models
Sun Young Park, Dustin Sargent, David Richmond
Breast cancer risk assessment relies on accurate classification of breast density, which is a key component of the ACR breast cancer screening recommendations for clinical decisions. The 5th edition of the BIRADS standard divides breast density into four categories, ranging from almost entirely fatty to extremely dense. High breast density (classes C and D) reduces the sensitivity of mammography, since the dense fibroglandular tissue can hide lesions, masses and other findings. Therefore, although the benefit of supplementary imaging in such cases has not been conclusively demonstrated, the ACR guidelines suggest additional screening for patients with high breast density. This creates an important treatment decision boundary between class B (scattered areas of fibroglandular density) and class C (heterogeneously dense). Unfortunately, the slightly abstract, qualitative nature of the class descriptions leads to significant inter- and intra-rater variation in breast density assessment. This is exacerbated by updates to the BIRADS standard that can cause recent breast density assessments to be incompatible with prior assessments for the same patient. Additionally, images from similar patients can vary significantly when taken with different devices or at sites with different acquisition protocols. To address these issues, we present a new deep learning algorithm combining three models that achieves accurate and objective breast density classification. The first model performs the normal four-class breast density classification, the second model performs a two-class low (A or B) vs. high (C or D) classification, and the third patch-based model focuses on improving the accuracy of the B and C categories. We present initial results from 9989 studies from a three-site dataset with BIRADS 4th and 5th edition ground truth.
DCE-MRI based analysis of intratumor heterogeneity by decomposing method for prediction of HER2 status in breast cancer
Peng Zhang, Ming Fan, Yuanzhe Li, et al.
Human epidermal growth factor receptor-2 (HER2) plays an important role in treatment strategy and prognosis determination in breast cancers. However, breast cancers are characterized by considerable heterogeneity both between and within tumors, which is a key impediment to accurately determine HER2 status for radiomic analysis. To this end, tumor heterogeneity was evaluated by unsupervised decomposition method on breast magnetic resonance imaging (MRI), in which three tumor subregions were generated terms as Input, Fast and Slow. This tumor decomposition was performed by a convex analysis of mixtures (CAM) method, which was designed according to analysis of contrast-enhancement patterns. The study retrospectively investigated 181 patients who underwent dynamic contrast enhancement magnetic resonance imaging (DCE-MRI) examination. Among them, 124 were HER2-negative and 57 were HER2-positive status. Imaging features of texture and histogram were computed in each subregion. Multivariate logistic regression classifiers were trained and validated with leave-one-out cross-validation (LOOCV) method. An area under a receiver operating characteristic curve (AUC) was calculated to assess performance of the classifier. The classifier based on features from Fast subregion obtained an AUC of 0.802 ± 0.067 and was significantly (P = 0.0113) outperformed the classifier based on features from the whole tumors. When the predicted values from the respective classifiers were fused by weighted average, the AUC significantly increased to 0.820 ± 0.063 (P = 0.0011). The results indicate that analysis of intratumor heterogeneity through decomposing method of DCE-MRI has the potential to serve as a marker for predicting HER2 status.
Breast III and Heart
Association of computer-aided detection results and breast cancer risk
Since conventional computer-aided detection (CAD) schemes for mammograms produce high false positive detection rates, radiologists often ignore CAD-cued suspicious regions, in particular the mass-type regions, which reduces the application value of CAD in clinical practice. The objective of this study is to investigate a new hypothesis that CAD-generated detection results may be useful and positively associated with mammographic cases at high risk of being positive for cancer. To test this hypothesis, a large and diverse image dataset including mammograms acquired from 2,349 women was retrospectively assembled. Among them, 882 are positive and 1,467 are negative. Each case involves 4 images: the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. A CAD scheme was applied to process all mammograms. From the CAD results, a number of bilateral difference features from the matched CC and MLO view images of the right and left breasts were computed. We analyzed the discriminatory power to predict the risk of cases being positive using the bilateral difference features and a multi-feature fusion based logistic regression machine learning classifier. Using a leave-one-case-out cross-validation method, the area under the ROC curve of the classifier for the multi-feature fusion was AUC=0.660±0.012. By applying an operating threshold of 0.5, the overall prediction accuracy was 67% and the odds ratio was 4.794, with a statistically significant increasing trend (p<0.01). Study results indicated that from CAD-generated false positives we were able to generate a new quantitative imaging marker to predict cases at higher risk of being positive and to cue a case-based warning sign.
Breast parenchyma analysis and classification for breast masses detection using texture feature descriptors and neural networks in dedicated breast CT images
Marco Caballo, Jonas Teuwen, Ritse Mann, et al.
We propose an algorithm to recognize breast parenchyma regions containing mass-like abnormalities in dedicated breast CT images using texture feature descriptors. From 53 patient breast CT scans (29 of which contained masses), we first isolated the parenchyma through automatic segmentation, and we obtained a total of 14,751 normal 2D image patches (negatives) and 2,100 containing a breast mass (positives). We extracted 141 texture features (10 first-order descriptors, 6 Haralick features, 20 run-length features, 45 structural and pattern descriptors, 60 Gabor features), which we then analyzed through multivariate analysis of variance (MANOVA) and linear discriminant analysis, resulting in an area under the ROC curve (AUC) of 0.9. We finally identified the most discriminant features through sequential forward selection, and used them to train and validate a neural network by dividing the data into multiple batches, with each batch always containing the whole set of positive cases and an equal number of different negative examples. To avoid possible bias due to the high skewness in class proportions, the training was performed on all these batches independently, without re-initializing the network weights after each batch. The network was tested using an additional independent set of 18 patient breast CT scans (8 normal and 10 containing a mass), on a total of 7,274 image patches (852 positives, 6,422 negatives) which were not used during the training/validation phase, resulting in 95.6% precision, 95.8% recall, and 0.99 AUC. Our results suggest that the proposed approach could be further evaluated and expanded for computer-aided detection tasks in breast CT imaging.
Visual evidence for interpreting diagnostic decision of deep neural network in computer-aided diagnosis
Recent studies have reported that deep learning techniques can achieve high performance in medical image analysis such as computer-aided diagnosis (CADx). However, there is a limitation in interpreting the diagnostic decisions of deep learning due to its black-box nature. To increase confidence in the diagnostic decisions of deep learning, it is necessary to develop a deep neural network with an interpretable structure that can provide a reasonable explanation of its diagnostic decisions. In this study, a novel deep neural network has been devised to provide visual evidence for the diagnostic decisions of CADx. The proposed deep network is designed to include a visual interpreter that identifies important areas as the visual evidence of the diagnostic decision in the deep neural network. Based on the observation that radiologists usually make a diagnostic decision based on lesion characteristics (the margin and the shape of masses), the visual interpreter provides visual evidence related to the margin and the shape, respectively. To verify the effectiveness of the proposed method, experiments were conducted on mammogram datasets. Experimental results show that the proposed method provides more important areas as visual evidence compared with the conventional visualization method. These results imply that the proposed visual interpretation method could be a promising approach to overcoming the current limitation of deep learning for CADx.
Automated measurement of fetal right-myocardial performance index from pulsed wave Doppler spectrum
Congenital heart disease is the leading cause of birth-defect-related deaths. The modified myocardial performance index of the right ventricle (R-MPI) is a sensitive and early clinical indicator of fetal cardiac health. Objective, repeatable measurement of R-MPI is an important deciding factor for the clinical adoption of the R-MPI. In this work, we describe a novel method for automatic computation of R-MPI from Pulsed Wave Doppler (PWD) images. Our method involves a Fourier-series-based cardiac cycle detection followed by an adaptive windowed-energy-based valve click localization and a weighted-gradient-based refinement. Using this method, we have been able to measure R-MPI reliably with a mean difference of 0.0075 ± 0.034 from 170 expert annotations on 68 fetal PWD images, with an Intra-Class Correlation (ICC) of 0.9380. Furthermore, we have introduced novel methods for normalization and synchronization of PWD images acquired at two different time intervals for the assessment of iso-volume time intervals and an accurate measurement of R-MPI.
An ensemble of U-Net architecture variants for left atrial segmentation
C. Wang, M. Rajchl, A. D. C. Chan, et al.
Segmentation of the left atrium and proximal pulmonary veins is an important clinical step for the diagnosis of atrial fibrillation. However, automatic segmentation of the left atrium from late gadolinium-enhanced magnetic resonance (LGE-MRI) images remains a challenging task due to differences in acquisition and large variability between individuals. Deep learning has been shown to outperform traditional methodologies for segmentation in numerous tasks. A popular deep learning architecture for segmentation is the U-Net, which has shown promising results in biomedical segmentation problems. Many newer network architectures have been proposed that leverage the base U-Net architecture, such as attention U-Net, dense U-Net and residual U-Net. These models incorporate updated encoder blocks into the U-Net architecture to incrementally improve performance over the base U-Net. Currently, there is no comprehensive evaluation of performance between these models. In this study we (1) explore approaches for the segmentation of the left atrium based on different U-Net architectures, (2) compare and evaluate these on the STACOM 2018 Atrial Segmentation Challenge dataset, (3) ensemble these models to improve overall segmentation by reducing the internal variance between models and architectures, and (4) define and build upon a U-Net framework to simplify the development of novel U-Net-inspired architectures. Our ensemble achieves a mean Dice similarity coefficient (DSC) of 92.1 ± 2.0% on a test set of twenty 3D LGE-MRI images, outperforming other fully automatic segmentation methodologies.
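A minimal sketch of probability-level ensembling of several U-Net variants and of the Dice similarity coefficient used for evaluation; model outputs and shapes are assumed placeholders.

```python
# Sketch: average the soft predictions of several U-Net variants, threshold,
# and score the result with the Dice similarity coefficient. Assumes NumPy.
import numpy as np

def dice(pred_mask, true_mask, eps=1e-7):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + true_mask.sum() + eps)

def ensemble_segmentation(prob_maps, threshold=0.5):
    """Average voxel-wise probabilities from several models, then threshold."""
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return mean_prob >= threshold

# Example: probs_unet, probs_attention, probs_residual would be voxel-wise
# probability volumes produced by the individual U-Net variants.
# mask = ensemble_segmentation([probs_unet, probs_attention, probs_residual])
# print("DSC:", dice(mask, reference_mask))
```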
A deep learning approach to classify atherosclerosis using intracoronary optical coherence tomography
Optical coherence tomography (OCT) is a fiber-based intravascular imaging modality that produces high-resolution tomographic images of the artery lumen and vessel wall morphology. Manual analysis of the diseased arterial wall is time consuming and sensitive to inter-observer variability; therefore, machine-learning methods have been developed to automatically detect and classify the mural composition of atherosclerotic vessels. However, none of the existing tissue classification methods include the outer border of the OCT vessel in their analysis, they consider the whole arterial wall as pathological, and they do not take into account the OCT imaging limitations, e.g. shadowed areas. The aim of this study is to present a deep learning method that subdivides the whole arterial wall into six different classes: calcium, lipid tissue, fibrous tissue, mixed tissue, non-pathological tissue or media, and no visible tissue. The method steps include defining the wall area (WAR) using previously developed lumen and outer border detection methods, and automatic characterization of the WAR using a convolutional neural network (CNN) algorithm. To validate this approach, 700 images of diseased coronary arteries from 28 patients were manually annotated by two medical experts, while the non-pathological wall and media were automatically detected based on the Euclidean distance of the lumen to the outer border of the WAR. Using the proposed method, an overall classification accuracy of 96% is reported, indicating great promise for clinical translation.
Lung I
PHT-bot: a deep learning based system for automatic risk stratification of COPD patients based upon signs of pulmonary hypertension
Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality worldwide. Identifying those at highest risk of deterioration would allow more effective distribution of preventative and surveillance resources. Secondary pulmonary hypertension is a manifestation of advanced COPD, which can be reliably diagnosed by the ratio of the main Pulmonary Artery (PA) diameter to the Ascending Aorta (Ao) diameter. In effect, a PA-to-Ao diameter ratio greater than 1 has been demonstrated to be a reliable marker of increased pulmonary arterial pressure. Although clinically valuable and readily visualized, the manual assessment of the PA and Ao diameters is time consuming and under-reported. The present study describes a non-invasive method to measure the diameters of both the Ao and the PA from contrast-enhanced chest Computed Tomography (CT). The solution applies deep learning techniques to select the correct axial slice to measure and to segment both arteries. The system achieves test Pearson correlation coefficient scores of 93% for the Ao and 92% for the PA. To the best of our knowledge, it is the first such fully automated solution.
Identifying disease-free chest x-ray images with deep transfer learning
Ken C. L. Wong, Mehdi Moradi, Joy Wu, et al.
Chest X-rays (CXRs) are among the most commonly used medical image modalities. They are mostly used for screening, and an indication of disease typically results in subsequent tests. As this is mostly a screening test used to rule out chest abnormalities, the requesting clinicians are often interested in whether a CXR is normal or not. A machine learning algorithm that can accurately screen out even a small proportion of the “real normal” exams out of all requested CXRs would be highly beneficial in reducing the workload for radiologists. In this work, we report a deep neural network trained for classifying CXRs with the goal of identifying a large number of normal (disease-free) images without risking the discharge of sick patients. We use an ImageNet-pretrained Inception-ResNet-v2 model to provide the image features, which are further used to train a model on CXRs labelled by expert radiologists. The probability threshold for classification is optimized for 100% precision for the normal class, ensuring no sick patients are released. At this threshold we report an average recall of 50%. This means that the proposed solution has the potential to cut in half the number of disease-free CXRs examined by radiologists, without risking the discharge of sick patients.
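A minimal sketch of the operating-point selection described above, choosing the highest recall at 100% precision for the normal class with scikit-learn's precision_recall_curve; the scores and labels are placeholders.

```python
# Sketch: pick the probability threshold that maximizes recall while keeping
# precision for the "normal" class at 100% on a validation set.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.random.randint(0, 2, size=1000)   # 1 = normal (disease-free)
y_score = np.random.rand(1000)                # model probability of "normal"

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
perfect = precision[:-1] == 1.0               # entries aligned with thresholds
if perfect.any():
    best = np.argmax(recall[:-1] * perfect)   # highest recall among perfect-precision points
    print("threshold:", thresholds[best], "recall at 100% precision:", recall[best])
```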
Analysis of deep convolutional features for detection of lung nodules in computed tomography
Understanding the benefits of deep convolutional neural networks (DCNNs) may facilitate the development of robust computer-assisted image analysis methods. In this work we present features extracted from several DCNN structures with varying levels of depth for the classification of true and false lung nodules marked by a computer-aided detection system for thoracic computed tomography (CT) at a prescreening stage. 1048 true positive (TP) regions-of-interest (ROIs) and 69,264 false positive (FP) ROIs from 350 patient cases were used for training VGG16, GoogLeNet, InceptionV4 and Inception-ResNet DCNN structures. For independent testing, 1022 TPs and 61,379 FPs from 310 cases were used. All nodule candidates were detected by applying multiscale Hessian enhancement filters and morphological operations to the segmented lungs. The top 200 ranked nodule candidates from each case were used for training and testing of the DCNNs and for feature extraction. The areas under the receiver operating characteristic curve (AUC) for the four DCNNs on the independent test set were 0.90, 0.89, 0.91, and 0.90. To analyze the characteristics of the deep features, we extracted features from the last fully connected layer by deploying the trained DCNNs on the independent test set. A total of 4096, 1024, 1536 and 1536 DCNN features from the four DCNNs were extracted and analyzed using uniform manifold approximation and projection (UMAP). The UMAP captured a topological representation of the features by separately clustering candidates from different anatomical regions such as lobular, terminal and respiratory bronchioles. FPs originating from similar global structures clustered together in the UMAP space. The results indicate that features extracted from DCNNs have complementary characteristics that may be exploited to improve the accuracy of classification tasks.
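A minimal sketch of projecting the extracted fully-connected-layer features with UMAP, assuming the umap-learn package; parameter values and array shapes are illustrative.

```python
# Sketch: 2D UMAP embedding of DCNN feature vectors for visual inspection.
import numpy as np
import umap  # umap-learn package

features = np.random.rand(2000, 1536)    # placeholder DCNN feature vectors
labels = np.random.randint(0, 2, 2000)   # 1 = true nodule, 0 = false positive (for coloring)

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0)
embedding = reducer.fit_transform(features)  # (N, 2) coordinates
# Scatter-plot `embedding` colored by `labels` (or by anatomical region)
# to inspect the clustering structure described above.
```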
A combination of intra- and peritumoral features on baseline CT scans is associated with overall survival in non-small cell lung cancer patients treated with immune checkpoint inhibitors: a multi-agent multi-site study
Immune checkpoint inhibitors targeting the programmed cell death (PD)-1/PD-L1 axis have been approved for the treatment of chemotherapy-refractory advanced non-small cell lung cancer (NSCLC) for a few years. While higher PD-L1 expression is associated with better outcomes after monotherapy with immune checkpoint inhibitors, it is not a perfect predictive biomarker for clinical benefit from immunotherapy, because some patients with low PD-L1 expression have sustained responses. In clinical practice, using radiological tools like the Response Evaluation Criteria in Solid Tumors (RECIST) tends to underestimate the benefit of therapy. For instance, some patients treated with immunotherapy suffer from pseudoprogression while actually having a favorable response; RECIST in this setting is inadequate to capture the response. In this study we sought to explore whether radiomic texture features extracted from both inside and outside of the tumor on baseline CT scans were associated with overall survival (OS) in 139 NSCLC patients treated with immunotherapy at two separate sites. Patients were divided into a discovery set (D1 = 50; nivolumab from Cleveland Clinic) and two validation sets (D2 = 62 from Cleveland Clinic, D3 = 27 from University of Pennsylvania Health System). Patients in the validation sets had been treated with different types of checkpoint inhibitor drugs including nivolumab, pembrolizumab, and atezolizumab. 454 radiomic texture features from within (intra-tumoral) and outside the tumor (peri-tumoral) were extracted from baseline contrast CT images. Following feature selection on the discovery set, a radiomic risk-score signature was generated using the least absolute shrinkage and selection operator (LASSO). Using a Cox regression model, the association of the radiomic signature with overall survival (OS) was evaluated in the discovery and two validation sets. In addition, 95% confidence intervals (CI) and hazard ratios (HR) were calculated. Our results revealed that the radiomic signature was significantly associated with OS, both in the discovery set (HR = 5.06, 95% CI = 3, 8.55; p-value < 0.0001) and the two validation data sets (D2: HR = 5.88, 95% CI = 2.19, 21.63, p-value = 0.0009; D3: HR = 5.37, 95% CI = 1.74, 16.57, p-value = 0.0034). Our initial results suggest that our radiomic signature could serve as a non-invasive way of predicting and monitoring response to checkpoint inhibitors for patients with non-small cell lung cancer.
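A minimal sketch of a LASSO-weighted radiomic risk score followed by a Cox model, assuming scikit-learn and lifelines; fitting LASSO directly to survival times ignores censoring and is only a stand-in for the authors' selection step, and all variables are placeholders.

```python
# Sketch: sparse (LASSO) weighting of radiomic features into a risk score,
# then a Cox proportional-hazards model relating the score to survival.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X_disc = np.random.rand(50, 454)           # discovery-set radiomic features
time_disc = np.random.exponential(12, 50)  # overall survival (months), placeholder
event_disc = np.random.randint(0, 2, 50)   # 1 = death observed, placeholder

Xz = StandardScaler().fit_transform(X_disc)
lasso = LassoCV(cv=5).fit(Xz, time_disc)   # simplification: regresses on time, ignoring censoring
risk_score = Xz @ lasso.coef_              # radiomic risk score per patient

df = pd.DataFrame({"score": risk_score, "time": time_disc, "event": event_disc})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%", "p"]])
```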
Visualizing and explaining deep learning predictions for pneumonia detection in pediatric chest radiographs
Sivaramakrishnan Rajaraman, Sema Candemir, George Thoma, et al.
Pneumonia is a severe inflammatory condition of the lungs that leads to the formation of pus and other liquids in the air sacs. The disease is reported to affect approximately 450 million people across the world, resulting in 2 million pediatric deaths every year. Chest X-ray (CXR) analysis is the most frequently performed radiographic examination for diagnosing the disease. Unlike pneumonia in adults, pediatric pneumonia is poorly studied. Computer-aided diagnostic (CADx) tools aim to improve disease diagnosis and supplement decision making while simultaneously bridging the gap in effective radiological interpretation during mobile field screening. These tools make use of handcrafted and/or convolutional neural network (CNN) extracted image features for visual recognition. However, CNNs are perceived as black boxes since their behavior lacks explanation and is poorly understood. This lack of transparency in the learned behavior of CNNs is a serious bottleneck in medical screening/diagnosis since poorly interpreted model behavior could unfavorably impact decision-making. Visualization tools have been proposed to interpret and explain model predictions. In this study, we highlight the advantages of visualizing and explaining the activations and predictions of CNNs applied to the challenge of pneumonia detection in pediatric chest radiographs. We evaluate and statistically validate the models' performance to reduce bias, overfitting, and generalization errors.
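One common way to produce such visual explanations is Grad-CAM; the sketch below, assuming a tf.keras model and the name of its last convolutional layer, is illustrative and not necessarily the visualization method used in the paper.

```python
# Grad-CAM-style sketch: highlight image regions driving a class prediction.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer, class_index):
    """Return a normalized heatmap for `class_index` over the input image."""
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(last_conv_layer).output,
                                 model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)      # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))      # channel-wise importance
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                          # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

# Usage (hypothetical names): heatmap = grad_cam(cnn, cxr_image, "conv5_block3_out", 1)
# The heatmap is then upsampled and overlaid on the radiograph.
```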
Abdomen
Artifact-driven sampling schemes for robust female pelvis CBCT segmentation using deep learning
Annika Hänsch, Volker Dicken, Jan Klein, et al.
Adaptive radiotherapy (RT) planning requires segmentation of organs for adapting the RT treatment plan to changes in the patient's anatomy. Daily imaging is often done using cone-beam CT (CBCT) imaging devices, which produce images of considerably lower quality than CT images due to scatter and artifacts. Involuntary patient motion during the comparably long CBCT image acquisition may cause misalignment artifacts. In the pelvis, the most severe artifacts stem from motion of air and soft tissue boundaries in the bowel, which appear as streaking in the reconstructed images. In addition to low soft tissue contrast, this makes segmentation of organs close to the bowel, such as the bladder and uterus, even more difficult. Deep learning (DL) methods have been shown to be promising for difficult segmentation tasks. In this work, we investigate different artifact-driven sampling schemes that incorporate domain knowledge into the DL training. However, global evaluation metrics such as the Dice score, often used in DL segmentation research, reveal little information about systematic errors and offer no clear perspective on how to improve the training. Using slice-wise Dice scores, we find a clear difference in performance on slices with and without detected air. Moreover, especially when applied in a curriculum training scheme, the specific sampling of slices on which air has been detected might help to increase the robustness of deep neural networks towards artifacts while maintaining performance on artifact-free slices.
A probabilistic approach for interpretable deep learning in liver cancer diagnosis
Clinton J. Wang, Charlie A. Hamm, Brian S. Letzen, et al.
Despite rapid advances in deep learning applications for radiological diagnosis and prognosis, the clinical adoption of such models is limited by their inability to explain or justify their predictions. This work developed a probabilistic approach for interpreting the predictions of a convolutional neural network (CNN) trained to classify liver lesions from multiphase magnetic resonance imaging (MRI). It determined the presence of 14 radiological features, where each lesion image contained one to four features and only ten examples of each feature were provided. Using stochastic forward passes of these example images through a trained CNN, samples were obtained from each feature's conditional probability distribution over the network's intermediate outputs. The marginal distribution was sampled with stochastic forward passes of images from the entire training dataset, and sparse kernel density estimation (KDE) was used to infer which features were present in a test set of 60 lesion images. This approach was tested on a CNN that reached 89.7% accuracy in classifying six types of liver lesions. It identified radiological features with 72.2 ± 2.2% precision and 82.6 ± 2.0% recall. In contrast with previous interpretability approaches, this method used sparsely labeled data, did not change the CNN architecture, and directly outputted radiological descriptors of each image. This approach can identify and explain potential failure modes in a CNN, as well as make a CNN's predictions more transparent to radiologists. Such contributions could facilitate the clinical translation of deep learning in a wide range of diagnostic and prognostic applications.
Combining deep learning methods and human knowledge to identify abnormalities in computed tomography (CT) reports
Many researchers in the field of machine learning have addressed the problem of detecting anomalies within Computed Tomography (CT) scans. Training these machine learning algorithms requires a dataset of CT scans with identified anomalies (labels), usually in specific organs. This represents a problem, since it requires experts to review thousands of images in order to create labels for these data. We aim to decrease the human burden of labeling CT scans by developing a model that identifies anomalies within plain-text reports, which could then be used to create labels for models based on CT scans. This study contains more than 4800 CT reports from the Duke Health System, for which we aim to identify organ-specific abnormalities. We propose an iterative active learning approach that consists of building a machine learning model to classify CT reports by abnormalities in different organs and then improving it by actively adding reports sequentially. At each iteration, clinical experts review the report that provides the model with the highest expected information gain. This process is done in real time using a web interface. Then, this datum is used by the model to improve its performance. We evaluated the performance of our method for abnormalities in the kidneys and lungs. When starting with a model trained on 99 reports, the results show the model achieves an Area Under the Curve (AUC) score of 0.93 on the test set after adding 130 actively labeled reports to the model from an unlabeled pool of 4,000. This suggests that a set of labeled CT scans can be obtained with significantly reduced human work by combining machine learning techniques and clinical experts' knowledge.
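A minimal sketch of one uncertainty-driven active-learning round on report text, assuming scikit-learn; the TF-IDF + logistic-regression model and the highest-entropy selection rule are illustrative proxies for the authors' classifier and expected-information-gain criterion.

```python
# Sketch: pick the unlabeled report the current model is least certain about,
# have an expert label it, and retrain.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = ["no acute abnormality identified", "hypodense lesion in the left kidney"]
labels = [0, 1]                        # 0 = normal, 1 = kidney abnormality (placeholders)
pool = ["simple renal cyst noted", "lungs are clear", "enhancing renal mass seen"]

for _ in range(2):                     # a couple of active-learning rounds
    vec = TfidfVectorizer().fit(labeled + pool)
    clf = LogisticRegression(max_iter=1000).fit(vec.transform(labeled), labels)
    probs = clf.predict_proba(vec.transform(pool))
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    pick = int(np.argmax(entropy))     # most uncertain report in the pool
    expert_label = 1                   # placeholder: supplied by the clinical expert
    labeled.append(pool.pop(pick))
    labels.append(expert_label)
```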
Bladder cancer staging in CT urography: estimation and validation of decision thresholds for a radiomics-based decision support system
Stage T2 is the clinical threshold for deciding whether to treat bladder cancer with neoadjuvant chemotherapy. In this study we refined a radiomics-based decision support system (CDSS-S) to aid clinicians in the staging of bladder cancer in CT urography (CTU). To train the CDSS-S, we used a data set of 84 bladder cancers from 76 clinically staged CTU cases; 43 cancers were below stage T2 and 41 were stage T2 or above. An independent test set comprising 82 bladder cancers from 80 clinically staged CTU cases, all staged as T2 or above, was also collected. Our Auto-Initialized Cascaded Level Sets (AI-CALS) segmentation pipeline was utilized to segment the lesions, from which radiomics features were extracted. The training set was split into 2 balanced partitions. Four classifiers were studied: linear discriminant analysis (LDA), support vector machine (SVM), back-propagation neural network (BPNN), and random forest (RAF) classifiers. Based on the likelihood scores for the training set, the decision threshold providing the highest classification accuracy for each classifier was determined. The classifier with the fixed decision threshold was then applied to the test set and its performance evaluated. The test classification accuracy for the LDA, SVM, BPNN, and RAF trained on Partition 1 was 0.95, 0.98, 0.88, and 0.89, respectively, and 0.88, 0.94, 0.88, and 0.93, respectively, when trained on Partition 2. The test classification accuracy for the LDA, SVM, BPNN, and RAF trained on the entire training set was 0.94, 0.94, 0.94, and 0.89, respectively. The results show the potential of the CDSS-S in bladder cancer stage assessment.
Automatic MR kidney segmentation for autosomal dominant polycystic kidney disease
Guangrui Mu, Yiyi Ma, Miaofei Han, et al.
Measurement of total kidney volume (TKV) plays an important role in the early therapeutic stage of autosomal dominant polycystic kidney disease (ADPKD). As a crucial biomarker, an accurate TKV can sensitively reflect disease progression and be used as an indicator to evaluate the curative effect of a drug. However, manual contouring of kidneys in magnetic resonance (MR) images is time-consuming (about 40 minutes), which greatly hinders the wide adoption of TKV in the clinic. In this paper, we propose a multi-resolution 3D convolutional neural network to automatically segment kidneys of ADPKD patients from MR images. We adopt two resolutions and use a customized V-Net model for both. The V-Net model is able to integrate high-level context information with detailed local information for accurate organ segmentation. The V-Net model at the coarse resolution can robustly localize the kidneys, while the V-Net model at the fine resolution can accurately refine the kidney boundaries. Validated on 305 subjects with different loss functions and network architectures, our method achieves over 95% Dice similarity coefficient against the ground truth labeled by a senior physician. Moreover, the proposed method dramatically reduces the measurement time of kidney volume from 40 minutes to about 1 second, which can greatly accelerate the disease staging of ADPKD patients for large clinical trials, promote the development of related drugs, and reduce the burden on physicians.
2D and 3D bladder segmentation using U-Net-based deep-learning
We are developing a U-Net-based deep learning (U-DL) model for bladder segmentation in CT urography (CTU) as part of a computer-assisted bladder cancer detection and treatment response assessment pipeline. We previously developed a bladder segmentation method that used a deep-learning convolution neural network and level sets (DCNN-LS) within a user-input bounding box. The new method requires neither a user-input box nor level sets for postprocessing. To identify the best model for this task, we compared a number of U-DL models: 1) 2D CTU slices or 3D volume as input, 2) different image resolutions, and 3) preprocessing with and without automated cropping of each slice. We evaluated the segmentation performance of the different U-DL models using 3D hand-segmented contours as the reference standard. The segmentation accuracy was quantified by the average volume intersection ratio (AVI), average percent volume error (AVE), average absolute volume error (AAVE), average minimum distance (AMD), and the Jaccard index (JI) for a data set of 81 training/validation and 92 independent test cases. For the test set, the best 2D U-DL model achieved AVI, AVE, AAVE, AMD, and JI values of 93.4±9.5%, -4.2±14.2%, 9.2±11.5%, 2.7±2.5 mm, and 85.0±11.3%, respectively, while the best 3D U-DL model achieved 90.6±11.9%, -2.3±21.7%, 11.5±18.5%, 3.1±3.2 mm, and 82.6±14.2%, respectively. For comparison, the corresponding values obtained with our previous DCNN-LS method were 81.9±12.1%, 10.2±16.2%, 14.0±13.0%, 3.6±2.0 mm, and 76.2±11.8%, respectively, for the same test set. The U-DL model provided highly accurate bladder segmentation and was more automated than the previous approach.
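For reference, the volume-based evaluation metrics named above could be computed per case roughly as in the sketch below; the exact definitions used in the study may differ, and the average minimum distance (which requires surface-distance computation) is omitted. The per-case values are then averaged over the test cases to obtain AVI, AVE, and AAVE.

```python
import numpy as np

def volume_metrics(seg, ref, spacing):
    # seg, ref: 3D binary arrays (automatic segmentation and hand-segmented reference);
    # spacing: voxel size in mm, e.g. (sz, sy, sx).
    seg, ref = seg.astype(bool), ref.astype(bool)
    voxel_vol = float(np.prod(spacing))
    v_seg, v_ref = seg.sum() * voxel_vol, ref.sum() * voxel_vol
    inter = np.logical_and(seg, ref).sum() * voxel_vol
    union = np.logical_or(seg, ref).sum() * voxel_vol
    return {
        "volume_intersection_ratio": inter / v_ref,           # averaged into AVI
        "percent_volume_error": (v_ref - v_seg) / v_ref,      # signed, averaged into AVE
        "absolute_volume_error": abs(v_ref - v_seg) / v_ref,  # averaged into AAVE
        "jaccard_index": inter / union,                       # JI
    }
```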
Multiorgan and Colon
Automatic anatomy partitioning of the torso region on CT images by using a deep convolutional network with majority voting
Xiangrong Zhou, Takuya Kojima, Song Wang, et al.
We propose an automatic approach to anatomy partitioning on three-dimensional (3D) computed tomography (CT) images that divides the human torso into several volumes of interest (VOIs) according to anatomical definition. In the proposed approach, a deep convolutional neural network (CNN) is trained to automatically detect the bounding boxes of organs on two-dimensional (2D) sections of CT images. The coordinates of those boxes are then grouped so that a vote on a 3D VOI (called localization) for each organ can be obtained separately. We applied this approach to localize the 3D VOIs of 17 types of organs in the human torso and then evaluated its performance by conducting a four-fold cross-validation using a dataset consisting of 240 3D CT scans with human-annotated ground truth for each organ region. The preliminary results showed that 86.7% of the 3D VOIs of the 3,177 organs in the 240 test CT images were localized with acceptable accuracy (mean Jaccard index of 72.8%) compared to the human annotations. This performance was better than that of the state-of-the-art method reported recently. The experimental results demonstrated that using a deep CNN for anatomy partitioning on 3D CT images was more efficient and useful than the method used in our previous work.
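The grouping/voting step that turns per-slice 2D detections into a 3D VOI is not spelled out in detail above; a much-simplified reconstruction might take the slice range of the detections as the craniocaudal extent and a median vote over the 2D box coordinates for the in-plane extent, as sketched below (the `slice_boxes` structure and the median vote are illustrative assumptions).

```python
import numpy as np

def boxes_to_voi(slice_boxes):
    # slice_boxes: {z_index: (x_min, y_min, x_max, y_max)} -- 2D CNN detections of one
    # organ on individual CT sections. A simple "vote": z-range from the detected slices,
    # in-plane extent from the median of the 2D box coordinates.
    zs = sorted(slice_boxes)
    boxes = np.array([slice_boxes[z] for z in zs], dtype=float)
    x_min, y_min, x_max, y_max = np.median(boxes, axis=0)
    return {"z_range": (zs[0], zs[-1]),
            "x_range": (x_min, x_max),
            "y_range": (y_min, y_max)}
```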
Automatic multi-organ segmentation in thorax CT images using U-Net-GAN
Yang Lei, Yingzi Liu, Xue Dong, et al.
We propose a method to automatically segment multiple organs at risk (OARs) from routinely acquired thorax CT images using a generative adversarial network (GAN). A multi-label U-Net was used as the generator to enable end-to-end segmentation. Esophagus and spinal cord location information was used to train the GAN in specific regions of interest (ROIs). Multi-organ probability maps for a new thorax CT were generated by the well-trained network and fused to reconstruct the final contours. The proposed algorithm was evaluated using 20 patients' thorax CT images and manual contours. The mean Dice similarity coefficients (DSC) for esophagus, heart, left lung, right lung, and spinal cord were 0.73±0.04, 0.85±0.02, 0.96±0.01, 0.97±0.02, and 0.88±0.03, respectively. This novel deep-learning-based approach with the GAN strategy can automatically and accurately segment multiple OARs in thorax CT images and could be a useful tool to improve the efficiency of lung radiotherapy treatment planning.
Polyp segmentation and classification using predicted depth from monocular endoscopy
Faisal Mahmood, Ziyun Yang, Richard Chen, et al.
Colorectal cancer is the fourth leading cause of cancer deaths worldwide; the standard for detection and prevention is the identification and removal of premalignant lesions through optical colonoscopy. More than 60% of colorectal cancer cases are attributed to missed polyps. Current procedures for automated polyp detection are limited by the amount of data available for training, by the underrepresentation of non-polypoid lesions and lesions that are inherently difficult to label, and because they do not incorporate information about the topography of the lumen surface. It has been shown that information related to the depth and topography of the lumen surface can boost subjective lesion detection. In this work, we add predicted depth information as an additional mode of data when training deep networks for polyp detection, segmentation, and classification. We use conditional GANs to predict depth from monocular endoscopy images and fuse these predicted depth maps with RGB white-light images in feature space. Our empirical analysis demonstrates that we achieve state-of-the-art results for RGB-D polyp segmentation with 98% accuracy on four different publicly available datasets. Moreover, we demonstrate an 87.24% accuracy on lesion classification. We also show that our networks can domain-adapt to a variety of data from different sources.
Computer-aided classification of colorectal polyps using blue-light and linked-color imaging
Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths. Since most CRCs develop from colorectal polyps (CRPs), accurate endoscopic differentiation facilitates decision making on resection of CRPs, thereby increasing cost-efficiency and reducing patient risk. Current classification systems based on white-light imaging (WLI) or narrow-band imaging (NBI) have limited predictive power, or they do not consider sessile serrated adenomas/polyps (SSA/Ps), although these cause up to 30% of all CRCs. To better differentiate adenomas, hyperplastic polyps, and SSA/Ps, this paper explores the feasibility of two approaches: (1) an accurate computer-aided diagnosis (CADx) system for automated diagnosis of CRPs, and (2) novel endoscopic imaging techniques like blue-light imaging (BLI) and linked-color imaging (LCI). Two methods are explored to predict histology: (1) direct classification using a support vector machine (SVM) classifier, and (2) classification via a clinical classification model (WASP classification) combined with an SVM. The use of the SVM's probabilistic outputs facilitates objective quantification of the detailed classification process. Automated differentiation of colonic polyp subtypes reaches accuracies of 78−96%, thereby improving medical expert results by 4−20%. Diagnostic accuracy for directly predicting adenomatous from hyperplastic histology reaches 93% and 87−90% using NBI and the novel BLI and LCI techniques, respectively, thus improving medical expert results by 26% and 20−23%, respectively. Predicting adenomatous histology in diminutive polyps with high confidence yields NPVs of 100%, clearly satisfying the PIVI guideline recommendation on endoscopic innovations (≥90% NPV). Our CADx system outperforms clinicians, while the novel BLI technique adds performance value.
Ensemble 3D residual network (E3D-ResNet) for reduction of false-positive polyp detections in CT colonography
We developed a novel ensemble three-dimensional residual network (E3D-ResNet) for the reduction of false positives (FPs) in computer-aided detection (CADe) of polyps on CT colonography (CTC). To capture the volumetric multiscale information of CTC images, each polyp candidate was represented with three different sizes of volumes of interest (VOIs), which were enlarged to a common size and individually subjected to three 3D-ResNets. These 3D-ResNets were trained to calculate three polyp-likelihood probabilities, p1, p2, and p3, corresponding to each input VOI. The final polyp likelihood, p, was obtained as the maximum of p1, p2, and p3. We compared the classification performance of the E3D-ResNet with that of a non-ensemble 3D-ResNet, an ensemble 2D-ResNet, and an ensemble of 2D- and 3D-convolutional neural network (CNN) models. All models were trained and evaluated with 21,021 VOIs of polyps and 19,557 VOIs of FPs that were sampled with data augmentation from the CADe detections on the CTC data of 20 patients. We evaluated the classification performance of the models with receiver operating characteristic (ROC) analysis using cross-validation, where the area under the ROC curve (AUC) was used as the figure of merit. Preliminary results showed that the AUC value (0.98) of the E3D-ResNet was significantly higher than that of the reference models (P < 0.001), indicating that the E3D-ResNet has the potential to substantially reduce FPs in CADe of polyps on CTC.
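The multi-scale VOI sampling and the max-fusion of the three per-scale probabilities can be sketched as follows; the VOI edge lengths and the interpolation settings are illustrative assumptions rather than the values used in the study.

```python
import numpy as np
from scipy.ndimage import zoom

def multiscale_vois(volume, center, sizes=(24, 32, 48), out_size=32):
    # Crop three cubic VOIs of different sizes around a polyp candidate and resize
    # them to a common size before feeding each to its own 3D-ResNet.
    vois = []
    for s in sizes:
        half = s // 2
        sl = tuple(slice(max(c - half, 0), c + half) for c in center)
        crop = volume[sl]
        vois.append(zoom(crop, [out_size / d for d in crop.shape], order=1))
    return vois

def ensemble_polyp_likelihood(p1, p2, p3):
    # Final polyp likelihood p of the E3D-ResNet: the maximum of the three
    # per-scale probabilities for each candidate.
    return np.maximum.reduce([np.asarray(p1), np.asarray(p2), np.asarray(p3)])
```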
A local geometrical metric-based model for polyp classification
Weiguo Cao, Marc J. Pomeroy, Perry J. Pickhardt, et al.
Inspired by the co-occurrence matrix (CM) model for texture description, we introduce another important local metric, gradient direction, into polyp descriptor construction. Gradient direction and its two independent components, azimuth angle and polar angle, are used instead of the gray-level intensity to calculate the CMs of the Haralick model. We thus obtain three new models: the azimuth CM model (ACM), the polar CM model (PCM), and the gradient direction CM model (GDCM). These three new models share similar parameters with the traditional gray-level CM (GLCM) model, which has 13 directions for volumetric data and 4 directions for image slices. To train and test the models, the random forest method is employed. These three models are affected by angle quantization; therefore, more than 10 experimental schemes were designed to obtain reasonable parameters for angle discretization. We compared our three models (ACM, PCM, GDCM) with the traditional GLCM model, a gradient magnitude CM (GMCM) model, and the local anisotropic gradient orientations CM model (CoLIAge). Experimental results showed that our three models exceed the other three methods (GLCM, GMCM, CoLIAge) in their receiver operating characteristic (ROC) curves, AUC (area under the ROC curve) scores, and accuracy values. Based on its AUC and accuracy, ACM should be the first choice for polyp classification.
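As an illustration of the idea of replacing gray levels with gradient directions in a co-occurrence matrix, the sketch below computes an azimuth-angle co-occurrence matrix (the ACM case) for one offset direction; the number of angle bins and the offset are illustrative, and the polar-angle (PCM) and joint (GDCM) variants follow the same pattern.

```python
import numpy as np

def azimuth_co_occurrence(volume, n_bins=8, offset=(0, 0, 1)):
    # Gradient components along the (z, y, x) axes of the volume.
    gz, gy, gx = np.gradient(volume.astype(float))
    azimuth = np.arctan2(gy, gx)  # in (-pi, pi]
    # polar = np.arccos(gz / (np.sqrt(gx**2 + gy**2 + gz**2) + 1e-12))  # PCM variant
    q = np.clip(np.floor((azimuth + np.pi) / (2 * np.pi) * n_bins).astype(int), 0, n_bins - 1)
    dz, dy, dx = offset  # non-negative voxel offset between co-occurring pairs
    a = q[:q.shape[0] - dz, :q.shape[1] - dy, :q.shape[2] - dx]
    b = q[dz:, dy:, dx:]
    cm = np.zeros((n_bins, n_bins), dtype=np.int64)
    np.add.at(cm, (a.ravel(), b.ravel()), 1)  # count co-occurring angle bins
    return cm
```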
Polyp-size classification with RGB-D features for colonoscopy
Measurement of polyp size is an essential task in colon cancer screening, since polyp-size information plays a critical role in decision making during colonoscopy. However, estimating polyp size from a single colonoscopic view without a measurement device is quite difficult, even for expert physicians. To overcome this difficulty, automated size estimation techniques would be desirable in clinical settings. This paper presents a polyp-size classification method that uses a single colonoscopic image. Our proposed method estimates depth information from a single colonoscopic image with a trained model and utilises the estimated information for classification. In our method, the model for depth information is obtained by deep learning from colonoscopic videos. Experimental results show binary and trinary polyp-size classification with 79% and 74% accuracy, respectively, from a single still image of a colonoscopic movie.
Lung II
Handling label noise through model confidence and uncertainty: application to chest radiograph classification
Erdi Calli, Ecem Sogancioglu, Ernst Th. Scholten, et al.
In this work we analyze the effect of label noise in training and test data when performing classification experiments on chest radiographs (CXRs) with modern deep learning architectures. We use ChestXRay14, the largest publicly available CXR dataset. We simulate situs inversus by horizontally flipping the CXRs, allowing us to precisely control the amount of label noise. We also perform experiments in classifying emphysema using the labels provided with ChestXRay14, which are known to be noisy. Our situs inversus experiments confirm results from the computer vision literature that deep learning architectures are relatively robust but not completely insensitive to label noise in the training data: with no or very low noise, classification results are near perfect, while 16% and 32% training label noise lead to only a 1.5% and 4.6% drop in accuracy, respectively. We investigate two metrics that could be used to identify test samples that have an incorrect label: model confidence and model uncertainty. We show, in an observer study with an experienced chest radiologist, that both measures are effective in identifying samples in ChestXRay14 that are erroneously labeled for the presence of emphysema.
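The two per-sample signals investigated for spotting mislabeled test cases can be illustrated as follows; the exact definitions of confidence and uncertainty used in the study may differ, so the maximum softmax probability and the spread of Monte-Carlo-dropout predictions used below are assumptions.

```python
import numpy as np

def flag_suspect_labels(probs, mc_probs, given_labels, conf_thresh=0.95):
    # probs: (n, n_classes) softmax outputs; mc_probs: (T, n, n_classes) stochastic
    # forward passes (e.g., Monte-Carlo dropout); given_labels: dataset labels to audit.
    probs = np.asarray(probs, dtype=float)
    predicted = probs.argmax(axis=1)
    confidence = probs.max(axis=1)                               # model confidence
    uncertainty = np.asarray(mc_probs).std(axis=0).mean(axis=1)  # model uncertainty
    disagrees = predicted != np.asarray(given_labels)
    # Samples where a confident prediction contradicts the provided label are candidates
    # for an erroneous label (e.g., emphysema mislabeled in ChestXRay14).
    suspects = np.where(disagrees & (confidence > conf_thresh))[0]
    return suspects, uncertainty
```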
Classification of chest CT using case-level weak supervision
Ruixiang Tang, Fakrul Islam Tushar, Songyue Han, et al.
Our goal is to investigate using only case-level labels, extracted automatically from radiology reports, to construct a multi-disease classifier for CT scans with a deep learning method. We chose four lung diseases as a start: atelectasis, pulmonary edema, nodule, and pneumonia. From a dataset of approximately 5,000 chest CT cases from our institution, we used a rule-based text-mining model to analyze the radiology reports and identify cases with those diseases. From those results, we randomly selected the following mix of cases: 275 normal, 170 atelectasis, 175 nodule, 195 pulmonary edema, and 208 pneumonia. As a key feature of this study, each chest CT scan was represented by only 10 axial slices (taken at regular intervals through the lungs), and all slices shared the same label based on the radiology report. The labels were therefore weak, because disease often does not appear in every slice. We used ResNet-50 as our classification model, with 4-fold cross-validation. Each slice was analyzed separately to yield slice-level performance. For each case, we chose the 5 slices with the highest probability and used their mean probability as the final patient-level probability. Performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). For the 4 diseases, the slice-based AUCs were 0.71 for nodule, 0.79 for atelectasis, 0.96 for edema, and 0.90 for pneumonia. The patient-based AUCs were 0.74 for nodule, 0.83 for atelectasis, 0.97 for edema, and 0.91 for pneumonia. We back-projected the activations of the last convolutional layer and the weights from the prediction layer to synthesize a heat map. This heat map could serve as an approximate disease detector and could also indicate which feature patterns ResNet-50 focuses on.
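The case-level aggregation rule described above (mean of the five highest slice probabilities) is simple enough to show directly; `roc_auc_score` is used here as a stand-in for the ROC analysis.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def patient_probability(slice_probs, k=5):
    # Mean of the k highest slice-level disease probabilities (10 slices per case).
    p = np.sort(np.asarray(slice_probs, dtype=float))[::-1]
    return float(p[:k].mean())

# patient_scores = [patient_probability(s) for s in slice_probs_per_case]
# auc = roc_auc_score(patient_labels, patient_scores)
```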
Deep adversarial one-class learning for normal and abnormal chest radiograph classification
Yu-Xing Tang, You-Bao Tang, Mei Han, et al.
In machine learning, one-class classification tries to classify data of a specific category amongst all data by learning from a training set containing only the data of that unique category. In the field of medical imaging, one-class learning can be developed to model only normality (similar to semi-supervised classification or anomaly detection), since samples of all possible abnormalities are not always available, as some forms of anomaly are very rare. The one-class learning approach can be naturally adapted to the way radiologists identify anomalies in medical images: usually they are able to recognize lesions by comparing them with normal images and surroundings. Inspired by the traditional one-class learning approach, we propose an end-to-end deep adversarial one-class learning (DAOL) approach for semi-supervised normal and abnormal chest radiograph (X-ray) classification, trained only on normal X-ray images. The DAOL framework consists of deep convolutional generative adversarial networks (DCGAN) and an encoder at each end of the DCGAN. The DAOL generator is able to reconstruct normal X-ray images but cannot adequately reconstruct the abnormalities in abnormal X-rays in the testing phase, since only normal X-rays were used for training the network and abnormal images with various abnormalities were unseen during training. We propose three adversarial learning objectives which optimize the training of DAOL. The proposed network achieves an encouraging result (AUC 0.805) in classifying normal and abnormal chest X-rays on the challenging NIH Chest X-ray dataset in a semi-supervised setting.
Image biomarkers for quantitative analysis of idiopathic interstitial pneumonia
Young-Wouk Kim, Sebastián Roberto Tarando, Pierre-Yves Brillet, et al.
As a subclass of interstitial lung diseases, fibrosing idiopathic interstitial pneumonia (IIP), whose cause is mostly unknown, is a continuous and irreversible process manifesting as progressive worsening of lung function. Quantifying the evolution of the patient status requires the development of automated CAD tools that depict not only the occurrence of the pathology in the lung but also an associated severity degree. In this paper we propose several biomarkers for IIP quantification, associating spatial localization of the disease using lung texture classification with severity measures related to vascular and bronchial remodeling that correlate with clinical parameters. We follow up our work on lung texture analysis based on convolutional neural networks (reporting increased sensitivity, specificity, and accuracy) on an enlarged training/testing database (110/20 patients, respectively). The area under the curve of the vessel caliber distribution for radii between 2 and 6 mm (AUC 2-6, evaluated in 70 patients) emerged as a promising biomarker of disease severity, independent of the extent of lesions, correlating with the composite physiologic index. In the same way, normalized airway lobe length, normalized airway lobe volume, and the score of distal airway caliber deviation from the physiological power-decrease law correlated with the radiologic severity score, making them potential biomarkers of traction bronchiectasis (assessed in 18 patients).
Radiomics I
Effect of diversity of patient population and acquisition systems on the use of radiomics and machine learning for classification of 2,397 breast lesions
Radiomic features extracted from dynamic contrast-enhanced magnetic resonance (DCE-MR) images have previously been shown to be useful for classification of breast lesions as benign or malignant. In this study, we investigated the performance of radiomics in distinguishing between lesion molecular subtypes across two populations. Clinical DCE-MR images of 847 breast lesions in the United States and 1,550 breast lesions in China were collected under HIPAA and IRB compliance. The radiomics workstation automatically segmented lesions using a fuzzy C-means method and extracted thirty-eight radiomic features describing size, shape, morphology, kinetics, and texture, using previously reported methods. Binary classification pairs included benign versus malignant, benign versus each molecular subtype, and each molecular subtype versus the other molecular subtypes grouped together. Stepwise feature selection and linear discriminant analysis with five-fold cross-validation were used for each population in each classification task to determine the posterior probability of each lesion being in the positive class. The area under the receiver operating characteristic curve (AUC) was determined using the conventional binormal model. The AUC was also determined for each feature in each classification task. Classification performance for each task was compared between populations using superiority testing relative to the difference in AUC. Three out of nine classification tasks (benign versus luminal A (p = 0.008), non-luminal B versus luminal B (p = 0.048), and non-HER2-enriched versus HER2-enriched (p = 0.001)) demonstrated a significant difference in performance between the two populations. Differences in classification performance and the potential for harmonization may be affected by population biology (i.e., distribution of molecular subtypes) and scanner acquisition systems.
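A scikit-learn approximation of the feature selection and classification pipeline described above (forward stepwise selection feeding a linear discriminant classifier, evaluated with five-fold cross-validation) is sketched below; the number of selected features is a placeholder, and the AUC here is the empirical value rather than the binormal-model estimate used in the study.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

def lesion_task_auc(X, y, n_features=5):
    # X: (n_lesions, 38) radiomic features; y: binary labels for one classification task.
    lda = LinearDiscriminantAnalysis()
    selector = SequentialFeatureSelector(lda, n_features_to_select=n_features,
                                         direction="forward", cv=5)
    model = make_pipeline(selector, lda)
    # Cross-validated posterior probability of the positive class for every lesion.
    posterior = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, posterior)
```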
Radiogenomic characterization of response to chemo-radiation therapy in glioblastoma is associated with PI3K/AKT/mTOR and apoptosis signaling pathways
Niha Beig, Prateek Prasanna, Virginia Hill, et al.
Over 40% of glioblastoma (GBM) patients do not respond to conventional chemo-radiation therapy (chemo-RT) and relapse within 6-9 months, suggesting that they may have been better suited for other targeted therapies. Currently, there are no biomarkers that can reliably predict patients' response to chemo-RT in GBM. We seek to evaluate the role of radiomic markers on pre-treatment MRI to predict GBM patients' response to chemo-RT. Further, to establish a biological underpinning of the radiomic markers, we identified radiogenomic correlates of the radiomic markers with signaling pathways that are known to impact chemo-RT response. A total of 49 studies with Gd-T1w, T2w, and FLAIR MRI protocols and corresponding gene expression were obtained from the Ivy GAP (n=29) and TCIA (n=20) databases. Responders (n=22) were patients with progression-free survival (PFS) of ≥ 6 months, while non-responders (n=27) had PFS < 6 months. Thirteen molecular pathways were curated from the MSigDB Hallmark gene set. For each study, the enhancing tumor on MRI was manually segmented by an expert reader. 1,390 3D radiomic features (Gabor, Haralick, and Laws energy) were extracted from this region across all MRI protocols. Joint mutual information identified the 3 most predictive radiomic features in the training set (n=29). This was followed by correlating these features with the gene set enrichment analysis (GSEA) score computed for every pathway. A support vector machine (SVM) classifier was trained using these 3 features and validated on a test set (n=20), resulting in an area under the curve (AUC) of 0.71 for distinguishing chemo-RT responders from non-responders. Laws energy descriptors (characterizing the appearance of edges, spots, and ripples) from the enhancing region on Gd-T1w MR images were found to best predict chemo-RT response. Radiogenomic correlation with GSEA scores revealed that these radiomic features were significantly associated with the PI3K/AKT/mTOR (promotes cell proliferation and survival) and apoptosis (programmed cell death) signaling pathways (p < 0.03, false discovery rate = 5%).
Identifying optimal input using multilevel radiomics and nested cross-validation for predicting pulmonary function in lung cancer patients treated with radiotherapy
Radiomics is a promising approach to identify patients at high risk of pulmonary dysfunction caused by radiotherapy. This study aims to identify optimal radiomic input features for predicting pulmonary function. Forced expiratory volume in the first second (FEV1) and forced vital capacity (FVC) were measured for 257 patients between 3 months prior to and 1 week after the first radiotherapy. The FEV1/FVC ratio dichotomized at 70% was used as the target variable. Each patient had a radiotherapy planning CT and associated contours of gross tumor volume and left/right lungs. A total of 2,658 radiomic features were extracted and categorized into five levels: shape (S), first- (L1), second- (L2) and higher-order (L3) local texture, and global texture (G) features, as well as four multilevel groups: S+L1, S+L1+L2, S+L1+L2+L3, and S+L1+L2+L3+G. Nested cross-validation (NCV) was used to identify optimal input features. Cross-validated glmnet models optimized with unilevel or multilevel features were used to assess predictive performance on the outer CV test sets. In the unilevel analysis, the highest test AUC of 0.743±0.067 was obtained from NCV models optimized with L1 features. The best overall performance was achieved by NCV models optimized with S+L1+L2 features, with an AUC of 0.752±0.063. Paired Wilcoxon signed-rank test results showed that AUC values of NCV models optimized with S, L2, L3, G, or S+L1+L2+L3 features were statistically significantly different from those optimized with S+L1+L2 features (P<0.05). The multilevel analysis strategy will help to handle and optimize radiomic input features.
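A minimal nested cross-validation sketch is shown below, with `LogisticRegressionCV` (an L1-penalized model tuned on the inner folds) standing in for the cross-validated glmnet models; it would be run once per unilevel or multilevel feature group, and the fold counts and penalty are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def nested_cv_auc(X, y, n_outer=5, n_inner=5, seed=0):
    # X: numpy array of one feature group (e.g. S+L1+L2); y: FEV1/FVC < 70% indicator.
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in outer.split(X, y):
        # The inner CV tunes the regularization strength on the outer training fold only.
        model = LogisticRegressionCV(cv=n_inner, penalty="l1", solver="liblinear",
                                     scoring="roc_auc", max_iter=5000)
        model.fit(X[train_idx], y[train_idx])
        aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
    return float(np.mean(aucs)), float(np.std(aucs))
```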
Texture-based prostate cancer classification on MRI: how does inter-class size mismatch affect measured system performance?
R. Alfano, D. Soetemans, G. S. Bauman, et al.
Multi-parametric MRI (mp-MRI) has been shown to be useful in contemporary prostate biopsy procedures. Unfortunately, mp-MRI is relatively complex to interpret and suffers from inter-observer variability in lesion localization and grading. Computer-aided diagnosis (CAD) systems have been developed as a potential solution and have been shown to boost diagnostic accuracy. We measured the accuracy of a CAD model using a systematic sampling algorithm to remove any spatial bias present in our input. We trained four classifiers with 1–10 features chosen by forward feature selection for each and reported the system with the highest AUC in both the peripheral zone (PZ) and central gland (CG). Furthermore, we investigated the effect on system performance of varying the minimum tumour size threshold and of varying the average difference in area between malignant and healthy tissue samples. The CAD model was able to classify malignant vs. benign tissue with accuracies competitive with those reported in the literature. Eroding healthy tissue ROIs positively biased the system’s performance for the PZ, but no such trend was found in the CG. Once fully validated, this system has the potential to improve prostate cancer diagnosis.
Deformation heterogeneity radiomics to predict molecular subtypes of pediatric Medulloblastoma on routine MRI
Medulloblastoma (MB) is the most common malignant brain tumor in children. Currently, a "one-size-fits-all" radiation and chemotherapy regimen is employed for treating MB patients, causing at least some children to undergo highly aggressive and, in some cases, inadequate radiation therapy. Consequently, there is a need for prognostic and predictive tools for identifying disease aggressiveness and, ultimately, which patients with MB may benefit from de-escalation of therapy. Genomic characterization of MB has recently identified 4 distinct molecular subgroups: Sonic Hedgehog (SHH), Wingless (WNT), Group 3, and Group 4, each exhibiting different clinical behavior. The molecular subtypes have unique risk profiles and outcomes, and patients could potentially benefit from subgroup-specific treatments. However, the transition of these molecular MB subtypes into clinical practice has been limited by challenges in the availability of molecular profiling in most hospitals, as well as variability in clinical assessment. In this work, we present a radiomic feature that captures subtle tissue deformations, caused by the impact of tumor growth on the normal-appearing brain around the tumor (BAT), to distinguish molecular subtypes of MB. First, we obtain voxel-wise deformation magnitudes from the deformation field obtained after registering the gadolinium (Gd)-enhanced T1w MRI scan of every study to a normal age-specific T1w MRI template. Deformation statistics are then computed within every 5-mm annular BAT region, 0 < d < 60 mm, where d is the distance from the tumor-infiltrating edge, to capture subtle localized deformation changes around the tumor. Our results using multi-class comparison via one-way ANOVA and post-hoc comparison showed significant differences in deformation magnitudes among the Group 3, Group 4, and SHH molecular subtypes, observed up to 15 mm outside the infiltrating edge. These feasibility results suggest that subtle deformation features in the BAT observed on routine Gd-T1w MRI may serve as surrogate markers to non-invasively characterize molecular subtypes of pediatric MB.
Quantitative vessel tortuosity radiomics on baseline non-contrast lung CT predict response to immunotherapy and are prognostic of overall survival
Recently, immune-checkpoint inhibitors have demonstrated promising clinical efficacy in patients with advanced non-small cell lung cancer (NSCLC). However, the response rates to immune checkpoint blockade drugs remain modest (45% in the front-line setting and 20% in the second-line setting). Consequently, there is an unmet need to develop accurate, validated biomarkers to predict which NSCLC patients will benefit from immunotherapy. While there has been recent interest in evaluating the role of texture and shape patterns of the nodule on CT scans to predict response to checkpoint inhibitors for NSCLC, our group has shown that nodule vessel morphology might also play a role in determining tumor aggressiveness and behavior. In this work we present a new approach using quantitative vessel tortuosity (QVT) radiomics to predict response to checkpoint inhibitors and overall survival for patients with NSCLC treated with Nivolumab (a PD-1 inhibitor), on a retrospective data set of 111 patients (D1) including 56 responders and 45 non-responders. Patients who did not receive Nivolumab after 2 cycles due to a lack of response or progression as per the Response Evaluation Criteria in Solid Tumors (RECIST) were classified as non-responders; patients who had radiological response or stable disease as per RECIST were classified as responders. On D1, in conjunction with a linear discriminant analysis (LDA) classifier, the QVT features were able to predict response to immunotherapy with an AUC of 0.73±0.04. Kaplan-Meier analysis showed a significant difference in overall survival between patients with low risk and high risk as defined by the radiomics classifier (p-value = 0.004, HR = 2.29, 95% CI = 1.35-3.87).
Keynote Session
The U-net and its impact to medical imaging (Conference Presentation)
Bernardino Romera-Paredes
The U-net has become the predominant choice when facing any medical image segmentation task, owing to its high performance in many different medical domains. In this talk, I will introduce the U-net and present three projects from DeepMind Health Research that use the U-net to address different challenges. The first project, a collaboration with University College London Hospital, deals with the challenging task of precise segmentation of radiosensitive head and neck anatomy in CT scans, an essential input for radiotherapy planning. The second project, together with Moorfields Eye Hospital, developed a system that analyses 3D OCT (optical coherence tomography) eye scans to provide referral decisions for patients. The performance was on par with that of world experts with over 20 years' experience. Finally, I will focus on the third project, which deals with the segmentation of ambiguous images. This is of particular relevance in medical imaging, where ambiguities can often not be resolved from the image context alone. We propose a combination of a U-net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible segmentation map hypotheses for a given ambiguous image. We show that each hypothesis provides a globally consistent segmentation and that the probabilities of these hypotheses are well calibrated.
Lung III
Weakly-supervised deep learning of interstitial lung disease types on CT images
Chenglong Wang, Takayasu Moriya, Yuichiro Hayashi, et al.
Accurate classification and precise quantification of interstitial lung disease (ILD) types on CT images remain important challenges in clinical diagnosis. Multi-modality image information is required to assist in diagnosing diseases. To build scalable deep-learning solutions for this problem, taking full advantage of the existing large-scale datasets in modern hospitals has become a critical task. In this paper, we present DeepILD, a novel computer-aided diagnostic framework that addresses the ILD classification task from a single modality (CT images) using a deep neural network. More specifically, we propose integrating spherical semi-supervised K-means clustering and convolutional neural networks for ILD classification and disease quantification. We first use semi-supervised spherical K-means to divide the CT lung area into normal and abnormal sub-regions. A convolutional neural network (CNN) is subsequently trained using image patches extracted from the abnormal regions. Here, we focus on the classification of three chronic fibrosing ILD types: idiopathic pulmonary fibrosis (IPF), idiopathic non-specific interstitial pneumonia (iNSIP), and chronic hypersensitivity pneumonia (CHP). Excellent classification accuracy was achieved on a dataset of 188 CT scans; in particular, our IPF classification reached about 88% accuracy.
Efficient learning in computer-aided diagnosis through label propagation
Computer-Aided Diagnosis (CADx) systems can be used to provide second opinions in the medical diagnostic process. These CADx systems are expensive to build, as they require a large amount of correctly labeled example data. To ensure the accuracy of a training label, a radiograph may be assessed by multiple radiologists, increasing the time and money necessary to build these diagnostic systems. In this paper, we minimize the cost necessary to train CADx systems while accounting for unreliable labels by reducing label uncertainty. We introduce a method that reduces the cost required to build a CADx system while improving overall accuracy, and we demonstrate it on the Lung Image Database Consortium (LIDC) database. We exploit similarities between images by clustering image features of lung nodule CT scans and propagating a single label throughout each cluster. By informatively choosing better labels through clustering, this method achieves higher accuracy (a 5.2% increase) while using fewer labels (29% fewer) compared to a state-of-the-art label-saving technique designed for this medical dataset.
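The cluster-and-propagate idea can be sketched as below: nodule image features are clustered, one representative per cluster is sent for labeling, and that label is copied to all cluster members. K-means, the choice of representative, and the `ask_label` callback are illustrative assumptions rather than the exact procedure used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def propagate_cluster_labels(features, n_clusters, ask_label):
    # features: (n_nodules, n_features) array; ask_label(i) queries an expert label
    # for nodule i (e.g., a malignancy rating) and stands in for radiologist review.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    labels = np.empty(len(features), dtype=int)
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Query the nodule closest to the cluster centroid, then propagate its label.
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        representative = members[int(np.argmin(dists))]
        labels[members] = ask_label(representative)
    return labels
```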
Computer-aided CT image features improving the malignant risk prediction in pulmonary nodules suspicious for lung cancer
Y. Kawata, N. Niki, M. Kusumoto, et al.
Screening for lung cancer with low-dose computed tomography (CT) has led to increased recognition of small lung cancers and is expected to increase the rate of detection of early-stage lung cancer. A major concern in the implementation of CT screening of large populations is determining the appropriate management of pulmonary nodules found on a scan. The identification of patients with early-stage lung cancer who have a higher risk of relapse and who require more aggressive surveillance has been a target of intense investigation. This study was performed to investigate whether computer-aided CT image features could improve the discrimination ability of lung cancer prediction models for nodules in which malignancy was suspected.
Augmenting LIDC dataset using 3D generative adversarial networks to improve lung nodule detection
Chufan Gao, Stephen Clark, Jacob Furst, et al.
One drawback of Computer Aided Detection (CADe) systems is the large amount of data needed to train them, which may be expensive in the medical field. We propose using a generative adversarial network (GAN) as a potential data augmentation strategy to generate more training data to improve CADe. In our preliminary results, using the NIH/NCI Lung Image Database Consortium, we obtained a higher sensitivity when training a CADe system on our augmented lung nodule 3D data than training it without. We show that GANs are a viable method of data augmentation for lung nodule detection and are a promising area of potential research in the CADe domain.
Vascular and Radiomics II
A semi-supervised CNN learning method with pseudo-class labels for vascular calcification detection on low dose CT scans
The recent rapid success of deep convolutional neural networks (CNN) on many computer vision tasks largely benefits from the well-annotated Pascal VOC, ImageNet, and MS COCO datasets. However, it is challenging to obtain ImageNet-like annotations (1,000 classes) in the medical imaging domain due to the lack of clinical training in the lay crowdsourcing community. We address this problem by presenting a semi-supervised training method for neural networks with true-class and pseudo-class (un-annotated class) labels on partially annotated training data. The true-class labels are supervised annotations from clinical professionals. The pseudo-class labels come from unsupervised clustering of unannotated data. Our method rests upon the hypothesis that more coherent annotations with discriminative classes lead to better-trained CNN models. We validated our method on extra-coronary calcification detection in low-dose CT scans. The CNN trained with the true class and 10 pseudo-classes achieved a 78.0% sensitivity at 10 false positives per scan (0.3 false positives per slice), which significantly outperformed the CNN trained with the true class only (sensitivity = 25.0% at 10 false positives per patient).
Variability in radiomics features among iDose4 reconstruction levels
Joseph J. Foy, Mena Shenouda, Sahar Ramahi, et al.
The assessment of tissues depicted in medical images using radiomics has been shown to depend on a number of image-acquisition and reconstruction parameters. This study assessed the variability in radiomics features due to variations in iDose4 reconstruction level. A database of 109 normal head and neck (HN) computed tomography (CT) scans was obtained for analysis with three levels of iDose4 reconstruction: 2, 4, and 6. Various two-dimensional regions of interest (ROIs) containing different tissues were manually contoured, including the globes, sternocleidomastoid muscle (SCM), thyroid, clivus, and supraclavicular subcutaneous fat. A square ROI containing only air was also contoured. Each region was contoured at its largest axial cross section by area. Pixel information was extracted from each region in each patient for each iDose4 reconstruction, and 142 texture features were calculated using an in-house texture package. Differences in radiomics features between iDose4 levels were assessed using parametric paired Student's t-tests or non-parametric Wilcoxon signed-rank tests for each tissue type, after assessing normality using the Shapiro-Wilk test. Relative agreement among iDose4 reconstructions was quantified using Spearman's rank correlation coefficient. For all ROIs besides those containing only air, most features differed significantly between pairwise combinations of the three iDose4 levels. For air, all features were robust to differences in iDose4 levels. Therefore, if radiomics studies include images reconstructed using different iDose4 levels, robust radiomics features should be used. Additionally, to aid in the validation of radiomics research, iDose4 reconstruction levels should be reported.
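The per-feature statistical comparison between two iDose4 levels (normality check, paired test, and rank correlation for relative agreement) could look roughly like the sketch below; the significance level used for the Shapiro-Wilk decision is an assumption.

```python
import numpy as np
from scipy import stats

def compare_feature_between_levels(values_a, values_b, alpha=0.05):
    # values_a, values_b: one radiomic feature measured in the same ROIs at two iDose4 levels.
    a = np.asarray(values_a, dtype=float)
    b = np.asarray(values_b, dtype=float)
    # Shapiro-Wilk on the paired differences chooses between parametric and non-parametric tests.
    if stats.shapiro(a - b).pvalue > alpha:
        test_name, p_value = "paired t-test", stats.ttest_rel(a, b).pvalue
    else:
        test_name, p_value = "Wilcoxon signed-rank", stats.wilcoxon(a, b).pvalue
    rho, _ = stats.spearmanr(a, b)  # relative agreement across reconstructions
    return {"test": test_name, "p_value": float(p_value), "spearman_rho": float(rho)}
```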
Development and validation of a radiomics-based method for macrovascular invasion prediction in hepatocellular carcinoma with prognostic implication
In hepatocellular carcinoma (HCC), more than one third of patients have macrovascular invasion (MaVI) during diagnosis and treatment. HCCs with MaVI present with aggressive tumor behavior and poor survival. Early identification of HCCs at high risk of MaVI would support adequate preoperative treatment planning, so as to prolong patient survival. Thus, we aimed to develop a computed tomography (CT)-based radiomics model to preoperatively predict MaVI status in HCC and to explore the prognostic power of the radiomics model. A cohort of 452 patients diagnosed with HCC was collected from 5 hospitals in China with complete CT images, clinical data, and follow-up. 15 out of 708 radiomic features were selected for MaVI prediction using LASSO regression. A radiomics signature was constructed by a support vector machine based on the 15 selected features. To evaluate the prognostic power of the signature, Kaplan-Meier curves with log-rank tests were plotted for MaVI occurrence time (MOT), progression-free survival (PFS), and overall survival (OS). The radiomics signature showed satisfactory performance for MaVI prediction, with areas under the curve of 0.885 and 0.770 on the training and external validation cohorts, respectively. Patients could successfully be divided into high- and low-risk groups for MOT and PFS, with p-values of 0.0017 and 0.0013, respectively. Regarding OS, the Kaplan-Meier curves did not show a significant difference, which may be caused by non-uniform follow-up treatments after disease progression. To conclude, the proposed radiomics model could facilitate MaVI prediction along with prognostic implications in HCC management.
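A compressed sketch of the modeling chain (LASSO-based feature selection, an SVM signature, and Kaplan-Meier stratification with a log-rank test) is given below. `LassoCV`, the 0.5 risk cut-off, and the use of the `lifelines` package are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np
from lifelines.statistics import logrank_test
from sklearn.linear_model import LassoCV
from sklearn.svm import SVC

def signature_and_stratification(X, y_mavi, time, event):
    # X: radiomic features; y_mavi: MaVI status (0/1); time/event: survival data (numpy arrays).
    lasso = LassoCV(cv=5).fit(X, y_mavi)        # LASSO selects a sparse feature subset
    selected = np.flatnonzero(lasso.coef_)      # e.g., the 15 retained features
    svm = SVC(probability=True).fit(X[:, selected], y_mavi)
    risk_score = svm.predict_proba(X[:, selected])[:, 1]
    high_risk = risk_score >= 0.5               # illustrative cut-off for the risk groups
    # Log-rank test between the Kaplan-Meier curves of the two risk groups.
    result = logrank_test(time[high_risk], time[~high_risk],
                          event_observed_A=event[high_risk],
                          event_observed_B=event[~high_risk])
    return selected, svm, result.p_value
```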
Efficient detection of vascular structures using locally connected filtering
Vascular segmentation is often required in medical image analysis for various imaging modalities. Despite the rich literature in the field, the proposed methods most of the time need adaptation to the particular investigation and may sometimes lack the desired accuracy in terms of true-positive and false-positive detection rates. This paper proposes a general method for vascular segmentation based on locally connected filtering applied in a multiresolution scheme. The filtering scheme performs progressive detection and removal of the vessels from the image relief at each resolution level by combining directional 2D-3D locally connected filters (LCF). An important property of the LCF is that it preserves (positively contrasted) structures in the image if they are topologically connected with other similar structures in their local environment. Vessels, which appear as linear structures, can be filtered out by an appropriate LCF set-up that minimally affects sheet-like structures. The implementation in a multiresolution framework allows dealing with different vessel sizes. The outcome of the proposed approach is illustrated on two anatomical territories: lung and liver. It is shown that, besides preserving high accuracy in detecting small vessels, the proposed technique is less sensitive to noise and to the presence of pathologies with positive-contrast appearance on the images. The detection accuracy is compared with a previously developed approach on the 20-patient database from the VESSEL12 challenge.
Eyes and New Approaches
Deep learning for automated screening and semantic segmentation of age-related and juvenile atrophic macular degeneration
Ziyuan Wang, SriniVas R. Sadda, Zhihong Hu
Atrophic age-related macular degeneration (AMD), or geographic atrophy (GA), and atrophic juvenile macular degeneration (JMD), or Stargardt atrophy, have been proven to be the leading causes of blindness in older adults and in children and young adults, respectively. Automated techniques for timely screening and detection of such atrophic diseases would appear to be of critical importance for the prevention and early treatment of vision loss. We first developed a deep learning-based automated screening system using residual networks (ResNet), which can differentiate eyes with atrophic AMD and JMD from normal eyes on fundus autofluorescence (FAF) images. We further developed another deep learning-based automated system to segment the atrophic AMD and JMD lesions using a fully convolutional neural network, the U-Net. Transfer learning based on a pre-trained model was applied to the ResNet to facilitate algorithm training, and extensive data augmentation was applied for both the ResNet and the U-Net to enhance generalization. In total, 320 FAF images from normal subjects, 320 with atrophic AMD, and 100 with atrophic JMD were included. The performance of the algorithms was evaluated by comparison with manual gradings by reading center graders. For the screening system, no previously reported algorithm exists, and our algorithm demonstrated high screening accuracy: 0.98 for atrophic AMD and 0.95 for atrophic JMD. For the segmentation system, our algorithm achieved a high overlap ratio of 0.89 ± 0.06 for atrophic AMD and 0.78 ± 0.17 for atrophic JMD.
Improved interpretability for computer-aided severity assessment of retinopathy of prematurity
Computer-aided diagnosis tools for retinopathy of prematurity (ROP) base their decisions on handcrafted retinal features that highly correlate with expert diagnoses, such as arterial and venous curvature, tortuosity, and dilation. Deep learning achieves performance comparable to that of expert physicians, albeit without ensuring that the same clinical factors are learned in the deep representations. In this paper, we investigate the relationship between the handcrafted and the deep learning features in the context of ROP diagnosis. Average statistics of the handcrafted features for each input image were expressed as retinal concept measures. Three disease severity grades, i.e., normal, pre-plus, and plus, were classified by a deep convolutional neural network. Regression Concept Vectors (RCV) were computed in the network feature space for each retinal concept measure. Relevant concept measures were identified by bidirectional relevance scores for the normal and plus classes. Results show that the curvature, diameter, and tortuosity of the segmented vessels are indeed relevant to the classification. Among the potential applications of this method, the analysis of borderline cases between the classes and of network faults, in particular, can be used to improve performance.
Reproducibility of CT-based texture feature quantification of simulated and 3D-printed trabecular bone: influence of noise and reconstruction kernel
Nada Kamona, Qin Li, Benjamin Berman, et al.
Computed tomography (CT) based texture feature measurements of trabecular bone may be used as imaging biomarkers for bone health assessment. This study investigated the effects of image noise and reconstruction kernels on the reproducibility of CT-based texture features of simulated bone images and their correlation to underlying physical bone microarchitecture. We used the Voronoi tessellation method to create lattices and applied morphological processing, including stochastic edge pruning, plate filling, dilating, smoothing, and thresholding, to achieve trabecular bone-like structures with controllable trabecular bone parameters. The simulated structures of various trabecular bone thicknesses were passed through an imaging model: CT images were created at a pixel size of 0.24 × 0.24 mm² by convolving the structure with a proper modulation transfer function at multiple cutoff frequency values representing different sharpness levels associated with clinical CT reconstruction kernels. Noise was added using a range of standard deviations to simulate different dose levels. We examined 39 texture features' correlation with trabecular bone thickness and assessed their reproducibility using the Concordance Correlation Coefficient metric. Our preliminary results show the realism of our simulation model when compared to the CT scans of 3D-printed phantoms. We found that first- and second-order image texture feature measurements correlated better with trabecular thickness compared to higher-order features. However, the reproducibility of texture features across different reconstruction kernels and noise levels was limited. High reproducibility required the use of very sharp kernels and similar noise levels across images.
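Reproducibility across reconstruction kernels and noise levels was quantified with the concordance correlation coefficient; for one feature measured under two conditions, Lin's CCC can be computed as in the short sketch below.

```python
import numpy as np

def concordance_correlation_coefficient(x, y):
    # Lin's CCC between paired feature measurements (e.g., the same simulated bone
    # structures reconstructed with two different kernels or noise levels).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population (biased) variances
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```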
Fusing attributes predicted via conditional GANs for improved skin lesion classification (Conference Presentation)
Faisal Mahmood, Jeremiah Johnson, Ziyun Yang, et al.
Skin cancer is the most commonly diagnosed cancer worldwide. It is estimated that over 5 million cases of skin cancer are diagnosed in the United States every year. Although less than 5% of all diagnosed skin cancers are melanoma, it accounts for over 70% of skin cancer-related deaths. In the past decade, the number of melanoma cases has increased by 53%. Recently, there has been significant work on segmentation and classification of skin lesions via deep learning. However, there is limited work on identifying attributes and clinically meaningful visual skin lesion patterns from dermoscopic images. In this work, we propose to use conditional GANs for skin lesion segmentation and attribute detection, and to use these attributes to improve skin lesion classification. The proposed conditional GAN framework can generate segmentation and attribute masks from RGB dermoscopic images. The adversarial image-to-image translation style architecture forces the generator to learn both local and global features. The Markovian discriminator classifies pairs of images and segmentation labels as being real or fake. Unlike previous approaches, such an architecture not only learns the mapping from dermoscopic images to segmentation and attribute masks but also learns an optimal loss function to train such a mapping. We demonstrate that this approach significantly improves the Jaccard index for segmentation (with a 0.65 threshold) up to 0.893. Fusing the lesion attributes for classification of lesions yields higher accuracy compared to classification without predicted attributes.
Age prediction using a large chest x-ray dataset
A. Karargyris, S. Kashyap, J. T. Wu, et al.
Age prediction based on the appearance of different anatomies in medical images has been clinically explored for many decades. In this paper, we used deep learning to predict a person's age from chest X-rays. Specifically, we trained a CNN in regression fashion on a large publicly available dataset. Moreover, for interpretability, we explored activation maps to identify which areas of a CXR image are important for the machine (i.e., the CNN) to predict a patient's age. Overall, among correctly predicted CXRs, we see areas near the clavicles, shoulders, spine, and mediastinum being most activated for age prediction, as one would expect biologically. As CXR is the most commonly requested imaging exam, a potential use case for age estimation may be found in preventative counselling, comparing a patient's health status to the age-expected average, particularly when there is a large discrepancy between the predicted age and the real patient age.
Using multi-task learning to improve diagnostic performance of convolutional neural networks
Mengjie Fang, Di Dong, Ruijia Sun, et al.
Due to complex biological and physical mechanisms, the correlations between the classification targets of clinical tasks and the medical imaging phenotype are always ambiguous and implicit, which makes it difficult to train a powerful diagnostic convolutional neural network (CNN) model efficiently. In this study, we propose a generic multi-task learning (MTL) CNN framework to achieve higher classification accuracy and better generalization. The proposed framework is designed to carry out the major diagnostic task and several auxiliary tasks simultaneously. It encourages the model to learn more beneficial representations following the underlying relations among patients' clinical characteristics, obvious imaging findings, and quantitative imaging phenotype. We evaluate our approach on two clinical applications, namely advanced gastric cancer (AGC) serosa invasion diagnosis and discrimination of lung invasive adenocarcinoma manifesting as a ground-glass nodule (GGN). Two datasets are utilized, containing 357 AGC patients' venous-phase contrast-enhanced CT volumes and 236 GGN patients' non-contrast CT volumes, respectively. Several subjective CT morphology characteristics and common clinical characteristics are collected and used as auxiliary tasks. To evaluate the generality of our strategy, CNNs with and without natural-image-based pre-training are successively incorporated into the framework. The experimental results demonstrate that the proposed MTL CNN framework improves diagnostic performance significantly (7.4%-12.8% AUC increase and 3.5%-7.9% accuracy increase).
Radiomics III and Oncology
Stability of radiomic features of liver lesions from manual delineation in CT scans
Jan Hendrik Moltz
We investigate the stability of radiomic features under variations in the manual delineation of liver tumors. The analysis is based on 13 CT scans, each with ten expert segmentations of one lesion per patient. We computed 110 first-order, shape, and texture features using the open-source software pyradiomics and created a ranking by intra-class correlation (ICC), discarding highly correlated features. Half of the 27 remaining features have very good stability (ICC > 0.9), with features related to size, simple texture, and average intensity performing best. Elongation and kurtosis are by far the least stable features (ICC < 0.65) and should be avoided.
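The ICC computation behind the stability ranking can be reproduced roughly as below for one feature, here using the two-way random, single-measure form ICC(2,1); the abstract does not state which ICC variant was used, so this choice is an assumption.

```python
import numpy as np

def icc_2_1(ratings):
    # ratings: (n_lesions, n_delineations) values of one radiomic feature,
    # e.g. 13 lesions x 10 expert delineations.
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ms_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-lesion MS
    ms_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-reader MS
    resid = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))               # residual MS
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```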
Use of convolutional neural networks to predict risk of masking by mammographic density
The sensitivity of screening mammography is reduced by increased mammographic density (MD). MD can obscure or "mask" developing lesions, making them harder to detect. Predicting masking risk may be an effective tool for a stratified screening program in which selected women can receive alternative screening modalities that are less susceptible to masking. Here, we investigate whether artificial intelligence can accurately predict masking risk and compare its performance to that of conventional BI-RADS density classification. The analysis was based on mammograms of 214 subjects, comprising 147 women with a screen-detected (SD) or "non-masked" cancer and 67 who developed a non-screen-detected (NSD) or presumably masked cancer within 2 years following a negative screen. Prior to analysis, mammograms were pre-processed into quantitative MD maps using an in-house algorithm. A transfer learning approach was used to train a convolutional neural network (CNN) based on VGG-16 in a seven-fold cross-validation approach to classify masking status. A two-step transfer learning method was also used, in which the pre-trained CNN was initially trained on 5,865 mammograms to classify by BI-RADS density category and then trained for masking status. Using BI-RADS density as a masking risk predictor has an AUC of 0.64 [95% CI: 0.57-0.71]. The CNN-mask yielded an AUC of 0.76 [0.68-0.81]. Combining the CNN-mask with our previous hand-crafted masking risk predictor, the AUC improved to 0.78 [0.70-0.83]. The combined AUC improved further to 0.81 [0.72-0.90] when the analysis was restricted to NSD cancers surfacing clinically within one year after a negative screen. The two-step transfer learning yielded similar performance. This work suggests that a CNN masking-risk predictor can be used to guide a stratified screening program to overcome the limitations of screening mammography in dense breasts.
A novel clinical gland feature for detection of early Barrett’s neoplasia using volumetric laser endomicroscopy
Volumetric laser endomicroscopy (VLE) is an advanced imaging system offering a promising solution for the detection of early Barrett’s esophagus (BE) neoplasia. BE is a known precursor lesion for esophageal adenocarcinoma and is often missed during regular endoscopic surveillance of BE patients. VLE provides a circumferential scan of near-microscopic resolution of the esophageal wall up to 3-mm depth, yielding a large amount of data that is hard to interpret in real time. In a preliminary study on an automated analysis system for ex vivo VLE scans, novel quantitative image features were developed for two previously identified clinical VLE features predictive of BE neoplasia, showing promising results. This paper proposes a novel quantitative image feature for a missing third clinical VLE feature. The novel gland-based image feature, called “gland statistics” (GS), is compared to several generic image analysis features and the most promising clinically inspired feature, “layer histogram” (LH). All features are evaluated on a clinical, validated data set consisting of 88 non-dysplastic BE and 34 neoplastic in vivo VLE images, using eight different widely used machine learning methods. The new clinically inspired feature achieves on average superior classification accuracy (0.84 AUC) compared to the generic image analysis features (0.61 AUC), as well as performance comparable to the LH feature (0.86 AUC). Also, the LH feature achieves superior classification accuracy compared to the generic image analysis features in vivo, confirming previous ex vivo results. Combining the LH and the novel GS features provides a further improvement in performance (0.88 AUC), showing great promise for the clinical utility of this algorithm to detect early BE neoplasia.
Radiomics analysis potentially reduces over-diagnosis of prostate cancer with PSA levels of 4-10 ng/ml based on DWI data
Prostate specific antigen (PSA) screening is routinely conducted for suspected prostate cancer (PCa) patients. As this technique carries a high probability of over-diagnosis and unnecessary prostate biopsies, controversy about it remains, especially for patients with “gray-zone” PSA levels, i.e., 4-10 ng/ml. To improve the risk stratification of suspected PCa patients, the Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) was released in 2015. Although PI-RADSv2 showed good performance in the detection of PCa, its specificity was relatively low for patients with gray-zone PSA levels, indicating that PI-RADSv2 does not handle the over-diagnosis issue well in the gray zone. To address this, we attempted to validate whether radiomics analysis of diffusion-weighted imaging (DWI) data could reduce over-diagnosis of PCa with gray-zone PSA levels. Here, 140 suspected PCa patients from Peking Union Medical College Hospital were enrolled. 700 radiomic features were extracted from the DWI data. The least absolute shrinkage and selection operator (LASSO) was applied, and 7 radiomic features were selected on the training set (n=93). Based on these features, a random forest classifier was used to build the radiomics model, which performed better than PI-RADSv2 (area under the curve [AUC]: 0.900 vs 0.773 and 0.844 vs 0.690 on the training and test sets). Furthermore, the specificity values of the radiomics model and PI-RADSv2 were 0.815 and 0.481 on the test set, respectively. In conclusion, radiomics analysis of DWI data might reduce the over-diagnosis of PCa with gray-zone PSA levels.
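The sketch below illustrates, with synthetic stand-in data, one common way to implement a pipeline of this kind: LASSO-based selection of a handful of radiomic features followed by a random forest classifier. The data shapes, the fixed choice of 7 features, and all hyperparameters are assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for the 700 DWI radiomic features (shapes are assumptions).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(93, 700)), rng.integers(0, 2, 93)
X_test, y_test = rng.normal(size=(47, 700)), rng.integers(0, 2, 47)

# LASSO on standardized features; keep the 7 attributes with the largest weights.
scaler = StandardScaler().fit(X_train)
lasso = LassoCV(cv=5, random_state=0).fit(scaler.transform(X_train), y_train)
selected = np.argsort(np.abs(lasso.coef_))[::-1][:7]

# Random forest on the selected radiomic features only.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train[:, selected], y_train)
probs = rf.predict_proba(X_test[:, selected])[:, 1]
print("test AUC:", roc_auc_score(y_test, probs))
```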
Homogenization of breast MRI across imaging centers and feature analysis using unsupervised deep embedding
We propose an intensity-based technique to homogenize dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data acquired at six institutions. A total of 234 T1-weighted MRI volumes acquired at the peak of the kinetic curve were obtained for study of the homogenization and unsupervised deep-learning feature extraction techniques. The homogenization uses reference regions of adipose breast tissue, since they are less susceptible to variations due to cancer and contrast medium. For the homogenization, the moments of the distribution of reference pixel intensities across the cases were matched, and the remaining intensity distributions were adjusted accordingly. A deep stacked autoencoder with six convolutional layers was trained to reconstruct a 128×128 MRI slice and to extract a latent space of 1024 dimensions. We used the latent space from the stacked autoencoder to extract deep embedding features that represented the global and local structures of the imaging data. An analysis using spectral embedding of the latent space shows that, before homogenization, the dominating factor was the dependency on the imaging center; after homogenization, the histograms of the cases from different centers were matched and the center dependency was reduced. The results of the feature analysis indicate that the proposed homogenization approach may lessen the effects of different imaging protocols and scanners in MRI, which may then allow more consistent quantitative analysis of radiomic information across patients and improve the generalizability of machine learning methods across different clinical sites. Further study is underway to evaluate the performance of machine learning models with and without image homogenization.
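As a simplified, hypothetical illustration of reference-region moment matching (here only the first two moments, mean and standard deviation, of an adipose reference region are matched to a pooled target, whereas the actual method may match more), one might write:

```python
import numpy as np

def homogenize(volume, adipose_mask, target_mean, target_std):
    """Linearly rescale a volume so its adipose reference region matches target
    first and second moments (a two-moment simplification of the matching)."""
    ref = volume[adipose_mask]
    a = target_std / (ref.std() + 1e-8)
    b = target_mean - a * ref.mean()
    return a * volume + b

# Hypothetical multi-center example: pool a target from all cases, then rescale each.
rng = np.random.default_rng(1)
volumes = [rng.normal(mu, 40, size=(16, 64, 64)) for mu in (300, 550, 420)]
masks = []
for v in volumes:
    m = np.zeros_like(v, dtype=bool)
    m[:, :10, :10] = True                     # stand-in adipose reference region
    masks.append(m)
target_mean = np.mean([v[m].mean() for v, m in zip(volumes, masks)])
target_std = np.mean([v[m].std() for v, m in zip(volumes, masks)])
homogenized = [homogenize(v, m, target_mean, target_std) for v, m in zip(volumes, masks)]
print([round(float(h[m].mean()), 1) for h, m in zip(homogenized, masks)])  # now aligned
```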
Poster Session: Bone
Shape variation analyzer: a classifier for temporomandibular joint damaged by osteoarthritis
Nina Tubau Ribera, Priscille de Dumast, Marilia Yatabe, et al.
We developed a deep learning neural network, the Shape Variation Analyzer (SVA), that allows disease staging of bony changes in temporomandibular joint (TMJ) osteoarthritis (OA). The sample was composed of 259 TMJ CBCT scans for the training set and 34 for the testing dataset. The 3D meshes had been previously classified into 6 groups by 2 expert clinicians. We improved the robustness of the training data using data augmentation with SMOTE, to alleviate over-fitting and to balance the classes. We combined geometrical features and a shape descriptor, the heat kernel signature, to describe every shape. The results were compared to nine different supervised machine learning algorithms. The deep learning neural network was the most accurate for classification of TMJ OA. In conclusion, SVA is a 3D Slicer extension that classifies temporomandibular joint osteoarthritis cases based on 3D morphology.
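For readers unfamiliar with SMOTE-style class balancing, a small, self-contained sketch using imbalanced-learn is shown below; the feature matrix, class counts, and class weights are synthetic stand-ins, not the study's data.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic stand-in for the shape-feature matrix (geometric + heat kernel
# signature features) with imbalanced severity groups; sizes are hypothetical.
X, y = make_classification(n_samples=259, n_features=64, n_informative=20,
                           n_classes=6, n_clusters_per_class=1,
                           weights=[0.35, 0.25, 0.15, 0.12, 0.08, 0.05],
                           random_state=0)
print("before:", Counter(y))
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))              # minority groups oversampled to parity
```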
Automatic detection and localization of bone erosion in hand HR-pQCT
Jintao Ren, Arash Moaddel H., Ellen M. Hauge, et al.
Rheumatoid arthritis (RA) is an inflammatory disease that afflicts the joints, with arthritis and periarticular bone destruction as a result. One of its central features is bone erosion, a consequence of excessive bone resorption and insufficient bone formation. High-resolution peripheral quantitative computed tomography (HR-pQCT) is a promising tool for monitoring RA. Quantification of bone erosions and detection of possible progression are essential in the management of treatment. Detection is performed manually and is a very demanding task, as rheumatologists must annotate hundreds of 2D images and inspect any region of the bone structure suspected to be a sign of RA. We propose a 2D-based method which combines an accurate segmentation of the bone surface boundary with classification of patches along the surface as healthy or eroded. We use a series of classical image processing methods to segment CT volumes semi-automatically; these segmentations are used as training data for a U-Net. We train a Siamese net to learn the difference between healthy and eroded patches. The Siamese net alleviates the problem of highly imbalanced class labels by providing a base for one-shot learning of differences between patches. We trained and tested the method using 3 full HR-pQCT scans with bone erosions of various sizes. The proposed pipeline succeeded in classifying healthy and eroded patches with high precision and recall. The proposed algorithm is preliminary work to demonstrate the potential of our pipeline in automating the process of detecting and locating the eroded regions of bone surfaces affected by RA.
Spinal vertebrae segmentation and localization by transfer learning
Spinal curvature disorders are associated with diseases of the nervous system and may produce serious disturbances throughout the body. The ability to automatically segment and localize the spinal vertebrae is, therefore, an important task for modern studies of spinal curvature disorder detection. In this work, we devise a simple, automated method for human spinal vertebrae segmentation and localization using transfer learning that works on both CT and MRI acquisitions. We apply pre-trained models to the spinal vertebrae segmentation and localization problem. We first explore and evaluate different medical imaging architectures and choose deep dilated convolutions as the initialization for our spinal vertebrae segmentation and localization task. We then transfer a model pre-trained on a spinal cord gray matter dataset to our spinal vertebrae segmentation task with supervised fine-tuning. The vertebral centroid coordinates can be computed from the segmentation result, and the centroid localization error is used as feedback for fine-tuning. We evaluate our method against a traditional medical image segmentation and localization method and report a comparison of evaluation metrics. We show qualitative and quantitative evaluations on spine CT images taken from volumes available on the public SpineWeb platform. The evaluation results show that our approach was able to capture many properties of the spinal vertebrae and provided good segmentation and localization performance. Our research shows that deep dilated convolutions pre-trained on MRI spinal cord gray matter images can be transferred to process CT spinal vertebrae images.
Ensembles of sparse classifiers for osteoporosis characterization in digital radiographs
The analysis and characterization of imaging patterns is a significant research area with several applications to biomedicine, remote sensing, homeland security, social networking, and numerous other domains. In this paper we study and develop mathematical methods and algorithms for disease diagnosis and tissue characterization. The central hypothesis is that we can predict the occurrence of diseases with a certain level of confidence using supervised learning techniques applied to medical imaging datasets that include healthy and diseased subjects. We develop methods for calculation of sparse representations to classify imaging patterns and we explore the advantages of this technique over traditional texture-based classification. We introduce integrative sparse classifier systems that utilize structural block decomposition to address difficulties caused by high dimensionality. We propose likelihood functions for classification and decision tuning strategies. We performed osteoporosis classification experiments on the TCB challenge dataset. TCB contains digital radiographs of the calcaneus trabecular bone of 87 healthy and 87 osteoporotic subjects. The scans of healthy and diseased subjects show little or no visual differences, and their density histograms have significant overlap. We applied 30-fold cross-validation to evaluate the classification performance of our methods and compared them to a texture-based classification system. Our results show that ensemble sparse representations of imaging patterns provide very good separation between groups of healthy and diseased subjects and perform better than conventional sparse and texture-based techniques.
Multiclass vertebral fracture classification using ensemble probability SVM with multi-feature selection
Lumbar vertebral fractures seriously endanger patients’ health and are associated with high mortality. Because the differences among various fracture features in CT images are subtle, multiclass vertebral fracture classification poses a great challenge for computer-aided diagnosis systems. To solve this problem, this paper proposes a multiclass PSVM ensemble method with multi-feature selection to recognize lumbar vertebral fractures from spine CT images. In the proposed method, firstly, the active contour model is utilized to segment the lumbar vertebral bodies, which is helpful for the subsequent feature extraction. Secondly, different image features are extracted, including 3 geometric shape features, 3 texture features, and 5 height ratios. The importance of these features is analyzed and ranked using the infinite feature selection method, thus selecting different feature subsets. Finally, three multiclass probability SVMs with a binary tree structure are trained on three datasets, and a weighted voting strategy is used for the final decision fusion. To validate the effectiveness of the proposed method, probability SVM, K-nearest neighbor, and decision tree base classifiers are compared, with and without feature selection. Experimental results on 25 spine CT volumes demonstrate the advantage of the proposed method over the other classifiers, in terms of both classification accuracy and Cohen’s kappa coefficient.
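A toy sketch of the ensemble idea (probability-output SVMs trained on separate feature subsets, fused by weighted voting) is given below with synthetic data. The feature grouping, the weighting scheme, and the flat multiclass handling (rather than the binary-tree structure described above) are simplifications and assumptions, not the authors' exact design.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical stand-in: 11 features (3 shape, 3 texture, 5 height ratios),
# 4 fracture classes.
X, y = make_classification(n_samples=300, n_features=11, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
subsets = [slice(0, 3), slice(3, 6), slice(6, 11)]      # shape / texture / ratios
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

members, weights = [], []
for cols in subsets:
    clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr[:, cols], y_tr)
    members.append((cols, clf))
    weights.append(clf.score(X_tr[:, cols], y_tr))      # simple training-accuracy weight

proba = sum(w * clf.predict_proba(X_te[:, cols])
            for (cols, clf), w in zip(members, weights)) / sum(weights)
y_pred = proba.argmax(axis=1)
print("accuracy:", accuracy_score(y_te, y_pred),
      "kappa:", cohen_kappa_score(y_te, y_pred))
```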
Poster Session: Brain
Cranial localization in 2D cranial ultrasound images using deep neural networks
Pooneh R. Tabrizi, Awais Mansoor, Rawad Obeid, et al.
Premature neonates with intraventricular hemorrhage (IVH) followed by post hemorrhagic hydrocephalus (PHH) are at high risk for morbidity and mortality. Cranial ultrasound (CUS) is the most common imaging technique for early diagnosis of PHH during the first weeks after birth. Head size is one of the important indexes in the evaluation of PHH with CUS. In this paper, we present an automatic cranial localization method to help head size measurement in 2D CUS images acquired from premature neonates with IVH. We employ deep neural networks to localize the cranial region and minimum area bounding box. Separate deep neural networks are trained to detect the space parameters (position, scale, and orientation) of the bounding box. We evaluated the performance of the method on a set of 64 2D CUS images obtained from premature neonates with IVH through five-fold cross validation. Experimental results showed that the proposed method could estimate the cranial bounding box with the center point position error value of 0.33 ± 0.32 mm, the orientation error value of 1.75 ± 1.31 degrees, head height relative error (RE) value of 1.62 ± 2.9 %, head width RE value of 1.22 ± 1.24 %, head surface RE value of 2.27 ± 3.04 %, average Dice similarity score of 0.97 ± 0.01, and Hausdorff distance of 0.69 ± 0.46 mm. The method is computationally efficient and has the potential to provide automatic head size measurement in the clinical evaluation of neonates.
Learning imbalanced semantic segmentation through cross-domain relations of multi-agent generative adversarial networks
Mina Rezaei, Haojin Yang, Christoph Meinel
Inspired by the recent success of generative adversarial networks (GANs), we propose a multi-agent GAN, named 3DJoinGANs, for handling imbalanced training data in the task of semantic segmentation. Our proposed method comprises two conditional GANs with four agents: two segmentors and two discriminators. The proposed framework learns a joint distribution of magnetic resonance images (MRI) and computed tomography (CT) images from different brain diseases by enforcing a weight-sharing constraint. While the first segmentor is trained on 3D multi-modal MRI to learn semantic segmentation of brain tumors, the first discriminator classifies whether the segmentor’s predicted output is real or fake. On the other hand, the second segmentor takes 3D multi-modal CT images to learn segmentation of brain stroke lesions, and the second discriminator classifies between the segmentor’s output and ground-truth data annotated by an expert. We show that 3DJoinGANs is able to mitigate imbalanced-data problems and improve segmentation results through oversampling and training on a joint distribution of cross-domain images. The proposed architecture has shown promising performance on the ISLES-2018 benchmark for segmentation of 3D multi-modal ischemic stroke lesions and on semantic segmentation of 3D multi-modal brain tumors from the BraTS-2018 challenge.
Spatial and depth weighted neural network for diagnosis of Alzheimer’s disease
Objective and efficient diagnosis of Alzheimer’s disease (AD) has been a major topic of extensive research in recent years, and promising results have been shown for imaging markers using magnetic resonance imaging (MRI) data. Besides conventional machine learning methods, deep learning based methods have been developed in several studies, where layer-by-layer neural networks were proposed to extract features for disease classification from patches or whole images. However, as the disease develops from subcortical nuclei to cortical regions, specific brain regions with morphological changes might contribute to the diagnosis of disease progression. Therefore, we propose a novel spatially and depth-weighted neural network structure to extract effective features and further improve the performance of AD diagnosis. Specifically, we first use group comparison to detect the most distinctive AD-related landmarks, and then sample landmark-based image patches as our training data. In the model structure, with a 15-layer DenseNet as the backbone, we introduce an attention bypass to estimate spatial weights in the image space that guide the network to focus on specific regions. A squeeze-and-excitation (SE) mechanism is also adopted to further weight the feature map channels. We used 2335 subjects from public datasets (i.e., ADNI-1, ADNI-2 and ADNI-GO) for the experiments, and the results show that our framework achieves 90.02% accuracy, 81.25% sensitivity, and 96.33% specificity in distinguishing AD patients from normal controls.
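The squeeze-and-excitation channel weighting mentioned above is a standard building block; a minimal 3D PyTorch version is sketched below for illustration. The reduction ratio and the patch dimensions are assumptions, not the values used in the paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel re-weighting for 3D feature maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (N, C, D, H, W) patch features
        w = x.mean(dim=(2, 3, 4))               # squeeze: global average pool
        w = self.fc(w)                          # excitation: per-channel weights
        return x * w.view(*w.shape, 1, 1, 1)    # re-scale feature map channels

x = torch.randn(2, 64, 8, 8, 8)
print(SEBlock(64)(x).shape)                     # torch.Size([2, 64, 8, 8, 8])
```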
Study on discrimination of Alzheimer’s disease states using an ensemble neural network’s model
Junsik Eom, Hanbyol Jang, Sewon Kim, et al.
Alzheimer’s Disease (AD) is an irreversible disease that gradually worsens with time. Therefore, early diagnosis of Alzheimer’s disease is important to prevent brain tissue damage and treat the patient properly. Mild Cognitive Impairment (MCI) is a prodromal stage of AD that does not impair the patient’s ability to carry out functional activities of daily life, apart from a minor cognitive deficiency. Since MCI can be detected at the earliest stage of AD, it is critical to identify patients with MCI to delay the progression of AD. It is possible to distinguish patients with AD, MCI, and Normal Control (NC) from one another by brain volume, hippocampal size, and the patient’s clinical information. The brain and hippocampus gradually shrink in size and shape as AD develops. In this study, we propose a deep learning-based technique to classify patients with AD, MCI, and NC from brain Magnetic Resonance (MR) images. Deep learning has shown human-level performance in many studies, including medical image analysis with a constrained amount of training data. We propose a deep learning-based ensemble model which consists of 3 Convolutional Neural Networks (CNN) [1] with the Network In Network (NIN) [2] architecture. Each block uses a 3x3 convolution followed by a 1x1 convolution to reduce the number of trainable parameters and better extract features for classification. In addition, Global Average Pooling (GAP) is used instead of Fully-Connected (FC) layers to avoid overfitting by reducing the number of trainable parameters. The ensemble model achieves 81.66% accuracy in classifying the 3 classes.
Feasibility study of deep neural networks to classify intracranial aneurysms using angiographic parametric imaging
Purpose: Angiographic Parametric Imaging (API) based on Digital Subtraction Angiography (DSA) of Intracranial Aneurysms (IA) can provide parameters related to contrast flow. In this study we propose to investigate the use of a Deep Neural Network (DNN) to analyze API parameters to classify IAs as un-treated or treated, quantify the prediction accuracy, and compare its performance with the Naïve Bayes (NB) and K-Nearest Neighbor (KNN) algorithms. Materials and Methods: DSA scans were obtained from patients with un-treated and treated IAs. Three datasets were created based on treatment method: coiled, flow-diverted and combined. These scans were analyzed to provide API parameters for the IA and corresponding main artery. IA parameters were normalized to the main artery parameters. Data were augmented by adding Gaussian noise. The DNN, NB and KNN models were trained on API parameters and tested to classify aneurysms as un-treated or treated. This was performed on each dataset for both normalized and un-normalized data. Results: The DNN had an accuracy and ROC AUC of 72.4% and 0.80 respectively on un-normalized coiled data, 87.9% and 0.95 respectively on normalized coiled data, 73.9% and 0.79 respectively on un-normalized flow-diverted data, 85.3% and 0.80 respectively on normalized flow-diverted data, 62.9% and 0.64 respectively on un-normalized combined data, and 64.8% and 0.73 respectively on normalized combined data. Conclusions: This study demonstrates the feasibility of using DNNs to classify IAs and make other clinical predictions using normalized API data with treatment methods separated; the DNN was also more effective than the other classifiers.
Non-invasive genotype prediction of chromosome 1p/19q co-deletion by development and validation of an MRI-based radiomics signature in lower-grade gliomas
Yuqi Han, Zhen Xie, Yali Zang, et al.
We aimed to pre-operatively and non-invasively predict 1p/19q co-deletion in grade II and III (lower-grade) glioma using a radiomics method based on magnetic resonance imaging (MRI). We studied 105 patients pathologically diagnosed with lower-grade glioma. We extracted 647 MRI-based features from T2-weighted images and selected discriminative features with LASSO logistic regression on the training cohort (n=69). Radiomics, clinical, and combined models were constructed separately to verify the predictive performance of the radiomics signature. The predictability of the three models was validated on a time-independent validation cohort (n=36). Finally, 7 discriminative radiomic features were used to construct the radiomics signature, which demonstrated satisfactory performance on both the training and validation cohorts with AUCs of 0.822 and 0.731, respectively. In particular, the combined model incorporating the radiomics signature and the clinico-radiological factors achieved the best discriminative capability, with AUCs of 0.911 and 0.866 for the training and validation cohorts, respectively.
Diagnosis of OCD using functional connectome and Riemann kernel PCA
Obsessive-compulsive disorder (OCD) is a mental disorder characterized by repeated thoughts or behaviors, and is also associated with anxiety and tics. Clinically, the diagnosis of OCD mainly depends on subjects’ symptoms and psychological rating scales. In this study, we propose an imaging-based diagnosis method using functional MRI to classify OCD patients and healthy controls, with a novel log-Euclidean kernel principal component analysis (PCA) as the feature extractor. In particular, a functional connectivity (FC) matrix was computed for each subject from the FC correlations of each pair of brain regions of interest. To better reduce feature dimension and extract the most discriminative features, we propose to use the log-Euclidean geodesic distance as the distance between two matrices and to apply Gaussian kernel PCA to the FC matrices for feature extraction, given that the graph Laplacian of an FC matrix is a symmetric positive definite (SPD) matrix and the set of SPD matrices forms a Riemannian manifold. We further employed gradient boosted decision trees (XGBoost) to classify the features extracted by the log-Euclidean kernel PCA and diagnose the patient groups. Results show that the classification accuracy reaches 91.8% with 90.7% sensitivity and 92.6% specificity, which outperforms current state-of-the-art imaging-based diagnosis methods, such as the 85% reported in an EEG study. Next, by evaluating the feature importance in the classifier, we found that most of the contributing connections are cerebellum-related, such as the cerebellar vermis. These findings may help the understanding of the pathology of OCD and provide a surrogate means for clinical diagnosis.
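A hedged sketch of the pipeline described above (matrix logarithms of SPD connectivity matrices, a Gaussian kernel over their Frobenius distances, precomputed-kernel PCA, and an XGBoost classifier) is shown below with synthetic time series. The regularization added to make the matrices SPD, the bandwidth heuristic, and all hyperparameters are assumptions, not the authors' choices.

```python
import numpy as np
from numpy.linalg import eigh
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import KernelPCA
from xgboost import XGBClassifier

def spd_log(m, eps=1e-6):
    """Matrix logarithm of a symmetric positive (semi-)definite matrix."""
    vals, vecs = eigh(m)
    return (vecs * np.log(np.clip(vals, eps, None))) @ vecs.T

# Synthetic stand-in: 40 subjects, 90-region functional connectivity matrices,
# regularized to be SPD before taking the matrix logarithm.
rng = np.random.default_rng(0)
timeseries = rng.normal(size=(40, 120, 90))
fcs = [np.corrcoef(ts, rowvar=False) + 1e-3 * np.eye(90) for ts in timeseries]
logs = np.stack([spd_log(c).ravel() for c in fcs])      # log-Euclidean embedding

# Gaussian kernel over log-Euclidean (Frobenius) distances, then kernel PCA.
d2 = squareform(pdist(logs, "sqeuclidean"))
K = np.exp(-d2 / d2[d2 > 0].mean())                     # bandwidth heuristic
feats = KernelPCA(n_components=10, kernel="precomputed").fit_transform(K)

y = rng.integers(0, 2, size=40)                         # stand-in OCD vs control labels
clf = XGBClassifier(n_estimators=200, max_depth=3).fit(feats, y)
print("train accuracy:", clf.score(feats, y))
```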
Poster Session: Breast
Evaluation of U-net segmentation models for infarct volume measurement in acute ischemic stroke: comparison with fixed ADC threshold-based methods
Yoon-Chul Kim, Ji-Eun Lee, Inwu Yu, et al.
Ischemic stroke volume is a strong predictor of functional outcome and may play a role in decision making of reperfusion therapy in the late time window (< 6hr of stroke onset to MRI time) when it is obtained along with penumbra volume. Automatic diffusion lesion segmentation can be performed using a commercial software package and is typically based on a fixed apparent diffusion coefficient (ADC) threshold. ADC values alone may not be guaranteed to be highly accurate in the identification of diffusion lesions. Deep learning has the potential to improve the accuracy of diffusion lesion segmentation, provided that a large set of correctly labeled lesion mask data is used for training. The purpose of this study is to evaluate deep learning-based segmentation methods and compare them with three fixed ADC threshold-based methods. U-net was adopted to train a segmentation model. Two U-net models were developed: a model “U-net (DWI+ADC)” trained from DWI and ADC data, and a model “U-net (DWI)” trained from DWI data only. 296 subjects were used for training, and 134 subjects were used for testing. An expert neurologist manually delineated infarct masks on DWI, which served as ground-truth reference. Lesion volume measurements from the two U-net methods and three fixed ADC threshold-based methods were compared against lesion volume measurements from manual segmentation. In testing, the “U-net (DWI+ADC)” method outperformed other methods in lesion volume measurement, with the smallest root-mean-square error of 2.96 ml and the highest Pearson correlation coefficient of 0.997. The proposed method has the potential to automatically measure diffusion lesion volume in a fast and accurate manner, in patients with acute ischemic stroke.
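To make the fixed-ADC-threshold baseline concrete, the sketch below shows one hedged way to derive a lesion mask, a volume in millilitres, and a Dice score against a manual mask. The voxel volume and the cutoff value are illustrative assumptions (the paper compares three threshold-based methods without those values being restated here), and the arrays are synthetic.

```python
import numpy as np

VOXEL_ML = 0.002                     # hypothetical voxel volume in millilitres

def adc_threshold_mask(adc, threshold=620e-6):
    """Fixed-ADC-threshold lesion mask; the cutoff (in mm^2/s) is an assumption,
    chosen near commonly reported values around 600-620 x 10^-6 mm^2/s."""
    return adc < threshold

def lesion_volume_ml(mask):
    return mask.sum() * VOXEL_ML

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-8)

# Hypothetical comparison against an expert's manual mask:
rng = np.random.default_rng(0)
adc = rng.uniform(300e-6, 1200e-6, size=(20, 128, 128))
manual = adc < 640e-6                                  # stand-in ground truth
auto = adc_threshold_mask(adc)
print(f"volume error: {abs(lesion_volume_ml(auto) - lesion_volume_ml(manual)):.2f} ml,"
      f" Dice: {dice(auto, manual):.3f}")
```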
Multi-path deep learning model for automated mammographic density categorization
Xiangyuan Ma, Caleb Fisher, Jun Wei, et al.
Breast density is one of the strongest risk factors for breast cancer. The purpose of this study is to develop a deep learning model for BI-RADS density classification on digital mammograms (DM). With IRB approval, 2581 DMs were retrospectively collected from 672 women in our institution. We designed a multi-path DCNN (MP-DCNN) to classify each DM into one of four BI-RADS density categories. The MP-DCNN has four inputs: (1) the subsampled DM (800 μm pixel spacing), (2) a mask of dense area (MDA) obtained with a U-net (800 μm pixel spacing), (3) the largest square region of interest (ROI) within the mammographic breast (100 μm pixel spacing), and (4) the automated percentage of breast density (PD). As the baseline, a single-path DCNN with the subsampled DM (800 μm pixel spacing) as input was used. An experienced Mammography Quality Standards Act (MQSA) radiologist provided the BI-RADS density category and PD by interactive thresholding as the reference standards. With ten-fold cross-validation, the BI-RADS categories by MP-DCNN for 2068 of the 2581 cases agreed with the radiologist’s assessment (accuracy = 80.7%, weighted kappa = 0.83), and the accuracy reached 89.0% if the breasts were categorized as non-dense (BI-RADS A & B) and dense (BI-RADS C & D). For comparison, the single-path DCNN baseline model obtained agreement in 1906 of the 2581 cases (accuracy = 73.8%, weighted kappa = 0.75). The improvement in BI-RADS classification from the baseline to the MP-DCNN was statistically significant (p<0.001).
Exploratory learning with convolutional autoencoder for discrimination of architectural distortion in digital mammography
This work presents a deep learning approach based on an autoencoder to improve the detection of architectural distortion (AD) in digital mammography. AD can be the earliest sign of breast cancer, appearing before the formation of any mass or calcification. However, it is very difficult to detect, and almost 50% of cases are missed by radiologists. Thus, we designed an autoencoder, based on a convolutional neural network (CNN), to work as a feature descriptor in a computer-aided detection (CAD) pipeline with the objective of detecting AD in digital mammography. This model was trained with 140,000 regions of interest (ROI) extracted from clinical mammograms. These samples were divided into two groups, with and without AD, according to the radiologist's report. Validation was done by comparing the classifier performance when using the proposed autoencoder and other well-known feature descriptors commonly used for the task of detecting AD in digital mammograms. The results showed that the performance of the autoencoder is slightly higher than that of the other descriptors. However, the complexity and computational cost of the autoencoder are much higher when compared to the hand-crafted descriptors.
Computationally-efficient wavelet-based characterization of breast tumors using conventional B-mode ultrasound images
Breast cancer is among the leading causes of mortality in women worldwide. Early detection can increase the survival rate and limit cancer metastasis to other organs. Recently, the role of ultrasound (US) imaging in the diagnosis and monitoring of breast tumors, besides X-ray mammography, has been increasing. Several computer-aided diagnosis (CAD) systems have been proposed to improve the classification of breast tumors. This work presents a fast and computationally efficient technique to distinguish between malignant and benign breast tumors. The technique applies the wavelet packet transform (WPT) on conventional brightness-mode (B-mode) US images and then extracts several textural and morphological features from the approximation decomposition part. Features include first-order statistics (FOS), fractal dimension texture analysis (FDTA), spatial gray-level dependence matrices (SGLDM), and the area, perimeter, and compactness of the lesion. When the support vector machine was applied for classification on the original US images, the classifier exhibited 97.4% accuracy, 98.3% sensitivity, and 92.1% specificity. These performance parameters changed slightly to 96.9% accuracy, 96.7% sensitivity, and 97% specificity when the same features were extracted from the WPT. However, the frame classification time was reduced drastically from 1.1284 s using the original US images to 0.0604 s after incorporating the WPT. Hence, the proposed CAD system using WPT was able to decrease the computational complexity and processing time by at least eight times. This should improve the early detection of breast cancer by enabling real-time and noninvasive computer-aided diagnostic software.
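As a rough illustration of extracting first-order texture features from the level-1 wavelet packet approximation of a B-mode ROI, one might use PyWavelets as below; the wavelet family, the decomposition level, and the particular feature set are assumptions, not the authors' configuration.

```python
import numpy as np
import pywt
from skimage.measure import shannon_entropy

def wpt_approx_features(bmode_roi, wavelet="db4"):
    """First-order texture features from the level-1 WPT approximation band of a
    B-mode ultrasound ROI (wavelet choice and feature set are assumptions)."""
    wp = pywt.WaveletPacket2D(data=bmode_roi, wavelet=wavelet, mode="symmetric")
    approx = wp["a"].data                      # low-frequency approximation band
    mu, sigma = approx.mean(), approx.std()
    return {
        "mean": float(mu),
        "std": float(sigma),
        "skewness": float(((approx - mu) ** 3).mean() / (sigma ** 3 + 1e-8)),
        "entropy": float(shannon_entropy(approx)),
    }

roi = np.random.default_rng(0).integers(0, 256, size=(128, 128)).astype(float)
print(wpt_approx_features(roi))
```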
Breast dispersion imaging using undersampled rapid dynamic contrast-enhanced MRI
Linxi Shi, Subashini Srinivasan, Brian Hargreaves, et al.
Purpose: Rapid dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) enables the tracking of rapid contrast accumulation, which is an important indicator of cancer angiogenesis. Conventional pharmacokinetic models focusing on evaluating microvascular perfusion have limited ability to detect these features. In this work, we explore the performance of a novel dispersion pharmacokinetic model in discriminating benign and malignant tumor tissues and compare it with conventional non-dispersion methods. Methods: According to the convective-dispersion equation, changes in the microvascular architecture can be explained using dispersion parameters. The dispersion maps are estimated by fitting a modified local density random walk (mLDRW) dispersion model to the concentration-time curves (CTC) at the voxel level. Measurement of an arterial input function is no longer required. We compare the fitting performance of this model with three classic non-dispersion pharmacokinetic models (i.e., Tofts, extended Tofts and the comprehensive 2-compartment exchange model (2CXM)) that are commonly used for tumor characterization. The performance of the dispersion and non-dispersion parameter maps in discriminating benign and malignant tumors is compared using receiver operating characteristic (ROC) curves. The evaluation study is performed on 60 tumors acquired from 37 patients. Results: The goodness-of-fit is significantly improved with the mLDRW model. Compared to the non-dispersion parameter maps, the dispersion-related parameter maps provide the highest area under the ROC curve (AUC) of 0.96, with a sensitivity of 84.7% and specificity of 90.5%. Conclusion: In this work, we provide a new window to investigate the physiology of breast tumor microcirculation through the estimation of intravascular dispersion properties. The dispersion-related parameter demonstrates superior performance in discriminating benign and malignant tumors.
Deep Learning approach predicting breast tumor response to neoadjuvant treatment using DCE-MRI volumes acquired before and after chemotherapy
Purpose: In breast cancer medical follow-up, due to the lack of specialized diagnostic aid tools, many breast cancer patients may continue to receive chemotherapy even if they do not respond to the treatment. In this work, we propose a new approach for early prediction of breast cancer response to chemotherapy from two follow-up DCE-MRI exams. We present a method that takes advantage of a deep convolutional neural network (CNN) model to classify patients as responsive or non-responsive to chemotherapy.

Methods and material: To provide an early prediction of breast cancer response to chemotherapy, we used a two-branch convolutional neural network (CNN) architecture, taking as inputs two breast tumor MRI slices acquired before and after the first round of chemotherapy. We trained our model on 693 × 2 ROIs belonging to 42 patients with local breast cancer. Image preprocessing, volumetric image registration, and tumor segmentation were applied to the MRI exams beforehand. As ground truth, we used the anatomopathological reference standard provided for each patient.

Results: Within 80 training epochs, an accuracy of 92.72% was obtained using 20% as validation data. The Area Under the Curve (AUC) was 0.96.

Conclusion: In this paper, it was demonstrated that deep CNN models can be used to solve breast cancer follow-up related problems. The model obtained in this work can therefore be exploited in future clinical applications after its efficiency is further improved with additional data.
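A hypothetical, minimal PyTorch rendering of a two-branch architecture like the one described in the Methods (one branch per time point, fused before a responder/non-responder head) is shown below; layer sizes and the fusion strategy are assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

class TwoBranchCNN(nn.Module):
    """Two convolutional branches for the pre- and post-first-cycle DCE-MRI tumor
    slices, fused before a binary responder / non-responder head."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.pre, self.post = branch(), branch()
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x_pre, x_post):
        z = torch.cat([self.pre(x_pre), self.post(x_post)], dim=1)
        return self.head(z)                    # logit for "responder"

model = TwoBranchCNN()
logit = model(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
print(logit.shape)                             # torch.Size([4, 1])
```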
Computer-aided detection and classification of microcalcification clusters on full field digital mammograms using deep convolution neural network
Breast cancer is presently one of the most common cancers among women and has high morbidity and mortality worldwide. The emergence of microcalcifications (MCs) is an important early sign of breast cancer. In this study, a computer-aided detection and diagnosis (CAD) system is developed to automatically detect MC clusters (MCCs) and further provide a cancer likelihood prediction. Firstly, each individual MC is detected using our previously designed MC detection system, which includes preprocessing, MC enhancement, MC candidate detection, false positive (FP) reduction of MCs, and regional clustering procedures. Secondly, a deep convolutional neural network (DCNN) is trained on 394 clinical high-resolution full-field digital mammograms (FFDMs) containing biopsy-proven MCCs to discriminate MCC lesions. For cluster-based detection evaluation, a 90% sensitivity is obtained at an FP rate of 0.2 FPs per image. The classification performance of the whole system is validated on 70 cases and tested on 71 cases; for case-based diagnosis evaluation, the areas under the receiver operating characteristic curve (AUC) on the validation and testing sets are 0.945 and 0.932, respectively. Unlike previous work devoted to finding and selecting effective hand-crafted features, the proposed method replaces the manual feature extraction step with a deep convolutional neural network. The obtained results demonstrate that the proposed method is effective in the automatic detection and classification of MCCs.
Associations between mammographic phenotypes and histopathologic features in ductal carcinoma in situ
With the advent of regular breast screening, ductal carcinoma in situ (DCIS) diagnoses have risen in number, now making up almost 20% of all detected breast cancers at screening. Women diagnosed with DCIS are almost universally treated. However, recent studies suggest that up to 70% of DCIS lesions will never become life-threatening, which emphasizes the need for better risk stratification strategies. Considering that histopathologic features have been shown to be predictive of DCIS aggressiveness, our aim was to study associations between DCIS histopathologic features and mammographic phenotypes towards identifying readily-available mammography-based prognostic biomarkers. To this end, breast density and parenchymal texture features were extracted from screening digital mammograms and principal component analysis was used to capture the dominant textural components. Primary analyses included statistical tests to compare feature distributions between histopathologic subgroups. Logistic regression models were, then, applied to evaluate trends in DCIS histopathologic characteristics among mammographic features, after adjustment for risk factors known to affect mammographic phenotypes. We found that HER2 had a significant association with breast percent density (p = 0.006) and the first principal component (PC1) of texture features (p = 0.034). Our risk-factor-adjusted logistic regression analyses showed that breast percent density was predictive of HER2 status (AUC = 0.71), while prediction performance was further increased when PC1 was added to the model (AUC = 0.74). These findings provide preliminary evidence about the potential value of mammographic phenotypes in prediction of DCIS aggressiveness and could ultimately contribute to identifying patients who do not require treatment.
Synthesis and texture manipulation of screening mammograms using conditional generative adversarial network
Annotated data availability has always been a major limiting factor for the development of algorithms in the field of computer-aided diagnosis. The purpose of this study is to investigate the feasibility of using a conditional generative adversarial network (GAN) to synthesize high-resolution mammography images with semantic control. We feed a binary mammographic texture map to the generator to synthesize a full-field digital mammogram (FFDM). Our results show the generator quickly learned to grow anatomical details around the edges within the texture mask. However, we found the training unstable and the quality of the generated images unsatisfactory, due to the inherent limitation of the latent-space to sample-space mapping in the pix2pix framework. To synthesize high-resolution mammography images with semantic control, we identified that the critical challenge is to build efficient mappings between binary textures, with their great variety of pattern realizations, and the image domain.
Breast MRI radiomics for the pre-treatment prediction of response to neoadjuvant chemotherapy in node-positive breast cancer patients
Karen Drukker, Iman El-Bawab, Alexandra Edwards, et al.
The purpose of this study was to evaluate breast MRI radiomics in predicting, prior to any treatment, the response to neoadjuvant chemotherapy (NAC) in patients with invasive lymph node-positive breast cancer for 2 tasks: 1) prediction of pathologic complete response and 2) prediction of post-NAC lymph node (LN) status. Our study included 158 patients, with 19 showing post-NAC complete pathologic response (pathologic TNM stage T0,N0,MX) and 139 showing incomplete response; 42 patients were post-NAC LN-negative and 116 were post-NAC LN-positive. Only pre-NAC MRIs underwent computer analysis, initialized by an expert breast radiologist indicating index cancers and metastatic axillary sentinel lymph nodes on DCE-MRI images. Forty-nine radiomic features were extracted, both for the primary cancers and for the metastatic sentinel lymph nodes. Since the dataset contained MRIs acquired at 1.5T and at 3.0T, we eliminated features affected by magnet strength, as demonstrated by the Mann-Whitney U-test through rejection of the null hypothesis that samples were selected from populations having the same distribution. ROC analysis was used to assess the performance of individual features in the 2 classification tasks. Eighteen features appeared unaffected by magnet strength, of which only a single pre-NAC tumor feature outperformed random guessing in predicting pathologic complete response. On the other hand, 13 and 10 pre-NAC lymph node features were able to predict pathologic complete response and post-NAC LN status, respectively, with the most promising feature being the standard deviation within the LN at the first post-contrast DCE-MRI time point (areas under the ROC curve: 0.79 (standard error 0.06) and 0.70 (0.05), respectively).
Developing a new quantitative imaging marker to predict pathological complete response to neoadjuvant chemotherapy
Faranak Aghaei, Alan B. Hollingsworth, Seyedehnafiseh Mirniaharikandehei, et al.
Neoadjuvant (NAT) chemotherapy is a standard treatment option for many breast cancer patients. Patients who achieve pathologic complete response (pCR) after NAT usually have a better prognosis than those who do not. Thus, prediction of pathologic response is an important clinical issue for breast cancer patients. The purpose of this study is to develop and analyze a new computer-aided detection (CAD) and machine learning scheme using quantitative kinetic and texture-based image features extracted from breast magnetic resonance imaging (MRI) performed before and after NAT chemotherapy to predict pCR. Images of 153 breast cancer patients who underwent NAT were included in the analytical dataset. Among them, 52 achieved pCR and 101 were non-pCR after NAT. A CAD scheme was developed to segment the breast region and compute a total of 38 kinetic and texture features from the segmented breast regions. An image feature reduction method was used to identify 8 optimal features from the original feature pool. Then, a fine Gaussian support vector machine (FGSVM) based classifier was used to classify the two categories of pCR and non-pCR cases, optimized and tested using a ten-fold cross-validation method. The results indicated that using features extracted from post-chemotherapy images yielded a higher area under the receiver operating characteristic curve (AUC) of 0.81±0.04 and an accuracy of 82% compared to using pre-chemotherapy images. This study demonstrated that image features extracted from breast MR images acquired after NAT chemotherapy have good potential for predicting pathologic complete response.
Deep learning of sub-regional breast parenchyma in mammograms for localized breast cancer risk prediction
Giacomo Nebbia, Aly Mohamed, Ruimei Chai, et al.
Breast cancer risk prediction refers to the task of predicting whether a healthy patient is likely to develop breast cancer in the future. Breast density and parenchymal texture features are well-known imaging-based breast cancer risk markers that can be qualitatively/visually assessed by radiologists or even quantitatively measured by computerized software. Recently, deep learning has emerged as a promising strategy to solve tasks in a variety of classification and prediction scenarios, including breast imaging. Building on this premise, we propose a deep learning-based modeling method for breast cancer risk prediction in a case-control setting purely using prior normal screening mammogram images. In addition, considering the fact that clinical statistics shows that the upper outer quadrant is the most common site of origin for breast cancer, we designed a simple experiment on 226 patients (a total of 1,632 images) to explore the concept of localized breast cancer risk prediction. We built two deep learning models with the same settings but fed one with the top halves of the mammogram images (corresponding to the outer portion of a breast) and the other with the bottom halves (corresponding to the inner portion of a breast). Our preliminary results showed that the top halves have a higher prediction performance (AUC=0.89) than the bottom halves (AUC=0.69) in predicting the case/control outcome. This indicates a relation between localized imaging features extracted from a sub-region of the full mammogram images and the underlying risk of developing breast cancer in this specific sub-region.
Malignant microcalcification clusters detection using unsupervised deep autoencoders
Rui Hou, Yinhao Ren, Lars J. Grimm, et al.
Detection and localization of microcalcification (MC) clusters are very important in mammography diagnosis. Supervised MC detectors require learning from extracted individual MCs and MC clusters. However, they are limited by the number of available datasets, given that MC images are hard to obtain. In this work, we propose a method to detect malignant microcalcification (MC) clusters using an unsupervised, one-class, deep convolutional autoencoder. Specifically, we designed a deep autoencoder model where only patches extracted from normal cases’ mammograms are used during training. We then applied our trained model to patches extracted from testing images. Our training dataset contains 408 normal subjects, including 1961 full-field digital mammography images. Our testing dataset contains 276 subjects: 106 of them were patients diagnosed with Ductal Carcinoma In-Situ (DCIS); 70 of them were diagnosed with Invasive Ductal Carcinoma (IDC); the remaining 100 are normal cases containing 484 negative screening mammograms. Patches extracted from DCIS and IDC cases (positive patches) contain MC clusters, whereas patches extracted from normal cases (negative patches) do not. As the model is trained only on negative images that do not contain MCs, it cannot reconstruct MCs well, and thus the reconstruction error is larger on positive patches than on negative patches. Our detection algorithm’s decision is made based on the maximum squared error between the autoencoder’s input and output patches. To confirm that the results were not simply due to blurring, we compared our designed detector with unsharp masking using Gaussian blur. Using the unsupervised autoencoder on testing patches of size 64×64 achieves an AUC of 0.93. The best performance on testing patches using Gaussian blur with a kernel size of 11 has an overall AUC of 0.82.
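The per-patch anomaly score described above (maximum squared reconstruction error) is simple to express; a small sketch is shown below, with the trained autoencoder and the decision threshold left as commented placeholders since those are not specified here in code form.

```python
import numpy as np

def max_squared_error(inputs, reconstructions):
    """Per-patch anomaly score: maximum squared reconstruction error.
    Patches containing microcalcifications, unseen during training on normal
    tissue, reconstruct poorly and therefore score high."""
    err = (inputs - reconstructions) ** 2          # (n_patches, 64, 64)
    return err.reshape(err.shape[0], -1).max(axis=1)

# Hypothetical use with any trained autoencoder `ae`:
# recon = ae.predict(test_patches)
# scores = max_squared_error(test_patches, recon)
# y_pred = scores > threshold            # threshold chosen on validation data

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64, 64))
print(max_squared_error(x, x + rng.normal(scale=0.05, size=x.shape)))
```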
Automated deep-learning method for whole-breast segmentation in diffusion-weighted breast MRI
Lei Zhang, Ruimei Chai, Aly A. Mohamed, et al.
The essential sequences in breast magnetic resonance imaging (MRI) are the dynamic contrast-enhanced (DCE) images, which are widely used in clinical settings. Diffusion-weighted imaging (DWI) MRI also plays an important role in many diagnostic applications and in developing novel imaging biomarkers. Compared to DCE MRI, the technical advantages of DWI include a shorter acquisition time, no need for administration of any contrast agent, and availability on most commercial scanners. Segmenting the whole-breast region is an essential pre-processing step in many quantitative and radiomics breast MRI studies. However, it is a challenging task for computerized methods due to the low intensity contrast along the breast chest wall boundaries. While several studies have reported computational methods for automated whole-breast segmentation in DCE MRI, segmentation in DWI MRI is still underdeveloped. In this paper, we propose to use deep learning and transfer learning methods to segment the whole breast in DWI MRI, by leveraging pretraining on a DCE MRI dataset. Experiments are reported on multiple breast MRI datasets, including an external evaluation dataset, and encouraging results are demonstrated.
A shell and kernel descriptor based joint deep learning model for predicting breast lesion malignancy
Zhiguo Zhou, Genggeng Qin, Pingkun Yan, et al.
Predicting lesion malignancy accurately and reliably in digital breast tomosynthesis is critically important for breast cancer screening. Tumor shape and the interaction between the tumor and the surrounding normal tissue are two of the most important indicators in radiologists’ reading. On the other hand, the density and texture of the region within the tumor also play an important role in malignancy classification. Inspired by the above observations, shell and kernel descriptors are proposed in this work for breast lesion malignancy prediction, in which the shell descriptor describes the tumor shape and surrounding normal tissue, while the kernel descriptor describes the internal tumor region. A joint deep learning model based on AlexNet was designed to learn and fuse features from the shell and kernel. Additionally, to obtain more reliable predictive results, a multi-objective optimization algorithm and a reliable classifier fusion strategy were used to train the predictive model and optimally combine the outputs from both the shell and kernel descriptors. In this study, 278 malignant and 685 benign cases were used with 2-fold cross-validation. Compared with single-descriptor models using either the shell or the kernel alone, the experimental results demonstrated that the combined shell and kernel descriptors capture the most important features, and the corresponding predictive model achieved the best performance.
Poster Session: Cell
Automatic cell segmentation using mini-u-net on fluorescence in situ hybridization images
Fluorescence in situ hybridization (FISH) is a molecular cytogenetic technique that provides reliable imaging biomarkers to diagnose cancer and genetic disorders at the cellular level. One prerequisite step to identify carcinoma cells in FISH images is to accurately segment cells, so as to quantify DNA/RNA signals within each cell. Manual cell segmentation is a tedious and time-consuming task, which demands automatic methods. However, automatic cell segmentation is hindered by low image contrast, weak cell boundaries, and touching cells in FISH images. In this paper, we develop a fast mini-U-Net method to address these challenges. Several special characteristics are tailored into the mini-U-Net, including connections between input images and their feature maps to accurately localize cells, mlpcon (multilayer perceptron + convolution) layers to segment cell regions, and morphology operators together with the watershed algorithm to separate individual cells. In comparison with the U-Net, the mini-U-Net has fewer training parameters and a lower computational cost. Validation on 510 cells indicated that the Dice coefficients of the mini-U-Net and U-Net were 80.20% and 77.27%, and the area overlap ratios were 69.17% and 68.04%, respectively. These promising results suggest that the mini-U-Net can generate accurate cell segmentations for fully automatic FISH image analysis.
Poster Session: Colon
A pyramid machine learning model for polyp classification via CT colonography
Weiguo Cao, Marc J. Pomeroy, Perry J. Pickhardt, et al.
In this article, we propose a pyramid multilayer machine learning method that combines classification and feature selection in the same model for polyp classification. The model picks the best attributes from three different texture features to form a new descriptor set with much better classification results. In general, this method has several good properties, including generalization, extendibility, and monotonicity. The original metric image descriptor (MD) and the post-histogram-equalized metric image descriptor (PMD) form a descriptor pair as the preliminary unit of this pyramid framework. The model is driven by a feature-merging unit that is run iteratively until the final results are obtained. After every feature-merging step, a new attribute group is selected to construct a shorter but much stronger new descriptor. To this end, a forward selection method is adopted to select only the attributes of every descriptor with positive gains for classification. Therefore, the feature-merging procedure guarantees the monotonicity of the classification performance in practice. In our experiments, a simple scheme is designed to illustrate its construction and performance. Three image metrics are selected, namely intensity, gradient, and curvature, which are fed into the gray-level co-occurrence matrix (CM) model to construct polyp descriptors. Random forest is chosen as the classifier and the Gini coefficient is used as the importance score. The AUC (area under the receiver operating characteristic curve) is our evaluation measure. Experimental results showed that the pyramid learning model outperforms other methods by 4%-6% in AUC.
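The co-occurrence-matrix descriptor plus random-forest importance ranking used as building blocks above can be illustrated with standard libraries; the quantization level, distances, angles, property set, and the synthetic ROIs below are all assumptions rather than the paper's settings.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops   # `greycomatrix` in older skimage
from sklearn.ensemble import RandomForestClassifier

PROPS = ("contrast", "dissimilarity", "homogeneity", "energy", "correlation", "ASM")

def cm_descriptor(image_u8, levels=32):
    """Co-occurrence-matrix texture descriptor for one polyp metric image
    (e.g. an intensity, gradient, or curvature map) quantized to `levels` gray levels."""
    q = (image_u8.astype(float) / 256 * levels).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    return np.concatenate([graycoprops(glcm, p).ravel() for p in PROPS])

# Hypothetical polyp ROIs: a random forest supplies both classification and
# Gini importances, which can drive attribute selection.
rng = np.random.default_rng(0)
rois = rng.integers(0, 256, size=(60, 48, 48), dtype=np.uint8)
X = np.stack([cm_descriptor(r) for r in rois])
y = rng.integers(0, 2, size=60)                          # benign vs malignant
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:10]     # strongest attributes
print("strongest attributes:", top)
```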
Polyp classification by Weber’s Law as texture descriptor for clinical colonoscopy
Weber’s law, on which the WLD image feature descriptor is based, states that the ratio of the increment threshold to the background intensity is a constant. WLD has been used in facial recognition, structure detection, and tissue classification in X-ray images. In this paper, WLD is explored for polyp classification in color colonoscopy images for the first time. An open, online colonoscopy image database is used to evaluate the descriptor. The database contains 74 polyps, including 19 benign polyps and 55 malignant ones. Each polyp has a white light image (WLI) and a narrow band image (NBI), both obtained by the same fibro-colonoscope from the same patient. WLD image texture features are extracted from the three color channels of (1) color WLI, (2) color NBI and (3) WLI+NBI. The extracted features are analyzed, ranked, and classified using a Random Forest package based on the merit of the area under the curve (AUC) of the receiver operating characteristic (ROC). The performance of WLD is quantitatively documented by the AUC, the ROC curve, the P-R (precision-recall) plot, and the accuracy measure, with comparison to commonly used features such as the Haralick and local binary pattern feature descriptors. The results demonstrate the advantage of WLD for polyp classification in terms of these quantitative measures.
Texture feature analysis of neighboring colon wall for colorectal polyp classification
Colorectal cancer (CRC) remains one of the leading causes of cancer deaths today. Since precancerous colorectal polyps progress slowly into cancer, screening methods are highly effective in reducing the overall mortality rate of CRC by removing them before they develop into later stages. Virtual colonoscopy has been shown to be a practical screening method and to provide high sensitivity and specificity for distinguishing hyperplastic polyps from precancerous adenomas or adenocarcinomas through the use of texture feature analysis. We hypothesize that effects from non-hyperplastic polyps, such as angiogenesis from adenocarcinomas, may result in changes to the texture of the colon wall that could help with computer-aided diagnosis of colorectal polyps. Here we present preliminary results of incorporating the texture features of the neighboring colon wall tissue into the diagnostic classification. We use gray-level co-occurrence matrices to calculate the established Haralick features and a set of supplemental features for colorectal polyp regions of interest, as well as for the neighboring colon wall environment of the polyp. A random forest package was then used to perform classification tests on different sets of features, with and without the inclusion of the environment, to obtain the area under the curve (AUC) of the receiver operating characteristic (ROC). Experiments show approximately a 1% increase in overall classification performance with the inclusion of the environment features.
The detection of non-polypoid colorectal lesions using the texture feature extracted from intact colon wall: a pilot study
The detection of non-polypoid colorectal lesions (e.g., flat and small sessile polyps) is still a challenging task for computer-aided detection (CADe) methods. Different from previous CADe methods, we propose a new scheme to detect these lesions using texture features extracted from the intact colon wall, since texture features are sensitive to subtle lesions and all the available information about the lesion is embedded in the colon wall. In this scheme, the inner and outer wall surfaces were segmented. Then, for each voxel of the inner surface, a fixed-size neighborhood was projected onto the colon model and the intersection volume of the projection through the colon model was selected as the volume of interest (VOI). From each VOI, three images were obtained: the original CT intensity and its gradient and curvature maps. Gray-scale co-occurrence matrices (CMs) were calculated from these 3 volumetric images, respectively. A total of 196 texture features (60 Haralick features and 6 CT histogram features extracted from each CM) were used to detect initial polyp candidates by a piecewise anomaly detection method (isolation forest), followed by supervised classification (random forests) for false positive (FP) reduction. The detection performance was evaluated by 10-fold cross-validation and free-response receiver operating characteristic analysis. We evaluated our method on 10 patients with 36 confirmed flat and small sessile polyps, including 16 flat, 18 sessile, and 2 pedunculated polyps. The presented detection method achieved 80% sensitivity with 9.98 FPs per dataset. The experimental results demonstrate that our method is a potential way to detect non-polypoid polyps, particularly flat and depressed ones.
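A minimal two-stage sketch of the detection pipeline described above (unsupervised candidate detection with an isolation forest, then supervised false-positive reduction with a random forest), assuming scikit-learn; the feature extraction from the VOIs, the contamination rate, and the forest sizes are placeholders, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

def detect_polyp_candidates(voi_features, contamination=0.05):
    """Stage 1: flag anomalous VOIs (initial polyp candidates) with an isolation forest."""
    iso = IsolationForest(n_estimators=200, contamination=contamination, random_state=0)
    labels = iso.fit_predict(voi_features)            # -1 = anomaly (candidate), +1 = normal wall
    return np.where(labels == -1)[0]

def reduce_false_positives(candidate_features, candidate_labels, test_features):
    """Stage 2: supervised false-positive reduction on the surviving candidates."""
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(candidate_features, candidate_labels)      # labels: 1 = true polyp, 0 = false positive
    return rf.predict_proba(test_features)[:, 1]
```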
Differentiation of polyps by clinical colonoscopy via integrated color information, image derivatives and machine learning
Clinical colonoscopy is currently the gold standard for polyp detection and resection. Both white light images (WLI) and narrow band images (NBI) can be obtained by the fibro-colonoscope from the same patient and are currently used as a diagnostic reference for differentiating hyperplastic polyps from adenomas. In this paper, we investigate the performance of WLI and NBI in different color spaces for polyp classification. A Haralick model with 30 co-occurrence matrix features is used in our experiments on 74 polyps, including 19 hyperplastic polyps and 55 adenomatous ones. The features are extracted from different color channels in each of three color spaces (RGB, HSV, chromaticity) and different derivative (intensity, gradient and curvature) images. The features from each derivative image in each color space are classified. The classification results from all the color spaces and all the derivative images are input to a greedy machine learning program to verify the necessity of integrating derivative image data and different color spaces. The feature classification and machine learning are implemented using the Random Forest package. The well-known area under the receiver operating characteristic curve is calculated to quantify the performance. The experiments validated the advantage of integrating the three derivatives of WLI and NBI and the three different color spaces for polyp classification.
Poster Session: Eyes
Early detection of retinopathy of prematurity stage using deep learning approach
Supriti Mulay, Keerthi Ram, Mohanasankar Sivaprakasam, et al.
Retinopathy of Prematurity (ROP) is a fibrovascular proliferative disorder which affects the developing peripheral retinal vasculature of premature infants. Early detection of ROP is possible in stage 1 and stage 2, which are characterized by a demarcation line and a ridge with width, which separate the vascularized retina from the peripheral retina. Detecting the demarcation line/ridge in neonatal retinal images is a complex task because of the low contrast of the images. In this paper we focus on detection of the ridge, the important landmark in ROP diagnosis, using a Convolutional Neural Network (CNN). Our contribution is to use the CNN-based model Mask R-CNN for demarcation line/ridge detection, allowing clinicians to better detect ROP stage 2. The proposed system applies a pre-processing step of image enhancement to overcome poor image quality. In this study we use labelled neonatal images and we explore the use of the CNN to localize the ridge in these images. We used a dataset of 220 images of 45 babies from the KIDROP project. The system was trained on 175 retinal images with ground truth segmentation of the ridge region. The system was tested on 45 images and reached a detection accuracy of 0.88, showing that deep learning detection with pre-processing by image normalization allows robust detection of ROP in early stages.
Longitudinal matching of in vivo adaptive optics images of fluorescent cells in the human eye using stochastically consistent superpixels
Fluorescence microscopy has transformed our understanding of modern biology. Recently, this technology was translated to the clinic using adaptive optics enhanced indocyanine green ophthalmoscopy, which enables retinal pigment epithelial cells to be fluorescently-labeled and imaged in the living human eye. Monitoring these cells across longitudinal images on the time scale of months is important for understanding blinding diseases, but remains challenging due to inherent eye-motion-caused distortions, substantial visit-to-visit image displacements, and weak cell boundaries due to the nature of fluorescence data. This paper introduces a stochastically consistent superpixel method to address these issues. First, large displacement optical flow is estimated by embedding global image displacements from a set of maximal stable extremal regions into a variational framework. Next, optical flow is utilized to initialize bilateral Gaussian processes that model superpixel movements. Finally, a generative probabilistic framework is developed to create consistent superpixels constrained with maximal likelihood criterion. Consistent superpixels were evaluated on images from 11 eyes which were longitudinally imaged over 3-12 months. Validation datasets revealed high accuracy across time points despite the presence of visit-to-visit changes.
Computer-based detection of age-related macular degeneration and glaucoma using retinal images and clinical data
Vinayak Joshi, Jeffrey Wigdahl, Jeremy Benson, et al.
Worldwide, glaucoma and age-related macular degeneration (AMD) cause 12.3% and 8.7% of the cases of blindness and/or vision loss, respectively. According to a 5-year study of Medicare beneficiaries, patients who undergo a regular eye screening, experience less decline of vision than those who had less-frequent examinations. A computer-based screening of retinopathies can be highly cost-effective and efficient; however, most auto-screening software address only one eye disease, limiting their clinical utility and cost-effectiveness. Therefore, we propose a computer-based retinopathy screening system for detection of AMD and glaucoma by integrating information from retinal fundus images and clinical data. First, the retinal image analysis algorithms were developed using Transfer Learning approach to determine presence or absence of the eye disease. The clinical data was then utilized to improve disease detection performance where the image-analysis based algorithms provided sub-optimal classification. The results for binary detection (present/absent) of AMD and Glaucoma were compared with the ground truth provided by a certified retinal reader. We applied the proposed method to a dataset of 304 retinal images with AMD, 299 retinal images with Glaucoma, and 2,341 control retinal images. The algorithms demonstrated sensitivity/specificity of 100%/99.5% for detection of any AMD, 82%/70% for detection of referable AMD, and 75%/81% for detection of referable Glaucoma. The automated detection results agree well with the ground truth suggesting its potential in screening for AMD and Glaucoma.
Fully-automated segmentation of optic disk from retinal images using deep learning techniques
Segmentation of the optic disk (OD) from retinal images is a crucial task for early detection of many eye diseases, including glaucoma and diabetic retinopathy. The main goal of this research is to facilitate early diagnosis of certain pathologies via fully automated segmentation of the OD from retinal images. We propose a deep learning-based technique to delineate the boundary of the OD from retinal images of patients with diabetic retinopathy and diabetic macular edema. In our method, we first localized the OD within a region of interest (ROI) using a random forest (RF). The RF is an ensemble algorithm which trains and combines multiple decision trees to produce a highly accurate classifier. We then used a convolutional neural network (CNN) based model to segment the OD from the chosen ROIs in the retinal images. The developed algorithm has been validated on 480,249 image patches extracted from 49 images of the public Indian diabetic retinopathy image dataset (IDRiD). This dataset includes images with large variability in terms of the spatial location of the OD and the presence of other eye lesions that resemble the contrast of the OD. Validation metrics including the averages of the Dice and Jaccard indexes (DI and JI), Hausdorff distance (HD), and absolute surface difference (ASD) were reported as 82.62 ± 11.07%, 71.78 ± 14.87%, 13.19 ± 10.90 mm, and 22.74 ± 19.78%, respectively. Compared with alternative methods such as K-nearest neighbors (KNN), deformable models, graph-cuts, and image thresholding, our method yielded higher accuracy for OD segmentation when evaluated against manual expert delineation. The algorithm-generated results demonstrate the usefulness of our proposed method for automated segmentation of the OD from retinal images.
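For reference, the two overlap metrics reported above can be computed from binary masks as in the short NumPy sketch below (non-empty masks assumed); the Hausdorff and surface-difference metrics are omitted for brevity.

```python
import numpy as np

def dice_index(pred, gt):
    """Dice index (DI) between two non-empty binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def jaccard_index(pred, gt):
    """Jaccard index (JI), i.e. intersection over union, between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union
```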
Poster Session: Head
Deep learning-based detection of anthropometric landmarks in 3D infants head models
Helena R. Torres, Bruno Oliveira, Fernando Veloso, et al.
Deformational plagiocephaly (DP) is a cranial deformity characterized by an asymmetrical distortion of an infant's skull. The diagnosis and evaluation of DP are performed using cranial asymmetry indexes obtained from cranial measurements, which can be estimated using anthropometric landmarks of the infant's head. However, manual labeling of these landmarks is a time-consuming and tedious task that is also prone to observer variability. In this paper, a novel framework to automatically detect anthropometric landmarks on 3D models of an infant's head is described. The proposed method is divided into two stages: (i) unfolding of the 3D head model surface; and (ii) landmark detection through a deep learning strategy. In the first stage, an unfolding strategy is used to transform the 3D mesh of the head model into a flattened 2D version of it. From the flattened mesh, three 2D informational maps are generated using specific head characteristics. In the second stage, a deep learning strategy is used to detect the anthropometric landmarks in a 3-channel image constructed by combining the informational maps. The proposed framework was validated on fifteen 3D synthetic models of infants' heads, achieving, on average over all landmarks, a mean distance error of 3.5 mm between the automatic detections and a manually constructed ground truth. Moreover, the estimated cranial measurements were comparable to those obtained manually, without statistically significant differences between them for most of the indexes. The obtained results demonstrate the good performance of the proposed method, showing the potential of this framework in clinical practice.
Quantitative evaluation of local head malformations from 3 dimensional photography: application to craniosynostosis
Liyun Tu, Antonio R. Porras, Albert Oh, et al.
The evaluation of head malformations plays an essential role in the early diagnosis, the decision to perform surgery and the assessment of the surgical outcome of patients with craniosynostosis. Clinicians rely on two metrics to evaluate the head shape: head circumference (HC) and cephalic index (CI). However, they present a high inter-observer variability and they do not take into account the location of the head abnormalities. In this study, we present an automated framework to objectively quantify the head malformations, HC, and CI from three-dimensional (3D) photography, a radiation-free, fast and non-invasive imaging modality. Our method automatically extracts the head shape using a set of landmarks identified by registering the head surface of a patient to a reference template in which the position of the landmarks is known. Then, we quantify head malformations as the local distances between the patient’s head and its closest normal from a normative statistical head shape multi-atlas. We calculated cranial malformations, HC, and CI for 28 patients with craniosynostosis, and we compared them with those computed from the normative population. Malformation differences between the two populations were statistically significant (p<0.05) at the head regions with abnormal development due to suture fusion. We also trained a support vector machine classifier using the malformations calculated and we obtained an improved accuracy of 91.03% in the detection of craniosynostosis, compared to 78.21% obtained with HC or CI. This method has the potential to assist in the longitudinal evaluation of cranial malformations after surgical treatment of craniosynostosis.
Predicting resection volumes within the nasal cavity to improve patients breathing
Current investigation techniques of the nasal cavity (e.g., rhinomanometry, endoscopy) are not always able to predict whether surgery will successfully relieve breathing. The presented approach uses medical imaging datasets (i.e., cone-beam computed tomography) to simulate the flow through the nasal cavity with a lattice Boltzmann (LB) simulation validated by Laser Doppler Anemometry (LDA). In order to find potential surgically relevant points (SRPs), the patient's LB-simulated pressure drop is compared with a critical pressure drop found by a sensitivity analysis (3 patients with nasal septum deviation, 1 person without breathing problems). Based on the SRPs, a developed optimizer reshapes the nasal cavity. All locations of SRPs in postoperative CT datasets show an increase in the cross-section of the nasal cavity. The difference in cross-section between the pre-operatively optimized nasal cavity and the postoperative CT is smaller than 15%. The optimization results are promising, and this method will be validated in a future clinical trial.
Poster Session: Heart
Automated scoring of aortic calcification in vertebral fracture assessment images
Luke Chaplin, Tim Cootes
The severity of abdominal aortic calcification (AAC) is a strong, independent predictor of cardiovascular disease (CVD). Vertebral fracture assessment (VFA) is a low-radiation screening tool which can be used to incidentally measure AAC. This work compares the performance of Haar-feature random forest classification with a U-Net-based convolutional neural network (CNN) segmentation to automatically quantify AAC. Clinical semiquantitative scores were also generated using the U-Net. Scores were calculated using the relative length of labelled calcification and compared to manual scoring. The U-Net outperformed the random forest and produced sensible segmentations and AAC scores, though it could not match human annotation accuracy.
Detection and classification of coronary artery calcifications in low dose thoracic CT using deep learning
Jordan D. Fuhrman, Jennie Crosby, Rowena Yip, et al.
Deep learning is expanding in the detection and diagnosis of abnormalities, including coronary artery calcification (CAC), in CT. CACs can also be visualized on low-dose thoracic screening CTs (LDCT), and thus, in this study, deep learning is investigated for the detection of CACs and assessment of their severity on LDCT images. The study dataset included 863 LDCT cases, each assigned a case severity score, which is related to the Agatston score, ranging between 0 and 12 (0 = no CAC present, 12 = severe CACs). Within the cardiac region, 224 × 224 pixel ROIs were extracted from each CT slice and input to a convolutional neural network (CNN). CNN-based features were extracted using a pre-trained VGG19 and fed to a support vector machine (SVM), yielding a slice likelihood score for the presence of CACs. Case prediction scores were obtained by using the maximum and mean scores of all slices belonging to that case. Area under the ROC curve (AUC) was used as a metric to assess the discrimination performance level. Using a randomly selected subset of images containing similar amounts of each severity subtype, the SVM performed better using the maximum slice score per case (AUC = 0.79, standard error = 0.03). While this AUC value does not reach those found in similar studies for diagnostic CT and cardiac CT angiography, this study demonstrates potential for deep learning use in LDCT screening programs.
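A condensed sketch of the slice-to-case scoring scheme described above (pre-trained VGG19 features, an SVM slice score, and max/mean aggregation per case), assuming TensorFlow/Keras and scikit-learn; it presumes 224 x 224 ROIs replicated to three channels, and the helper names are illustrative rather than the authors' code.

```python
import numpy as np
import tensorflow as tf

# Pre-trained VGG19 used as a fixed feature extractor for 224x224x3 cardiac-region ROIs.
backbone = tf.keras.applications.VGG19(weights="imagenet", include_top=False, pooling="avg")

def slice_scores(roi_batch, svm):
    """ROIs -> deep features -> SVM likelihood of CAC per slice.

    `svm` is assumed to be an sklearn.svm.SVC(probability=True) trained on backbone features.
    """
    x = tf.keras.applications.vgg19.preprocess_input(roi_batch.astype("float32"))
    feats = backbone.predict(x, verbose=0)
    return svm.predict_proba(feats)[:, 1]

def case_score(roi_batch, svm, how="max"):
    """Aggregate slice scores into a case prediction score (maximum or mean over slices)."""
    s = slice_scores(roi_batch, svm)
    return s.max() if how == "max" else s.mean()
```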
Computerized identification of early ischemic changes in acute stroke in noncontrast CT using deep learning
Noriyuki Takahashi, Yuki Shinohara, Toshibumi Kinoshita, et al.
Treatment for patients with acute ischemic stroke is most commonly determined based on findings on noncontrast computerized tomography (CT). Identifying hypoattenuation of the early ischemic changes on CT images is crucial for diagnosis. However, it is difficult to identify hypoattenuation with certainty. We present an atlas-based computerized method using a convolutional neural network (CNN) to identify hypoattenuation in the lentiform nucleus and the insula, the two locations where hypoattenuation appears most frequently. The algorithm for this method consists of anatomic standardization, setting of regions, creation of input images for classification, training of the CNN and classification of hypoattenuation. The regions of the lentiform nucleus and insula were set according to the Alberta Stroke Programme Early CT Score (ASPECTS) method, a visual quantitative CT scoring system. AlexNet was used as the CNN architecture for classification. We applied this method to the lentiform nucleus and insula using a database of 20 patients with right-sided hypoattenuation, 20 patients with left-sided hypoattenuation, and 20 normal subjects. Our method was evaluated using a leave-one-case-out cross-validation test. The new method had an average accuracy of 88.3%, an average sensitivity of 87.5%, and an average specificity of 90% for identifying hypoattenuation in the two regions. These results indicate that this new method has the potential to accurately identify hypoattenuation in the lentiform nucleus and the insula in patients with acute ischemic stroke.
Fully automated segmentation of left ventricular myocardium from 3D late gadolinium enhancement magnetic resonance images using a U-net convolutional neural network-based model
Myocardial tissue characterization on 3-dimensional late gadolinium enhancement magnetic resonance (3D LGE MR) is of increasing clinical importance for the quantification and spatial mapping of myocardial scar, a recognized substrate for malignant ventricular arrhythmias. Success of this task is dependent upon reproducible segmentation of myocardial architecture in 3D-space. In this paper, we describe a novel method to segment left ventricle (LV) myocardium from 3D LGE MR images using a U-Net convolutional neural network (CNN)-based model. Our proposed network consists of shrinking and expanding paths, where image features are captured and localized through several convolutional, pooling and up-sampling layers. We trained our model using 2090 slices extracted and artificially augmented from 14 3D LGE MR datasets, followed by validation of the trained model on ten 3D LGE MR unobserved test datasets inclusive of 926 slices. Averages of Dice index (DI) and absolute volume difference as a percentage versus manual defined myocardial volumes (NAVD) on the test dataset were obtained, providing values of 86.61 ± 3.80 % and 12.95 ± 9.56%, respectively. These algorithm-generated results demonstrate usefulness of our proposed fully automated method for segmentation of the LV myocardium from 3D LGE MR images.
Poster Session: Kidneys
Deep learning based bladder cancer treatment response assessment
We compared the performance of different Deep Learning - Convolutional Neural Network (DL-CNN) models for bladder cancer treatment response assessment based on transfer learning by freezing different DL-CNN layers and variation of the DL-CNN structure. Pre- and post-treatment CT scans of 123 patients (129 cancers, 158 pre- and posttreatment cancer pairs) undergoing chemotherapy were collected. 33% of patients had T0 stage cancer (complete response) after chemotherapy. Regions of interest (ROIs) of pre- and post-treatment scans were extracted from the segmented lesions and combined into hybrid pre-post image pairs. The dataset was split into training (94 pairs and 6209 hybrid ROIs), validation (10 pairs) and test sets (54 pairs). The DL-CNN consists of 2 convolution (C1, C2), 2 locally connected (L1, L2), and 1 fully connected layers, implemented in TensorFlow. The DL-CNN was trained to classify the bladder cancers as fully responding (stage T0) or not fully responding to chemotherapy based on the hybrid ROIs. Two blinded radiologists provided an estimate of the likelihood of the lesion being stage T0 post-treatment by reading the pairs of pre- and post-treatment CT volumes. The test AUC was 0.73 for T0 prediction by the base DL-CNN structure with randomly initialized weights. The base DL-CNN structure with transfer learning pre-trained weights (no frozen layers) achieved a test AUC of 0.79. The test AUCs for 3 modified DL-CNN structures (different C1, C2 max pooling filter sizes, strides, and padding, with transfer learning) were 0.72, 0.86, and 0.69, respectively. For the base DL-CNN with (C1) frozen, (C1, C2) frozen, and (C1, C2, L3) frozen during transfer learning, the test AUCs were 0.81, 0.78, and 0.71, respectively. The radiologists’ AUCs were 0.76 and 0.77. The DL-CNN performed better with pre-trained than randomly initialized weights.
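The layer-freezing idea explored above can be expressed generically as in the Keras-style sketch below; this is not the authors' TensorFlow DL-CNN (which uses locally connected layers), and the optimizer, loss, and number of frozen layers are illustrative assumptions.

```python
import tensorflow as tf

def prepare_transfer_model(pretrained_model, n_freeze):
    """Freeze the first n_freeze layers of a pre-trained model before fine-tuning the rest."""
    for layer in pretrained_model.layers[:n_freeze]:
        layer.trainable = False        # e.g. n_freeze=1 freezes C1 only, n_freeze=2 freezes C1 and C2
    for layer in pretrained_model.layers[n_freeze:]:
        layer.trainable = True
    pretrained_model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                             loss="binary_crossentropy",
                             metrics=[tf.keras.metrics.AUC(name="auc")])
    return pretrained_model
```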
Poster Session: Liver
Deep learning convolutional neural networks for the estimation of liver fibrosis severity from ultrasound texture
Diagnosis and staging of liver fibrosis is a vital prognostic marker in chronic liver diseases. Due to the inaccuracies and risk of complications associated with liver core needle biopsy, the current standard for diagnosis, other less invasive methods are sought. One such method that has been shown to correlate well with liver fibrosis is shear wave velocity measured by ultrasound (US) shear wave elastography; however, this technique requires specific software, hardware, and training. A current perspective in the radiology community is that the texture pattern from a US image may be predictive of the stage of liver fibrosis. We propose the use of convolutional neural networks (CNNs), a framework shown to be well suited for real-world image interpretation, to test whether the texture pattern in gray-scale elastography images (B-mode US with fixed, subject-agnostic acquisition settings) is predictive of the shear wave velocity (SWV). In this study, gray-scale elastography images from over 300 patients, comprising 3,500 images with corresponding SWV measurements, were preprocessed and used as input to 100 different CNN architectures that were trained to regress shear wave velocity. Even the best performing CNN explained only negligible variation in the shear wave velocity measures. These extensive test results suggest that the gray-scale elastography image texture provides little predictive information about shear wave velocity and liver fibrosis.
Poster Session: Lung
Automated identification of thoracic pathology from chest radiographs with enhanced training pipeline
Chest x-rays are the most common radiology studies for diagnosing lung and heart disease. Hence, a system for automated pre-reporting of pathologic findings on chest x-rays would greatly enhance radiologists’ productivity. To this end, we investigate a deep-learning framework with novel training schemes for classification of different thoracic pathology labels from chest x-rays. We use the currently largest publicly available annotated dataset ChestX-ray14 of 112,120 chest radiographs of 30,805 patients. Each image was annotated with either a 'NoFinding' class, or one or more of 14 thoracic pathology labels. Subjects can have multiple pathologies, resulting in a multi-class, multi-label problem. We encoded labels as binary vectors using k-hot encoding. We study the ResNet34 architecture, pre-trained on ImageNet, where two key modifications were incorporated into the training framework: (1) Stochastic gradient descent with momentum and with restarts using cosine annealing, (2) Variable image sizes for fine-tuning to prevent overfitting. Additionally, we use a heuristic algorithm to select a good learning rate. Learning with restarts was used to avoid local minima. Area Under receiver operating characteristics Curve (AUC) was used to quantitatively evaluate diagnostic quality. Our results are comparable to, or outperform the best results of current state-of-the-art methods with AUCs as follows: Atelectasis:0.81, Cardiomegaly:0.91, Consolidation:0.81, Edema:0.92, Effusion:0.89, Emphysema: 0.92, Fibrosis:0.81, Hernia:0.84, Infiltration:0.73, Mass:0.85, Nodule:0.76, Pleural Thickening:0.81, Pneumonia:0.77, Pneumothorax:0.89 and NoFinding:0.79. Our results suggest that, in addition to using sophisticated network architectures, a good learning rate, scheduler and a robust optimizer can boost performance.
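Two of the training ingredients mentioned above, k-hot label encoding and cosine annealing with warm restarts, can be pictured with the plain-Python sketch below; it illustrates the schedule formula and the encoding only, not the authors' training code, and the label list and cycle lengths are placeholders.

```python
import math
import numpy as np

PATHOLOGIES = ["Atelectasis", "Cardiomegaly", "Consolidation"]  # ... 14 labels in the full set

def k_hot(findings, all_labels=PATHOLOGIES):
    """Encode a multi-label list of findings as a binary (k-hot) vector."""
    vec = np.zeros(len(all_labels), dtype=np.float32)
    for f in findings:
        vec[all_labels.index(f)] = 1.0
    return vec

def cosine_annealing_with_restarts(step, base_lr=1e-2, min_lr=1e-5, cycle_len=1000, cycle_mult=2):
    """SGDR-style learning rate: cosine decay within a cycle, restart when the cycle ends."""
    t = step
    while t >= cycle_len:            # locate the position inside the current (growing) cycle
        t -= cycle_len
        cycle_len *= cycle_mult
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t / cycle_len))
```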
3D fully convolutional network-based segmentation of lung nodules in CT images with a clinically inspired data synthesis method
Atsushi Yaguchi, Kota Aoyagi, Akiyuki Tanizawa, et al.
In the management of lung nodules, it is important to precisely assess nodule size on computed tomography (CT) images. Given that the malignancy of nodules varies according to their composition, component-wise assessment is useful for diagnosing lung cancer. To improve the accuracy of volumetric measurement of lung nodules, we propose a deep learning-based method for segmenting nodules into multiple components, namely solid, ground glass opacity (GGO), and cavity. We train a 3D fully convolutional network (FCN) with a component-wise Dice loss and apply a conditional random field (CRF) to refine the segmentation boundaries. To further improve accuracy, we artificially generate synthetic cavitary nodules based on clinical observations and then augment the dataset for training the network. In experiments using about 300 CT images of clinical nodules, we evaluated our method in terms of the mean absolute percentage error of volumetric measurement. We confirmed that our method achieved 15.84% lower error (averaged over the solid and GGO components) compared with a conventional method based on image processing, and the error for the cavity component was decreased by 2.87% with our data-synthesis method.
A lung graph model for the classification of interstitial lung diseases on CT images
Guillaume Vanoost, Yashin Dicente Cid, Daniel Rubin, et al.
Diagnosing interstitial lung diseases (ILD) is a difficult task. It requires experienced chest radiologists who may not be available in less-specialized health centers. Moreover, a correct diagnosis is needed to decide on an appropriate treatment and prognosis. In this paper, we focus on the classification of 3 common subtypes of ILD: Usual Interstitial Pneumonia (UIP), Non-Specific Interstitial Pneumonia (NSIP) and Chronic Hypersensitivity Pneumonitis (CHP). We propose a graph model of the lungs built from a large dataset. The structure of the graph is inspired by medical knowledge of disease predominance, where the nodes correspond to 24 distinct regions obtained from lateral, anterior-posterior and vertical splits of the images. The adjacency matrix is built from distances between intensity distributions of distinct regions. Graph models are interpretable and have been used successfully in neuroimaging. However, to the best of our knowledge, this is the first attempt to use a graph model of the lungs for classifying ILDs. In the particular case of ILDs, graph methods are relevant for the following reasons. In order to differentiate between the subtypes, not only the types of local patterns of the disease are important but also their anatomical location. Therefore, we hypothesize that the comparison between regional distributions of Hounsfield Unit (HU) values is relevant to discriminate between the considered ILD subtypes. For instance, typical UIP shows a spatial predominance of reticular abnormalities and honeycombing in the peripheral regions of the lung bases. Therefore, we expect a marked difference in HU distributions between the central and peripheral regions of the lung bases. Moreover, the construction of the graph leads to an interpretable patient descriptor. The descriptor led to encouraging areas under the Receiver Operating Characteristic (ROC) curve of 0.6-0.8 for one-versus-one classification configurations, and it outperformed feature sets based on a simple concatenation of regional HU distributions.
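A compact sketch of how such a lung graph can be assembled from regional Hounsfield Unit distributions, assuming NumPy; the HU binning and the total-variation distance used for the adjacency weights are illustrative stand-ins for the paper's exact distance measure.

```python
import numpy as np

def region_histogram(hu_values, bins=np.arange(-1000, 201, 25)):
    """Normalized HU histogram for one of the 24 lung regions."""
    hist, _ = np.histogram(hu_values, bins=bins)
    return hist / max(hist.sum(), 1)

def lung_graph_adjacency(region_hu_lists):
    """Adjacency matrix whose weights are distances between regional HU distributions."""
    hists = [region_histogram(hu) for hu in region_hu_lists]
    n = len(hists)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.5 * np.abs(hists[i] - hists[j]).sum()    # total-variation distance (illustrative)
            adj[i, j] = adj[j, i] = d
    return adj    # the upper triangle can be flattened into an interpretable patient descriptor
```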
Improving pulmonary lobe segmentation on expiratory CTs by using aligned inspiratory CTs
Quantitative computed tomography (QCT) indices calculated on paired inspiratory/expiratory multidetector computer tomography (MDCT) can deliver valuable information about functional changes in airway diseases like cystic fibrosis (CF). Air trapping is an important early sign of CF which can only be quantified on expiratory CTs. An accurate lobe segmentation is needed for a regional analysis. Direct lobe segmentation (DLS) is more challenging to perform on expiratory CT images than on inspiratory images. We suggest a registration-based lobe segmentation (RLS) procedure for expiratory CTs if paired inspiratory/expiratory CTs are available. Firstly, our existing fully automated lobe segmentation algorithm was applied to the inspiratory images. Secondly, inspiratory and expiratory images were aligned by a deformable image registration algorithm. Thirdly, the calculated transformation between inspiratory and expiratory images was applied on the lobe segmentation determined on the inspiratory images. Finally, the transferred lobe segmentation was slightly adjusted to the expiratory CT. Validation of the procedure was performed on 128 paired inspiratory/expiratory CTs. The scans were acquired from 16 children with mild CF at 4 time points reconstructed with two different kernels. 6 lobes were segmented, the lingula was treated as separate lobe. We validated the registration-based lobe masks against manually corrected lobe masks. The mean spatial overlap (Dice Index) for DLS was 0.97±0.02 on the inspiratory CTs, and 0.82 ± 0.09 on expiratory CTs, determined in a previous study. In the present study the overlap was significantly improved for the expiratory CTs by the new RLS approach to 0.91 ± 0.05 (p < 2.2e − 16). This significant improvement brings the quality of lung lobe segmentation on expiratory CTs closer to the already very good lobe segmentation results on inspiratory CTs by DLS, thus reducing the need for manual post-processing.
Pre-trained deep convolutional neural networks for the segmentation of malignant pleural mesothelioma tumor on CT scans
Eyjolfur Gudmundsson, Christopher M. Straus, Samuel G. Armato III
Pre-trained deep convolutional neural networks (CNNs) have shown promise in the training of deep CNNs for medical imaging applications. The purpose of this study was to investigate the use of partially pre-trained deep CNNs for the segmentation of malignant pleural mesothelioma tumor on CT scans. Four network configurations were investigated: (1) VGG16/U-Net network with pre-trained layers fixed during training, (2) VGG16/U-Net network with pre-trained layers fine-tuned during training, (3) VGG16/U-Net network with all except the first two pre-trained layers fine-tuned during training, and (4) a standard U-Net architecture trained from scratch. Deep CNNs were trained separately for tumor segmentation in left and right hemithoraces using 4259 and 6441 contoured axial CT sections, respectively. A test set of 61 CT sections from 16 patients excluded from training was used to evaluate segmentation performance; the Dice similarity coefficient (DSC) was calculated between computer-generated and reference segmentations provided by two radiologists and one radiology resident. Median DSC on the test set was 0.739 (range 0.328–0.920), 0.772 (range 0.342–0.949), 0.777 (range 0.216–0.946), and 0.758 (range 0.099–0.943) across all observers for network configurations (1), (2), (3) and (4) above, respectively. The median DSC achieved with configuration (3) when compared with the standard U-Net trained from scratch was found to be significantly higher for two out of three observers. A fine-tuned VGG16/U-Net deep CNN showed significantly higher overlap with two out of three observers when compared with a standard U-Net trained from scratch for the segmentation of malignant pleural mesothelioma tumor.
Artificial intelligence for point of care radiograph quality assessment
Satyananda Kashyap, Mehdi Moradi, Alexandros Karargyris, et al.
Chest X-rays are among the most common modalities in medical imaging. Technical flaws of these images, such as over- or under-exposure or wrong positioning of the patient, can result in a decision to reject and repeat the scan. We propose an automatic method to detect images that are not suitable for diagnostic study. If deployed at the point of image acquisition, such a system can warn the technician so that the repeat image is acquired without the need to bring the patient back to the scanner. We use a deep neural network trained on a dataset of 3487 images labeled by two experienced radiologists to classify the images as diagnostic or non-diagnostic. The DenseNet121 architecture is used for this classification task. The trained network has an area under the receiver operating characteristic curve (AUC) of 0.93. By removing the X-rays with diagnostic quality issues, this technology could potentially provide significant cost savings for hospitals.
Exploring features towards semantic characterization of lung nodules in computed tomography images
Maysa M. G. Macedo, Dario A. B. Oliveira
One of the main challenges in the integration of medical data reports is translating numerical features from different sources into a common abstract vocabulary that supports a seamless combination of such data. When it comes to image analysis, a very common pipeline for describing an image involves extracting numerical features from the image data and translating them into meaningful pre-defined semantic concepts. In this context, we propose a methodology for selecting numerical features and relating them to semantic features using the publicly available categorization in the NIH LIDC lung nodule database. We evaluate several numerical features combined with several classifiers, present a comparison between two feature selection methods, and discuss how different features contribute to the discrimination of different semantic characteristics of lung nodules. Our results show the potential of such a methodology for translating features into abstract semantic concepts for lung nodule characterization.
Lung segmentation based on a deep learning approach for dynamic chest radiography
The purpose of this study was to develop a lung segmentation method based on a deep learning approach for dynamic chest radiography, and to assess its clinical utility for pulmonary function assessment. Maximum inhale and exhale images were selected from dynamic chest radiographs of 214 cases, comprising 150 images during respiration. In total, 534 images (2 to 4 images per case) with annotations were prepared for this study. Three hundred images were fed into a fully convolutional neural network (FCNN) architecture to train a deep learning model for lung segmentation, and 234 images were used for testing. To reduce misrecognition of the lung, post-processing methods based on time-series information were applied to the resulting images. The change rate of the lung area was calculated throughout all frames and its clinical utility was assessed in patients with pulmonary diseases. The Sorensen-Dice coefficients between the segmentation results and the gold standard were 0.94 in the inhale and 0.95 in the exhale phases, respectively. There were some false recognitions (214/234), but 163 were eliminated by our post-processing. The measurement of the lung area and its respiratory change were useful for the evaluation of lung conditions; prolonged expiration in obstructive pulmonary diseases could be detected as a reduced change rate of the lung area in the exhale phase. The semantic segmentation deep learning approach allows for the sequential lung segmentation of dynamic chest radiographs with high accuracy (94%) and is useful for the evaluation of pulmonary function.
Computer-aided detection using non-convolutional neural network Gaussian processes
Devanshu Agrawal, Hong-Jun Yoon, Georgia Tourassi, et al.
Deep convolutional neural networks (CNNs) have in recent years achieved record-breaking performance on many image classification tasks and are therefore well-suited for computer aided detection (CAD). The need for uncertainty quantification for CAD motivates the need for a probabilistic framework for deep learning. The most well-known probabilistic neural network model is the Bayesian neural network (BNN), but BNNs are notoriously difficult to sample for large complex network architectures, and as such their use is restricted to small problems. It is known that the limit of BNNs as their widths increase toward infinity is a Gaussian process (GP), and there has been considerable research interest in these infinitely wide BNNs. Recently, this classic result has been extended to deep architectures in what is termed the neural network Gaussian process (NNGP) model. In this work, we implement an NNGP model and apply it to the ChestXRay14 dataset at the full resolution of 1024x1024 pixels. Even without any convolutional aspects to the network architecture and without any data augmentation, our five layer deep NNGP model outperforms other non-convolutional models and therefore helps to narrow the performance gap between non-convolutional and convolutional models. Our NNGP model is fully Bayesian and therefore offers uncertainty information through its predictive variance that can be used to formulate a predictive confidence measure. We show that the performance of the NNGP model is significantly boosted after low-confidence predictions are rejected, suggesting that convolution is most beneficial only for these low-confidence examples. Finally, our results indicate that an extremely large fully-connected neural network with appropriate regularization could perform as well as the NNGP if not for the computational bottleneck resulting from the large number of model parameters.
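The confidence-rejection evaluation described above (discarding the least confident predictions, ranked by the predictive variance, before recomputing performance) reduces to a few lines of NumPy/scikit-learn; the keep fraction below is arbitrary and the NNGP inference itself is not shown.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_after_rejection(y_true, pred_mean, pred_var, keep_fraction=0.8):
    """AUC on the most confident predictions only, with confidence ranked by predictive variance."""
    y_true = np.asarray(y_true)
    order = np.argsort(pred_var)                       # smallest variance = most confident
    keep = order[: int(keep_fraction * len(order))]
    return roc_auc_score(y_true[keep], pred_mean[keep])
```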
2.5D CNN model for detecting lung disease using weak supervision
Our goal is to develop a 2.5D CNN model to detect multiple diseases in multiple organs in CT scans. In this study we investigated detection of 4 common diseases in the lungs: atelectasis, edema, pneumonia and nodule. Most existing algorithms for computer-aided diagnosis (CAD) of CT use 2D models for the axial slices. Our hypothesis is that by using information from all three views (coronal, sagittal and axial), we may achieve a better classification result, because some diseases may be more obvious from a different view or from the combination of multiple views. Our data consisted of 1089 CT scans, which contained 288 normal cases, 224 atelectasis cases, 156 edema cases, 225 pneumonia cases and 196 nodule cases. The cases were selected from approximately 5,000 chest CTs from the Duke University Health System, and case-level labels were automatically extracted by simple rule-based filtering of the unstructured text of the radiology report. Each of these 5 categories excluded the others, which indicates that cases from each category had either only one of the four diseases or no disease. To create 2.5D volume patches, we combined three channels representing parallel slices in each of the three intersecting, orthogonal directions, resulting in sparsely sampled cubes of 20.2 x 20.2 x 20.2 mm. For each CT scan, the volume containing the lungs was identified with thresholding, and 30 patches were randomly sampled within that volume. Then the three 3-channel images in each patch representing those 3 different directions were entered into 3 independent CNN paths separately, which were finally fused by a fully connected layer. We used 4-fold cross-validation and evaluated our results using the area under the receiver operating characteristic (ROC) curve (AUC). We achieved an average AUC of 0.891 for classifying normal vs. atelectasis, 0.940 for edema, 0.869 for pneumonia and 0.784 for nodule. We also implemented a train-validation-test process for each disease to evaluate the generalization of our model and again obtained comparable test results: 0.818 for atelectasis, 0.963 for edema, 0.878 for pneumonia and 0.784 for nodule. Despite the limitation of the small dataset scale, we demonstrated a generalizable 2.5D CNN model for detection of multiple lung diseases.
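The 2.5D sampling described above can be pictured with the NumPy sketch below, which cuts three 3-channel images (axial, coronal, sagittal) around a sampled point; the patch size and slice spacing are given in voxels and are purely illustrative, and the center is assumed to lie far enough from the volume borders.

```python
import numpy as np

def extract_25d_patch(volume, center, size=20, spacing=3):
    """Return three 3-channel images (axial, coronal, sagittal) around `center` = (z, y, x).

    Each image stacks 3 parallel slices `spacing` voxels apart, cropped to size x size.
    """
    z, y, x = center
    h = size // 2
    def stack(slicer):
        return np.stack([slicer(o) for o in (-spacing, 0, spacing)], axis=-1)
    axial    = stack(lambda o: volume[z + o, y - h:y + h, x - h:x + h])
    coronal  = stack(lambda o: volume[z - h:z + h, y + o, x - h:x + h])
    sagittal = stack(lambda o: volume[z - h:z + h, y - h:y + h, x + o])
    return axial, coronal, sagittal   # each fed to its own CNN path and fused by a fully connected layer
```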
Lung nodule retrieval using semantic similarity estimates
Mark Loyman, Hayit Greenspan
Retrieval of similar cases is an important approach for the design of an automated system to support the radiologist's decision-making process. The ability to utilize machine learning for the task of image retrieval is limited by the availability of ground truth on similarity between dataset elements (e.g., between nodules). Consequently, past approaches have focused on manual feature extraction and unsupervised approaches. Current medical retrieval studies have focused on learning similarity from a binary classification task (e.g., malignancy), and the performance evaluation was also based on a binary classification framework. Such a similarity measure is far from adequate and fails to capture true retrieval performance. The current study explores the task of similarity learning in the context of lung nodule retrieval, using LIDC's public dataset. LIDC offers annotations of nodules that include ratings of 9 characteristics per nodule. These ratings are used as our gold-standard similarity measure. Four architectures that utilize the same core network are explored. These architectures correspond to four unique tasks: binary classification, binary similarity, rating regression and similarity regression. Results show a clear discrepancy between classic performance measures and the correlation to the reference similarity measure: all methods had precision in the range of 0.73-0.75, while rating correlation ranged from 0.22 to 0.51, with the highest correlation achieved with the rating-regression approach. Additionally, a measure of the uniformity of the embedding space (hubness) is introduced. The importance of hubness as an independent success criterion is explained, and the measure is evaluated for all architectures. Our rating-regression network reached state-of-the-art results in several tasks.
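The hubness measure mentioned above is commonly computed as the skewness of the k-occurrence distribution of the embedding space; the scikit-learn/SciPy sketch below follows that common definition with an arbitrary k, which may differ in detail from the paper's formulation.

```python
import numpy as np
from scipy.stats import skew
from sklearn.neighbors import NearestNeighbors

def hubness(embeddings, k=10):
    """Skewness of the k-occurrence distribution N_k of an embedding space.

    N_k(x) counts how often sample x appears in other samples' k-nearest-neighbor lists;
    a strongly right-skewed N_k indicates a non-uniform, 'hub'-dominated retrieval space.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)
    idx = idx[:, 1:]                                   # drop the self-match
    n_k = np.bincount(idx.ravel(), minlength=len(embeddings))
    return skew(n_k)
```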
Predicting unnecessary nodule biopsy for a small lung cancer screening dataset by less-abstractive deep features
Screening for lung cancer by computed tomography (CT) has shown great benefit for early cancer detection, but it requires great effort to eliminate the associated false detections, and biopsy is the most costly of the options for ruling them out. It is therefore important to study lung cancer through image analysis to decrease the number of biopsy tests. However, it is extremely difficult to obtain enough data with biopsy reports from hospitals for a machine learning study in a short period. This study therefore explores transfer learning to predict unnecessary biopsies from a very small dataset of pathologically proven nodule CT images. To overcome the large data requirement of the CNN architecture (such as the VGG used in this study), we used the parameters trained on ImageNet as the initialization. Then we put part of the labeled pulmonary nodule dataset with ground truth into the training dataset to fine-tune the parameters of different architectures. Fifty repetitions of a cross-validation scheme with two-thirds training and one-third testing are used to measure the efficiency of different deep transfer learning architectures. Through the classification results shown in ROC curves and AUC values, we find that deep features transferred from natural images improve performance by 0.1663 over the traditional machine learning method based on texture features extracted directly from gray images. Our improved VGG architecture with 8 layers, which yields less-abstractive features, obtains 0.1081 better performance than the more-abstractive ones in the recognition of malignant nodules.
Lung tissue characterization for emphysema differential diagnosis using deep convolutional neural networks
Mohammadreza Negahdar, David Beymer
In this study, we propose and validate an end-to-end pipeline based on deep learning for differential diagnosis of emphysema in thoracic CT images. The five lung tissue patterns involved in most differential restrictive and obstructive lung disease diagnoses include: emphysema, ground glass, fibrosis, micronodule, and normal. Four established network architectures have been trained and evaluated. To the best of our knowledge, this is the first comprehensive end-to-end deep CNN pipeline for differential diagnosis of emphysema. A comparative analysis shows the performance of the proposed models on two publicly available datasets.
Fine-grained lung nodule segmentation with pyramid deconvolutional neural network
An improved pyramid deconvolutional neural network is proposed for fine-grained segmentation of pulmonary nodules in CT images. The fully convolutional neural network (FCN) can be trained on images end-to-end, pixel-to-pixel, realizing object detection, segmentation and classification in one single CNN structure. However, the original FCN was designed for natural-object tasks and can hardly maintain the degree of precision required for medical images. To further improve detection precision and segmentation accuracy, we improve the FCN by fusing more pooling layers, because the deconvolution of higher convolution layers gives coarser segmentations while lower convolution layers generate detailed contours. The experiments are based on the LIDC-IDRI dataset. Ten-fold cross-validation is used to train and evaluate the performance. The experiments show that detection precision and segmentation fineness increase with the number of fused pooling layers. The detection rate reaches as high as 0.931 ± 0.042. For the segmentation performance evaluation, the intersection over union (IoU) score is applied, reaching 0.628 ± 0.065. The overlap rate (i.e., the percentage of the segmentation result overlapping the original label) is also calculated. As with detection accuracy, the improved architecture, which fuses more pooling layers, achieves the highest overlap rate of 0.739 ± 0.076.
Similar CT image retrieval method based on lesion nature and its three-dimensional distribution
Yasutaka Moriwaki, Nobuhiro Miyazaki, Hiroaki Takebe, et al.
In imaging diagnosis, radiologists refer to CT images of similar cases. However, it is a heavy burden for them to search for such images among huge numbers of CT images. To solve this problem, many retrieval methods for CT images have been developed. Most existing retrieval methods target cases in which lesions exist within a limited region of the lung. These methods retrieve similar cases by calculating the similarity to the region specified on a slice image of the query case, for example, solitary pulmonary nodules. However, radiologists diagnose not only such cases but also diffuse lung disease (DLD), where lesions exist throughout the lung. Radiologists diagnose DLDs by grasping the three-dimensional (3D) distribution of lesion textures, but the existing methods cannot retrieve similar DLDs. We propose a novel method that can retrieve morphologically similar cases based on the radiologist's knowledge of how DLDs are diagnosed. In the proposed method, we configure a 3D model for the central-peripheral region of a lung, represent the similarity of the 3D distribution of lesions as histograms, and then retrieve the cases with the most similar histograms. We evaluate the average precision of the proposed method for DLD CT images. For the top 5 cases, the mean of the average precisions of the proposed method is 0.84, which is better than that of a method that only calculates the volume rate of the lesions in the lung (0.64). The proposed method retrieves similar DLDs based on the 3D distribution of lesion textures and is expected to contribute to diagnostic support in clinical practice.
Poster Session: Lymph Nodes and Thyroid
Metastatic lymph node analysis of colorectal cancer using quadruple-phase CT images
Keisuke Bando, Ren Nishimoto, Mikio Matsuhiro, et al.
The mortality rate of colorectal cancer in Japan is increasing year by year; it ranks third among cancer deaths in males and first in females. However, it is possible to raise the 5-year survival rate to over 80% by detecting and resecting the cancer early. Colorectal cancer is a malignant tumor that can spread and metastasize. There are various types of metastasis, but the most important factor in predicting the prognosis of early colorectal cancer patients is lymph node metastasis. Generally, it is said that the greater the lymph node diameter, the higher the possibility of a positive finding of metastasis. However, there are cases where metastasis is also confirmed in lymph nodes with small diameters. The purpose of this study is to detect metastatic lymph nodes using spatiotemporal features of triplet-phase CT images (arterial phase, portal vein phase, equilibrium phase). This method consists of 1) lymph node extraction, 2) metastatic lymph node classification and 3) quantitative assessment of metastatic lymph nodes. The method was applied to 33 cases of rectal cancer. For the quantitative analysis of lymph node metastasis, logistic regression analysis is used to identify the image features dominant in lymph node metastasis.
CT-realistic data augmentation using generative adversarial network for robust lymph node segmentation
You-Bao Tang, Sooyoun Oh, Yu-Xing Tang, et al.
As an important task in medical imaging analysis, automatic lymph node segmentation from computed tomography (CT) scans has been studied well in recent years, but it is still very challenging due to the lack of adequately-labeled training data. Manually annotating a large number of lymph node segmentations is expensive and time-consuming. For this reason, data augmentation can be considered as a surrogate of enriching the data. However, most of the traditional augmentation methods use a combination of affine transformations to manipulate the data, which cannot increase the diversity of the data’s contextual information. To mitigate this problem, this paper proposes a data augmentation approach based on generative adversarial network (GAN) to synthesize a large number of CT-realistic images from customized lymph node masks. In this work, the pix2pix GAN model is used due to its strength for image generation, which can learn the structural and contextual information of lymph nodes and their surrounding tissues from CT scans. With these additional augmented images, a robust U-Net model is learned for lymph node segmentation. Experimental results on NIH lymph node dataset demonstrate that the proposed data augmentation approach can produce realistic CT images and the lymph node segmentation performance is improved effectively using the additional augmented data, e.g. the Dice score increased about 2.2% (from 80.3% to 82.5%).
Transfer learning for automatic cancer tissue detection using multispectral photoacoustic imaging
Kamal Jnawali, Bhargava Chinni, Vikram Dogra, et al.
Pathology diagnosis is usually done by a human pathologist observing stained tissue on a glass slide under a microscope. In the case of a multi-specimen study to locate cancer regions, such as in thyroidectomy, significant labor-intensive processing is required at high cost. Multispectral photoacoustic (MPA) specimen imaging has proven successful in differentiating photoacoustic (PA) signal characteristics between a histopathology-defined cancer region and normal tissue. A more pragmatic research question to ask is: can MPA imaging data predict whether a sectioned tissue slice has cancer region(s)? We propose to use inception-resnet-v2 convolutional neural networks (CNNs) on the thyroid MPA data to evaluate this potential by transfer learning. The proposed algorithm first extracts features from the thyroid MPA image data using the CNN and then detects cancer using the softmax function, the last layer of the network. The AUCs (areas under the curve) of the receiver operating characteristic (ROC) curves for cancer, benign nodule and normal are 0.73, 0.81, and 0.88, respectively, with a limited number of MPA datasets.
Poster Session: Prostate
Automatic MRI prostate segmentation using 3D deeply supervised FCN with concatenated atrous convolution
Bo Wang, Yang Lei, Jiwoong Jason Jeong, et al.
Prostate segmentation of MR volumes is a very important task for treatment planning and image-guided brachytherapy and radiotherapy. Manual delineation of the prostate in MR images is very time-consuming and depends on the subjective experience of the physicians. On the other hand, automatic prostate segmentation is a reasonable and attractive choice for its speed, even though the task is very challenging because of inhomogeneous intensity and the variability of prostate appearance and shape. In this paper, we propose a method to automatically segment the prostate in MR images based on a 3D deeply supervised FCN with concatenated atrous convolution (3D DSA-FCN). More discriminative features provide explicit convergence acceleration in the training stage using straightforward dense predictions as deep supervision, while the concatenated atrous convolutions extract more global contextual information for accurate predictions. The presented method was evaluated on an internal dataset comprising 15 T2-weighted prostate MR volumes from the Winship Cancer Institute and obtained a mean Dice similarity coefficient (DSC) of 0.852±0.031, a 95% Hausdorff distance (95%HD) of 7.189±1.953 mm and a mean surface distance (MSD) of 1.597±0.360 mm. The experimental results show that our 3D DSA-FCN can yield satisfactory MR prostate segmentation, which can be used for image-guided radiotherapy.
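The concatenated atrous (dilated) convolution idea can be sketched as a single Keras block as below; this is only one building block under assumed filter counts and dilation rates, not the authors' full 3D deeply supervised DSA-FCN.

```python
import tensorflow as tf

def concatenated_atrous_block(x, filters=32, rates=(1, 2, 4)):
    """Parallel 3D atrous convolutions with different dilation rates, concatenated channel-wise."""
    branches = [
        tf.keras.layers.Conv3D(filters, 3, padding="same", dilation_rate=r, activation="relu")(x)
        for r in rates
    ]
    return tf.keras.layers.Concatenate(axis=-1)(branches)

# Example: apply the block to a 64x64x64 single-channel MR sub-volume (sizes are illustrative).
inp = tf.keras.Input(shape=(64, 64, 64, 1))
out = concatenated_atrous_block(inp)
model = tf.keras.Model(inp, out)
```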
Radiomic features derived from pre-operative multi-parametric MRI of prostate cancer are associated with Decipher risk score
Decipher, a genomic test, is used to predict the likelihood of metastasis and prostate cancer (PCa) specific mortality based on expression patterns of 22 RNA markers from radical prostatectomy (RP) specimens. It has been shown to be strongly correlated with metastasis-free prognosis and has been integrated into the National Comprehensive Cancer Network (NCCN) guidelines. However, Decipher is expensive and tissue destructive. Radiomic features refer to the high-throughput computational texture or shape features extracted from radiographic scans. Radiomic features derived from multi-parametric magnetic resonance imaging (mpMRI) of prostate cancer have been shown to be associated with clinically significant PCa. In this study, we sought to evaluate whether radiomic features derived from T2-weighted MRI (T2WI) and apparent diffusion coefficient (ADC) maps of the prostate could distinguish different Decipher risk groups (low, intermediate and high). We also explored correlations between Decipher-risk-associated radiomic features and features relating to gland morphology on corresponding digitized surgical specimens. A retrospectively acquired, de-identified cohort of 70 PCa patients (N = 74 lesions) who underwent 3T mpMRI prior to RP and Decipher tests after RP was used in this study. The Decipher risk score, ranging from 0 to 1, was used to categorize patients into low/intermediate (D1) and high (D2) risk groups. A multivariate logistic regression model was trained (N = 37 lesions) using radiomic features selected via elastic-net regularization to predict the Decipher risk groups. The model was evaluated on a hold-out test set (N = 37 lesions) and resulted in an area under the receiver operating characteristic curve (AUC) of 0.80. Our model outperformed prediction using PIRADS v2 (AUC = 0.67), but showed comparable performance with Gleason Grade Group (GGG) (AUC = 0.80). We observed that the best discriminating radiomic features were correlated with gland morphology and gland packing on corresponding histopathology (R = 0.43, p < 0.05).
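The elastic-net-regularized selection plus risk-group prediction described above maps naturally onto scikit-learn's logistic regression with an elastic-net penalty, as in the sketch below; the regularization strength, l1 ratio, and standardization step are assumptions rather than the study's tuned settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

def elasticnet_radiomics_model(X_train, y_train, X_test, y_test, l1_ratio=0.5, C=0.1):
    """Elastic-net logistic regression: sparse radiomic feature selection + risk-group prediction."""
    scaler = StandardScaler().fit(X_train)
    clf = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=l1_ratio, C=C, max_iter=5000)
    clf.fit(scaler.transform(X_train), y_train)          # y: 0 = D1 (low/intermediate), 1 = D2 (high)
    selected = np.flatnonzero(clf.coef_[0])               # features surviving the elastic-net penalty
    auc = roc_auc_score(y_test, clf.predict_proba(scaler.transform(X_test))[:, 1])
    return selected, auc
```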
Poster Session: Radiomics
Towards deep radiomics: nodule malignancy prediction using CNNs on feature images
Rahul Paul, Dmitry Cherezov, Matthew B. Schabath, et al.
Lung cancer is a leading cause of cancer-related death worldwide and in the USA. Low-dose computed tomography (LDCT) is the primary method for detection and diagnosis of lung cancers. Radiomics provides further analysis of LDCT scans, offering an opportunity for early detection and diagnosis of lung cancers. The convolutional neural network (CNN), a powerful method for image classification and recognition, has opened an alternative path for tumor identification and detection from LDCT scans. Nodules have different shapes, boundaries, and patterns. In this study, we created feature images from different texture features of nodules and then used a CNN to classify each of the feature images. We call this approach “Deep Radiomics”. Laws' 3-D texture images were used for our analysis: ten Laws' texture images were generated and used to train an ensemble of CNNs. Texture provides information about how an image looks. The use of feature images as CNN input is a novel approach that enables the generation and extraction of new types of features and lends itself to ensemble generation. From the LDCT arm of the National Lung Screening Trial (NLST) dataset, a subset of nodule-positive and screen-detected lung cancer (SDLC) cases was used in our study. The best result obtained from this study was 79.32% accuracy and 0.88 AUC, which is an improvement in accuracy over using just image features or just original images as CNN input for classification.
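The sketch below illustrates how 3-D Laws'-style texture feature images can be computed from a nodule volume with NumPy/SciPy; the kernel combinations and the volume are illustrative placeholders, and the CNN ensemble itself is not reproduced.

```python
# Sketch of generating 3-D Laws' texture feature images from a nodule volume
# (illustrative only; the specific kernel combinations used in the paper are
# not reproduced here). Each feature image could then be fed to a CNN.
import numpy as np
from scipy.ndimage import convolve

laws_1d = {
    "L5": np.array([1, 4, 6, 4, 1], float),    # level
    "E5": np.array([-1, -2, 0, 2, 1], float),  # edge
    "S5": np.array([-1, 0, 2, 0, -1], float),  # spot
    "R5": np.array([1, -4, 6, -4, 1], float),  # ripple
}

def laws_3d_kernel(a, b, c):
    """Build a 5x5x5 Laws kernel as the outer product of three 1-D kernels."""
    return np.einsum("i,j,k->ijk", laws_1d[a], laws_1d[b], laws_1d[c])

volume = np.random.rand(64, 64, 64)  # stand-in for a nodule CT patch
feature_images = {
    f"{a}{b}{c}": convolve(volume, laws_3d_kernel(a, b, c), mode="nearest")
    for a, b, c in [("L5", "E5", "S5"), ("E5", "E5", "E5"), ("S5", "R5", "L5")]
}
print({k: v.shape for k, v in feature_images.items()})
```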
Radiomics analysis on T2-MR image to predict lymphovascular space invasion in cervical cancer
Shou Wang, Xi Chen, Zhenyu Liu, et al.
Lymphovascular space invasion (LVSI) is an important determinant for selecting the treatment plan in cervical cancer (CC). For CC patients without LVSI, conization is recommended; otherwise, if LVSI is observed, hysterectomy and pelvic lymph node dissection are required. Despite its importance, LVSI can currently be identified only by pathological examination through invasive biopsy or after surgery. In this study, we provide a non-invasive and preoperative method to identify LVSI by radiomics analysis on T2-weighted magnetic resonance images (MRI), aiming at assisting personalized treatment planning. We enrolled 120 CC patients with T2 images and clinical information, and allocated them into a training set (n = 80) and a testing set (n = 40) according to the diagnostic time. Afterwards, 839 image features were extracted to reflect the intensity, shape, and high-dimensional texture information of CC. Among the 839 radiomic features, 3 features were identified as discriminative by least absolute shrinkage and selection operator (Lasso)-logistic regression. Finally, we built a support vector machine (SVM) to predict LVSI status from the 3 radiomic features. In the independent testing set, the radiomics model achieved an area under the receiver operating characteristic curve (AUC) of 0.7356 and a classification accuracy of 0.7287. The radiomics signature showed a significant difference between non-LVSI and LVSI patients (p<0.05). Furthermore, we compared the radiomics model with a clinical model that uses clinical information, and the radiomics model showed a significant improvement over the clinical model (AUC = 0.5967 in the validation cohort for the clinical model).
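A minimal sketch of a comparable pipeline, Lasso-penalized logistic regression for feature selection followed by an SVM, is shown below; the synthetic data, the fixed selection of three features, and the SVM kernel are assumptions rather than the authors' exact settings.

```python
# Minimal sketch of the LVSI pipeline described above: an L1-penalized logistic
# regression selects a small feature subset, then an SVM is trained on the
# selected features. Data here are synthetic placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(80, 839)), rng.integers(0, 2, 80)
X_test, y_test = rng.normal(size=(40, 839)), rng.integers(0, 2, 40)

pipe = make_pipeline(
    StandardScaler(),
    # keep the 3 features with the largest absolute Lasso coefficients
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
                    max_features=3, threshold=-np.inf),
    SVC(kernel="rbf", probability=True),
)
pipe.fit(X_train, y_train)
auc = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])
print("testing-set AUC:", round(auc, 3))
```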
Temporal mammographic registration for evaluation of architecture changes in cancer risk assessment
While breast cancer screening recommendations vary by agency, all agencies recommend mammographic screening with some frequency over some portion of a woman's lifetime. Temporal evaluation of these images may inform personalized risk of breast cancer. However, due to the highly deformable nature of breast tissue, the positioning of breast tissue may vary widely between exams. Therefore, registration of physical regions in the breast across time points is a critical first step in computerized analysis of changes in breast parenchyma over time. While a post-registration image is altered and therefore not appropriate for radiomic texture analysis, the registration process produces a mapping of points which may aid in aligning similar image regions across multiple time points. In this study, a total of 633 mammograms from 87 patients were retrospectively collected. These images were sorted into 1144 temporal pairs, where each combination of images of a given woman and a given laterality was used to form a temporal pair. B-spline registration and multi-resolution registration were performed on each mammogram pair. While B-spline registration took an average of 552.8 CPU seconds per registration, multi-resolution registration took only an average of 346.2 CPU seconds per registration. Multi-resolution registration had a 15% lower mean square error, which was significantly different from that of B-spline registration (p<0.001). While previous work aimed to allow radiologists to visually evaluate the registered images, this study identifies corresponding points on images for use in assessing interval change for risk assessment and early detection of cancer through deep learning and radiomics.
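A rough sketch of multi-resolution B-spline registration with SimpleITK is given below; the file names, metric, optimizer settings, and control-point grid are illustrative assumptions, not the parameters used in the study.

```python
# A rough sketch (SimpleITK, placeholder file names) of B-spline registration of
# two mammograms of the same breast from different time points, run as a
# multi-resolution pyramid. Parameter values are illustrative only.
import SimpleITK as sitk

fixed = sitk.ReadImage("mammo_prior.mha", sitk.sitkFloat32)
moving = sitk.ReadImage("mammo_current.mha", sitk.sitkFloat32)

# Coarse B-spline control-point grid defined over the fixed image domain
tx = sitk.BSplineTransformInitializer(fixed, [8, 8])

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMeanSquares()
reg.SetInterpolator(sitk.sitkLinear)
reg.SetInitialTransform(tx, inPlace=True)
reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5, numberOfIterations=100)
# Multi-resolution schedule: coarse-to-fine shrink factors with matched smoothing
reg.SetShrinkFactorsPerLevel([4, 2, 1])
reg.SetSmoothingSigmasPerLevel([2, 1, 0])

final_tx = reg.Execute(fixed, moving)
# The transform maps corresponding points between time points; it can also
# resample the moving image onto the fixed image grid for visual inspection.
warped = sitk.Resample(moving, fixed, final_tx, sitk.sitkLinear, 0.0)
```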
PI-RADS guided discovery radiomics for characterization of prostate lesions with diffusion-weighted MRI
To demonstrate the added predictive value of radiomic features over the prostate radiology scoring scheme (PI-RADS), a systematic approach is required to determine whether there is indeed latent predictive information of prostate cancer in diffusion-weighted magnetic resonance images (DW-MRI) that cannot be captured by radiologists' visual interpretations alone. In this work, we propose a PI-RADS guided discovery radiomics solution in which a predictive model for prostate cancer is built by discovering radiomic features that capture information on the phenotype of lesions that is not visible to radiologists using the PI-RADS scoring system. We investigated patients with PI-RADS scores indicating presence or absence of significant prostate cancer separately and ran experiments on patients with DW-MRI followed by targeted biopsy, using first- and second-order quantitative imaging features. Our experiments on DW-MRI and pathology data of 50 patients show that the proposed approach improves the overall accuracy of prostate cancer diagnosis significantly compared to PI-RADS scores alone.
Non-invasive transcriptomic classification of de novo Glioblastoma patients through multivariate quantitative analysis of baseline preoperative multimodal magnetic resonance imaging
Saima Rathore, Hamed Akbari, Spyridon Bakas, et al.
Glioblastoma, the most common primary malignant brain tumor, is genetically diverse and classified into four transcriptomic subtypes, i.e., classical, mesenchymal, proneural, and neural. We sought a noninvasive, robust quantitative imaging phenomic (QIP) signature associated with this transcriptomic classification of glioblastoma patients, derived from clinically-acquired imaging protocols and without the need for advanced genetic testing. This QIP signature was discovered and evaluated in a retrospective cohort of 112 pathology-proven de novo glioblastoma patients for whom basic preoperative multi-parametric MRI (mpMRI) data (T1, T1-Gd, T2, T2-FLAIR) were available, and compared with the tumor subtype as obtained through an RNA isoform-based classifier. Comprehensive and diverse QIP features capturing intensity distributions, volume, morphology, statistics, tumors' anatomical location, and texture for each tumor sub-region were multivariately integrated via support vector machines to construct our QIP signature. The performance/generalizability of the model was evaluated using 5-fold cross-validation. The overall accuracy of the proposed method was estimated at 71% for identifying the transcriptomic tumor subtype; 82.14% [AUC:0.82], 75.89% [AUC:0.78], 75.89% [AUC:0.81], and 88.39% [AUC:0.84] for predicting proneural, neural, mesenchymal and classical subtypes, respectively. The obtained QIP signature revealed macroscopic biological insight into the complex tumor subtypes, including a more pronounced presence of tissue with higher water content in the neural subtype, a larger enhancement component of the tumor in the mesenchymal subtype, and overall smaller tumors in the classical subtype. Our results indicate that quantitative analysis of imaging features extracted from clinically-acquired mpMRI yields prompt non-invasive biomarkers of the molecular profile of glioblastoma patients, important in influencing surgical decision-making, treatment planning, and assessment of inoperable tumors.
Radiomics analysis of MRI for predicting molecular subtypes of breast cancer in young women
Qinmei Li, James Dormer, Priyanka Daryani, et al.
Breast cancer in young women is commonly aggressive, in part because a high proportion of these tumors are high-grade, triple-negative (TN). Biopsies and surgical specimens have inherent limitations: they sample only part of the tumor tissue and may miss tumor heterogeneity. In clinical practice, MRI is used for the diagnosis of breast cancer. MRI-based radiomics is a developing approach that may provide not only diagnostic value for breast cancer but also predictive or prognostic associations between the images and biological characteristics. In this work, we used radiomics methods to analyze MR images of breast cancer in 53 young women and correlated the radiomics data with molecular subtypes. The results indicated a significant difference between the TN and non-TN types of breast cancer in young women on radiomics features derived from T2-weighted MR images. This may be helpful for identifying the TN type and guiding therapeutic strategies.
General purpose radiomics for multi-modal clinical research
Michael G. Wels, Félix Lades, Alexander Muehlberg, et al.
In this paper we present an integrated software solution targeting clinical researchers for discovering relevant radiomic biomarkers covering the entire value chain of clinical radiomics research. Its intention is to make this kind of research possible even for less experienced scientists. The solution provides means to create, collect, manage, and statistically analyze patient cohorts consisting of potentially multimodal 3D medical imaging data, associated volume of interest annotations, and radiomic features. Volumes of interest can be created by an extensive set of semi-automatic segmentation tools. Radiomic feature computation relies on the de facto standard library PyRadiomics and ensures comparability and reproducibility of carried out studies. Tabular cohort studies containing the radiomics of the volumes of interest can be managed directly within the software solution. The integrated statistical analysis capabilities introduce an additional layer of abstraction allowing non-experts to benefit from radiomics research as well. There are ready-to-use methods for clustering, uni- and multivariate statistics, and machine learning to be applied to the collected cohorts. They are validated in two case studies: for one thing, on a subset of the publicly available NSCLC-Radiomics data collection containing pretreatment CT scans of 317 non-small cell lung cancer (NSCLC) patients and for another, on the Lung Image Database Consortium imaging study with diagnostic and lung cancer screening CT scans including 2,753 distinct lesions from 870 patients. Integrated software solutions with optimized workflows like the one presented and further developments thereof may play an important role in making precision medicine come to life in clinical environments.
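Since the platform builds on PyRadiomics, a minimal example of computing radiomic features for one volume of interest with that library is shown below; the file names are placeholders and the extractor settings are left at their defaults.

```python
# Minimal sketch of computing radiomic features with PyRadiomics, the library
# the platform above builds on. File paths are placeholders.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableAllFeatures()  # first-order, shape, and texture feature classes
features = extractor.execute("ct_volume.nrrd", "lesion_mask.nrrd")

# Keep only the numeric radiomic values (other keys hold diagnostic metadata)
radiomics_only = {k: v for k, v in features.items() if not k.startswith("diagnostics")}
print(len(radiomics_only), "features computed")
```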
Quantitative MRI biomarker for treatment response assessment of multiple myeloma: robustness evaluation using independent test set of prospective cases
Chuan Zhou, Qian Dong, Heang-Ping Chan, et al.
We are developing quantitative radiomic biomarkers for treatment response assessment of multiple myeloma (MM) in MRI. In a previous study, we developed a 3D dynamic intensity entropy transformation (DIET) method with MRI for deriving a radiomic biomarker that differentiates responders from non-responders in the treatment of MM. DIET transforms the MR signal voxel by voxel into a quantitative entropy enhancement value (qEEV), from which predictor variables are extracted and combined into a qEEV-based response index (qERI) for quantitative assessment of treatment response. We developed the qERI on a previous retrospective data set of 64 MRI cases. This study evaluated the performance of the qERI-based MRI biomarker using an independent test set of 15 MRI cases collected from an ongoing prospective study, in which 10 and 5 patients were clinically diagnosed as responders and non-responders, respectively. The results showed that the agreement between the qERI prediction and the clinical outcome reached 0.80 with a kappa value of 0.57. The area under the ROC curve (AUC) reached 0.78.
Machine-learning-based classification of Glioblastoma using MRI-based radiomic features
Ge Cui, Jiwoong Jason Jeong, Yang Lei, et al.
Glioblastoma (GBM) is the most frequent and lethal primary brain cancer. Due to its therapeutic resistance and aggressiveness, clinical management is challenging. This study aims to develop a machine-learning-based classification method using radiomic features of multiparametric MRI to correctly identify high-grade (HG) and low-grade (LG) GBMs. Multiparametric MRI of 50 patients with GBM, 25 HG and 25 LG, were used. Each patient had T1, contrast-enhanced T1, T2 and FLAIR MRI, as well as provided tumor contours. These tumor contours were used to extract features from the multiparametric MRI. Once these features had been extracted, the most significant and informative features were selected to train random forests to differentiate HG and LG GBMs, while varying feature correlation limits were applied to remove redundant features. Then leave-one-out cross-validation random forests were applied to the dataset to classify HG or LG GBMs. The prediction accuracy, receiver operating characteristic (ROC) curves, and area under the curve (AUC) were obtained at each correlation limit to evaluate the performance of our machine-learning-based classification. With the best-performing parameters, the model achieved an average prediction accuracy of 0.920 (46 of 50 patients; 22/25 for HG and 24/25 for LG) with an AUC of 0.962. We investigated the process of distinguishing between HG GBM and LG GBM using multiparametric MRI, radiomic features, and machine learning. The results of our study show that the grade of GBM can be predicted accurately and consistently using the proposed machine-learning-based radiomics approach.
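A simplified sketch of this scheme, correlation-based feature pruning followed by a leave-one-out cross-validated random forest, is given below; the synthetic data, correlation limit, and forest size are placeholders.

```python
# Sketch of the HG/LG classification scheme: drop highly correlated radiomic
# features at a chosen correlation limit, then evaluate a random forest with
# leave-one-out cross-validation. Data are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score, accuracy_score

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(50, 300)))  # 50 patients x 300 radiomic features
y = np.array([1] * 25 + [0] * 25)             # 1 = HG, 0 = LG

def drop_correlated(df, limit=0.9):
    """Remove one feature from every pair whose absolute correlation exceeds limit."""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > limit).any()]
    return df.drop(columns=to_drop)

X_sel = drop_correlated(X, limit=0.9)
clf = RandomForestClassifier(n_estimators=500, random_state=0)
probs = cross_val_predict(clf, X_sel, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("LOOCV accuracy:", accuracy_score(y, probs > 0.5), "AUC:", roc_auc_score(y, probs))
```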
Prediction of low-grade glioma progression using MR imaging
Diffuse or infiltrative gliomas are a type of Central Nervous System (CNS) brain tumor. Among the different types of primary CNS tumors, diffuse low-grade gliomas (LGG) are World Health Organization (WHO) Grade II and III gliomas. This study investigates the prediction of LGG progression using imaging features extracted from conventional MRI. First, we extract imaging features from the raw MRI, including intensity, and fractal and multiresolution fractal representations of the MRI tumor volume. The study uses a total of 108 LGG patients from the pre-operative TCGA-LGG data, divided into 75% of the patients for training and the remaining 25% for testing. The LGG progression prediction model is trained using nested leave-one-out cross-validation (LOOCV) on the training set. The recursive feature selection (RFS) method and LGG progression model training are performed in the inner cross-validation loop. The LGG progression prediction model is trained using the Extreme Gradient Boosting technique, and its performance is estimated using the outer cross-validation loop. Finally, we assess the predictive performance of the LGG progression model using the testing set. The training and testing procedures are repeated 10 times using 10 different training and testing sets. Our LGG progression prediction model achieves an AUC of 0.81±0.03, a sensitivity of 0.81±0.09, and a specificity of 0.81±0.10. Our results show the promise of using non-invasive MRI in predicting LGG progression.
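The sketch below illustrates the nested structure described above, recursive feature elimination around a gradient-boosted classifier with an inner leave-one-out loop and a held-out test split; data sizes, feature counts, and hyperparameters are placeholders, and the xgboost package stands in for the paper's Extreme Gradient Boosting implementation.

```python
# Simplified sketch of the pipeline above: recursive feature elimination around
# an XGBoost classifier, evaluated with leave-one-out cross-validation on the
# training split and then once on a held-out test split. Synthetic placeholders
# stand in for the imaging features.
import numpy as np
from xgboost import XGBClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split, LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
X, y = rng.normal(size=(108, 40)), rng.integers(0, 2, 108)  # 108 patients (synthetic)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(
    RFE(XGBClassifier(n_estimators=100), n_features_to_select=10, step=5),
    XGBClassifier(n_estimators=100),
)
# Inner loop: leave-one-out accuracy on the training split (feature selection
# is re-run inside every fold, so selection bias is avoided).
inner_acc = cross_val_score(model, X_tr, y_tr, cv=LeaveOneOut()).mean()
# Outer evaluation on the untouched test split.
model.fit(X_tr, y_tr)
test_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print("inner LOOCV accuracy:", round(inner_acc, 3), "test AUC:", round(test_auc, 3))
```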
Radiomics and deep learning of diffusion-weighted MRI in the diagnosis of breast cancer
Qiyuan Hu, Heather M. Whitney, Alexandra Edwards, et al.
The incorporation of diffusion-weighted imaging (DWI) in breast magnetic resonance imaging (MRI) has shown potential in improving the accuracy of breast cancer diagnosis. Since DWI measures possibly complementary biological properties to dynamic contrast-enhanced (DCE) MRI parameters, DWI computer-aided diagnosis (CADx) can potentially improve the performance of current CADx systems in distinguishing between benign and malignant breast lesions. This study was performed on a database of 397 diffusion-weighted breast MR images (69 benign and 328 malignant). Lesions were automatically segmented using a fuzzy C-means method. The apparent diffusion coefficient (ADC)-based radiomic features were extracted and used to train a classifier. Another classifier was trained on convolutional neural network (CNN)-based features extracted by a pre-trained VGG19 network. The outputs from these two classifiers were fused by averaging the posterior probability of malignancy for each case to construct a fusion classifier. The performance evaluation for the three proposed classifiers was performed with five-fold cross-validation. The area under the receiver operating characteristic curve (AUC) was 0.68 (se = 0.04) for the ADC-based classifier, 0.74 (se = 0.03) for the CNN-based classifier, and 0.76 (se = 0.03) for the fusion classifier. The fusion classifier performed significantly better than the ADC-based classifier (p = 0.013). The CNN-based classifier failed to show statistically significant performance difference from the ADC-based classifier or the fusion classifier. The findings demonstrate promising performance of the proposed classifiers and the potential for DWI CADx as well as for the development of multiparametric CADx that incorporates information from both DWI and DCE-MRI in breast lesion classification.
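The fusion step, averaging the posterior probability of malignancy from the two classifiers, can be expressed in a few lines; the probabilities and labels below are made-up placeholders.

```python
# Sketch of the fusion step described above: average the posterior probability
# of malignancy from the ADC-based radiomic classifier and the CNN-feature
# classifier for each lesion. The two trained classifiers are assumed given.
import numpy as np
from sklearn.metrics import roc_auc_score

def fuse_posteriors(p_adc, p_cnn):
    """Soft fusion: element-wise mean of the two malignancy probabilities."""
    return (np.asarray(p_adc) + np.asarray(p_cnn)) / 2.0

# toy example with made-up probabilities for five lesions
p_adc = [0.2, 0.7, 0.6, 0.4, 0.9]
p_cnn = [0.3, 0.8, 0.5, 0.2, 0.95]
labels = [0, 1, 1, 0, 1]
print("fusion AUC:", roc_auc_score(labels, fuse_posteriors(p_adc, p_cnn)))
```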
Poster Session: Skin
Acral melanocytic lesion segmentation with a convolution neural network (U-Net)
Melanocytic lesions of acral sites (ALM) are common, with an estimated prevalence of 28-36% in the USA. While the majority of these lesions are benign, differentiation from acral melanoma (AM) is often challenging. Much research has been done on segmenting and classifying skin moles located in acral volar areas. However, methods published to date cannot be easily extended to new skin regions because of differences in appearance and properties. In this paper, we propose a deep learning (U-Net) architecture to segment acral melanocytic lesions, which is a necessary initial step for skin lesion pattern recognition and a prerequisite for accurate classification and diagnosis. The U-Net is one of the most promising deep learning solutions for image segmentation and is built upon a fully convolutional network. On an independent validation dataset of 210 dermoscopy images, our implemented method showed high segmentation accuracy: the U-Net convolutional neural network achieved an average DSC of 0.92, accuracy of 0.94, sensitivity of 0.91, and specificity of 0.92. ALMs, due to their small size and similarity to other local structures, create enormous difficulties during the segmentation and assessment process. The use of advanced segmentation methods such as deep learning models, especially convolutional neural networks, has the potential to improve the accuracy of segmentation in this medical area.
Poster Session: Staging
Dose distribution as outcome predictor for Gamma Knife radiosurgery on vestibular schwannoma
Vestibular schwannomas are benign brain tumors that can be treated radiosurgically with the Gamma Knife in order to stop tumor progression. However, in some cases tumor progression is not stopped and treatment is deemed a failure. At present, the reason for these failed treatments is unknown. Clinical factors and MRI characteristics have been considered as prognostic factors. Another confounder in the success of treatment is the treatment planning itself. It is thought to be very uniformly planned, even though dose distributions among treatment plans are highly inhomogeneous. This paper explores the predictive value of these dose distributions for the treatment outcome. We compute homogeneity indices (HI) and three-dimensional histogram-of-oriented gradients (3D-HOG) and employ support vector machine (SVM) paired with principal component analysis (PCA) for classification. In a clinical dataset, consisting of 20 tumors that showed treatment failure and 20 tumors showing treatment success, we discover that the correlation of the HI values with the treatment outcome presents no statistical evidence of an association (52.5% accuracy employing linear SVM and no statistical significant difference with t-tests), whereas the 3D-HOG features concerning the dose distribution do present correlations to the treatment outcome, suggesting the influence of the treatment on the outcome itself (77.5% accuracy employing linear SVM and PCA). These findings can provide a basis for refining towards personalized treatments and prediction of treatment efficiency. However, larger datasets are needed for more extensive analysis.
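A minimal sketch of classifying dose-distribution descriptors with PCA and a linear SVM is given below; the synthetic 3D-HOG features, component count, and the leave-one-out evaluation scheme are assumptions, not the study's exact protocol.

```python
# Sketch of classifying treatment outcome from dose-distribution descriptors
# (e.g., 3D-HOG vectors) with PCA followed by a linear SVM, as described above.
# Feature extraction itself is not reproduced; X stands in for 3D-HOG features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 512))     # 40 tumors x 512 3D-HOG descriptors (synthetic)
y = np.array([1] * 20 + [0] * 20)  # 1 = treatment failure, 0 = success

clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="linear"))
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print("leave-one-out accuracy:", acc)
```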
Poster Session: Vascular
Learning-based automatic segmentation on arteriovenous malformations from contrast-enhanced CT images
We propose a learning-based method to automatically segment the arteriovenous malformation (AVM) target volume from computed tomography (CT) in stereotactic radiosurgery (SRS). A deeply supervised 3D V-Net is introduced to enable end-to-end segmentation. A deep supervision mechanism is integrated into the hidden layers to overcome the optimization difficulties when training such a network with limited training data. The probability map of the new AVM contour is generated by the well-trained network. To evaluate the proposed method, we retrospectively investigated 30 AVM patients treated by SRS. For each patient, both digital subtraction angiography (DSA) and contrast-enhanced CT had been acquired. Using our proposed method, the AVM contours are generated solely based on the contrast CT images and are compared with the AVM contours delineated from DSA by physicians as ground truth. The average centroid distance, volume difference and DSC value among all 30 patients are 0.83±0.91 mm, -0.01±0.79 cc and 0.84±0.09, which indicates that the proposed method is able to generate an AVM target contour with around 1 mm error in displacement, 1 cc error in volume size and 84% overlap compared with ground truth. The proposed method has great potential for eliminating DSA acquisition and developing a solely CT-based treatment planning workflow for AVM SRS treatment.
Use of a convolutional neural network for aneurysm identification in digital subtraction angiography
Angiographic Parametric Imaging (API) is a quantitative image analysis method that uses digital subtraction angiography (DSA) to characterize contrast media dynamics throughout the vasculature. The parameters acquired through API may be used to assess the success of a neurovascular intervention such as the stenting or coiling of an aneurysm. This imaging tool requires manual contouring of the aneurysm sac and the surrounding vasculature, which is not realistic in an interventional suite. To address this challenge, we studied whether convolutional neural networks can carry out a three-class segmentation problem differentiating between the background, vasculature, and aneurysm sac in a DSA acquisition. Image data were retrospectively collected from patients being monitored or treated for cerebral aneurysms at the Gates Vascular Institute. While VGG-16 and U-Net architectures were both investigated, a modified VGG architecture was developed and used. Network training was carried out over 100 epochs. Our training dataset comprised 12,000 DSA acquisitions and our validation dataset comprised 2,000 DSA acquisitions. The Jaccard index was above 0.74 for both classes, the Dice similarity coefficient was above 0.83 for both classes, and the area under the ROC curve was above 0.72 for both classes. These results indicate good agreement between the ground-truth labels and the network-predicted labels. Our network proved insensitive to motion artifacts or the presence of the skull in the image data. This work indicates the potential clinical utility of a convolutional neural network in the context of aneurysm detection in DSA for feature extraction using parametric imaging to support clinical decisions.
U-Net based automatic carotid plaque segmentation from 3D ultrasound images
Ultrasound image assessment plays an important role in the diagnosis of carotid artery atherosclerosis, and the segmentation of plaques from carotid artery ultrasound images is critical for atherosclerotic diagnosis. In this paper, a novel automatic plaque segmentation method is presented based on the U-Net deep learning network, which allows the network to be trained end-to-end for pixel-wise classification. Traditional supervised learning techniques require a large number of labeled examples to reach a good optimum; however, in this task it is difficult to obtain many labeled examples, since manual segmentation of plaques is time-consuming and its reliability depends on the experience of experts. To address the lack of labeled samples, an unsupervised learning technique, a deep convolutional encoder-decoder architecture, was used to pre-train the parameters of the U-Net on a large amount of unlabeled data. The parameters learned from the deep convolutional encoder-decoder network were then used to initialize a U-Net, which was fine-tuned on the labeled images. Algorithm accuracy was examined on the common carotid artery part of 26 3D carotid ultrasound images (34 plaques) by comparing the results of our algorithm with manual segmentations; the Dice similarity coefficient (DSC) was 90.72±6.2%, better than the previous level-set method with a DSC of 88.2±8.3%. The automatic method provides a more convenient way to segment carotid plaques in 3D ultrasound images.
Poster Session: New Approaches
Machine learning for segmenting cells in corneal endothelium images
Chaitanya Kolluru, Beth A. Benetz, Naomi Joseph, et al.
Images of the endothelial cell layer of the cornea can be used to evaluate corneal health. Quantitative biomarkers extracted from these images, such as cell density, coefficient of variation of cell area, and cell hexagonality, are commonly used to evaluate the status of the endothelium. Currently, fully-automated endothelial image analysis systems in use often give inaccurate results, while semi-automated methods, which require trained image analysis readers to identify cells manually, are both challenging and time-consuming. We are investigating two deep learning methods to automatically segment cells in such images, comparing the performance of two deep neural networks, namely U-Net and SegNet. To train and test the classifiers, a dataset of 130 images was collected, with expert-reader-annotated cell borders in each image. We applied standard training and testing techniques to evaluate pixel-wise segmentation performance, and report corresponding metrics such as the Dice and Jaccard coefficients. Visual evaluation of the results showed that most pixel-wise errors in the U-Net output were inconsequential. Results from the U-Net approach are being applied to create endothelial cell segmentations and quantify important morphological measurements for evaluating cornea health.
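For reference, the reported overlap metrics can be computed from a predicted mask and the reader annotation as in the sketch below (toy masks are used as placeholders).

```python
# Sketch of the pixel-wise overlap metrics reported above, computed from a
# binary predicted segmentation and the reader-annotated ground truth.
import numpy as np

def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two binary masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum())
    jaccard = inter / np.logical_or(pred, truth).sum()
    return dice, jaccard

pred = np.zeros((128, 128), dtype=np.uint8); pred[30:90, 30:90] = 1
truth = np.zeros((128, 128), dtype=np.uint8); truth[40:100, 40:100] = 1
print(dice_and_jaccard(pred, truth))
```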
Classifying abnormalities in computed tomography radiology reports with rule-based and natural language processing models
Purpose: When conducting machine learning algorithms for classification and detection of abnormalities in medical imaging, many researchers are faced with the problem that it is hard to get enough labeled data. This is especially difficult for modalities such as computed tomography (CT), with potentially 1000 or more slice images per case. To solve this problem, we plan to use machine learning algorithms to identify abnormalities within existing radiologist reports, thus creating case-level labels that may be used for weakly supervised training on the image data. We used a two-stage procedure to label the CT reports. In the first stage, a rule-based system labeled a smaller set of cases automatically with high accuracy. In the second stage, we developed machine learning algorithms using the labels from the rule-based system and word vectors learned without supervision from unlabeled CT reports. Method: In this study, we used approximately 24,000 CT reports from Duke University Health System. We initially focused on three organs: the lungs, liver/gallbladder, and kidneys. We first developed a rule-based system that can quickly identify certain types of abnormalities within CT reports with high accuracy. For each organ and disease combination, we produced several hundred cases with rule-based labels. These labels were combined with word vectors generated using word2vec from all the unlabeled reports to train two different machine learning algorithms: (a) an average of word vectors classified by logistic regression, and (b) a recurrent neural network (RNN). Result: Performance was evaluated by receiver operating characteristic (ROC) area under the curve (AUC) over an independent test set of 440 reports for which those organs were manually labeled as normal or abnormal by clinical experts. For lungs, the performance was 0.796 for the average word vector and 0.827 for the RNN. Liver performance was 0.683 for the average word vector and 0.791 for the RNN. For kidneys, it was 0.786 for the average word vector and 0.928 for the RNN. Conclusion: It is possible to label large numbers of cases automatically. These rule-based labels can then be used to build a classification model for large numbers of medical reports. With word2vec and other transfer learning techniques, we can achieve good generalization performance.
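A minimal sketch of the "average word vector" branch is given below, using gensim's word2vec (gensim 4 API assumed) and scikit-learn logistic regression; the tiny corpus and labels are placeholders for the actual report data.

```python
# Sketch of the "average word vector" report classifier: word2vec embeddings are
# learned from unlabeled report text, each report is represented by the mean of
# its word vectors, and logistic regression predicts normal vs. abnormal.
# The tiny corpus below is a placeholder for the ~24,000 CT reports.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

reports = [
    "no acute abnormality in the lungs".split(),
    "nodular opacity in the right lower lobe".split(),
    "liver and gallbladder unremarkable".split(),
    "hypodense hepatic lesion concerning for metastasis".split(),
]
labels = [0, 1, 0, 1]  # 0 = normal, 1 = abnormal (placeholder labels)

w2v = Word2Vec(sentences=reports, vector_size=50, window=5, min_count=1, epochs=50)

def report_vector(tokens):
    """Mean of the word vectors for all in-vocabulary tokens of one report."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0)

X = np.vstack([report_vector(r) for r in reports])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```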
Patient-specific outcome simulation after surgical correction of Pectus Excavatum: a preliminary study
Mafalda Couto, João Gomes-Fonseca, António H. J. Moreira, et al.
Although minimally invasive Nuss procedure is frequently performed to correct Pectus Excavatum, the successful aesthetical outcome is not always ensured. Using the computed tomography (CT) data of six patients, high-quality surfaces of the anterior chest wall were generated, alongside with a personalized corrective-bar. Through finite element method (FEM), replicating the surgical procedure, a simulation of the anterior chest wall correction was conducted. The assessment of this methodology was verified by comparing the metrics from the real meshes (3D scanned before and after surgery) and simulated meshes (obtained before and after FEM). Results show a mean difference of 2.85 ± 5.77 mm on the point of maximum correction between simulated and real outcomes. No statistical differences were found (p = 0.281). High aesthetical similarity was observed concerning simulated and real outcomes. The proposed methodology presents a patient-specific simulation that may be used to plan, predict and improve the surgical outcome of the Nuss procedure. Further studies should continue to improve the presented methodology