Deep learning based automatic detection algorithm for acute intracranial haemorrhage: a pivotal randomized clinical trial

Yun, Tae Jin; Choi, Jin Wook; Han, Miran; Jung, Woo Sang; Choi, Seung Hong; Yoo, Roh-Eul; Hwang, In Pyeong

doi:10.1038/s41746-023-00798-8

Article
Open access
Published: 07 April 2023

Tae Jin Yun ORCID: orcid.org/0000-0001-8441-4574^1,2,
Jin Wook Choi ORCID: orcid.org/0000-0002-2396-4705³,
Miran Han³,
Woo Sang Jung³,
Seung Hong Choi^1,2,
Roh-Eul Yoo^1,2 &
…
In Pyeong Hwang^1,2

npj Digital Medicine volume 6, Article number: 61 (2023)

Abstract

Acute intracranial haemorrhage (AIH) is a potentially life-threatening emergency that requires prompt and accurate assessment and management. This study aims to develop and validate an artificial intelligence (AI) algorithm for diagnosing AIH using brain-computed tomography (CT) images. A retrospective, multi-reader, pivotal, crossover, randomised study was performed to validate the performance of an AI algorithm that was trained using 104,666 slices from 3010 patients. Brain CT images (12,663 slices from 296 patients) were evaluated by nine reviewers belonging to one of three subgroups (non-radiologist physicians, n = 3; board-certified radiologists, n = 3; and neuroradiologists, n = 3) with and without the aid of our AI algorithm. Sensitivity, specificity, and accuracy were compared between AI-unassisted and AI-assisted interpretations using the chi-square test. Brain CT interpretation with AI assistance results in significantly higher diagnostic accuracy than that without AI assistance (0.9703 vs. 0.9471, p < 0.0001, patient-wise). Among the three subgroups of reviewers, non-radiologist physicians demonstrate the greatest improvement in diagnostic accuracy for brain CT interpretation with AI assistance compared to that without AI assistance. For board-certified radiologists, the diagnostic accuracy for brain CT interpretation is significantly higher with AI assistance than without AI assistance. For neuroradiologists, although brain CT interpretation with AI assistance results in a trend for higher diagnostic accuracy compared to that without AI assistance, the difference does not reach statistical significance. For the detection of AIH, brain CT interpretation with AI assistance results in better diagnostic performance than that without AI assistance, with the most significant improvement observed for non-radiologist physicians.

Introduction

Acute intracranial haemorrhage (AIH) is a life-threatening disease with a 30-day mortality rate ranging from 35% to 52%. Most notably, only 20% of survivors are expected to achieve full functional recovery at 6 months^1,2,3. Magnetic resonance imaging (MRI) scans may be as accurate as CT scans with regard to the detection of AIH in patients presenting with acute focal stroke symptoms⁴ and are more accurate than CT scans in terms of detecting microhaemorrhage. Nevertheless, non-contrast brain CT scans are the most widely used first-line diagnostic approach for identifying AIH due to the several disadvantages of MRI scans, including their limited availability, long image acquisition times, high cost, and issues with patient tolerance^5,6.

Despite the clinical relevance of diagnosing AIH using brain CT scans—false negatives may delay correct diagnosis, which can cause devastating consequences, whereas false positives will lead to unnecessary examinations—prompt and accurate assessment of AIH using brain CT scans remains a challenge for physicians. In addition, the high volumes of imaging data that require assessment place a significant burden on radiologists who need to maintain diagnostic accuracy and efficiency^7,8.

Over the past decade, deep learning-based artificial intelligence (AI) technology has made significant advances with improvements in computer power and accumulation of ‘big data’. Advances in deep learning-based image recognition, as a part of machine learning, are transforming the medical field and have the potential to further improve the processes in the medical imaging domain⁹. These innovations may increase diagnostic accuracy, enable prompt diagnosis and improved management of various conditions, and facilitate new biological insights. Various AI algorithms for AIH diagnosis have been developed and shown promising results in the detection, classification, quantification, and prediction of AIH using brain CT scans^{7,8,10,11,12,13,14,15}.

Previous studies employing deep learning architectures have predominantly used haemorrhage detection methods based on labelling or segmentation by experts^{7,8,10,11,13,15,16,17}. However, the classification of AIH is contingent on the opinion of experts, and the training of the system depends on the labelling of AIH-suspected areas by experts. As such, discordance between experts regarding the final classifications or labelling of images is inevitable. In addition, poorly defined characteristics, variability in sizes and morphologies, and the attenuation of AIH contribute to inter-observer discordance even between expert neuroradiologists. In this regard, an anomaly detection process based on unsupervised training alongside a haemorrhage detection process can overcome the drawbacks of the supervised haemorrhage detection process used in conventional AI algorithms for intracranial haemorrhage detection, leading to an improvement in diagnostic performance^{18,19,20,21,22}. In terms of deep learning architectures used for haemorrhage detection, the majority of previous investigations have relied on convolutional neural network (CNN)-based AI algorithms that have been reported to classify and quantify intracranial haemorrhages with good diagnostic performance^{11,13,23,24,25,26}. Recent studies have proposed new deep learning architectures based on a joint convolutional and recurrent neural network (CNN-RNN) approach with promising results, highlighting its potential for assisting radiologists and physicians in their clinical diagnosis workflow^15,27.

Although the excellent performance of deep learning-based AI algorithms has been proven in the internal validation cohort, achieving persistent favourable results without performance decline in the external validation dataset consisting of a diverse patient population and scanner remains challenging^28,29.

In this study, we developed a deep learning-based automatic detection AI algorithm for identifying AIH on brain CT scans based on a new approach that combined haemorrhage detection (based on a joint CNN-RNN system) and anomaly detection (based on unsupervised training) using a large dataset. We evaluated the diagnostic performance of this AI algorithm in a large external validation dataset to validate our approach and also conducted a retrospective multi-reader study to assess whether the assistance of our AI algorithm improved the diagnostic performance of clinicians of varying expertise levels.

Results

Diagnostic performance of the AI-based diagnostic support software in the external validation dataset

The overall AUROC for AI performance in the external validation dataset was 0.992 and 0.977 for patient-wise and slice-wise analyses, respectively. The patient and slice-wise analyses indicated a sensitivity of 94.4% and 79.0% and a specificity of 98.2% and 99.3%, respectively. Details regarding the results for external validation are presented in Table 1 and Supplementary Tables 1–3.

Table 1 Diagnostic performance of AI in the external validation set (full analysis set: 49,841 patients, 1,855,465 slices).

Evaluation of the diagnostic performance of the AI-based diagnostic support software

The overall AUROC for AI standalone performance in the dataset for the reader assessment study was 0.9874 and 0.9671 for patient-wise and slice-wise analyses, respectively (Figs. 3 and 4). For patient-wise analysis, the best diagnostic performance was achieved with a cut-off level of 39.84%, sensitivity of 95.89%, and specificity of 95.33%. For slice-wise analysis, the best diagnostic performance was achieved with a cut-off level of 7.70%, sensitivity of 89.87%, and specificity of 91.60%. At a cut-off level of 50.0%, the sensitivity and specificity were 93.84% and 97.33%, respectively, in the patient-wise analysis and 67.26% and 99.60%, respectively, in the slice-wise analysis (Figs. 1 and 2).
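The cut-off analysis above can be illustrated with a short sketch. The paper does not state how the "best" cut-off was chosen, so the use of Youden's J statistic here is an assumption, and the scores and labels are hypothetical toy values rather than study data.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when a score >= cutoff is called AIH-positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    Assumption: J = sensitivity + specificity - 1 is the selection criterion;
    the source does not name the criterion it used.
    """
    best = None
    for c in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, c)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical AIH probability scores (0 to 1) and ground-truth labels.
scores = [0.02, 0.10, 0.20, 0.60, 0.30, 0.45, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(best_cutoff(scores, labels))     # data-driven cut-off
print(sens_spec(scores, labels, 0.5))  # fixed 50% cut-off for comparison
```

In this toy data the fixed 50% cut-off loses sensitivity relative to the data-driven one, which mirrors the direction of the trade-off reported above.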

**Fig. 1: Diagnostic performance of reviewers in terms of basic ROC curves for patient-wise AI standalone performance.**

**Fig. 2: Diagnostic performance of reviewers in terms of basic ROC curves for slice-wise AI standalone performance.**

Reader assessment study

In the reader assessment study, the AI-assisted group exhibited a significantly higher diagnostic accuracy in detecting AIH than the AI-unassisted group for both patient-wise (0.9703 vs. 0.9471, p < 0.0001) and slice-wise analyses (0.9581 vs. 0.9522, p < 0.0001). Compared with the AI-unassisted group, the AI-assisted group achieved significantly higher sensitivity (0.9718 vs. 0.9437, p = 0.0003 for patient-wise analysis and 0.8469 vs. 0.8299, p < 0.0001 for slice-wise analysis) and specificity (0.9689 vs. 0.9504, p = 0.0145 for patient-wise analysis and 0.9855 vs. 0.9824, p < 0.0001 for slice-wise analysis) (Tables 2 and 3, Figs. 1 and 2).
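As a rough plausibility check of the patient-wise accuracy comparison, the sketch below runs a Pearson chi-square test on a 2x2 table. The counts are not taken from the paper: they are back-calculated from the reported accuracies under the assumption that each of the nine reviewers read all 296 cases in each condition (2664 reads per condition), and no continuity correction is applied. The helper name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Assumption: 9 reviewers x 296 patients = 2664 reads per condition.
reads = 9 * 296
correct_ai = round(0.9703 * reads)  # reconstructed, not reported
correct_no = round(0.9471 * reads)  # reconstructed, not reported

stat, p_value = chi_square_2x2(
    correct_ai, reads - correct_ai,
    correct_no, reads - correct_no,
)
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```

Under these assumptions the p-value falls below 0.0001, in line with the reported result. The test treats the two sets of reads as independent samples, which is a simplification for a crossover design.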

Table 2 Accuracy, sensitivity, and specificity of each subgroup after conducting AI-assisted or AI-unassisted evaluation in patient-wise analysis.

Table 3 Accuracy, sensitivity, and specificity of each subgroup after conducting AI-assisted or AI-unassisted evaluation in slice-wise analysis.

Among the three subgroups of reviewers, the non-radiologist physicians demonstrated the greatest improvement in diagnostic accuracy with the use of AI assistance compared with that without AI assistance (0.9505 vs. 0.9189, with a difference of 3.15%, p = 0.0072 for patient-wise analysis and 0.9393 vs. 0.9306, with a difference of 0.87%, p < 0.0001 for slice-wise analysis). For the board-certified radiologists, AIH detection with AI assistance resulted in a significantly higher diagnostic accuracy compared with that without AI assistance (0.9741 vs. 0.9459, with a difference of 2.82%, p = 0.0025 for patient-wise analysis and 0.9632 vs. 0.9567, with a difference of 0.75%, p < 0.0001 for slice-wise analysis). For neuroradiologists, although AIH detection with AI assistance exhibited a trend for higher diagnostic accuracy compared with that without AI assistance, this did not reach statistical significance (0.9865 vs. 0.9764, with a difference of 1.01%, p = 0.1138 for patient-wise analysis and 0.9706 vs. 0.9691, with a difference of 0.15%, p = 0.2345 for slice-wise analysis) (Tables 2 and 3, Figs. 1 and 2). The diagnostic performance of the reviewers with basic ROC curves for AI standalone performance based on patient- and slice-wise analyses are presented in Figs. 1 and 2, respectively.

GEE analysis revealed that AI assistance resulted in a significant increase in accuracy in both patient- (3.67 for the AI-assisted group and 3.01 for the AI-unassisted group, with a difference of 0.66, p = 0.0075) and slice-wise analyses (3.21 for the AI-assisted group and 3.03 for the AI-unassisted group, with a difference of 0.18, p < 0.0001). Sensitivity increased significantly in the patient-wise analysis (4.24 for the AI-assisted group and 2.89 for the AI-unassisted group, with a difference of 1.35, p = 0.017) but not in the slice-wise analysis (1.75 for the AI-assisted group and 1.69 for the AI-unassisted group, with a difference of 0.05, p = 0.3273). Specificity also increased significantly in both patient- (3.81 for the AI-assisted group and 3.17 for the AI-unassisted group, with a difference of 0.64, p = 0.0376) and slice-wise analyses (4.56 for the AI-assisted group and 4.15 for the AI-unassisted group, with a difference of 0.41, p < 0.0001) (Supplementary Tables 4–7).

The ICC indicated that the AI-assisted and AI-unassisted groups demonstrated excellent (0.9193) and good (0.8475) reliability, respectively. Representative images of AIH detection from brain CT images are presented in Fig. 3 and Supplementary Fig. 1.

**Fig. 3: Representative images of AIH detection.**

Discussion

In the present study, we reported a new AI algorithm that uses a combination of supervised training for haemorrhage detection and unsupervised training for anomaly detection. In addition, we applied a joint CNN-RNN architecture for haemorrhage detection. Our AI algorithm achieved high accuracy for standalone AI detection, and its use in AI-assisted interpretation resulted in superior diagnostic performance in detecting AIH relative to interpretation without AI assistance.

With respect to the AUROC values, the performance of the standalone AI algorithm in the external validation study (0.992 and 0.977 in patient- and slice-wise analyses, respectively) and reader assessment study (0.9874 and 0.9671 in patient- and slice-wise analyses, respectively) was comparable with the performance of the neuroradiologist subgroup without AI assistance (0.9764 and 0.9691 in patient- and slice-wise analyses, respectively). These diagnostic accuracies were higher than those reported by the majority of previous studies^{7,8,10,11,13,15} and were comparable with the results achieved in a previous study (AUROC = 0.991), which reported that AI standalone performance was comparable with that of highly trained experts¹³. Furthermore, in the present study, the sensitivity of 95.89% and specificity of 95.33% achieved by our approach at a cut-off level of 39.84% in the patient-wise analysis were higher than those achieved by reviewers without AI assistance (94.37% and 95.04%, respectively). The promising results achieved by our AI algorithm highlight its potential for the accurate detection of AIH on brain CT images.

In the reader assessment study, which employed a retrospective, multi-reader, pivotal, crossover, randomised study design, the AI-assisted group demonstrated a significantly higher diagnostic accuracy in detecting AIH than the AI-unassisted group. In addition, the superior performance of the AI-assisted group in terms of diagnostic accuracy was validated using GEE analysis. To the best of our knowledge, the beneficial effects of AI assistance in reader interpretation for the detection of AIH on brain CT images have not been previously reported. The promising findings in this study support the practical relevance of using AI in clinical settings to improve patient care. Notably, with the aid of our AI algorithm, the diagnostic performance of non-radiologist physicians reached the level for radiologists and the diagnostic performance of radiologists reached the level for neuroradiologists for the detection of AIH on brain CT images. We believe that our AI algorithm may play a key role as a reliable assistant in real-world clinical practice where prompt aid by expert radiologists or neuroradiologists may be unavailable. In addition, our AI algorithm may partly relieve the burden of radiologists and neuroradiologists who encounter large volumes of CT images that require interpretation with high diagnostic accuracy and efficiency in a timely manner. The significant improvement in sensitivity observed in this study implies that the present AI algorithm may reduce the occurrence of false negatives in which AIH may be erroneously excluded, thereby enabling prompt management that is critical for patients with AIH.

Notably, the gain in sensitivity with AI assistance was smaller in the slice-wise analysis (1.70%) than in the patient-wise analysis (2.82%), and the improvement in sensitivity for non-radiologist physicians that was seen in the patient-wise analysis did not reach statistical significance in the slice-wise analysis (Tables 2 and 3). In addition, in the GEE analysis, slice-wise sensitivity was the only measure for which statistically significant superiority was not achieved (Supplementary Table 6). The lower standalone sensitivity of the AI in the slice-wise analysis (89.87%) compared with the patient-wise analysis (95.89%) might make it difficult for AI assistance to have a consistently positive effect on slice-level decisions. This weaker positive effect might affect the non-radiologist physician group more strongly. However, the statistically significant improvement in sensitivity in the neuroradiologist group, observed only in the slice-wise analysis, remains to be explained.

Although specificity was significantly improved in the AI-assisted group when all readers were pooled, we did not observe a statistically significant improvement in specificity within each reader subgroup. This suggests that the ability of the present AI algorithm to reduce false positives may be limited and that our AI algorithm is more suitable as a supportive tool rather than an alternative method for the detection of AIH on brain CT images.

In the present study, we describe the development of a new AI algorithm, which combines haemorrhage detection and anomaly detection processes, with the aim of improving diagnostic performance for the identification of AIH on brain CT images. The majority of previous AI algorithms used to analyse medical imaging, including those designed for intracranial haemorrhage detection, have been developed using supervised labelling of training images to facilitate the biomarker detection process^{7,8,10,11,13,15,16,17}. Although training using expert-labelled images has produced promising results^27,30,31, discordance in labelled areas between experts is unavoidable. In addition, poorly defined characteristics, variation in sizes and morphologies, and the attenuation of AIH contribute to inter-observer discordance that may occur even between expert neuroradiologists. Anomaly detection is the process of identifying abnormal areas based on unsupervised training using normal data^21,22,32. The application of anomaly detection based on unsupervised training using normal brain CT images may overcome the drawbacks of conventional AI algorithms for AIH detection that rely on supervised training. In the present study, the combination of haemorrhage detection and anomaly detection based on a relatively large dataset may have contributed to the superior performance demonstrated by the current AI algorithm.

To overcome the aforementioned issues and improve diagnostic performance, we used a combined CNN-RNN in our AI algorithm. With regard to deep learning architectures, previous studies have predominantly used algorithms based on 2D or 3D CNNs^{11,13,23,24,25,26}. However, brain CT images consist of a series of 2D images that contain information about actual 3D structures. Therefore, in the present study, we designed an architecture that was more appropriate for processing 3D data and additionally applied an RNN module to the more common CNN module. The additional use of this RNN facilitated more accurate patient-wise AIH probability scores and improved diagnostic performance at both patient- and slice-wise levels.

Further work is warranted to address the utility of this AI algorithm from a clinical perspective, including investigations on related morbidity or mortality. In the present study, we addressed the diagnostic accuracy of the present AI algorithm in the detection of AIH on brain CT images; however, the critical characteristics of AIH evolution that are associated with clinical outcomes, including haemorrhage volume and expansion, require assessment with follow-up imaging to gain a full understanding of the diagnostic accuracy of our approach. As such, further investigations regarding the clinical utility of the present AI algorithm in patients with critical AIH for which clinical outcomes are available will clarify its potential role in diagnosing and managing this condition. In addition, the reading environment in this experimental study did not replicate that of daily practice, especially with regard to the use of clinical information. In clinical settings, patient information, including the chief complaints, symptoms, physical examination results, and past medical history, contributes to superior diagnostic performance of the physicians. Therefore, the direct application of the present AI algorithm based on its excellent diagnostic performance in this experimental study may be premature. In addition, the classification of AIH by the gold-standard review board in this study may be a limitation. Determining the gold standard for AIH is challenging, particularly when the amount of haemorrhage is subtle such that no management is indicated, and further diagnostic steps, such as a lumbar puncture, would not be routinely considered, and may even be inaccurate. The ground truth may not be knowable in such cases in routine clinical medicine. To minimise this inherent drawback in the diagnosis of AIH, in the present study, the gold standard for AIH classification was based on the interpretation of the gold-standard review board comprising three neuroradiologists with at least 11 years of relevant experience as radiologists, including at least 7 years of experience as neuroradiologists. However, achieving complete agreement between the two primary neuroradiologists was challenging. In the present study, the weighted kappa value for the inter-rater agreement between the experienced neuroradiologists was 0.9865, and two cases that were initially included in the AIH group were reclassified to the normal (without AIH) group. Although our approach to achieve a gold standard diagnosis was reasonable, there may be limitations in terms of the appropriateness of our method for identifying the gold standard used for validation of the AI algorithm, which achieved a diagnostic accuracy of up to 0.9874 according to these decisions. Finally, the demographic traits of the included cases and the retrospective design of the study, which allows for possible selection bias, are additional limitations.

In conclusion, we developed a deep learning-based AI algorithm for automatic AIH detection on brain CT images based on a combination of a haemorrhage detection process, which employed a combined CNN-RNN architecture, and an anomaly detection process, which used unsupervised training. The diagnostic performance of the AI algorithm was validated in a large external validation dataset. Additionally, the improvement in diagnostic performance with AI assistance versus that without AI assistance was also validated in this retrospective multi-reader study.

Methods

Study design

We developed and validated a deep learning-based AI algorithm (Medical Insight+ Brain Hemorrhage, SK Inc. C&C, Seongnam, Republic of Korea) for automatic AIH detection on brain CT scans. This study was approved by the institutional review boards of the participating institutions (H-2007-061-1140, Seoul National University Hospital Institutional Review Board [institution A] and AJIRB-DEV-DE3-20-379, Ajou University Medical Center Institutional Review Board [institution B]), and the requirement for informed consent was waived owing to the retrospective nature of this study.

Development dataset

To develop the AI algorithm for use with our diagnostic support software, 104,666 slices (28,351 [27.1%] with AIH and 76,315 [72.9%] without AIH) from 3010 patients (2010 [66.8%] with AIH and 1000 [33.2%] without AIH) from two institutions (Seoul National University Hospital [institution A] and Ajou University Medical Center [institution B]) were used for model development. Data were collected from patients in institutions A and B between April 2009 and December 2015 and between April 2004 and April 2020, respectively. Cases of AIH arising from an underlying pathology (including intratumoural haemorrhage and haemorrhagic transformation at the site of acute ischaemic stroke), as well as solitary AIH, were enrolled in the AIH group. Most of the development dataset (2632 of 3010 patients [87.4%]) had a slice thickness of 5 mm; the remainder had slice thicknesses of 2.5 mm [n = 3], 3.0 mm [n = 104], 3.75 mm [n = 1], 4.0 mm [n = 40], 4.5 mm [n = 209], 4.8 mm [n = 12], 5.3125 mm [n = 1], 6.0 mm [n = 4], and 7.0 mm [n = 4].

External validation dataset

For the external validation of the diagnostic performance of the AI algorithm, 1,855,465 slices (73,467 [4.0%] with AIH and 1,781,998 [96.0%] without AIH) from 49,841 patients (6442 [12.9%] with AIH and 43,399 [87.1%] without AIH) in the AI hub under the direction of the Korean National Information Society Agency (https://aihub.or.kr/aidata/34101) were used. This dataset was collected from six medical institutions in Korea in 2020 as a big data collection project on cerebrovascular disease, and the hospitals contributing to the data collection for the AI hub are different from the hospitals from which the development dataset was collected. Each of the 1,855,465 slices from 49,841 patients was labelled as either AIH or normal based on the image interpretation by the neuroradiologists at each institution. A total of 6442 CT images showed AIH, including 2424 cases of subarachnoid haemorrhage, 2738 cases of subdural haemorrhage, 371 cases of epidural haemorrhage, 1266 cases of intraventricular haemorrhage, and 3367 cases of intraparenchymal haemorrhage (note: overlapping subtypes were possible). A total of 73,467 slices exhibited AIH, including 32,751 cases of subarachnoid haemorrhage, 39,604 cases of subdural haemorrhage, 4567 cases of epidural haemorrhage, 18,220 cases of intraventricular haemorrhage, and 35,669 cases of intraparenchymal haemorrhage (note: overlapping subtypes were possible). A summary of the patient and scanner information regarding the external validation is presented in Supplementary Tables 8 and 9.

Reader study dataset

A dataset temporally separated from the development dataset was obtained for reader assessment. A total of 12,663 brain CT slices (2508 AIH [19.8%] and 10,155 normal [80.2%]) from 296 patients (146 AIH [49.3%] and 150 normal [50.7%]) were obtained from two institutions (Seoul National University Hospital [institution A] and Ajou University Medical Center [institution B]). Data were collected from patients in institutions A and B between January 2016 and December 2019 and between April 2004 and April 2020, respectively. Patients enrolled in the development dataset were not enrolled in the reader study dataset.

All 296 complete CT images that satisfied the criteria for image quality modified from previously reported criteria were enrolled as the dataset for the reader assessment study (Supplementary Table 10)^33,34. The number of required CT images was calculated using the power estimation method with the significance level set to 5% and the power to 90%, which was based on a sensitivity of 88.6% as reported previously²⁷ and a sensitivity of 98.5% from internal validation of the present AI algorithm. This resulted in a total of 148 CT images for each group while accounting for a 15% dropout rate. In addition, based on a specificity of 88.6% reported in a previous study²⁷ and a specificity of 96.0% from internal validation of the present AI algorithm, 114 CT images for each group were obtained while accounting for a 15% dropout rate.
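The sample size reasoning above can be sketched with a standard two-proportion formula. The paper does not specify which power estimation method was used, so the two-sided normal-approximation formula, the rounding order, and the function name `cases_per_group` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def cases_per_group(p1, p2, alpha=0.05, power=0.90, dropout=0.15):
    """Cases per group needed to detect a difference between proportions p1 and p2.

    Assumptions (not confirmed by the source): two-sided test, unpooled
    normal approximation, and dropout handled by dividing by (1 - dropout).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    n = math.ceil(n)                       # whole cases before dropout
    return math.ceil(n / (1 - dropout))    # inflate for expected dropout


# Sensitivity inputs quoted in the text: 88.6% (previous study) vs 98.5%.
print(cases_per_group(0.886, 0.985))
```

With the sensitivity inputs this formula yields 148 cases per group, matching the figure in the text. It does not reproduce the 114 quoted for specificity, so the authors' exact method likely differs and the sketch should be read as illustrative only.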

The gold standard for interpretation of all 12,663 slices from 296 CT images as either AIH or normal was achieved via careful consensus of a gold-standard review board comprising three neuroradiologists, each with at least 11 years of experience as a radiologist, including at least 7 years as a neuroradiologist. For CT interpretation, two neuroradiologists independently interpreted the presence or absence of AIH in both a patient-wise and slice-wise manner. A third neuroradiologist reviewed the cases for which there was a disagreement between the two initial neuroradiologists to make a final decision. The weighted kappa value of the inter-rater agreement between the initial independent interpretations by the experienced neuroradiologists was 0.9865 [95% CI: 0.9732, 0.9997] for patient-wise analysis. Following the interpretations of the gold-standard review board, two cases that had initially been categorised in the AIH group according to medical records were reclassified to the normal group. In total, 146 CT images exhibited AIH, including 101 cases of subarachnoid haemorrhage, 72 cases of subdural haemorrhage, 20 cases of epidural haemorrhage, 40 cases of intraventricular haemorrhage, and 66 cases of intraparenchymal haemorrhage (note: overlapping subtypes were possible). A total of 2508 slices exhibited AIH, including 1408 cases of subarachnoid haemorrhage, 1150 cases of subdural haemorrhage, 228 cases of epidural haemorrhage, 240 cases of intraventricular haemorrhage, and 535 cases of intraparenchymal haemorrhage (note: overlapping subtypes were possible). A summary of the reader study population is presented in Supplementary Table 11.
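To make the kappa statistic concrete, the sketch below computes Cohen's kappa for two raters and a binary AIH/normal call. For two categories the weighted and unweighted kappa coincide. The agreement table is hypothetical: the paper does not report how many cases the two initial readers disagreed on, and these counts were chosen only to show that a handful of disagreements in 296 cases gives a kappa of about this size.

```python
def cohen_kappa_binary(both_pos, r1_only, r2_only, both_neg):
    """Cohen's kappa for two raters making a binary call.

    both_pos: cases both raters called AIH
    r1_only:  cases only rater 1 called AIH
    r2_only:  cases only rater 2 called AIH
    both_neg: cases both raters called normal
    """
    n = both_pos + r1_only + r2_only + both_neg
    p_observed = (both_pos + both_neg) / n
    r1_pos = (both_pos + r1_only) / n
    r2_pos = (both_pos + r2_only) / n
    # Agreement expected by chance from each rater's marginal rates.
    p_expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical table for 296 patients with two discordant reads.
kappa = cohen_kappa_binary(both_pos=146, r1_only=1, r2_only=1, both_neg=148)
print(round(kappa, 4))
```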

Development of the AI algorithm

For AI algorithm development, 28,351 slices from 2010 patients with AIH and 1000 normal participants were annotated by neuroradiologists using nordicICE version 4.1.3 (NordicNeuroLab, Bergen, Norway), with a particular focus on AIH areas. To overcome the drawbacks of inter-observer variability by supervised training, we developed a new AI algorithm based on a combination of a supervised haemorrhage detection process and an unsupervised anomaly detection process.

The purpose of the haemorrhage detection process is to predict whether AIH is present on brain CT images. This process consists of two modules^15,27,35. The first is a CNN-based haemorrhage detection module that provides the feature vector and AIH score for the target. The second is an RNN-based sequence module with double layers. In this module, more accurate AIH scores for each slice are produced using the feature vectors and scores from the first module as inputs to overcome the limitations of CNNs in terms of 3D image data analysis. In addition, scores for each patient were acquired simultaneously.

An anomaly detection process was applied to predict whether anomalies were present on brain CT images. A generation module based on a variational auto-encoder^36,37 and a generative adversarial network³⁸ was used in this process. The generation module was trained to generate normal CT slices (restored CT images) using images from the normal group. As such, a comparison of restored and input CT images indicated areas of anomaly when considering areas presumed to have AIH in the haemorrhage detection process.
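The comparison of restored and input images can be pictured as a residual map. The following is a minimal sketch of one simple reading of that step, not the authors' implementation: the absolute-difference residual, the threshold, the gating by a haemorrhage candidate mask, and all names and pixel values are assumptions for illustration.

```python
def residual_map(input_slice, restored_slice):
    """Per-pixel absolute difference between the input and the restored slice."""
    return [
        [abs(a - b) for a, b in zip(row_in, row_re)]
        for row_in, row_re in zip(input_slice, restored_slice)
    ]


def gated_anomaly(input_slice, restored_slice, haem_mask, threshold):
    """Flag pixels whose residual exceeds the threshold AND that lie inside
    the region the haemorrhage detector already considers suspicious."""
    residual = residual_map(input_slice, restored_slice)
    return [
        [1 if r > threshold and m else 0 for r, m in zip(res_row, mask_row)]
        for res_row, mask_row in zip(residual, haem_mask)
    ]


# Hypothetical 3x3 patch in Hounsfield-like units.
input_slice = [[30, 32, 31], [33, 70, 34], [31, 60, 30]]
restored_slice = [[30, 31, 31], [32, 35, 34], [31, 33, 30]]  # "normal" reconstruction
haem_mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]                # detector's candidate area

print(residual_map(input_slice, restored_slice))
print(gated_anomaly(input_slice, restored_slice, haem_mask, threshold=20))
```

In this toy patch the bottom-centre pixel has a large residual but is not flagged because it falls outside the candidate mask, which shows how the two processes can constrain each other.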

Finally, AI-assisted brain CT images, which included an embedded heatmap depicting the probable location of AIH according to patient- and slice-wise AIH probability scores, were provided to the picture archiving and communication system (PACS) viewer alongside original brain CT images (Fig. 1). An overview and details of the AI algorithm architecture are presented in Fig. 4 and Supplementary Figs. 2 and 3.

**Fig. 4: Overview of the AI algorithm.**

Diagnostic performance of the AI-based diagnostic support software in the external validation dataset

Per-patient and per-slice AIH probability scores were used to evaluate the standalone performance metrics of our AI algorithm, including the accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and area under the receiver operating characteristic curve (AUROC).
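For reference, the threshold-based metrics listed above follow directly from a confusion matrix. The sketch below uses their standard definitions; the counts are hypothetical and the function name is illustrative.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from true/false positive/negative counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall for the AIH class
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for 100 AIH and 100 normal cases.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike these metrics, the AUROC is computed across all cut-offs rather than at a single one, so it is not derived from one confusion matrix.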

Evaluation of the diagnostic performance of the AI-based diagnostic support software

Per-patient and per-slice AIH probability scores were used to evaluate the standalone performance metrics of our AI algorithm, including the AUROC, sensitivity, and specificity.
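Unlike sensitivity and specificity, the AUROC does not depend on a cut-off. One common way to compute it, shown here as a sketch with made-up scores, is the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The source does not state which estimator was used.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC: P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values): 1 = AIH present, 0 = AIH absent.
example_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auroc(example_scores, example_labels)
```

The pairwise loop is quadratic in the number of cases, which is fine for a dataset of a few hundred patients; a sort-based implementation would be preferable for slice-wise analysis at larger scale.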

Reader assessment study

A retrospective, multi-reader, crossover, superiority, pivotal, randomised study was performed to evaluate the efficacy of the software in assisting diagnostic decisions on the identification and detection of intracranial haemorrhage on brain CT scans. The study was registered with the Clinical Research Information Service of the Republic of Korea (https://cris.nih.go.kr; identifier: KCT0006734), a Korean primary registry of the World Health Organization’s International Clinical Trials Registry Platform that is under the direction of the Korea Disease Control and Prevention Agency (Supplementary Note (Study Details)).

This retrospective multi-reader study was conducted with nine reviewers from four institutions in South Korea (Seoul National University Hospital, Ajou University Medical Center, Bundang Seoul National University Hospital, and Seongnam Medical Center) using 12,663 brain CT slices from 296 patients as the study dataset. Nine physicians from three different subgroups with equal numbers (i.e., three non-radiologist physicians with 5–7 years of experience in that role, three board-certified radiologists with 5–7 years of experience in that role, and three subspecialty-trained neuroradiologists with 7–11 years of experience as radiologists, including 3–7 years of experience as neuroradiologists) participated as reviewers.

Prior to the first assessment, the full CT dataset was split into groups A and B, each comprising CT images from 148 patients, and numbers for sequential assessment were randomly assigned. In the first assessment, group A consisted of original CT images with the corresponding AI-assisted CT images, while group B consisted of original CT images only. The AI-assisted CT images provided a heatmap with information on the suspected location of AIH and the probability of AIH in a patient- and slice-wise manner. Each reviewer independently reviewed the CT images for the detection of AIH, using the PACS image viewer to assess CT images in a patient- and slice-wise manner. The reviewers were blinded to the gold-standard review board's decisions regarding AIH and to the proportion of AIH cases in the assessed dataset. After a washout period of 4–5 weeks, a second assessment was conducted with the conditions swapped: group A now included only the original CT images, whereas AI-assisted CT images were added to group B, which had previously included only the original CT images. The numbers for sequential assessment were randomly re-assigned, and each reviewer repeated the same review process as in the first assessment. A schematic overview of the study design is presented in Fig. 5.
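The crossover logic can be made concrete with a short sketch. The details below are assumptions for illustration: the source does not say how patients were allocated to groups A and B or whether aided and unaided cases were interleaved in the reading order, so the shuffling, the `seed` parameter, and the patient identifiers are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Case = Tuple[str, bool]  # (patient_id, ai_assisted)


def crossover_schedule(patient_ids: Sequence[str], seed: int = 0) -> Dict[str, List[Case]]:
    """Build two reading sessions in which every patient is read once with and once without AI."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)  # assumed allocation mechanism
    half = len(ids) // 2
    group_a, group_b = ids[:half], ids[half:]

    def session(aided: List[str], unaided: List[str]) -> List[Case]:
        cases = [(pid, True) for pid in aided] + [(pid, False) for pid in unaided]
        rng.shuffle(cases)  # reading order is re-randomised for each session
        return cases

    return {
        "session_1": session(group_a, group_b),  # group A read with AI assistance
        "session_2": session(group_b, group_a),  # after washout, conditions swap
    }


# Hypothetical identifiers for the 296 patients in the reader study dataset.
schedule = crossover_schedule([f"P{i:03d}" for i in range(296)])
```

Because each patient appears in both conditions, every reader serves as their own control, and the washout period limits recall of cases from the first session.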

**Fig. 5: Schematic overview of study design.**

Statistical analysis

AI determination was based on whether the probability provided by the AI algorithm was equal to or above the cut-off level. For external validation, a decision was considered correct if the AI determination matched the suggested decision made on the basis of the basic information on the external validation dataset; sensitivity and specificity were calculated at a cut-off level of 50.0%. For standalone AI assessment, in contrast, a decision was considered correct if the AI determination matched the decision made by the gold-standard review board for AUROC analysis; sensitivity and specificity were also calculated at a cut-off level of 50.0%.

In the reader study, the correctness of a decision was determined based on whether the decision of the reader matched the decision made by the gold-standard review board. Sensitivity, specificity, and accuracy were compared between AI-assisted and AI-unassisted groups using the chi-square test. To validate the superior performance of the AI-assisted group as compared with that of the AI-unassisted group, logistic regression using the generalised estimating equation (GEE) method was used for significance testing and for estimating the 95% confidence intervals (CIs). Inter-observer agreement according to AIH subtype was analysed using an intra-class correlation coefficient based on a patient-wise analysis. All analyses were performed using SAS statistical software (version 9.4; SAS Institute, Cary, NC, USA).
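To illustrate the chi-square comparison, the sketch below computes a Pearson chi-square test (without continuity correction) on a 2 × 2 table of correct versus incorrect reads. The counts are illustrative only: they were chosen to be roughly consistent with the reported patient-wise accuracies under the assumption of 9 × 296 = 2664 reads per condition, and are not taken from the paper. The analyses in the study were run in SAS; this is a plain-Python stand-in.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for the table [[a, b], [c, d]] with 1 degree of freedom.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one observation")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Illustrative counts (assumption): rows are conditions, columns are correct / incorrect.
assisted_correct, assisted_incorrect = 2585, 79
unassisted_correct, unassisted_incorrect = 2523, 141

stat, p = chi_square_2x2(assisted_correct, assisted_incorrect,
                         unassisted_correct, unassisted_incorrect)
```

A plain chi-square test treats all reads as independent, whereas reads in a multi-reader crossover study are clustered by reader and by patient; a GEE-based logistic regression, as used in the study, accounts for that correlation when estimating confidence intervals.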

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Additional documents related to this study are available from the corresponding author upon reasonable request. The datasets from Seoul National University Hospital and Ajou University Medical Center were used under license for the current study and are not publicly available.

Code availability

The code used to train the AI model is dependent on annotation, infrastructure, and hardware; thus, it cannot be released. However, all experimental and implementation details that can be shared are described in detail in the Supplementary Note (Study Details). The AI algorithm developed from this study is available through the commercial product, SK Inc. C&C Medical Insight+ Brain Hemorrhage.

References

1. Qureshi, A. I., Mendelow, A. D. & Hanley, D. F. Intracerebral haemorrhage. Lancet 373, 1632–1644 (2009).
2. Broderick, J. et al. Guidelines for the management of spontaneous intracerebral hemorrhage in adults: 2007 update: a guideline from the American Heart Association/American Stroke Association Stroke Council, High Blood Pressure Research Council, and the Quality of Care and Outcomes in Research Interdisciplinary Working Group. Stroke 38, 2001–2023 (2007).
3. van Asch, C. J. et al. Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis. Lancet Neurol. 9, 167–176 (2010).
4. Kidwell, C. S. et al. Comparison of MRI and CT for detection of acute intracerebral hemorrhage. JAMA 292, 1823–1830 (2004).
5. Cordonnier, C., Demchuk, A., Ziai, W. & Anderson, C. S. Intracerebral haemorrhage: current approaches to acute management. Lancet 392, 1257–1268 (2018).
6. Morotti, A. & Goldstein, J. N. Diagnosis and management of acute intracerebral hemorrhage. Emerg. Med. Clin. North. Am. 34, 883–899 (2016).
7. Lee, J. Y., Kim, J. S., Kim, T. Y. & Kim, Y. S. Detection and classification of intracranial haemorrhage on CT images using a novel deep-learning algorithm. Sci. Rep. 10, 20546 (2020).
8. Hwang, I. et al. Prediction of brain age from routine T2-weighted spin-echo brain magnetic resonance images with a deep convolutional neural network. Neurobiol. Aging 105, 78–85 (2021).
9. Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H. & Aerts, H. Artificial intelligence in radiology. Nat. Rev. Cancer 18, 500–510 (2018).
10. Arbabshirani, M. R. et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. NPJ Digit. Med. 1, 9 (2018).
11. Chilamkurthy, S. et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet 392, 2388–2396 (2018).
12. Ginat, D. T. Analysis of head CT scans flagged by deep learning software for acute intracranial hemorrhage. Neuroradiology 62, 335–340 (2020).
13. Kuo, W., Häne, C., Mukherjee, P., Malik, J. & Yuh, E. L. Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proc. Natl. Acad. Sci. USA 116, 22737–22745 (2019).
14. Soun, J. E. et al. Artificial intelligence and acute stroke imaging. Am. J. Neuroradiol. 42, 2–11 (2021).
15. Ye, H. et al. Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network. Eur. Radiol. 29, 6191–6201 (2019).
16. Schmidt-Erfurth, U. et al. Machine learning to analyze the prognostic value of current imaging biomarkers in neovascular age-related macular degeneration. Ophthalmol. Retina 2, 24–30 (2018).
17. Wang, Z. et al. Non-invasive classification of microcalcifications with phase-contrast X-ray mammography. Nat. Commun. 5, 3797 (2014).
18. Fernando, T., Gammulle, H., Denman, S., Sridharan, S. & Fookes, C. Deep learning for medical anomaly detection - a survey. https://arxiv.org/abs/2012.02364 (2020).
19. Ironside, N. et al. Fully automated segmentation algorithm for hematoma volumetric analysis in spontaneous intracerebral hemorrhage. Stroke 50, 3416–3423 (2019).
20. Jang, J., Lee, H. H., Park, J. A. & Kim, H. Unsupervised anomaly detection using generative adversarial networks in (1)H-MRS of the brain. J. Magn. Reson. 325, 106936 (2021).
21. Schlegl, T., Seeböck, P., Waldstein, S. M., Langs, G. & Schmidt-Erfurth, U. f-AnoGAN: fast unsupervised anomaly detection with generative adversarial networks. Med. Image Anal. 54, 30–44 (2019).
22. Schlegl, T., Seeböck, P., Waldstein, S. M., Schmidt-Erfurth, U. & Langs, G. In: Information Processing in Medical Imaging. https://arxiv.org/abs/1703.05921 (2017).
23. Dawud, A. M., Yurtkan, K. & Oztoprak, H. Application of deep learning in neuroradiology: brain haemorrhage classification using transfer learning. Comput. Intell. Neurosci. 2019, 4629859 (2019).
24. Lee, H. et al. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat. Biomed. Eng. 3, 173–182 (2019).
25. Prevedello, L. M. et al. Automated critical test findings identification and online notification system using artificial intelligence in imaging. Radiology 285, 923–931 (2017).
26. Titano, J. J. et al. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat. Med. 24, 1337–1341 (2018).
27. Grewal, M., Srivastava, M. M., Kumar, P. & Varadarajan, S. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) 281–284 (2018).
28. Mauri, L. & Damiani, E. Estimating degradation of machine learning data assets. ACM J. 14, https://doi.org/10.1145/3446331 (2022).
29. Young, Z. & Steele, R. Empirical evaluation of performance degradation of machine learning-based predictive models–a case study in healthcare information systems. Int. J. Inf. Manag. 2, 10070 (2022).
30. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
31. Kooi, T. et al. Large scale deep learning for computer aided detection of mammographic lesions. Med. Image Anal. 35, 303–312 (2017).
32. Zhang, L. W., Lin, J. & Karim, R. Adaptive kernel density-based anomaly detection for nonlinear systems. Knowl. Based Syst. 139, 50–63 (2018).
33. Fletcher, J. G. et al. Evaluation of lower-dose spiral head CT for detection of intracranial findings causing neurologic deficits. Am. J. Neuroradiol. 40, 1855–1863 (2019).
34. Fletcher, J. G. et al. Observer performance in the detection and classification of malignant hepatic nodules and masses with CT image-space denoising and iterative reconstruction. Radiology 276, 465–478 (2015).
35. Sage, A. & Badura, P. Intracranial hemorrhage detection in head CT using double-branch convolutional neural network, support vector machine, and random forest. Appl. Sci. 10, https://doi.org/10.3390/app10217577 (2020).
36. Kingma, D. P. & Welling, M. In: International Conference on Learning Representations. https://arxiv.org/abs/1312.6114 (2013).
37. Kingma, D. P. & Welling, M. An introduction to variational autoencoders. Found. Trends Mach. Learn. 12, 4–89 (2019).
38. Goodfellow, I. J. et al. In: Neural Information Processing Systems. https://arxiv.org/abs/1406.2661 (2014).


Acknowledgements

This study was funded by SK Inc. C&C. The funder of the study was involved in the collection, management, and analysis of the data used during AI algorithm development. The corresponding author had full access to most datasets and all summary estimates from each dataset and had final responsibility for the decision to submit the manuscript for publication. We thank Synex for the study coordination.

Author information

Authors and Affiliations

Institute of Radiation Medicine, Seoul National University Medical Research Center, Seoul, Republic of Korea
Tae Jin Yun, Seung Hong Choi, Roh-Eul Yoo & In Pyeong Hwang
Department of Radiology, Seoul National University Hospital, Seoul, Republic of Korea
Tae Jin Yun, Seung Hong Choi, Roh-Eul Yoo & In Pyeong Hwang
Department of Radiology, Ajou University School of Medicine, Suwon, Republic of Korea
Jin Wook Choi, Miran Han & Woo Sang Jung

Contributions

T.J.Y. and J.W.C. conceived and designed the study. T.J.Y., J.W.C., M.H., W.S.J., S.H.C., R.-E.Y., and I.P.H. collected and curated the data for AI development. T.J.Y. and J.W.C. collected and curated the data for the reader study. T.J.Y. and J.W.C. designed the reader study protocol. T.J.Y. and J.W.C. conducted the statistical analysis. T.J.Y. and J.W.C. interpreted the results of the validation study. T.J.Y. and J.W.C. wrote the initial draft. All authors subsequently edited the report. T.J.Y. and J.W.C. supervised the project.

Corresponding author

Correspondence to Jin Wook Choi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yun, T.J., Choi, J.W., Han, M. et al. Deep learning based automatic detection algorithm for acute intracranial haemorrhage: a pivotal randomized clinical trial. npj Digit. Med. 6, 61 (2023). https://doi.org/10.1038/s41746-023-00798-8

Received: 10 September 2022
Accepted: 10 March 2023
Published: 07 April 2023
DOI: https://doi.org/10.1038/s41746-023-00798-8
