AI Shows Potential for Detecting Mucosal Healing in Ulcerative Colitis

Carolyn Crist

Artificial intelligence (AI) systems show high potential for detecting mucosal healing in ulcerative colitis with optimal diagnostic performance, according to a new systematic review and meta-analysis.

AI algorithms replicated expert opinion with high sensitivity and specificity when evaluating images and videos. At the same time, the authors noted moderate-to-high heterogeneity in the data.

"Artificial intelligence software is expected to potentially solve the longstanding issue of low-to-moderate interobserver agreement when human endoscopists are required to indicate mucosal healing or different grades of inflammation in ulcerative colitis," Alessandro Rimondi, lead author and clinical fellow at the Royal Free Hospital and University College London Institute for Liver and Digestive Health, England, told Medscape Medical News.

"However, high levels of heterogeneity have been found, potentially linked to how differently the AI software was trained and how many cases it has been tested on," he said. "This partially limits the quality of the body of evidence."

The study was published online in Digestive and Liver Disease.

Evaluating AI Detection

In clinical practice, assessing mucosal healing in inflammatory bowel disease (IBD) is critical for evaluating a patient's response to therapy and guiding strategies for treatment, surgery, and endoscopic surveillance. In an era of precision medicine, assessment of mucosal healing should be precise, readily available in an endoscopic report, and highly reproducible, which requires high accuracy and agreement in endoscopic diagnosis, the authors noted.

AI systems — particularly deep learning algorithms based on convolutional neural network architecture — may allow endoscopists to establish an objective and real-time diagnosis of mucosal healing and improve the average quality standards at primary and tertiary care centers, the authors wrote. Research on AI in IBD has looked at potential implications for endoscopy and clinical management, which opens new areas to explore.

Rimondi and colleagues conducted a systematic review of studies up to December 2022 that involved an AI-based system used to estimate any degree of endoscopic inflammation in IBD, whether ulcerative colitis or Crohn's disease. They then conducted a diagnostic test accuracy meta-analysis, restricted to the one area in which more than five studies reporting diagnostic performance were available: mucosal healing in ulcerative colitis based on luminal imaging.

The researchers identified 12 studies with luminal imaging in patients with ulcerative colitis. Four evaluated the performance of AI systems on videos, six focused on fixed images, and two looked at both.

Overall, the AI systems achieved a satisfactory performance in evaluating mucosal healing in ulcerative colitis. When evaluating fixed images, the algorithms achieved a sensitivity of 0.91 and a specificity of 0.89, with a diagnostic odds ratio (DOR) of 92.42 and an area under the summary receiver operating characteristic (SROC) curve of 0.957. When evaluating videos, the algorithms achieved a sensitivity of 0.86 and a specificity of 0.91, with a DOR of 70.86 and an area under the SROC curve of 0.941.
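For readers less familiar with these measures, the following is a minimal Python sketch of how sensitivity, specificity, and DOR are derived from a single 2x2 table of AI output against expert opinion. The function names and the counts are hypothetical, chosen only so that the rates match the pooled values for fixed images; they are not data from the meta-analysis. A DOR computed this way from the pooled sensitivity and specificity need not equal the pooled DOR reported above, most likely because each summary measure is estimated separately across studies.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, DOR) from 2x2 table counts.

    tp: AI and experts both call mucosal healing
    fp: AI calls healing, experts do not
    fn: AI misses healing that experts identify
    tn: AI and experts both call no healing
    """
    if fp == 0 or fn == 0:
        # DOR is undefined when either off-diagonal cell is zero
        raise ValueError("DOR is undefined when fp or fn is zero")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)
    return sensitivity, specificity, dor


def dor_from_rates(sensitivity, specificity):
    """DOR as the odds of a positive call in true cases divided by
    the odds of a positive call in non-cases."""
    odds_in_cases = sensitivity / (1 - sensitivity)
    odds_in_non_cases = (1 - specificity) / specificity
    return odds_in_cases / odds_in_non_cases


# Hypothetical counts (assumption, not study data): 100 healed and
# 100 non-healed cases, chosen to give sensitivity 0.91 and specificity 0.89.
sens, spec, dor = diagnostic_metrics(tp=91, fp=11, fn=9, tn=89)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} DOR={dor:.2f}")
```

Both functions give the same DOR for a single table; the second form is convenient when only the two rates are published.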

"It is exciting to see artificial intelligence expand and be effective for conditions beyond colon polyps," Seth Gross, MD, professor of medicine and clinical chief of gastroenterology and hepatology at NYU Langone Health, New York, told Medscape Medical News.

Gross, who wasn't involved with this study, has researched AI applications in endoscopy and colonoscopy. He and colleagues have found that machine learning software can improve lesion and polyp detection, serving as a "second set of eyes" for practitioners.

"Mucosal healing interpretation can be variable amongst providers," he said. "AI has the potential to help standardize the assessment of mucosal healing in patients with ulcerative colitis."

Improving AI Training

The authors found moderate-high levels of heterogeneity among the studies, which limited the quality of the evidence. Only 2 of the 12 studies used an external dataset to validate the AI systems, and 1 evaluated the AI system on a mixed database. However, seven used an internal validation dataset separate from the training dataset.

It is crucial to reach a consensus on training for AI models, with a shared definition of mucosal healing and cutoff thresholds based on recent guidelines, Rimondi and colleagues noted. Training data ideally should be based on a broad and shared database containing images and videos with high interobserver agreement on the degree of inflammation, they added.

"We probably need a consensus or guidelines that identify the standards for training and testing newly developed software, stating the bare minimum number of images or videos for the training and testing sections," Rimondi said.

In addition, because of interobserver disagreement, an expert-validated database could serve as a gold standard, he added.

"In my opinion, artificial intelligence tends to better perform when it is required to evaluate a dichotomic outcome (such as polyp detection, which is a yes or no task) than when it is required to replicate more difficult tasks (such as polyp characterization or judging a degree of inflammation), which have a continuous range of expression," Rimondi said.

The authors declared no financial support for this study. Rimondi and Gross reported no financial disclosures.

Carolyn Crist is a health and medical journalist who reports on the latest studies for Medscape, MDedge, and WebMD.
