As AI tools become increasingly integrated into radiology workflows, speakers at ECR 2026 highlighted a growing challenge: ensuring algorithms remain reliable after deployment in clinical practice. During the session “Post-market surveillance and quality assurance of AI tools,” speakers examined regulatory requirements, real-world performance variability, and practical monitoring strategies for AI systems used in imaging.
From Approval to Continuous Oversight
Introducing the concept of post-market surveillance (PMS), Hugh Harvey, Founder and Managing Director of UK-based Hardian Health, emphasized that regulatory approval marks only the beginning of an AI product's lifecycle. “Regulatory approval is an authorization to sell a product on the market. It does not mean that you are done,” he said.
Under the European Medical Device Regulation (MDR), manufacturers must continuously monitor device performance after deployment, including safety, usability, and adverse events. Harvey stressed that PMS should not be treated as a one-time validation step. “It's not a one-off. It's not a pilot study, and then leave the AI unchecked. It is a continuous entire lifecycle process.”
One major concern is performance drift, as AI systems may behave differently across hospitals, imaging protocols, and patient populations. “The AI is a fixed black box that is trying to stay the same in a shifting data environment,” Harvey explained.
He also highlighted that hospitals play an active role because manufacturers depend on clinical feedback and downstream outcome data to maintain regulatory compliance. According to Harvey, insufficient monitoring can even threaten regulatory approval. “If you're not collecting that data, then you risk losing your CE mark,” he said, referring to manufacturers unable to demonstrate continued clinical performance after deployment.

Real-world Testing Reveals Variability
Sarah Katharina Herber from the University Medical Center Mainz in Germany presented a comparison of three commercially available AI tools for lung nodule detection and malignancy classification under real-world conditions.
The retrospective study included 152 patients with histopathologically confirmed pulmonary nodules. While detection rates were generally high, classification performance varied considerably between the AI models. “The evaluation was performed under real-world clinical conditions, reflecting the post-market use rather than the controlled pre-market environment,” Herber explained.
“One of the most critical findings of our study was the false negative rate,” she said. In some subgroup analyses, false negative rates reached between 70 and 100 percent, although subgroup sizes were limited. According to Herber, the findings highlight the gap between pre-market validation studies and routine clinical practice. “Post-market performance does not necessarily reflect the pre-market performance,” she concluded.

Monitoring AI in Clinical Workflows
Sergey Morozov, research and development consultant at 3R in Brussels, Belgium, focused on practical approaches to AI monitoring and observability in clinical environments.
Drawing on experiences from large-scale AI deployments during the COVID-19 pandemic, Morozov described how algorithm behavior can change significantly after implementation. “In the laboratory, everything seems to be really nice,” he said. “But when we deploy into the clinic, there is a drastic plummeting of the quality.”
According to Morozov, monitoring should include not only diagnostic accuracy but also workflow integration, reporting latency, and user experience. In some cases, AI results may arrive only after radiologists have finalized reports, limiting the clinical value of the system and potentially creating reporting challenges.
Building Long-term AI Governance
For the speakers, the session ultimately highlighted that AI quality assurance cannot rely solely on pre-market studies. Continuous monitoring, structured feedback systems and institutional AI governance are becoming central requirements for safe deployment.
“If every radiologist in your department is disagreeing consistently with the AI, then you have an AI problem,” Harvey said.
Morozov stressed that hospitals need dedicated oversight structures capable of evaluating AI performance over time, since the human factor remains essential to controlling the final results.
The session made clear that deploying AI is no longer the endpoint. For radiology departments and vendors alike, long-term monitoring is becoming part of the clinical infrastructure required to keep AI reliable in practice.