The deadline is a forcing function, not a date.
Under EU IVDR, post-market performance evidence is a continuous lifecycle obligation: class C and D devices owe a Periodic Safety Update Report, and class A and B devices a post-market surveillance report. The FDA is moving the same way, expecting ongoing real-world performance monitoring across the life of an AI-enabled device. Transition windows are closing and reviewer capacity is constrained. For a maker without internal infrastructure, the question is no longer whether to build post-market monitoring, but who delivers it.
“Validated” gets read as permanent. It isn't.
A growing consensus in the literature names the core failure plainly: a validation result is treated as a permanent label when it is in fact conditional and time-bound. A model proven on one population, at one moment, with one set of equipment will perform differently as any of those change. Treating the snapshot as the truth is the mistake the regulation is now written to correct.
The drift is already in the published record.
This is not a hypothetical risk. A real-world study of a haemorrhage-detection AI across 17 facilities and more than 100,000 scans found sensitivity falling from roughly 94% at validation to about 82% in the field, and materially lower on subacute and chronic cases, with false positives traced to a specific scanner brand. On those figures the miss rate roughly triples, from about 6% of true cases to about 18%. The thesis is borne out in someone else's data: the same model drifts by site and scanner, and nobody independent is watching.
A single accuracy number hides the failure.
A device fails two ways — false negatives and false positives — and they carry different consequences, so sensitivity and specificity have to be watched separately. And a point estimate alone is not enough: the honest picture is the confidence interval, which tightens as case volume grows. Defined properly, drift is the interval crossing outside the expected natural range — a statistically grounded trigger, not an eyeball call.
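One way to make that trigger concrete is sketched below. It is an illustration under stated assumptions, not a prescribed method: the Wilson score interval and the 95% level are choices made here, the function names are hypothetical, and the counts and performance floors are invented for the example. "Crossing outside the expected range" is read here as the whole interval sitting below the floor, which is one reasonable interpretation among several.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def check_drift(tp, fn, tn, fp, sens_floor, spec_floor):
    """Track sensitivity and specificity separately.

    Assumption: drift is flagged only when the upper end of the interval
    is below the floor, i.e. the whole interval lies outside the expected range.
    """
    sens_ci = wilson_interval(tp, tp + fn)  # true positives among actual positives
    spec_ci = wilson_interval(tn, tn + fp)  # true negatives among actual negatives
    return {
        "sensitivity": {
            "estimate": tp / (tp + fn),
            "ci": sens_ci,
            "drift": sens_ci[1] < sens_floor,
        },
        "specificity": {
            "estimate": tn / (tn + fp),
            "ci": spec_ci,
            "drift": spec_ci[1] < spec_floor,
        },
    }


if __name__ == "__main__":
    # Hypothetical counts and floors, for illustration only.
    report = check_drift(tp=82, fn=18, tn=950, fp=50,
                         sens_floor=0.90, spec_floor=0.90)
    for metric, result in report.items():
        print(metric, result)
```

With these invented counts the sensitivity flag fires while the specificity flag does not, which is exactly the kind of split a single accuracy figure would hide. Rerunning wilson_interval with ten times the cases at the same rate gives a visibly narrower interval, so the same trigger becomes more decisive as volume accumulates.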
Independence is the whole point.
A manufacturer cannot be the neutral benchmark for its own device, and a manufacturer-owned data manager cannot either — the credibility only holds if the acceptable-performance bar is set objectively, by a party that isn't being measured. That structural independence — an external benchmark, held across sites — is what a notified body or the FDA can actually trust, and what makes the evidence stronger the longer it runs.