Only 5 retrospective cohorts met inclusion criteria; heterogeneity was so extreme the authors declared their own pooled estimates "clinically uninterpretable"
Journal: The Canadian Journal of Urology | Published: 2026-04-15 | Type: Systematic Review, Meta-Analysis | PMID: 42086349
Authors: Ghazwani Y et al. (King Saud bin Abdulaziz University for Health Sciences / Ministry of National Guard Health Affairs, Saudi Arabia)
Funding/COI: Funding not listed; authors declare no conflicts of interest
This systematic review searched six databases through September 2025 to assess whether AI and predictive models can reliably forecast stone-free status (SFS) after ureteroscopy. Five retrospective cohorts made the cut, covering approaches from logistic regression to gradient boosting and radiomics ensembles. Individual models showed acceptable-to-excellent discrimination, but heterogeneity across studies was so severe the authors concluded their own pooled estimates are "clinically uninterpretable."
Five retrospective cohorts are a thin foundation for a meta-analysis. The authors applied QUADAS-AI for risk-of-bias assessment and used dual independent screening and extraction, both appropriate choices. SFS definitions varied substantially, from <2 mm residual fragments at day 1 to ≤5 mm at one month, assessed by plain radiography, ultrasound, and/or CT. That is not a minor technical difference: the outcome being predicted is not the same thing across studies.
The heterogeneity statistics are the paper's main finding. I² values of 94.6% and 96.9% for the two primary binary outcomes, with prediction intervals spanning three orders of magnitude, indicate these studies are not measuring the same phenomenon. The authors explicitly say so rather than papering over it, which is the right call.
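For readers who want to see how numbers like these arise, here is a minimal sketch of the usual calculation: Cochran's Q, I² = max(0, (Q − df)/Q), a DerSimonian-Laird estimate of between-study variance, and an approximate 95% prediction interval. The five effect sizes and variances are invented for illustration and are not taken from the paper. The log odds ratio scale, the DerSimonian-Laird estimator, and the t-based interval with k − 2 degrees of freedom are assumptions about a typical workflow, not a confirmed description of the authors' method.

```python
import math

def heterogeneity(effects, variances):
    """Return Cochran's Q, I^2 (percent), and the DerSimonian-Laird tau^2."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    w_sum = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / w_sum
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2

def prediction_interval(effects, variances, tau2, t_crit):
    """Approximate 95% prediction interval for the effect in a new study."""
    weights = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    w_sum = sum(weights)
    mu = sum(w * y for w, y in zip(weights, effects)) / w_sum
    half_width = t_crit * math.sqrt(tau2 + 1.0 / w_sum)
    return mu - half_width, mu + half_width

# HYPOTHETICAL inputs: five made-up log odds ratios and their variances.
log_or = [0.2, 1.5, 2.8, 0.9, 3.5]
var = [0.04, 0.05, 0.06, 0.03, 0.08]

# 97.5th percentile of Student's t with k - 2 = 3 degrees of freedom,
# hardcoded to keep the sketch free of third-party dependencies.
T_CRIT_DF3 = 3.182

q, i2, tau2 = heterogeneity(log_or, var)
lo, hi = prediction_interval(log_or, var, tau2, T_CRIT_DF3)

# Width of the interval on the odds-ratio scale, in powers of ten.
orders = (hi - lo) / math.log(10)
print(f"Q = {q:.1f}, I^2 = {i2:.1f}%, tau^2 = {tau2:.2f}")
print(f"Prediction interval (log OR): {lo:.2f} to {hi:.2f}")
print(f"Span on the odds-ratio scale: about {orders:.1f} orders of magnitude")
```

With these invented inputs, I² comes out above 90% and the prediction interval covers more than three orders of magnitude on the odds-ratio scale. An interval that wide says a new study could plausibly show almost any effect, which is why the pooled point estimate carries so little information.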
A meta-analysis that calls its own pooled results clinically uninterpretable is doing something right. The individual models may have genuine predictive value, given the acceptable-to-excellent discrimination reported by individual studies, but five retrospective cohorts with incompatible outcome definitions do not support meaningful pooling. The paper's real contribution is cataloguing what a valid evidence base would require: standardized SFS definitions, consistent imaging protocols, and prospective validation. Stone density in Hounsfield units (HU) stands out as the one predictor that holds across studies. Read this for the honest methodology, not for actionable estimates.