Explainable AI in Alzheimer’s Diagnosis: XGBoost and SHAP

Medical diagnostics is undergoing a painful transition from blind trust in "black boxes" to transparent logic that can stand up in court or before an insurance board. Research by Afshan Hashmi from Tuwaiq Academy vividly demonstrates this shift: an XGBoost classifier has learned to distinguish between healthy cognitive function, mild cognitive impairment (MCI), and Alzheimer's disease with very high discriminative performance. On the ADNI dataset, the model achieved a macro AUC of 0.982. That is an impressive figure, but for medical network owners and underwriters, something else matters more. Choosing gradient boosting over trendy neural networks isn't technical conservatism; it is a conscious effort to overcome the primary barrier to clinical AI adoption: the lack of a clear answer to the question, "Why was this fatal diagnosis made?"

Solving the Interpretability Crisis

The bet on XGBoost was made for compatibility with the SHAP (SHapley Additive exPlanations) framework. Unlike neural network structures, where decisions emerge from a chaos of weights, SHAP assigns each clinical feature a mathematically grounded share of the final verdict, so that the shares add up to the model's output. In an industry where up to 15% of patients with MCI progress to Alzheimer's annually, the ability to trace algorithmic logic is not a "bonus"; it is a rigid regulatory and ethical requirement. Hashmi’s model relies on eight routine markers, including MMSE, CDR-SB, and MoCA scores.
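To make the idea of "contribution of each feature" concrete, here is a minimal sketch of exact Shapley values computed by brute force. It is an illustration only: the scoring function `toy_risk`, its coefficients, and the patient and baseline values are all invented for this example and are not the model or data from the study. The SHAP library arrives at the same kind of attribution for tree ensembles such as XGBoost with a far more efficient algorithm; replacing a "missing" feature with a baseline value, as done here, is one common simplification.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions
    over every ordering in which features can be 'switched on'.

    A feature that is not yet switched on keeps its baseline value.
    Only practical for a handful of features (n! orderings).
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)          # start with every feature at baseline
        previous = predict(current)
        for name in order:
            current[name] = instance[name]    # reveal this feature's real value
            updated = predict(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    n_orderings = factorial(len(names))
    return {name: total / n_orderings for name, total in totals.items()}


def toy_risk(f):
    """Hypothetical risk score (NOT the paper's model): higher CDR-SB and
    lower MMSE raise the score, with an interaction between the two."""
    return (0.5 * f["cdr_sb"]
            - 0.1 * f["mmse"]
            + 0.02 * f["age"]
            + 0.05 * f["cdr_sb"] * (30 - f["mmse"]))


# Hypothetical patient and reference ("baseline") values, for illustration only.
patient = {"mmse": 21, "cdr_sb": 4.5, "age": 76}
baseline = {"mmse": 29, "cdr_sb": 0.0, "age": 70}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>7}: {value:+.3f}")

# Additivity: the attributions sum to the gap between the patient's
# score and the baseline score.
print("sum of attributions:", round(sum(phi.values()), 6))
print("score gap          :", round(toy_risk(patient) - toy_risk(baseline), 6))
```

The last two printed lines match, which is the property that makes such explanations auditable: nothing in the prediction is left unattributed.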

This combination of strong performance and interpretability is essential for clinical adoption, where algorithmic transparency is a regulatory and ethical requirement.

This approach saves MedTech CTOs from the headache of legal liability. When a model with 0.943 accuracy delivers a diagnosis, SHAP analysis highlights the specific levers, whether the CDR-SB total score or MMSE metrics, that drove that particular result. This transforms AI from a black box into a hypothesis-testing tool, allowing physicians to cross-reference mathematical conclusions with their own expertise rather than simply taking the machine’s word for it.

The Clinical Weight of Eight Features

The model's success is built not on expensive MRIs, but on eight specific features: MMSE, CDR Global, CDR-SB, MoCA, FAQ, age, gender, and education level. These are exactly the data points already available to any memory clinic. SHAP analysis revealed a clear hierarchy: while CDR Global is critical for separating healthy patients from those with MCI, the combination of CDR-SB and MMSE dominates Alzheimer's classification.

SHAP analysis reveals clinically plausible, class-specific feature importance patterns supporting clinical validity.

To handle data imbalance (266 Alzheimer's cases versus 767 MCI), the authors used SMOTE to oversample the minority class and tuned hyperparameters with Optuna. The result, a Cohen’s kappa of 0.909, indicates that the model's predictions agree with the reference diagnoses far more often than class frequencies alone would produce by chance. For the insurance sector, such precision using routine data translates to direct cost savings: effective screening can be conducted without immediately ordering the most expensive diagnostic procedures.
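For readers unfamiliar with Cohen's kappa, the sketch below shows how it is derived from a confusion matrix: observed agreement is compared with the agreement expected by chance given the row and column totals. The three-class matrix is made up purely for illustration and does not reproduce the study's results; the function name `cohens_kappa` is likewise just a label for this example.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of cases with true class i predicted as class j.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from the marginal totals.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts (rows: true class, columns: predicted class),
# ordered as cognitively normal, MCI, Alzheimer's. Not the paper's data.
toy_confusion = [
    [40, 5, 0],
    [4, 60, 6],
    [0, 5, 30],
]

accuracy = sum(toy_confusion[i][i] for i in range(3)) / sum(map(sum, toy_confusion))
kappa = cohens_kappa(toy_confusion)
print(f"accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

Kappa comes out lower than raw accuracy because part of the agreement would occur by chance alone; with imbalanced classes such as the MCI-heavy cohort described above, that correction matters more.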

The transition of AI from laboratory toy to clinical reality depends entirely on the auditability of its logic. While the work with the ADNI archive requires validation across multi-center samples to account for population differences, the direction is correct. The future of MedTech lies not in "smart" hidden layers, but in transparent models capable of explaining their reasoning. Executives should prioritize systems with SHAP transparency today to avoid regulatory deadlocks and patient trust crises tomorrow.

Source: arXiv cs.AI
