Cancer treatment is a classic sequential decision-making problem where doctors see only the tip of the iceberg, while the patient's true state remains hidden behind latent heterogeneity. Traditional Reinforcement Learning (RL) often fails here: it naively assumes the rules of the game are static. However, as noted by Deniz Sargun (Amazon), H. Bugra Tulay (HP), and J. Emre Koksal (Ohio State University), oncology is an environment with plastic dynamics. Therapy doesn't just move a patient along a set trajectory; it breaks and rewires the disease mechanisms themselves, shifting long-term equilibria. For a mutating, adaptive adversary, standard control methods are far too blunt an instrument.

From State Control to Belief-Space Planning

The researchers propose a paradigm shift: instead of trying to control the state trajectory directly, plan in belief space via active inference. Treatment is framed as a constrained Partially Observable Markov Decision Process (POMDP). The system doesn't merely react to symptoms; it manages the evolution of the "belief state," a probability distribution over latent variables such as genetics, immune response, and physiology. The goal is to steer this distribution toward a clinically desirable outcome without exceeding the measurement budget. The framing makes an honest admission: the true state of a tumor and a patient's quality of life are never fully known and can only be inferred from sparse, heterogeneous data.
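To make "managing the belief state" concrete, here is a minimal sketch of a single belief update in a discrete POMDP. It is the textbook Bayes filter, not the authors' implementation; the two states, the single action, and every probability below are invented for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step for a discrete POMDP.

    T[a][s][s2] = P(s2 | s, a)   (transition model, assumed known here)
    O[a][s2][o] = P(o | s2, a)   (observation model, assumed known here)
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correct: reweight each state by how well it explains the observation.
    unnorm = [predicted[s2] * O[action][s2][observation] for s2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnorm]


# Hypothetical toy model: states 0 = "responding", 1 = "progressing";
# one action 0 = "treat"; observations 0 = "scan stable", 1 = "scan worse".
T = [[[0.9, 0.1], [0.3, 0.7]]]
O = [[[0.8, 0.2], [0.25, 0.75]]]

prior = [0.5, 0.5]
posterior = belief_update(prior, action=0, observation=1, T=T, O=O)
```

With these made-up numbers, a single worrying scan moves the probability of "progressing" from 0.5 to roughly 0.71. The planner acts on this distribution rather than on a single guessed state.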

In standard optimal control formulations, actions influence the instantaneous state while system dynamics remain fixed. In oncology, interventions induce plastic dynamics: treatment permanently alters disease mechanisms and shifts equilibrium points.

Active inference applies the free energy principle, turning action selection into the minimization of Expected Free Energy (EFE). EFE is an information-theoretic functional that balances pragmatic control against knowledge seeking. The model doesn't just look for the optimal drug dose; it also calculates the "epistemic value" of every diagnostic test. By decomposing the objective into risk (goal alignment), ambiguity (observational uncertainty), and information gain, the framework lets the AI justify mathematically when an invasive test is warranted by the need to reduce uncertainty about a specific patient's dynamics.
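The decomposition can be sketched for a single step in a discrete model. The code below uses the standard one-step form from the active inference literature, G = risk + ambiguity, which equals the negative information gain minus the expected log-preference. The paper's exact constrained functional is not given in this article, so the function names, the two candidate tests, and all numbers here are assumptions for illustration.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def kl(p, q):
    """KL divergence KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)


def expected_free_energy(q_states, likelihood, preferred_obs):
    """One-step EFE for a single candidate action.

    q_states[s]      = predicted state belief Q(s | a)
    likelihood[s][o] = P(o | s, a)
    preferred_obs[o] = preferred outcome distribution C(o), strictly positive
    """
    n_states, n_obs = len(q_states), len(preferred_obs)
    # Predicted observation distribution Q(o | a).
    q_obs = [
        sum(q_states[s] * likelihood[s][o] for s in range(n_states))
        for o in range(n_obs)
    ]
    risk = kl(q_obs, preferred_obs)  # divergence from preferred outcomes
    ambiguity = sum(q_states[s] * entropy(likelihood[s]) for s in range(n_states))
    info_gain = entropy(q_obs) - ambiguity  # mutual information between s and o
    return {"G": risk + ambiguity, "risk": risk,
            "ambiguity": ambiguity, "info_gain": info_gain}


# Hypothetical comparison of two diagnostic tests under the same belief.
belief = [0.5, 0.5]
flat_preferences = [0.5, 0.5]  # flat on purpose, to isolate the epistemic term
informative = expected_free_energy(
    belief, [[0.95, 0.05], [0.05, 0.95]], flat_preferences)
uninformative = expected_free_energy(
    belief, [[0.5, 0.5], [0.5, 0.5]], flat_preferences)
```

With flat preferences the risk term is zero for both tests, so the difference in G comes entirely from ambiguity: about 0.20 nats for the informative test versus about 0.69 nats for the uninformative one. An EFE-minimizing agent would therefore pick the informative test. In a realistic setting the preference vector would penalize invasive or costly outcomes, and the same sum would trade that cost against the information gained.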

Personalization Through Latent Heterogeneity

The method's strength lies in handling individual attributes that are unknown a priori. Using data from the AACR Project GENIE Biopharma Collaborative, Sargun and his co-authors demonstrated that their approach allows for simultaneous patient classification and high-efficiency therapy maintenance. Because planning occurs in belief space, the model "learns on the job," adapting to a specific person’s unique transition and observation models during treatment. This offers a principled answer to the exploration-exploitation dilemma: the algorithm works to stall disease progression while simultaneously inferring patient-specific traits from limited feedback, so that therapy adapts to biological changes on the fly.
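As a rough illustration of "learning on the job," the sketch below keeps a posterior over a small set of candidate patient types and updates it from observed treatment outcomes. It is a simplified stand-in, not the authors' method: the two types, two treatments, response probabilities, and the greedy treatment choice are all hypothetical, and the full approach would select actions by minimizing EFE rather than by expected response alone.

```python
def update_type_posterior(prior, action, outcome, response_prob):
    """Bayes update over latent patient types.

    response_prob[k][a] = P(response | patient type k, treatment a)
    outcome is 1 for response, 0 for no response.
    """
    unnorm = []
    for k, p_k in enumerate(prior):
        p = response_prob[k][action]
        unnorm.append(p_k * (p if outcome == 1 else 1.0 - p))
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("outcome has zero probability under every type")
    return [u / z for u in unnorm]


def best_treatment(posterior, response_prob):
    """Greedy choice: treatment with the highest expected response rate."""
    n_actions = len(response_prob[0])
    expected = [
        sum(posterior[k] * response_prob[k][a] for k in range(len(posterior)))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=expected.__getitem__)


# Hypothetical response rates: rows are patient types, columns are treatments.
response_prob = [[0.8, 0.3],
                 [0.2, 0.6]]

posterior = [0.5, 0.5]
choice_before = best_treatment(posterior, response_prob)

# Made-up history of (treatment, outcome) pairs for one patient.
history = [(0, 0), (0, 0), (1, 1)]
for action, outcome in history:
    posterior = update_type_posterior(posterior, action, outcome, response_prob)

choice_after = best_treatment(posterior, response_prob)
```

Starting from an even prior, treatment 0 looks marginally better. After two failures on treatment 0 and one success on treatment 1, the posterior concentrates on the second type and the preferred treatment flips to 1. Classification and treatment selection happen in the same loop, which is the behavior the paragraph above describes.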

The transition from AI as a static diagnostician to AI as an active controller of biological systems is a logical step, yet the path from an arXiv preprint to clinical protocol is littered with legal and technical wreckage. The computational complexity of high-dimensional POMDPs and the ethical accountability for dosages prescribed by a "black box" remain major barriers. For tech leads, the fundamental takeaway is clear: in systems where your actions change the very rules of the game, whether in oncology or volatile markets, active inference provides mathematical rigor where classical RL, built on the assumption of fixed dynamics, is left optimizing against a model that no longer holds.
