CHIMERA-agent Challenge
Combining HIstology, Medical imaging and molEcular data for medical pRognosis and diAgnosis Agent
Prostate cancer care is not a single decision. It is a sequence of decisions made by different specialists at different times, each working with whatever information happens to be available. At every step, the picture is incomplete: reports arrive late, modalities are missing, and findings from one source may contradict another.
Electronic health records reflect this reality. A patient's file is not a tidy, fully paired dataset but a collection of lab results, imaging reports from different hospitals, pathology notes with varying levels of detail, and longitudinal measurements with irregular spacing. Some patients have multiple MRIs and others have none. These are not edge cases but routine clinical practice.
Most AI systems for prostate cancer ignore this complexity. They assume clean, complete, single-modality input and produce a prediction without explanation. In the CHIMERA-agent challenge, your task is to close that gap.
CHIMERA-agent builds on the CHIMERA challenge at MICCAI 2025 and focuses on three clinical questions that closely reflect real-world practice: (1) whether a patient should undergo biopsy, (2) whether a patient should receive active surveillance or definitive treatment, and (3) how likely a patient is to experience biochemical recurrence over time after radical prostatectomy (survival prediction).
The objective of this challenge is to develop a flexible agent that can handle realistic clinical conditions, including missing modalities, longitudinal evidence, and conflicting findings across modalities.
The key innovation is the focus on agent-level reasoning: participants must submit not only predictions, but also structured reasoning traces showing how the available evidence was interpreted, integrated, and used to support the final decision.
Note for participants: You are expected to design decision policies that orchestrate pretrained, modality-specific foundation models. Training models from scratch is not required. The organizers provide baseline tools.
The Challenge
CHIMERA-agent models the prostate cancer patient journey as a sequence of three clinical decision points, each reflecting a real scenario faced by clinicians in multidisciplinary practice (see figure below).

Task 1: MRI-Only Diagnostic Decision (pre-biopsy). Using multiparametric MRI reports and basic clinical variables, the agent estimates the probability of clinically significant prostate cancer and recommends whether biopsy is warranted. See the Task 1 page for full details.
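The following minimal sketch shows how a Task 1 output (a probability plus a biopsy recommendation) might be produced. It thresholds the provided deep learning csPCa probability and abstains when that score is missing. It also records a conflict when PI-RADS and the model score disagree. The 0.5 threshold, the conflict rule, and the function name `task1_decision` are hypothetical choices for illustration. The challenge does not prescribe an operating point.

```python
from typing import Optional

# Hypothetical operating point; the challenge does not specify one.
BIOPSY_THRESHOLD = 0.5


def task1_decision(cspca_prob: Optional[float], pirads: Optional[int],
                   threshold: float = BIOPSY_THRESHOLD) -> dict:
    """Sketch of a pre-biopsy decision that keeps an evidence trail."""
    trace = []
    if cspca_prob is None:
        # Abstain rather than guess when the key input is missing.
        trace.append("csPCa probability missing; no recommendation made")
        return {"probability": None, "recommend_biopsy": None, "trace": trace}

    recommend = cspca_prob >= threshold
    trace.append(f"csPCa probability {cspca_prob:.2f} vs threshold {threshold:.2f}")

    if pirads is None:
        trace.append("PI-RADS not reported")
    else:
        trace.append(f"PI-RADS {pirads}")
        # Hypothetical conflict rule: a high PI-RADS with a below-threshold
        # score, or a low PI-RADS with an above-threshold score.
        if (pirads >= 4 and not recommend) or (pirads <= 2 and recommend):
            trace.append("conflict: PI-RADS and csPCa probability disagree")

    return {"probability": cspca_prob, "recommend_biopsy": recommend, "trace": trace}
```

For example, `task1_decision(0.2, 4)` does not recommend biopsy but flags the disagreement, so the trace can explain how the conflict was handled instead of hiding it.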
Task 2: MRI + Biopsy Risk Stratification (post-biopsy). After biopsy, the agent integrates MRI findings, biopsy pathology reports, and clinical variables to determine whether the patient is eligible for active surveillance or should proceed to radical prostatectomy, following EAU/NCCN guidelines. See the Task 2 page for full details.
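The guideline step in Task 2 can be sketched as a rule over biopsy and clinical fields. The example below uses a simplified version of the EAU risk-group definition for localised disease. Under that definition, low risk means a PSA below 10 ng/mL, ISUP grade 1, and stage cT1 to cT2a. The sketch treats only low-risk patients as active-surveillance candidates. It deliberately ignores the selected intermediate-risk patients for whom the guidelines allow surveillance, and it does not model NCCN criteria. The function names, decision labels, and stage strings are hypothetical.

```python
from typing import Optional

# Clinical stages compatible with EAU low risk (simplified).
LOW_RISK_STAGES = {"T1a", "T1b", "T1c", "T2a"}


def eau_risk_group(psa: float, isup_grade: int, ct_stage: str) -> str:
    """Simplified EAU risk group; cT3/cT4 are folded into 'high' here."""
    stage = ct_stage.removeprefix("c")  # accept "cT2a" or "T2a"
    if psa > 20 or isup_grade >= 4 or stage == "T2c" or stage[:2] in ("T3", "T4"):
        return "high"
    if psa >= 10 or isup_grade in (2, 3) or stage == "T2b":
        return "intermediate"
    if stage in LOW_RISK_STAGES:  # remaining cases: PSA < 10 and ISUP grade 1
        return "low"
    raise ValueError(f"unrecognised clinical stage: {ct_stage!r}")


def task2_decision(psa: Optional[float], isup_grade: Optional[int],
                   ct_stage: Optional[str]) -> dict:
    """Return a hypothetical decision label plus the evidence used."""
    fields = {"PSA": psa, "ISUP grade": isup_grade, "cT stage": ct_stage}
    missing = [name for name, value in fields.items() if value is None]
    if missing:
        # Do not apply the rule on incomplete evidence; report what is absent.
        return {"decision": None, "trace": [f"missing: {', '.join(missing)}"]}
    group = eau_risk_group(psa, isup_grade, ct_stage)
    decision = "active_surveillance" if group == "low" else "radical_prostatectomy"
    return {"decision": decision, "trace": [f"EAU risk group: {group}"]}
```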
Task 3: Prostatectomy Pathology Prediction (post-radical prostatectomy). For patients who undergo definitive treatment, the agent combines prostatectomy and biopsy pathology reports, preoperative MRI, and longitudinal PSA kinetics to estimate time-dependent biochemical recurrence (BCR) risk at 1, 2, and 5 years. Cases may include multiple MRIs or biopsies over time, and missing modalities are intentionally present. See the Task 3 page for full details.
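Task 3 uses longitudinal PSA measurements, which arrive at irregular intervals, and asks for a risk at each of three horizons. One common kinetic summary is the PSA doubling time. It is computed as ln(2) divided by the slope of a least-squares fit of ln(PSA) against time, which handles uneven spacing without resampling. For the horizons, one standard option is a proportional-hazards form, risk(t) = 1 - S0(t)^exp(lp). This form keeps the 1-, 2-, and 5-year risks non-decreasing as long as the baseline survival S0 is non-increasing. The challenge does not prescribe this model. The baseline survival values in the test are made-up placeholders. Undetectable (zero) post-operative PSA values would need separate handling, which this sketch does not attempt.

```python
import math
from typing import Mapping, Sequence


def psa_doubling_time(times_months: Sequence[float], psa: Sequence[float]) -> float:
    """PSA doubling time in months from a log-linear least-squares fit.

    Works with irregularly spaced measurements. Returns math.inf when the
    fitted trend is flat or falling.
    """
    if len(times_months) != len(psa) or len(psa) < 2:
        raise ValueError("need at least two paired measurements")
    if any(v <= 0 for v in psa):
        raise ValueError("PSA values must be positive for a log fit")
    logs = [math.log(v) for v in psa]
    t_mean = sum(times_months) / len(times_months)
    y_mean = sum(logs) / len(logs)
    sxx = sum((t - t_mean) ** 2 for t in times_months)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_months, logs))
    slope = sxy / sxx  # change in ln(PSA) per month
    return math.log(2) / slope if slope > 0 else math.inf


def risk_at_horizons(linear_predictor: float,
                     baseline_survival: Mapping[float, float]) -> dict[float, float]:
    """Cumulative BCR risk 1 - S0(t) ** exp(lp) at each horizon t (years)."""
    hazard_ratio = math.exp(linear_predictor)
    return {t: 1.0 - s0 ** hazard_ratio for t, s0 in sorted(baseline_survival.items())}
```

For example, PSA values of 1.0, 1.41, and 2.83 ng/mL at months 0, 3, and 9 give a doubling time of about 6 months.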
The tasks are connected but independently evaluated.
Agent-Level Reasoning
The defining feature of CHIMERA-agent is that prediction alone is not enough. For every case, participants must submit a structured reasoning trace alongside their prediction, explaining which evidence was used, how conflicting or missing information was handled, and why the final decision was reached.
Each reasoning trace is evaluated against the structured input fields of its case. Unsupported, contradictory, or hallucinated findings are penalized. For top-performing submissions, expert clinicians additionally review the reasoning for clinical plausibility. An agent that achieves high predictive accuracy but fabricates its justification will not rank well.
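Unsupported or hallucinated findings are penalized, so an agent can run a self-check before submitting. The sketch below assumes a hypothetical trace format in which each step cites an input field by name and restates its value. It flags steps that cite no field, steps that cite a field that is absent or null for the case, and steps that misquote a value. This is not the organizers' scoring procedure. It only illustrates how each claim can be grounded in the input.

```python
from typing import Any


def check_grounding(case: dict[str, Any], trace: list[dict[str, Any]]) -> list[str]:
    """List problems in a reasoning trace (hypothetical trace format)."""
    problems = []
    for i, step in enumerate(trace):
        field = step.get("field")
        if field is None:
            # A claim with no cited input cannot be verified.
            problems.append(f"step {i}: unsupported, no input field cited")
        elif case.get(field) is None:
            # Citing a modality that the case does not contain.
            problems.append(f"step {i}: cites '{field}', which is missing for this case")
        elif "value" in step and step["value"] != case[field]:
            # The restated value contradicts the input.
            problems.append(
                f"step {i}: '{field}' quoted as {step['value']!r}, input has {case[field]!r}"
            )
    return problems
```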
Input and output format¶
All input data is provided as structured JSON files containing the available modalities for each case (e.g. MRI reports, pathology reports, clinical variables, and longitudinal PSA where applicable). Participants produce a JSON output per case (~5 KB) containing the task-specific prediction and a structured reasoning trace.
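The organizers define the exact output schema. The sketch below only illustrates the general shape described above: a task-specific prediction next to a structured reasoning trace. All key names (`case_id`, `task`, `prediction`, `reasoning_trace`, `evidence`, `missing_modalities`, `conflicts`, `bcr_risk`) are hypothetical, and so are the illustrative risk values. The sketch also measures the serialized size, so the output can be compared with the ~5 KB guideline.

```python
import json
from typing import Any


def build_output(case_id: str, task: int, prediction: dict[str, Any],
                 evidence: list[dict[str, Any]], missing: list[str],
                 conflicts: list[str]) -> str:
    """Serialize one case's prediction and reasoning trace (hypothetical keys)."""
    record = {
        "case_id": case_id,
        "task": task,
        "prediction": prediction,
        "reasoning_trace": {
            "evidence": evidence,            # which inputs were used and how
            "missing_modalities": missing,   # what was unavailable
            "conflicts": conflicts,          # disagreements and how they were handled
        },
    }
    return json.dumps(record, indent=2)


text = build_output(
    case_id="case_0001",
    task=3,
    prediction={"bcr_risk": {"1y": 0.04, "2y": 0.09, "5y": 0.21}},  # illustrative values
    evidence=[{"field": "psa_series", "interpretation": "rising PSA after surgery"}],
    missing=["mri"],
    conflicts=[],
)
print(len(text.encode("utf-8")), "bytes")
```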
Ground Truth¶
Ground truth labels are derived from definitive clinical outcomes, as described on each task page. Reference reasoning annotations are collected through a structured clinical assessment interface in which urologists review realistic, EHR-style patient records, including imaging reports, pathology findings, lab panels, PSA trends, and medications.
Tools and data provided to participants¶
Participants are expected to design agent-level decision policies that orchestrate the provided tools, rather than training models from scratch. The organizers provide:
- A set of pretrained, modality-specific tools including MRI prostate zone segmentation, automated Gleason grading for biopsy and prostatectomy specimens, and MRI-based csPCa detection.
- Structured clinical variables per case, such as age, PSA, PSA density, PI-RADS score, DRE results, medical and family history, and a deep-learning-generated csPCa probability score.
Participants may use these tools and variables as-is, replace them, or combine them with additional publicly available models.
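An agent can orchestrate the provided tools under missing modalities by calling each tool only when its input is present. It should also log every skip or failure, so the reasoning trace can later state what was and was not available. The modality keys, tool names, and stub functions below are hypothetical stand-ins. They are not the organizers' tool interfaces.

```python
from typing import Any, Callable

ToolFn = Callable[[Any], Any]


def run_available_tools(case: dict[str, Any],
                        registry: dict[str, tuple[str, ToolFn]]
                        ) -> tuple[dict[str, Any], list[str]]:
    """Run each tool whose input modality exists in the case; log the rest."""
    results: dict[str, Any] = {}
    log: list[str] = []
    for modality, (tool_name, tool) in registry.items():
        data = case.get(modality)
        if data is None:
            log.append(f"{tool_name}: skipped, '{modality}' not available")
            continue
        try:
            results[tool_name] = tool(data)
            log.append(f"{tool_name}: ran on '{modality}'")
        except Exception as exc:  # one failing tool should not stop the case
            log.append(f"{tool_name}: failed ({exc})")
    return results, log


# Hypothetical registry with stub tools standing in for the real models.
registry = {
    "mri": ("cspca_detector", lambda mri: 0.72),
    "biopsy_slides": ("gleason_grader", lambda slides: {"isup_grade": 2}),
}
results, log = run_available_tools({"mri": {"series": ["t2w", "adc", "dwi"]}}, registry)
```

The log is kept separate from the results. This lets the agent report unavailable evidence explicitly instead of silently dropping it.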
Key facts¶
| Item | Details |
|---|---|
| Conference | MICCAI 2026 |
| Workshop format | Half-day |
| Primary contact | nadieh.khalili@radboudumc.nl |