Data Sources And Imaging - CHIMERA-agent

Precomputed features

To make participation accessible and keep inference runtimes manageable, all tasks provide image features precomputed offline rather than raw imaging data. Participants receive .h5 feature files extracted from the raw MRI and WSI data, alongside .csv files of clinical variables. Submissions therefore do not need to load images or extract features at inference time; the agent can focus entirely on evidence integration and reasoning.
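
The internal layout of the .h5 files is not described on this page, so a reasonable first step is to list what each file contains before writing any task logic. The following is a minimal sketch, assuming the third-party h5py package is installed; the function names summarize_h5 and read_clinical are hypothetical helpers, not part of any challenge API.

```python
import csv


def summarize_h5(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in an HDF5 file."""
    # h5py is a third-party package (assumption: it is available in the
    # environment). It is imported lazily so the CSV helper works without it.
    import h5py

    summary = {}

    def visit(name, obj):
        # visititems walks groups and datasets; only datasets are recorded.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


def read_clinical(csv_lines):
    """Read clinical variables as a list of dicts, one per CSV row.

    csv_lines can be an open text file or any iterable of lines.
    Values are returned as strings; type conversion is left to the caller.
    """
    return list(csv.DictReader(csv_lines))
```

Calling summarize_h5 on one feature file shows the dataset names, shapes, and dtypes, which is usually enough to decide how to index the features for a given task.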

Raw training data originate from the CHIMERA 2025 dataset, hosted on AWS (s3://chimera-challenge/v2/).

Imaging devices

Modality	Device	Resolution / Protocol
Multiparametric MRI	Siemens 3T scanner (Radboudumc, CWZ)	Axial plane; T1w, ADC maps, DWI with multiple b-values
Biopsy WSI (H&E)	3DHISTECH PANNORAMIC 1000 (Radboudumc, CWZ)	0.25 µm/pixel
Prostatectomy WSI (H&E)	3DHISTECH PANNORAMIC 1000 (Radboudumc, CWZ)	0.25 µm/pixel

Data files

MRI scans: .mha format, filename <subject_id>.mha
Histopathology WSI: .tif format, filename <subject_id>.tif
Annotations: .csv with columns pathology_subject_id, MRI_subject_id, and task-specific labels
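
The naming scheme above can be turned into a small lookup that pairs each annotation row with its expected files. This is a sketch under stated assumptions: it assumes the <subject_id> in an MRI filename is the row's MRI_subject_id and the one in a WSI filename is its pathology_subject_id, and the function name and directory arguments are hypothetical.

```python
import csv
from pathlib import Path

ID_COLUMNS = ("pathology_subject_id", "MRI_subject_id")


def resolve_case_files(annotation_lines, mri_dir, wsi_dir):
    """Map each annotation row to its expected MRI and WSI paths.

    annotation_lines: an open annotations .csv file or any iterable of lines.
    mri_dir, wsi_dir: local directories holding the .mha and .tif files
    (hypothetical locations; adjust to wherever the data was downloaded).
    """
    cases = []
    for row in csv.DictReader(annotation_lines):
        cases.append({
            # Assumption: MRI files are named after MRI_subject_id.
            "mri": Path(mri_dir) / f"{row['MRI_subject_id']}.mha",
            # Assumption: WSI files are named after pathology_subject_id.
            "wsi": Path(wsi_dir) / f"{row['pathology_subject_id']}.tif",
            # Every remaining column is treated as a task-specific label.
            "labels": {k: v for k, v in row.items() if k not in ID_COLUMNS},
        })
    return cases
```

The function only builds paths and does not check that the files exist, so a missing scan or slide surfaces later when the file is opened.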

Public datasets recommended for pre-training

PI-CAI -- prostate cancer detection in MRI (1,000+ cases)
PANDA -- prostate cancer Gleason grading in WSI (1,000+ cases)
LEOPARD -- biochemical recurrence prediction from WSI (500+ cases)

Domain shift

Training data originate from Radboudumc. Test data include cases from Radboudumc, CWZ, Karolinska Institute, and additional contributing institutes. Participants should expect domain shift between training and test cohorts due to differences in MRI parameters, staining protocols, and scanners.