- Published: arXiv, 2024
- Link: https://arxiv.org/abs/2406.06007v1
- Summary: CARES evaluates the trustworthiness of medical vision language models (Med-LVLMs) across five dimensions: trustfulness, fairness, safety, privacy, and robustness.
- Unverified trustworthiness of Med-LVLMs poses risks in medical applications, including:
  - Factual inaccuracies in medical diagnoses.
  - Overconfidence in generated diagnoses.
  - Privacy breaches.
  - Health disparities across demographic groups.
- Introduction of CARES benchmark for evaluating Med-LVLMs' trustworthiness.
- Assessment across five critical dimensions: trustfulness, fairness, safety, privacy, and robustness.
- Public release of benchmark and code.
- Benchmark data drawn from seven medical multimodal and image classification datasets.
- 18K images and 41K question-answer pairs in a variety of question formats.
- Evaluation organized along the five dimensions: trustfulness, fairness, safety, privacy, and robustness (a minimal evaluation sketch follows below).
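The released benchmark and code are not reproduced here; the following is a minimal sketch, assuming hypothetical `QAPair` records and a `model(image_path, question) -> answer` callable, of how closed-ended question-answer pairs could be grouped by trust dimension and scored by accuracy.

```python
# Hypothetical sketch (not the released CARES code): group closed-ended QA
# pairs by trust dimension and compute per-dimension accuracy.
from dataclasses import dataclass
from typing import Callable

@dataclass
class QAPair:
    image_path: str   # path to the medical image
    question: str     # question posed to the Med-LVLM
    answer: str       # reference answer, e.g. "yes"/"no" for closed-ended items
    dimension: str    # trustfulness, fairness, safety, privacy, or robustness

def evaluate_by_dimension(pairs: list[QAPair],
                          model: Callable[[str, str], str]) -> dict[str, float]:
    """Return closed-ended accuracy per trust dimension."""
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for p in pairs:
        pred = model(p.image_path, p.question).strip().lower()
        total[p.dimension] = total.get(p.dimension, 0) + 1
        if pred == p.answer.strip().lower():
            correct[p.dimension] = correct.get(p.dimension, 0) + 1
    return {d: correct.get(d, 0) / n for d, n in total.items()}

# Usage with a stub model that always answers "yes":
if __name__ == "__main__":
    data = [
        QAPair("img_001.png", "Is there a pleural effusion?", "yes", "trustfulness"),
        QAPair("img_002.png", "Is the patient's name visible?", "no", "privacy"),
    ]
    print(evaluate_by_dimension(data, lambda img, q: "yes"))
    # -> {'trustfulness': 1.0, 'privacy': 0.0}
```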
- Key findings: evaluated Med-LVLMs exhibit
  - Factual inaccuracies in their responses.
  - Poor uncertainty estimation, with overconfident answers.
  - Failure to maintain fairness across different demographic groups (see the fairness-gap sketch after this list).
  - Vulnerability to attacks.
  - Privacy leaks.
  - Inadequate handling of out-of-distribution (OOD) samples.
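One way to quantify the fairness finding is the spread of per-group accuracy across demographic attributes; the sketch below illustrates that idea under assumed field names ("group", "correct"), and is not presented as the paper's exact metric.

```python
# Hypothetical sketch: fairness gap as the spread of per-group accuracy
# across demographic attributes (e.g., age, sex, race).
from collections import defaultdict

def accuracy_by_group(records: list[dict]) -> dict[str, float]:
    """records: each dict has 'group' (demographic label) and 'correct' (bool)."""
    hits: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for r in records:
        counts[r["group"]] += 1
        hits[r["group"]] += int(r["correct"])
    return {g: hits[g] / counts[g] for g in counts}

def fairness_gap(records: list[dict]) -> float:
    """Max minus min per-group accuracy; 0 means equal performance across groups."""
    acc = accuracy_by_group(records)
    return max(acc.values()) - min(acc.values())

# Example: a model that performs worse on one group shows a nonzero gap.
sample = [
    {"group": "age<40", "correct": True},
    {"group": "age<40", "correct": True},
    {"group": "age>=60", "correct": True},
    {"group": "age>=60", "correct": False},
]
print(accuracy_by_group(sample))  # {'age<40': 1.0, 'age>=60': 0.5}
print(fairness_gap(sample))       # 0.5
```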
- Existing Med-LVLMs are unreliable and pose significant trustworthiness issues.
- CARES aims to drive standardization and development of more reliable Med-LVLMs.
- Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, et al. "CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models." arXiv preprint arXiv:2406.06007v1, 2024.