CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

  • Published: arXiv, 2024
  • Link: arXiv:2406.06007v1
  • Summary: CARES evaluates the trustworthiness of medical large vision-language models (Med-LVLMs) across five dimensions: trustfulness, fairness, safety, privacy, and robustness.

Problem

  • The trustworthiness of Med-LVLMs remains largely unverified, which poses risks in medical applications, including:
    • Factual inaccuracies in medical diagnoses.
    • Overconfidence in generated diagnoses.
    • Privacy breaches.
    • Health disparities across demographic groups.

Contributions

  • Introduction of the CARES benchmark for evaluating the trustworthiness of Med-LVLMs.
  • Assessment across five critical dimensions: trustfulness, fairness, safety, privacy, and robustness.
  • Public release of the benchmark data and code.

Method

  • Benchmark built from seven medical multimodal and image-classification datasets.
  • 18K images and 41K question-answer pairs covering a variety of question formats.
  • Evaluation along the five dimensions of trustfulness, fairness, safety, privacy, and robustness (see the scoring sketch below).
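For a concrete sense of how per-dimension scoring over such question-answer pairs could work, here is a minimal Python sketch that computes exact-match accuracy grouped by trustworthiness dimension. The record fields (`dimension`, `answer`, `prediction`) and the sample entries are hypothetical illustrations, not the actual CARES data format, metrics, or code.

```python
from collections import defaultdict

# Hypothetical QA records; the real CARES data layout and metrics may differ.
# Each record pairs a reference answer with a model prediction and the
# trustworthiness dimension the question probes.
records = [
    {"dimension": "trustfulness", "answer": "yes", "prediction": "yes"},
    {"dimension": "fairness", "answer": "no", "prediction": "yes"},
    {"dimension": "robustness", "answer": "left lower lobe", "prediction": "left lower lobe"},
]

# Exact-match accuracy aggregated per dimension.
correct, total = defaultdict(int), defaultdict(int)
for r in records:
    dim = r["dimension"]
    total[dim] += 1
    correct[dim] += int(r["prediction"].strip().lower() == r["answer"].strip().lower())

for dim in sorted(total):
    print(f"{dim}: accuracy {correct[dim] / total[dim]:.2f} over {total[dim]} items")
```

Open-ended answers would need a softer scorer (e.g., overlap- or LLM-based matching) rather than the exact match used here.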

Result

  • Models frequently produce factually inaccurate medical responses.
  • They estimate uncertainty poorly, tending toward overconfidence.
  • They fail to maintain fairness across demographic groups.
  • They are vulnerable to attacks.
  • They leak private information.
  • They handle out-of-distribution (OOD) samples inadequately.

Conclusion

  • Existing Med-LVLMs remain unreliable and exhibit significant trustworthiness issues.
  • CARES aims to drive standardization and the development of more reliable Med-LVLMs.

Reference

  • Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, et al. "CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models." arXiv preprint arXiv:2406.06007v1, 2024.