# CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
- **Published**: arXiv, 2024
- **Link**: [arXiv:2406.06007v1](https://arxiv.org/abs/2406.06007v1)
- **Summary**: CARES evaluates the trustworthiness of medical large vision-language models (Med-LVLMs) across five dimensions: trustfulness, fairness, safety, privacy, and robustness.

### Problem
- The trustworthiness of Med-LVLMs has not been systematically verified, which poses risks in medical applications, including:
  - Factual inaccuracies in medical diagnoses.
  - Overconfidence in generated diagnoses.
  - Privacy breaches.
  - Health disparities across demographic groups.

### Contributions
- Introduction of the CARES benchmark for evaluating the trustworthiness of Med-LVLMs.
- Assessment across five critical dimensions: trustfulness, fairness, safety, privacy, and robustness.
- Public release of the benchmark and code.

### Method
- Data curated from seven medical multimodal and image classification datasets.
- About 18K images and 41K question-answer pairs in various question formats.
- Evaluation along the five dimensions: trustfulness, fairness, safety, privacy, and robustness.
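As a rough illustration of dimension-wise scoring, the sketch below tallies exact-match accuracy on closed-ended questions for each trustworthiness dimension. It is a minimal sketch under stated assumptions and is not the CARES evaluation code. The `QARecord` fields, the `normalize` rule, and the use of exact-match accuracy are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record format; the released CARES data may use different fields.
@dataclass
class QARecord:
    dimension: str   # e.g. "trustfulness", "fairness", "safety", "privacy", "robustness"
    answer: str      # reference answer for a closed-ended question
    prediction: str  # raw model output


def normalize(text: str) -> str:
    """Trim whitespace, drop trailing periods, lowercase: 'Yes.' -> 'yes'."""
    return text.strip().rstrip(".").lower()


def accuracy_by_dimension(records: list[QARecord]) -> dict[str, float]:
    """Exact-match accuracy per dimension (assumes closed-ended questions only)."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for r in records:
        total[r.dimension] += 1
        if normalize(r.prediction) == normalize(r.answer):
            correct[r.dimension] += 1
    # correct[d] defaults to 0 for dimensions with no exact matches
    return {d: correct[d] / total[d] for d in total}


if __name__ == "__main__":
    demo = [
        QARecord("trustfulness", "yes", "Yes."),
        QARecord("trustfulness", "no", "yes"),
        QARecord("robustness", "B", "b"),
    ]
    print(accuracy_by_dimension(demo))  # {'trustfulness': 0.5, 'robustness': 1.0}
```

Open-ended answers would need a different scorer, such as a reference-based or model-based judge. This sketch does not attempt that.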

### Result
- Factual inaccuracies in model responses.
- Poor uncertainty estimation.
- Performance disparities across demographic groups.
- Vulnerability to attacks.
- Leakage of private information.
- Inadequate handling of out-of-distribution (OOD) samples.
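Two of these findings, demographic disparities and poor uncertainty estimation, can be made concrete with simple diagnostics. The following is a minimal sketch and does not reproduce the metrics defined in the paper. `accuracy_gap` assumes disparity is summarized as the spread between the best- and worst-served groups. `overconfidence_rate` assumes each answer has already been paired with a boolean flag for whether the model claimed to be confident. All function names are hypothetical.

```python
from collections import defaultdict


def group_accuracy(correct: list[bool], groups: list[str]) -> dict[str, float]:
    """Accuracy per demographic group label (e.g. an age band or sex)."""
    if len(correct) != len(groups):
        raise ValueError("correct and groups must have the same length")
    hits: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for ok, group in zip(correct, groups):
        counts[group] += 1
        hits[group] += int(ok)
    return {g: hits[g] / counts[g] for g in counts}


def accuracy_gap(correct: list[bool], groups: list[str]) -> float:
    """Best-group accuracy minus worst-group accuracy; 0.0 means no measured gap."""
    acc = group_accuracy(correct, groups)
    return max(acc.values()) - min(acc.values())


def overconfidence_rate(correct: list[bool], claims_confident: list[bool]) -> float:
    """Fraction of wrong answers for which the model still claimed confidence."""
    if len(correct) != len(claims_confident):
        raise ValueError("correct and claims_confident must have the same length")
    wrong_flags = [conf for ok, conf in zip(correct, claims_confident) if not ok]
    return sum(wrong_flags) / len(wrong_flags) if wrong_flags else 0.0


if __name__ == "__main__":
    correct = [True, True, False, True, False, False]
    groups = ["female", "female", "female", "male", "male", "male"]
    confident = [True, True, True, False, True, False]
    print(round(accuracy_gap(correct, groups), 3))            # 0.333
    print(round(overconfidence_rate(correct, confident), 3))  # 0.667
```

Lower values are better for both diagnostics. A real fairness analysis would also report per-group sample sizes, since a gap computed over a handful of examples is noisy.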

### Conclusion
- Existing Med-LVLMs are unreliable and exhibit significant trustworthiness issues.
- CARES aims to drive standardization and the development of more reliable Med-LVLMs.

### Reference
- Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, et al. "CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models." arXiv preprint arXiv:2406.06007v1, 2024.