- Backward Tracing: trace back to past events to answer the question.
- Real-Time Visual Perception: understand and respond to events as they unfold at the current timestamp.
- Forward Active Responding: delay the response until sufficient future information becomes available to answer the question accurately.
OVO-Bench evaluates Video-LLMs' ability to find temporal visual clues from ongoing input, allowing models to wait for sufficient evidence before responding. We term this approach the Video Chain-of-Time thinking process, analogous to Chain-of-Thought reasoning in LLMs.
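To make the forward-responding mode concrete, here is a minimal, purely illustrative sketch of such a delayed-answer loop; the `try_answer` method, the confidence score, and the threshold are hypothetical stand-ins, not OVO-Bench's implementation:

```python
# Illustrative "chain-of-time" loop: ingest the stream frame by frame and
# answer only once the model is confident enough. `try_answer` and the
# confidence threshold are hypothetical, not part of OVO-Bench.
def answer_online(model, frames, question, threshold=0.8):
    context, answer = [], None
    for t, frame in enumerate(frames):
        context.append(frame)                     # video seen so far
        answer, confidence = model.try_answer(context, question)
        if confidence >= threshold:
            return t, answer                      # respond at timestamp t
    return len(frames) - 1, answer                # forced answer at stream end
```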
- 644 videos
- 3,100 queries
- 263.42 s average query timestamp
The following modules are required for the inference and scoring pipeline:
```
moviepy==1.0.3
numpy
pillow
tqdm
```
Alternatively, run `pip install -r requirements.txt` to install all required modules.
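As a quick sanity check that the environment is ready, the imports below should all resolve (note that the pillow package is imported under the name `PIL`); this snippet is just a convenience, not part of the pipeline:

```python
# Verify that the required packages are importable.
# Note: pillow is imported under the name "PIL".
import importlib

for pkg in ["moviepy", "numpy", "PIL", "tqdm"]:
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: OK")
    except ImportError as err:
        print(f"{pkg}: missing ({err})")
```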
Download the videos and annotations from our Hugging Face repo, unzip all files, and place them under the `./data` directory.
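If you prefer to script the download, `snapshot_download` from `huggingface_hub` can fetch the dataset in one call; the `repo_id` below is a placeholder, so substitute the repository linked above, and note that the video archives still need to be unzipped afterwards:

```python
# Sketch of a scripted download via huggingface_hub. The repo_id is a
# placeholder; use the repository linked above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<ovo-bench-repo-id>",  # placeholder: actual repo linked above
    repo_type="dataset",
    local_dir="./data",
)
# Unzip the downloaded video archives under ./data afterwards.
```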
We divide our evaluation pipeline into two parts: inference and scoring. For our released models, run the scripts provided under the `./scripts` directory. For example, for Gemini, run:

```bash
bash scripts/inference_Gemini.sh
```
All inference results will be saved under `./results/[MODEL_NAME]`. Then run the corresponding scoring script:

```bash
bash scripts/score_Gemini.sh
```
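To evaluate several released models back to back, a small driver along these lines works; the model list and the assumption that every model has matching `inference_*.sh` and `score_*.sh` scripts are ours, so adjust to the scripts actually present under `./scripts`:

```python
# Hypothetical driver that chains inference and scoring for a list of
# released models; script names follow the scripts/ naming pattern above.
import subprocess

for model in ["Gemini"]:  # extend with other released model names
    subprocess.run(["bash", f"scripts/inference_{model}.sh"], check=True)
    subprocess.run(["bash", f"scripts/score_{model}.sh"], check=True)
```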
Scores will be printed to the command line:
```
Offline Model: Gemini
Evaluate Backward Tracing...
Task: HLD, Acc: 52.69
Task: ASI, Acc: 75.68
Task: EPM, Acc: 58.59
Backward Avg.: 62.32
Evaluate Real-time Visual Perception...
Task: STU, Acc: 54.49
Task: OJR, Acc: 67.39
Task: ATR, Acc: 80.17
Task: FPD, Acc: 68.32
Task: ACR, Acc: 66.97
Task: OCR, Acc: 87.25
Realtime Avg.: 70.77
Evaluate Forward Active Responding...
Task: REC, Acc: 35.53
Task: SSR, Acc: 74.24
Task: CRR, Acc: 61.67
Forward Avg.: 57.15
Total Avg.: 65.25
```
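For reference, the category and total averages in this sample output are consistent with unweighted means over the per-task accuracies, as the short sketch below reproduces (an observation from the numbers above, not the official scoring code):

```python
# Recomputes the averages above as unweighted means over per-task
# accuracies; each printed value matches the sample output up to
# floating-point rounding.
backward = {"HLD": 52.69, "ASI": 75.68, "EPM": 58.59}
realtime = {"STU": 54.49, "OJR": 67.39, "ATR": 80.17,
            "FPD": 68.32, "ACR": 66.97, "OCR": 87.25}
forward = {"REC": 35.53, "SSR": 74.24, "CRR": 61.67}
all_tasks = {**backward, **realtime, **forward}

def mean(scores):
    return sum(scores.values()) / len(scores)

print(f"Backward Avg.: {mean(backward):.2f}")
print(f"Realtime Avg.: {mean(realtime):.2f}")
print(f"Forward Avg.: {mean(forward):.2f}")
print(f"Total Avg.: {mean(all_tasks):.2f}")
```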
To evaluate your own models, inherit from the `OVOBenchOffline` or `OVOBenchOnline` class in `./utils/OVOBench.py` and implement your own inference pipeline. Refer to our provided models under `./models` for further details.
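As a starting point, a skeleton for a custom offline model might look like the sketch below; the constructor and `inference` method names here are our assumptions, so follow the actual abstract interface defined in `./utils/OVOBench.py` and the examples under `./models`:

```python
# Hypothetical skeleton for a custom offline model. The method name and
# signature below are assumptions; mirror the abstract interface in
# ./utils/OVOBench.py and the reference models under ./models.
from utils.OVOBench import OVOBenchOffline

class MyVideoLLM(OVOBenchOffline):
    def __init__(self, args):
        super().__init__(args)
        # Load your model weights or set up your API client here.

    def inference(self, video_path, prompt):
        # Run your model on the video (truncated at the query timestamp)
        # with the given prompt, and return the raw text response.
        response = ...  # replace with your model call
        return response
```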
OVO-Bench is released under the CC BY-NC-SA 4.0 license. By downloading our dataset from our website or other sources, you agree to adhere to the terms of CC BY-NC-SA 4.0 and the licenses of the source datasets.
```bibtex
@misc{li2025ovobenchfarvideollmsrealworld,
  title={OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?},
  author={Yifei Li and Junbo Niu and Ziyang Miao and Chunjiang Ge and Yuanhang Zhou and Qihao He and Xiaoyi Dong and Haodong Duan and Shuangrui Ding and Rui Qian and Pan Zhang and Yuhang Zang and Yuhang Cao and Conghui He and Jiaqi Wang},
  year={2025},
  eprint={2501.05510},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.05510},
}
```