
OVO-Bench: How Far is Your Video-LLMs from Real-World Online VideO Understanding?

Introduction

🌟 Three distinct problem-solving modes

  • Backward Tracing: trace back to past events to answer the question.
  • Real-Time Visual Perception: understand and respond to events as they unfold at the current timestamp.
  • Forward Active Responding: delay the response until sufficient future information becomes available to answer the question accurately.

💫 Chain-of-Time Thinking Process

OVO-Bench evaluates Video-LLMs' ability to find temporal visual clues from ongoing input, allowing models to wait for sufficient evidence before responding. We term this approach the Video Chain-of-Time thinking process, analogous to Chain-of-Thought reasoning in LLMs.

Figure: Distribution of questions and videos in OVO-Bench.
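As a rough illustration of this chain-of-time behavior, the sketch below shows an online evaluation loop in which a model either answers at the current timestamp or defers until more frames arrive. The OnlineModel class, its answer_or_wait method, and the streaming loop are hypothetical names for illustration only, not part of the OVO-Bench codebase.

```python
# Illustrative sketch of the video chain-of-time process (hypothetical API,
# NOT the OVO-Bench implementation): frames arrive in temporal order and the
# model may defer its answer until it has seen sufficient evidence.
from typing import Iterable, Optional

class OnlineModel:
    """Hypothetical wrapper around an online Video-LLM."""

    def answer_or_wait(self, frames: list, question: str) -> Optional[str]:
        """Return an answer string, or None to wait for more frames."""
        raise NotImplementedError

def chain_of_time_eval(model: OnlineModel,
                       frame_stream: Iterable,
                       question: str) -> str:
    seen = []
    for frame in frame_stream:           # frames arrive as the video unfolds
        seen.append(frame)
        answer = model.answer_or_wait(seen, question)
        if answer is not None:           # model judged the evidence sufficient
            return answer
    # Stream ended without a confident answer.
    return model.answer_or_wait(seen, question) or "unanswerable"
```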

Dataset Statistics

  • 644 videos
  • 3,100 queries
  • 263.42 s average query timestamp

Figure: Distribution of average query timestamps and video duration (in seconds) in OVO-Bench.

Dataset Examples

Figure: Example questions and videos from OVO-Bench.

Evaluation Pipeline

Requirements

The following modules are required for the inference and scoring pipeline.

moviepy==1.0.3
numpy
pillow
tqdm

Or run pip install -r requirements.txt to install all required modules.

Data Preparation

Download the videos and annotations from our huggingface-repo, unzip all files, and place them under the ./data directory.
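If you prefer a scripted download, a minimal sketch using huggingface_hub is shown below. The repo_id is a placeholder, so substitute the repository linked above.

```python
# Minimal download sketch using huggingface_hub.
# The repo_id below is a PLACEHOLDER -- use the repository linked above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/<ovo-bench-repo>",  # placeholder: see the link above
    repo_type="dataset",
    local_dir="./data",                # annotations and zipped videos land here
)
# Remember to unzip the downloaded video archives inside ./data afterwards.
```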

Inference and Score

We divide our evaluation pipeline into two parts: inference and scoring. For our released models, run the provided scripts under the ./scripts directory. For example, for Gemini, run:

bash scripts/inference_Gemini.sh

All inference results will be saved under ./results/[MODEL_NAME]. Then run our scoring scripts:

bash scripts/score_Gemini.sh

Scores will be printed to the CLI:

Offline Model: Gemini
Evaluate Backward Tracing...
Task: HLD, Acc: 52.69
Task: ASI, Acc: 75.68
Task: EPM, Acc: 58.59
Backward Avg.: 62.32

Evaluate Real-time Visual Perception...
Task: STU, Acc: 54.49
Task: OJR, Acc: 67.39
Task: ATR, Acc: 80.17
Task: FPD, Acc: 68.32
Task: ACR, Acc: 66.97
Task: OCR, Acc: 87.25
Realtime Avg.: 70.77

Evaluate Forward Active Responding...
Task: REC, Acc: 35.53
Task: SSR, Acc: 74.24
Task: CRR, Acc: 61.67
Forward Avg.: 57.15

Total Avg.: 65.25

To evaluate your own models, inherit the OVOBenchOffline or OVOBenchOnline class in ./utils/OVOBench.py and implement your own inference pipeline. Refer to our provided models under ./models for further details.
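A minimal sketch of such a wrapper is shown below. The constructor signature and the inference hook are assumptions about the base-class interface; check ./utils/OVOBench.py for the actual abstract methods before adapting it.

```python
# Sketch of a custom model wrapper (ASSUMED interface -- verify the actual
# abstract methods in ./utils/OVOBench.py before use).
from utils.OVOBench import OVOBenchOffline

class MyVideoLLM(OVOBenchOffline):
    def __init__(self, args):
        super().__init__(args)
        # Load your model weights and processor here.

    def inference(self, video_file, prompt):
        # Assumed hook: sample frames from video_file, feed them to your
        # Video-LLM together with the prompt, and return the text response.
        return "model response here"  # replace with your model call
```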

License

OVO-Bench is released under the CC BY-NC-SA 4.0 license. By downloading our dataset from our website or other sources, the user agrees to adhere to the terms of CC BY-NC-SA 4.0 and the licenses of the source datasets.

🫥 Experimental Results

Figure: Experimental results of Video-LLMs on OVO-Bench.

📍 Citing OVO-Bench

@misc{li2025ovobenchfarvideollmsrealworld,
      title={OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?}, 
      author={Yifei Li and Junbo Niu and Ziyang Miao and Chunjiang Ge and Yuanhang Zhou and Qihao He and Xiaoyi Dong and Haodong Duan and Shuangrui Ding and Rui Qian and Pan Zhang and Yuhang Zang and Yuhang Cao and Conghui He and Jiaqi Wang},
      year={2025},
      eprint={2501.05510},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.05510}, 
}