Reallm-Labs/InfiGUIAgent

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection



This repository accompanies the paper "InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection". In this work, we develop a GUI agent based on a multimodal large language model that improves task automation on computing devices. The agent is trained with a two-stage supervised fine-tuning approach: the first stage targets fundamental GUI understanding skills, and the second targets advanced reasoning capabilities. In the second stage, we integrate hierarchical reasoning and expectation-reflection reasoning to give the agent native reasoning abilities in GUI interactions.

🔥 News

We are in the process of uploading key artifacts from our paper to our 🤗 Hugging Face Collection.

Regarding the full model release, due to licensing restrictions on portions of our training data from third-party sources, we are currently sanitizing the dataset and retraining/refining the final model to ensure full compliance while maintaining performance.

Stay tuned for updates! 🔜
