cua-bench: make your agents better at computers

cua-bench is a collection of desktop and mobile tasks, paired with a harness for evaluation and training, that helps agent builders measure how well their agents use computers.

built with love by cua.ai
Tianbao Xie
Main Author, OSWorld (XLang Labs)
@TianbaoX

Erik Dunteman
CEO, Butter (formerly pig.dev)
@erikdunteman

Ivan Fioravanti
Co-founder and CTO, CoreViewHQ
@ivanfioravanti

Alex Shaw
Co-Creator, Terminal-Bench
@alexgshaw

agent performance

Task-resolution success rate for top agents and models on cua-bench 2.0:
Claude Haiku 4.5: 68.4%
Claude Sonnet 4.5: 59.1%
UI-TARS-2: 58%
OpenAI GPT-5.2: 57.8%
OpenAI CUA: 57.8%
Gemini CUA: 54.2%
Coming soon: the full leaderboard is still being prepared.