cua-bench: make your agents better at computers

cua-bench is a collection of desktop and mobile tasks, paired with a harness for evaluation and training, that helps agent builders quantify how well their agents use computers.

introducing cua-bench

read our launch announcement →

Tech Report

built by cua.ai

Tianbao Xie

Main Author, OSWorld (XLang Labs)

@TianbaoX

Erik Dunteman

CEO, Butter (formerly pig.dev)

@erikdunteman

Ivan Fioravanti

Co-founder and CTO, CoreViewHQ

@ivanfioravanti

Alex Shaw

Co-Creator, Terminal-Bench

@alexgshaw

agent performance

view full leaderboard ↗

agent / model          success rate
Claude Haiku 4.5       68.4%
Claude Sonnet 4.5      59.1%
UI-TARS-2              58%
OpenAI GPT-5.2         57.8%
OpenAI CUA             57.8%
Gemini CUA             54.2%

task-resolution success rate for top agents and models on cua-bench2.0
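A task-resolution success rate is the share of benchmark tasks an agent completes, expressed as a percentage. The sketch below shows that arithmetic under stated assumptions: each task produces a single binary resolved/unresolved outcome, and every task counts equally. The `TaskResult` record and `success_rate` function are hypothetical names for illustration only. They are not part of the cua-bench harness, which may aggregate differently (for example, across repeated runs or with per-task weights).

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical record shape, for illustration)."""
    task_id: str
    resolved: bool


def success_rate(results: list[TaskResult]) -> float:
    """Return the unweighted percentage of resolved tasks, rounded to one decimal.

    Assumes one binary outcome per task; this is not the cua-bench scoring API.
    """
    if not results:
        raise ValueError("no task results to score")
    resolved = sum(1 for r in results if r.resolved)
    return round(100.0 * resolved / len(results), 1)


if __name__ == "__main__":
    demo = [
        TaskResult("task-a", True),
        TaskResult("task-b", False),
        TaskResult("task-c", True),
    ]
    print(f"{success_rate(demo)}%")  # prints 66.7%
```

Under this definition, two agents with the same score can have resolved different tasks, so per-task results carry more information than the headline percentage.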

Coming Soon

The leaderboard is being prepared. Check back soon!

cua-bench task examples

view all cua-bench tasks →