FAQ
Frequently asked questions about Cua Driver
Common gotchas and questions. For the full action loop and tool semantics, see the CLI reference and MCP tools reference.
The action loop
Why do I get "No cached AX state"?
Element-indexed actions read an in-memory cache keyed on (pid, window_id). The cache is populated by get_window_state and replaced on every snapshot.
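As a mental model only, the cache behaves like a dictionary keyed on (pid, window_id) whose entry is overwritten by each snapshot. The Python sketch below is illustrative and is not the driver's implementation; the class and method names are hypothetical.

```python
class AXStateCache:
    """Toy model of a snapshot cache keyed on (pid, window_id). Hypothetical names."""

    def __init__(self):
        self._snapshots = {}

    def snapshot(self, pid, window_id, elements):
        # Each snapshot replaces whatever was cached for this exact key.
        self._snapshots[(pid, window_id)] = list(elements)

    def resolve(self, pid, window_id, element_index):
        # Element-indexed actions look up the same (pid, window_id) key.
        try:
            elements = self._snapshots[(pid, window_id)]
        except KeyError:
            raise LookupError("No cached AX state") from None
        return elements[element_index]


cache = AXStateCache()
cache.snapshot(844, 10725, ["Close", "Minimize", "Submit"])
found = cache.resolve(844, 10725, 2)   # same key as the snapshot: hit

try:
    cache.resolve(844, 99999, 2)       # different window_id: miss
    mismatch_error = None
except LookupError as exc:
    mismatch_error = str(exc)
```

The second lookup fails even though the pid matches, which is the window_id mismatch case described here.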
Two common causes:
- You didn't call get_window_state in the current turn.
- You called it with a different window_id than the one in the action.
# Populate the cache for this exact (pid, window_id).
cua-driver call get_window_state '{"pid":844,"window_id":10725}'
# Then act with the same window_id.
cua-driver call click '{"pid":844,"window_id":10725,"element_index":14}'
If you're running one-shot CLI invocations without a daemon, the cache lives for one process lifetime. Start the daemon first:
open -n -g -a CuaDriver --args serve
Why did my screenshot come back empty (has_screenshot: false)?
The window capture raced against a close, or the window has no backing store yet. Re-snapshot. If it persists, pick a different window_id via list_windows.
screenshot or get_window_state fails with "ScreenCaptureKit refused this window" / "Could not start streaming".
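This is a known macOS 26.4.x ScreenCaptureKit regression on physical Macs (SCStreamError code -3801). The message is sometimes localized, e.g. Japanese "オーディオ/ビデオの取り込みがうまくいかなかったため、ストリーミングを開始できませんでした" (roughly, "Streaming could not be started because audio/video capture failed"). The driver already: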
- Retries the SCK call once after a brief delay (covers transient failures).
- Falls back to the legacy CGWindowListCreateImage path (works on many windows the SCK regression breaks).
If both refuse, the error surfaces with an actionable hint. Workarounds in order of preference:
- Try a different window_id on the same app. Usually only one specific window is hit.
- Switch to AX-only capture for that workflow. Element-indexed clicks don't need pixels:
  cua-driver config set capture_mode ax
  get_window_state then returns the AX tree without attempting a screenshot, and click({pid, window_id, element_index: N}) works as before.
- Re-snapshot a moment later. The failure is sometimes transient.
get_window_state does not hard-fail on this error: the AX tree still ships in the response with a warning line, so element-indexed clicks keep working even when the screenshot is unavailable. The standalone screenshot tool does hard-fail (no AX tree to fall back to).
The AX tree is tiny. What's happening?
Check the capture mode:
cua-driver config get capture_mode
Default is som (tree + screenshot). If it reads vision, get_window_state omits the tree by design (PNG only). Switch back to som to get both:
cua-driver config set capture_mode som
If the mode is som or ax and the tree is still small, the target uses custom rendering (Blender, Unity, Electron with AX disabled). For Chromium/Electron, retry get_window_state once; the tree populates on the second call. For canvas-backed apps, reach for pixel clicks instead.
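The three capture modes and the triage above can be summarized as a small lookup table. This is a Python sketch of the decision logic only; the table restates what this FAQ says each mode returns, and the function name is hypothetical.

```python
# What get_window_state returns per capture_mode, as described in this FAQ.
CAPTURE_MODES = {
    "som": {"ax_tree": True, "screenshot": True},
    "ax": {"ax_tree": True, "screenshot": False},
    "vision": {"ax_tree": False, "screenshot": True},
}


def diagnose_small_tree(capture_mode):
    """Suggest a next step when the AX tree looks too small (hypothetical helper)."""
    mode = CAPTURE_MODES.get(capture_mode)
    if mode is None:
        return f"unknown capture_mode {capture_mode!r}"
    if not mode["ax_tree"]:
        return "tree omitted by design; run: cua-driver config set capture_mode som"
    return "mode includes the tree; retry get_window_state once, then consider pixel clicks"


vision_hint = diagnose_small_tree("vision")
som_hint = diagnose_small_tree("som")
```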
Window state
My keyboard commit (Return, Space, Tab) on a minimized window silently no-ops.
Minimized windows receive AX reads and AX-dispatched clicks normally, but keyboard commits fail because AX focus doesn't propagate to renderer focus on a minimized window. You hear the macOS system-alert beep, or nothing happens.
Workarounds in order of preference:
- Use set_value to write the field's entire value directly. Bypasses keyboard commits.
- AX-click a commit-equivalent button (Go, Submit, checkbox). Clicks route through AXPress and don't need renderer focus.
- Last resort: ask the user to un-minimize the window. Don't deminiaturize programmatically: it is layout-disrupting on many apps.
My backgrounded SwiftUI app (System Settings) returns an almost-empty AX tree.
On SwiftUI apps, windows on another Space often have their AX tree stripped down to just the menu bar. AppKit apps are usually fine.
get_window_state returns off_space: true plus window_space_ids when this happens, so you can detect it. Solutions:
- Ask the user to Mission-Control back to the Space that holds the target.
- Drive the app through in-window toolbar buttons (which often stay exposed) rather than deep nested controls.
- Accept the limitation for the current session.
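A caller can check for this condition before trusting a sparse tree. The Python sketch below assumes the get_window_state response has already been parsed into a dict and that off_space and window_space_ids sit at its top level; that placement and the helper name are assumptions, not confirmed response structure.

```python
def off_space_warning(window_state):
    """Return a warning string if the snapshot was taken off-Space, else None.

    Assumes `off_space` and `window_space_ids` are top-level keys (hypothetical layout).
    """
    if not window_state.get("off_space", False):
        return None
    space_ids = window_state.get("window_space_ids", [])
    return (
        "Target window is on another Space "
        f"(window_space_ids={space_ids}); the AX tree may be truncated."
    )


on_space = off_space_warning({"off_space": False})
elsewhere = off_space_warning({"off_space": True, "window_space_ids": [7]})
```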
Browsers and Electron
Right-click on Chromium web content fires as a left-click.
A known Chromium renderer-IPC limit: the filter coerces synthetic right-click subtype to left on every non-HID-tap path. Use right_click({pid, element_index}) on AX-addressable targets (links, buttons, toolbar items). For web content itself (right-clicking an image or selection), there is no backgrounded path today. See Limits for the full note.
Pixel click on a YouTube video doesn't play or pause.
HTML5's click-to-play handler rejects some synthetic click paths. Use keyboard instead:
cua-driver call press_key '{"pid":<pid>,"key":"k"}' # YouTube play/pause
cua-driver call press_key '{"pid":<pid>,"key":"space"}' # generic video play/pause
Keyboard events travel through a different auth envelope and reach the page.
How do I navigate to a URL in Chrome without stealing focus?
Pass the URL to launch_app:
cua-driver call launch_app '{"bundle_id":"com.google.Chrome","urls":["https://trycua.com"]}'
The URL opens in a new window via Chrome's application(_:open:) delegate. The driver's focus-restore guard catches Chrome's internal activation and restores the previously frontmost app.
Don't use hotkey ⌘L to focus the omnibox. Even when delivered to a backgrounded pid, ⌘L steals
focus because the receiving app interprets "user wants to type here" as activation intent.
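When scripting the launch_app call from Python, building the JSON payload with json.dumps avoids hand-quoting mistakes. This is a minimal sketch: the helper name is hypothetical, and only the command shape shown in this answer is assumed.

```python
import json
import shlex


def launch_app_argv(bundle_id, urls):
    """Build the argv for `cua-driver call launch_app` (hypothetical helper)."""
    payload = json.dumps(
        {"bundle_id": bundle_id, "urls": list(urls)},
        separators=(",", ":"),  # compact form, matching the CLI examples
    )
    return ["cua-driver", "call", "launch_app", payload]


argv = launch_app_argv("com.google.Chrome", ["https://trycua.com"])
command_line = shlex.join(argv)  # shell-quoted form, for logging or copy-paste
```

On a machine with cua-driver installed, argv can be passed directly to subprocess.run(argv, check=True); passing a list skips shell quoting entirely.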
The agent cursor
How do I disable the visual cursor overlay?
cua-driver call set_agent_cursor_enabled '{"enabled":false}'
Or via config:
cua-driver config set agent_cursor.enabled false
The overlay only renders when the driver has an AppKit run loop (inside cua-driver serve or cua-driver mcp). One-shot CLI invocations skip it entirely.
Can I make the cursor move faster?
Tune the motion knobs:
cua-driver call set_agent_cursor_motion '{"glide_duration_ms":300}'
See set_agent_cursor_motion in the MCP tools reference for every knob.
Concurrency & multiple agents
Two agents (or subagents) take turns instead of running in parallel. Why?
The daemon is concurrent: it handles each connection on its own task, and two raw socket connections can drive two cursors simultaneously. The bottleneck is the stdio MCP transport. An MCP client (e.g. Claude Code) spawns one cua-driver mcp process per server config and shares it across all subagents, and a single stdio pipe carries one in-flight request at a time. So tool calls serialize at the transport, upstream of cua-driver. Sessions give concurrent runs distinct cursors; they don't give them distinct connections, and parallelism needs distinct connections.
Claude Code only parallelizes tool calls it deems concurrency-safe (readOnlyHint:true). cua-driver's read-only tools (including move_cursor) already parallelize; mutating tools (click, type_text) serialize by design — parallelizing an ordered sequence like 3 → + → 1 → = would race.
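The effect of a shared pipe versus separate connections can be illustrated with a small asyncio simulation. It is a toy model, not cua-driver code: each "connection" is modelled as a lock that admits one in-flight call at a time, and the function name is hypothetical.

```python
import asyncio


async def run_calls(num_agents, shared_pipe):
    """Return the peak number of simultaneously in-flight calls."""
    in_flight = 0
    peak = 0
    shared = asyncio.Lock()

    async def agent():
        nonlocal in_flight, peak
        # A shared stdio pipe is one lock for everyone; separate connections get their own.
        pipe = shared if shared_pipe else asyncio.Lock()
        for _ in range(3):
            async with pipe:
                in_flight += 1
                peak = max(peak, in_flight)
                await asyncio.sleep(0.01)  # stand-in for the tool call
                in_flight -= 1

    await asyncio.gather(*(agent() for _ in range(num_agents)))
    return peak


peak_shared = asyncio.run(run_calls(2, shared_pipe=True))
peak_separate = asyncio.run(run_calls(2, shared_pipe=False))
```

With one shared pipe the peak never exceeds one call in flight; with a pipe per agent both agents overlap, while each agent's own three calls still run in order.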
How do I run multiple agents truly in parallel?
Give each agent its own connection. Two options:
- Separate cua-driver mcp processes, e.g. two Claude Code instances. Each spawns its own proxy → its own daemon connection → the (concurrent) daemon runs them in parallel.
- The HTTP transport: start the daemon with CUA_DRIVER_RS_MCP_HTTP_PORT=<port> and it also serves MCP over HTTP at POST http://127.0.0.1:<port>/mcp (loopback only). Point each agent's MCP client at that URL; each opens its own HTTP connection and they run concurrently (measured 3.6× on 10 parallel calls). Per-connection ordering keeps each agent's sequence correct; the per-(pid, window_id) cache + per-session cursor make concurrent cross-connection actions safe.
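For the HTTP option, a client can build the JSON-RPC tools/call request with the Python standard library. This sketch only constructs the request and does not send it. The helper name is hypothetical, port 8787 is an arbitrary example, and the Content-Type header is an assumption rather than a documented requirement of the endpoint.

```python
import json
import urllib.request


def build_tool_call(port, name, arguments, request_id=1):
    """Build (but do not send) a JSON-RPC tools/call request. Hypothetical helper."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )


request = build_tool_call(8787, "start_session", {"session": "agent-1"})
```

With the daemon running, urllib.request.urlopen(request) would send it; each agent issuing its own requests gets its own connection.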
# daemon with the HTTP MCP endpoint enabled
CUA_DRIVER_RS_MCP_HTTP_PORT=8787 cua-driver serve
# sanity check
curl -s -XPOST http://127.0.0.1:8787/mcp \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"start_session","arguments":{"session":"agent-1"}}}'
Permissions
check_permissions says NOT granted but I granted both.
TCC checks the calling process, not CuaDriver.app. Inside IDE terminals (Claude Code, Cursor, VS Code, Conductor), the shell inherits the IDE's TCC responsibility chain. So running cua-driver call check_permissions in one of those shells reads against the IDE's bundle, not com.trycua.driver.
Start the daemon first, which runs through LaunchServices under the CuaDriver bundle:
open -n -g -a CuaDriver --args serve
cua-driver call check_permissions # forwards to the daemon; authoritative answer
Or use the dedicated verbs, which handle attribution for you: cua-driver permissions grant launches
CuaDriver via LaunchServices and waits for the grant, and cua-driver permissions status reports the
driver's real state through the daemon — when no daemon is running it answers ❓ unknown rather than
the calling terminal's grants, so it never reports a false granted.
cua-driver mcp from an IDE terminal can't see Accessibility.
This is the same TCC-attribution issue as the previous question, applied to the stdio MCP server. cua-driver mcp detects it and auto-launches the daemon via open -n -g -a CuaDriver --args serve, then proxies every MCP tool call through the daemon's Unix socket. From the MCP client's perspective nothing changes — the same stdio server, the same tool names, the same response shapes — but every AX probe now hits a process that LaunchServices attributed to CuaDriver.app. No Python bridge needed. Force the in-process path with --no-daemon-relaunch (or CUA_DRIVER_MCP_NO_RELAUNCH=1) if you really want it, e.g. when mcp is launched from CuaDriver.app directly.
I keep seeing the permissions dialog on every launch.
macOS is attributing the process to a different bundle id than the one you granted. Run cua-driver diagnose and share the output when filing an issue. It reports cdhash, team id, and which bundle TCC matched against.
After a rebuild, the driver reports NOT granted but System Settings still shows CuaDriver toggled ON.
This is a stale TCC grant. TCC pins each Accessibility / Screen-Recording grant to the app's designated requirement at grant time. If you first granted while CuaDriver.app was ad-hoc signed, the requirement is a bare cdhash H"…", which changes on every rebuild — so the grant row stays allowed but its requirement no longer matches the new binary, and re-toggling the switch doesn't help (the row already records a decision, so the prompt never re-fires).
Release builds are CI-signed with a stable identity, so this only affects the local dev loop (install-local.sh). That installer now signs with a stable self-signed certificate and, when it detects the signing identity changed since the last install, runs tccutil reset for you so the next grant re-pins cleanly. After you re-grant once on the certificate-signed build, the grant survives every future rebuild.
To clear it by hand:
tccutil reset Accessibility com.trycua.driver
tccutil reset ScreenCapture com.trycua.driver
cua-driver permissions grant # re-grant once; now pinned to the stable cert
Config and telemetry
Where does config live?
~/Library/Application Support/Cua Driver/config.json
Read and write via cua-driver config:
cua-driver config # show full config
cua-driver config get capture_mode
cua-driver config set capture_mode som
cua-driver config reset # overwrite with defaults
How do I opt out of telemetry?
cua-driver config telemetry disable
Or set CUA_DRIVER_TELEMETRY_ENABLED=0 in the environment for a one-off override.
Telemetry records anonymous subcommand usage (cua_driver_api_click, cua_driver_serve, etc). No command arguments, file paths, or personal information are collected.
Windows (cua-driver-rs)
Why do click / type_text / screenshot / list_windows return empty results?
Your cua-driver daemon is probably running in Windows Session 0 (services / SSH-launched processes). Every window-driving Win32 API — EnumWindows, PostMessage, PrintWindow, UI Automation tree walks — is scoped to the calling process's WindowStation + Desktop. Session 0 has no attached interactive desktop, so all these APIs silently succeed but return empty.
Run cua-driver doctor — it surfaces this directly:
[warn] interactive session: running in Session 0 (services); window-driving tools
(list_windows, click, type_text, screenshot, get_window_state) will return
empty results — these APIs need an attached interactive desktop.
Fix: re-launch cua-driver serve from an interactive logon (RDP into the host, a console session, or a scheduled task configured with /RU <user> /IT so it runs in the user's session). The non-GUI tools (list_apps for Win32 entries, get_config, doctor, telemetry plumbing) work normally in Session 0 either way.
launch_app with a UWP app fails with "requires an interactive session" in Session 0.
Same root cause: IApplicationActivationManager::ActivateApplication needs the per-user AppX runtime which only exists in interactive logons. Until you re-run the daemon from a Session 1+ logon, fall back to the Win32 path (launch_app {"path":"C:\\Windows\\System32\\notepad.exe"}) — ShellExecuteEx-based launches work in Session 0 for anything reachable via PATH.
Testing
Are there tests I can run against my install?
The project includes Python integration tests under libs/cua-driver/tests/ that exercise the real cua-driver stdio server using unittest. They run as part of scripts/test.sh:
cd libs/cua-driver
./scripts/test.sh
For a quick manual smoke check, the Calculator test from libs/cua-driver/Skills/cua-driver/TESTS.md is a good five-minute run: launch Calculator hidden, snapshot, click 17 × 23 by element index, re-snapshot, verify the display reads 391 and Calculator never came to the foreground.