Interfaces Reference

Auto-generated reference for all cua-sandbox interface classes, built from the Python source docstrings. Do not edit manually; run python scripts/gen_interface_docs.py to regenerate after editing interface code.

The sandbox exposes the following interface objects on every Sandbox instance:

| Attribute | Class | Purpose |
| --- | --- | --- |
| sb.shell | Shell | Run shell commands |
| sb.mouse | Mouse | Mouse control |
| sb.keyboard | Keyboard | Keyboard control |
| sb.screen | Screen | Screenshots and screen info |
| sb.clipboard | Clipboard | Clipboard read/write |
| sb.tunnel | Tunnel | Port forwarding |
| sb.terminal | Terminal | PTY terminal sessions |
| sb.window | Window | Window management |
| sb.mobile | Mobile | Mobile-specific actions |

Shell

Shell command execution.

Methods

async run(command: str, timeout: int = 30) -> CommandResult

Run a shell command and return the result.


CommandResult

Fields

| Name | Type |
| --- | --- |
| stdout | str |
| stderr | str |
| returncode | int |

Methods

@property success() -> bool
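A minimal sketch of how `run` and `CommandResult` might be combined. The helper name `run_checked` is hypothetical (it is not part of the SDK), and `shell` is assumed to be the `sb.shell` attribute of an already-created Sandbox instance.

```python
async def run_checked(shell, command: str, timeout: int = 30) -> str:
    """Run a command through a Shell-like object and return its stdout.

    Hypothetical helper, not an SDK function: raises RuntimeError when the
    returned CommandResult reports failure via its `success` property.
    """
    result = await shell.run(command, timeout=timeout)
    if not result.success:
        raise RuntimeError(
            f"{command!r} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Inside a coroutine this could be called as `output = await run_checked(sb.shell, "uname -a")`.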


Mouse

Mouse control.

Methods

async click(x: int, y: int, button: str = 'left') -> None

async right_click(x: int, y: int) -> None

async double_click(x: int, y: int) -> None

async move(x: int, y: int) -> None

async scroll(x: int, y: int, scroll_x: int = 0, scroll_y: int = 3) -> None

async mouse_down(x: int, y: int, button: str = 'left') -> None

async mouse_up(x: int, y: int, button: str = 'left') -> None

async drag(start_x: int, start_y: int, end_x: int, end_y: int, button: str = 'left') -> None
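The reference does not say whether `drag` emits intermediate move events. As an illustration of composing the lower-level calls, the sketch below presses, moves through evenly spaced points, and releases. `drag_in_steps` is a hypothetical helper, and `mouse` is assumed to be `sb.mouse`.

```python
async def drag_in_steps(mouse, start, end, steps: int = 10, button: str = "left") -> None:
    """Press at `start`, move through `steps` evenly spaced points, release at `end`.

    Hypothetical helper built only from mouse_down, move and mouse_up.
    `steps` must be at least 1.
    """
    (x0, y0), (x1, y1) = start, end
    await mouse.mouse_down(x0, y0, button=button)
    for i in range(1, steps + 1):
        # Linear interpolation, rounded to integer pixel coordinates.
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        await mouse.move(x, y)
    await mouse.mouse_up(x1, y1, button=button)
```

For a plain drag with no intermediate points, the built-in `drag` call is the simpler choice.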


Keyboard

Keyboard control.

Methods

async type(text: str) -> None

Type a string of text.

async keypress(keys: Union[List[str], str]) -> None

Press a key combination (e.g. ["ctrl", "c"] or "enter").

async key_down(key: str) -> None

async key_up(key: str) -> None
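A short sketch showing both accepted forms of the `keypress` argument alongside `type`. The helper name `replace_field_text` is hypothetical, `keyboard` is assumed to be `sb.keyboard`, and the key name "a" is an assumption (this page only gives "ctrl", "c" and "enter" as examples).

```python
async def replace_field_text(keyboard, text: str) -> None:
    """Select everything in the focused field, type `text`, then press Enter.

    Hypothetical helper; assumes a text field already has keyboard focus.
    """
    await keyboard.keypress(["ctrl", "a"])  # list form: a key combination
    await keyboard.type(text)               # types the string as text
    await keyboard.keypress("enter")        # string form: a single key
```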


Screen

Screen capture and info.

Methods

async screenshot(format: str = 'png', quality: int = 95) -> bytes

Capture a screenshot and return raw image bytes.

Args:

  • format: "png" (lossless, default) or "jpeg" (lossy, ~5-10x smaller).
  • quality: JPEG quality 1-95, ignored for PNG.

async screenshot_base64(format: str = 'png', quality: int = 95) -> str

Capture a screenshot and return as a base64-encoded string.

async size() -> Tuple[int, int]

Return (width, height) of the screen.
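One way the `format` and `quality` arguments might be used is sketched below. `save_screenshot` is a hypothetical helper, `screen` is assumed to be `sb.screen`, and the default `quality=80` is an arbitrary illustrative value.

```python
from pathlib import Path


async def save_screenshot(screen, path: str, lossy: bool = False, quality: int = 80) -> int:
    """Capture a screenshot, write it to `path`, and return the number of bytes written.

    Hypothetical helper: requests JPEG when `lossy` is true, otherwise the
    default PNG. `quality` is only passed along for JPEG captures.
    """
    if lossy:
        data = await screen.screenshot(format="jpeg", quality=quality)
    else:
        data = await screen.screenshot()  # PNG by default
    Path(path).write_bytes(data)
    return len(data)
```

When the image has to travel inside JSON or a text protocol, `screenshot_base64` avoids a separate encoding step.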


Clipboard

Clipboard read/write.

Methods

async get() -> str

Return the current clipboard text.

async set(text: str) -> None

Set the clipboard text.
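As an illustration of pairing `get` and `set`, the sketch below places text on the clipboard for the duration of an action and then restores the previous contents. `with_clipboard_text` is a hypothetical helper, `clipboard` is assumed to be `sb.clipboard`, and `action` is any zero-argument coroutine function supplied by the caller.

```python
async def with_clipboard_text(clipboard, text: str, action):
    """Put `text` on the clipboard, await `action()`, then restore the old text.

    Hypothetical helper; the previous clipboard text is restored even if
    `action` raises.
    """
    previous = await clipboard.get()
    await clipboard.set(text)
    try:
        return await action()
    finally:
        await clipboard.set(previous)
```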


Tunnel

Port-forwarding interface — exposes sandbox ports on the host.

Methods

forward(*ports) -> _TunnelContext

Forward one or more sandbox ports (or abstract sockets) to the host.

ports may be:

  • int — TCP port inside the sandbox (e.g. 8080)
  • str — Android abstract socket name (e.g. "chrome_devtools_remote")

Returns a context manager (or awaitable) that yields:

  • a single TunnelInfo when one target is given
  • a dict[sandbox_port, TunnelInfo] when multiple targets are given

Example (Chrome DevTools over ADB)::

    async with sb.tunnel.forward("chrome_devtools_remote") as t:
        # http://localhost:<random>/json lists CDP targets
        print(t.url)
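Because `forward` yields a single `TunnelInfo` for one target and a dict for several, calling code may want to normalize the two shapes. The sketch below does that under the return-value rules described above. `with_forwarded` is a hypothetical helper, `tunnel` is assumed to be `sb.tunnel`, and `use` is a caller-supplied coroutine function.

```python
async def with_forwarded(tunnel, ports, use):
    """Forward `ports`, await `use({port: url})` while the tunnels are open, return its result.

    Hypothetical helper. `ports` is a non-empty sequence of sandbox ports
    (or abstract socket names).
    """
    async with tunnel.forward(*ports) as result:
        if len(ports) == 1:
            # One target: the context yields a single TunnelInfo.
            urls = {ports[0]: result.url}
        else:
            # Several targets: the context yields a dict keyed by sandbox port.
            urls = {port: info.url for port, info in result.items()}
        return await use(urls)
```

The URLs are only handed to `use` inside the `async with` block, since the tunnels are presumably torn down when the context exits.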


TunnelInfo

A single forwarded port.

Methods

@property url() -> str

async close() -> None

Close this tunnel (no-op if already closed or inside a context manager).


Terminal

PTY terminal sessions.

Methods

async create(shell: str = 'bash', cols: int = 80, rows: int = 24) -> dict

Create a new PTY session. Returns {"pid": int, "cols": int, "rows": int}.

async send_input(pid: int, data: str) -> None

Send input to a PTY session.

async resize(pid: int, cols: int, rows: int) -> None

Resize a PTY session.

async close(pid: int) -> Optional[int]

Close a PTY session. Returns exit code.
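The session lifecycle described above (create, send input, close) can be sketched as follows. `run_in_pty` is a hypothetical helper and `terminal` is assumed to be `sb.terminal`. This page does not document how PTY output is read back, so the sketch only covers the calls listed here.

```python
async def run_in_pty(terminal, command: str, cols: int = 120, rows: int = 40):
    """Open a PTY, send one command line, close the session, and return its exit code.

    Hypothetical helper. The session is closed even if sending input fails.
    The return value is whatever `close` reports (an int or None).
    """
    session = await terminal.create(shell="bash", cols=cols, rows=rows)
    pid = session["pid"]  # create() returns {"pid": ..., "cols": ..., "rows": ...}
    try:
        await terminal.send_input(pid, command + "\n")
    finally:
        exit_code = await terminal.close(pid)
    return exit_code
```

For one-shot commands where only the output and return code matter, `sb.shell.run` is likely the better fit; a PTY is mainly useful for interactive programs.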


Window

Window management.

Methods

async get_active_title() -> str

Return the title of the currently focused window.
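A common use of `get_active_title` is waiting for an application window to take focus. The polling loop below is a sketch of that idea. `wait_for_title` is a hypothetical helper, `window` is assumed to be `sb.window`, and the timeout and interval defaults are arbitrary.

```python
import asyncio
import time


async def wait_for_title(window, substring: str, timeout: float = 10.0,
                         interval: float = 0.25) -> str:
    """Poll the focused window title until it contains `substring`.

    Hypothetical helper: returns the matching title, or raises TimeoutError
    once `timeout` seconds have elapsed without a match.
    """
    deadline = time.monotonic() + timeout
    while True:
        title = await window.get_active_title()
        if substring in title:
            return title
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no focused window containing {substring!r}")
        await asyncio.sleep(interval)
```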


Mobile

Mobile (Android) touch and hardware-key control.

All touch coordinates are in screen pixels. Single-touch methods use input tap/swipe via adb shell. Multi-touch gestures delegate to the multitouch_gesture transport action, which uses adb root + MT Protocol B sendevent for reliable injection on both local and cloud transports.

Methods

async tap(x: int, y: int) -> None

async long_press(x: int, y: int, duration_ms: int = 1000) -> None

async double_tap(x: int, y: int, delay: float = 0.1) -> None

async type_text(text: str) -> None

async swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> None

async scroll_up(x: int, y: int, distance: int = 600, duration_ms: int = 400) -> None

async scroll_down(x: int, y: int, distance: int = 600, duration_ms: int = 400) -> None

async scroll_left(x: int, y: int, distance: int = 400, duration_ms: int = 300) -> None

async scroll_right(x: int, y: int, distance: int = 400, duration_ms: int = 300) -> None

async fling(x1: int, y1: int, x2: int, y2: int) -> None

async gesture(*finger_paths, duration_ms: int = 400, steps: int = 0) -> None

Inject an arbitrary N-finger gesture via MT Protocol B sendevent.

Each positional argument is a sequence of (x, y) waypoints for one finger. Pass an even number of (x, y) tuples and they are paired into start/end positions per finger::

    # two-finger pinch-out
    await mobile.gesture(
        (cx - 20, cy), (cx - 200, cy),  # finger 0
        (cx + 20, cy), (cx + 200, cy),  # finger 1
    )

Delegates to the multitouch_gesture transport action, which uses adb root + MT Protocol B sendevent for reliable injection on both local ADB and cloud HTTP transports.

Args:

  • *finger_paths: Alternating start/end (x, y) tuples, two per finger. Must have an even length >= 4 (at least 2 fingers × 2 points).
  • duration_ms: Total gesture duration in milliseconds.
  • steps: Interpolation steps (0 = auto: duration_ms // 20, min 5).
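The argument rules above can be made concrete with two small pure functions. Both names are hypothetical: `auto_steps` restates the documented `steps=0` rule, and `pinch_out_paths` builds the alternating start/end tuples for a horizontal two-finger pinch-out. The `inner` and `outer` offsets mirror the example values and are not SDK defaults.

```python
def auto_steps(duration_ms: int) -> int:
    """Step count implied by steps=0: duration_ms // 20, but never fewer than 5."""
    return max(5, duration_ms // 20)


def pinch_out_paths(cx: int, cy: int, inner: int = 20, outer: int = 200):
    """Return start/end (x, y) tuples for two fingers moving apart horizontally.

    Hypothetical helper; the result has four tuples, two per finger.
    """
    return (
        (cx - inner, cy), (cx - outer, cy),  # finger 0: start, end (moves left)
        (cx + inner, cy), (cx + outer, cy),  # finger 1: start, end (moves right)
    )


print(auto_steps(400))  # 20
print(auto_steps(60))   # 5 (60 // 20 is 3, raised to the minimum)
print(pinch_out_paths(540, 960))
```

The tuples could then be unpacked into the call, for example `await sb.mobile.gesture(*pinch_out_paths(540, 960))`.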

async pinch_in(cx: int, cy: int, spread: int = 300, duration_ms: int = 400) -> None

Pinch-in (zoom out) with two real simultaneous fingers.

async pinch_out(cx: int, cy: int, spread: int = 300, duration_ms: int = 400) -> None

Pinch-out (zoom in) with two real simultaneous fingers.

async key(keycode: int) -> None

async home() -> None

async back() -> None

async recents() -> None

async power() -> None

async volume_up() -> None

async volume_down() -> None

async enter() -> None

async backspace() -> None

async wake() -> None

async notifications() -> None

async close_notifications() -> None
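The touch and hardware-key methods are typically chained into short sequences. The sketch below shows one such flow. `search_from_home` is a hypothetical helper, `mobile` is assumed to be `sb.mobile`, and the coordinates of the search field are supplied by the caller since they depend on the device and launcher.

```python
async def search_from_home(mobile, x: int, y: int, query: str) -> None:
    """Wake the device, go to the home screen, tap a field at (x, y), and submit `query`.

    Hypothetical helper; (x, y) are screen pixels of a text field on the
    home screen, which varies between devices.
    """
    await mobile.wake()
    await mobile.home()
    await mobile.tap(x, y)
    await mobile.type_text(query)
    await mobile.enter()
```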

