eyehands

AI agents on Windows are slow because every screenshot round-trips through the cloud. Claude Computer Use takes 2-4 seconds just to see the screen. eyehands runs locally and does it in 57ms.

Download the Windows .zip from fireal.dev/eyehands/ or GitHub Releases, extract, and run:

eyehands.exe

That's the whole setup. HTTP server on localhost:7331, works from any language:

curl http://localhost:7331/screenshot --output screen.jpg
curl -X POST http://localhost:7331/click_at -d "{\"x\":500,\"y\":300}"

Or use the Python SDK (pip install eyehands-sdk):

from eyehands import EyehandsClient
c = EyehandsClient()
c.screenshot("screen.jpg")       # 57ms
c.click_at(500, 300)             # instant
c.type_text("hello world")      # native SendInput, not clipboard paste

	eyehands	Claude Computer Use
Screenshot	57ms	2-4s
5-action batch	67ms	3-5s
Resolution	1920x1080	1456x816
OCR text search	Optional*	No
UI Automation	Yes	No
Games / pointer-lock	Yes	No

* OCR (/find) requires ~2GB of EasyOCR/PyTorch dependencies, not bundled in the standalone .exe. Install from source or add easyocr to system Python to enable. All other endpoints work out of the box.

7-day free trial, then $49 one-time at fireal.dev. No subscription.

How it works

A background thread captures the screen at ~20fps into a rolling buffer. The capture backend picks the fastest option at startup: BetterCam (DXGI, ~120fps) if available, then DXcam (~39fps), then mss (GDI fallback). Mouse and keyboard go through SendInput via ctypes, which hits the Raw Input pipeline. This matters because most Python automation tools use the deprecated keybd_event/mouse_event APIs, and pointer-lock apps (games, Parsec, full-screen browsers) silently ignore those.

There's also OCR via EasyOCR (results cached per-frame, so repeated lookups on the same screen are free) and Windows UI Automation for finding elements by name and type. Your agent can click "Save" instead of guessing pixel coordinates.

Everything stays on localhost. Screen data doesn't leave your machine unless you send it to a cloud LLM yourself. The only outbound traffic is license validation at startup and an update check cached for 6 hours.

Integrations

Works with any agent or framework that can make HTTP calls.

pip install eyehands-sdk              # Python client
pip install "eyehands-sdk[mcp]"       # MCP server for Claude Code / Cursor

OpenAI function calling:

from eyehands.openai_tools import tools, dispatch

Or skip the SDK and hit the endpoints directly from Go, Rust, Node, whatever.

Browser extension

eyehands ships with a browser extension (ext/ folder) that captures your screen at full resolution through the browser. Install it once and any agent gets the same screenshot quality that Claude gets with Claude in Chrome.

Works on Chrome, Edge, Opera (via CDP), and Firefox (via captureVisibleTab). With the server running, open http://localhost:7331/ext/setup for guided installation.

When connected, agents use GET /ext/screenshot instead of /screenshot. If the extension isn't running, it falls back to native capture automatically.

Endpoints

All coordinates are physical screen pixels (Per-Monitor DPI v2). POST endpoints take JSON. Auth via bearer token (printed at startup, saved to .eyehands-token).

Method	Endpoint	What it does
GET	`/ping`	Health check, version, license, extension status
GET	`/cursor_pos`	Current cursor `{x, y}`
GET	`/screenshot`	JPEG capture, full screen or region
GET	`/screenshot_b64`	Same thing, base64 JSON
GET	`/latest`	Latest frame from the background buffer
GET	`/latest_b64`	Latest frame, base64 JSON + metadata
GET	`/view`	Live HTML page with auto-refresh overlay
GET	`/find`	OCR text search, returns coordinates
GET	`/ext/screenshot`	Full-res screenshot via browser extension (falls back to native)
GET	`/ext/screenshot_b64`	Same thing, base64 JSON
GET	`/ext/setup`	Browser extension install page (no auth)
GET	`/ext/status`	Extension connection status
POST	`/move`	Relative mouse move `{dx, dy}`
POST	`/move_absolute`	Absolute move `{x, y}`
POST	`/click`	Click `{button: "left"\|"right"\|"middle"}`
POST	`/double_click`	Double click
POST	`/mousedown`	Hold button
POST	`/mouseup`	Release button
POST	`/scroll`	Scroll wheel `{dy}`
POST	`/smooth_move`	Smooth relative move for drags `{dx, dy, steps, delay_ms}`
POST	`/click_at`	Move + click in one call `{x, y}`
POST	`/click_and_wait`	Click, then wait for the screen to change
POST	`/key`	Keypress with modifiers `{vk, modifiers}`
POST	`/type_text`	Type unicode text `{text}`
POST	`/type_and_enter`	Type + press Enter
POST	`/batch`	Up to 100 actions in one call
POST	`/ui/click_element`	Find a UI element and click it
GET	`/ui/windows`	List top-level windows
GET	`/ui/find`	Search elements by name, type, depth
GET	`/ui/at`	Element at screen coordinates
GET	`/ui/tree`	Nested element tree
GET	`/license`	License status
POST	`/activate`	Activate license key
POST	`/update`	Self-update to latest version
GET	`/openapi.json`	OpenAPI spec

Limitations

Windows only. eyehands uses SendInput, Windows UI Automation, and Per-Monitor DPI v2, all of which are Win32-specific. No plans for Mac or Linux. If you need cross-platform, Claude Computer Use or PyAutoGUI are your options.

First OCR call is slow (~8 seconds) because EasyOCR loads its model. After that, results are cached per-frame and return instantly when the screen hasn't changed.

The server is single-instance on one machine. Multiple agents can share the same server, but you can't run two copies.

Computer Use is fine, actually

Claude Computer Use works and requires zero setup. It's the right call when you want cross-platform support, teach mode, or the built-in security features (app masking, tiered permissions). Don't switch to eyehands just because it exists.

Switch when the 2-4 second screenshot latency is killing your agent's throughput. When you need OCR or UI Automation. When you're automating something with pointer-lock. When you're building with OpenAI or local models. When you care about screen data staying local. Some people use both: Computer Use for teach mode, eyehands for the fast loops.

Detailed comparison

More comparisons

vs PyAutoGUI · vs AutoHotkey · vs AutoIt · vs SikuliX · vs nut.js · vs Robot Framework · vs NirCmd · vs just screenshotting

Pricing

7-day trial, all features. Then $49 one-time at fireal.dev. Per-machine activation, no subscription.

Issues · fireal.dev · fireal6353@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
comparisons		comparisons
CODE_INDEX.md		CODE_INDEX.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

eyehands

How it works

Integrations

Browser extension

Endpoints

Limitations

Computer Use is fine, actually

More comparisons

Pricing

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

eyehands

How it works

Integrations

Browser extension

Endpoints

Limitations

Computer Use is fine, actually

More comparisons

Pricing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Packages