computer-use-agents

Summary

AI agents that perceive screens, reason about actions, and control computers like humans do.

  • Implements the perception-reasoning-action loop: capture screenshot, analyze with vision-language model, execute mouse/keyboard operations, repeat
  • Covers Anthropic's Computer Use (Claude 3.5 Sonnet and Opus 4.5), with tool support for screenshots, mouse/keyboard control, bash execution, and file editing
  • Requires sandboxed environments (Docker containers with virtual desktops) to isolate agents from host systems and minimize security risk
  • Includes code examples for building custom agents, containerized deployments with resource limits, and integrating official Anthropic tools
SKILL.md

Computer Use Agents

Build AI agents that interact with computers the way humans do: viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives, with a critical focus on sandboxing, security, and the unique challenges of vision-based control.

Patterns

Perception-Reasoning-Action Loop

The fundamental architecture of computer use agents: observe screen, reason about next action, execute action, repeat. This loop integrates vision models with action execution through an iterative pipeline.

Key components:

  1. PERCEPTION: Screenshot captures current screen state
  2. REASONING: Vision-language model analyzes and plans
  3. ACTION: Execute mouse/keyboard operations
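
The three steps above can be sketched as a single loop. This is an illustrative stand-in, not Anthropic's implementation: the desktop, planner, and executor names below are hypothetical stubs, and a real agent would capture actual screenshots, send them to a vision-language model, and drive the mouse/keyboard inside a sandbox.

```python
# Minimal sketch of the perception-reasoning-action loop.
# All names here (FakeDesktop, plan_action, execute, run_agent) are
# hypothetical; real agents replace the stubs with screenshot capture,
# a vision-language model call, and OS-level input control.

from dataclasses import dataclass, field

@dataclass
class FakeDesktop:
    """Stand-in for a sandboxed desktop: tracks state instead of real pixels."""
    clicks: list = field(default_factory=list)
    done: bool = False

    def screenshot(self) -> str:
        # PERCEPTION: a real agent captures a PNG; we return a text state.
        return "login_button_visible" if not self.clicks else "logged_in"

def plan_action(observation: str) -> dict:
    # REASONING: a real agent asks a vision-language model what to do next;
    # this stub maps observations straight to actions.
    if observation == "login_button_visible":
        return {"type": "click", "x": 640, "y": 360}
    return {"type": "done"}

def execute(action: dict, desktop: FakeDesktop) -> None:
    # ACTION: a real agent drives the mouse/keyboard in the sandbox.
    if action["type"] == "click":
        desktop.clicks.append((action["x"], action["y"]))
    elif action["type"] == "done":
        desktop.done = True

def run_agent(desktop: FakeDesktop, max_steps: int = 10) -> int:
    """Iterate perception -> reasoning -> action until done or budget spent."""
    steps = 0
    while not desktop.done and steps < max_steps:
        obs = desktop.screenshot()   # 1. PERCEPTION
        action = plan_action(obs)    # 2. REASONING
        execute(action, desktop)     # 3. ACTION
        steps += 1                   # ...and repeat
    return steps

desktop = FakeDesktop()
steps = run_agent(desktop)
print(steps, desktop.clicks, desktop.done)  # → 2 [(640, 360)] True
```

Note the `max_steps` budget: because the model can loop on a misread screen, production agents always bound the iteration count in addition to sandboxing the environment.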
Installs: 609 · GitHub Stars: 37.3K · First Seen: Jan 19, 2026