computer-use-agents
SKILL.md
Computer Use Agents
Patterns
Perception-Reasoning-Action Loop
The fundamental architecture of computer use agents: observe screen, reason about next action, execute action, repeat. This loop integrates vision models with action execution through an iterative pipeline.
Key components:
- PERCEPTION: Screenshot captures current screen state
- REASONING: Vision-language model analyzes and plans
- ACTION: Execute mouse/keyboard operations
- FEEDBACK: Observe result, continue or correct
Critical insight: Vision agents are completely still during "thinking" phase (1-5 seconds), creating a detectable pause pattern.