The Agent Skills Directory

[COMMAND_EXECUTION]: The skill provides patterns for executing arbitrary shell commands via the Anthropic bash tool and Python's subprocess module, which are used to capture screenshots and perform system operations.
[COMMAND_EXECUTION]: Utilizes the pyautogui library to simulate human interactions such as clicking, typing, and scrolling, granting the agent full control over the desktop GUI.
[PROMPT_INJECTION]: The architecture employs a vision-based reasoning loop (Category 8) where the agent analyzes screenshots to determine its next action. This creates a risk of indirect prompt injection where malicious instructions embedded in a website or document displayed on screen could hijack the agent's logic.
Ingestion points: Visual data enters the system through the capture_screenshot method in perception-reasoning-action-loop.md and anthropic-computer-use-implementation.md.
Boundary markers: No explicit markers or "ignore instruction" guidelines are implemented to help the vision model distinguish between safe system instructions and untrusted content found in screenshots.
Capability inventory: The skill enables high-risk actions including arbitrary shell execution (BetaToolBash20241022) and low-level GUI input manipulation (pyautogui).
Sanitization: The implementation lacks automated sanitization or human-in-the-loop verification steps for actions derived from vision-based reasoning.

computer-use-agents