Midscene Browser Automation
Automate browser interactions using Midscene with Claude. This skill provides natural language control over a Chrome browser through command-line tools for navigation, interaction, data extraction, and screenshots.
Overview
This skill uses a CLI-based approach in which Claude calls browser automation commands via bash. The browser stays open between commands, which speeds up sequential operations and preserves browser state (cookies, sessions, etc.).
Key Features:
- 🧠 Natural language understanding of page elements
- 🎯 Intelligent element identification without CSS selectors
- 👁️ Visual and semantic understanding of web pages
- 🤖 AI-powered interactions and data extraction
Setup Verification
IMPORTANT: Before using any browser commands, you MUST check setup.json in this directory.
First-Time Setup Check
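As a rough illustration of the check described above, the following sketch looks for setup.json in the skill directory and confirms that it parses as JSON. The structure of setup.json is not specified in this section, so the helper makes no assumptions about its fields; the function name `read_setup` is hypothetical and not part of Midscene.

```python
import json
from pathlib import Path


def read_setup(skill_dir: str = "."):
    """Return the parsed setup.json from skill_dir, or None if it is missing or invalid.

    Hypothetical helper: the schema of setup.json is not documented here,
    so this only verifies that the file exists and contains valid JSON.
    """
    path = Path(skill_dir) / "setup.json"
    if not path.is_file():
        return None
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    setup = read_setup()
    if setup is None:
        print("setup.json is missing or unreadable; complete setup before running browser commands")
    else:
        print("setup.json found")
```

Returning None for both the missing-file and invalid-JSON cases keeps the caller's decision simple: either setup can be trusted enough to inspect further, or first-time setup should run before any browser command.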