Browser Automation with agent-browser

Drive the agent-browser CLI as the agent's hands inside a browser: open URLs, snapshot interactive refs, fill forms, click, extract DOM content, take screenshots, switch tabs, persist sessions, run headed or in stealth mode, or dispatch through remote providers. This skill owns ad hoc, terminal-driven browser tasks where a human-style operator loop (observe → act → verify) is appropriate.
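
The observe → act → verify loop can be sketched as a thin Python driver around the CLI. This is a minimal sketch under stated assumptions, not the skill's own implementation. It assumes the CLI is invoked as agent-browser (or npx agent-browser), and that open, fill, and click are subcommands alongside the snapshot -i and get url commands named in this skill. It also assumes refs such as @e1 and @e3 were read from a prior snapshot. The function names, refs, and URLs are hypothetical. The runner is injected so the loop can be exercised without a browser.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A runner executes one argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def cli_runner(argv: List[str]) -> str:
    """Run a single agent-browser command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def observe_act_verify(run: Runner, url: str, actions: Sequence[List[str]],
                       expected_url_part: str,
                       base: Sequence[str] = ("agent-browser",)) -> Tuple[bool, List[str]]:
    """Open url, alternate act/observe steps, then verify against the live URL.

    Each entry in `actions` is the argv tail of one command, for example
    ["fill", "@e1", "me@example.com"] or ["click", "@e3"] (hypothetical refs).
    Returns (verified, snapshots) so the caller can inspect what was observed.
    """
    prefix = list(base)
    snapshots: List[str] = []
    run(prefix + ["open", url])                         # act: navigate to the start page
    snapshots.append(run(prefix + ["snapshot", "-i"]))  # observe: list interactive refs
    for action in actions:
        run(prefix + list(action))                      # act: one fill/click on a ref
        # Observe again: refs can change once the DOM updates.
        snapshots.append(run(prefix + ["snapshot", "-i"]))
    # Verify: read state back from the browser instead of assuming the click worked.
    current = run(prefix + ["get", "url"]).strip()
    return expected_url_part in current, snapshots
```

Pass cli_runner to drive a real browser, or base=("npx", "agent-browser") when the CLI is not installed globally. A real agent would read each returned snapshot before choosing its next ref rather than fixing the actions up front.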

When to use this skill

Use this skill when:

the user names agent-browser, npx agent-browser, @ref snapshots, snapshot -i, or any agent-browser flag (--session-name, --profile, --headed, -p browserbase|browseruse|kernel|ios, --engine lightpanda)
the task is one-off browser automation done by the agent itself — log in, click, fill a form, scrape data, take a screenshot, capture page state
the workflow needs deterministic DOM-grounded verification (get url, get text, get value, is visible, diff snapshot) rather than asserted test code
multi-tab, popup, OAuth, or session-isolated flows must reuse one browser context across many commands
hosted or mobile browsers, geo-specific access, or anti-bot pressure require --headed, stealth, profiles, or a remote provider through the same CLI
another skill (convert-url-to-nextjs, extract-saas-design) needs live browser evidence — DOM, screenshots, runtime metadata, asset URLs — captured and handed back
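
Session reuse and the mode flags above are easiest to keep consistent when every command is built from one shared prefix, so a popup, OAuth redirect, or second tab is driven in the same context as the first step. A minimal sketch, assuming the flags named in this section (--session-name, --profile, --headed, -p, --engine) can precede the subcommand. The BrowserContext class, the flag ordering, and the example session names are hypothetical. The sketch only builds argv lists and does not check which flag combinations the CLI actually accepts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Provider names listed in the skill description.
KNOWN_PROVIDERS = {"browserbase", "browseruse", "kernel", "ios"}


@dataclass(frozen=True)
class BrowserContext:
    """Flags shared by every command that must hit the same browser context."""
    session_name: str
    profile: Optional[str] = None     # assumption: --profile takes a single value
    headed: bool = False
    provider: Optional[str] = None    # emitted as -p <provider>
    engine: Optional[str] = None      # e.g. "lightpanda"
    binary: Tuple[str, ...] = ("agent-browser",)

    def __post_init__(self) -> None:
        # Fail early on a provider typo, before any command is sent.
        if self.provider is not None and self.provider not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")

    def argv(self, *command: str) -> List[str]:
        """Return binary + shared flags + the subcommand and its arguments."""
        flags = ["--session-name", self.session_name]
        if self.profile is not None:
            flags += ["--profile", self.profile]
        if self.headed:
            flags.append("--headed")
        if self.provider is not None:
            flags += ["-p", self.provider]
        if self.engine is not None:
            flags += ["--engine", self.engine]
        return [*self.binary, *flags, *command]
```

Freezing the dataclass means the flags cannot drift between commands mid-flow. A new context (for example a different session name) is a new object rather than a mutation of the current one.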

Do NOT use this skill when:

the deliverable is TypeScript code built on @onkernel/sdk or Kernel Apps → use build-kernel-ts-sdk
the deliverable is a rebuilt Next.js project from a captured site → ownership stays with convert-url-to-nextjs; this skill is only invoked for capture
the deliverable is a SaaS visual-system writeup → ownership stays with extract-saas-design; this skill is only invoked for browser evidence
the task is static research, DevTools-first profiling, or anything that does not require an active browser context
