page-agent



alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It runs inside a webpage, reads the DOM as text (no screenshots, no multimodal LLM), and executes natural-language instructions such as "click the login button, then fill username as John" against the current page. It is purely client-side: the host site includes a script and supplies an OpenAI-compatible LLM endpoint.
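To make the "DOM as text" idea concrete, here is a minimal sketch of how interactive elements could be flattened into indexed text lines and wrapped in an OpenAI-style chat request. Everything in it (`ElementInfo`, `serializeElements`, `buildChatRequest`, the prompt wording, and the `example-model` name) is hypothetical and chosen for illustration; it is not page-agent's actual API or internal format.

```typescript
// Illustrative sketch only: these types and functions are hypothetical
// and do not reflect page-agent's real API or internals.

interface ElementInfo {
  tag: string;                     // e.g. "button", "input"
  text?: string;                   // visible label or text content
  attrs?: Record<string, string>;  // a few attributes worth showing the model
  interactive: boolean;            // whether an agent could act on it
}

// Flatten interactive elements into indexed lines the model can refer to.
function serializeElements(elements: ElementInfo[]): string {
  const lines: string[] = [];
  let index = 0;
  for (const el of elements) {
    if (!el.interactive) continue; // skip static content in this sketch
    const attrs = el.attrs ?? {};
    const attrText = Object.keys(attrs)
      .map((key) => ` ${key}="${attrs[key]}"`)
      .join("");
    lines.push(`[${index}] <${el.tag}${attrText}>${el.text ?? ""}</${el.tag}>`);
    index++;
  }
  return lines.join("\n");
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Build a chat-completions style request body from the page text
// and the user's natural-language instruction.
function buildChatRequest(model: string, instruction: string, pageText: string): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: "You control a web page. Interactive elements:\n" + pageText },
      { role: "user", content: instruction },
    ],
  };
}

// A plain-object stand-in for a small login page (no real DOM needed).
const page: ElementInfo[] = [
  { tag: "h1", text: "Welcome", interactive: false },
  { tag: "button", text: "Log in", interactive: true },
  { tag: "input", attrs: { name: "username" }, interactive: true },
];

const pageText = serializeElements(page);
const request = buildChatRequest(
  "example-model", // hypothetical model name
  "click the login button, then fill username as John",
  pageText,
);

console.log(pageText);
console.log(JSON.stringify(request, null, 2));
```

The index in front of each line gives the model a short handle for each element, which is one common way for a text-only agent to name its click or fill targets without screenshots.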

When to use this skill

Load this skill when a user wants to:

  • Ship an AI copilot inside their own web app (SaaS, admin panel, B2B tool, ERP, CRM) — "users on my dashboard should be able to type 'create invoice for Acme Corp and email it' instead of clicking through five screens"
  • Modernize a legacy web app without rewriting the frontend — page-agent drops on top of existing DOM
  • Add accessibility via natural language — voice / screen-reader users drive the UI by describing what they want
  • Demo or evaluate page-agent against a local (Ollama) or hosted (Qwen, OpenAI, OpenRouter) LLM
  • Build interactive training / product demos — let an AI walk a user through "how to submit an expense report" live in the real UI
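
For the local-versus-hosted evaluation case, the only thing that changes between providers is the OpenAI-compatible endpoint. The sketch below collects the commonly documented base URLs for the providers named above. The `Provider`, `LlmEndpoint`, `endpointFor`, and `chatCompletionsUrl` names are hypothetical helpers rather than part of page-agent, the model name is a placeholder, and the URLs should be checked against each provider's current documentation.

```typescript
// Hypothetical helper for switching between OpenAI-compatible providers.
// Base URLs are the providers' commonly documented endpoints; verify before use.

type Provider = "ollama" | "openai" | "openrouter" | "qwen";

const BASE_URLS: Record<Provider, string> = {
  ollama: "http://localhost:11434/v1", // local Ollama server, default port
  openai: "https://api.openai.com/v1",
  openrouter: "https://openrouter.ai/api/v1",
  qwen: "https://dashscope.aliyuncs.com/compatible-mode/v1", // DashScope compatible mode
};

interface LlmEndpoint {
  baseURL: string;
  apiKey: string;
  model: string;
}

// Ollama does not check the API key, so a placeholder value is used by default.
function endpointFor(provider: Provider, model: string, apiKey: string = "not-needed"): LlmEndpoint {
  return { baseURL: BASE_URLS[provider], apiKey, model };
}

// OpenAI-compatible servers expose chat completions under /chat/completions.
function chatCompletionsUrl(endpoint: LlmEndpoint): string {
  return endpoint.baseURL.replace(/\/+$/, "") + "/chat/completions";
}

// "example-model" is a placeholder; use a model the chosen provider actually serves.
const local = endpointFor("ollama", "example-model");
console.log(chatCompletionsUrl(local));
```

Keeping the provider choice in one small object makes it easy to run the same instruction against a local and a hosted model and compare the results.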

When NOT to use this skill

  • User wants Hermes itself to drive a browser → use Hermes' built-in browser tool (Browserbase / Camofox). page-agent works in the opposite direction: it is embedded in the page rather than driving a browser from outside.
  • User wants cross-tab automation without embedding → use Playwright, browser-use, or the page-agent Chrome extension
  • User needs visual grounding / screenshots → page-agent is text-DOM only; use a multimodal browser agent instead
First Seen
Apr 27, 2026
page-agent — nousresearch/hermes-agent