seo-ai-crawlers (M14)
Controls whether AI search engines can crawl and cite the page, and whether they can read it without JS. The training-vs-search-vs-fetch distinction is everything. Reference: `references/ai-crawlers.md`.
Audits
Working from the PageSnapshot (`rendered_dom` if present, else `raw_html`) plus the site `robots.txt`:
- Citation access: are retrieval/citation bots (`OAI-SearchBot`, `Claude-SearchBot`, `PerplexityBot`, `Bingbot`) actually allowed, i.e. not caught by a broad `Disallow: /` or a wildcard block? Confirm `Googlebot` is not blocked and the `Googlebot` (search) vs `Google-Extended` (Gemini training control) split is correct.
- User-agent classification: bucket every AI agent in `robots.txt` into training (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, CCBot), search/retrieval (OAI-SearchBot, Claude-SearchBot, PerplexityBot), and user-triggered fetch (ChatGPT-User, Claude-User, Perplexity-User). Match user-agents case-insensitively; treat the table in `references/ai-crawlers.md` as a starting set, not exhaustive.
- Renderability for non-JS crawlers: pull the M4 (seo-crawl-render) render result; most AI crawlers do not execute JS. If primary content only appears in `rendered_dom` and is absent from `raw_html`, flag it as invisible to AI retrieval.
- llms.txt / llms-full.txt (also covers M21): presence at the site root, valid Markdown structure (H1 title, summary blockquote, sectioned link lists), and that linked URLs resolve. Follow `references/ai-crawlers.md`.
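
The classification, citation-access, and renderability checks above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names (`classify_agent`, `blocked_citation_bots`, `js_only_content`) and the bucket labels are hypothetical, the agent lists are only the starting set named above, and the access check relies on Python's standard `urllib.robotparser`, whose matching rules may differ in detail from what each crawler implements.

```python
from __future__ import annotations

from typing import Optional
from urllib.robotparser import RobotFileParser

# Starting set copied from the audit list above; it is not exhaustive.
AGENT_BUCKETS = {
    "training": ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended", "CCBot"],
    "search": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "user_fetch": ["ChatGPT-User", "Claude-User", "Perplexity-User"],
}

# Bots that must stay allowed for the page to be cited (per the first audit item).
CITATION_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Bingbot", "Googlebot"]


def classify_agent(user_agent: str) -> str:
    """Return the bucket for a robots.txt user-agent token, matched case-insensitively."""
    token = user_agent.strip().lower()
    for bucket, names in AGENT_BUCKETS.items():
        if token in {name.lower() for name in names}:
            return bucket
    return "unknown"  # hypothetical label for agents outside the starting set


def blocked_citation_bots(robots_txt: str, url: str) -> list[str]:
    """List the citation bots that this robots.txt text would keep away from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in CITATION_BOTS if not parser.can_fetch(bot, url)]


def js_only_content(raw_html: str, rendered_dom: Optional[str], snippet: str) -> bool:
    """True when `snippet` (an assumed sample of primary content) exists only after JS rendering."""
    if rendered_dom is None:
        return False
    return snippet in rendered_dom and snippet not in raw_html
```

With a `robots.txt` that allows only `OAI-SearchBot` and then sets `Disallow: /` under `User-agent: *`, `blocked_citation_bots` reports every other citation bot as blocked. That is the wildcard-block case the first audit item is looking for.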
Fixes
- AUTO (`fixable: auto`): a citation-friendly `robots.txt` preset, choice-gated: the user picks `allow-citations` (default: allow search/retrieval, opt out of training), `allow-all`, or `block-all`. Deterministic, additive, verifiable; emitted as a diff for `fix`.
- AUTO (`fixable: auto`), disclosure-gated and scored 0: `llms.txt` / `llms-full.txt`, generated from the site's own structure only on explicit request (`fix --category llms`), shown as a diff before writing. Additive and deterministic, but never sold as proven ranking value; the disclosure that it is low/uncertain impact is shown every time.
- ADVISORY (`fixable: advisory`): edge/WAF block for bots that ignore `robots.txt` (e.g. Bytespider). The tool never writes infra config.

Never fabricate sitemap URLs, contact emails, or link targets; ask the user or leave a clearly-marked `TODO` placeholder.
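
As an illustration of the choice-gated `robots.txt` preset, here is a minimal sketch of how a deterministic, additive diff might be produced. It rests on several assumptions: the helper names (`render_preset`, `preset_diff`) are hypothetical, `block-all` is assumed to mean blocking all three AI agent buckets, and `allow-citations` is assumed to leave user-triggered fetchers allowed, which the text above does not specify.

```python
import difflib

TRAINING = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended", "CCBot"]
SEARCH = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
USER_FETCH = ["ChatGPT-User", "Claude-User", "Perplexity-User"]

# Assumed mapping from preset name to the agents it disallows.
PRESET_BLOCKS = {
    "allow-citations": TRAINING,                   # opt out of training only
    "allow-all": [],                               # nothing blocked
    "block-all": TRAINING + SEARCH + USER_FETCH,   # assumption: all AI buckets
}


def render_preset(choice: str) -> str:
    """Render one robots.txt group per known agent, in a fixed order."""
    if choice not in PRESET_BLOCKS:
        raise ValueError(f"unknown preset: {choice!r}")
    blocked = set(PRESET_BLOCKS[choice])
    groups = []
    for agent in TRAINING + SEARCH + USER_FETCH:
        rule = "Disallow: /" if agent in blocked else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


def preset_diff(existing: str, choice: str) -> str:
    """Unified diff that appends the preset to the existing robots.txt text."""
    preset = render_preset(choice)
    if existing.strip():
        updated = existing.rstrip("\n") + "\n\n" + preset
    else:
        updated = preset
    return "".join(
        difflib.unified_diff(
            existing.splitlines(keepends=True),
            updated.splitlines(keepends=True),
            fromfile="robots.txt",
            tofile="robots.txt (proposed)",
        )
    )
```

The sketch only appends groups, which keeps the change additive and easy to verify. It does not reconcile the preset with groups that already exist for the same agent, so a real fix would need to detect and handle those conflicts before writing.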