seo-crawlability (M1)

Crawl access is the precondition for every other search signal: if Googlebot or Bingbot can't fetch the page and its assets, none of the page's other signals can count. This module covers general-purpose crawl access only. AI-specific bot directives (GPTBot, Claude-SearchBot, etc.) and llms.txt live in seo-ai-crawlers (M14/M21); see references/ai-crawlers.md for where that boundary falls.

Audits

Working from the PageSnapshot (rendered_dom if present, else raw_html) plus a fetch of /robots.txt:

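Reachability: /robots.txt returns 200. A 404 (or other 4xx) means "allow all", which is valid but worth flagging. A 5xx (or 429) can make Google treat the whole site as disallowed and pause crawling until the file is reachable again.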
Syntax: each non-comment line is a recognized directive (User-agent, Disallow, Allow, Sitemap, Crawl-delay). Flag unknown tokens, Allow/Disallow rules that appear before any User-agent line, and BOM/encoding issues.
Asset blocking: any Disallow that blocks CSS, JS, fonts, or /wp-includes/-style asset paths. Without those files Google cannot render the page as users see it, a common cause of "page looks broken to Google" (cross-check with M-render).
Content blocking: Disallow rules that hide important indexable paths from Googlebot/Bingbot.
Crawl-delay sanity: a large Crawl-delay, or any Crawl-delay in the wildcard (User-agent: *) group, can throttle crawling and starve crawl budget. Googlebot ignores Crawl-delay but Bingbot honors it, so the impact shows up on Bing.
Sitemap directive: at least one Sitemap: line pointing to an absolute URL.
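Overall access: resolve the effective ruleset for Googlebot and Bingbot against the audited URL and report whether it ends up allowed. Under RFC 9309 matching, the most specific matching User-agent group applies, falling back to *. Within that group, the longest matching Allow/Disallow pattern wins, and Allow wins ties.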
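The reachability check reduces to a status-code mapping. Below is a minimal sketch. It assumes the fetcher has already followed redirects and passes in the final HTTP status. The function name classify_robots_status and the severity labels are hypothetical. The 429/5xx branch reflects Google's documented handling; other crawlers may differ.

```python
def classify_robots_status(status: int) -> tuple[str, str]:
    """Map the final HTTP status of /robots.txt to (severity, note). Hypothetical helper."""
    if 200 <= status < 300:
        return ("ok", "robots.txt fetched; parse and audit its rules")
    if status == 429 or 500 <= status < 600:
        # Google treats server errors and 429 as "fully disallowed" until the file is reachable.
        return ("error", "server error: Google may pause crawling of the whole site")
    if 400 <= status < 500:
        # Other 4xx responses (including 404) are treated as if no robots.txt exists.
        return ("warn", "no robots.txt: crawlers treat the site as allow-all")
    # Anything else (e.g. an unfollowed 3xx) deserves a manual look.
    return ("warn", f"unexpected status {status}; check redirects")
```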
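The syntax, Crawl-delay, and Sitemap checks can share one line-by-line pass. This is a sketch under simplifying assumptions:
- Comments are stripped at the first '#'.
- Field names are matched case-insensitively.
- A "large" Crawl-delay means more than a hypothetical 10-second threshold.
- lint_robots and its message strings are illustrative and not part of any library.

```python
from urllib.parse import urlparse

KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}
LARGE_CRAWL_DELAY = 10.0  # hypothetical audit threshold, in seconds


def lint_robots(text: str) -> list[str]:
    """Return human-readable findings for a robots.txt body (simplified checks)."""
    findings: list[str] = []
    if text.startswith("\ufeff"):
        findings.append("UTF-8 BOM at start of file")
        text = text[1:]
    agents: list[str] = []  # user-agents of the current group
    last_was_agent = False  # consecutive User-agent lines share one group
    sitemaps = 0
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if ":" not in line:
            findings.append(f"line {n}: not a 'field: value' line")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            findings.append(f"line {n}: unknown directive {field!r}")
            continue
        if field == "user-agent":
            if not last_was_agent:
                agents = []  # a User-agent line after rules starts a new group
            agents.append(value.lower())
            last_was_agent = True
            continue
        last_was_agent = False
        if field == "sitemap":
            # Sitemap lines stand outside groups but need a fully qualified URL.
            parsed = urlparse(value)
            if parsed.scheme in ("http", "https") and parsed.netloc:
                sitemaps += 1
            else:
                findings.append(f"line {n}: Sitemap URL is not absolute: {value!r}")
        elif not agents:
            findings.append(f"line {n}: {field} rule appears before any User-agent line")
        elif field == "crawl-delay":
            try:
                delay = float(value)
            except ValueError:
                findings.append(f"line {n}: Crawl-delay is not a number: {value!r}")
                continue
            if delay > LARGE_CRAWL_DELAY or "*" in agents:
                findings.append(
                    f"line {n}: Crawl-delay {delay:g}s for {agents}; "
                    "Bingbot honors it, Googlebot ignores it"
                )
    if sitemaps == 0:
        findings.append("no Sitemap: line with an absolute URL")
    return findings
```

A clean file such as "User-agent: *", "Disallow: /admin/", "Sitemap: https://example.com/sitemap.xml" should produce an empty list.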
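The overall-access check can be sketched in three steps. The same code can test whether a CSS/JS asset or an important content path is blocked.
1. Pick the group whose User-agent token matches the crawler exactly, else fall back to the * group.
2. Find the longest Allow/Disallow pattern that matches the URL's path and query.
3. Let that pattern decide, with Allow winning ties.

This follows the RFC 9309 matching model, but simplified:
- It skips percent-encoding normalization.
- It ignores the fallback of crawlers such as Googlebot-Image to the Googlebot group.
- All function names are hypothetical.

```python
import re
from urllib.parse import urlparse

Rule = tuple[str, str]  # (field, pattern), e.g. ("disallow", "/private/")


def parse_groups(text: str) -> list[tuple[list[str], list[Rule]]]:
    """Split robots.txt into (user-agents, rules) groups (simplified)."""
    groups: list[tuple[list[str], list[Rule]]] = []
    agents: list[str] = []
    rules: list[Rule] = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if agents and not last_was_agent:  # User-agent after rules: new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow", "crawl-delay"):
            last_was_agent = False
            if agents and field != "crawl-delay":
                rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(groups: list[tuple[list[str], list[Rule]]], crawler: str) -> list[Rule]:
    """Rules from groups naming the crawler; only if none exist, rules from the * group(s)."""
    token = crawler.lower()
    if any(token in agents for agents, _ in groups):
        return [r for agents, rules in groups if token in agents for r in rules]
    return [r for agents, rules in groups if "*" in agents for r in rules]


def pattern_matches(pattern: str, path: str) -> bool:
    """Prefix match where '*' matches any run of characters and a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


def is_allowed(robots_txt: str, crawler: str, url: str) -> bool:
    parsed = urlparse(url)
    path = (parsed.path or "/") + (f"?{parsed.query}" if parsed.query else "")
    best = None  # (pattern length, is_allow) of the most specific match so far
    for field, pattern in rules_for(parse_groups(robots_txt), crawler):
        if pattern and pattern_matches(pattern, path):  # an empty Disallow matches nothing
            candidate = (len(pattern), field == "allow")
            # Longer pattern wins; at equal length (n, True) > (n, False), so Allow wins.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

Once a crawler has its own group, the * group no longer applies to it. So a CSS block written only under User-agent: * does not affect Googlebot if the file also contains a Googlebot group. That is worth surfacing in the asset-blocking report.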