Summary

Extract and save website content as markdown files for offline access and analysis.

  • Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against performance
  • Includes path filtering via regex patterns to focus on specific sections and exclude irrelevant content
  • Offers two modes: full-page extraction for data collection, or semantic chunking with natural language instructions for feeding results into LLM context
  • Provides a companion Map API for URL discovery without content extraction, useful for understanding site structure before full crawls
  • Authenticates via OAuth (Tavily account required) or API key; saves crawled pages as individual markdown files when output directory is specified
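The bullets above map crawl options (depth, breadth, page cap, regex path filters, semantic instructions) onto a request. As a hedged sketch, here is how such a request payload might be assembled; the parameter names (`max_depth`, `max_breadth`, `limit`, `select_paths`, `instructions`) and defaults are assumptions drawn from the feature list, not a confirmed API schema:

```python
def build_crawl_payload(url, max_depth=2, max_breadth=20, limit=50,
                        select_paths=None, instructions=None):
    """Assemble a crawl request dict (hypothetical field names).

    max_depth     -- how many link levels to follow (the text cites 1-5)
    max_breadth   -- links followed per page
    limit         -- overall page cap
    select_paths  -- regex patterns restricting which URL paths are crawled
    instructions  -- natural-language guidance for semantic chunking mode
    """
    payload = {"url": url, "max_depth": max_depth,
               "max_breadth": max_breadth, "limit": limit}
    if select_paths:
        payload["select_paths"] = select_paths
    if instructions:
        payload["instructions"] = instructions
    return payload
```

Omitting `instructions` corresponds to full-page extraction; supplying it selects the semantic-chunking mode described above.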
SKILL.md

Crawl Skill

Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.

Authentication

The script uses OAuth via the Tavily MCP server. No manual setup is required; on first run, it will:

  1. Check for existing tokens in ~/.mcp-auth/
  2. If none found, automatically open your browser for OAuth authentication

Note: You must already have a Tavily account. The OAuth flow supports login only; it cannot create accounts. Sign up at tavily.com first if you don't have one.

Alternative: API Key

If you prefer an API key, get one at https://tavily.com and add it to ~/.claude/settings.json:

{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}