crawl
Extract and save website content as markdown files for offline access and analysis.
- Supports configurable crawl depth (1-5 levels), breadth limits, and page caps to balance coverage against performance
- Includes path filtering via regex patterns to focus on specific sections and exclude irrelevant content
- Offers two modes: full-page extraction for data collection, or semantic chunking with natural language instructions for feeding results into LLM context
- Provides a companion Map API for URL discovery without content extraction, useful for understanding site structure before full crawls
- Authenticates via OAuth (Tavily account required) or API key; saves crawled pages as individual markdown files when an output directory is specified
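The crawl controls above can be made concrete with a small Python sketch. It assembles a crawl request and also applies the regex path filter locally. The helper names (`build_crawl_payload`, `path_allowed`) are hypothetical. The field names (`max_depth`, `max_breadth`, `limit`, `select_paths`, `exclude_paths`, `instructions`) are assumed to match Tavily's Crawl API, so check them against the official reference before sending a request. Matching patterns with `re.search` against the URL path is also an assumption about how the filters are applied.

```python
import re
from typing import Optional


def build_crawl_payload(
    url: str,
    max_depth: int = 1,
    max_breadth: int = 20,
    limit: int = 50,
    select_paths: Optional[list[str]] = None,
    exclude_paths: Optional[list[str]] = None,
    instructions: Optional[str] = None,
) -> dict:
    """Hypothetical helper; field names are ASSUMED to mirror Tavily's Crawl API."""
    # The skill documents a crawl depth range of 1-5 levels.
    if not 1 <= max_depth <= 5:
        raise ValueError("max_depth must be between 1 and 5")
    if max_breadth < 1 or limit < 1:
        raise ValueError("max_breadth and limit must be positive")
    # Compile every pattern up front so an invalid regex fails before any request.
    for pattern in (select_paths or []) + (exclude_paths or []):
        re.compile(pattern)
    payload = {"url": url, "max_depth": max_depth,
               "max_breadth": max_breadth, "limit": limit}
    if select_paths:
        payload["select_paths"] = select_paths
    if exclude_paths:
        payload["exclude_paths"] = exclude_paths
    # ASSUMPTION: a natural-language instruction selects the semantic-chunking mode.
    if instructions:
        payload["instructions"] = instructions
    return payload


def path_allowed(path: str, select_paths: list[str], exclude_paths: list[str]) -> bool:
    """Local mirror of the ASSUMED filter semantics: the path must match at least
    one select pattern (when any are given) and must not match any exclude pattern."""
    if select_paths and not any(re.search(p, path) for p in select_paths):
        return False
    return not any(re.search(p, path) for p in exclude_paths)
```

As an example, `build_crawl_payload("https://docs.example.com", max_depth=2, select_paths=[r"^/docs/"])` describes a two-level crawl limited to paths under `/docs/`, if the assumed semantics hold. Validating the depth locally catches a bad value before any API call is made.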
Crawl Skill
Crawl websites to extract content from multiple pages. Ideal for documentation, knowledge bases, and site-wide content extraction.
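When an output directory is given, each crawled page is saved as its own markdown file. The Python sketch below shows one way to do that step. It assumes each result is a dict with `url` and `raw_content` keys, which is an assumption about the response shape. `url_to_filename` and `save_pages` are hypothetical names, not the skill's actual code.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Hypothetical helper: derive a filesystem-safe .md name from a page URL."""
    parsed = urlparse(url)
    # Join host and path, then collapse any run of unsafe characters into "_".
    stem = f"{parsed.netloc}{parsed.path}".rstrip("/")
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_")
    return f"{stem or 'index'}.md"


def save_pages(results: list[dict], output_dir: str) -> list[Path]:
    """Write one markdown file per result; ASSUMES keys 'url' and 'raw_content'."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in results:
        path = out / url_to_filename(page["url"])
        # Put the source URL at the top so every file is traceable offline.
        body = page.get("raw_content") or ""
        path.write_text(f"# {page['url']}\n\n{body}\n", encoding="utf-8")
        written.append(path)
    return written
```

Query strings and fragments are dropped when files are named. Two URLs that differ only in their query would therefore overwrite each other. A fuller implementation might append a short hash of the full URL to the filename.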
Authentication
The script authenticates with OAuth through the Tavily MCP server. No manual setup is required; on first run, it will:
- Check for existing tokens in ~/.mcp-auth/
- If none are found, automatically open your browser for OAuth authentication
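The first-run check described above comes down to a single decision. Here is a minimal Python sketch of it. It assumes that any file under `~/.mcp-auth/` counts as a cached token, since the actual token layout is not documented here. `ensure_authenticated` and its `start_oauth` callback are hypothetical stand-ins for the script's browser-based flow.

```python
from pathlib import Path
from typing import Callable


def ensure_authenticated(
    start_oauth: Callable[[], None],
    auth_dir: Path = Path.home() / ".mcp-auth",
) -> bool:
    """Hypothetical sketch: return True if cached tokens exist, else start OAuth."""
    # ASSUMPTION: any regular file under auth_dir is treated as a cached token.
    has_tokens = auth_dir.is_dir() and any(p.is_file() for p in auth_dir.rglob("*"))
    if not has_tokens:
        # Hypothetical hook: the real script opens the browser at this point.
        start_oauth()
    return has_tokens
```

In practice `start_oauth` would open the browser, for example by calling `webbrowser.open` on the provider's authorization URL. That URL is not given in this document.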
Note: You must have an existing Tavily account. The OAuth flow only supports login; account creation is not available through this flow. Sign up at tavily.com first if you don't have an account.
Alternative: API Key
If you prefer using an API key, get one at https://tavily.com and add it to ~/.claude/settings.json:
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
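Once the key is configured, a client has to read it and attach it to each request. The Python sketch below assumes the `env` block is exported into the script's environment and that Tavily accepts the key as a Bearer token. `load_api_key` and `auth_headers` are hypothetical helpers. The fallback that parses `settings.json` directly is only an illustration.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_api_key(
    settings_path: Path = Path.home() / ".claude" / "settings.json",
) -> Optional[str]:
    """Hypothetical helper: find TAVILY_API_KEY in the environment or settings file."""
    # ASSUMPTION: the environment wins when both sources define the key.
    key = os.environ.get("TAVILY_API_KEY")
    if key:
        return key
    # Illustrative fallback: read the "env" block of settings.json directly.
    if settings_path.is_file():
        data = json.loads(settings_path.read_text(encoding="utf-8"))
        return data.get("env", {}).get("TAVILY_API_KEY")
    return None


def auth_headers(api_key: str) -> dict:
    """ASSUMPTION: the key is sent as a Bearer token with a JSON body."""
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
```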