tavily-crawl
Multi-page website crawler with semantic filtering and markdown export.
- Crawl entire site sections with depth and breadth control; filter by path regex, domain, or natural language instructions to focus results
- Save each page as a local markdown file via --output-dir, or return structured JSON for agentic processing
- Use semantic instructions with chunk extraction to prevent context bloat when feeding results to LLMs; use full-page extraction for offline documentation downloads
- Supports external link following, image inclusion, timeout configuration, and regex-based path/domain filtering for precise scope control
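The capabilities above can be sketched as a small wrapper. Only --output-dir is documented in this skill's text; the URL and target directory are placeholders, and any other options should be confirmed with the CLI's own help output before use.

```shell
# Sketch: crawl a site section and save each page as markdown.
# --output-dir comes from the skill description; other flags are
# intentionally omitted -- check `tvly crawl --help` for the full list.
crawl_docs() {
  tvly crawl "$1" --output-dir "$2"
}

# Example invocation (placeholder URL and directory):
# crawl_docs https://example.com/docs ./docs-md
```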
tavily crawl
Crawl a website and extract content from multiple pages. Supports saving each page as a local markdown file.
Before running any command
If tvly is not found on PATH, install it first:
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
Do not skip this step or fall back to other tools.
See tavily-cli for alternative install methods and auth options.
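The PATH check and install step can be combined into one guard, using the install command exactly as given above. The function name here is a hypothetical convenience, not part of the CLI.

```shell
# Sketch: install tvly only if it is not already on PATH.
# The curl | bash line and `tvly login` are taken verbatim from
# the instructions above; ensure_tvly is just an illustrative wrapper.
ensure_tvly() {
  if command -v tvly >/dev/null 2>&1; then
    echo "tvly already installed"
    return 0
  fi
  curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
}
```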
When to use
- You need content from many pages on a site (e.g., all of /docs/)