site-crawler
Site Crawler Skill
Respectfully crawl documentation sites and web content for RAG ingestion.
Overview
Documentation sites, blogs, and knowledge bases contain valuable structured content. This skill covers:
- Respectful crawling (robots.txt, rate limiting)
- Structure-preserving extraction
- Incremental updates (only fetch changed pages)
- Sitemap-based discovery
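As a rough illustration of how sitemap discovery, robots.txt checks, and rate limiting can fit together, here is a minimal sketch using only the Python standard library. The user agent string, the `RateLimiter` class, and the `plan_crawl` helper are hypothetical names chosen for this example, not part of the skill itself. Both inputs are passed in as text so the logic can be tried without network access.

```python
import time
import urllib.robotparser
import xml.etree.ElementTree as ET

# Hypothetical user agent; use one that identifies your own crawler.
USER_AGENT = "example-rag-crawler"


def build_robots(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def sitemap_urls(sitemap_xml: str) -> list:
    """Return every <loc> value in a sitemap, ignoring the XML namespace."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


class RateLimiter:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def plan_crawl(robots_txt: str, sitemap_xml: str) -> list:
    """List the sitemap URLs that robots.txt allows this user agent to fetch."""
    robots = build_robots(robots_txt)
    return [url for url in sitemap_urls(sitemap_xml)
            if robots.can_fetch(USER_AGENT, url)]
```

In a real crawl, one would call `limiter.wait()` before each request and could take the interval from `RobotFileParser.crawl_delay(USER_AGENT)` when the site declares one.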
Prerequisites
# HTTP client
pip install httpx