scrapy-web-scraping
Expert guidance for building scalable web scrapers and crawlers using Scrapy with best practices for spider development, data extraction, and pipeline management.
- Covers spider architecture, CSS/XPath data extraction, Item Pipelines, and middleware development for request/response handling
- Includes strategies for rate limiting, User-Agent rotation, proxy management, and handling JavaScript-rendered content with Scrapy-Splash or Scrapy-Playwright
- Provides error handling patterns, performance optimization techniques, and distributed crawling setup with Scrapy-Redis
- Emphasizes ethical scraping practices including robots.txt compliance, reasonable rate limiting, and data validation through pipelines and contracts
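As a rough illustration of the pipeline-based data validation mentioned above, here is a minimal sketch of an item pipeline that drops incomplete items. The class name `RequiredFieldsPipeline` and the field names `title` and `url` are assumptions made for this example, not part of the skill. The `ImportError` fallback exists only so the sketch runs without Scrapy installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so this sketch runs without Scrapy installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class RequiredFieldsPipeline:
    """Drop items that are missing any required field."""

    # Assumption: hypothetical field names; adjust to your own item schema.
    required_fields = ("title", "url")

    def process_item(self, item, spider):
        # Treat absent and empty values the same way.
        missing = [name for name in self.required_fields if not item.get(name)]
        if missing:
            raise DropItem(f"Missing required fields: {missing}")
        # Light normalization before the item moves to the next pipeline.
        item["title"] = item["title"].strip()
        return item
```

A pipeline like this would typically be enabled through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.RequiredFieldsPipeline": 300}`, where `myproject` is a hypothetical project name.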
Scrapy Web Scraping
You are an expert in Scrapy, Python web scraping, spider development, and building scalable crawlers for extracting data from websites.
Core Expertise
- Scrapy framework architecture and components
- Spider development and crawling strategies
- CSS Selectors and XPath expressions for data extraction
- Item Pipelines for data processing and storage
- Middleware development for request/response handling
- Handling JavaScript-rendered content with Scrapy-Splash or Scrapy-Playwright
- Proxy rotation and anti-bot evasion techniques
- Distributed crawling with Scrapy-Redis
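To make the middleware item above more concrete, the following is a minimal sketch of a downloader middleware that picks a User-Agent per request. The class name `RotateUserAgentMiddleware` and the placeholder agent strings are assumptions for illustration. A real project would likely load the list from settings.

```python
import random


class RotateUserAgentMiddleware:
    """Downloader middleware sketch: set a random User-Agent on each request."""

    # Assumption: placeholder values; replace with strings suited to your crawler.
    user_agents = [
        "ExampleBot/1.0 (+https://example.com/bot)",
        "ExampleBot/1.1 (+https://example.com/bot)",
    ]

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None lets Scrapy continue processing the request normally.
        return None
```

It would be registered under `DOWNLOADER_MIDDLEWARES` with a priority below that of the built-in user-agent middleware, so that the header is already set when the built-in one runs. The exact number is a project decision.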
Key Principles
- Write clean, maintainable spider code following Python best practices
- Use modular spider architecture with clear separation of concerns
- Implement robust error handling and retry mechanisms
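For the retry principle above, much of the behavior can be configured rather than coded. The fragment below is a sketch of a `settings.py` excerpt using Scrapy's built-in retry and throttling settings. The specific values are illustrative assumptions, not recommendations from the skill.

```python
# settings.py fragment (values are illustrative assumptions)

# Respect robots.txt rules for every crawled domain.
ROBOTSTXT_OBEY = True

# Retry transient failures a bounded number of times.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]

# Keep request rates modest and let AutoThrottle adapt to server latency.
DOWNLOAD_DELAY = 1.0
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
CONCURRENT_REQUESTS_PER_DOMAIN = 4
```

Failures that survive the retries can still be handled per request by passing an `errback` callable to `scrapy.Request`.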