web-scraping
Web scraping and data extraction using Python tools for static, dynamic, and large-scale content.
- Supports static sites via requests and BeautifulSoup, dynamic content via Selenium and Playwright, and large-scale extraction via Scrapy and firecrawl
- Includes specialized tools for AI-powered extraction (jina), structured queries (agentQL), and complex automation workflows (multion)
- Built-in guidance on rate limiting, robots.txt compliance, error handling, session management, and pagination
- Covers data processing tasks: cleaning, validation, encoding handling, deduplication, and efficient storage
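The politeness guidance above (rate limiting and robots.txt compliance) can be sketched with the standard library alone. This is a minimal illustration, not the skill's prescribed implementation: `is_allowed` and `RateLimiter` are hypothetical helper names, and the one-second default interval is an assumption.

```python
import time
from urllib import robotparser


def is_allowed(robots_url: str, user_agent: str, url: str) -> bool:
    """Check robots.txt before fetching a URL (hypothetical helper).

    Note: rp.read() fetches robots.txt over the network.
    """
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    return rp.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> None:
        """Sleep just long enough to honor the minimum interval."""
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Calling `limiter.wait()` before each request keeps the crawl polite regardless of how fast the parsing loop runs.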
Web Scraping
You are an expert in web scraping and data extraction using Python tools and frameworks.
Core Tools
Static Sites
- Use requests for HTTP requests
- Use BeautifulSoup for HTML parsing
- Use lxml for fast XML/HTML processing
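A typical static-site pipeline combines the three tools above: requests fetches the page, BeautifulSoup parses it, and lxml can serve as the parser backend for speed. The sketch below is illustrative; the function names, the `<h2>` target, and the User-Agent string are assumptions, not part of the skill.

```python
import requests
from bs4 import BeautifulSoup


def fetch_titles(url: str, timeout: float = 10.0) -> list:
    """Fetch a static page and extract its <h2> headings (illustrative)."""
    resp = requests.get(
        url,
        headers={"User-Agent": "example-scraper/0.1"},  # identify your client
        timeout=timeout,
    )
    resp.raise_for_status()  # surface HTTP errors early
    return parse_titles(resp.text)


def parse_titles(html: str) -> list:
    """Parse HTML and return the stripped text of each <h2> element."""
    # "html.parser" is the stdlib backend; pass "lxml" instead for speed
    # if lxml is installed.
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h2")]
```

Keeping the parsing step in its own function makes it unit-testable against saved HTML fixtures, with no network access required.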
Dynamic Content
- Use Selenium for JavaScript-rendered pages
- Use Playwright for modern web automation
- Use Puppeteer (via pyppeteer) for headless browsing
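For JavaScript-rendered pages, Playwright's synchronous API renders the page in a headless browser before extraction. A minimal sketch, assuming Playwright and its browsers are installed; the function names and the whitespace-cleanup helper are illustrative additions, and the import is deferred so the rest of the module works without Playwright present.

```python
def render_page_text(url: str, selector: str = "body") -> str:
    """Render a JavaScript-heavy page and return the text of `selector`."""
    # Deferred import: only needed when actually rendering pages.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait for network activity to settle so scripts finish rendering.
        page.goto(url, wait_until="networkidle")
        text = page.inner_text(selector)
        browser.close()
    return text


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace in scraped text (pure helper)."""
    return " ".join(raw.split())
```

Selenium follows the same shape (launch driver, load page, wait, extract); Playwright's auto-waiting generally makes the waiting logic shorter.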
Large-Scale Extraction
- Use Scrapy for structured crawling
- Use jina for AI-powered extraction
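A Scrapy crawl is driven by a Spider class whose `parse` callback yields items and follows pagination links. The sketch below targets quotes.toscrape.com, a public scraping sandbox; the spider wrapped in `make_spider` is illustrative, and the deferred import keeps the module usable without Scrapy installed. `next_page_url` is a hypothetical helper assuming a `?page=N` query scheme.

```python
def next_page_url(base: str, page: int) -> str:
    """Build a paginated URL (assumes a ?page=N query scheme)."""
    return f"{base}?page={page}"


def make_spider():
    """Define a minimal Scrapy spider; deferred import keeps scrapy optional."""
    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            # Yield one structured item per quote block on the page.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
            # Follow the "next" link until pagination runs out.
            next_href = response.css("li.next a::attr(href)").get()
            if next_href:
                yield response.follow(next_href, callback=self.parse)

    return QuotesSpider
```

Run it with `scrapy runspider spider.py -o quotes.json` (after moving the class to module level); Scrapy handles scheduling, retries, and concurrent requests for you.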