Scrape
Pre-Scrape Compliance Checklist
Before writing any scraping code:
- robots.txt — Fetch {domain}/robots.txt, check if target path is disallowed. If yes, stop.
- Terms of Service — Check /terms, /tos, /legal. Explicit scraping prohibition = need permission.
- Data type — Public factual data (prices, listings) is safer. Personal data triggers GDPR/CCPA.
- Authentication — Data behind login is off-limits without authorization. Never scrape protected content.
- API available? — If site offers an API, use it. Always. Scraping when API exists often violates ToS.
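The robots.txt step in the checklist above can be automated with Python's standard library. The sketch below is one possible approach, not a prescribed implementation: the user agent name `example-bot` and the helper names `robots_url_for`, `is_allowed`, and `check_live` are illustrative assumptions. It covers only the robots.txt item; the ToS, data-type, authentication, and API checks still need a human review.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-bot"  # hypothetical user agent name, replace with your own


def robots_url_for(url: str) -> str:
    """Return the robots.txt URL for the site that serves `url`."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check `url` against robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def check_live(url: str, user_agent: str = USER_AGENT) -> bool:
    """Download the site's robots.txt and check `url` against it.

    Needs network access; connection failures raise an exception.
    """
    parser = RobotFileParser()
    parser.set_url(robots_url_for(url))
    parser.read()
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\n"
    for target in ("https://example.com/products",
                   "https://example.com/private/data"):
        verdict = "allowed" if is_allowed(sample, target) else "disallowed, stop"
        print(target, "->", verdict)
```

Keeping `is_allowed` separate from the network call makes the rule check testable offline. If it returns False for the target path, the checklist says to stop.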
Legal Boundaries
- Public data, no login — Generally not a CFAA violation (hiQ v. LinkedIn, 9th Cir. 2022); breach-of-contract claims can still apply
- Bypassing barriers — CFAA violation risk (Van Buren v. US 2021)
- Ignoring robots.txt — Gray area, often breaches ToS (Meta v. Bright Data 2024)
- Personal data without consent — GDPR/CCPA violation
- Republishing copyrighted content — Copyright infringement