web-scraper
Web Scraper — API-First Approach via HAR Capture
This skill extracts structured data from websites by capturing network traffic rather than parsing HTML. The core idea: most dynamic web pages fetch their data from backend APIs. By recording the browser's network activity, you can discover those APIs and call them directly, producing scraping code that is faster, more reliable, and less likely to break when the page is redesigned.
Always use the agent-browser skill for all browser operations. Do not use curl, fetch, wget, or similar tools to load pages. The browser handles JavaScript rendering, authentication cookies, and dynamic content that simple HTTP clients miss.
Workflow Overview
1. Set up temp workspace
2. Open URL in headed browser + start HAR recording
3. Handle authentication if needed (user logs in manually)
4. Wait for page to fully load, interact if needed to trigger data requests
5. Stop HAR recording → save .har to temp workspace
6. Analyze .har to identify data-serving API endpoints
7. Write Python scraping code that calls those APIs
8. Run the code → save results to current directory
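Steps 6 and 7 can be sketched in Python against the standard HAR JSON layout (a HAR file is JSON with a log.entries array of request/response pairs). This is a minimal illustration, not the skill's actual implementation; the sample URLs, the Bearer token, and the helper names here are hypothetical.

```python
import urllib.request

def find_json_endpoints(har: dict) -> list[dict]:
    """Collect HAR entries whose responses are JSON -- the likely data APIs."""
    endpoints = []
    for entry in har.get("log", {}).get("entries", []):
        mime = entry["response"]["content"].get("mimeType", "")
        if "json" in mime:
            req = entry["request"]
            endpoints.append({
                "method": req["method"],
                "url": req["url"],
                # Auth headers recorded here are what make direct replay work.
                "headers": {h["name"]: h["value"] for h in req.get("headers", [])},
            })
    return endpoints

def build_replay_request(endpoint: dict) -> urllib.request.Request:
    """Turn a discovered endpoint into a request object ready to send."""
    return urllib.request.Request(
        endpoint["url"], headers=endpoint["headers"], method=endpoint["method"]
    )

# Synthetic HAR with one JSON API call and one image fetch (hypothetical URLs).
sample_har = {
    "log": {"entries": [
        {"request": {"method": "GET",
                     "url": "https://example.com/api/items?page=1",
                     "headers": [{"name": "Authorization", "value": "Bearer XYZ"}]},
         "response": {"content": {"mimeType": "application/json"}}},
        {"request": {"method": "GET",
                     "url": "https://example.com/logo.png",
                     "headers": []},
         "response": {"content": {"mimeType": "image/png"}}},
    ]}
}

hits = find_json_endpoints(sample_har)
print([e["url"] for e in hits])  # only the JSON endpoint survives the filter
```

Once an endpoint is confirmed, `urllib.request.urlopen(build_replay_request(...))` fetches the data without loading the page at all. Note that real HAR entries may also carry a separate `cookies` array and POST bodies under `request.postData`; a production scraper should replay those too.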