web-scraper
Web Scraper — API-First Approach via HAR Capture
This skill extracts structured data from websites by capturing network traffic rather than parsing HTML. The core idea: most dynamic web pages fetch their data from backend APIs. By recording the browser's network activity, you can discover those APIs and call them directly, producing scraping code that is faster, more reliable, and less likely to break when the page is redesigned.
Always use the agent-browser skill for all browser operations. Do not use curl, fetch, wget, or similar tools to load pages. The browser handles JavaScript rendering, authentication cookies, and dynamic content that simple HTTP clients miss.
Workflow Overview
1. Set up temp workspace
2. Open URL in headed browser + start HAR recording
3. Handle authentication if needed (user logs in manually)
4. Wait for page to fully load, interact if needed to trigger data requests
5. Stop HAR recording → save .har to temp workspace
6. Analyze .har to identify data-serving API endpoints
7. Write Python scraping code that calls those APIs
8. Run the code → save results to current directory
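Steps 6 and 7 can be sketched in Python against the standard HAR JSON layout (a HAR file is JSON with a log.entries array of request/response pairs). This is a minimal illustration, not the skill's actual implementation; the sample URLs, the Bearer token, and the helper names here are hypothetical.

```python
import urllib.request

def find_json_endpoints(har: dict) -> list[dict]:
    """Collect HAR entries whose responses are JSON -- the likely data APIs."""
    endpoints = []
    for entry in har.get("log", {}).get("entries", []):
        mime = entry["response"]["content"].get("mimeType", "")
        if "json" in mime:
            req = entry["request"]
            endpoints.append({
                "method": req["method"],
                "url": req["url"],
                # Auth headers recorded here are what make direct replay work.
                "headers": {h["name"]: h["value"] for h in req.get("headers", [])},
            })
    return endpoints

def build_replay_request(endpoint: dict) -> urllib.request.Request:
    """Turn a discovered endpoint into a request object ready to send."""
    return urllib.request.Request(
        endpoint["url"], headers=endpoint["headers"], method=endpoint["method"]
    )

# Synthetic HAR with one JSON API call and one image fetch (hypothetical URLs).
sample_har = {
    "log": {"entries": [
        {"request": {"method": "GET",
                     "url": "https://example.com/api/items?page=1",
                     "headers": [{"name": "Authorization", "value": "Bearer XYZ"}]},
         "response": {"content": {"mimeType": "application/json"}}},
        {"request": {"method": "GET",
                     "url": "https://example.com/logo.png",
                     "headers": []},
         "response": {"content": {"mimeType": "image/png"}}},
    ]}
}

hits = find_json_endpoints(sample_har)
print([e["url"] for e in hits])  # only the JSON endpoint survives the filter
```

Once an endpoint is confirmed, `urllib.request.urlopen(build_replay_request(...))` fetches the data without loading the page at all. Note that real HAR entries may also carry a separate `cookies` array and POST bodies under `request.postData`; a production scraper should replay those too.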