playwright-scraper
Web scraping for dynamic content, authentication, pagination, and data extraction using Playwright.
- Handles JavaScript-rendered sites, login flows, and multi-page navigation with built-in wait strategies and selector management
- Supports headless and visible browser modes, with async patterns for reliable automation around flaky elements
- Extracts data via selectors with JSON output, captures screenshots and PDFs, and manages cookies and sessions per context
- Configure via JSON files or environment variables; integrates with Node.js 14+ and supports proxy settings for network flexibility
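The JSON-file and environment-variable configuration could be merged into launch options roughly as below. This is a minimal sketch: the config keys (`headless`, `proxy`, `startUrl`, `maxPages`) and the `SCRAPER_*` variable names are hypothetical, not part of the skill. Only the `headless` and `proxy` fields follow the shape of Playwright's `chromium.launch()` options. Environment variables take precedence over the file, so a single run can be overridden without editing the file.

```typescript
// Hypothetical config shape; `headless` and `proxy` mirror Playwright's launch options.
interface ScraperConfig {
  headless: boolean;
  proxy?: { server: string };
  startUrl?: string;
  maxPages: number;
}

const DEFAULTS: ScraperConfig = { headless: true, maxPages: 10 };

// Accept "true"/"1" and "false"/"0"; anything else keeps the fallback value.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const v = value.trim().toLowerCase();
  if (v === "true" || v === "1") return true;
  if (v === "false" || v === "0") return false;
  return fallback;
}

// Precedence: defaults < JSON file contents < environment (hypothetical SCRAPER_* names).
function loadConfig(
  jsonText: string | undefined,
  env: Record<string, string | undefined>,
): ScraperConfig {
  const fromFile: Partial<ScraperConfig> = jsonText ? JSON.parse(jsonText) : {};
  const config: ScraperConfig = { ...DEFAULTS, ...fromFile };

  config.headless = parseBool(env.SCRAPER_HEADLESS, config.headless);
  if (env.SCRAPER_PROXY) config.proxy = { server: env.SCRAPER_PROXY };
  if (env.SCRAPER_START_URL) config.startUrl = env.SCRAPER_START_URL;
  if (env.SCRAPER_MAX_PAGES) {
    const n = Number.parseInt(env.SCRAPER_MAX_PAGES, 10);
    // Ignore non-numeric or non-positive values and keep the previous setting.
    if (Number.isInteger(n) && n > 0) config.maxPages = n;
  }
  return config;
}
```

In a Node.js script this might be called as `loadConfig(fs.readFileSync("scraper.json", "utf8"), process.env)`, with the result's `headless` and `proxy` passed to `chromium.launch()`.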
Purpose
This skill enables web scraping using Playwright, a Node.js library for browser automation. It focuses on handling dynamic content, authentication flows, pagination, data extraction, and screenshots to reliably scrape modern websites.
When to Use
Use this skill for scraping sites with JavaScript-rendered content (e.g., React or Angular apps), sites requiring login (e.g., dashboards), multi-page results (e.g., search results), or visual captures (e.g., screenshots for verification). Avoid it for static HTML sites, where a plain HTTP client (such as Python's requests) suffices.
Key Capabilities
- Dynamically load and interact with content using Playwright's browser control.
- Manage authentication flows, such as logging in via forms or API tokens.
- Handle pagination by navigating pages, clicking "next" buttons, or parsing URLs.
- Extract data using selectors, with options for JSON output or saving to files.
- Capture screenshots or full-page PDFs for debugging or reporting.
- Run in headless or visible (headed) browser mode.
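Selector-based extraction with JSON output might look roughly like the sketch below. It is written against a small structural interface (`PageLike` and `LocatorLike` are hypothetical names) that lists only the Playwright `Page`/`Locator` methods it calls (`locator`, `count`, `nth`, `first`, `textContent`), so a real Playwright `Page` can be passed in while the logic can also be exercised with a fake. The `fields` map of output keys to CSS selectors is an assumption, not the skill's actual input format.

```typescript
// The subset of Playwright's Locator/Page API this sketch relies on.
interface LocatorLike {
  count(): Promise<number>;
  nth(index: number): LocatorLike;
  first(): LocatorLike;
  locator(selector: string): LocatorLike;
  textContent(): Promise<string | null>;
}
interface PageLike {
  locator(selector: string): LocatorLike;
}

type Row = Record<string, string | null>;

// For each element matching `itemSelector`, read the trimmed text of every
// field selector (resolved relative to that item); missing fields become null.
async function extractRows(
  page: PageLike,
  itemSelector: string,
  fields: Record<string, string>,
): Promise<Row[]> {
  const items = page.locator(itemSelector);
  const total = await items.count();
  const rows: Row[] = [];
  for (let i = 0; i < total; i++) {
    const item = items.nth(i);
    const row: Row = {};
    for (const [key, selector] of Object.entries(fields)) {
      const match = item.locator(selector);
      // Check count first: textContent() on a missing element waits until timeout.
      row[key] = (await match.count()) > 0
        ? ((await match.first().textContent()) ?? "").trim()
        : null;
    }
    rows.push(row);
  }
  return rows;
}
```

With real Playwright, `page` would come from `context.newPage()`. Because `count()` does not wait, a JavaScript-rendered list may need something like `await page.locator(".item").first().waitFor()` before extraction; the rows can then be written out with `JSON.stringify(rows, null, 2)`.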
Usage Patterns
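Launch the browser, create a browser context, and then open pages from that context for navigation. Use async/await throughout, since Playwright's browser operations return promises. For authenticated scraping, keep cookies and sessions scoped to a context; contexts are isolated from one another, so separate logins do not interfere. Structure scripts to loop through pages for pagination, and wrap interactions with flaky elements in try/catch. Pass configuration via JSON files or environment variables so scripts can be reused.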
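The pagination loop with try/catch around flaky elements can be sketched as follows: scrape the current page, then click a "next" control until it is missing or disabled or a page limit is reached, retrying failed clicks. This is a hedged sketch, not the skill's implementation; `PagerPage`, `withRetry`, and `scrapeAllPages` are hypothetical names, and the interface lists only the Playwright methods used (`locator`, `count`, `isEnabled`, `click`, `waitForLoadState`).

```typescript
// The subset of Playwright's Page/Locator API this sketch relies on.
interface PagerLocator {
  count(): Promise<number>;
  isEnabled(): Promise<boolean>;
  click(): Promise<void>;
}
interface PagerPage {
  locator(selector: string): PagerLocator;
  waitForLoadState(): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `action`, retrying up to `attempts` times with a fixed delay between tries.
async function withRetry<T>(action: () => Promise<T>, attempts = 3, delayMs = 500): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(delayMs);
    }
  }
  throw lastError;
}

// Collect results from up to `maxPages` pages by repeatedly clicking `nextSelector`.
async function scrapeAllPages<T>(
  page: PagerPage,
  scrapeOne: (page: PagerPage) => Promise<T[]>,
  nextSelector: string,
  maxPages: number,
): Promise<T[]> {
  const results: T[] = [];
  for (let pageNo = 1; pageNo <= maxPages; pageNo++) {
    results.push(...(await scrapeOne(page)));
    if (pageNo === maxPages) break;
    const next = page.locator(nextSelector);
    // Stop when there is no "next" control or it is disabled (last page).
    if ((await next.count()) === 0 || !(await next.isEnabled())) break;
    await withRetry(() => next.click());
    // Waits for the load event after a full navigation; single-page apps that
    // update in place may need to wait for specific new content instead.
    await page.waitForLoadState();
  }
  return results;
}
```

A real Playwright `Page` fits this interface. `nextSelector` should match exactly one element, because Playwright locators are strict and `isEnabled()` throws when several elements match. Playwright's `click()` already waits for the element to be actionable, so the retry mainly covers cases such as the control being re-rendered mid-click.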