playwright-scraper

Purpose

This skill enables web scraping with Playwright, a browser automation library used here through its Node.js API. It covers dynamic content, authentication flows, pagination, data extraction, and screenshots, so that modern websites can be scraped reliably.

When to Use

Use this skill to scrape sites with JavaScript-rendered content (e.g., React or Angular apps), sites that require login (e.g., dashboards), multi-page results (e.g., search results), or pages where visual capture is needed (e.g., screenshots for verification). Avoid it for static HTML sites, where a plain HTTP client such as Python's requests suffices.

Key Capabilities

Load dynamically rendered content and interact with it through Playwright's browser control.
Manage authentication flows, such as logging in via forms or API tokens.
Handle pagination by navigating pages, clicking "next" buttons, or parsing URLs.
Extract data using selectors, and emit results as JSON or save them to files.
Capture screenshots (viewport or full-page) or PDFs (PDF generation is Chromium-only) for debugging or reporting.
Run in headless or visible (headed) browser mode for flexibility.
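
The pagination and extraction capabilities above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: PageLike and LocatorLike are hypothetical local types that mirror only the Playwright calls used here (goto, locator, allTextContents, count, click, waitForLoadState), and scrapeAllPages, itemSelector, and nextSelector are names invented for the example. It assumes the site exposes a "next" control that disappears on the last page.

```typescript
// Hypothetical structural types covering only the Playwright methods used below.
// They are local stand-ins so the sketch compiles without Playwright installed.
interface LocatorLike {
  allTextContents(): Promise<string[]>;
  count(): Promise<number>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): LocatorLike;
  waitForLoadState(): Promise<void>;
}

// Collect the trimmed text of every element matching itemSelector,
// following the "next" control until it is absent or maxPages is reached.
async function scrapeAllPages(
  page: PageLike,
  startUrl: string,
  itemSelector: string,
  nextSelector: string,
  maxPages = 10,
): Promise<string[]> {
  const items: string[] = [];
  await page.goto(startUrl);

  for (let pageIndex = 0; pageIndex < maxPages; pageIndex++) {
    const texts = await page.locator(itemSelector).allTextContents();
    items.push(...texts.map((text) => text.trim()));

    const next = page.locator(nextSelector);
    if ((await next.count()) === 0) {
      break; // no "next" control: assume this was the last page
    }
    await next.click();
    // Assumption: the click triggers a navigation. Single-page apps may
    // instead need a wait on a specific element that signals new results.
    await page.waitForLoadState();
  }
  return items;
}
```

With Playwright installed, a page obtained from a browser context could be passed as the first argument, and the returned array serialized with JSON.stringify for JSON output. The maxPages cap is a design choice to keep a broken "next" selector from looping indefinitely.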

Usage Patterns

Launch a browser first, create a browser context, then open pages in that context for navigation. Playwright's Node.js API is promise-based, so await every call. For authenticated scraping, keep cookies and session state per context. Structure scripts to loop through pages for pagination, and wrap interactions with flaky elements in try-catch. Pass configuration via JSON files or environment variables so scripts stay reusable.
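
One way these patterns might fit together is sketched below, under stated assumptions. The environment variable names (SCRAPER_START_URL, SCRAPER_HEADLESS, SCRAPER_MAX_PAGES), the ScraperConfig shape, and the helpers loadConfig, withRetry, and withPage are all hypothetical names invented for illustration. BrowserLike and ContextLike are local stand-ins that mirror Playwright's newContext, newPage, and close calls, so the sketch runs without Playwright installed.

```typescript
// Hypothetical configuration shape; the field names are illustrative only.
interface ScraperConfig {
  startUrl: string;
  headless: boolean;
  maxPages: number;
}

// Build a config from environment-style key/value pairs (e.g. process.env).
function loadConfig(env: Record<string, string | undefined>): ScraperConfig {
  const startUrl = env["SCRAPER_START_URL"];
  if (!startUrl) {
    throw new Error("SCRAPER_START_URL is required");
  }
  const maxPages = Number(env["SCRAPER_MAX_PAGES"] ?? "10");
  if (!Number.isInteger(maxPages) || maxPages < 1) {
    throw new Error("SCRAPER_MAX_PAGES must be a positive integer");
  }
  // Headless unless explicitly disabled with SCRAPER_HEADLESS=false.
  return { startUrl, headless: env["SCRAPER_HEADLESS"] !== "false", maxPages };
}

// Retry an async action a fixed number of times (attempts >= 1),
// pausing between tries; rethrows the last error if all attempts fail.
async function withRetry<T>(
  action: () => Promise<T>,
  attempts = 3,
  delayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Local stand-ins for the browser and context methods used below.
interface ContextLike<P> {
  newPage(): Promise<P>;
  close(): Promise<void>;
}
interface BrowserLike<P> {
  newContext(options?: { storageState?: string }): Promise<ContextLike<P>>;
}

// Open a fresh context (optionally seeded with saved session state),
// run the task on a new page, and always close the context afterwards.
async function withPage<P, T>(
  browser: BrowserLike<P>,
  storageStatePath: string | undefined,
  task: (page: P) => Promise<T>,
): Promise<T> {
  const context = await browser.newContext(
    storageStatePath ? { storageState: storageStatePath } : {},
  );
  try {
    const page = await context.newPage();
    return await task(page);
  } finally {
    await context.close();
  }
}
```

In a real script, loadConfig would likely receive process.env, the headless flag would be passed to the browser launch call, and withRetry would wrap individual flaky steps such as a click on a late-rendering button. Closing the context in a finally block keeps cookies and sessions from leaking between runs.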