scrape-webpage
Scrape Webpage
Extract content, metadata, and images from a webpage for import/migration.
When to Use This Skill
Use this skill when:
- Starting a page import and need to extract content from source URL
- Need webpage analysis with local image downloads
- Want metadata extraction (Open Graph, JSON-LD, etc.)
Invoked by: page-import skill (Step 1)
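As a rough illustration of the metadata extraction mentioned above, the sketch below collects Open Graph tags and JSON-LD blocks from a DOM document. It is not the skill's actual script: the function name `collectMetadata` and the shape of the returned object are assumptions made for this example.

```javascript
// Hypothetical helper (not the skill's real implementation): gathers
// Open Graph <meta> tags and JSON-LD <script> blocks from a document.
// When called with no argument inside a browser page, it falls back to
// the global `document`.
function collectMetadata(doc = document) {
  const openGraph = {};
  for (const meta of doc.querySelectorAll('meta[property^="og:"]')) {
    openGraph[meta.getAttribute('property')] = meta.getAttribute('content');
  }

  const jsonLd = [];
  for (const script of doc.querySelectorAll('script[type="application/ld+json"]')) {
    try {
      jsonLd.push(JSON.parse(script.textContent));
    } catch {
      // Skip blocks that are not valid JSON rather than failing the scrape.
    }
  }

  return { openGraph, jsonLd };
}

// Assumed usage with Playwright, inside an async function
// (needs the prerequisites listed below):
//   const { chromium } = require('playwright');
//   const browser = await chromium.launch();
//   const page = await browser.newPage();
//   await page.goto('https://example.com');
//   const metadata = await page.evaluate(collectMetadata);
//   await browser.close();
```

Keeping the extraction in a single function that takes the document as a parameter makes it possible to exercise the logic against a stub object without launching a browser.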
Prerequisites
Before using this skill, ensure:
- ✅ Node.js is available
- ✅ npm playwright is installed (npm install playwright)
- ✅ Chromium browser is installed (npx playwright install chromium)
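The checklist above could be verified programmatically before a scrape starts. The following is a minimal sketch, assuming it runs from the project directory where playwright was installed; `checkPrerequisites` is a hypothetical name and is not part of the skill.

```javascript
// Hypothetical pre-flight check for the prerequisites listed above.
const fs = require('node:fs');

function checkPrerequisites() {
  // Running this at all means Node.js is available; record its version.
  const result = { node: process.version, playwright: false, chromium: false };

  let playwright;
  try {
    playwright = require('playwright');
    result.playwright = true;
  } catch {
    // Package not resolvable: `npm install playwright` has not been run here.
    return result;
  }

  try {
    // executablePath() returns where Playwright expects its bundled Chromium;
    // the file only exists after `npx playwright install chromium`.
    result.chromium = fs.existsSync(playwright.chromium.executablePath());
  } catch {
    result.chromium = false;
  }

  return result;
}
```

Calling `checkPrerequisites()` returns an object with the Node.js version and two booleans, so a caller can report which install command still needs to be run.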