Web Crawling Guide
Overview
This guide covers essential web crawling and scraping operations using Python libraries. For simple fetching, use httpx. For HTML parsing and data extraction, use beautifulsoup4. For JavaScript-rendered pages, consider playwright.
Quick Start
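import httpx
from bs4 import BeautifulSoup
# Fetch the page; httpx does not follow redirects unless asked to
response = httpx.get("https://example.com", follow_redirects=True, timeout=10.0)
response.raise_for_status()  # raise httpx.HTTPStatusError on 4xx/5xx responses
# Parse the HTML with the standard-library parser
soup = BeautifulSoup(response.text, "html.parser")
# Extract the title; .string is None when <title> is missing, empty, or has nested tags
title = soup.title.string if soup.title and soup.title.string else "No title"
print(f"Title: {title}")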
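
Crawling usually means following links once a page is parsed. As a minimal sketch of that data-extraction step, the snippet below collects absolute links from an HTML string. The helper name `extract_links` and the inline `sample_html` are illustrative assumptions, not part of this guide's API; in practice the HTML would come from `response.text` as in the Quick Start.

```python
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute http(s) links from html, in document order, deduplicated.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    seen: set[str] = set()
    links: list[str] = []
    # href=True skips <a> tags that have no href attribute
    for anchor in soup.find_all("a", href=True):
        # Resolve relative URLs against the page URL, then drop any #fragment
        absolute = urldefrag(urljoin(base_url, anchor["href"])).url
        # Ignore non-web schemes such as mailto: or javascript:
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


# Inline sample so the sketch runs without network access (assumed markup)
sample_html = """
<html><head><title>Sample</title></head>
<body>
  <a href="/about">About</a>
  <a href="https://example.com/about#team">Team</a>
  <a href="mailto:hi@example.com">Mail</a>
  <a href="docs/intro.html">Docs</a>
</body></html>
"""

links = extract_links(sample_html, "https://example.com/")
print(links)
```

Deduplicating after removing fragments keeps a crawler from fetching the same page twice when it is linked under several anchors.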