web-archive-scraper

Pass

Audited by Gen Agent Trust Hub on Mar 31, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill processes untrusted data from the Internet Archive which presents an indirect prompt injection surface.
  • Ingestion points: The fetch_archived_content function in scripts/search_archive.py retrieves raw HTML from external URLs via the Internet Archive.
  • Boundary markers: The skill does not implement explicit delimiters or instructions to the agent to ignore embedded instructions when presenting the scraped content.
  • Capability inventory: The script uses requests.get to fetch remote content and re for text processing.
  • Sanitization: The extract_text function in scripts/search_archive.py uses regular expressions to strip script, style, and HTML tags, providing basic text extraction.
  • [EXTERNAL_DOWNLOADS]: The skill requires the requests Python library for network communication.
  • [DATA_EXFILTRATION]: Performs network requests to web.archive.org, a well-known service, to retrieve archived website metadata and content.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 31, 2026, 09:58 PM