content-hash-cache-pattern

Summary

Cache expensive file processing results using SHA-256 content hashes instead of file paths.

  • Content-hash keys survive file moves and auto-invalidate when content changes, eliminating path-based cache brittleness
  • Store cache entries as individual {hash}.json files for O(1) lookup without requiring a separate index
  • Implement caching as a service layer wrapper around pure processing functions, keeping extraction logic separate from cache concerns
  • Handle cache corruption gracefully by treating invalid entries as misses and re-processing on the next run
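The storage and corruption-handling points above can be sketched as follows. This is a minimal illustration, not the skill's own code: the helper names read_entry and write_entry are hypothetical, and it assumes each cached result is a JSON object.

```python
import json
from pathlib import Path
from typing import Optional


def read_entry(cache_dir: Path, file_hash: str) -> Optional[dict]:
    """Return the cached result for file_hash, or None on a miss (hypothetical helper)."""
    # The hash alone determines the file name, so no separate index is needed.
    entry_path = cache_dir / f"{file_hash}.json"
    try:
        data = json.loads(entry_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed entries are all treated as misses;
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    # Assumption: valid entries are JSON objects; anything else is a miss.
    return data if isinstance(data, dict) else None


def write_entry(cache_dir: Path, file_hash: str, result: dict) -> None:
    """Store result as {hash}.json, creating the cache directory if needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry_path = cache_dir / f"{file_hash}.json"
    entry_path.write_text(json.dumps(result), encoding="utf-8")
```

Because a corrupt entry reads as a miss, the caller simply re-processes the file and overwrites the bad entry on the next write.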
SKILL.md

Content-Hash File Cache Pattern

Cache expensive file processing results (PDF parsing, text extraction, image analysis) using SHA-256 content hashes as cache keys. Unlike path-based caching, this approach survives file moves/renames and auto-invalidates when content changes.

When to Activate

  • Building file processing pipelines (PDF, images, text extraction)
  • Processing cost is high and the same files are processed repeatedly
  • Need a --cache/--no-cache CLI option
  • Want to add caching to existing pure functions without modifying them
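As a rough sketch of the last two points, a cache wrapper can sit around an unmodified pure function and be switched on or off from the command line. Everything named here (extract_text, process_with_cache, build_parser, main, the --cache-dir option and its default) is an assumption made for illustration. The --cache/--no-cache pair relies on argparse.BooleanOptionalAction, which requires Python 3.9 or later.

```python
import argparse
import hashlib
import json
from pathlib import Path
from typing import Callable


def extract_text(path: Path) -> dict:
    """Stand-in for an expensive pure processing function (hypothetical)."""
    return {"chars": len(path.read_text(encoding="utf-8"))}


def process_with_cache(
    path: Path,
    process: Callable[[Path], dict],
    cache_dir: Path,
    use_cache: bool = True,
) -> dict:
    """Call process(path), reusing a cached result when one exists."""
    if not use_cache:
        return process(path)  # bypass both lookup and write
    key = hashlib.sha256(path.read_bytes()).hexdigest()
    entry = cache_dir / f"{key}.json"
    if entry.is_file():
        try:
            return json.loads(entry.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            pass  # unreadable entry: fall through and recompute
    result = process(path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Process a file with optional caching.")
    parser.add_argument("file", type=Path)
    # BooleanOptionalAction generates both --cache and --no-cache.
    parser.add_argument("--cache", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--cache-dir", type=Path, default=Path(".cache"))
    return parser


def main(argv: list) -> dict:
    args = build_parser().parse_args(argv)
    return process_with_cache(args.file, extract_text, args.cache_dir, args.cache)
```

The processing function never learns that a cache exists, so it can still be tested and reused on its own. Calling main(["report.txt", "--no-cache"]) would always recompute the result.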

Core Pattern

1. Content-Hash Based Cache Key

Use a hash of the file's content (not its path) as the cache key:

import hashlib
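The listing above stops after its import. One plausible way to compute such a key is sketched below. The function name compute_file_hash and the 64 KiB chunk size are assumptions for illustration, not taken from the original skill.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # assumed read size; any reasonable value works


def compute_file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Since only the bytes feed the digest, renaming or moving a file leaves its key unchanged, while editing even one byte produces a different key and therefore a cache miss.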
First Seen: Feb 17, 2026