transformer-lens-interpretability


TransformerLens: Mechanistic Interpretability for Transformers

TransformerLens is the de facto standard library for mechanistic interpretability research on GPT-style language models. Created by Neel Nanda and maintained by Bryce Meyer, it places a HookPoint on every activation, which gives a uniform interface for inspecting and modifying model internals.
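
To make the HookPoint idea concrete, here is a minimal pure-Python sketch of the mechanism. It is a toy illustration, not TransformerLens code: `ToyHookPoint`, `ToyModel`, `save_hook`, and `zero_first_hook` are hypothetical names, and plain lists stand in for tensors. The convention assumed here is that a hook returning `None` leaves the activation unchanged, while a hook returning a value replaces it. In the library itself, the corresponding entry points are `HookedTransformer.run_with_cache` and `HookedTransformer.run_with_hooks`.

```python
from typing import Callable, Dict, List, Optional

Activation = List[float]
Hook = Callable[[Activation, str], Optional[Activation]]


class ToyHookPoint:
    """Identity wrapper around an activation; hooks may read or replace it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hooks: List[Hook] = []

    def __call__(self, activation: Activation) -> Activation:
        for hook in self.hooks:
            result = hook(activation, self.name)
            if result is not None:  # a returned value overwrites the activation
                activation = result
        return activation


class ToyModel:
    """Hypothetical two-step 'network': double every value, then sum."""

    def __init__(self) -> None:
        self.hook_hidden = ToyHookPoint("hook_hidden")

    def forward(self, x: Activation) -> float:
        hidden = self.hook_hidden([2.0 * v for v in x])
        return sum(hidden)


model = ToyModel()
cache: Dict[str, Activation] = {}


def save_hook(activation: Activation, name: str) -> None:
    """Read-only hook: copy the activation into the cache."""
    cache[name] = list(activation)


def zero_first_hook(activation: Activation, name: str) -> Activation:
    """Intervening hook: zero-ablate the first element."""
    return [0.0] + activation[1:]


# Pass 1: observe the hidden activation without changing the output.
model.hook_hidden.hooks = [save_hook]
clean_out = model.forward([1.0, 2.0, 3.0])

# Pass 2: intervene on the hidden activation and see the output change.
model.hook_hidden.hooks = [zero_first_hook]
ablated_out = model.forward([1.0, 2.0, 3.0])

print(clean_out, ablated_out, cache)
```

The same two patterns, read-only caching and in-place intervention, underlie most of the workflows listed in the next section.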

GitHub: TransformerLensOrg/TransformerLens (2,900+ stars)

When to Use TransformerLens

Use TransformerLens when you need to:

  • Reverse-engineer algorithms learned during training
  • Perform activation patching / causal tracing experiments
  • Study attention patterns and information flow
  • Analyze circuits (e.g., induction heads, IOI circuit)
  • Cache and inspect intermediate activations
  • Apply direct logit attribution
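
Activation patching, mentioned in the list above, can be sketched on a toy model without any library. The idea is to run the model on a clean input and a corrupted input, then copy one clean activation at a time into the corrupted run and measure how much of the clean output is restored. Everything in this sketch is hypothetical (`toy_forward`, `comp_a`, `comp_b`, and the inputs), and a real experiment would patch tensors inside a transformer rather than scalars.

```python
from typing import Dict, Optional, Tuple


def toy_forward(
    x: Tuple[float, float],
    patch: Optional[Dict[str, float]] = None,
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical two-component model.

    Each component reads one input and its output is summed into the result.
    `patch` maps a component name to a value that overrides its activation.
    """
    patch = patch or {}
    acts: Dict[str, float] = {}
    acts["comp_a"] = patch.get("comp_a", x[0] * 3.0)  # depends only on x[0]
    acts["comp_b"] = patch.get("comp_b", x[1] * 0.5)  # depends only on x[1]
    return acts["comp_a"] + acts["comp_b"], acts


clean_input = (1.0, 4.0)
corrupted_input = (0.0, 4.0)  # differs from the clean input only in x[0]

clean_out, clean_acts = toy_forward(clean_input)
corrupted_out, _ = toy_forward(corrupted_input)

# Patch each clean activation into the corrupted run, one at a time.
# The score is the fraction of the clean-vs-corrupted gap that is restored.
restored: Dict[str, float] = {}
for name, clean_value in clean_acts.items():
    patched_out, _ = toy_forward(corrupted_input, patch={name: clean_value})
    restored[name] = (patched_out - corrupted_out) / (clean_out - corrupted_out)

print(restored)
```

Only `comp_a` restores the clean output, which matches the setup: the corruption changed `x[0]`, and `comp_a` is the only component that reads it. In a real model the same normalized score is usually computed per layer, head, or token position.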

Consider alternatives when:

  • You need to work with non-transformer architectures → Use nnsight or pyvene
  • You want to train/analyze Sparse Autoencoders → Use SAELens

First seen: Mar 28, 2026