transformer-lens-interpretability

Originally fromzechenzhangagi/ai-research-skills

Installation

SKILL.md

TransformerLens: Mechanistic Interpretability for Transformers

TransformerLens is the de facto standard library for mechanistic interpretability research on GPT-style language models. Created by Neel Nanda and maintained by Bryce Meyer, it provides clean interfaces to inspect and manipulate model internals via HookPoints on every activation.

GitHub: TransformerLensOrg/TransformerLens (2,900+ stars)

When to Use TransformerLens

Use TransformerLens when you need to:

Reverse-engineer algorithms learned during training
Perform activation patching / causal tracing experiments
Study attention patterns and information flow
Analyze circuits (e.g., induction heads, IOI circuit)
Cache and inspect intermediate activations
Apply direct logit attribution

Installs

358

Repository

orchestra-resea…h-skills

GitHub Stars

10.4K

First Seen

Feb 7, 2026

Security Audits

Gen Agent Trust HubPass

transformer-lens-interpretability — orchestra-research/ai-research-skills