kaiming-he
Thinking like Kaiming He
Kaiming He is a computer vision researcher, MIT professor, and lead author of the ResNet architecture. His signature thinking style is finding simple, elegant formulations for highly complex problems, most notably by reframing how neural networks learn (residuals) and how they are initialized. More recently, his thinking has expanded to treat generative models as universal solvers and AI as a common language that bridges disparate scientific disciplines.
Reach for this skill whenever you're designing deep learning architectures, debugging vanishing/exploding gradients, formulating new generative AI tasks, or trying to apply machine learning to other scientific domains like biology or physics.
Core principles
- Residual Learning: Network layers should learn residual functions (deltas) referenced to their inputs rather than unreferenced functions learned from scratch, making very deep networks vastly easier to optimize (see the sketch after this list).
- Activation-Aware Initialization: Weight initialization must explicitly account for the specific activation function (e.g., ReLU) to maintain constant signal variance across layers and prevent degradation (applied in the sketch below).
- Generative Models as Universal Solvers: Almost any real-world problem can be formulated as a generative one by framing it as learning a conditional distribution that maps inputs to outputs (formalized below).
- Simplicity in Complexity: Complex visual perception problems should be solved using straightforward, intuitive methods rather than convoluted pipelines.
- AI as a Common Language: Treat AI not as an isolated discipline, but as a universal translator that breaks down walls between scientific fields.
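The first two principles are concrete enough to sketch in code. Below is a minimal residual block, written here in PyTorch as an assumption (the framework, layer sizes, and helper name are illustrative choices, not taken from He's own code): the skip connection means the convolutions only learn the residual F(x) = H(x) - x, and the weights use He initialization so the signal keeps constant variance through the ReLUs.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = relu(F(x) + x).

    The stacked convolutions only need to learn the residual
    F(x) = H(x) - x; an identity mapping is recovered simply by
    driving the weights toward zero, which is what makes very
    deep stacks of these blocks easy to optimize.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        # He (Kaiming) initialization: weight variance 2 / fan_in,
        # compensating for ReLU zeroing half the activations on average.
        for conv in (self.conv1, self.conv2):
            nn.init.kaiming_normal_(conv.weight, mode="fan_in", nonlinearity="relu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the skip connection adds the input back
```

A quick smoke test: `ResidualBlock(64)(torch.randn(1, 64, 32, 32))` returns a tensor of the same shape as its input, which is exactly what lets these blocks be stacked to arbitrary depth.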
For detailed rationale and quotes, see references/principles.md.
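The "universal solver" principle also has a compact form: pick an observable input x and a desired output y, then learn to sample from the conditional distribution. A sketch of that framing, where the specific task examples are illustrative assumptions rather than quotes from He:

```latex
% A problem becomes a generative-modeling problem once it is
% written as sampling from a learned conditional distribution:
%
%   classification:  y = label,        x = image
%   denoising:       y = clean image,  x = noisy image
%   protein design:  y = sequence,     x = target structure
\[
  y \sim p_{\theta}(y \mid x)
\]
```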
How Kaiming He reasons
He reasons by looking for the fundamental symmetries and mathematical structure beneath complex systems. He views AI progress through an Abstraction Stack, in which yesterday's final product (deep neural networks) becomes today's primitive building block (for generative models). He also compares current paradigms to historical eras: for instance, he sees today's step-by-step generative training as analogous to the pre-AlexNet era of layer-wise training, and advocates instead for true end-to-end optimization.