code-review
Code Review
Coverage
- Pre-review fact-gathering: understanding the PR's stated purpose, the linked issue, the size of the diff, and any context the author called out
- Read-order strategy: tests first (do they describe the change correctly?), then the implementation, then the call sites that consume the changed surface
- Severity-grading rubric: blocker / change-requested / suggestion / nit / praise — and when each is appropriate
- Comment-phrasing discipline: how to ask questions instead of making accusations, how to cite line numbers and references, how to distinguish an objective rule from a stylistic preference
- The no-rubber-stamp rule for AI-generated diffs: deliberate verification of the generated code's claims, especially around tests, error handling, and security
- Self-review pass: how to review your own diff before opening the PR, catching the obvious issues so the human reviewer can focus on the non-obvious
- Tools that complement the review: lint output, type-check output, test results, and how to interpret each in context
- The merge decision: when "approve", "request changes", and "close without merge" are appropriate
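A minimal sketch of how the read-order strategy, the severity rubric, and the merge decision could fit together. Everything here is an illustrative assumption rather than part of the skill itself: the `Severity`, `Comment`, `read_order`, and `merge_decision` names are hypothetical, a file is treated as a test if "test" appears in its path, and the policy shown (any blocker or change-requested comment leads to "request changes") is only one plausible reading of the rubric. "Close without merge" is left out on purpose, since it is a judgment about the PR as a whole and cannot be derived from comment severities.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """The five grades from the rubric, ordered from least to most severe."""
    PRAISE = 0
    NIT = 1
    SUGGESTION = 2
    CHANGE_REQUESTED = 3
    BLOCKER = 4


@dataclass(frozen=True)
class Comment:
    """One review comment, anchored to a file and line (hypothetical shape)."""
    path: str
    line: int
    severity: Severity
    text: str


def read_order(paths: list[str]) -> list[str]:
    """Return changed files with tests first, then the rest.

    Assumption: a path containing "test" is a test file. The sort is
    stable, so files keep their original relative order within each group.
    Call sites outside the diff are not covered by this helper.
    """
    return sorted(paths, key=lambda p: 0 if "test" in p.lower() else 1)


def merge_decision(comments: list[Comment]) -> str:
    """Map graded comments to a review outcome (hypothetical policy)."""
    worst = max((c.severity for c in comments), default=Severity.PRAISE)
    if worst >= Severity.CHANGE_REQUESTED:
        return "request changes"
    return "approve"
```

Using an ordered enum keeps the policy to a single comparison: nits, suggestions, and praise never block a merge, while anything graded change-requested or higher does. A team with a different policy would change only the threshold in `merge_decision`.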
Philosophy
A code review is a conversation, not a verdict. The reviewer's job is not to prove they could have written the code differently; it is to ensure the change ships in a state the team can maintain. Reviews fail when they are either rubber-stamped (no verification, just a thumbs-up) or weaponised (every review becomes a referendum on the author's competence). The reviewer's leverage comes from reading code the author has been staring at for hours — the reviewer sees the obvious mistakes the author cannot.
For AI-generated diffs the bar is higher, not lower. Karpathy's "vibe hangover" is real: AI-generated code typically looks correct, has reasonable variable names, and compiles cleanly, yet contains 1.7–2.74× more security vulnerabilities than human-authored equivalents. A reviewer who rubber-stamps an AI diff is not saving time; they are deferring the cost to whichever colleague has to debug the production failure.