Full-empirical-analysis-skill-R
Full Empirical Analysis — Classical R Workflow
This skill is the canonical 8-step pipeline an applied economist runs on every empirical paper, written in the modern tidyverse + econometrics R ecosystem — dplyr/tidyr/haven for data, fixest as the panel/IV/DID workhorse, did/bacondecomp/HonestDiD for modern DID, rdrobust/rddensity for RD, Synth/gsynth/synthdid for synthetic control, MatchIt/WeightIt/cobalt/ebal for matching, grf/DoubleML for ML causal, mediation for causal mediation, marginaleffects for post-estimation, modelsummary/kableExtra/gt for publication tables, ggplot2/iplot/binsreg for figures.
Companion skills: this is the R sibling of 00-StatsPAI_skill (Python DSL), 00.1-Full-empirical-analysis-skill (explicit Python), and 00.2-Full-empirical-analysis-skill_Stata (Stata .do). All four implement the same 8 steps, in their respective ecosystems.
Philosophy
- Tidyverse + fixest, the modern R idioms. `feols(... | unit + year, cluster = ~unit)`, not Frankenstein-y `lm(y ~ x + factor(unit) + factor(year))`.
- Reproducible scripts / Quarto. Every example below is paste-runnable. `renv` for package locking; Quarto (`.qmd`) for combined narrative + code + tables/figures.
- 8 steps, first-class. R users historically over-invest in Step 5; this skill treats Steps 1–4 and 6–8 as core.
- Rich outputs. Every step yields at least one table or figure (tex/docx/png/pdf).
- Progressive disclosure. `SKILL.md` gives the canonical call per step; `references/` holds variant-specific depth.
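The contrast in the first bullet can be made concrete with a minimal sketch. It assumes `fixest` is installed and uses a simulated balanced panel; the data frame `panel`, the variables `x` and `y`, and the true slope of 2 are illustrative assumptions, not part of the skill. Both calls estimate the same two-way fixed-effects slope. The `feols()` call absorbs the unit and year effects instead of estimating a coefficient for every dummy, and it clusters the standard errors by unit in the same call.

```r
library(fixest)

# Simulated balanced panel (hypothetical data, for illustration only)
set.seed(42)
n_units <- 50
n_years <- 10
panel <- expand.grid(unit = seq_len(n_units), year = seq_len(n_years))

unit_fe <- rnorm(n_units)   # unit-level heterogeneity
year_fe <- rnorm(n_years)   # common year shocks

# x is correlated with the unit effect, so pooled OLS without FE would be biased
panel$x <- rnorm(nrow(panel)) + 0.5 * unit_fe[panel$unit]
panel$y <- 2 * panel$x + unit_fe[panel$unit] + year_fe[panel$year] +
  rnorm(nrow(panel))

# Preferred idiom: absorb the fixed effects, cluster by unit
fit_feols <- feols(y ~ x | unit + year, data = panel, cluster = ~unit)

# Dummy-variable idiom: same point estimate, but one coefficient per dummy
fit_lm <- lm(y ~ x + factor(unit) + factor(year), data = panel)

coef_feols <- unname(coef(fit_feols)["x"])
coef_lm    <- unname(coef(fit_lm)["x"])

# The two slope estimates should agree up to numerical tolerance
print(c(feols = coef_feols, lm = coef_lm))
```

The point estimates coincide; the practical differences are speed on large panels, a coefficient table that is not cluttered with dummies, and clustered standard errors without a separate post-estimation step.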
Three domain modes (default = AER econ; alternates = epi & ML-causal)
The default playbook above is AER-style applied econometrics — the AEA convention: written-out estimating equation, identifying assumption, design horse-race, full robustness gauntlet. The skill also ships two parallel sub-pipelines for the other two big causal-inference traditions, each reusing the same Steps 1–4 (cleaning / construction / Table 1 / diagnostics) and Step 8 (tables/figures) — only Step 5 (estimator) and Step 6/7 swap packages: