Full-empirical-analysis-skill
Full Empirical Analysis — Classical Python Workflow
This skill is the canonical 8-step pipeline an applied economist runs on every empirical paper, written in the traditional Python ecosystem — no opinionated one-stop wrapper. Every step calls libraries directly (pandas, numpy, scipy, statsmodels, linearmodels, pyfixest, rdrobust, econml, causalml, matplotlib, seaborn), so the agent — or the user reading the agent's code — has full visibility and can swap any component.
Companion skill: if the user prefers a single-import agent-native DSL (import statspai as sp), route to 00-StatsPAI_skill instead. This skill is the opposite philosophy: everything explicit, everything inspectable, every diagnostic run by hand, every plot shaped by the user.
Philosophy
- Traditional stack, no magic. Agents should be able to read every line and know exactly which library / estimator / standard error family is at work.
- Full pipeline, not just estimation. 80% of the time on a real paper goes to steps 1–4 and 6–8, so this skill treats them as first-class steps, not afterthoughts.
- Rich outputs. Every step produces at least one table or figure — never a single point estimate in isolation.
- Progressive disclosure. SKILL.md gives the canonical call at each step; references/ holds variant-specific depth (dozens of tests, estimator-specific diagnostics, plot recipes).
- Reproducible. Every code block is runnable after pip install -r requirements.txt and df = pd.read_csv(...).
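As a minimal sketch of the "no magic" and "rich outputs" principles, the block below calls statsmodels directly, names the estimator and the standard-error family in the code itself, and returns a small table instead of a lone coefficient. The data are synthetic and every column name (firm, year, treat, y) is a hypothetical stand-in for whatever df = pd.read_csv(...) would load; it is an illustration under those assumptions, not the skill's canonical Step 5 call.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic firm-year panel standing in for df = pd.read_csv(...).
# All variable names and parameter values here are hypothetical.
rng = np.random.default_rng(0)
n_firms, n_years = 50, 6
df = pd.DataFrame({
    "firm": np.repeat(np.arange(n_firms), n_years),
    "year": np.tile(np.arange(n_years), n_firms),
})
df["treat"] = rng.integers(0, 2, size=len(df))
firm_effect = rng.normal(size=n_firms)[df["firm"].to_numpy()]
df["y"] = 1.0 + 0.5 * df["treat"] + firm_effect + rng.normal(size=len(df))

# Estimator: OLS with year dummies, stated explicitly in the formula.
model = smf.ols("y ~ treat + C(year)", data=df)

# Standard-error family: cluster-robust, clustered by firm.
res = model.fit(cov_type="cluster", cov_kwds={"groups": df["firm"].to_numpy()})

# Report a table (coefficient, SE, CI, sample and cluster counts),
# never a single point estimate in isolation.
ci = res.conf_int()
table = pd.DataFrame({
    "coef": res.params,
    "se": res.bse,
    "ci_low": ci[0],
    "ci_high": ci[1],
}).loc[["treat"]]
table["n_obs"] = int(res.nobs)
table["n_clusters"] = df["firm"].nunique()
print(table.round(3))
```

Because the covariance choice sits in the fit call, swapping it (for example to cov_type="HC1") is a one-line change that stays visible to anyone reading the code.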
Three domain modes (default = AER econ; alternates = epi & ML-causal)
The default playbook above is AER-style applied econometrics, following the AEA convention: written-out estimating equation, identifying assumption, design horse-race, full robustness gauntlet. The skill also ships two parallel sub-pipelines for the other two big causal-inference traditions. Each reuses the same Steps 1–4 (cleaning / construction / descriptives / diagnostics) and Step 8 (tables & figures); only Step 5 (estimator) and Steps 6–7 (robustness / mechanism) swap libraries: