svy Skill

svy: design-based analysis of complex survey data in Python. Covers survey design specification (strata, PSU, weights, FPC), variance estimation (Taylor linearization, BRR, jackknife, bootstrap), descriptive estimation (means, totals, proportions, ratios, medians), survey-weighted GLM regression (gaussian, binomial, Poisson), domain/subpopulation analysis, calibration, and survey data I/O (SAS, SPSS, Stata). Uses Polars DataFrames natively. Use when analyzing data from complex sample surveys (NHANES, CPS, ACS PUMS, MEPS, ECLS-K, BRFSS, DHS). For non-survey regression, use statsmodels; for fixed effects, use pyfixest; for panel/IV models, use linearmodels.

Comprehensive skill for complex survey data analysis with svy. Use the decision trees below to find the right guidance, then load the detailed references.

What is svy?

svy is a Python package for design-based analysis of complex survey data:

Survey-aware estimation: Means, totals, proportions, ratios, medians with proper design-based standard errors
GLM regression: Survey-weighted linear, logistic, and Poisson regression with design-adjusted inference
Flexible variance estimation: Taylor linearization (default), bootstrap, BRR (including Fay's modification), and jackknife (JK1, JKn) replicate methods
Domain estimation: Correct subpopulation analysis without pre-filtering (preserves design structure)
Native Polars: Built on Polars DataFrames, not pandas
Survey data I/O: Read SAS (.sas7bdat), SPSS (.sav), Stata (.dta), and CSV with metadata
Calibration: Post-stratification, raking, and GREG calibration for weight adjustment
Validated: Results numerically equivalent to R's survey package across all methods
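To make "design-based standard errors" and "domain estimation without pre-filtering" concrete, here is a minimal standard-library sketch of Taylor linearization for a weighted mean under a stratified cluster design. It does not use svy's API. The function name `linearized_mean`, the row keys, and the toy data are hypothetical and chosen only for illustration. It assumes PSUs are sampled with replacement within strata (no FPC).

```python
from collections import defaultdict
from math import sqrt

# Toy data (hypothetical): two strata, two PSUs each, two respondents per PSU.
rows = [
    {"stratum": 1, "psu": 1, "weight": 2.0, "y": 10.0, "grp": "a"},
    {"stratum": 1, "psu": 1, "weight": 2.0, "y": 12.0, "grp": "a"},
    {"stratum": 1, "psu": 2, "weight": 2.0, "y": 14.0, "grp": "b"},
    {"stratum": 1, "psu": 2, "weight": 2.0, "y": 16.0, "grp": "b"},
    {"stratum": 2, "psu": 3, "weight": 3.0, "y": 20.0, "grp": "a"},
    {"stratum": 2, "psu": 3, "weight": 3.0, "y": 22.0, "grp": "b"},
    {"stratum": 2, "psu": 4, "weight": 3.0, "y": 24.0, "grp": "a"},
    {"stratum": 2, "psu": 4, "weight": 3.0, "y": 26.0, "grp": "b"},
]


def linearized_mean(rows, in_domain=lambda r: True):
    """Weighted mean and its Taylor-linearized standard error.

    Assumes with-replacement sampling of PSUs within strata (no FPC).
    Rows outside the domain get indicator 0 but remain in the design.
    """
    d = [1.0 if in_domain(r) else 0.0 for r in rows]
    w_total = sum(r["weight"] * di for r, di in zip(rows, d))
    y_total = sum(r["weight"] * di * r["y"] for r, di in zip(rows, d))
    mean = y_total / w_total

    # Linearized score for the ratio sum(w*d*y) / sum(w*d),
    # accumulated into PSU totals within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r, di in zip(rows, d):
        z = r["weight"] * di * (r["y"] - mean) / w_total
        psu_totals[r["stratum"]][r["psu"]] += z

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - z_bar) ** 2 for t in totals.values())
    return mean, sqrt(variance)


# Full-population estimate.
overall_mean, overall_se = linearized_mean(rows)

# Domain estimate: PSU 2 has no members of group "a" but still counts
# as a sampled PSU, contributing a total of zero to stratum 1.
domain_mean, domain_se = linearized_mean(rows, in_domain=lambda r: r["grp"] == "a")
```

In this toy data, dropping the rows outside group "a" before estimation would leave stratum 1 with a single PSU, so the between-PSU variance could not be computed at all. Keeping every row and zeroing out non-domain contributions preserves the design structure, which is the reason for avoiding pre-filtering.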

Version Notes