datanalysis-credit-risk

Summary

Credit risk data cleaning and variable screening pipeline for pre-loan modeling.

  • Executes 11 independent steps covering data loading, abnormal period filtering, missing rate analysis, low-IV and high-PSI variable removal, null importance denoising, and correlation-based feature elimination
  • Supports organization-level analysis with separate modeling and out-of-sample (OOS) sample handling, plus multi-process acceleration for IV and PSI calculations
  • Generates a comprehensive Excel report with 15 sheets detailing operation results, feature statistics, distributions, and removed variables across all pipeline stages
  • Configurable thresholds for missing rate, IV, PSI, correlation, and null importance parameters with sensible defaults
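
The IV and PSI screens mentioned above can be illustrated with a minimal sketch. It assumes the feature has already been binned into counts per bin, and it uses the textbook definitions of Information Value and Population Stability Index. The function names, the smoothing constant `EPS`, and the thresholds in the usage lines are hypothetical and are not taken from the skill's scripts or its defaults.

```python
import math

EPS = 1e-6  # assumed floor for bin shares, avoids log(0) and division by zero


def psi(expected_counts, actual_counts):
    """Population Stability Index between two binned distributions."""
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / exp_total, EPS)
        a_pct = max(a / act_total, EPS)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


def information_value(good_counts, bad_counts):
    """Information Value of a binned feature against a good/bad label."""
    good_total = sum(good_counts)
    bad_total = sum(bad_counts)
    total = 0.0
    for g, b in zip(good_counts, bad_counts):
        g_pct = max(g / good_total, EPS)
        b_pct = max(b / bad_total, EPS)
        total += (g_pct - b_pct) * math.log(g_pct / b_pct)
    return total


# Illustrative thresholds only; the skill's real defaults are not shown here.
IV_MIN, PSI_MAX = 0.02, 0.1

iv = information_value(good_counts=[80, 20], bad_counts=[20, 80])
shift = psi(expected_counts=[50, 30, 20], actual_counts=[48, 31, 21])
keep_feature = iv >= IV_MIN and shift <= PSI_MAX
```

A feature survives this sketch only if it separates good from bad samples well enough (IV at or above the floor) and its distribution stays stable between the two samples (PSI at or below the ceiling).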
SKILL.md

Data Cleaning and Variable Screening

Quick Start

```shell
# Run the complete data cleaning pipeline
python ".github/skills/datanalysis-credit-risk/scripts/example.py"
```

Complete Process Description

The data cleaning pipeline consists of the following 11 steps, each executed independently without deleting the original data:

  1. Get Data - Load and format raw data
  2. Organization Sample Analysis - Compute the sample count and bad-sample rate for each organization
  3. Separate OOS Data - Separate out-of-sample (OOS) samples from modeling samples
  4. Filter Abnormal Months - Remove months whose bad-sample count or total sample count is too low
  5. Calculate Missing Rate - Calculate overall and organization-level missing rates for each feature
  6. Drop High Missing Rate Features - Remove features whose overall missing rate exceeds the threshold
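
Steps 4 to 6 above can be sketched in plain Python to show how each step returns a new result and leaves its input untouched. This is a dependency-free illustration, not the skill's implementation: the record layout, the column names (`month`, `org`, `bad`), the function names, and the thresholds are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical column names, chosen only for illustration.
MONTH, ORG, LABEL = "month", "org", "bad"


def filter_abnormal_months(rows, min_bad=1, min_total=2):
    """Step 4: keep months with enough bad samples and enough samples overall."""
    totals, bads = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[MONTH]] += 1
        bads[r[MONTH]] += r[LABEL]
    keep = {m for m in totals if bads[m] >= min_bad and totals[m] >= min_total}
    return [r for r in rows if r[MONTH] in keep]


def missing_rates(rows, features):
    """Step 5: overall and per-organization missing rate for each feature."""
    overall = {f: sum(r[f] is None for r in rows) / len(rows) for f in features}
    by_org = defaultdict(list)
    for r in rows:
        by_org[r[ORG]].append(r)
    per_org = {
        org: {f: sum(r[f] is None for r in grp) / len(grp) for f in features}
        for org, grp in by_org.items()
    }
    return overall, per_org


def drop_high_missing(features, overall, max_missing=0.5):
    """Step 6: return (kept, dropped); the input feature list is not mutated."""
    kept = [f for f in features if overall[f] <= max_missing]
    dropped = [f for f in features if overall[f] > max_missing]
    return kept, dropped


# Tiny made-up dataset: None marks a missing value.
rows = [
    {"month": "2024-01", "org": "A", "bad": 1, "income": 5000, "age": None},
    {"month": "2024-01", "org": "A", "bad": 0, "income": None, "age": None},
    {"month": "2024-01", "org": "B", "bad": 0, "income": 4200, "age": 31},
    {"month": "2024-02", "org": "B", "bad": 0, "income": 3900, "age": None},
]
features = ["income", "age"]

clean = filter_abnormal_months(rows)                  # drops 2024-02
overall, per_org = missing_rates(clean, features)
kept, dropped = drop_high_missing(features, overall)  # rows stays unchanged
```

Returning both the kept and the dropped features makes it straightforward to record what each step removed, which is the kind of information a per-stage report needs.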

Installs
6.8K
GitHub Stars
32.8K
First Seen
Mar 2, 2026