datanalysis-credit-risk
Credit risk data cleaning and variable screening pipeline for pre-loan modeling.
- Executes 11 independent steps covering data loading, abnormal period filtering, missing rate analysis, low-IV and high-PSI variable removal, null importance denoising, and correlation-based feature elimination
- Supports organization-level analysis with separate modeling and out-of-sample (OOS) sample handling, plus multi-process acceleration for IV and PSI calculations
- Generates a comprehensive Excel report with 15 sheets detailing operation results, feature statistics, distributions, and the variables removed at each pipeline stage
- Provides configurable thresholds for missing rate, IV, PSI, correlation, and null importance, each with a default value
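To make the PSI threshold concrete, here is a minimal sketch of the standard Population Stability Index computed over pre-binned counts, followed by a threshold filter. It is not the skill's own implementation: the function names `psi` and `high_psi_features`, the `eps` floor, and the threshold value are assumptions chosen for illustration, and the skill's actual binning and comparison periods are not shown here.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Standard PSI: sum over bins of (a - e) * ln(a / e), on bin proportions.

    `eps` is an illustrative floor that keeps empty bins from producing log(0).
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


def high_psi_features(binned, threshold):
    """Return the names of features whose PSI exceeds `threshold`.

    `binned` maps a feature name to (expected_counts, actual_counts).
    """
    return [name for name, (e, a) in binned.items() if psi(e, a) > threshold]


# Hypothetical bin counts for two features; the threshold is illustrative only.
binned = {
    "age": ([50, 50], [60, 40]),
    "income": ([30, 70], [30, 70]),
}
print(high_psi_features(binned, threshold=0.03))
```

A feature with identical distributions in both samples scores exactly zero, so the threshold controls how much drift is tolerated before a variable is removed.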
Data Cleaning and Variable Screening
Quick Start
# Run the complete data cleaning pipeline
python ".github/skills/datanalysis-credit-risk/scripts/example.py"
Complete Process Description
The data cleaning pipeline consists of the following 11 steps, each executed independently without deleting the original data:
- Get Data - Load and format raw data
- Organization Sample Analysis - Statistics of sample count and bad sample rate for each organization
- Separate OOS Data - Separate out-of-sample (OOS) samples from modeling samples
- Filter Abnormal Months - Remove months with insufficient bad sample count or total sample count
- Calculate Missing Rate - Calculate overall and organization-level missing rates for each feature
- Drop High Missing Rate Features - Remove features with overall missing rate exceeding threshold
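The filtering and missing-rate steps listed above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the skill's code: the function names, the column names (`org`, `month`, `bad`), and every threshold value are hypothetical. Each function returns a new object and leaves its input untouched, in line with the note that steps do not delete the original data.

```python
import pandas as pd


def filter_abnormal_months(df, month_col, target_col, min_bad, min_total):
    """Keep only months with enough bad samples and enough total samples."""
    stats = df.groupby(month_col)[target_col].agg(total="size", bad="sum")
    keep = stats[(stats["bad"] >= min_bad) & (stats["total"] >= min_total)].index
    return df[df[month_col].isin(keep)].copy()


def missing_rates(df, features, org_col):
    """Return (overall missing rate per feature, missing rate per organization)."""
    is_missing = df[features].isna()
    overall = is_missing.mean()
    by_org = is_missing.groupby(df[org_col]).mean()
    return overall, by_org


def drop_high_missing(df, overall, threshold):
    """Return a copy without features whose overall missing rate exceeds threshold."""
    to_drop = overall[overall > threshold].index.tolist()
    return df.drop(columns=to_drop), to_drop


# Hypothetical sample data; column names and thresholds are assumptions.
df = pd.DataFrame({
    "org": ["A", "A", "B", "B", "B", "A"],
    "month": ["2023-01"] * 3 + ["2023-02"] * 3,
    "bad": [1, 0, 1, 0, 0, 0],
    "f1": [1.0, None, 3.0, None, None, None],
    "f2": [1.0, 2.0, 3.0, 4.0, None, 6.0],
})

kept = filter_abnormal_months(df, "month", "bad", min_bad=1, min_total=2)
overall, by_org = missing_rates(df, ["f1", "f2"], "org")
cleaned, dropped = drop_high_missing(df, overall, threshold=0.5)
print(dropped)
```

Returning the list of dropped features alongside the cleaned frame makes it straightforward to record removed variables for a report.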