sn-da-large-file-analysis


Large Scale Excel Analysis Skill

Mandatory Rules

When total rows >= 10,000, you MUST use the methods in this skill.

| Data Scale | Read Strategy | Reason |
|---|---|---|
| < 10k rows | pd.read_excel() directly | No memory pressure |
| 10k–100k rows | pd.read_excel() → convert to Parquet → pd.read_parquet() for analysis | Avoid repeated slow reads |
| 100k–1M rows | openpyxl read_only + iter_rows streaming → Parquet | pd.read_excel() will OOM or time out |
| > 1M rows | Streaming read + multi-sheet split (Excel max 1,048,576 rows per sheet) | Must chunk |
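
The tiers above could be wired together roughly as follows. This is a minimal sketch, not the skill's own implementation: the function names (`choose_strategy`, `read_excel_cached`, `batched`, `excel_sheet_to_parquet`), the strategy labels, the 50,000-row batch size, and the exact handling of the 100k and 1M boundaries are all assumptions made for illustration. To keep the Parquet schema identical across batches, the streaming function stores every cell as a string, which is a simplification; real analysis would cast columns after loading.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Sequence


def choose_strategy(total_rows: int) -> str:
    """Map a total row count to a read strategy (labels are hypothetical)."""
    if total_rows < 10_000:
        return "read_excel"
    if total_rows < 100_000:
        return "read_excel_then_parquet"
    if total_rows <= 1_000_000:
        return "stream_to_parquet"
    return "stream_multi_sheet"


def read_excel_cached(xlsx_path: str, parquet_path: str):
    """10k-100k tier: pay the slow Excel read once, then reuse the Parquet copy."""
    import pandas as pd  # imported lazily so the pure helpers work without it

    # Assumption: the cache is never stale; this sketch does not compare mtimes.
    if not Path(parquet_path).exists():
        pd.read_excel(xlsx_path).to_parquet(parquet_path, index=False)
    return pd.read_parquet(parquet_path)


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def excel_sheet_to_parquet(
    xlsx_path: str,
    parquet_path: str,
    sheet_name: Optional[str] = None,
    batch_size: int = 50_000,
) -> int:
    """100k+ tier: stream one sheet into Parquet; returns the data row count."""
    import openpyxl
    import pyarrow as pa
    import pyarrow.parquet as pq

    wb = openpyxl.load_workbook(xlsx_path, read_only=True, data_only=True)
    writer = None
    total = 0
    try:
        ws = wb[sheet_name] if sheet_name else wb.worksheets[0]
        rows = ws.iter_rows(values_only=True)
        first = next(rows, None)
        if first is None:  # empty sheet: nothing to write
            return 0
        # Assumption: the first row is a header with unique column names.
        header = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(first)]
        for batch in batched(rows, batch_size):
            # Build each column as strings so every batch has the same schema;
            # short rows are padded with nulls.
            data = {
                name: pa.array(
                    [None if i >= len(r) or r[i] is None else str(r[i]) for r in batch],
                    type=pa.string(),
                )
                for i, name in enumerate(header)
            }
            table = pa.table(data)
            if writer is None:
                writer = pq.ParquetWriter(parquet_path, table.schema)
            writer.write_table(table)
            total += len(batch)
        return total
    finally:
        if writer is not None:
            writer.close()
        wb.close()  # read-only workbooks keep the file handle open until closed
```

For a workbook whose data spans several sheets (the > 1M row case), one option is to call `excel_sheet_to_parquet` once per sheet name and read the resulting Parquet files together.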

Prohibited:

  • Do NOT use pd.read_excel() to fully load 100k+ row files
  • Do NOT search for fonts with fc-list, find ... fonts, or install packages with pip install
  • Do NOT use df.iterrows() on large DataFrames (use itertuples() or vectorized ops)
  • Do NOT use df.apply(lambda...) for operations that can be vectorized
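
The last two rules can be illustrated with a small example. This is a sketch on made-up data: the column names (`qty`, `unit_price`, `region`) and the 7.5 threshold are hypothetical, chosen only to show a vectorized replacement for `apply` and the `itertuples()` fallback for the cases where a row loop is genuinely needed.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for a large dataset (values are illustrative only).
df = pd.DataFrame(
    {
        "qty": [2, 5, 1, 10],
        "unit_price": [3.0, 1.5, 20.0, 0.5],
        "region": ["east", "west", "east", "north"],
    }
)

# Instead of: df.apply(lambda r: r["qty"] * r["unit_price"], axis=1)
# multiply whole columns at once.
df["revenue"] = df["qty"] * df["unit_price"]

# Instead of: df.apply(lambda r: "high" if r["revenue"] >= 7.5 else "low", axis=1)
# use np.where for an element-wise conditional.
df["tier"] = np.where(df["revenue"] >= 7.5, "high", "low")

# If a row-wise loop cannot be avoided, itertuples() is much cheaper than
# iterrows() because it does not build a Series for every row.
labels = [f"{row.region}:{row.qty}" for row in df.itertuples(index=False)]
```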

Installs: 19
GitHub Stars: 1.2K
First Seen: Apr 29, 2026