BigCode Evaluation Harness - Code Model Benchmarking

Quick Start

The BigCode Evaluation Harness evaluates code generation models on 15+ benchmarks, including HumanEval, MBPP, and MultiPL-E (which covers 18 programming languages).

Installation:

git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git
cd bigcode-evaluation-harness
pip install -e .
accelerate config
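
To confirm the editable install worked, you can try importing the package; a minimal sanity check, assuming the repository's current bigcode_eval module name:

python -c "import bigcode_eval; print('ok')"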

Evaluate on HumanEval:

accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks humaneval \
  --allow_code_execution

The --allow_code_execution flag is required: HumanEval scores a model by actually running its generated solutions against the benchmark's unit tests.
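
Greedy decoding gives a single pass@1 estimate. For pass@k you usually sample several completions per problem; a sketch using flags from the harness's README, with illustrative values:

accelerate launch main.py \
  --model bigcode/starcoder2-7b \
  --tasks humaneval \
  --temperature 0.2 \
  --do_sample True \
  --n_samples 50 \
  --batch_size 10 \
  --allow_code_execution

Swapping --tasks humaneval for a MultiPL-E task (e.g. multiple-js for JavaScript, assuming the harness's multiple-<lang> task naming) runs the same evaluation in another language.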