bigquery-pipeline-audit

Summary

Audits Python + BigQuery pipelines for cost safety, idempotency, and production readiness, and reports the exact location of each recommended patch.

  • Analyzes every BigQuery job trigger and external API call to identify cost exposure, loop-driven query multiplication, and missing maximum_bytes_billed limits
  • Enforces dry-run and execute modes with explicit prod confirmation, partition filter validation, and scan-size optimization
  • Validates idempotent writes using MERGE, staging tables, or dedup logic; flags unsafe append patterns and duplicate-prone reruns
  • Generates structured reports with a PASS/FAIL verdict per section, a ranked patch list, and worst-case cost estimates expressed in job count and bytes
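
As an illustration of the dry-run/execute split and the byte cap mentioned above, here is a minimal sketch, not the skill's own implementation. The function name `run_query`, its parameters (`env`, `confirm_prod`, `max_bytes`), and the 10 GiB default are hypothetical; the only library calls used are `bigquery.QueryJobConfig` (with `dry_run`, `use_query_cache`, `maximum_bytes_billed`) and `client.query` from `google-cloud-bigquery`.

```python
def run_query(client, sql, *, execute=False, env="dev", confirm_prod=False,
              max_bytes=10 * 1024**3):
    """Dry-run by default; execute only on request, and in prod only when confirmed.

    All names and defaults here are illustrative assumptions.
    """
    if execute and env == "prod" and not confirm_prod:
        raise PermissionError("refusing to execute against prod without confirm_prod=True")

    # Imported lazily so the guard above can be exercised without the dependency.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=not execute, use_query_cache=False)
    if execute:
        # Server-side cap: the job fails instead of billing more than max_bytes.
        config.maximum_bytes_billed = max_bytes

    job = client.query(sql, job_config=config)

    if not execute:
        # A dry run is not billed; total_bytes_processed holds the scan estimate.
        estimate = job.total_bytes_processed
        if estimate > max_bytes:
            raise ValueError(f"estimated scan of {estimate} bytes exceeds cap of {max_bytes}")
        return job

    job.result()  # block until the query finishes so failures surface here
    return job
```

A caller would typically invoke `run_query(client, sql)` first to inspect the estimate, then repeat the call with `execute=True` (plus `confirm_prod=True` for production) once the scan size looks acceptable.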
SKILL.md

BigQuery Pipeline Audit: Cost, Safety and Production Readiness

You are a senior data engineer reviewing a Python + BigQuery pipeline script. Your goals: catch runaway costs before they happen, ensure reruns do not corrupt data, and make sure failures are visible.

Analyze the codebase and respond in the structure below (A to F + Final). Reference exact function names and line locations. Suggest minimal fixes, not rewrites.


A) COST EXPOSURE: What will actually get billed?

Locate every BigQuery job trigger (client.query, load_table_from_*, extract_table, copy_table, DDL/DML via query) and every external call (APIs, LLM calls, storage writes).
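
The search for job triggers can be partly automated. The sketch below is a rough heuristic built on the standard-library `ast` module, assuming triggers appear as method calls with the names listed above; `find_job_triggers` and `JOB_TRIGGERS` are hypothetical names. It matches on method name only, so any unrelated `.query(...)` call will also be reported, and it does not treat comprehensions as loops.

```python
import ast

# Method names treated as BigQuery job triggers (taken from the list above).
JOB_TRIGGERS = {"query", "extract_table", "copy_table"}
LOAD_PREFIX = "load_table_from_"


def find_job_triggers(source):
    """Return (function, line, method, in_loop) for each suspected job trigger."""
    hits = []

    def visit(node, func, in_loop):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # A new function body starts with its own loop context.
            func, in_loop = node.name, False
        elif isinstance(node, (ast.For, ast.AsyncFor, ast.While)):
            # Everything lexically inside the loop statement is marked.
            in_loop = True
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            name = node.func.attr
            if name in JOB_TRIGGERS or name.startswith(LOAD_PREFIX):
                hits.append((func, node.lineno, name, in_loop))
        for child in ast.iter_child_nodes(node):
            visit(child, func, in_loop)

    visit(ast.parse(source), "<module>", False)
    return hits
```

Entries with `in_loop` set to `True` are the candidates for loop-driven query multiplication and deserve the closest manual review.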

For each, answer:


More from github/awesome-copilot

Installs: 8.4K
GitHub Stars: 32.7K
First Seen: Feb 25, 2026