create-context-tests

`nao test` runs each natural-language prompt through the agent, executes both the agent's SQL and the test's expected SQL against the warehouse, and diffs the two result sets row by row. A test passes only if the actual data matches the expected data: same rows, same values. The suite is the reliability benchmark, and every change to RULES.md is measured against it. Reference: docs.getnao.io/nao-agent/context-engineering/evaluation.
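The pass/fail rule can be sketched as a multiset comparison of result rows. This is a minimal sketch, not nao's implementation: `diff_results` and `passes` are hypothetical names, rows are assumed to arrive as tuples, and row order is assumed to be ignored (the description above does not say whether order counts).

```python
from collections import Counter
from typing import Any, Sequence

Row = tuple[Any, ...]


def diff_results(expected: Sequence[Row], actual: Sequence[Row]) -> tuple[list[Row], list[Row]]:
    """Return (missing, unexpected) rows, treating both results as multisets.

    Assumption: row order does not matter, but duplicate rows do.
    """
    exp = Counter(expected)
    act = Counter(actual)
    # Counter subtraction keeps only positive counts, so each side lists
    # the rows (with multiplicity) the other side lacks.
    missing = list((exp - act).elements())
    unexpected = list((act - exp).elements())
    return missing, unexpected


def passes(expected: Sequence[Row], actual: Sequence[Row]) -> bool:
    """A test passes only if nothing is missing and nothing is extra."""
    missing, unexpected = diff_results(expected, actual)
    return not missing and not unexpected


if __name__ == "__main__":
    expected = [("2024-01", 0.042), ("2024-02", 0.038)]
    actual = [("2024-02", 0.038), ("2024-01", 0.042)]
    print(passes(expected, actual))
    print(diff_results(expected, [("2024-01", 0.042)]))
```

Values are compared with exact equality, which mirrors "same values"; whether nao tolerates floating-point rounding differences is not stated here.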

How many tests

At minimum, write one test per key metric in the `## Key Metrics Reference` section. Then add tests for time scoping (especially "last 8 weeks" and "last 30 days"), CTE and multi-step queries, edge cases (NULLs, empty windows), and ambiguous wording ("our users", "active") to validate the naming-convention rules.
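Time scoping deserves its own tests because a phrase like "last 8 weeks" has more than one plausible reading. The hypothetical sketch below (Python, not part of nao; both function names are illustrative) computes two such readings, a rolling 56-day window and the last 8 complete Monday-to-Sunday weeks, to show that they select different date ranges. A test's expected SQL should commit to whichever reading RULES.md specifies.

```python
from datetime import date, timedelta


def rolling_window(today: date, weeks: int) -> tuple[date, date]:
    """Rolling reading: the weeks * 7 days ending today, both ends inclusive."""
    return today - timedelta(days=weeks * 7 - 1), today


def complete_weeks_window(today: date, weeks: int) -> tuple[date, date]:
    """Calendar reading: the last `weeks` full Monday-Sunday weeks,
    excluding the current partial week."""
    this_monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    start = this_monday - timedelta(weeks=weeks)
    end = this_monday - timedelta(days=1)  # the Sunday before this week
    return start, end


if __name__ == "__main__":
    today = date(2024, 5, 15)  # a Wednesday
    print(rolling_window(today, 8))         # 2024-03-21 .. 2024-05-15
    print(complete_weeks_window(today, 8))  # 2024-03-18 .. 2024-05-12
```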

Two authoring rules — apply to every test

Rule 1 — Prompts read like real chat: short, vague, and free of table, column, or method hints. The test verifies that the agent reaches the right answer from realistic user input.

| Bad | Good |
| --- | --- |
| "What was the churn rate from `fct_subscriptions` in Q1?" | "How's churn looking this quarter?" |
| "Compute MRR as SUM(`mrr_amount`) where status='active'" | "What's our MRR?" |

Rule 2 — Output column names encode format or unit, not source. A column name should tell the reader how to interpret the value, not which table it came from.
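As a hypothetical illustration of Rule 2, the Python sketch below flags output column names that mention a source table or lack a format/unit suffix. It is not part of nao: `check_column_name` is an illustrative name, the suffix list is an assumption, and the source markers are assumed warehouse prefixes (`fct_` echoes the `fct_subscriptions` example in the Rule 1 table). Substitute whatever conventions RULES.md actually defines.

```python
# Hypothetical conventions for illustration only; nao does not prescribe these lists.
UNIT_SUFFIXES = ("_pct", "_usd", "_count", "_days")
SOURCE_MARKERS = ("fct_", "dim_", "stg_")


def check_column_name(name: str) -> list[str]:
    """Return a list of Rule 2 problems for one output column name (empty if none)."""
    problems = []
    lowered = name.lower()
    # A source-table prefix anywhere in the name says where the value came from,
    # not how to read it.
    if any(marker in lowered for marker in SOURCE_MARKERS):
        problems.append("names its source table")
    # str.endswith accepts a tuple and matches any of its suffixes.
    if not lowered.endswith(UNIT_SUFFIXES):
        problems.append("has no format/unit suffix")
    return problems


if __name__ == "__main__":
    for col in ("churn_rate_pct", "mrr_usd", "fct_subscriptions_churn"):
        print(col, check_column_name(col))
```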
