Exploring LLM evaluations
PostHog evaluations score $ai_generation events. Each evaluation is one of two types,
both first-class:

- hog — deterministic Hog code that returns true/false (and optionally N/A). Best for objective rule-based checks: format validation (JSON parses, schema matches), length limits, keyword presence/absence, regex patterns, structural assertions, latency thresholds, cost guards. Cheap, fast, reproducible — no LLM call per run. Prefer this when the criterion can be expressed as code.
- llm_judge — an LLM scores generations against a prompt you write. Best for subjective or fuzzy checks: tone, helpfulness, hallucination detection, off-topic drift, instruction-following. Costs an LLM call per run and requires AI data processing approval at the org level.
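To make the first type concrete, here is a minimal sketch of the kind of deterministic, rule-based check a hog evaluation might express. This is Python for illustration, not actual Hog syntax; the function name and the `"answer"` key are assumptions, not part of any PostHog API.

```python
import json

def evaluate_generation(output: str, max_chars: int = 2000) -> bool:
    """Hypothetical rule-based check: True only if the generation passes
    a length limit, parses as JSON, and contains a required key."""
    # Length limit: reject oversized responses outright.
    if len(output) > max_chars:
        return False
    # Format validation: the response must be valid JSON.
    try:
        payload = json.loads(output)
    except json.JSONDecodeError:
        return False
    # Structural assertion: a required key must be present.
    return isinstance(payload, dict) and "answer" in payload

print(evaluate_generation('{"answer": "42"}'))  # valid JSON with the key -> True
print(evaluate_generation('not json at all'))   # fails format validation -> False
```

Every branch here is plain code with no model call, which is why checks like these are cheap, fast, and reproducible per run.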
Results from both types land in ClickHouse as $ai_evaluation events with the same
schema, so the read/query/summary workflows are identical regardless of evaluator type —
the only thing that changes is whether $ai_evaluation_reasoning was written by Hog
code or by an LLM.
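A rough sketch of why the shared schema matters: if both evaluator types emit the same event shape, one query path handles both. Only the `$ai_evaluation` event name and the `$ai_evaluation_reasoning` property come from the description above; every other field name below is an assumption for illustration.

```python
# Illustrative event shapes; real $ai_evaluation events may carry more fields.
hog_result = {
    "event": "$ai_evaluation",
    "properties": {
        # Reasoning string written by deterministic Hog code.
        "$ai_evaluation_reasoning": "JSON parsed and length under limit",
    },
}
llm_result = {
    "event": "$ai_evaluation",
    "properties": {
        # Reasoning string written by an LLM judge.
        "$ai_evaluation_reasoning": "Response stays on topic and follows instructions.",
    },
}

# One read path works for both evaluator types.
reasonings = [
    ev["properties"]["$ai_evaluation_reasoning"]
    for ev in (hog_result, llm_result)
    if ev["event"] == "$ai_evaluation"
]
print(len(reasonings))  # both events match -> 2
```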