# generate-verifiers-env
Build the Verifiers variant of an env. Verifiers is in-process — no HTTP server, no Docker, no HF Space. The trainer (or a manual rollout) imports tool functions directly from env.py.
## Concept
PrimeIntellect Verifiers is a Python library — not a server framework. It provides vf.ToolEnv (multi-turn rollout), vf.Rubric (composable async graders), and adapters into TRL GRPOTrainer. The trainer or rollout owns the LLM client; the env owns the tools and the grader.
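This division of labor can be sketched without the library itself. The following is a hypothetical stand-in for vf.Rubric's composable async graders — the names and signatures here are illustrative, not the verifiers API — showing weighted async scoring of a finished trajectory:

```python
import asyncio

# Each grader is an async function scoring one finished trajectory;
# the rubric combines their scores with weights (illustrative only --
# the real vf.Rubric API may differ).
async def used_tool(trajectory):
    return 1.0 if any(m["role"] == "tool" for m in trajectory) else 0.0

async def correct_answer(trajectory):
    return 1.0 if "42" in trajectory[-1]["content"] else 0.0

async def score(trajectory, graders, weights):
    scores = await asyncio.gather(*(g(trajectory) for g in graders))
    return sum(w * s for w, s in zip(weights, scores))

trajectory = [
    {"role": "user", "content": "What is 6 * 7?"},
    {"role": "tool", "content": "42"},
    {"role": "assistant", "content": "The answer is 42."},
]
reward = asyncio.run(score(trajectory, [used_tool, correct_answer], [0.5, 0.5]))
print(reward)  # 1.0
```

Because the trainer owns the LLM client, graders like these only ever see the finished trajectory — they never call the model themselves.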
When the user has a shared domain module (<domain>.py) and wants a Verifiers variant, wrap it as a toolkit class plus standalone tool functions. Don't duplicate domain logic.
## Archetypes
| Archetype | Hallmarks |
|---|---|
| Pure-Python game | One @tool-style function, terminal reward via rubric checking the trajectory. |
| Stateful sandbox in-process | Toolkit owns the sandbox (E2B, browser); initialize() is lazy; cleanup() is mandatory in finally. |
| Vision env | Drive the toolkit manually (skip vf.ToolEnv since vision content blocks aren't first-class in verifiers' rollout). Send the screenshot in the user message each turn. |
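The stateful-sandbox archetype from the table can be sketched in plain Python. The dict below stands in for a real E2B or browser session, and all names are illustrative:

```python
class SandboxToolkit:
    """Stateful-sandbox archetype: lazy initialize(), cleanup() in finally."""
    def __init__(self):
        self._sandbox = None            # nothing allocated at construction

    def initialize(self):
        if self._sandbox is None:       # lazy: allocate on first use only
            self._sandbox = {"files": {}}  # stand-in for an E2B/browser session

    def write_file(self, name: str, content: str) -> str:
        self.initialize()
        self._sandbox["files"][name] = content
        return f"wrote {name}"

    def cleanup(self):
        self._sandbox = None            # stands in for sandbox.kill()

tk = SandboxToolkit()
try:
    out = tk.write_file("main.py", "print('hi')")
finally:
    tk.cleanup()                        # mandatory: never leak the sandbox
print(out)  # wrote main.py
```

Putting `cleanup()` in `finally` is what keeps a crashed rollout from leaking a live sandbox.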
## Two consumption paths (always provide both)
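The second path from the intro, a manual rollout, reduces to a loop: import the tool functions directly from env.py and drive them with whatever chat client you own. A toy sketch, assuming an OpenAI-style message list; `FakeClient` and the inline `guess` tool are stand-ins for the real client and the real `from env import guess`:

```python
def guess(word: str) -> str:           # imagine: from env import guess
    return "GGGGG" if word == "crane" else "....."

class FakeClient:
    """Stand-in for the trainer-owned LLM client."""
    def chat(self, messages):
        return {"tool": "guess", "args": {"word": "crane"}}

client = FakeClient()
messages = [{"role": "user", "content": "Play wordle: guess the secret word."}]
tools = {"guess": guess}
for _ in range(6):                     # episode cap
    call = client.chat(messages)       # model picks a tool call
    result = tools[call["tool"]](**call["args"])
    messages.append({"role": "tool", "content": result})
    if result == "GGGGG":              # terminal condition
        break
print(len(messages))  # 2
```

The first path, handing the same tool functions to vf.ToolEnv for training, reuses exactly these callables; only the driver changes.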
## More from adithya-s-k/rl_envs_101
### generate-openenv-env
Builds an OpenEnv (Meta) variant of an RL environment. Use whenever someone asks to scaffold an OpenEnv server, port an existing env to OpenEnv, add MCP tools to an env, or deploy an OpenEnv to HF Spaces. OpenEnv is the right framework when the user wants HTTP+MCP, structured tool calls discovered via `list_tools()`, an optional Gradio UI, sandbox-backed sessions, or deployment as a Docker container / HF Space. Output is a runnable `<env_dir>/openenv/` folder with `server/app.py`, `server/<env>_environment.py`, `pyproject.toml`, `Dockerfile`, and `rollout.py`. Use for prompts like "wrap my game in OpenEnv", "make an MCP env for X", or "add the openenv variant".
### generate-ors-env
Builds an Open Reward Standard (ORS) variant of an RL environment using the official `openreward` Python package. Use whenever someone asks to scaffold an ORS env, port to OpenReward, add per-tool-call rewards, deploy to OpenReward.ai, or wrap an existing env in the ORS protocol. ORS is the right framework when the user wants HTTP+REST+SSE, rewards arriving inline with each tool call (not post-episode), task-spec-driven sessions, splits (train/val/test), or deployment to OpenReward.ai or HF Spaces. Output is a runnable `<env_dir>/ors/` folder with `server.py`, `tasks.py`, `pyproject.toml`, `Dockerfile.spaces`, and `rollout.py`. Use for prompts like "wrap my env in ORS", "make an OpenReward env for X", or "add per-call reward to my env".
### generate-nemo-gym-env
Builds a NeMo Gym (NVIDIA) variant of an RL environment. Use whenever someone asks to scaffold a NeMo Gym Resources Server, port an existing env to NeMo Gym, expose tools as `app.post()` endpoints with cookie-based sessions, add a post-episode `/verify` reward grader, or deploy a NeMo Gym env to HF Spaces. NeMo Gym is the right framework when the user wants HTTP+REST with cookie session handling, raw `requests`-driven rollouts (no SDK client), Ray-based orchestration, or NVIDIA NeMo / TRL training integration with a `responses_create_params` + `ground_truth` dataset format. Output is a runnable `<env_dir>/nemo_gym/` folder with `server.py`, `pyproject.toml`, `Dockerfile`, `configs/<env>.yaml`, and `rollout.py`. Use for prompts like "wrap my env in NeMo Gym", "make a NeMo resources server for X", or "add a post-episode grader to my env".
### rl-env-from-description
Turns a user's plain-English description of an RL training environment into runnable code across the four target frameworks — OpenEnv, OpenReward (ORS), Verifiers, and NeMo Gym. Use whenever someone describes an environment they want to build ("I want to train an agent that does X", "make an env where the model has to Y"), asks to scaffold a new env, asks to port an existing env to one of these frameworks, or asks how to design tools/rewards/state for a new env. Use even when the user does not explicitly say "RL environment" — descriptions like "agent that browses the web", "tool-calling agent for SQL", or "game-playing agent" all qualify. Drives the full flow — clarifying interview, env-name selection, shared-domain extraction, per-framework implementation, and rollout-based smoke tests.