# generate-verifiers-env
Build the Verifiers variant of an env. Verifiers is in-process — no HTTP server, no Docker, no HF Space. The trainer (or a manual rollout) imports tool functions directly from env.py.
## Concept
PrimeIntellect Verifiers is a Python library — not a server framework. It provides vf.ToolEnv (multi-turn rollout), vf.Rubric (composable async graders), and adapters into TRL GRPOTrainer. The trainer or rollout owns the LLM client; the env owns the tools and the grader.
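This division of labor can be sketched without the library itself. The following is a hypothetical stand-in for vf.Rubric's composable async graders — the names and signatures here are illustrative, not the verifiers API — showing weighted async scoring of a finished trajectory:

```python
import asyncio

# Each grader is an async function scoring one finished trajectory;
# the rubric combines their scores with weights (illustrative only --
# the real vf.Rubric API may differ).
async def used_tool(trajectory):
    return 1.0 if any(m["role"] == "tool" for m in trajectory) else 0.0

async def correct_answer(trajectory):
    return 1.0 if "42" in trajectory[-1]["content"] else 0.0

async def score(trajectory, graders, weights):
    scores = await asyncio.gather(*(g(trajectory) for g in graders))
    return sum(w * s for w, s in zip(weights, scores))

trajectory = [
    {"role": "user", "content": "What is 6 * 7?"},
    {"role": "tool", "content": "42"},
    {"role": "assistant", "content": "The answer is 42."},
]
reward = asyncio.run(score(trajectory, [used_tool, correct_answer], [0.5, 0.5]))
print(reward)  # 1.0
```

Because the trainer owns the LLM client, graders like these only ever see the finished trajectory — they never call the model themselves.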
When the user has a shared domain module (<domain>.py) and wants a Verifiers variant, wrap it as a toolkit class plus standalone tool functions. Don't duplicate domain logic.
## Archetypes
| Archetype | Hallmarks |
|---|---|
| Pure-Python game | One @tool-style function, terminal reward via rubric checking the trajectory. |
| Stateful sandbox in-process | Toolkit owns the sandbox (E2B, browser); initialize() is lazy; cleanup() is mandatory in finally. |
| Vision env | Drive the toolkit manually (skip vf.ToolEnv since vision content blocks aren't first-class in verifiers' rollout). Send the screenshot in the user message each turn. |
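The stateful-sandbox archetype from the table can be sketched in plain Python. The dict below stands in for a real E2B or browser session, and all names are illustrative:

```python
class SandboxToolkit:
    """Stateful-sandbox archetype: lazy initialize(), cleanup() in finally."""
    def __init__(self):
        self._sandbox = None            # nothing allocated at construction

    def initialize(self):
        if self._sandbox is None:       # lazy: allocate on first use only
            self._sandbox = {"files": {}}  # stand-in for an E2B/browser session

    def write_file(self, name: str, content: str) -> str:
        self.initialize()
        self._sandbox["files"][name] = content
        return f"wrote {name}"

    def cleanup(self):
        self._sandbox = None            # stands in for sandbox.kill()

tk = SandboxToolkit()
try:
    out = tk.write_file("main.py", "print('hi')")
finally:
    tk.cleanup()                        # mandatory: never leak the sandbox
print(out)  # wrote main.py
```

Putting `cleanup()` in `finally` is what keeps a crashed rollout from leaking a live sandbox.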
## Two consumption paths (always provide both)
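The second path from the intro, a manual rollout, reduces to a loop: import the tool functions directly from env.py and drive them with whatever chat client you own. A toy sketch, assuming an OpenAI-style message list; `FakeClient` and the inline `guess` tool are stand-ins for the real client and the real `from env import guess`:

```python
def guess(word: str) -> str:           # imagine: from env import guess
    return "GGGGG" if word == "crane" else "....."

class FakeClient:
    """Stand-in for the trainer-owned LLM client."""
    def chat(self, messages):
        return {"tool": "guess", "args": {"word": "crane"}}

client = FakeClient()
messages = [{"role": "user", "content": "Play wordle: guess the secret word."}]
tools = {"guess": guess}
for _ in range(6):                     # episode cap
    call = client.chat(messages)       # model picks a tool call
    result = tools[call["tool"]](**call["args"])
    messages.append({"role": "tool", "content": result})
    if result == "GGGGG":              # terminal condition
        break
print(len(messages))  # 2
```

The first path, handing the same tool functions to vf.ToolEnv for training, reuses exactly these callables; only the driver changes.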
## More from adithya-s-k/rl_envs_101
### generate-openenv-env
Builds an OpenEnv (Meta) variant of an RL environment. Use whenever someone asks to scaffold an OpenEnv server, port an existing env to OpenEnv, add MCP tools to an env, or deploy an OpenEnv to HF Spaces. OpenEnv is the right framework when the user wants HTTP+MCP, structured tool calls discovered via `list_tools()`, an optional Gradio UI, sandbox-backed sessions, or deployment as a Docker container / HF Space. Output is a runnable `<env_dir>/openenv/` folder with `server/app.py`, `server/<env>_environment.py`, `pyproject.toml`, `Dockerfile`, and `rollout.py`. Use for prompts like "wrap my game in OpenEnv", "make an MCP env for X", or "add the openenv variant".
### generate-ors-env
Builds an Open Reward Standard (ORS) variant of an RL environment using the official `openreward` Python package. Use whenever someone asks to scaffold an ORS env, port to OpenReward, add per-tool-call rewards, deploy to OpenReward.ai, or wrap an existing env in the ORS protocol. ORS is the right framework when the user wants HTTP+REST+SSE, rewards arriving inline with each tool call (not post-episode), task-spec-driven sessions, splits (train/val/test), or deployment to OpenReward.ai or HF Spaces. Output is a runnable `<env_dir>/ors/` folder with `server.py`, `tasks.py`, `pyproject.toml`, `Dockerfile.spaces`, and `rollout.py`. Use for prompts like "wrap my env in ORS", "make an OpenReward env for X", or "add per-call reward to my env".
### generate-nemo-gym-env
Builds a NeMo Gym (NVIDIA) variant of an RL environment. Use whenever someone asks to scaffold a NeMo Gym Resources Server, port an existing env to NeMo Gym, expose tools as `app.post()` endpoints with cookie-based sessions, add a post-episode `/verify` reward grader, or deploy a NeMo Gym env to HF Spaces. NeMo Gym is the right framework when the user wants HTTP+REST with cookie session handling, raw `requests`-driven rollouts (no SDK client), Ray-based orchestration, or NVIDIA NeMo / TRL training integration with a `responses_create_params` + `ground_truth` dataset format. Output is a runnable `<env_dir>/nemo_gym/` folder with `server.py`, `pyproject.toml`, `Dockerfile`, `configs/<env>.yaml`, and `rollout.py`. Use for prompts like "wrap my env in NeMo Gym", "make a NeMo resources server for X", or "add a post-episode grader to my env".
### rl-env-from-description
Turns a user's plain-English description of an RL training environment into runnable code across the four target frameworks — OpenEnv, OpenReward (ORS), Verifiers, and NeMo Gym. Use whenever someone describes an environment they want to build ("I want to train an agent that does X", "make an env where the model has to Y"), asks to scaffold a new env, asks to port an existing env to one of these frameworks, or asks how to design tools/rewards/state for a new env. Use even when the user does not explicitly say "RL environment" — descriptions like "agent that browses the web", "tool-calling agent for SQL", or "game-playing agent" all qualify. Drives the full flow — clarifying interview, env-name selection, shared-domain extraction, per-framework implementation, and rollout-based smoke tests.