generate-nemo-gym-env
Build the NeMo Gym variant of an env. NeMo Gym is NVIDIA's RL gym layer, optimized for Ray-based orchestration and post-episode grading. The Python package is `nemo_gym` (installed via `pip install git+https://github.com/NVIDIA-NeMo/Gym`).
Concept
NeMo Gym is NVIDIA's RL gym layer for LLM agents. It's built on Ray and ships a FastAPI-based `SimpleResourcesServer` that exposes one `POST /<tool>` endpoint per tool, plus the standard `/seed_session` (cookie-based session bootstrap) and `/verify` (post-episode grader). Reference docs: docs.nvidia.com/nemo/gym/latest.
When the user has a shared domain module (`<domain>.py`) and wants a NeMo Gym variant, wrap it. Don't duplicate logic.
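To make the protocol concrete, here is a minimal sketch of the endpoint shape described above, written against plain FastAPI rather than the real `SimpleResourcesServer` base class (whose exact signatures aren't shown here); the `GuessGame` domain logic is a hypothetical stand-in for the user's shared `<domain>.py`:

```python
# Minimal FastAPI sketch of the endpoint shape: one POST per tool, plus
# /seed_session and /verify. This is NOT the real nemo_gym
# SimpleResourcesServer API -- it only illustrates the protocol.
import uuid

from fastapi import Cookie, FastAPI, Response
from pydantic import BaseModel

app = FastAPI()


# In the real variant this logic lives in the shared <domain>.py and is
# imported, never duplicated; it's inlined here only to keep the sketch
# self-contained.
class GuessGame:
    def __init__(self, secret: str = "nemo") -> None:
        self.secret = secret

    def guess(self, word: str) -> str:
        return "correct" if word == self.secret else "try again"


sessions: dict[str, GuessGame] = {}  # per-session state, keyed by cookie


class GuessRequest(BaseModel):
    guess: str


class VerifyRequest(BaseModel):
    response: str
    ground_truth: str


@app.post("/seed_session")
def seed_session(response: Response) -> dict:
    # Cookie-based session bootstrap: mint an id and stash fresh game state.
    session_id = str(uuid.uuid4())
    sessions[session_id] = GuessGame()
    response.set_cookie("session_id", session_id)
    return {"ok": True}


@app.post("/guess")  # one POST endpoint per tool
def guess(req: GuessRequest, session_id: str = Cookie(...)) -> dict:
    # Delegate to the domain object instead of re-implementing game logic.
    return {"feedback": sessions[session_id].guess(req.guess)}


@app.post("/verify")
def verify(req: VerifyRequest) -> dict:
    # Post-episode grader: substring match against ground_truth.
    return {"reward": 1.0 if req.ground_truth in req.response else 0.0}
```

Run it with `uvicorn server:app` and call `/seed_session` first so the cookie exists before any tool call.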
Archetypes
| Archetype | Hallmarks |
|---|---|
| Pure-Python game | Single tool endpoint; `/verify` does a substring match against `ground_truth`. |
| Stateful sandbox | Per-session sandbox in `self.sessions`; lazy init on the first tool call (see the sketch after this table). |
| Vision / computer-use | One endpoint per action; `/verify` rewards trajectories that called `terminate(success)`. |
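For the stateful-sandbox archetype, the key move is lazy initialization: `/seed_session` only registers the session, and the sandbox itself is created when the first tool call arrives. A minimal sketch of that pattern, with `Sandbox` as a hypothetical stand-in for whatever the shared domain module provides:

```python
# Lazy per-session sandbox init (stateful-sandbox archetype). Sandbox is a
# hypothetical stand-in for whatever the shared domain module provides.
class Sandbox:
    def run(self, cmd: str) -> str:
        return f"ran: {cmd}"  # placeholder for real sandboxed execution


class StatefulResources:
    def __init__(self) -> None:
        self.sessions: dict[str, Sandbox] = {}

    def _sandbox(self, session_id: str) -> Sandbox:
        # Create the sandbox on the first tool call for this session, so
        # /seed_session stays cheap and no sandbox is spun up for sessions
        # that never issue a tool call.
        if session_id not in self.sessions:
            self.sessions[session_id] = Sandbox()
        return self.sessions[session_id]

    def exec_tool(self, session_id: str, cmd: str) -> str:
        return self._sandbox(session_id).run(cmd)
```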
Recommended file layout
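By analogy with the sibling variants listed under "More from" below, each of which emits a runnable `<env_dir>/<framework>/` folder, a plausible layout looks like this; the exact file names are assumptions, not prescribed by the NeMo Gym docs:

```
<env_dir>/nemo_gym/
├── server.py        # SimpleResourcesServer subclass wrapping the shared <domain>.py
├── rollout.py       # smoke-test rollout against the running server
└── pyproject.toml   # pins nemo_gym from the GitHub repo
```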
More from adithya-s-k/rl_envs_101
generate-verifiers-env
Builds a Verifiers (PrimeIntellect) variant of an RL environment. Use whenever someone asks to scaffold a Verifiers env, port to Verifiers, build an in-process toolkit, set up a `vf.ToolEnv` with a Rubric, or wire up a TRL `GRPOTrainer` rollout. Verifiers is the right framework when the user wants in-process tools (no HTTP server), structured tool calling driven by plain Python functions, composable reward rubrics with multiple grader functions, fast iteration with no Docker, or the cleanest path from prototype to TRL training. Output is a runnable `<env_dir>/verifiers/` folder with `env.py` (toolkit + standalone tool functions + `create_verifiers_env`), `rollout.py`, and `pyproject.toml`. Use for prompts like "make a verifiers env for X", "wrap my game in verifiers", or "set up a vf.ToolEnv".
generate-openenv-env
Builds an OpenEnv (Meta) variant of an RL environment. Use whenever someone asks to scaffold an OpenEnv server, port an existing env to OpenEnv, add MCP tools to an env, or deploy an OpenEnv to HF Spaces. OpenEnv is the right framework when the user wants HTTP+MCP, structured tool calls discovered via `list_tools()`, an optional Gradio UI, sandbox-backed sessions, or deployment as a Docker container / HF Space. Output is a runnable `<env_dir>/openenv/` folder with `server/app.py`, `server/<env>_environment.py`, `pyproject.toml`, `Dockerfile`, and `rollout.py`. Use for prompts like "wrap my game in OpenEnv", "make an MCP env for X", or "add the openenv variant".
generate-ors-env
Builds an Open Reward Standard (ORS) variant of an RL environment using the official `openreward` Python package. Use whenever someone asks to scaffold an ORS env, port to OpenReward, add per-tool-call rewards, deploy to OpenReward.ai, or wrap an existing env in the ORS protocol. ORS is the right framework when the user wants HTTP+REST+SSE, rewards arriving inline with each tool call (not post-episode), task-spec-driven sessions, splits (train/val/test), or deployment to OpenReward.ai or HF Spaces. Output is a runnable `<env_dir>/ors/` folder with `server.py`, `tasks.py`, `pyproject.toml`, `Dockerfile.spaces`, and `rollout.py`. Use for prompts like "wrap my env in ORS", "make an OpenReward env for X", or "add per-call reward to my env".
rl-env-from-description
Turns a user's plain-English description of an RL training environment into runnable code across the four target frameworks — OpenEnv, OpenReward (ORS), Verifiers, and NeMo Gym. Use whenever someone describes an environment they want to build ("I want to train an agent that does X", "make an env where the model has to Y"), asks to scaffold a new env, asks to port an existing env to one of these frameworks, or asks how to design tools/rewards/state for a new env. Use even when the user does not explicitly say "RL environment" — descriptions like "agent that browses the web", "tool-calling agent for SQL", or "game-playing agent" all qualify. Drives the full flow — clarifying interview, env-name selection, shared-domain extraction, per-framework implementation, and rollout-based smoke tests.