resilient-coding-agent
Resilient Coding Agent
Long-running coding agent tasks (Codex CLI, Claude Code, OpenCode, Pi) are vulnerable to interruption: orchestrator restarts, process crashes, network drops. This skill decouples the coding agent process from the orchestrator using tmux, and leverages agent-native session resume for recovery.
Placeholders: <task-name> and <project-dir> are filled in by the orchestrator. <task-name> may contain only lowercase letters, digits, and hyphens ([a-z0-9-]). <project-dir> must be an existing directory.
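The placeholder rules above could be enforced before any command is built. The following is a minimal sketch; `validate_task` is a hypothetical helper name, not part of the skill.

```bash
#!/usr/bin/env bash

# Hypothetical helper: check orchestrator-supplied values before they are used.
validate_task() {
  local task_name="$1" project_dir="$2"
  # Force byte-wise ranges so [a-z] cannot match uppercase letters in some locales.
  local LC_ALL=C
  # Reject an empty name or any character outside a-z, 0-9, and hyphen.
  case "$task_name" in
    ''|*[!a-z0-9-]*)
      echo "invalid task name: $task_name" >&2
      return 1
      ;;
  esac
  # The project directory must already exist.
  if [ ! -d "$project_dir" ]; then
    echo "not a directory: $project_dir" >&2
    return 1
  fi
}
```

For example, `validate_task fix-login-bug "$HOME/src/app"` succeeds only if the directory exists, while a name such as `Fix_Bug` or `a;b` is rejected.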
Temp directory: Each task uses a secure temp directory created with mktemp -d. Store this path as <tmpdir> and use it for all task files (prompt, events, session ID, done marker). Because mktemp -d creates the directory atomically, with an unpredictable name and owner-only permissions, this avoids predictable filenames, symlink attacks, and race conditions. Example: on macOS, TMPDIR=$(mktemp -d) produces something like /var/folders/xx/.../T/tmp.aBcDeFgH.
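A minimal sketch of the per-task directory setup follows. Only the `prompt` file name comes from this document; the other file names are illustrative assumptions.

```bash
#!/usr/bin/env bash
set -eu

# mktemp -d creates the directory atomically, with an unpredictable name
# and owner-only (0700) permissions.
TMPDIR=$(mktemp -d)

# Illustrative file names; only "prompt" is named in the text.
prompt_file="$TMPDIR/prompt"
events_file="$TMPDIR/events"
session_file="$TMPDIR/session-id"
done_file="$TMPDIR/done"

echo "task files live under: $TMPDIR"
```

Keeping every task file under one directory means a single path has to be recorded for recovery and a single path has to be removed at cleanup.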
Prompt safety: Task prompts are never interpolated into shell commands. Instead, write the prompt to a temp file using the orchestrator's write tool (no shell involved), then reference it with "$(cat $TMPDIR/prompt)" inside the tmux command. The shell neither word-splits nor re-parses the output of a double-quoted command substitution, so the prompt reaches the agent as a single literal argument regardless of any metacharacters it contains, which prevents injection. This depends on the orchestrator's write tool not invoking a shell; OpenClaw's built-in write tool meets this requirement.
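The quoting behavior described above can be checked without tmux or a real agent. In this sketch, `printf` stands in for the orchestrator's write tool and `count_args` is a hypothetical stand-in for an agent CLI; both are assumptions made for illustration.

```bash
#!/usr/bin/env bash
set -eu

TMPDIR=$(mktemp -d)

# Stand-in for the write tool: the hostile-looking text is stored verbatim.
# Single quotes keep $TMPDIR, $(...), and backticks literal here.
printf '%s\n' 'Fix the bug; $(touch "$TMPDIR/pwned") `id` && echo "hi" *' > "$TMPDIR/prompt"

# Hypothetical stand-in for an agent CLI: reports how many arguments it got.
count_args() { echo "$#"; }

# The double-quoted substitution is not word-split or re-parsed,
# so the whole prompt arrives as one argument.
argc=$(count_args "$(cat "$TMPDIR/prompt")")
echo "arguments received: $argc"
```

The embedded `$(touch ...)` is never executed, because the substitution output is treated as data rather than parsed again as shell syntax.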
Sensitive output: tmux scrollback and event log files may contain secrets or API keys from agent output. On shared machines, restrict permissions on the task files (chmod 600) and remove the temp directory after the task completes.
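One way to apply these precautions is sketched below. The `events` file name and the `cleanup_task` helper are hypothetical; in the real flow, cleanup would run only after the task has finished, not immediately as it does here.

```bash
#!/usr/bin/env bash
set -eu

TMPDIR=$(mktemp -d)

# Files created from here on are owner-only without a separate chmod.
umask 077
: > "$TMPDIR/events"            # hypothetical event log name

# Tighten explicitly as well, in case a file was created before the umask change.
chmod 600 "$TMPDIR/events"
mode=$(ls -l "$TMPDIR/events" | cut -c1-10)
echo "event log mode: $mode"

# Hypothetical helper: remove a task's temp directory once the task is done.
cleanup_task() {
  dir="$1"
  [ -d "$dir" ] || return 0     # nothing to do if it is already gone
  rm -rf -- "$dir"
}

cleanup_task "$TMPDIR"
```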
Prerequisites
This skill assumes the orchestrator is already configured to use coding agent CLIs (Codex, Claude Code, etc.) for coding tasks instead of native sessions. If the orchestrator is still using sessions_spawn for coding work, first configure it to prefer coding agents (e.g., via AGENTS.md or equivalent). See the coding-agent skill for setup.