inference-server
Pass
Audited by Gen Agent Trust Hub on Apr 24, 2026
Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
- Command Execution: The skill uses the `uv run inference` command to start the server and manage SLURM jobs. This is standard practice for executing Python entrypoints in a controlled development environment.
- Local Network Operations: The instructions include using `curl` to test endpoints on `localhost:8000`. This is a routine procedure for verifying that a locally hosted service is responding correctly.
- System Infrastructure Integration: The skill provides templates and commands for SLURM scheduling, which is common in high-performance computing environments for managing large-scale inference tasks.
- Dynamic Server Management: The server exposes custom endpoints such as `/update_weights` and `/load_lora_adapter`. These are functional features designed to allow hot-reloading of model components during active development or reinforcement learning workflows.
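The local verification described above might look like the following sketch. The `/health` endpoint and the JSON payload shapes are assumptions for illustration; the audit only names the `/update_weights` and `/load_lora_adapter` endpoints, not their request formats:

```shell
# Basic liveness check against the locally hosted server
# (assumes a conventional /health route; not documented in the audit).
curl http://localhost:8000/health

# Hypothetical payload: hot-reload model weights from a checkpoint path.
curl -X POST http://localhost:8000/update_weights \
  -H "Content-Type: application/json" \
  -d '{"checkpoint_path": "/path/to/checkpoint"}'

# Hypothetical payload: load a LoRA adapter during an RL workflow.
curl -X POST http://localhost:8000/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "my_adapter", "lora_path": "/path/to/adapter"}'
```

These commands only touch `localhost`, which is consistent with the audit's finding that the skill performs no external network operations.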
Audit Metadata