tensorrt-llm

Installation

SKILL.md

TensorRT-LLM

NVIDIA's open-source library for optimizing LLM inference with state-of-the-art performance on NVIDIA GPUs.

When to use TensorRT-LLM

Use TensorRT-LLM when:

Deploying on NVIDIA GPUs (A100, H100, GB200)
Need maximum throughput (24,000+ tokens/sec on Llama 3)
Require low latency for real-time applications
Working with quantized models (FP8, INT4, FP4)
Scaling across multiple GPUs or nodes

Use vLLM instead when:

Need simpler setup and Python-first API
Want PagedAttention without TensorRT compilation
Working with AMD GPUs or non-NVIDIA hardware

Use llama.cpp instead when:

Related skills

More from firecrawl/ai-research-skills

Installs

5

Repository

firecrawl/ai-re…h-skills

GitHub Stars

5

First Seen

Mar 28, 2026

Security Audits

Gen Agent Trust HubPass