predicting-improving-test-time-scaling
Predicting and Improving Test-Time Scaling via Reward Tail-Guided Search
This skill enables Claude to implement and apply Scaling-Law Guided (SLG) Search, a test-time compute optimization algorithm that replaces naive best-of-N sampling with principled budget allocation. Instead of uniformly sampling N candidates and picking the best, SLG Search fits a Generalized Pareto Distribution (GPD) to the tail of observed rewards, predicts how much improvement additional compute would yield per candidate path, and concentrates remaining budget on the most promising intermediate states. This achieves the same reward quality as best-of-N but with polynomially less compute.
When to Use
- When the user wants to optimize how an LLM allocates test-time compute across multiple candidate solutions (e.g., math reasoning, code generation, planning)
- When building a best-of-N sampling pipeline and wanting principled guidance on how large N should be or how to split budget across stages
- When implementing a search algorithm over LLM outputs that needs to decide which partial solutions to expand further
- When the user asks to predict LLM scaling behavior without running exhaustive evaluations
- When designing multi-stage reasoning pipelines where intermediate states compete for compute budget
- When implementing reward-model-guided tree search and needing a principled node selection criterion beyond simple reward ranking
Key Technique
The Problem with Best-of-N: Vanilla best-of-N (BoN) generates N independent samples, scores them with a reward model, and returns the highest-scoring one. This is wasteful because it spends the same compute on every sample path. Some intermediate states lead to reward distributions with heavy tails (high upside potential), while others have light tails (diminishing returns). BoN cannot distinguish between the two.
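For reference, the baseline described above can be sketched in a few lines of Python. This is a minimal illustration only: `generate` and `reward` are hypothetical callables standing in for an LLM sampler and a reward model, and neither name comes from the original text.

```python
import math
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[], str],
    reward: Callable[[str], float],
    n: int,
) -> Tuple[str, float]:
    """Draw n independent candidates and return the highest-scoring one.

    `generate` and `reward` are placeholders (assumptions) for a sampler
    and a reward model. Every draw costs the same, whatever its promise.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    best_candidate, best_score = "", -math.inf
    for _ in range(n):
        candidate = generate()
        score = reward(candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score
```

Note that the loop never looks at intermediate states or at the shape of the reward distribution, which is the limitation the tail-guided approach targets.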
Tail-Guided Prediction: SLG Search estimates the tail behavior of rewards at each intermediate state by fitting a Generalized Pareto Distribution to the reward samples that exceed a high quantile threshold. The GPD shape parameter (xi) captures whether the reward distribution is heavy-tailed (xi > 0, power-law decay, meaning more compute will likely find much better solutions) or light-tailed (xi near 0, exponential decay, meaning returns diminish quickly). From only m pilot samples at each state, with m small, the algorithm predicts the expected maximum reward achievable with a budget of N samples via the scaling law: V(s, N) ~ baseline + scale * N^(xi). This prediction replaces exhaustive evaluation at each candidate budget.
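The fit-then-predict step can be sketched as follows. This is an illustration under stated assumptions, not the estimator used by SLG Search: it fits the GPD with a simple method-of-moments estimator (valid only for xi < 0.5) and predicts with the standard GPD return-level formula, the reward level exceeded about once in N draws. The function names, the `tail_fraction` parameter, and its default are all hypothetical.

```python
import math
import statistics


def fit_gpd_tail(rewards, tail_fraction=0.25):
    """Method-of-moments GPD fit to the excesses over a high threshold.

    Returns (threshold, sigma, xi, tail_prob). Assumption: xi < 0.5, so
    the variance of the excesses is finite and the estimator is defined.
    """
    ordered = sorted(rewards)
    if len(ordered) < 3:
        raise ValueError("need at least 3 pilot rewards")
    # k = number of samples treated as the tail (at least 2, at most all but one)
    k = min(len(ordered) - 1, max(2, int(len(ordered) * tail_fraction)))
    threshold = ordered[-k - 1]  # largest reward not in the tail
    excesses = [r - threshold for r in ordered[-k:]]
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    if mean <= 0 or var <= 0:
        raise ValueError("degenerate tail: excesses have no spread")
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # shape
    sigma = 0.5 * mean * (ratio + 1.0)  # scale
    return threshold, sigma, xi, k / len(ordered)


def predict_max_reward(threshold, sigma, xi, tail_prob, n):
    """Reward level exceeded about once in n samples under the fitted GPD."""
    m = n * tail_prob  # expected number of tail exceedances among n samples
    if m <= 1:
        return threshold  # budget too small to reach past the threshold
    if abs(xi) < 1e-9:
        return threshold + sigma * math.log(m)  # exponential-tail limit
    return threshold + sigma / xi * (m ** xi - 1.0)
```

For xi > 0 the prediction grows like N^(xi), matching the form of the scaling law above, and as xi approaches 0 it flattens to logarithmic growth. Comparing `predict_max_reward` across intermediate states at the same N is one possible way to rank them for further expansion.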