model-extraction-relu-logits

Model Extraction for ReLU Networks

This skill provides guidance for extracting internal weight matrices from black-box ReLU neural networks using only input-output access.

Problem Understanding

Model extraction tasks typically involve:

  • A black-box neural network that accepts inputs and returns outputs (logits)
  • The goal of recovering internal parameters (weight matrices, biases)
  • No direct access to the network's implementation or internal state
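
For local experiments it can help to model this setup explicitly. The minimal Python sketch below assumes a target with one hidden ReLU layer, f(x) = A2·ReLU(A1·x + b1) + b2; the skill itself does not fix an architecture. `make_hidden_relu_net`, `CountingOracle`, and the sizes used (4 inputs, 8 hidden units, 3 logits) are hypothetical stand-ins. The weights live only inside a closure, so extraction code that receives `oracle` can observe input/logit pairs and nothing else, and the query count doubles as a cost measure.

```python
import random
from typing import Callable, List

Vector = List[float]


def make_hidden_relu_net(in_dim: int, hidden: int, out_dim: int,
                         seed: int) -> Callable[[Vector], Vector]:
    """Hypothetical toy target: f(x) = A2 @ relu(A1 @ x + b1) + b2.

    Only the query function is returned; the Gaussian weights stay inside the
    closure, mimicking a network whose internals cannot be inspected.
    """
    rng = random.Random(seed)
    a1 = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    a2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(out_dim)]
    b2 = [rng.gauss(0, 1) for _ in range(out_dim)]

    def query(x: Vector) -> Vector:
        # Hidden layer: ReLU of an affine map.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(a1, b1)]
        # Output layer: affine map, returned as raw logits.
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(a2, b2)]

    return query


class CountingOracle:
    """Expose only input -> logits, and count queries as the attack's cost."""

    def __init__(self, query_fn: Callable[[Vector], Vector]) -> None:
        self._query_fn = query_fn
        self.num_queries = 0

    def __call__(self, x: Vector) -> Vector:
        self.num_queries += 1
        return list(self._query_fn(x))


oracle = CountingOracle(make_hidden_relu_net(in_dim=4, hidden=8, out_dim=3, seed=0))
logits = oracle([0.5, -1.0, 2.0, 0.0])
print(len(logits), oracle.num_queries)  # 3 1
```

Extraction routines written against `oracle` alone should carry over unchanged when the evaluator swaps in a network of a different size.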

Critical Principle: True Black-Box Treatment

Treat the target network as a genuine black-box. Never rely on implementation details that may change during evaluation:

  • Do not hardcode hidden layer dimensions from example code
  • Do not assume specific random seeds or initialization schemes
  • Do not directly compare extracted weights to "true" weights read from source files
  • The test environment may use completely different parameters (layer sizes, weights, seeds) from any provided examples
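
One way to follow the first bullet is to measure the hidden width from queries instead of reading it from example code. The sketch below assumes a single hidden ReLU layer (the skill does not say so). Along a line x0 + t·d the network's output is piecewise linear in t, and each hidden neuron's pre-activation is affine in t, so a sufficiently long line crosses each neuron's boundary exactly once and produces one slope change (a kink) there. `find_kinks` bisects for those kinks using only oracle calls; `estimate_hidden_width` repeats this on a few random lines and keeps the largest count. The search range, `eps`, `tol`, the toy target, and both helper names are illustrative assumptions.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Oracle = Callable[[Vector], Vector]


def find_kinks(oracle: Oracle, x0: Vector, d: Vector,
               lo: float = -100.0, hi: float = 100.0,
               eps: float = 1e-6, tol: float = 1e-9) -> List[float]:
    """Return approximate t values where t -> oracle(x0 + t*d) changes slope.

    An interval is 'bent' if the output at an interior probe departs from the
    straight line (chord) between the outputs at the interval's endpoints.
    """
    kinks: List[float] = []

    def f(t: float) -> Vector:
        return oracle([a + t * b for a, b in zip(x0, d)])

    def bent(a: float, b: float, f_a: Vector, f_b: Vector) -> bool:
        for frac in (1 / 3, 2 / 3):
            f_t = f(a + frac * (b - a))
            if any(abs(y - (ya + frac * (yb - ya))) > tol
                   for ya, yb, y in zip(f_a, f_b, f_t)):
                return True
        return False

    def search(a: float, b: float, f_a: Vector, f_b: Vector) -> None:
        # Invariant: [a, b] is known to be bent, so it holds at least one kink.
        m = (a + b) / 2
        if b - a < eps:
            kinks.append(m)
            return
        f_m = f(m)
        left, right = bent(a, m, f_a, f_m), bent(m, b, f_m, f_b)
        if not left and not right:
            kinks.append(m)  # the slope change sits (almost) exactly at m
        if left:
            search(a, m, f_a, f_m)
        if right:
            search(m, b, f_m, f_b)

    f_lo, f_hi = f(lo), f(hi)
    if bent(lo, hi, f_lo, f_hi):
        search(lo, hi, f_lo, f_hi)
    return sorted(kinks)


def estimate_hidden_width(oracle: Oracle, in_dim: int,
                          num_lines: int = 3, seed: int = 0) -> int:
    """Count kinks along several random lines and keep the largest count.

    Assumes ONE hidden ReLU layer, so each neuron contributes one kink per line.
    """
    rng = random.Random(seed)
    best = 0
    for _ in range(num_lines):
        x0 = [rng.gauss(0, 1) for _ in range(in_dim)]
        g = [rng.gauss(0, 1) for _ in range(in_dim)]
        norm = math.sqrt(sum(v * v for v in g))
        d = [v / norm for v in g]  # unit direction
        best = max(best, len(find_kinks(oracle, x0, d)))
    return best


# Hypothetical toy target: 1 input, 3 hidden ReLUs (boundaries at x = 0.3,
# 1.7, -2.2), 2 logits. Extraction code never looks at these numbers.
w1, b1 = [1.0, -2.0, 0.5], [-0.3, 3.4, 1.1]
w2 = [[1.0, 1.0, 1.0], [2.0, -1.0, 0.5]]


def toy_oracle(x: Vector) -> Vector:
    h = [max(0.0, w * x[0] + b) for w, b in zip(w1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w2]


print(find_kinks(toy_oracle, [0.0], [1.0]))  # close to [-2.2, 0.3, 1.7]
print(estimate_hidden_width(toy_oracle, in_dim=1))
```

Two crossings closer than `eps` merge into one kink, and a neuron whose effect on every logit stays below `tol` goes unseen, which is why the maximum over several lines is taken. In deeper networks the boundaries are themselves bent, so a line can cross one neuron's boundary several times and the count no longer maps one-to-one onto neurons.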

Repository: letta-ai/skills
Installs: 32
GitHub Stars: 97
First Seen: Jan 24, 2026