model-extraction-relu-logits

SKILL.md

Model Extraction for ReLU Networks

This skill provides guidance for extracting internal weight matrices from black-box ReLU neural networks using only input-output access.

Problem Understanding

Model extraction tasks typically involve:

A black-box neural network that accepts inputs and returns outputs (logits)
The goal of recovering internal parameters (weight matrices, biases)
No direct access to the network's implementation or internal state
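To make the setting concrete, here is a minimal sketch of what "input-output access only" can look like in code. Everything in it is an assumption for illustration: the helper name `make_black_box`, the one-hidden-layer architecture, the single logit, and the dimensions are hypothetical and are not taken from any actual task. The point is only that the extractor receives a `query` callable and nothing else.

```python
import numpy as np

def make_black_box(input_dim, hidden_dim, seed=None):
    """Build a hypothetical one-hidden-layer ReLU target and expose only `query`."""
    rng = np.random.default_rng(seed)
    # These parameters stay inside the closure; the extractor never reads them.
    W1 = rng.standard_normal((hidden_dim, input_dim))
    b1 = rng.standard_normal(hidden_dim)
    W2 = rng.standard_normal((1, hidden_dim))
    b2 = rng.standard_normal(1)

    def query(x):
        # Input vector in, logits out: the only access an extractor gets.
        x = np.asarray(x, dtype=float)
        return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

    return query

# Illustrative sizes only; a real target may differ in every respect.
query = make_black_box(input_dim=10, hidden_dim=20, seed=0)
logits = query(np.zeros(10))
```

Any extraction code written against `query` alone keeps working when the hidden parameters, sizes, or seed behind it change.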

Critical Principle: True Black-Box Treatment

Treat the target network as a genuine black box. Never rely on implementation details that may differ at evaluation time:

Do not hardcode hidden layer dimensions from example code
Do not assume specific random seeds or initialization schemes
Do not directly compare extracted weights to "true" weights read from source files
Do not expect the test environment to match the examples; it may use completely different parameters

Repository

letta-ai/skills

First Seen

Jan 24, 2026

Security Audits

Gen Agent Trust Hub: Pass