benchmark-store

Pass

Audited by Gen Agent Trust Hub on Apr 8, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill implements a benchmarking and quality assurance system for other AI skills. It contains security-related datasets in data/red-team-guide.md and data/test-cases.yaml, including payloads for SQL injection and path traversal. These are documented test vectors used for evaluation and do not indicate malicious intent on the part of the skill itself.
  • [SAFE]: The scripts/benchmark_db.py utility uses parameterized SQL queries with placeholders, protecting the local benchmark database from SQL injection during data insertion and management.
  • [SAFE]: The interfaces/hidden_tests.py module uses Base64 encoding combined with a simple XOR cipher to obscure test-case inputs and expected outputs. Hiding held-out test data this way is a recognized practice in model evaluation to prevent data leakage and overfitting during training or evaluation cycles.
  • [SAFE]: All referenced external sources in data/evaluation-standards.md target well-known open-source repositories and community standards for AI skill development and security scanning, serving as educational and procedural references.
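The parameterized-query pattern noted for scripts/benchmark_db.py can be sketched as follows. This is a minimal illustration using Python's sqlite3 module; the table name, columns, and helper function are hypothetical, since the script's actual schema is not reproduced in this audit:

```python
import sqlite3

# Hypothetical schema for illustration; benchmark_db.py's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (skill TEXT, score REAL)")

def insert_result(skill_name: str, score: float) -> None:
    # The ? placeholders bind values separately from the SQL text,
    # so a hostile skill_name cannot change the statement's structure.
    conn.execute(
        "INSERT INTO results (skill, score) VALUES (?, ?)",
        (skill_name, score),
    )
    conn.commit()

# An injection attempt is stored as a literal string, not executed.
insert_result("demo'); DROP TABLE results;--", 0.9)
print(conn.execute("SELECT skill FROM results").fetchone()[0])
```

Because the driver treats bound parameters purely as data, the classic `'); DROP TABLE ...` payload above ends up as an ordinary row rather than a second statement.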
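The Base64-plus-XOR hiding technique attributed to interfaces/hidden_tests.py amounts to reversible obfuscation, not cryptographic secrecy: it keeps expected outputs from appearing verbatim in corpora or logs, but anyone with the key (or the code) can recover them. A minimal sketch, assuming a repeating-key XOR scheme (the key and function names here are illustrative, not taken from the module):

```python
import base64

KEY = b"benchmark"  # illustrative key; the module's actual key is not published

def obscure(plaintext: str, key: bytes = KEY) -> str:
    # XOR each byte against a repeating key, then Base64-encode the result
    # so the token is printable and safe to embed in YAML or source files.
    data = plaintext.encode()
    xored = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return base64.b64encode(xored).decode()

def reveal(token: str, key: bytes = KEY) -> str:
    # XOR is its own inverse, so revealing reuses the same key.
    xored = base64.b64decode(token)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(xored)).decode()

token = obscure("expected_output: 42")
assert reveal(token) == "expected_output: 42"
```

The round-trip property (`reveal(obscure(x)) == x`) is what makes the scheme usable at evaluation time, while the encoded form avoids leaking plaintext answers into training data.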
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 8, 2026, 03:25 AM