hugging-face-datasets

Summary

Create, query, and manage datasets on Hugging Face Hub with SQL-based transformation and streaming updates.

  • Initialize new dataset repositories with template-based schemas (chat, classification, QA, completion, tabular) and custom system prompts
  • Query any Hugging Face dataset using DuckDB SQL via the hf:// protocol, including filtering, aggregations, joins, and regex operations
  • Stream rows efficiently without downloading entire datasets, with JSON validation and batch processing for large uploads
  • Export query results locally (Parquet, JSONL) or push transformed subsets directly to new Hub repositories with optional privacy settings
  • Designed to complement the HF MCP server: use MCP for discovery and metadata, use this skill for creation, editing, and data transformation
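
As a rough illustration of the hf:// querying mentioned above, the sketch below builds a DuckDB query against Parquet files in a Hub dataset repository. It is not this skill's own code: the helper names (hf_dataset_url, build_query, run_query) are hypothetical, and it assumes the duckdb Python package is installed and the target repository actually contains Parquet files.

```python
from typing import Optional


def hf_dataset_url(repo_id: str, path_glob: str = "**/*.parquet",
                   revision: Optional[str] = None) -> str:
    """Build an hf:// URL for files inside a Hub dataset repository.

    Passing revision="~parquet" targets the Hub's auto-converted Parquet branch.
    """
    repo = f"{repo_id}@{revision}" if revision else repo_id
    return f"hf://datasets/{repo}/{path_glob}"


def build_query(url: str, where: Optional[str] = None, limit: int = 10) -> str:
    """Compose a simple filtered SELECT over the files matched by the URL."""
    safe_url = url.replace("'", "''")  # escape single quotes in the SQL literal
    sql = f"SELECT * FROM '{safe_url}'"
    if where:
        sql += f" WHERE {where}"
    return f"{sql} LIMIT {limit}"


def run_query(sql: str) -> list:
    """Execute the query with DuckDB (needs network access to the Hub)."""
    import duckdb  # imported lazily so the helpers above work without it
    return duckdb.sql(sql).fetchall()
```

For example, run_query(build_query(hf_dataset_url("username/my-dataset"), where="label = 1")) would fetch up to ten matching rows, where username/my-dataset and the label column are placeholders rather than a real repository.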
SKILL.md

Overview

This skill provides tools to manage datasets on the Hugging Face Hub with a focus on creation, configuration, content management, and SQL-based data manipulation. It is designed to complement the existing Hugging Face MCP server by providing dataset editing and querying capabilities.

Integration with HF MCP Server

  • Use HF MCP Server for: Dataset discovery, search, and metadata retrieval
  • Use This Skill for: Dataset creation, content editing, SQL queries, data transformation, and structured data formatting
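
The content-editing side can be pictured as validating JSON rows and grouping them into batches before upload. The following is a minimal sketch under assumptions, not the skill's implementation: the REQUIRED_KEYS set is a hypothetical stand-in for a chat-style schema, and the function names are illustrative only.

```python
import json
from typing import Dict, Iterable, Iterator, List, Set

# Hypothetical minimal schema for a chat-style row; the skill's real
# templates may require different or additional fields.
REQUIRED_KEYS: Set[str] = {"messages"}


def parse_valid_rows(lines: Iterable[str],
                     required: Set[str] = REQUIRED_KEYS) -> Iterator[Dict]:
    """Yield rows that parse as JSON objects and contain every required key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of aborting the stream
        if isinstance(row, dict) and required.issubset(row):
            yield row


def batched(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group rows into lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Because both functions are generators, rows are processed one at a time, so a large JSONL file never has to be held in memory at once.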

Version

2.1.0

Dependencies

This skill uses PEP 723 scripts, which declare their dependencies in an inline metadata block at the top of each file.

Scripts install their requirements automatically when run with: uv run scripts/script_name.py

  • uv (Python package manager)
  • Getting Started: See "Usage Instructions" below for PEP 723 usage
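
For readers unfamiliar with PEP 723, the sketch below shows what such a script might look like. The file name count_rows.py, its duckdb dependency, and its functions are hypothetical and are not among this skill's scripts; only the comment-block format at the top follows the PEP.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "duckdb",
# ]
# ///
"""Hypothetical example (count_rows.py): count rows in a Parquet source."""
import sys


def build_count_query(source: str) -> str:
    """Return a COUNT(*) query over a Parquet path or hf:// URL."""
    safe = source.replace("'", "''")  # escape single quotes in the SQL literal
    return f"SELECT COUNT(*) AS n FROM '{safe}'"


def main(args: list[str]) -> None:
    if not args:
        print("usage: uv run count_rows.py <parquet-path-or-hf-url>")
        return
    import duckdb  # installed by uv from the metadata block above
    print(duckdb.sql(build_count_query(args[0])).fetchone()[0])


if __name__ == "__main__":
    main(sys.argv[1:])
```

When a file like this is run with uv run, uv reads the commented metadata block, creates an isolated environment containing the listed dependencies, and then executes the script, so no separate install step is needed.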

Core Capabilities

Installs: 368
GitHub Stars: 10.5K
First Seen: Jan 20, 2026