SentencePiece - Language-Independent Tokenization
Unsupervised tokenizer that works on raw text without language-specific preprocessing.
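A minimal usage sketch, assuming an already-trained model file (the path `m.model` here is hypothetical; the training sketch further down shows how to produce one): raw text goes in, subword pieces come out, with no pre-tokenization step, and decoding round-trips the input.

```python
import sentencepiece as spm

# Load a trained model; "m.model" is a placeholder path.
sp = spm.SentencePieceProcessor(model_file="m.model")

# Raw text in, subword pieces out -- no pre-tokenization step.
print(sp.encode("Hello world", out_type=str))
print(sp.encode("こんにちは世界", out_type=str))  # whitespace-free scripts work the same way

# Encoding to ids and decoding back round-trips the text.
ids = sp.encode("Hello world")  # default out_type is int
print(sp.decode(ids))           # -> "Hello world"
```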
When to use SentencePiece
Use SentencePiece when:
- You're building multilingual models (no language-specific rules)
- You're working with CJK languages (Chinese, Japanese, Korean)
- You need reproducible tokenization (deterministic vocabulary)
- You want to train on raw text (no pre-tokenization needed; see the training sketch after this list)
- You need lightweight deployment (~6MB memory, ~50k sentences/sec)
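A minimal training sketch, assuming a raw one-sentence-per-line corpus at a hypothetical path `corpus.txt`; the flag values are illustrative choices, not prescribed ones.

```python
import sentencepiece as spm

# Train directly on raw text; no language-specific pre-tokenization needed.
spm.SentencePieceTrainer.train(
    input="corpus.txt",         # hypothetical raw-text corpus, one sentence per line
    model_prefix="m",           # writes m.model and m.vocab
    vocab_size=8000,            # illustrative size; choose to fit your corpus
    model_type="unigram",       # "bpe", "char", and "word" are also supported
    character_coverage=0.9995,  # suits rich character sets (e.g. CJK); use 1.0 for small alphabets
)

# The resulting model file is self-contained, so loading it anywhere
# yields the same deterministic vocabulary.
sp = spm.SentencePieceProcessor(model_file="m.model")
print(sp.get_piece_size())  # 8000
```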
Performance:
- Speed: ~50,000 sentences/sec (see the benchmark sketch after this list)
- Memory: ~6MB for a loaded model
- Languages: all (language-independent)
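A rough throughput check you can run yourself, reusing the hypothetical `m.model` from the training sketch; real numbers depend on vocabulary size, sentence length, and hardware.

```python
import time

import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="m.model")  # hypothetical model from the training sketch

# The Python binding batch-encodes a list of strings in one call.
sentences = ["This is a short throughput test sentence."] * 50_000

start = time.perf_counter()
sp.encode(sentences)
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:,.0f} sentences/sec")
```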
Use alternatives instead: