helsinki-nlp-model-training
Helsinki-NLP Model Training
Overview
Train and fine-tune Helsinki-NLP OPUS-MT models for Chuukese-English translation. These models are specifically designed for translation tasks and perform better than general LLMs for low-resource languages like Chuukese.
Base Models:
Helsinki-NLP/opus-mt-mul-en(Multilingual → English)Helsinki-NLP/opus-mt-en-mul(English → Multilingual)
Capabilities
- Fine-tuning: Adapt base models to Chuukese-specific data
- Bidirectional Training: Support for both Chuukese→English and English→Chuukese
- Training Data Preparation: Format dictionary and parallel corpus data
- Model Evaluation: BLEU, chrF scoring on test sets
- Local Deployment: Run models locally without API calls
- GPU/CPU Support: Automatic device detection and optimization
More from findinfinitelabs/chuuk
large-document-processing
Process large documents (200+ pages) with structure preservation, intelligent parsing, and memory-efficient handling. Also covers intelligent text chunking for AI training and RAG systems. Use when working with complex formatted documents, multi-level hierarchies, or when splitting large content for AI pipelines.
28python-venv-management
Automatically manage Python virtual environments (.venv) in terminal commands. Always activate .venv before running Python/pip commands. Supports macOS, Linux, and Windows with shell-aware activation. Use when executing Python scripts, installing packages, or running development servers. Critical for consistent environment management.
14bible-epub-processing
Parse and extract structured content from Bible EPUBs (NWT) for parallel text alignment between Chuukese and English. Use when working with Bible data, verse extraction, parallel corpus building, or generating training data from Scripture.
14security-environment-standards
Security and environment configuration standards for web applications, including environment variable management, secure coding practices, and production deployment security. Use when setting up environments, configuring security, or deploying applications.
13intelligent-text-chunking
Split large texts into meaningful, AI-optimized chunks while preserving semantic coherence and document structure. Covered by the large-document-processing skill — see that skill for full details.
13document-ocr-processing
Process scanned documents and images containing Chuukese text using OCR with specialized post-processing for accent characters and traditional formatting. Use when working with scanned books, documents, or images that contain Chuukese text that needs to be digitized.
12