multi-source-data-merger
Multi Source Data Merger
Overview
This skill guides the process of merging data from multiple sources with different formats into a unified dataset. It covers reading heterogeneous file formats, applying field name mappings, resolving conflicts using priority ordering, and generating comprehensive output files including conflict reports.
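A minimal sketch of the final output step follows. It assumes the merged data is a dict keyed by the merge key and that conflicts are collected as a list of dicts. The function write_outputs and the file names merged.json, merged.csv, and conflicts_report.json are hypothetical, so use whatever names and formats the task actually specifies.

```python
import csv
import json
from pathlib import Path
from typing import Any


def write_outputs(
    merged: dict[str, dict[str, Any]],
    conflicts: list[dict[str, Any]],
    out_dir: str,
) -> Path:
    """Write the unified dataset (JSON and CSV) plus a conflict report (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = list(merged.values())

    # JSON output: one object per merged record; default=str covers dates and similar values.
    (out / "merged.json").write_text(
        json.dumps(records, indent=2, default=str), encoding="utf-8"
    )

    # CSV needs one fixed header: the union of all fields, in first-seen order.
    fieldnames: list[str] = []
    for rec in records:
        for field in rec:
            if field not in fieldnames:
                fieldnames.append(field)
    with (out / "merged.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing fields become "" (restval)
        writer.writeheader()
        writer.writerows(records)

    # Conflict report: a summary count plus every individual conflict.
    report = {"total_conflicts": len(conflicts), "conflicts": conflicts}
    (out / "conflicts_report.json").write_text(
        json.dumps(report, indent=2, default=str), encoding="utf-8"
    )
    return out
```

Building the CSV header from the union of all fields matters because merged records from different sources rarely carry identical field sets.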
Workflow
Step 1: Analyze Requirements and Source Files
Before writing any code, thoroughly understand the task:
- Identify all source files and their formats (JSON, CSV, Parquet, XML, etc.)
- Determine the merge key (e.g., user_id, record_id) that links records across sources
- Review field mapping requirements: source fields may have different names that map to common output fields
- Understand conflict resolution rules - typically based on source priority ordering
- Identify expected output formats and structure
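The mapping and priority rules above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: FIELD_MAPPINGS, the source names, and merge_sources are hypothetical, and each source is assumed to be already loaded as a list of dicts. Sources are visited in priority order, so the first non-empty value for each field wins and every disagreeing lower-priority value is logged as a conflict.

```python
from typing import Any

# Hypothetical per-source field mappings: {source_name: {source_field: output_field}}.
FIELD_MAPPINGS: dict[str, dict[str, str]] = {
    "source_a": {"userId": "user_id", "fullName": "name", "mail": "email"},
    "source_b": {"user_id": "user_id", "name": "name", "email_address": "email"},
}


def apply_mapping(record: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Rename fields using `mapping`; unmapped fields keep their original names."""
    return {mapping.get(field, field): value for field, value in record.items()}


def merge_sources(
    sources: dict[str, list[dict[str, Any]]],
    priority: list[str],
    key: str = "user_id",
) -> tuple[dict[str, dict[str, Any]], list[dict[str, Any]]]:
    """Merge records on `key`; sources earlier in `priority` win conflicts."""
    merged: dict[str, dict[str, Any]] = {}
    origin: dict[tuple[str, str], str] = {}  # (key value, field) -> source that set it
    conflicts: list[dict[str, Any]] = []

    # Visit the highest-priority source first, so its values are set first.
    for source in priority:
        mapping = FIELD_MAPPINGS.get(source, {})
        for raw in sources.get(source, []):
            record = apply_mapping(raw, mapping)
            if record.get(key) is None:
                continue  # records without a merge key cannot be linked
            # Normalize the key: CSV yields "1" where JSON yields 1.
            record_key = str(record[key])
            record[key] = record_key
            target = merged.setdefault(record_key, {})
            for field, value in record.items():
                if value is None or value == "":
                    continue  # treat empty values as missing, not as conflicts
                if field not in target:
                    target[field] = value
                    origin[(record_key, field)] = source
                elif target[field] != value:
                    conflicts.append({
                        key: record_key,
                        "field": field,
                        "kept": target[field],
                        "kept_from": origin[(record_key, field)],
                        "discarded": value,
                        "discarded_from": source,
                    })
    return merged, conflicts
```

For example, merge_sources(sources, priority=["source_a", "source_b"]) keeps source_a's value whenever both sources supply different non-empty values for the same field. Skipping empty values lets a lower-priority source fill gaps without those gaps being reported as conflicts.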
Important: Do not attempt to read binary formats (Parquet, Excel, etc.) as text files; use an appropriate library instead (for example, pandas.read_parquet with pyarrow, or pandas.read_excel with openpyxl).
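One way to follow this rule is to dispatch on the file extension. In the sketch below, load_records is a hypothetical helper. It assumes JSON files hold a top-level array of objects, and it imports pandas only when a binary format is encountered (pandas.read_parquet needs pyarrow or fastparquet installed, and pandas.read_excel needs openpyxl for .xlsx). XML is left out because its record layout varies too much between sources to guess.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path
from typing import Any


def _records_from_frame(df: Any) -> list[dict[str, Any]]:
    """Convert a pandas DataFrame to dicts, turning NaN into None so gaps stay gaps."""
    return df.astype(object).where(df.notna(), None).to_dict(orient="records")


def load_records(path: str | Path) -> list[dict[str, Any]]:
    """Load a source file into a list of dicts, choosing a reader by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if not isinstance(data, list):
            raise ValueError(f"{path}: expected a top-level JSON array of records")
        return data
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".csv":
        # CSV values are always strings; normalize types during the merge.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix == ".parquet":
        import pandas as pd  # binary format: never read as text
        return _records_from_frame(pd.read_parquet(path))
    if suffix == ".xlsx":
        import pandas as pd  # binary format: never read as text
        return _records_from_frame(pd.read_excel(path))
    raise ValueError(f"Unsupported format: {suffix!r}")
```

Converting NaN to None at load time matters because NaN never compares equal to anything, so a later merge step would otherwise report missing Parquet or Excel values as conflicts.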