ingesting-into-data-lake


Ingest into Data Lake

Move data from a source into a queryable table in the data lake. This skill assumes the source connection (if one is needed) already exists. For Glue connection setup or troubleshooting, delegate to connecting-to-data-source.

Philosophy

Default to S3 Tables unless the environment says otherwise. S3 Tables is the recommended target for new data lake work. If the user's catalog inventory shows they haven't adopted S3 Tables, recommend standard Iceberg on their existing general-purpose bucket instead of forcing them to change posture.
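The default-target rule above can be sketched as a small decision function. This is only an illustration: the names recommend_target, table_bucket_arns, and existing_iceberg_bucket are hypothetical, and treating "no table buckets but an existing Iceberg bucket" as the signal that S3 Tables has not been adopted is an assumption about how the catalog inventory is read.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TargetRecommendation:
    target: str  # "s3_tables" or "iceberg_general_purpose"
    reason: str


def recommend_target(
    table_bucket_arns: Sequence[str],
    existing_iceberg_bucket: Optional[str],
) -> TargetRecommendation:
    """Pick an ingestion target from a (hypothetical) catalog inventory.

    table_bucket_arns: S3 table buckets found in the account, if any.
    existing_iceberg_bucket: a general-purpose bucket that already holds
        Iceberg tables, or None if there is none.
    """
    if table_bucket_arns:
        # The user already uses S3 Tables, so stay on the default.
        return TargetRecommendation("s3_tables", "S3 Tables already adopted")
    if existing_iceberg_bucket:
        # Assumption: existing Iceberg tables and no table buckets means
        # the user has not adopted S3 Tables, so keep their posture.
        return TargetRecommendation(
            "iceberg_general_purpose",
            f"existing Iceberg tables in {existing_iceberg_bucket}",
        )
    # Nothing in the environment says otherwise: use the default.
    return TargetRecommendation("s3_tables", "default for new data lake work")
```

In practice the two inputs would come from whatever inventory step the agent has already run; the function only encodes the preference order.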

Common Tasks

You MUST execute commands using AWS MCP server tools when connected -- they provide validation, sandboxed execution, and audit logging. Fall back to AWS CLI only if MCP is unavailable. You MUST explain each step before executing.

Workflow

1. Verify Dependencies and Context

  • You MUST check whether AWS MCP tools or the AWS CLI are available, and inform the user if neither is
  • You MUST confirm target AWS region and verify credentials with aws sts get-caller-identity
  • For SageMaker Unified Studio project roles, note that target tables and connections may be scoped to the project. See the caller ARN detection pattern in querying-data-lake.
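For the CLI fallback path, the checks in this step could look roughly like the sketch below. It assumes the AWS CLI is the tool in use (MCP tool discovery is not shown), and the helper names aws_cli_available and get_caller_identity are hypothetical. The caller ARN detection pattern from querying-data-lake is not reproduced here; the sketch only returns the ARN so that pattern can be applied to it.

```python
import json
import shutil
import subprocess


def aws_cli_available(which=shutil.which) -> bool:
    """Return True if an `aws` executable is on PATH."""
    return which("aws") is not None


def get_caller_identity(region: str, run=subprocess.run) -> dict:
    """Verify credentials by running `aws sts get-caller-identity`.

    Returns the parsed JSON response (Account, Arn, UserId).
    Raises RuntimeError if the CLI reports a failure.
    """
    result = run(
        ["aws", "sts", "get-caller-identity",
         "--region", region, "--output", "json"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(f"Credential check failed: {result.stderr.strip()}")
    return json.loads(result.stdout)


if __name__ == "__main__":
    if not aws_cli_available():
        print("AWS CLI not found; tell the user before continuing.")
    else:
        identity = get_caller_identity("us-east-1")  # region is an example
        print(f"Account {identity['Account']}, caller {identity['Arn']}")
```

The `which` and `run` parameters exist only so the checks can be exercised without real credentials; normal callers can ignore them.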
First Seen
May 6, 2026
ingesting-into-data-lake — aws/agent-toolkit-for-aws