---
name: deploy
description: Deploy applications to AWS (SageMaker, Amplify, EC2). Use this skill to deploy models, frontends, or manage infrastructure. Invoke with /deploy.
---
# AWS Deployment
This skill manages deployments to AWS services for the wc_simd project.
## SageMaker Endpoints

### Deploy Embedding Model

```bash
cd demos/timetrvlr/cdk
npm install
cdk deploy
```
Or manually:

```python
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://bucket/model.tar.gz",
    role="arn:aws:iam::xxx:role/SageMakerRole",
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="embedding-endpoint",
)
```
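Once the endpoint is `InService`, inference is a single `predict` call. A minimal sketch of building the request body; the `{"inputs": [...]}` payload shape is an assumption here, so match it to whatever your inference container's handler actually expects:

```python
import json

def build_payload(texts):
    # Assumed request shape; adjust to the container's input handler.
    return json.dumps({"inputs": list(texts)})

# With the predictor from the deploy step (requires the live endpoint):
# embeddings = predictor.predict(build_payload(["hello world"]))
```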
### Async Inference

For long-running inference (VLM embeddings):

```python
from sagemaker.async_inference import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://bucket/async-output/",
    max_concurrent_invocations_per_instance=4,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    async_inference_config=async_config,
)
```
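Async invocations return immediately and write the result to `output_path` later, so the caller has to poll for it. A small generic polling helper; `fetch` is a hypothetical callable (wire it to `s3.get_object` in practice) that raises `KeyError` until the result object exists:

```python
import time

def wait_for_result(fetch, output_key, timeout_s=600, interval_s=5.0):
    """Poll until the async result appears, then return its body."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            return fetch(output_key)
        except KeyError:
            time.sleep(interval_s)
    raise TimeoutError(f"no result at {output_key} after {timeout_s}s")
```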
### SageMaker Auto-Scaling & 504 Errors

**Common issue:** the endpoint returns 504 errors after periods of inactivity.

**Cause:** auto-scaling with `MinCapacity=0` scales down to zero instances. When a request comes in, the endpoint enters the `Updating` state while scaling back up (~5-10 min), and requests fail in the meantime.

Check the current scaling config:

```bash
aws application-autoscaling describe-scalable-targets \
  --service-namespace sagemaker \
  --resource-ids "endpoint/<ENDPOINT_NAME>/variant/AllTraffic" \
  --region eu-west-2
```
**Fix:** keep at least one instance running (prevents scale-to-zero):

```bash
aws application-autoscaling register-scalable-target \
  --service-namespace sagemaker \
  --resource-id "endpoint/<ENDPOINT_NAME>/variant/AllTraffic" \
  --scalable-dimension "sagemaker:variant:DesiredInstanceCount" \
  --min-capacity 1 \
  --max-capacity 1 \
  --region eu-west-2
```
Revert to scale-to-zero (saves costs when not in use):

```bash
aws application-autoscaling register-scalable-target \
  --service-namespace sagemaker \
  --resource-id "endpoint/<ENDPOINT_NAME>/variant/AllTraffic" \
  --scalable-dimension "sagemaker:variant:DesiredInstanceCount" \
  --min-capacity 0 \
  --max-capacity 1 \
  --region eu-west-2
```
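Client code can tell a cold-start failure apart from a genuinely broken endpoint by checking `EndpointStatus` (via `aws sagemaker describe-endpoint` or boto3) before retrying. A small sketch of that decision; the strings are SageMaker's documented `EndpointStatus` values:

```python
def retryable_after_scale_up(status):
    """True if the endpoint is merely scaling/updating and the
    request should be retried once it reaches InService."""
    return status in ("Creating", "Updating", "SystemUpdating")

# e.g. (requires AWS credentials):
# status = boto3.client("sagemaker").describe_endpoint(
#     EndpointName="<ENDPOINT_NAME>")["EndpointStatus"]
```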
**Cost note:** `ml.g4dn.xlarge` costs $0.526/hour (~$380/month) when running continuously.
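The monthly figure is just the hourly rate over a full month (24 h × 30 days):

```python
HOURLY_USD = 0.526            # ml.g4dn.xlarge hosting rate
monthly = HOURLY_USD * 24 * 30
print(f"~${monthly:.0f}/month")
```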
### Update SageMaker Endpoint with New Docker Image

After pushing a new image to ECR:

```bash
TIMESTAMP=$(date +%s)
NEW_MODEL_NAME="EmbeddingModel-$TIMESTAMP"
NEW_CONFIG_NAME="EmbeddingEndpointConfig-$TIMESTAMP"
ENDPOINT_NAME="EmbeddingEndpoint-u6w61sZPU1fj"
ECR_IMAGE="760097843905.dkr.ecr.eu-west-2.amazonaws.com/embed-inference:latest"

# 1. Create new model
aws sagemaker create-model \
  --model-name "$NEW_MODEL_NAME" \
  --primary-container Image=$ECR_IMAGE,Mode=SingleModel \
  --execution-role-arn "arn:aws:iam::760097843905:role/EmbeddingEndpointStack-EmbeddingModelExecutionRole3-AXtNk8S08NEo" \
  --region eu-west-2

# 2. Create new endpoint config
aws sagemaker create-endpoint-config \
  --endpoint-config-name "$NEW_CONFIG_NAME" \
  --production-variants VariantName=AllTraffic,ModelName=$NEW_MODEL_NAME,InitialInstanceCount=1,InstanceType=ml.g4dn.xlarge,InitialVariantWeight=1,ContainerStartupHealthCheckTimeoutInSeconds=600 \
  --async-inference-config "ClientConfig={MaxConcurrentInvocationsPerInstance=1},OutputConfig={S3OutputPath=s3://embeddingendpointstack-asyncoutputbucketea73fa4d-gsaebf9dvszc/results/,S3FailurePath=s3://embeddingendpointstack-asyncoutputbucketea73fa4d-gsaebf9dvszc/failures/}" \
  --region eu-west-2

# 3. Update endpoint (takes 5-10 min)
aws sagemaker update-endpoint \
  --endpoint-name "$ENDPOINT_NAME" \
  --endpoint-config-name "$NEW_CONFIG_NAME" \
  --region eu-west-2

# 4. Wait for update
watch -n 30 "aws sagemaker describe-endpoint --endpoint-name $ENDPOINT_NAME --region eu-west-2 --query 'EndpointStatus' --output text"
```
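The same rollout can be scripted with boto3. A hedged sketch: only the name derivation is exercised here, and the commented calls mirror the CLI steps above (arguments elided):

```python
import time

def versioned_names(ts=None):
    """Derive the timestamped model/config names used by the shell steps."""
    ts = int(time.time() if ts is None else ts)
    return f"EmbeddingModel-{ts}", f"EmbeddingEndpointConfig-{ts}"

model_name, config_name = versioned_names()
# sm = boto3.client("sagemaker", region_name="eu-west-2")
# sm.create_model(ModelName=model_name, ...)
# sm.create_endpoint_config(EndpointConfigName=config_name, ...)
# sm.update_endpoint(EndpointName=ENDPOINT_NAME, EndpointConfigName=config_name)
# sm.get_waiter("endpoint_in_service").wait(EndpointName=ENDPOINT_NAME)
```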
## AWS Amplify (Frontend)

### TimeTraveler Demo

```bash
cd demos/timetrvlr/amplify-cdk
npm install
cdk deploy
```

The CDK stack:

- Connects to the GitHub repository
- Sets up the build pipeline
- Configures a custom domain (optional)
- Deploys the Next.js/React frontend

### Manual Amplify Setup

```bash
amplify init
amplify add hosting
amplify publish
```
## EC2 Instances

### Start/Stop via Script

```bash
python aws/ec2_control.py start --name simd_gpu
python aws/ec2_control.py stop --name simd_gpu
```
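A minimal boto3 equivalent of what such a script typically does (the internals of `aws/ec2_control.py` are assumed, not shown; only the tag-filter construction is exercised here):

```python
def name_filter(name):
    """describe-instances filter matching the Name tag."""
    return [{"Name": "tag:Name", "Values": [name]}]

# ec2 = boto3.client("ec2", region_name="eu-west-2")
# ids = [i["InstanceId"]
#        for r in ec2.describe_instances(Filters=name_filter("simd_gpu"))["Reservations"]
#        for i in r["Instances"]]
# ec2.start_instances(InstanceIds=ids)   # or ec2.stop_instances(InstanceIds=ids)
```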
### Launch New Instance

Use the AWS Console or CLI:

```bash
aws ec2 run-instances \
  --image-id ami-xxx \
  --instance-type g5.xlarge \
  --key-name your-key \
  --security-group-ids sg-xxx \
  --iam-instance-profile Name=spark-docker-s3-profile
```
## S3 Data Management

### Upload Data

```bash
aws s3 sync data/ s3://bucket/data/
```

### Download Data

```bash
aws s3 sync s3://bucket/data/ data/
```
## RDS (Hive Metastore)

The production Spark stack uses RDS MySQL for the Hive metastore.

### Connect Manually

```bash
mysql -h <rds-endpoint> -u hive -p hive
```

### Initialize Schema

Set `INIT_HIVE_SCHEMA=true` in `spark_docker_s3/.env` on first run.
## CDK Stacks

| Stack | Location | Purpose |
|---|---|---|
| SparkDockerS3Stack | `spark_docker_s3/infra/` | S3 bucket, RDS, IAM roles |
| TimetrvlrStack | `demos/timetrvlr/cdk/` | SageMaker endpoint |
| AmplifyStack | `demos/timetrvlr/amplify-cdk/` | Frontend hosting |
### Deploy CDK Stack

```bash
cd <stack-directory>
npm install
cdk bootstrap   # First time only
cdk synth       # Preview
cdk deploy      # Deploy
```

### Destroy Stack

```bash
cdk destroy
```
## Environment Variables

Required in `.env`:

```bash
AWS_REGION=eu-west-2
S3_BUCKET=your-bucket
HIVE_METASTORE_HOST=rds-endpoint
HIVE_METASTORE_USER=hive
HIVE_METASTORE_PASSWORD=xxx
```

Load with:

```python
from dotenv import load_dotenv

load_dotenv()
```
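A small guard on top of `load_dotenv()` fails fast when a required setting is missing, instead of surfacing as a confusing connection error later. This is a hypothetical helper, not part of the project:

```python
import os

REQUIRED = ["AWS_REGION", "S3_BUCKET", "HIVE_METASTORE_HOST",
            "HIVE_METASTORE_USER", "HIVE_METASTORE_PASSWORD"]

def require_env(keys, env=None):
    """Return the required settings, raising on anything unset or empty."""
    env = os.environ if env is None else env
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing .env settings: {missing}")
    return {k: env[k] for k in keys}
```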