Serverless Data Processing with AWS Lambda and Go
A practical guide to building scalable data processing with AWS Lambda and Go. Learn how to handle millions of DynamoDB records efficiently using event-driven architecture.
Why Serverless for Data Processing?
Processing millions of records daily doesn't mean you need servers running 24/7. AWS Lambda lets you run code only when needed, scale automatically, and pay only for what you use. Here's how to build a data processing pipeline that actually works.
The Architecture
Simple flow that scales:
DynamoDB Streams → EventBridge → Lambda (Go) → S3/DynamoDB
Why this stack:
- Lambda for zero-ops compute
- Go for fast execution and low memory
- DynamoDB Streams for change capture
- EventBridge for event routing
- S3 for cheap storage
Step 1: Set Up Your Project
Start with a clean Go project structure:
mkdir serverless-processor && cd serverless-processor
go mod init serverless-processor
mkdir -p cmd/processor internal/handler pkg/models
Project layout:
.
├── cmd/
│   └── processor/   # Lambda entry point
├── internal/
│   └── handler/     # Business logic
├── pkg/
│   └── models/      # Data structures
└── terraform/       # Infrastructure
Step 2: Write Your Lambda Handler
Keep it simple. Here's a basic Lambda handler in Go:
// cmd/processor/main.go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// Initialize AWS clients outside the handler so they are reused across invocations
var s3Client *s3.Client

func init() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	s3Client = s3.NewFromConfig(cfg)
}

func handler(ctx context.Context, event events.DynamoDBEvent) error {
	log.Printf("Processing %d records", len(event.Records))
	for _, record := range event.Records {
		// Process each record
		if err := processRecord(record); err != nil {
			log.Printf("Error processing record: %v", err)
			return err
		}
	}
	return nil
}

func processRecord(record events.DynamoDBEventRecord) error {
	// Extract data from the DynamoDB stream image
	newImage := record.Change.NewImage
	// Transform and process
	data := extractData(newImage)
	// Save to S3 or write back to DynamoDB
	return saveData(data)
}

func main() {
	lambda.Start(handler)
}
Key points:
- Initialize AWS clients in init() for connection reuse
- Handle errors explicitly
- Log what matters
- Keep functions small
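The handler above calls extractData and saveData, which are left for you to fill in. A minimal sketch of what they might look like — here the stream image is simplified to a map[string]string, and the Record shape and JSON output are assumptions, not the article's definitive implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is an assumed shape for the fields we extract from a stream image.
type Record struct {
	ID   string `json:"id"`
	Data string `json:"data"`
}

// extractData pulls the fields we care about out of a (simplified) stream image.
func extractData(image map[string]string) Record {
	return Record{ID: image["id"], Data: image["data"]}
}

// saveData serializes the record; in the real handler this payload
// would go to S3 via the client initialized in init().
func saveData(r Record) ([]byte, error) {
	return json.Marshal(r)
}

func main() {
	rec := extractData(map[string]string{"id": "123", "data": "test"})
	payload, err := saveData(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
}
```

In the real handler, the image is a map of events.DynamoDBAttributeValue, so each field needs a `.String()` (or typed accessor) call rather than a direct map lookup.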
Step 3: Connect to DynamoDB Streams
Enable streams on your DynamoDB table, then connect Lambda:
# Enable streams
aws dynamodb update-table \
--table-name my-table \
--stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
Your Lambda is triggered automatically when data changes. Configure the event source mapping (covered in the Terraform section), then process each batch in the handler:
// Lambda delivers batches of records.
// Process them in parallel with goroutines.
func handler(ctx context.Context, event events.DynamoDBEvent) error {
	results := make(chan error, len(event.Records))
	for _, record := range event.Records {
		go func(r events.DynamoDBEventRecord) {
			results <- processRecord(r)
		}(record)
	}
	// Collect results
	for range event.Records {
		if err := <-results; err != nil {
			return err
		}
	}
	return nil
}
Tips:
- Batch size of 100 works well
- Add a dead letter queue for failed events
- Use goroutines for parallel processing
- Set appropriate timeout (30-60s)
Step 4: Add EventBridge for Flexibility
EventBridge lets you trigger Lambda from multiple sources:
// Handle different event types
// Handle different event types.
// Requires "context", "encoding/json", and "fmt" in the import block.
func router(ctx context.Context, event json.RawMessage) error {
	var eventType struct {
		Source string `json:"source"`
	}
	if err := json.Unmarshal(event, &eventType); err != nil {
		return fmt.Errorf("unmarshal event: %w", err)
	}
	switch eventType.Source {
	case "aws.dynamodb":
		return handleDynamoDB(event)
	case "aws.s3":
		return handleS3(event)
	case "custom.scheduled":
		return handleScheduled(event)
	default:
		return fmt.Errorf("unknown source: %s", eventType.Source)
	}
}
Schedule periodic processing:
# Run every hour
aws events put-rule \
--name hourly-processing \
--schedule-expression "rate(1 hour)"
Step 5: Save Processed Data
Write to S3 for long-term storage:
// Requires "bytes" plus the aws and s3 packages from aws-sdk-go-v2.
func saveToS3(data []byte, key string) error {
	_, err := s3Client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String("my-processed-data"),
		Key:    aws.String(key),
		Body:   bytes.NewReader(data),
	})
	return err
}
Or back to DynamoDB for real-time queries:
// Requires the dynamodb client plus the types package from aws-sdk-go-v2.
func saveToDynamoDB(item map[string]types.AttributeValue) error {
	_, err := dynamoClient.PutItem(context.TODO(), &dynamodb.PutItemInput{
		TableName: aws.String("processed-data"),
		Item:      item,
	})
	return err
}
Storage tips:
- S3: partition by date, compress with gzip
- DynamoDB: use batch writes, set TTL for cleanup
- Add retry logic for transient failures
Step 6: Deploy with Terraform
Automate infrastructure with Terraform:
# terraform/main.tf
# terraform/main.tf
resource "aws_lambda_function" "processor" {
  filename      = "function.zip"
  function_name = "data-processor"
  role          = aws_iam_role.lambda_role.arn
  handler       = "bootstrap"       # provided runtimes run the binary named bootstrap
  runtime       = "provided.al2023" # the legacy go1.x runtime is deprecated
  timeout       = 60
  memory_size   = 1024
  environment {
    variables = {
      BUCKET_NAME = aws_s3_bucket.data.id
    }
  }
}

resource "aws_lambda_event_source_mapping" "dynamodb" {
  event_source_arn  = aws_dynamodb_table.source.stream_arn
  function_name     = aws_lambda_function.processor.arn
  starting_position = "LATEST"
  batch_size        = 100
}

resource "aws_s3_bucket" "data" {
  bucket = "processed-data-${var.environment}"
}
Deploy:
terraform init
terraform plan
terraform apply
Step 7: Build and Deploy
Build your Go binary for Lambda:
# Build for Lambda (Linux); the provided.al2023 runtime runs the binary named bootstrap
GOOS=linux GOARCH=amd64 go build -o bootstrap cmd/processor/main.go
# Zip it
zip function.zip bootstrap
# Deploy with Terraform
terraform apply
Test it:
# Trigger by writing to DynamoDB
aws dynamodb put-item \
--table-name my-table \
--item '{"id":{"S":"123"},"data":{"S":"test"}}'
# Check logs
aws logs tail /aws/lambda/data-processor --follow
Step 8: Monitor and Optimize
Add basic CloudWatch alarms:
resource "aws_cloudwatch_metric_alarm" "errors" {
  alarm_name          = "lambda-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = 60
  statistic           = "Sum"
  threshold           = 5
  alarm_description   = "Lambda error rate too high"
  dimensions = {
    FunctionName = aws_lambda_function.processor.function_name # scope to this function
  }
}
What to monitor:
- Error rate
- Duration (optimize if >80% of timeout)
- Throttles
- DLQ messages
- Cost
Optimization tips:
- Test different memory sizes (more memory = faster CPU)
- Keep binary size small
- Reuse connections
- Use goroutines for parallel work
Real-World Results
Before (EC2-based):
- Servers running 24/7
- Manual scaling
- High operational overhead
- ~$800/month
After (Lambda + Go):
- Pay only for execution
- Auto-scales to any load
- Zero maintenance
- ~$320/month (60% cost reduction)
Processing performance:
- Handles millions of records daily
- Sub-second latency per batch
- 99.9% success rate
- Fast cold starts (~100ms with Go)
Why Go Works Great Here
Performance benefits:
- Compiled binary = fast startup
- Low memory footprint
- Built-in concurrency with goroutines
- Single binary deployment (no dependencies)
vs Python/Node.js:
- 10x faster cold starts
- 50% less memory usage
- Type safety catches errors early
- Better for CPU-intensive work
Common Issues and Fixes
Cold starts taking too long?
- Minimize binary size
- Initialize clients in init()
- Use provisioned concurrency for critical paths
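Provisioned concurrency needs a published version or alias to attach to. A hedged Terraform sketch — the alias name and concurrency value are illustrative, and the function is assumed to be the `processor` resource defined earlier:

```hcl
# Sketch: publish an alias and keep two execution environments warm
resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.processor.function_name
  function_version = aws_lambda_function.processor.version
}

resource "aws_lambda_provisioned_concurrency_config" "processor" {
  function_name                     = aws_lambda_function.processor.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 2
}
```

Provisioned concurrency is billed while configured, so reserve it for latency-critical paths only.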
Memory errors?
- Test with different memory sizes (128MB to 3GB)
- Monitor CloudWatch metrics
- More memory = faster CPU too
DynamoDB throttling?
- Adjust batch size
- Add exponential backoff
- Check table capacity
Costs higher than expected?
- Right-size memory allocation
- Set appropriate timeouts
- Use S3 lifecycle policies
- Monitor with AWS Cost Explorer
Key Takeaways
What works:
- Go for Lambda = excellent performance
- Event-driven architecture scales effortlessly
- Terraform makes infrastructure predictable
- Start simple, add complexity when needed
What to remember:
- Always add a dead letter queue
- Monitor from day one
- Test with production-like data volumes
- Version your infrastructure
Cost optimization:
- Pay only for what you use
- Right-size everything
- Use batch processing
- Archive old data to Glacier
Quick Start Checklist
- Set up Go project structure
- Write basic Lambda handler
- Enable DynamoDB Streams
- Connect Lambda to streams
- Add EventBridge rules
- Deploy with Terraform
- Set up CloudWatch alarms
- Test with real data
- Monitor costs
Start here, then iterate based on your needs.