Command Palette

Search for a command to run...

Blog
PreviousNext

Serverless Data Processing with AWS Lambda and Go

A practical guide to building scalable data processing with AWS Lambda and Go. Learn how to handle millions of DynamoDB records efficiently using event-driven architecture.

Why Serverless for Data Processing?

Processing millions of records daily doesn't mean you need servers running 24/7. AWS Lambda lets you run code only when needed, scale automatically, and pay only for what you use. Here's how to build a data processing pipeline that actually works.

The Architecture

Simple flow that scales:

DynamoDB Streams → EventBridge → Lambda (Go) → S3/DynamoDB

Why this stack:

  • Lambda for zero-ops compute
  • Go for fast execution and low memory
  • DynamoDB Streams for change capture
  • EventBridge for event routing
  • S3 for cheap storage

Step 1: Set Up Your Project

Start with a clean Go project structure:

mkdir serverless-processor && cd serverless-processor
go mod init serverless-processor
 
mkdir -p cmd/processor internal/handler pkg/models

Project layout:

.
├── cmd/
│   └── processor/      # Lambda entry point
├── internal/
│   └── handler/        # Business logic
├── pkg/
│   └── models/         # Data structures
└── terraform/          # Infrastructure

Step 2: Write Your Lambda Handler

Keep it simple. Here's a basic Lambda handler in Go:

// cmd/processor/main.go
package main
 
import (
    "context"
    "encoding/json"
    "log"
    
    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)
 
// Initialize AWS clients outside handler for reuse
var s3Client *s3.Client
 
func init() {
    cfg, err := config.LoadDefaultConfig(context.TODO())
    if err != nil {
        log.Fatal(err)
    }
    s3Client = s3.NewFromConfig(cfg)
}
 
func handler(ctx context.Context, event events.DynamoDBEvent) error {
    log.Printf("Processing %d records", len(event.Records))
    
    for _, record := range event.Records {
        // Process each record
        if err := processRecord(record); err != nil {
            log.Printf("Error processing record: %v", err)
            return err
        }
    }
    
    return nil
}
 
func processRecord(record events.DynamoDBEventRecord) error {
    // Extract data from DynamoDB stream
    newImage := record.Change.NewImage
    
    // Transform and process
    data := extractData(newImage)
    
    // Save to S3 or write back to DynamoDB
    return saveData(data)
}
 
func main() {
    lambda.Start(handler)
}

Key points:

  • Initialize AWS clients in init() for connection reuse
  • Handle errors explicitly
  • Log what matters
  • Keep functions small

Step 3: Connect to DynamoDB Streams

Enable streams on your DynamoDB table, then connect Lambda:

# Enable streams
aws dynamodb update-table \
  --table-name my-table \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

Your Lambda automatically gets triggered when data changes. Configure the event source:

// Lambda gets batches of records
// Process them in parallel with goroutines
func handler(ctx context.Context, event events.DynamoDBEvent) error {
    results := make(chan error, len(event.Records))
    
    for _, record := range event.Records {
        go func(r events.DynamoDBEventRecord) {
            results <- processRecord(r)
        }(record)
    }
    
    // Collect results
    for range event.Records {
        if err := <-results; err != nil {
            return err
        }
    }
    
    return nil
}

Tips:

  • Batch size of 100 works well
  • Add a dead letter queue for failed events
  • Use goroutines for parallel processing
  • Set appropriate timeout (30-60s)

Step 4: Add EventBridge for Flexibility

EventBridge lets you trigger Lambda from multiple sources:

// Handle different event types
func router(ctx context.Context, event json.RawMessage) error {
    var eventType struct {
        Source string `json:"source"`
    }
    
    json.Unmarshal(event, &eventType)
    
    switch eventType.Source {
    case "aws.dynamodb":
        return handleDynamoDB(event)
    case "aws.s3":
        return handleS3(event)
    case "custom.scheduled":
        return handleScheduled(event)
    default:
        return fmt.Errorf("unknown source: %s", eventType.Source)
    }
}

Schedule periodic processing:

# Run every hour
aws events put-rule \
  --name hourly-processing \
  --schedule-expression "rate(1 hour)"

Step 5: Save Processed Data

Write to S3 for long-term storage:

func saveToS3(data []byte, key string) error {
    _, err := s3Client.PutObject(context.TODO(), &s3.PutObjectInput{
        Bucket: aws.String("my-processed-data"),
        Key:    aws.String(key),
        Body:   bytes.NewReader(data),
    })
    return err
}

Or back to DynamoDB for real-time queries:

func saveToDynamoDB(item map[string]types.AttributeValue) error {
    _, err := dynamoClient.PutItem(context.TODO(), &dynamodb.PutItemInput{
        TableName: aws.String("processed-data"),
        Item:      item,
    })
    return err
}

Storage tips:

  • S3: partition by date, compress with gzip
  • DynamoDB: use batch writes, set TTL for cleanup
  • Add retry logic for transient failures

Step 6: Deploy with Terraform

Automate infrastructure with Terraform:

# terraform/main.tf
resource "aws_lambda_function" "processor" {
  filename         = "function.zip"
  function_name    = "data-processor"
  role            = aws_iam_role.lambda_role.arn
  handler         = "main"
  runtime         = "go1.x"
  timeout         = 60
  memory_size     = 1024
 
  environment {
    variables = {
      BUCKET_NAME = aws_s3_bucket.data.id
    }
  }
}
 
resource "aws_lambda_event_source_mapping" "dynamodb" {
  event_source_arn  = aws_dynamodb_table.source.stream_arn
  function_name     = aws_lambda_function.processor.arn
  starting_position = "LATEST"
  batch_size        = 100
}
 
resource "aws_s3_bucket" "data" {
  bucket = "processed-data-${var.environment}"
}

Deploy:

terraform init
terraform plan
terraform apply

Step 7: Build and Deploy

Build your Go binary for Lambda:

# Build for Lambda (Linux)
GOOS=linux GOARCH=amd64 go build -o main cmd/processor/main.go
 
# Zip it
zip function.zip main
 
# Deploy with Terraform
terraform apply

Test it:

# Trigger by writing to DynamoDB
aws dynamodb put-item \
  --table-name my-table \
  --item '{"id":{"S":"123"},"data":{"S":"test"}}'
 
# Check logs
aws logs tail /aws/lambda/data-processor --follow

Step 8: Monitor and Optimize

Add basic CloudWatch alarms:

resource "aws_cloudwatch_metric_alarm" "errors" {
  alarm_name          = "lambda-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name        = "Errors"
  namespace          = "AWS/Lambda"
  period             = 60
  statistic          = "Sum"
  threshold          = 5
  alarm_description  = "Lambda error rate too high"
}

What to monitor:

  • Error rate
  • Duration (optimize if >80% of timeout)
  • Throttles
  • DLQ messages
  • Cost

Optimization tips:

  • Test different memory sizes (more memory = faster CPU)
  • Keep binary size small
  • Reuse connections
  • Use goroutines for parallel work

Real-World Results

Before (EC2-based):

  • Servers running 24/7
  • Manual scaling
  • High operational overhead
  • ~$800/month

After (Lambda + Go):

  • Pay only for execution
  • Auto-scales to any load
  • Zero maintenance
  • ~$320/month (60% cost reduction)

Processing performance:

  • Handles millions of records daily
  • Sub-second latency per batch
  • 99.9% success rate
  • Fast cold starts (~100ms with Go)

Why Go Works Great Here

Performance benefits:

  • Compiled binary = fast startup
  • Low memory footprint
  • Built-in concurrency with goroutines
  • Single binary deployment (no dependencies)

vs Python/Node.js:

  • 10x faster cold starts
  • 50% less memory usage
  • Type safety catches errors early
  • Better for CPU-intensive work

Common Issues and Fixes

Cold starts taking too long?

  • Minimize binary size
  • Initialize clients in init()
  • Use provisioned concurrency for critical paths

Memory errors?

  • Test with different memory sizes (128MB to 3GB)
  • Monitor CloudWatch metrics
  • More memory = faster CPU too

DynamoDB throttling?

  • Adjust batch size
  • Add exponential backoff
  • Check table capacity

Costs higher than expected?

  • Right-size memory allocation
  • Set appropriate timeouts
  • Use S3 lifecycle policies
  • Monitor with AWS Cost Explorer

Key Takeaways

What works:

  • Go for Lambda = excellent performance
  • Event-driven architecture scales effortlessly
  • Terraform makes infrastructure predictable
  • Start simple, add complexity when needed

What to remember:

  • Always add a dead letter queue
  • Monitor from day one
  • Test with production-like data volumes
  • Version your infrastructure

Cost optimization:

  • Pay only for what you use
  • Right-size everything
  • Use batch processing
  • Archive old data to Glacier

Quick Start Checklist

  • Set up Go project structure
  • Write basic Lambda handler
  • Enable DynamoDB Streams
  • Connect Lambda to streams
  • Add EventBridge rules
  • Deploy with Terraform
  • Set up CloudWatch alarms
  • Test with real data
  • Monitor costs

Start here, then iterate based on your needs.