Building a Simple MLOps Pipeline - A Developer's Guide
Learn how to build a practical MLOps pipeline from scratch. A hands-on guide covering automation, containerization, experiment tracking, and deployment for getting ML models into production.
Introduction
As a developer, I've found that deploying machine learning models isn't just about training and testing. The real challenge is getting models into production reliably, reproducibly, and with minimal manual effort. This guide walks through building a simple yet effective MLOps pipeline that actually works in the real world.
Why MLOps?
Before diving into code, let's be clear about what we're solving. Without proper MLOps practices, you'll face:
- Manual deployments that take forever
- "Works on my laptop" syndrome
- Can't reproduce that model from last month
- No idea what's happening in production
The goal is simple: automate everything so you can focus on improving models, not fighting infrastructure.
Step 1: Define Your Workflow
Start by mapping out what needs to happen:
Data → Preprocess → Train → Evaluate → Version → Deploy → Monitor
Don't overcomplicate it. You need:
- A way to load and prepare data
- Scripts to train models
- Somewhere to store model versions
- A method to deploy
- Basic monitoring
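Concretely, everything built in the rest of this guide fits into a small repo layout like this (the top-level folder name is arbitrary; the file names mirror what later steps create):

ml-pipeline/
├── train.py                     # training script (Step 2)
├── serve.py                     # FastAPI model server (Step 7)
├── requirements.txt             # pinned dependencies
├── Dockerfile                   # training image (Step 3)
├── k8s/deployment.yml           # serving deployment (Step 6)
├── scripts/check_metrics.py     # CI quality gate (Step 5)
└── .github/workflows/train.yml  # CI/CD pipeline (Step 5)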
Step 2: Write Clean Training Code
Begin with a simple training script. Make it repeatable and parameterized.
# train.py
import argparse

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def load_data(data_path):
    """Load and split your data."""
    df = pd.read_csv(data_path)
    X = df.drop('target', axis=1)
    y = df['target']
    return train_test_split(X, y, test_size=0.2)


def train_model(X_train, y_train, n_estimators=100):
    """Train your model."""
    model = RandomForestClassifier(n_estimators=n_estimators)
    model.fit(X_train, y_train)
    return model


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-path', required=True)
    parser.add_argument('--n-estimators', type=int, default=100)
    args = parser.parse_args()

    # Load and train
    X_train, X_test, y_train, y_test = load_data(args.data_path)
    model = train_model(X_train, y_train, args.n_estimators)

    # Evaluate
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"Accuracy: {accuracy:.3f}")

    # Save
    joblib.dump(model, 'model.pkl')


if __name__ == '__main__':
    main()

Keep it simple. Add complexity only when needed.
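A parameterized run then looks like this (the data path is just an example):

python train.py --data-path data/train.csv --n-estimators 200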
Step 3: Containerize Everything
Docker ensures your code runs the same everywhere. No more "but it worked on my machine."
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy code
COPY . .
# Run training
CMD ["python", "train.py", "--data-path", "/data/train.csv"]Build and test locally:
docker build -t ml-training:latest .
docker run -v $(pwd)/data:/data ml-training:latest

Step 4: Track Experiments with MLflow
You need to remember what you tried. MLflow makes this easy.
import mlflow

# Start tracking
mlflow.set_experiment("my-model-training")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", args.n_estimators)
    mlflow.log_param("data_path", args.data_path)

    # Train model
    model = train_model(X_train, y_train, args.n_estimators)

    # Log metrics
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)

    # Log model
    mlflow.sklearn.log_model(model, "model")

Run MLflow UI to see all experiments:
mlflow ui

Now you can compare runs, see what worked, and pick the best model.
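Steps 7 and 9 load the model from the MLflow Model Registry under the name "my-model", so the run you pick also needs to be registered. A minimal sketch, assuming the registry lives on the same tracking server and using a placeholder run ID:

import mlflow

# Placeholder: the ID of the best run, taken from the MLflow UI
run_id = "abc123"

# Register that run's logged model under the name used later in this guide
result = mlflow.register_model(f"runs:/{run_id}/model", "my-model")
print(f"Registered my-model version {result.version}")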
Step 5: Automate with CI/CD
Manual work is error-prone. Automate the boring stuff.
Create .github/workflows/train.yml:
name: Train Model

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Train daily at 2 AM

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: docker build -t ml-training:${{ github.sha }} .
      - name: Run training
        run: |
          docker run \
            -e MLFLOW_TRACKING_URI=${{ secrets.MLFLOW_URI }} \
            ml-training:${{ github.sha }}
      - name: Check if model improved
        run: python scripts/check_metrics.py

Now every code push triggers training. No manual work needed.
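The workflow calls scripts/check_metrics.py, which isn't shown above. Here's a minimal sketch of what it might do, assuming the training run logs an "accuracy" metric to the "my-model-training" experiment and that a simple threshold gate is enough (the threshold value is a placeholder):

# scripts/check_metrics.py (hypothetical sketch)
import sys

import mlflow

client = mlflow.tracking.MlflowClient()

# Find the most recent run in the training experiment
experiment = client.get_experiment_by_name("my-model-training")
runs = client.search_runs(
    [experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
    max_results=1,
)
accuracy = runs[0].data.metrics["accuracy"]

# Fail the CI job if accuracy dropped below the example threshold
THRESHOLD = 0.90  # placeholder value
if accuracy < THRESHOLD:
    print(f"Accuracy {accuracy:.3f} is below threshold {THRESHOLD}")
    sys.exit(1)
print(f"Accuracy {accuracy:.3f} looks good")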
Step 6: Deploy on Kubernetes
Kubernetes handles scaling and reliability for you.
Create a simple deployment:
# k8s/deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: model-server
          image: ml-model:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"

Deploy it:
kubectl apply -f k8s/deployment.yml
kubectl expose deployment ml-model-server --type=LoadBalancer --port=80 --target-port=8000

Kubernetes now manages your model servers, restarts them if they crash, and can scale them based on load.
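Scaling on load needs a HorizontalPodAutoscaler on top of the Deployment; the quickest way to add one (the thresholds here are just examples):

kubectl autoscale deployment ml-model-server --cpu-percent=70 --min=3 --max=10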
Step 7: Serve Models with FastAPI
Create a simple API to serve predictions:
# serve.py
from fastapi import FastAPI
import mlflow.sklearn
import pandas as pd

app = FastAPI()

# Load the registered production model on startup
model = mlflow.sklearn.load_model("models:/my-model/Production")


@app.get("/health")
def health():
    return {"status": "healthy"}


@app.post("/predict")
def predict(features: dict):
    # Convert to DataFrame
    df = pd.DataFrame([features])

    # Make prediction
    prediction = model.predict(df)[0]
    probability = model.predict_proba(df)[0]

    return {
        "prediction": int(prediction),
        "probability": float(max(probability))
    }

Run it:
uvicorn serve:app --host 0.0.0.0 --port 8000

Test it:
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"feature1": 1.0, "feature2": 2.0}'Step 8: Monitor What Matters
You need to know what's happening in production. Start simple:
# Add to serve.py
from prometheus_client import Counter, Histogram
import time

# Define metrics
prediction_counter = Counter('predictions_total', 'Total predictions')
prediction_latency = Histogram('prediction_latency_seconds', 'Prediction latency')


@app.post("/predict")
def predict(features: dict):
    start_time = time.time()

    # Make prediction
    df = pd.DataFrame([features])
    prediction = model.predict(df)[0]

    # Track metrics
    prediction_counter.inc()
    prediction_latency.observe(time.time() - start_time)

    return {"prediction": int(prediction)}

Deploy Prometheus and Grafana to visualize metrics; a sketch of exposing a /metrics endpoint follows the alert list below. Set up alerts for:
- Prediction latency > 500ms
- Error rate > 1%
- Request volume drops
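For Prometheus to actually scrape the counters defined above, the service also needs to expose them over HTTP, which the snippet doesn't do on its own. A minimal sketch using prometheus_client's ASGI app mounted on the FastAPI app:

# Add to serve.py: expose a /metrics endpoint for Prometheus to scrape
from prometheus_client import make_asgi_app

app.mount("/metrics", make_asgi_app())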
Step 9: Handle Model Updates
When you train a better model, deploy it safely:
# scripts/deploy_model.py
import mlflow

client = mlflow.tracking.MlflowClient()

# Get the latest registered version and the current production version from MLflow
latest_version = client.get_latest_versions("my-model", stages=["None"])[0]
prod_version = client.get_latest_versions("my-model", stages=["Production"])[0]

# Compare accuracy via the runs that produced each version
latest_accuracy = client.get_run(latest_version.run_id).data.metrics["accuracy"]
prod_accuracy = client.get_run(prod_version.run_id).data.metrics["accuracy"]

if latest_accuracy > prod_accuracy:
    # Promote to production
    client.transition_model_version_stage(
        name="my-model",
        version=latest_version.version,
        stage="Production"
    )
    print(f"Deployed version {latest_version.version}")
else:
    print("New model not better, keeping current")

Add rollback capability:
# If something breaks, rollback
kubectl rollout undo deployment/ml-model-server

Real-World Results
Here's what this pipeline delivered:
Before:
- Manual deployments: 3-5 days
- No experiment tracking
- Can't reproduce old models
- Deployments break frequently
After:
- Automated deployments: 4 hours
- All experiments logged in MLflow
- Any model reproducible from its tracked version
- Rollback in seconds if issues occur
Key Wins:
- Data scientists focus on models, not DevOps
- Deploy confidently with automated testing
- Easy to onboard new team members
- Clear visibility into what's running in production
Lessons Learned
What Actually Matters
Start here:
- Containerize your code - solves 80% of environment issues
- Track experiments - you'll thank yourself later
- Automate deployment - manual is too slow and error-prone
- Basic monitoring - know when things break
Add later:
- Advanced orchestration
- Feature stores
- Real-time pipelines
- Multi-model serving
Common Pitfalls to Avoid
Don't over-engineer:
- Start simple, add complexity when needed
- You don't need Kubernetes on day one
- Simple Python scripts beat complex frameworks initially
Don't skip monitoring:
- Add it early, not as an afterthought
- Start with basic metrics (latency, errors)
- Alerts should wake you only when critical
Don't forget reproducibility:
- Pin dependency versions (see the example after this list)
- Version your data
- Log everything in MLflow
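For example, the requirements.txt that the Dockerfile installs can pin exact versions; the numbers below are placeholders, so pin whatever you actually tested against:

pandas==2.1.4
scikit-learn==1.3.2
joblib==1.3.2
mlflow==2.9.2
fastapi==0.109.0
uvicorn==0.27.0
prometheus-client==0.19.0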
Tech Choices That Worked
Python + FastAPI:
- Easy to write and maintain
- Great performance for ML serving
- Huge ecosystem
Docker + Kubernetes:
- Standard tools everyone knows
- Scales well when needed
- Good local development story
MLflow:
- Open source, no lock-in
- Simple to get started
- Powerful enough to grow with you
GitHub Actions:
- Free for public repos
- Easy YAML configuration
- Good integration with everything
What's Next
This pipeline is a foundation, not the finish line. Here's what I'm adding:
Current Focus:
- A/B testing framework for comparing models
- Automated hyperparameter tuning
- Better data validation
- Cost tracking and optimization
Future Plans:
- Feature store for reusable features
- Real-time training for streaming data
- Model explainability dashboard
- Edge deployment for low-latency use cases
The Bottom Line
Building an MLOps pipeline doesn't require months of work or a huge team. Start with these core pieces:
- Reproducible training (Docker + version control)
- Experiment tracking (MLflow)
- Automated deployment (CI/CD + containers)
- Basic monitoring (metrics + alerts)
Get these working, then iterate. Don't try to build the perfect system upfront. Build something that works, ship it, and improve based on real usage.
The goal isn't perfection. The goal is to spend less time on infrastructure and more time improving your models.
Quick Start Checklist
Want to build your own? Here's the path:
- Write training script with arguments
- Create Dockerfile for training
- Add MLflow tracking to training script
- Set up CI/CD to run on code push
- Create simple FastAPI serving endpoint
- Deploy to cloud (start with single server)
- Add basic metrics (latency, error rate)
- Set up one alert (high error rate)
- Document for your team
Do these nine things and you'll have a working pipeline. Everything else can wait.