Building a Simple MLOps Pipeline - A Developer's Guide
Learn how to build a practical MLOps pipeline from scratch. A hands-on guide covering automation, containerization, experiment tracking, and deployment for getting ML models into production.
Introduction
As a developer, I've found that deploying machine learning models isn't just about training and testing. The real challenge is getting models into production reliably, reproducibly, and with minimal manual effort. This guide walks through building a simple yet effective MLOps pipeline that actually works in the real world.
Why MLOps?
Before diving into code, let's be clear about what we're solving. Without proper MLOps practices, you'll face:
- Manual deployments that take forever
- "Works on my laptop" syndrome
- Can't reproduce that model from last month
- No idea what's happening in production
The goal is simple: automate everything so you can focus on improving models, not fighting infrastructure.
Step 1: Define Your Workflow
Start by mapping out what needs to happen:
Data → Preprocess → Train → Evaluate → Version → Deploy → Monitor
Don't overcomplicate it. You need:
- A way to load and prepare data
- Scripts to train models
- Somewhere to store model versions
- A method to deploy
- Basic monitoring
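Concretely, everything built in the rest of this guide fits into a small repo layout like this (the top-level folder name is arbitrary; the file names mirror what later steps create):

ml-pipeline/
├── train.py                     # training script (Step 2)
├── serve.py                     # FastAPI model server (Step 7)
├── requirements.txt             # pinned dependencies
├── Dockerfile                   # training image (Step 3)
├── k8s/deployment.yml           # serving deployment (Step 6)
├── scripts/check_metrics.py     # CI quality gate (Step 5)
└── .github/workflows/train.yml  # CI/CD pipeline (Step 5)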
Step 2: Write Clean Training Code
Begin with a simple training script. Make it repeatable and parameterized.
# train.py
import argparse

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def load_data(data_path):
    """Load and split your data."""
    df = pd.read_csv(data_path)
    X = df.drop('target', axis=1)
    y = df['target']
    return train_test_split(X, y, test_size=0.2)


def train_model(X_train, y_train, n_estimators=100):
    """Train your model."""
    model = RandomForestClassifier(n_estimators=n_estimators)
    model.fit(X_train, y_train)
    return model


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-path', required=True)
    parser.add_argument('--n-estimators', type=int, default=100)
    args = parser.parse_args()

    # Load and train
    X_train, X_test, y_train, y_test = load_data(args.data_path)
    model = train_model(X_train, y_train, args.n_estimators)

    # Evaluate
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"Accuracy: {accuracy:.3f}")

    # Save
    joblib.dump(model, 'model.pkl')


if __name__ == '__main__':
    main()

Keep it simple. Add complexity only when needed.
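A parameterized run then looks like this (the data path is just an example):

python train.py --data-path data/train.csv --n-estimators 200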
Step 3: Containerize Everything
Docker ensures your code runs the same everywhere. No more "but it worked on my machine."
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy code
COPY . .
# Run training
CMD ["python", "train.py", "--data-path", "/data/train.csv"]Build and test locally:
docker build -t ml-training:latest .
docker run -v $(pwd)/data:/data ml-training:latest

Step 4: Track Experiments with MLflow
You need to remember what you tried. MLflow makes this easy.
import mlflow

# Start tracking
mlflow.set_experiment("my-model-training")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", args.n_estimators)
    mlflow.log_param("data_path", args.data_path)

    # Train model
    model = train_model(X_train, y_train, args.n_estimators)

    # Log metrics
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)

    # Log model
    mlflow.sklearn.log_model(model, "model")

Run MLflow UI to see all experiments:
mlflow ui

Now you can compare runs, see what worked, and pick the best model.
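Steps 7 and 9 load the model from the MLflow Model Registry under the name "my-model", so the run you pick also needs to be registered. A minimal sketch, assuming the registry lives on the same tracking server and using a placeholder run ID:

import mlflow

# Placeholder: the ID of the best run, taken from the MLflow UI
run_id = "abc123"

# Register that run's logged model under the name used later in this guide
result = mlflow.register_model(f"runs:/{run_id}/model", "my-model")
print(f"Registered my-model version {result.version}")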
Step 5: Automate with CI/CD
Manual work is error-prone. Automate the boring stuff.
Create .github/workflows/train.yml:
name: Train Model

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Train daily at 2 AM

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: docker build -t ml-training:${{ github.sha }} .
      - name: Run training
        run: |
          docker run \
            -e MLFLOW_TRACKING_URI=${{ secrets.MLFLOW_URI }} \
            ml-training:${{ github.sha }}
      - name: Check if model improved
        run: python scripts/check_metrics.py

Now every code push triggers training. No manual work needed.
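The workflow calls scripts/check_metrics.py, which isn't shown above. Here's a minimal sketch of what it might do, assuming the training run logs an "accuracy" metric to the "my-model-training" experiment and that a simple threshold gate is enough (the threshold value is a placeholder):

# scripts/check_metrics.py (hypothetical sketch)
import sys

import mlflow

client = mlflow.tracking.MlflowClient()

# Find the most recent run in the training experiment
experiment = client.get_experiment_by_name("my-model-training")
runs = client.search_runs(
    [experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
    max_results=1,
)
accuracy = runs[0].data.metrics["accuracy"]

# Fail the CI job if accuracy dropped below the example threshold
THRESHOLD = 0.90  # placeholder value
if accuracy < THRESHOLD:
    print(f"Accuracy {accuracy:.3f} is below threshold {THRESHOLD}")
    sys.exit(1)
print(f"Accuracy {accuracy:.3f} looks good")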
Step 6: Deploy on Kubernetes
Kubernetes handles scaling and reliability for you.
Create a simple deployment:
# k8s/deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
        - name: model-server
          image: ml-model:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"

Deploy it:
kubectl apply -f k8s/deployment.yml
kubectl expose deployment ml-model-server --type=LoadBalancer --port=80 --target-port=8000

Kubernetes now manages your model servers, restarts them if they crash, and can scale them based on load.
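Scaling on load needs a HorizontalPodAutoscaler on top of the Deployment; the quickest way to add one (the thresholds here are just examples):

kubectl autoscale deployment ml-model-server --cpu-percent=70 --min=3 --max=10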
Step 7: Serve Models with FastAPI
Create a simple API to serve predictions:
# serve.py
from fastapi import FastAPI
import mlflow.sklearn
import pandas as pd

app = FastAPI()

# Load the registered production model on startup
model = mlflow.sklearn.load_model("models:/my-model/Production")


@app.get("/health")
def health():
    return {"status": "healthy"}


@app.post("/predict")
def predict(features: dict):
    # Convert to DataFrame
    df = pd.DataFrame([features])

    # Make prediction
    prediction = model.predict(df)[0]
    probability = model.predict_proba(df)[0]

    return {
        "prediction": int(prediction),
        "probability": float(max(probability))
    }

Run it:
uvicorn serve:app --host 0.0.0.0 --port 8000

Test it:
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"feature1": 1.0, "feature2": 2.0}'Step 8: Monitor What Matters
You need to know what's happening in production. Start simple:
# Add to serve.py
from prometheus_client import Counter, Histogram
import time

# Define metrics
prediction_counter = Counter('predictions_total', 'Total predictions')
prediction_latency = Histogram('prediction_latency_seconds', 'Prediction latency')


@app.post("/predict")
def predict(features: dict):
    start_time = time.time()

    # Make prediction
    df = pd.DataFrame([features])
    prediction = model.predict(df)[0]

    # Track metrics
    prediction_counter.inc()
    prediction_latency.observe(time.time() - start_time)

    return {"prediction": int(prediction)}

Deploy Prometheus and Grafana to visualize metrics; a sketch of exposing a /metrics endpoint follows the alert list below. Set up alerts for:
- Prediction latency > 500ms
- Error rate > 1%
- Request volume drops
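For Prometheus to actually scrape the counters defined above, the service also needs to expose them over HTTP, which the snippet doesn't do on its own. A minimal sketch using prometheus_client's ASGI app mounted on the FastAPI app:

# Add to serve.py: expose a /metrics endpoint for Prometheus to scrape
from prometheus_client import make_asgi_app

app.mount("/metrics", make_asgi_app())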
Step 9: Handle Model Updates
When you train a better model, deploy it safely:
# scripts/deploy_model.py
import mlflow

client = mlflow.tracking.MlflowClient()

# Get the latest registered version and the current production version from MLflow
latest_version = client.get_latest_versions("my-model", stages=["None"])[0]
prod_version = client.get_latest_versions("my-model", stages=["Production"])[0]

# Compare accuracy via the runs that produced each version
latest_accuracy = client.get_run(latest_version.run_id).data.metrics["accuracy"]
prod_accuracy = client.get_run(prod_version.run_id).data.metrics["accuracy"]

if latest_accuracy > prod_accuracy:
    # Promote to production
    client.transition_model_version_stage(
        name="my-model",
        version=latest_version.version,
        stage="Production"
    )
    print(f"Deployed version {latest_version.version}")
else:
    print("New model not better, keeping current")

Add rollback capability:
# If something breaks, rollback
kubectl rollout undo deployment/ml-model-server

Real-World Results
Here's what this pipeline delivered:
Before:
- Manual deployments: 3-5 days
- No experiment tracking
- Can't reproduce old models
- Deployments break frequently
After:
- Automated deployments: 4 hours
- All experiments logged in MLflow
- Any model reproducible from its tracked version
- Rollback in seconds if issues occur
Key Wins:
- Data scientists focus on models, not DevOps
- Deploy confidently with automated testing
- Easy to onboard new team members
- Clear visibility into what's running in production
Lessons Learned
What Actually Matters
Start here:
- Containerize your code - solves 80% of environment issues
- Track experiments - you'll thank yourself later
- Automate deployment - manual is too slow and error-prone
- Basic monitoring - know when things break
Add later:
- Advanced orchestration
- Feature stores
- Real-time pipelines
- Multi-model serving
Common Pitfalls to Avoid
Don't over-engineer:
- Start simple, add complexity when needed
- You don't need Kubernetes on day one
- Simple Python scripts beat complex frameworks initially
Don't skip monitoring:
- Add it early, not as an afterthought
- Start with basic metrics (latency, errors)
- Alerts should wake you only when critical
Don't forget reproducibility:
- Pin dependency versions (see the example after this list)
- Version your data
- Log everything in MLflow
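For example, the requirements.txt that the Dockerfile installs can pin exact versions; the numbers below are placeholders, so pin whatever you actually tested against:

pandas==2.1.4
scikit-learn==1.3.2
joblib==1.3.2
mlflow==2.9.2
fastapi==0.109.0
uvicorn==0.27.0
prometheus-client==0.19.0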
Tech Choices That Worked
Python + FastAPI:
- Easy to write and maintain
- Great performance for ML serving
- Huge ecosystem
Docker + Kubernetes:
- Standard tools everyone knows
- Scales well when needed
- Good local development story
MLflow:
- Open source, no lock-in
- Simple to get started
- Powerful enough to grow with you
GitHub Actions:
- Free for public repos
- Easy YAML configuration
- Good integration with everything
What's Next
This pipeline is a foundation, not the finish line. Here's what I'm adding:
Current Focus:
- A/B testing framework for comparing models
- Automated hyperparameter tuning
- Better data validation
- Cost tracking and optimization
Future Plans:
- Feature store for reusable features
- Real-time training for streaming data
- Model explainability dashboard
- Edge deployment for low-latency use cases
The Bottom Line
Building an MLOps pipeline doesn't require months of work or a huge team. Start with these core pieces:
- Reproducible training (Docker + version control)
- Experiment tracking (MLflow)
- Automated deployment (CI/CD + containers)
- Basic monitoring (metrics + alerts)
Get these working, then iterate. Don't try to build the perfect system upfront. Build something that works, ship it, and improve based on real usage.
The goal isn't perfection. The goal is to spend less time on infrastructure and more time improving your models.
Quick Start Checklist
Want to build your own? Here's the path:
- Write training script with arguments
- Create Dockerfile for training
- Add MLflow tracking to training script
- Set up CI/CD to run on code push
- Create simple FastAPI serving endpoint
- Deploy to cloud (start with single server)
- Add basic metrics (latency, error rate)
- Set up one alert (high error rate)
- Document for your team
Do these nine things and you'll have a working pipeline. Everything else can wait.