Phase 6: MLOps & Model Deployment

Building a high-performing model in a Jupyter Notebook is often described as only 20% of the battle. The remaining 80% is deploying it securely to production, making it accessible to users, and ensuring it keeps performing well as real-world data changes over time. This discipline is known as MLOps (Machine Learning Operations).

1. Model Tracking and Registration

During the experimental phase, data scientists may test hundreds of combinations of algorithms, hyperparameters, and datasets. Tracking these by hand in a spreadsheet is error-prone and makes results hard to reproduce.

Key Tools: MLflow, Weights & Biases (W&B). These tools record the parameters, code versions, metrics, and output files (artifacts) of every run, either through explicit logging calls or through autologging integrations. They also provide a "Model Registry" (like a GitHub for models) to version-control your final .pkl or .h5 model files.
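
To make the "version control for models" idea concrete, here is a toy, file-based sketch of what a registry records for each model version: an incrementing version number, a content hash of the artifact, and its metrics. This is not how MLflow or W&B implement their registries, and it is not safe for concurrent writers; the function name `register_model_file` and the `registry/<name>/v<N>/` folder layout are hypothetical.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def register_model_file(artifact: Path, registry_dir: Path, name: str, metrics: dict) -> dict:
    """Copy a model file into a new numbered version folder and save its metadata.

    Hypothetical layout: registry_dir/<name>/v1/, v2/, ... each holding the file and meta.json.
    """
    model_dir = registry_dir / name
    model_dir.mkdir(parents=True, exist_ok=True)

    # Next version = number of existing version folders + 1.
    version = sum(1 for p in model_dir.iterdir() if p.is_dir()) + 1
    version_dir = model_dir / f"v{version}"
    version_dir.mkdir()

    # A content hash lets you check later that a deployed file is exactly this version.
    sha256 = hashlib.sha256(artifact.read_bytes()).hexdigest()
    shutil.copy2(artifact, version_dir / artifact.name)

    record = {
        "name": name,
        "version": version,
        "file": artifact.name,
        "sha256": sha256,
        "metrics": metrics,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    (version_dir / "meta.json").write_text(json.dumps(record, indent=2))
    return record
```

For example, `register_model_file(Path("model.pkl"), Path("registry"), "iris-rf", {"accuracy": acc})` would create `registry/iris-rf/v1/` on the first call and `v2/` on the next. A real registry such as MLflow's keeps this metadata in its backend store and lets you refer to a model by name and version instead of by file path.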

Example 1: Tracking Experiments with MLflow

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_and_log_model(n_estimators: int, max_depth: int):
    """
    Trains a Random Forest and logs its hyperparameters, accuracy, and model artifact to MLflow.
    """
    # 1. Load Data
    data = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)

    # 2. Start MLflow Run
    # With no tracking URI configured, runs are written to a local store;
    # call mlflow.set_tracking_uri(...) first to log to a remote tracking server instead.
    with mlflow.start_run(run_name="RandomForest_Iris"):

        # Log parameters explicitly
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("max_depth", max_depth)

        # 3. Train
        model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
        model.fit(X_train, y_train)

        # 4. Evaluate
        preds = model.predict(X_test)
        acc = accuracy_score(y_test, preds)

        # Log metric explicitly
        mlflow.log_metric("accuracy", acc)

        # 5. Log the actual model file!
        mlflow.sklearn.log_model(model, "random_forest_model")

        print(f"Run completed. Accuracy: {acc}")

# Example Usage:
# Run `mlflow ui` in your terminal to view the dashboard!
# train_and_log_model(n_estimators=50, max_depth=3)
# train_and_log_model(n_estimators=100, max_depth=5)

2. API Serving

Once a model is registered, it needs to be exposed so applications (web apps, mobile apps) can send it data and receive predictions.

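Key Tool: FastAPI. FastAPI is a widely used modern Python web framework for this. It runs on ASGI servers such as Uvicorn, supports asynchronous request handlers, validates request bodies with Pydantic, and automatically generates interactive API documentation (Swagger UI, served at /docs by default).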

Example 2: Creating a Real-Time Inference API with FastAPI

# Save this in a file named `main.py`
# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
import joblib
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, conlist
import numpy as np
import logging

# Setup Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize App
app = FastAPI(title="Iris Species Predictor API", version="1.0")

# Load model ONCE during startup, not on every request!
# Assuming we saved a model earlier: joblib.dump(model, 'model.pkl')
try:
    model = joblib.load("model.pkl")
    logger.info("Model loaded successfully into memory.")
except FileNotFoundError:
    model = None
    logger.warning("Mock mode: No model.pkl found.")

# Define the expected Input Data schema using Pydantic for automatic validation
class IrisFeatures(BaseModel):
    # Expecting exactly 4 floats (min_length/max_length is Pydantic v2 syntax;
    # Pydantic v1 used min_items/max_items instead)
    features: conlist(float, min_length=4, max_length=4)

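# Declared with plain `def` rather than `async def`: FastAPI runs sync handlers in a
# threadpool, so the CPU-bound model.predict() call does not block the event loop.
@app.post("/predict")
def predict(data: IrisFeatures):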
    if model is None:
        # If running without the pkl for testing, return a mock response
        return {"prediction": 0, "status": "mocked"}

    try:
        # Reshape data for sklearn (requires 2D array)
        input_array = np.array(data.features).reshape(1, -1)

        # Inference
        prediction = model.predict(input_array)[0]

        logger.info(f"Successful inference: Input {data.features}, Predicted {prediction}")
        return {
            "prediction": int(prediction),
            "model_version": "v1.0"
        }
    except Exception:
        # logger.exception logs the message together with the full traceback
        logger.exception("Inference error")
        raise HTTPException(status_code=500, detail="Internal Model Error")

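@app.get("/health")
def health_check():
    # Report whether a real model is loaded, so mock mode is visible to callers.
    return {"status": "healthy", "model_loaded": model is not None}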
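
Client applications call the endpoint by POSTing JSON that matches the `IrisFeatures` schema. A minimal standard-library client might look like this; the function name `request_prediction` and the default `http://localhost:8000` base URL are assumptions matching the `uvicorn` command at the top of `main.py`.

```python
import json
import urllib.request


def request_prediction(features: list[float], base_url: str = "http://localhost:8000") -> dict:
    """POST four iris measurements to the /predict endpoint and return the parsed JSON."""
    body = json.dumps({"features": features}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    # (e.g. 422 when Pydantic rejects a list that is not exactly 4 floats).
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `request_prediction([5.1, 3.5, 1.4, 0.2])` against the running server returns the JSON body produced by `predict`. Sending three values instead of four makes FastAPI reject the request with HTTP 422 before `predict` runs, which `urlopen` surfaces as `urllib.error.HTTPError`.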

3. Containerization (Docker)

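"Works on my machine" ≠ "works in production." Differences in Python versions or in compiled (C/C++) library dependencies can break ML models or silently change their behavior. Docker packages the OS userland (via a base image), the Python runtime, code, model weights, and dependencies into an image that runs as an isolated container.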
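
One way to reduce these failures is to record the exact library versions from the environment where the model was trained and pin them in the image's requirements.txt. The helper below is a hypothetical sketch built on the standard library's `importlib.metadata`; the name `pin_versions` and the package list you pass to it are your choice.

```python
from importlib import metadata


def pin_versions(packages: list[str]) -> tuple[list[str], list[str]]:
    """Return ("name==version" lines for installed packages, names that are not installed).

    Hypothetical helper: run it in the training environment to build requirements.txt.
    """
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pinned, missing
```

In the training environment, `pins, missing = pin_versions(["fastapi", "uvicorn", "scikit-learn", "joblib", "numpy"])` followed by writing `"\n".join(pins)` to requirements.txt gives the image the same versions the model was trained with. This pins only the listed packages, not their transitive dependencies; `pip freeze` or a lock-file tool covers those.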

Example 3: Dockerizing the FastAPI App

This is the Dockerfile you would place in the same directory as main.py, model.pkl, and requirements.txt.

# 1. Use an official lightweight Python runtime as a parent image
FROM python:3.10-slim

# 2. Set the working directory in the container
WORKDIR /app

# 3. Copy only the requirements first, to leverage Docker cache
COPY requirements.txt .

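# 4. Install dependencies (e.g., fastapi, uvicorn, scikit-learn, joblib, numpy)
#    Pin scikit-learn to the version used to train model.pkl; pickled models are
#    not guaranteed to load correctly under a different scikit-learn version.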
# --no-cache-dir keeps the image size small
RUN pip install --no-cache-dir -r requirements.txt

# 5. Copy the rest of the application code and model artifacts
COPY main.py .
COPY model.pkl .

# 6. Expose the port the app runs on
EXPOSE 8000

# 7. Command to run the application using Uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

To run this locally:
1. docker build -t iris-predictor:v1 .
2. docker run -p 8000:8000 iris-predictor:v1
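
After `docker run`, the container may need a few seconds before Uvicorn accepts connections. A small smoke test can poll the `/health` endpoint until it answers; the function name `wait_until_healthy` and its default timeouts are assumptions, not part of Docker or FastAPI.

```python
import json
import time
import urllib.request


def wait_until_healthy(base_url: str = "http://localhost:8000", timeout_s: float = 30.0,
                       interval_s: float = 0.5) -> bool:
    """Poll GET /health until it reports {"status": "healthy"} or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if json.loads(resp.read().decode("utf-8")).get("status") == "healthy":
                    return True
        except (OSError, ValueError):
            # OSError covers URLError (e.g. connection refused while the container starts)
            # and socket timeouts; ValueError covers a body that is not valid JSON yet.
            pass
        time.sleep(interval_s)
    return False
```

`wait_until_healthy("http://localhost:8000")` returns True once the container responds, so a CI script can use it as a readiness gate before sending test requests to `/predict`.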

4. Monitoring & Drift

Once deployed, ML models can degrade over time because the real world changes:
- Data Drift: The input data distribution changes (e.g., a new demographic starts using your app, entering ages and incomes outside the ranges the model saw during training).
- Concept Drift: The relationship between inputs and outputs changes (e.g., historical factors that predicted inflation in 2018 no longer predict it reliably in 2024).
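
Concept drift can leave the input distribution unchanged, so it usually shows up only as a drop in accuracy once ground-truth labels arrive. A minimal sketch of that check, assuming labels eventually become available for logged predictions: keep a sliding window of correct/incorrect outcomes and flag drift when windowed accuracy falls below the accuracy measured at training time minus a tolerance. The class name and the `window`/`tolerance` defaults are hypothetical, illustrative values.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent labeled predictions and flag a sustained drop.

    Hypothetical sketch: window size and tolerance are illustrative, not recommendations.
    """

    def __init__(self, baseline_accuracy: float, window: int = 500, tolerance: float = 0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.hits = deque(maxlen=window)  # 1 if the prediction matched the label, else 0

    def record(self, prediction, label) -> None:
        self.hits.append(1 if prediction == label else 0)

    def accuracy(self) -> float:
        return sum(self.hits) / len(self.hits) if self.hits else float("nan")

    def drifted(self) -> bool:
        # Only judge once the window is full, so a few early errors cannot trigger an alert.
        if len(self.hits) < self.hits.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance
```

Feed it `monitor.record(prediction, label)` as labels come in and check `monitor.drifted()` on a schedule; requiring a full window trades detection speed for fewer false alarms.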

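Solution: Continually log incoming requests and their predictions. Periodically run statistical tests (such as the two-sample Kolmogorov-Smirnov test, applied to each numeric feature) comparing the production data distribution against your original training data distribution. These input tests catch data drift; concept drift only becomes visible once ground-truth labels arrive and can be compared with the logged predictions. If drift is detected, trigger an automated pipeline to retrain the model on fresh data.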
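
The two-sample Kolmogorov-Smirnov test compares two empirical distributions by their largest vertical gap, D = max over x of |F_train(x) - F_prod(x)|, where F is the empirical CDF. The sketch below computes D in pure Python and an asymptotic p-value from the Kolmogorov distribution, for one numeric feature at a time. It is an approximation intended for continuous features with reasonably large samples; `scipy.stats.ks_2samp(reference, current)` provides a tested implementation. The function names and the `alpha=0.01` threshold are assumptions.

```python
import math


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Largest absolute gap between the two empirical CDFs (the KS D statistic)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(reference), sorted(current)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every sample equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def kolmogorov_sf(lam: float) -> float:
    """P(K > lam) for the Kolmogorov distribution, using whichever series converges fast."""
    if lam <= 0:
        return 1.0
    if lam < 1.0:
        # Small-lam series for the CDF: K(lam) = sqrt(2*pi)/lam * sum_k exp(-(2k-1)^2 pi^2 / (8 lam^2))
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2)) for k in range(1, 20)
        )
        sf = 1.0 - cdf
    else:
        # Large-lam series for the survival function: 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2)
        sf = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 20))
    return min(1.0, max(0.0, sf))


def ks_two_sample(reference: list[float], current: list[float]) -> tuple[float, float]:
    """Return (D, asymptotic p-value) for the two-sample KS test."""
    n, m = len(reference), len(current)
    d = ks_statistic(reference, current)
    return d, kolmogorov_sf(math.sqrt(n * m / (n + m)) * d)


def feature_drifted(reference: list[float], current: list[float], alpha: float = 0.01) -> bool:
    """Flag drift when the KS p-value falls below alpha (alpha is an illustrative choice)."""
    _, p = ks_two_sample(reference, current)
    return p < alpha
```

Run `feature_drifted(train_column, production_column)` for each feature on a schedule. With many features, some tests will fall below `alpha` by chance, so a multiple-testing correction or a minimum effect size on D helps avoid retraining on false alarms.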