Kubeflow: Model Serving with KFServing (KServe)

KFServing (now officially renamed to KServe and part of the Cloud Native Computing Foundation) is a serverless model serving platform built on Kubernetes. It provides a standardized and scalable way to deploy, monitor, and manage ML models in production environments. KServe abstracts away the complexities of Kubernetes, allowing data scientists and ML engineers to focus on their models.

1. Key Concepts in KServe

InferenceService: The core KServe Custom Resource Definition (CRD) that defines how an ML model should be deployed. It encapsulates the model's server (e.g., Triton, ONNX Runtime, a custom server), the model artifact, and other serving configurations.
Predictor: The main component of an InferenceService that runs the ML model and performs inference. KServe supports various pre-built model servers (TensorFlow, PyTorch, Scikit-learn, XGBoost, ONNX, etc.) or allows custom predictors.
Transformer: (Optional) A component that performs data transformation (e.g., preprocessing or post-processing) before/after model inference.
Explainer: (Optional) A component that provides model explanations (e.g., using AI Explainability methods like SHAP or LIME).
Canary Rollouts & A/B Testing: KServe leverages Istio for intelligent routing, enabling seamless canary deployments, A/B testing, and traffic splitting for models.
Autoscaling: Integrates with Knative for request-based autoscaling, scaling down to zero when idle.
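
To make request-based autoscaling concrete, here is a minimal sketch of the idea: run enough replicas that each one handles at most a target number of in-flight requests, within configured bounds. This is a simplification for illustration, not Knative's actual algorithm (which also uses averaging windows and a panic mode); the function and parameter names are hypothetical.

```python
import math


def desired_replicas(in_flight_requests: float,
                     target_per_replica: float,
                     min_replicas: int = 0,
                     max_replicas: int = 3) -> int:
    """Simplified request-based scaling rule (illustrative only)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Enough replicas so each one handles at most the target concurrency.
    wanted = math.ceil(in_flight_requests / target_per_replica)
    # Clamp to the configured bounds; min_replicas=0 allows scale-to-zero.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for load in (0, 5, 25, 500):
        print(load, desired_replicas(load, target_per_replica=10))
```

With `min_replicas=0` an idle service drops to zero replicas; setting it to 1 keeps one replica warm at the cost of idle resources.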

2. Setting up KServe (Conceptual)

KServe is deployed on top of Kubernetes. A typical installation involves:

1. Kubernetes Cluster: A running Kubernetes cluster.
2. Istio: A service mesh for traffic management.
3. Knative Serving: A serverless platform for deploying and managing serverless workloads.
4. KServe: The KServe components themselves.

(Refer to the official KServe documentation for detailed installation instructions, as they can vary by Kubernetes environment and KServe version.)

3. Deploying a Model with KServe (`InferenceService`)

Deploying a model with KServe involves creating an InferenceService YAML manifest and applying it to your Kubernetes cluster.

Example: Deploying a Scikit-learn Model

Let's assume you have a trained Scikit-learn model (e.g., a LogisticRegression model saved as a joblib file) stored in a cloud storage bucket (e.g., S3, GCS, Azure Blob Storage).

# filename: sklearn-isvc.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-classifier
spec:
  predictor:
    sklearn:
      # Path to your trained model artifact in a storage bucket
      # This example uses Google Cloud Storage (GCS)
      storageUri: gs://your-model-bucket/iris_classifier.joblib
      # For AWS S3: s3://your-model-bucket/iris_classifier.joblib
      # For Azure Blob Storage: https://<storage-account>.blob.core.windows.net/<container>/iris_classifier.joblib
      # For local path (if model is mounted): file:///mnt/models/iris_classifier.joblib
      protocolVersion: v2 # Recommended for newer KServe versions
    minReplicas: 1 # Ensure at least one replica is always running
    maxReplicas: 3 # Scale up to 3 replicas based on traffic
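
If you generate manifests programmatically, the same resource can be built as a plain dictionary and emitted as JSON, which `kubectl apply -f` also accepts. The sketch below mirrors the YAML above; the helper name `build_sklearn_isvc` and the output file name in the comment are hypothetical.

```python
import json


def build_sklearn_isvc(name: str, storage_uri: str,
                       min_replicas: int = 1, max_replicas: int = 3) -> dict:
    """Return an InferenceService manifest mirroring the YAML example."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "minReplicas": min_replicas,
                "maxReplicas": max_replicas,
                "sklearn": {
                    "storageUri": storage_uri,
                    "protocolVersion": "v2",
                },
            }
        },
    }


if __name__ == "__main__":
    manifest = build_sklearn_isvc(
        "sklearn-iris-classifier",
        "gs://your-model-bucket/iris_classifier.joblib",
    )
    # Redirect this output to a file (e.g. sklearn-isvc.json) and apply it
    # with: kubectl apply -f sklearn-isvc.json
    print(json.dumps(manifest, indent=2))
```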

Steps to deploy (conceptual):

Save your model:

```python
# Example Python code to save a Scikit-learn model
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Iris has three classes. The liblinear solver is deprecated for multiclass
# problems in recent scikit-learn releases, so use the default solver and
# raise max_iter so it converges.
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Save the model to a file
joblib.dump(model, 'iris_classifier.joblib')

# Upload 'iris_classifier.joblib' to your configured storageUri
# e.g., gsutil cp iris_classifier.joblib gs://your-model-bucket/
```
Apply the YAML: `kubectl apply -f sklearn-isvc.yaml`
Check status: `kubectl get isvc sklearn-iris-classifier`. Wait until the READY column shows True.
Get the inference URL: `kubectl get inferenceservice sklearn-iris-classifier -o jsonpath='{.status.address.url}'`. This returns the cluster-internal address of your model; use `{.status.url}` instead if you need the externally routed URL.
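
The status check can also be scripted. The sketch below assumes the resource reports readiness through a `Ready` entry in `status.conditions`, as shown by `kubectl get inferenceservice <name> -o json`; the helper names are hypothetical, and `fetch_isvc` needs `kubectl` with access to your cluster.

```python
import json
import subprocess


def is_ready(isvc: dict) -> bool:
    """True if the resource reports a Ready condition with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions)


def fetch_isvc(name: str, namespace: str = "default") -> dict:
    """Read the InferenceService via kubectl (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "inferenceservice", name,
         "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

A deployment script could poll `is_ready(fetch_isvc("sklearn-iris-classifier"))` with a short sleep between attempts before sending traffic.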

Making an Inference Request (Conceptual)

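Because the manifest above sets protocolVersion: v2, requests to this service use the V2 Inference Protocol (also known as the Open Inference Protocol). Services that do not set it default to the V1 protocol.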

# Example using curl (assuming your model URL is accessible)
MODEL_URL="http://sklearn-iris-classifier.default.example.com/v2/models/sklearn-iris-classifier/infer" # Replace with your actual URL

curl -v -H "Content-Type: application/json" -d '{
  "inputs": [
    {
      "name": "input-0",
      "shape": [1, 4],
      "datatype": "FP32",
      "data": [[5.1, 3.5, 1.4, 0.2]]
    }
  ]
}' "$MODEL_URL"
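
The same request can be sent from Python using only the standard library. This is a sketch under the same assumptions as the curl example: the input name `input-0` and the `FP32` datatype are taken from it, and the helper names `build_v2_request` and `infer` are hypothetical.

```python
import json
import urllib.request


def build_v2_request(rows, input_name="input-0", datatype="FP32"):
    """Build a V2 request body for a batch of equal-length feature rows."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be a non-empty list of equal-length lists")
    return {
        "inputs": [{
            "name": input_name,
            # shape is [batch size, number of features]
            "shape": [len(rows), len(rows[0])],
            "datatype": datatype,
            "data": rows,
        }]
    }


def infer(model_url, rows, timeout=10.0):
    """POST the request to model_url and return the decoded JSON response."""
    body = json.dumps(build_v2_request(rows)).encode("utf-8")
    req = urllib.request.Request(
        model_url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `infer(MODEL_URL, [[5.1, 3.5, 1.4, 0.2]])` sends the same payload as the curl command; in the V2 protocol the predictions are returned under the `outputs` key of the response.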

4. Deploying Other Model Types

KServe supports various model servers:

TensorFlow:

# ...
spec:
  predictor:
    tensorflow:
      storageUri: gs://your-tf-model-bucket/saved_model_path/
      # protocolVersion is omitted: the TF Serving runtime uses the V1 protocol
      # (check the runtime table for your KServe release)

PyTorch:

# ...
spec:
  predictor:
    pytorch:
      # Recent KServe releases serve PyTorch through TorchServe, which expects a
      # model archive layout rather than a bare model.pt / model.pth file
      storageUri: gs://your-pytorch-model-bucket/model.pt
      runtimeVersion: "1.9" # Pins the serving runtime image tag; check supported values
      protocolVersion: v2

XGBoost:

# ...
spec:
  predictor:
    xgboost:
      storageUri: gs://your-xgboost-model-bucket/model.bst # model.bst or model.json
      protocolVersion: v2

Custom Predictor: For models not supported by built-in servers, you can provide a custom Docker image.

# ...
spec:
  predictor:
    containers:
      - name: kserve-container
        image: your-custom-model-server:latest # Your Docker image
        command: ["python", "app.py"] # Command to run your server
        env:
          - name: MODEL_NAME
            value: mymodel
          - name: STORAGE_URI
            value: gs://your-custom-model-bucket/model/
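
As an illustration of what a custom server such as `app.py` might contain, here is a minimal standard-library sketch that answers V1-style predict calls at `/v1/models/<MODEL_NAME>:predict`. Several things are assumptions made for the example: the placeholder `predict` function stands in for real inference, port 8080 must match your container spec, and the helper names are hypothetical. In practice the KServe Python SDK is usually a better starting point.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Read the model name from the env var set in the manifest.
MODEL_NAME = os.environ.get("MODEL_NAME", "mymodel")


def predict(instances):
    """Placeholder model: returns the sum of each feature row.

    Replace with real inference, e.g. a model loaded once at startup.
    """
    return [sum(row) for row in instances]


def handle_predict(path: str, body: bytes):
    """Return (status_code, response_dict) for a V1-style predict call."""
    if path != f"/v1/models/{MODEL_NAME}:predict":
        return 404, {"error": f"unknown path {path}"}
    try:
        instances = json.loads(body)["instances"]
        return 200, {"predictions": predict(instances)}
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": f"bad request: {exc}"}


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.path, self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def main(port: int = 8080):
    """Start the server; port 8080 is an assumption, match your container."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

To run it as the container's entry point, call `main()` at the bottom of the file. Keeping the request logic in `handle_predict` separate from the HTTP handler makes it easy to unit test without starting a server.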

5. Advanced KServe Features

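Traffic Splitting: Route a percentage of traffic to a new version of a model (canary deployment). In the v1beta1 API there is no separate canary block: you set canaryTrafficPercent on the predictor and update its spec, and KServe sends that share of traffic to the new revision while the rest stays on the previously rolled-out revision.

# ...
spec:
  predictor:
    canaryTrafficPercent: 20 # Send 20% of traffic to the latest revision
    # ... (updated model spec, e.g. a new storageUri)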
Transformers: Preprocessing steps before sending to the predictor, or post-processing of results.
Explainers: Integrating model explainability tools.
Multi-Model Serving: Deploying multiple models on a single InferenceService instance (e.g., for very small models).
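
A staged canary rollout can be scripted by patching canaryTrafficPercent in steps. The sketch below only builds the JSON merge patch and the matching `kubectl patch` argument lists and does not run them; the helper names and the 10/50/100 schedule are hypothetical.

```python
import json


def canary_patch(percent: int) -> str:
    """JSON merge patch that sets spec.predictor.canaryTrafficPercent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return json.dumps(
        {"spec": {"predictor": {"canaryTrafficPercent": percent}}})


def rollout_steps(name: str, percents=(10, 50, 100)):
    """Yield kubectl argument lists for a staged rollout (not executed)."""
    for p in percents:
        yield ["kubectl", "patch", "inferenceservice", name,
               "--type", "merge", "-p", canary_patch(p)]


if __name__ == "__main__":
    for cmd in rollout_steps("sklearn-iris-classifier"):
        print(" ".join(cmd))
```

Each step would normally be followed by a check of error rates or latency before moving to the next percentage.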

Further Topics:

Integrating KServe with Kubeflow Pipelines.
Monitoring deployed models.
Security considerations for model serving.
Performance optimization for inference.
KServe client SDK for programmatic deployment.

KServe (KFServing) is a powerful tool for MLOps, providing a robust, scalable, and standardized solution for deploying machine learning models in a cloud-native environment.
