Data Drift & Post-Deployment Monitoring

Deploying a Machine Learning model is not the finish line; it is the starting line. Traditional software keeps behaving the same way for as long as its code and environment stay unchanged, but Machine Learning models can degrade silently over time as the data they see drifts away from the data they were trained on.

1. What is Drift?

Drift occurs when the real-world data the model sees in production no longer follows the same statistical distribution as the static data the model was trained on.

- Concept Drift: The relationship between the input features and the "target" changes, so the same input now means something different. (e.g., Before COVID, a sudden spike in toilet paper purchases meant a business was stocking a large hotel. During COVID, it meant a normal family was panicking. The model's learned logic is suddenly wrong.)
- Data (Covariate) Drift: The distribution of the incoming features changes. (e.g., You trained a credit risk model on a population with an average income of $60,000. Your marketing team runs a successful ad campaign targeting college students. Now the average incoming income is $20,000, a range the model rarely saw during training, so its predictions become unreliable and it may confidently reject most of the new applicants.)
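The two kinds of drift can be illustrated with a small simulation. This is only a sketch on synthetic data: the `approve` rule, the $40,000 cutoff, the purchase-quantity thresholds, and all sample sizes are hypothetical values chosen for illustration, not part of any real model.

```python
import random

random.seed(0)

# --- Covariate drift: the inputs move, the frozen model does not ---------
def approve(income):
    """Hypothetical frozen credit model: approve at or above $40,000."""
    return income >= 40_000

def sample_incomes(mean, n=5_000, sd=10_000):
    """Draw n synthetic incomes from a normal distribution."""
    return [random.gauss(mean, sd) for _ in range(n)]

train_incomes = sample_incomes(60_000)  # population seen at training time
prod_incomes = sample_incomes(20_000)   # population after the ad campaign

train_rate = sum(approve(x) for x in train_incomes) / len(train_incomes)
prod_rate = sum(approve(x) for x in prod_incomes) / len(prod_incomes)

# --- Concept drift: the inputs stay the same, their meaning changes ------
quantities = [random.randint(1, 300) for _ in range(5_000)]

def predict_business(quantity):
    """Hypothetical frozen rule learned before the shift."""
    return quantity > 50

def label_before(quantity):
    """Ground truth when the model was trained."""
    return quantity > 50

def label_after(quantity):
    """Ground truth after the shift: families now buy in bulk too."""
    return quantity > 200

def accuracy(label_fn):
    hits = sum(predict_business(q) == label_fn(q) for q in quantities)
    return hits / len(quantities)

acc_before = accuracy(label_before)
acc_after = accuracy(label_after)

print(f"approval rate: train={train_rate:.2%}, production={prod_rate:.2%}")
print(f"accuracy: before={acc_before:.2%}, after={acc_after:.2%}")
```

In the first half the model is unchanged but almost nobody clears its cutoff any more. In the second half the inputs have exactly the same distribution, yet accuracy falls because the labels behind them changed. That second case cannot be caught by looking at the features alone.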

2. Statistical Monitoring

You must statistically compare "Production Data" against "Training Data". Because Data Scientists cannot look at 10,000 rows of user data every day manually, MLOps engineers use statistical tests and distance metrics instead.

- Kolmogorov-Smirnov (KS) Test: A statistical test for continuous features that determines whether two samples (Training Income vs Production Income) are plausibly drawn from the same underlying distribution. If the p-value drops below a chosen threshold (0.05 is a common choice), an automated Slack alert can be triggered.
- Population Stability Index (PSI): Measures how far the distribution of a variable has shifted between two samples, such as training and production. It is computed over bins, so it works for categorical variables directly and for numeric variables once they are binned.
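A minimal sketch of both checks on synthetic income data follows. It assumes NumPy and SciPy are installed. `scipy.stats.ks_2samp` is SciPy's two-sample KS test. `population_stability_index` and `check_drift` are hypothetical helpers written for this example, and the income distributions, the 10 quantile bins, and the `1e-6` floor are illustrative choices.

```python
import numpy as np
from scipy.stats import ks_2samp

def population_stability_index(expected, actual, n_bins=10):
    """PSI of one numeric feature: reference sample vs. new sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges are quantiles of the reference (training) sample,
    # so each bin holds roughly 1/n_bins of the training data.
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    exp_idx = np.searchsorted(inner_edges, expected, side="right")
    act_idx = np.searchsorted(inner_edges, actual, side="right")
    exp_frac = np.bincount(exp_idx, minlength=n_bins) / expected.size
    act_frac = np.bincount(act_idx, minlength=n_bins) / actual.size
    # Floor the fractions so an empty bin does not produce log(0).
    eps = 1e-6
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

def check_drift(reference, current, alpha=0.05):
    """Run the KS test and PSI for one feature and summarize the result."""
    result = ks_2samp(reference, current)
    return {
        "ks_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "psi": population_stability_index(reference, current),
        "drifted": bool(result.pvalue < alpha),
    }

# Synthetic data standing in for the income feature.
rng = np.random.default_rng(42)
train_income = rng.normal(60_000, 15_000, size=5_000)
prod_same = rng.normal(60_000, 15_000, size=5_000)    # no real shift
prod_shifted = rng.normal(20_000, 8_000, size=5_000)  # student campaign

report_same = check_drift(train_income, prod_same)
report_shifted = check_drift(train_income, prod_shifted)
print("same population:   ", report_same)
print("shifted population:", report_shifted)
```

A widely used rule of thumb reads a PSI below 0.1 as no meaningful shift, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift. These cutoffs are conventions rather than strict rules. In a real pipeline, the `drifted` flag would feed whatever alerting the team uses.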

3. The Retraining Loop

When drift is detected, the CI/CD pipeline should automatically trigger a script that pulls the latest 30 days of production data, retrains the XGBoost model, logs the run to MLflow, and notifies a human data scientist, who reviews the new metrics before clicking "Deploy".
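The control flow of that loop can be sketched as a single orchestration function. Everything here is an assumption made for illustration: `retraining_step`, `RetrainOutcome`, and the four injected callables are hypothetical names, and the stubs at the bottom return placeholder values in place of the real data loader, XGBoost training, MLflow logging, and notification code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Any, Callable, Dict, Optional, Tuple

@dataclass
class RetrainOutcome:
    triggered: bool
    window_start: Optional[date] = None
    window_end: Optional[date] = None
    run_id: Optional[str] = None
    metrics: Optional[Dict[str, float]] = None
    deployed: bool = False  # never set here; a human decides after review

def retraining_step(
    drift_detected: bool,
    today: date,
    load_window: Callable[[date, date], Any],
    train_model: Callable[[Any], Tuple[Any, Dict[str, float]]],
    log_run: Callable[[Any, Dict[str, float]], str],
    notify: Callable[[str], None],
    window_days: int = 30,
) -> RetrainOutcome:
    """Retrain on recent data when drift is flagged, then hand off to a human."""
    if not drift_detected:
        return RetrainOutcome(triggered=False)

    start = today - timedelta(days=window_days)
    data = load_window(start, today)        # latest production data
    model, metrics = train_model(data)      # e.g. fit the XGBoost model
    run_id = log_run(model, metrics)        # e.g. record the run in MLflow
    notify(f"Retrained model logged as {run_id}; metrics {metrics} need review.")
    # Deliberately no deploy call: promotion stays a manual decision.
    return RetrainOutcome(True, start, today, run_id, metrics)

# Usage with stubs (placeholder values, not real results).
messages = []
outcome = retraining_step(
    drift_detected=True,
    today=date(2024, 1, 31),
    load_window=lambda start, end: [{"income": 20_000, "defaulted": 0}],
    train_model=lambda rows: ("model-object", {"auc": 0.0}),
    log_run=lambda model, metrics: "run-001",
    notify=messages.append,
)
skipped = retraining_step(
    drift_detected=False,
    today=date(2024, 1, 31),
    load_window=lambda start, end: [],
    train_model=lambda rows: (None, {}),
    log_run=lambda model, metrics: "",
    notify=messages.append,
)
print(outcome)
print(messages)
```

Passing the steps in as callables keeps the sketch independent of any particular library version and makes the loop easy to test. The function never deploys anything itself, which matches the human review step described above.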

How to execute the examples:

Go to the Examples/ folder and run the script: python MLOps_DataDrift.py