Core ML: Regression
Regression algorithms are supervised learning models used to predict continuous numerical values. Examples include predicting the price of a house, the temperature for tomorrow, or the future stock price of a company.
1. Linear Regression
The most fundamental algorithm. It fits a straight line (or, with N features, a hyperplane) through the data points so as to minimize the Sum of Squared Errors (SSE) between predictions and actual values.
- Pros: Highly interpretable (you get a mathematical equation such as $Y = 5x_1 + 3x_2 + 10$), fast to train.
- Cons: Assumes the target is a linear function of the features. Very sensitive to outliers, because squaring the errors lets a single extreme point pull the line toward it.
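As a minimal sketch of the idea above, the single-feature case has a closed-form least-squares solution that can be written in plain Python. The function names (`fit_line`, `sse`) and the data are made up for illustration and are not taken from the scripts in Examples/.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (not divided by n; the ratio is the same).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sse(xs, ys, slope, intercept):
    """Sum of Squared Errors of the line on the given data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on the line y = 3x + 10.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x + 10.0 for x in xs]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, sse(xs, ys, slope, intercept))  # 3.0 10.0 0.0

# Replace the last target (25.0) with an outlier (60.0) and refit.
ys_outlier = ys[:-1] + [60.0]
slope_out, intercept_out = fit_line(xs, ys_outlier)
print(slope_out, intercept_out)  # 10.0 -4.0
```

One corrupted point out of five moves the slope from 3 to 10, which is the outlier sensitivity listed under Cons.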
2. Regularization (Ridge & Lasso)
Standard Linear Regression becomes unstable when many features are correlated with each other (Multicollinearity), and it has no unique solution when there are more features than data rows. Regularization addresses this by adding a "penalty" on the size of the coefficients to the loss function, so the model is discouraged from relying too heavily on any single feature.
- Ridge (L2 Penalty): Penalizes the sum of squared coefficients. Shrinks all coefficients towards zero without setting any of them exactly to zero, making the model more robust to correlated features.
- Lasso (L1 Penalty): Penalizes the sum of absolute coefficients. Can shrink coefficients to exactly zero, effectively performing automatic Feature Selection.
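The difference between the two penalties can be sketched in the simplest possible setting: one centered feature and no intercept, where both problems have a closed-form solution. This is an illustrative simplification, not how the scripts in Examples/ or any particular library fit these models; the function names, the data, and the penalty weight `lam` are assumptions, and libraries may scale the penalty differently.

```python
import math


def ols_coef(xs, ys):
    """Unpenalized least-squares coefficient for y ~ b * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def ridge_coef(xs, ys, lam):
    """Minimizes sum((y - b*x)^2) + lam * b^2: the coefficient is scaled down."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_coef(xs, ys, lam):
    """Minimizes sum((y - b*x)^2) + lam * |b|: soft-thresholding."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    magnitude = max(abs(sxy) - lam / 2.0, 0.0)
    if magnitude == 0.0:
        return 0.0  # the penalty outweighs the signal: coefficient is exactly zero
    return math.copysign(magnitude, sxy) / sxx


# Hypothetical centered feature with a strong and a weak relationship to the target.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys_strong = [3.0 * x for x in xs]
ys_weak = [0.1 * x for x in xs]
lam = 4.0  # assumed penalty weight

ridge_strong = ridge_coef(xs, ys_strong, lam)  # 30 / 14, about 2.14
lasso_strong = lasso_coef(xs, ys_strong, lam)  # (30 - 2) / 10 = 2.8
ridge_weak = ridge_coef(xs, ys_weak, lam)      # small but still positive
lasso_weak = lasso_coef(xs, ys_weak, lam)      # exactly 0.0
print(ridge_strong, lasso_strong, ridge_weak, lasso_weak)
```

With the same penalty weight, Ridge keeps the weak coefficient small but nonzero, while Lasso drops it entirely. With several correlated features the solutions are no longer this simple and are normally found with an iterative solver.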
3. Evaluation Metrics
You cannot evaluate regression models with "Accuracy" (what is the accuracy of guessing $100,001 instead of $100,000?). Use error-based metrics instead.
- Mean Absolute Error (MAE): The average absolute difference between predictions and reality, in the same units as the target. Simple to explain to business stakeholders.
- Mean Squared Error (MSE): The average squared difference. Punishes large errors quadratically: an error twice as large costs four times as much.
- R-Squared ($R^2$): Represents the proportion of variance in the target variable that the model explains. A score of 1.0 is perfect; 0.0 means the model is no better than always predicting the average, and the score can be negative when the model is worse than that.
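The three metrics above follow directly from their definitions. The sketch below uses plain Python and made-up numbers; the function names are illustrative and not taken from the scripts in Examples/.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions; the errors are 10, 10, 0 and 40.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 300.0, 440.0]

mae = mean_absolute_error(y_true, y_pred)  # 15.0
mse = mean_squared_error(y_true, y_pred)   # 450.0
r2 = r_squared(y_true, y_pred)             # about 0.964

# Baseline: always predict the average of the targets.
baseline = [sum(y_true) / len(y_true)] * len(y_true)
r2_baseline = r_squared(y_true, baseline)  # 0.0
print(mae, mse, r2, r2_baseline)
```

The single error of 40 accounts for two thirds of the total absolute error (40 of 60) but almost nine tenths of the total squared error (1600 of 1800), which is why MSE reacts more strongly to large misses than MAE.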
How to execute the examples:
Go to the Examples/ folder and run the scripts using Python:
python Reg_Linear_Regression.py
python Reg_Lasso_Feature_Selection.py
python Reg_RandomForest_Regressor.py