Model Evaluation is the process of measuring how well a Machine Learning model performs on unseen data.
After training a model, we must check its performance to ensure it generalizes well and does not just memorize training data.
Why Model Evaluation is Important
Model evaluation helps:
Measure accuracy
Detect overfitting and underfitting
Compare different models
Improve model performance
Ensure reliability before deployment
Without evaluation, we cannot trust predictions.
Training vs Testing Data
Data is usually divided into:
Training Set → Used to train the model
Testing Set → Used to evaluate performance
This ensures the model is tested on unseen data.
Common split:
70–80% Training
20–30% Testing
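A minimal sketch of this split with scikit-learn's train_test_split; the built-in Iris dataset stands in for real data here, and the 80/20 ratio is just one common choice:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # toy dataset standing in for real data

# 80% of rows for training, 20% for testing;
# random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))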
Evaluation for Regression Models
Regression models predict continuous values.
1. Mean Absolute Error (MAE)
Average of the absolute differences between actual and predicted values.
MAE = (1/n) × Σ |actual - predicted|
A lower MAE means a better model.
2. Mean Squared Error (MSE)
Average of the squared differences between actual and predicted values.
MSE = (1/n) × Σ (actual - predicted)²
Penalizes larger errors more heavily than MAE.
3. Root Mean Squared Error (RMSE)
Square root of MSE: RMSE = √MSE.
Easier to interpret than MSE because it is expressed in the same units as the target variable.
4. R-Squared (R²)
Measures how much of the variance in the data the model explains.
Value range:
Typically 0 to 1 (negative values are possible when a model fits worse than simply predicting the mean)
Closer to 1 means a better fit.
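A minimal sketch computing all four regression metrics with scikit-learn, using made-up actual and predicted values purely for illustration:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_test = [3.0, 5.0, 7.5, 10.0]      # actual values (illustrative)
predictions = [2.5, 5.5, 7.0, 9.0]  # model outputs (illustrative)

mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
rmse = mse ** 0.5  # RMSE is simply the square root of MSE
r2 = r2_score(y_test, predictions)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)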
Evaluation for Classification Models
Classification models predict categories.
1. Accuracy
Percentage of correct predictions.
Accuracy = Correct Predictions / Total Predictions
Best used when the classes are balanced.
2. Confusion Matrix
Shows:
True Positive (TP)
True Negative (TN)
False Positive (FP)
False Negative (FN)
Helps analyze model errors.
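A minimal sketch with scikit-learn's confusion_matrix, using made-up binary labels; for labels 0 and 1 the matrix is laid out as [[TN, FP], [FN, TP]]:

from sklearn.metrics import confusion_matrix

y_test = [1, 0, 1, 1, 0, 1, 0, 0]       # actual labels (illustrative)
predictions = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted labels (illustrative)

cm = confusion_matrix(y_test, predictions)
print(cm)  # rows = actual class, columns = predicted class

tn, fp, fn, tp = cm.ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)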
3. Precision
Precision = TP / (TP + FP)
Measures how many predicted positives are correct.
Important when false positives are costly.
4. Recall
Recall = TP / (TP + FN)
Measures how many actual positives are identified.
Important when false negatives are costly.
5. F1-Score
Harmonic mean of precision and recall.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Useful when the dataset is imbalanced.
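A minimal sketch computing these three metrics individually with scikit-learn, reusing the illustrative labels from the confusion matrix example:

from sklearn.metrics import precision_score, recall_score, f1_score

y_test = [1, 0, 1, 1, 0, 1, 0, 0]       # actual labels (illustrative)
predictions = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted labels (illustrative)

print("Precision:", precision_score(y_test, predictions))
print("Recall:", recall_score(y_test, predictions))
print("F1:", f1_score(y_test, predictions))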
Cross-Validation
Instead of using one train-test split, cross-validation divides data into multiple folds.
Example:
5-Fold Cross Validation:
Data split into 5 parts
Train on 4 parts
Test on 1 part
Repeat 5 times
The final score is the average across all folds.
Benefits:
More reliable evaluation
Reduces bias
Better performance estimation
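A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score; the Iris dataset and logistic regression model are arbitrary stand-ins:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)         # toy dataset for illustration
model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into 5 folds: train on 4, test on 1, repeat 5 times
scores = cross_val_score(model, X, y, cv=5)
print("Fold scores:", scores)
print("Mean accuracy:", scores.mean())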
Overfitting and Underfitting
Overfitting:
- Model performs well on training data
- Performs poorly on testing data
Underfitting:
- Model performs poorly on both training and testing data
Good model:
- Performs well on both
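A simple way to spot these cases is to compare training and testing scores. A minimal sketch, again using the Iris dataset and an arbitrary model as stand-ins; a large gap between the two scores suggests overfitting, while low scores on both suggest underfitting:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Compare performance on seen vs unseen data
print("Training score:", model.score(X_train, y_train))
print("Testing score:", model.score(X_test, y_test))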
Example Using Scikit-Learn
Regression evaluation:
from sklearn.metrics import mean_squared_error, r2_score

# y_test holds the actual values, predictions the model's outputs
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print("MSE:", mse)
print("R2 Score:", r2)
Classification evaluation:
from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_test, predictions))
print(classification_report(y_test, predictions))
Choosing the Right Metric
Use MAE, MSE, RMSE for regression
Use Accuracy for balanced classification
Use Precision/Recall for imbalanced data
Use F1-score when both precision and recall matter
Key Takeaway
Model Evaluation measures how well a Machine Learning model performs on unseen data.
Using proper metrics ensures the model is accurate, reliable, and ready for real-world applications.