Model Evaluation is the process of measuring how well a Machine Learning model performs on unseen data.
After training a model, we must check its performance to ensure it generalizes well and does not just memorize training data.
Why Model Evaluation is Important
Model evaluation helps:
Measure accuracy
Detect overfitting and underfitting
Compare different models
Improve model performance
Ensure reliability before deployment
Without evaluation, we cannot trust predictions.
Training vs Testing Data
Data is usually divided into:
Training Set → Used to train the model
Testing Set → Used to evaluate performance
This ensures the model is tested on unseen data.
Common split:
70–80% Training
20–30% Testing
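A minimal sketch of this split with scikit-learn's train_test_split; the built-in Iris dataset stands in for real data here, and the 80/20 ratio is just one common choice:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # toy dataset standing in for real data

# 80% of rows for training, 20% for testing;
# random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))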
Evaluation for Regression Models
Regression models predict continuous values.
1. Mean Absolute Error (MAE)
Average of the absolute differences between actual and predicted values.
MAE = (1/n) × Σ |actual - predicted|
A lower MAE means a better model.
2. Mean Squared Error (MSE)
Average of the squared differences between actual and predicted values.
MSE = (1/n) × Σ (actual - predicted)²
Penalizes larger errors more heavily than MAE.
3. Root Mean Squared Error (RMSE)
Square root of MSE: RMSE = √MSE.
Easier to interpret than MSE because it is expressed in the same units as the target variable.
4. R-Squared (R²)
Measures how much of the variance in the data the model explains.
Value range:
Typically 0 to 1 (negative values are possible when a model fits worse than simply predicting the mean)
Closer to 1 means a better fit.
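A minimal sketch computing all four regression metrics with scikit-learn, using made-up actual and predicted values purely for illustration:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_test = [3.0, 5.0, 7.5, 10.0]      # actual values (illustrative)
predictions = [2.5, 5.5, 7.0, 9.0]  # model outputs (illustrative)

mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
rmse = mse ** 0.5  # RMSE is simply the square root of MSE
r2 = r2_score(y_test, predictions)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)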
Evaluation for Classification Models
Classification models predict categories.
1. Accuracy
Percentage of correct predictions.
Accuracy = Correct Predictions / Total Predictions
Best used when the classes are balanced.
2. Confusion Matrix
Shows:
True Positive (TP)
True Negative (TN)
False Positive (FP)
False Negative (FN)
Helps analyze model errors.
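A minimal sketch with scikit-learn's confusion_matrix, using made-up binary labels; for labels 0 and 1 the matrix is laid out as [[TN, FP], [FN, TP]]:

from sklearn.metrics import confusion_matrix

y_test = [1, 0, 1, 1, 0, 1, 0, 0]       # actual labels (illustrative)
predictions = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted labels (illustrative)

cm = confusion_matrix(y_test, predictions)
print(cm)  # rows = actual class, columns = predicted class

tn, fp, fn, tp = cm.ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)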
3. Precision
Precision = TP / (TP + FP)
Measures how many predicted positives are correct.
Important when false positives are costly.
4. Recall
Recall = TP / (TP + FN)
Measures how many actual positives are identified.
Important when false negatives are costly.
5. F1-Score
Harmonic mean of precision and recall.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Useful when the dataset is imbalanced.
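A minimal sketch computing these three metrics individually with scikit-learn, reusing the illustrative labels from the confusion matrix example:

from sklearn.metrics import precision_score, recall_score, f1_score

y_test = [1, 0, 1, 1, 0, 1, 0, 0]       # actual labels (illustrative)
predictions = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted labels (illustrative)

print("Precision:", precision_score(y_test, predictions))
print("Recall:", recall_score(y_test, predictions))
print("F1:", f1_score(y_test, predictions))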
Cross-Validation
Instead of using one train-test split, cross-validation divides data into multiple folds.
Example:
5-Fold Cross Validation:
Data split into 5 parts
Train on 4 parts
Test on 1 part
Repeat 5 times
The final score is the average across all folds.
Benefits:
More reliable evaluation
Reduces bias
Better performance estimation
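A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score; the Iris dataset and logistic regression model are arbitrary stand-ins:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)         # toy dataset for illustration
model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into 5 folds: train on 4, test on 1, repeat 5 times
scores = cross_val_score(model, X, y, cv=5)
print("Fold scores:", scores)
print("Mean accuracy:", scores.mean())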
Overfitting and Underfitting
Overfitting:
- Model performs well on training data
- Performs poorly on testing data
Underfitting:
- Model performs poorly on both training and testing data
Good model:
- Performs well on both
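A simple way to spot these cases is to compare training and testing scores. A minimal sketch, again using the Iris dataset and an arbitrary model as stand-ins; a large gap between the two scores suggests overfitting, while low scores on both suggest underfitting:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Compare performance on seen vs unseen data
print("Training score:", model.score(X_train, y_train))
print("Testing score:", model.score(X_test, y_test))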
Example Using Scikit-Learn
Regression evaluation:
from sklearn.metrics import mean_squared_error, r2_score

# y_test holds the actual values, predictions the model's outputs
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print("MSE:", mse)
print("R2 Score:", r2)
Classification evaluation:
from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_test, predictions))
print(classification_report(y_test, predictions))
Choosing the Right Metric
Use MAE, MSE, RMSE for regression
Use Accuracy for balanced classification
Use Precision/Recall for imbalanced data
Use F1-score when both precision and recall matter
Key Takeaway
Model Evaluation measures how well a Machine Learning model performs on unseen data.
Using proper metrics ensures the model is accurate, reliable, and ready for real-world applications.