Classification Models (Logistic Regression, Decision Trees)

Classification models are used to predict categorical outcomes, such as whether a customer will buy a product (Yes/No) or which category an observation belongs to. In R, logistic regression and decision trees are two widely used classification techniques.

1. Logistic Regression

Logistic regression predicts a binary outcome using one or more predictors. The model outputs the probability of one class, a value between 0 and 1, which is converted to a class label by applying a threshold.
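
As a rough illustration of where that probability comes from: logistic regression forms a linear combination of the predictors (the log-odds) and passes it through the logistic function. The sketch below uses base R's plogis(); the intercept and slope are made-up values chosen for illustration, not estimates from any data.

```r
# Hypothetical coefficients, chosen only for illustration
intercept <- -4
slope_age <- 0.1

age <- c(20, 40, 60)

# Linear predictor (log-odds), then logistic transform to a probability
log_odds <- intercept + slope_age * age
prob <- plogis(log_odds)   # same as 1 / (1 + exp(-log_odds))

round(prob, 3)
```

Whatever the size of the linear predictor, the transformed value always stays strictly between 0 and 1.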

a) Load Data

# Load dataset
data <- read.csv("customer_purchases.csv")

# Convert target variable to factor
data$HighValue <- factor(ifelse(data$PurchaseAmt > 1000, "Yes", "No"))

# Inspect data
str(data)
table(data$HighValue)
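
The file customer_purchases.csv is not included here. To follow along without it, one option is to simulate a data frame with the same column names used in this tutorial (Age, Gender, ProductCategory, PurchaseAmt). Everything about the values below, including the category labels and the spending pattern, is an assumption for illustration only.

```r
set.seed(42)
n <- 500

# Simulated stand-in for customer_purchases.csv (all values are made up)
data <- data.frame(
  Age = sample(18:70, n, replace = TRUE),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  ProductCategory = factor(sample(c("Electronics", "Clothing", "Grocery"),
                                  n, replace = TRUE))
)

# Assumed pattern: spending rises with age, plus random noise
data$PurchaseAmt <- round(pmax(0, 200 + 18 * data$Age + rnorm(n, sd = 400)), 2)

# Same target definition as in the tutorial, with explicit level order
data$HighValue <- factor(ifelse(data$PurchaseAmt > 1000, "Yes", "No"),
                         levels = c("No", "Yes"))
table(data$HighValue)
```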

b) Split Data into Training and Test Sets

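# install.packages("caret")  # run once if caret is not yet installed
library(caret)

# Fix the random seed so the split is reproducible
set.seed(123)

# Stratified 70/30 split on the target variable
train_index <- createDataPartition(data$HighValue, p = 0.7, list = FALSE)
train_data <- data[train_index, ]
test_data <- data[-train_index, ]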
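
createDataPartition() samples within each class so that the training and test sets keep roughly the same class balance. A rough base-R sketch of that idea follows; it uses a small made-up factor rather than the tutorial's data, and it approximates the concept rather than reproducing caret's actual implementation.

```r
set.seed(123)

# Made-up outcome with an 80/20 class imbalance
y <- factor(rep(c("No", "Yes"), times = c(80, 20)))

# Sample 70% of the row indices within each class
idx_by_class <- split(seq_along(y), y)
train_index <- unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = round(0.7 * length(idx)))
}))

# Class proportions should match in both subsets
prop.table(table(y[train_index]))
prop.table(table(y[-train_index]))
```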

c) Fit Logistic Regression Model

model_logistic <- glm(HighValue ~ Age + Gender + ProductCategory,
                      data = train_data, family = binomial)
summary(model_logistic)
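
The coefficients reported by summary() are on the log-odds scale; exponentiating them gives odds ratios, which are often easier to read. The sketch below fits a model to simulated data (the variable names and the effect size are assumptions) so that it runs on its own.

```r
set.seed(1)
n <- 400

# Simulated data: probability of "Yes" rises with age (assumed effect)
age <- runif(n, 18, 70)
y <- rbinom(n, size = 1, prob = plogis(-4 + 0.09 * age))
sim <- data.frame(Age = age, HighValue = factor(y, labels = c("No", "Yes")))

fit <- glm(HighValue ~ Age, data = sim, family = binomial)

# Odds ratio: multiplicative change in the odds of "Yes" per extra year of age
odds_ratios <- exp(coef(fit))
odds_ratios
```

An odds ratio above 1 means the odds of "Yes" increase as the predictor increases; below 1 means they decrease.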

d) Make Predictions

# Predict probabilities
probabilities <- predict(model_logistic, test_data, type = "response")

# Convert probabilities to class labels (threshold = 0.5)
predicted_classes <- ifelse(probabilities > 0.5, "Yes", "No")
predicted_classes <- factor(predicted_classes, levels = c("No", "Yes"))
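
The 0.5 cutoff is a convention, not a requirement; a lower threshold flags more observations as "Yes" at the cost of more false positives. A small sketch with made-up probabilities (the helper name classify is hypothetical):

```r
# Made-up predicted probabilities for six observations
probabilities <- c(0.10, 0.35, 0.48, 0.52, 0.70, 0.95)

# Hypothetical helper: turn probabilities into labels at a given cutoff
classify <- function(p, threshold) {
  factor(ifelse(p > threshold, "Yes", "No"), levels = c("No", "Yes"))
}

table(classify(probabilities, 0.5))  # conventional cutoff
table(classify(probabilities, 0.3))  # more lenient cutoff
```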

e) Evaluate Model Performance

# Treat "Yes" as the positive class (caret defaults to the first level, "No")
confusionMatrix(predicted_classes, test_data$HighValue, positive = "Yes")
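
confusionMatrix() reports accuracy, sensitivity, and specificity, among other statistics. To see how those figures come out of the cross-tabulation, here is a base-R sketch on made-up labels, treating "Yes" as the positive class (an assumption; caret uses the first factor level as positive unless told otherwise).

```r
# Made-up actual and predicted labels for eight observations
actual    <- factor(c("Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"),
                    levels = c("No", "Yes"))
predicted <- factor(c("Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes"),
                    levels = c("No", "Yes"))

cm <- table(Predicted = predicted, Actual = actual)

tp <- cm["Yes", "Yes"]; tn <- cm["No", "No"]
fp <- cm["Yes", "No"];  fn <- cm["No", "Yes"]

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # share of actual "Yes" correctly predicted
specificity <- tn / (tn + fp)   # share of actual "No" correctly predicted

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```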

2. Decision Trees

Decision trees split the data into branches to classify observations based on feature values.
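
To make "split the data into branches" more concrete: at each node, a classification tree picks the split that most reduces class impurity, and rpart uses the Gini index by default for classification. The sketch below computes that reduction by hand for one candidate split on made-up data; it illustrates the criterion only and is not rpart's internal code.

```r
# Gini impurity of a set of class labels: 1 - sum of squared class shares
gini <- function(labels) {
  p <- prop.table(table(labels))
  1 - sum(p^2)
}

# Made-up data: ages and class labels for eight customers
age   <- c(22, 25, 31, 38, 45, 52, 60, 67)
label <- c("No", "No", "No", "Yes", "No", "Yes", "Yes", "Yes")

# Candidate split: Age < 35 versus Age >= 35
left  <- label[age < 35]
right <- label[age >= 35]

parent_impurity <- gini(label)
child_impurity  <- (length(left) * gini(left) +
                    length(right) * gini(right)) / length(label)

parent_impurity - child_impurity   # impurity reduction from this split
```

A tree-growing algorithm would evaluate many such candidate splits and keep the one with the largest reduction.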

a) Install and Load Package

install.packages("rpart")
install.packages("rpart.plot")

library(rpart)
library(rpart.plot)

b) Fit Decision Tree Model

tree_model <- rpart(HighValue ~ Age + Gender + ProductCategory,
                    data = train_data, method = "class")

# Plot the tree
rpart.plot(tree_model, type = 2, extra = 104, fallen.leaves = TRUE)

c) Make Predictions

pred_tree <- predict(tree_model, test_data, type = "class")
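
predict() on an rpart classification tree can also return class probabilities instead of labels by using type = "prob". The following sketch uses simulated data (the variable names, the age cutoff, and the noise level are assumptions) so it can run without the tutorial's CSV file.

```r
library(rpart)

set.seed(7)
n <- 300
sim <- data.frame(Age = sample(18:70, n, replace = TRUE))

# Assumed rule with about 10% label noise: older customers tend to be "Yes"
is_yes <- ifelse(runif(n) < 0.9, sim$Age > 45, sim$Age <= 45)
sim$HighValue <- factor(ifelse(is_yes, "Yes", "No"), levels = c("No", "Yes"))

fit <- rpart(HighValue ~ Age, data = sim, method = "class")

# One row per observation, one column per class; each row sums to 1
tree_probs <- predict(fit, sim, type = "prob")
head(tree_probs)

# Predicted class labels for the same observations
tree_class <- predict(fit, sim, type = "class")
table(tree_class)
```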

d) Evaluate Model Performance

# Treat "Yes" as the positive class (caret defaults to the first level, "No")
confusionMatrix(pred_tree, test_data$HighValue, positive = "Yes")

3. Comparison of Logistic Regression and Decision Trees

Feature                  Logistic Regression                   Decision Tree

Type                     Parametric, linear                    Non-parametric, non-linear
Output                   Class probability                     Class labels (class probabilities also available)
Interpretability         Coefficients show predictor effects   Tree structure can be visualized
Handling Non-Linearity   Limited without transformations       Handles non-linear relationships
Robustness               Sensitive to outliers                 Less sensitive to outliers
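
The "Handling Non-Linearity" contrast can be seen on a small simulated example in which the class depends on whether x is far from zero, a pattern that a single linear term cannot capture. The data-generating rule is an assumption made for illustration; results on real data will differ.

```r
library(rpart)

set.seed(99)
n <- 500
x <- rnorm(n)

# Assumed rule: "Yes" when x is more than 1 unit away from zero
sim <- data.frame(
  x = x,
  y = factor(ifelse(abs(x) > 1, "Yes", "No"), levels = c("No", "Yes"))
)

# Logistic regression with a single linear term in x
fit_glm <- glm(y ~ x, data = sim, family = binomial)
pred_glm <- ifelse(predict(fit_glm, sim, type = "response") > 0.5, "Yes", "No")

# Classification tree on the same predictor
fit_tree <- rpart(y ~ x, data = sim, method = "class")
pred_tree <- predict(fit_tree, sim, type = "class")

# Training-set accuracy of each model (no held-out test set in this sketch)
acc_glm  <- mean(pred_glm == sim$y)
acc_tree <- mean(as.character(pred_tree) == sim$y)
c(logistic = acc_glm, tree = acc_tree)
```

Adding a transformed predictor such as x^2 to the logistic model would let it capture this pattern too, which is what "limited without transformations" refers to.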

4. Advantages of Classification Models in R

  • Predict categorical outcomes from complex data
  • Identify important features that affect the target variable
  • Visualize decision-making with trees or probability outputs
  • Integrate easily with other R packages for evaluation and reporting

5. Conclusion

Logistic regression and decision trees are two core classification techniques in R. Logistic regression is well suited to binary outcomes when the predictors relate roughly linearly to the log-odds of the outcome, while decision trees handle complex, non-linear interactions and provide intuitive visual explanations. Knowing both methods lets you build predictive models, evaluate their performance, and choose the one that fits your data.
