Linear Regression Model

Linear regression is one of the most widely used statistical methods for predicting a continuous dependent variable based on one or more independent variables. R provides a simple and efficient way to build, evaluate, and visualize linear regression models.

1. What is Linear Regression?

Linear regression models the relationship between a dependent variable $Y$ and one or more independent variables $X_1, X_2, \dots$ by fitting a linear equation:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon$$

  • $\beta_0$ = intercept
  • $\beta_1, \beta_2, \dots$ = coefficients for predictors
  • $\epsilon$ = error term
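To make the equation concrete, here is a minimal sketch that simulates data from a known linear equation and checks that lm() recovers values close to the chosen coefficients. The variable names (x1, x2, y) and the coefficient values are made up for illustration and are not part of the customer dataset used later.

```r
# Simulated data (hypothetical): we choose the true coefficients ourselves.
set.seed(42)
n   <- 500
x1  <- rnorm(n)
x2  <- rnorm(n)
eps <- rnorm(n, sd = 0.5)            # error term
y   <- 2 + 3 * x1 - 1.5 * x2 + eps   # beta0 = 2, beta1 = 3, beta2 = -1.5

# Fit the linear equation by ordinary least squares
fit <- lm(y ~ x1 + x2)

# Estimated intercept and coefficients; these should land near 2, 3 and -1.5
estimates <- coef(fit)
print(round(estimates, 2))
```

Because the error term is random, the estimates will be close to the true values but not exactly equal to them.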

Types of Linear Regression:

  • Simple Linear Regression: One predictor variable
  • Multiple Linear Regression: Two or more predictor variables

2. Load and Inspect Data

# Load dataset
data <- read.csv("customer_purchases.csv")

# Inspect data
head(data)
str(data)
summary(data)
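After inspecting the data, it is often worth checking for missing values and making sure categorical columns are stored as factors before modeling. The sketch below uses a small made-up data frame as a stand-in for customer_purchases.csv; the column names follow the ones used in this tutorial, but the values are invented, and dropping incomplete rows is only one possible way to handle missing data.

```r
# Hypothetical stand-in for customer_purchases.csv (values are made up)
data <- data.frame(
  Age = c(23, 35, 41, NA, 52, 29),
  Gender = c("F", "M", "F", "M", "F", "M"),
  ProductCategory = c("Books", "Electronics", "Books",
                      "Clothing", "Electronics", "Books"),
  PurchaseAmt = c(40.5, 210.0, 55.2, 80.0, 310.7, NA),
  stringsAsFactors = FALSE
)

# Count missing values in each column
missing_counts <- colSums(is.na(data))
print(missing_counts)

# Store categorical columns as factors so lm() dummy-codes them
data$Gender <- factor(data$Gender)
data$ProductCategory <- factor(data$ProductCategory)

# Keep only complete rows (a simple option; imputation is an alternative)
data_clean <- na.omit(data)
str(data_clean)
```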

3. Simple Linear Regression

Predicting PurchaseAmt based on Age:

# Fit model
model_simple <- lm(PurchaseAmt ~ Age, data = data)

# Model summary
summary(model_simple)

Key Outputs:

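  • Coefficients: The intercept and the slope for Age, where the slope is the estimated change in PurchaseAmt for a one-unit increase in Age
  • R-squared: Proportion of the variance in PurchaseAmt explained by the model
  • p-value: Probability of observing a test statistic at least this extreme if the true coefficient were zero; small values (commonly below 0.05) suggest the predictor is statistically significant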
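These outputs can also be pulled out of the summary object programmatically rather than read off the printed table. The following is a sketch on simulated data; the data-generating values (intercept 50, slope 2, noise standard deviation 20) are assumptions chosen only so the example runs on its own.

```r
# Simulated stand-in data (hypothetical values)
set.seed(1)
n <- 200
data <- data.frame(Age = round(runif(n, 18, 70)))
data$PurchaseAmt <- 50 + 2 * data$Age + rnorm(n, sd = 20)

model_simple  <- lm(PurchaseAmt ~ Age, data = data)
model_summary <- summary(model_simple)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table  <- coef(model_summary)
age_slope   <- coef_table["Age", "Estimate"]
age_p_value <- coef_table["Age", "Pr(>|t|)"]

# R-squared and 95% confidence intervals for the coefficients
r_squared <- model_summary$r.squared
conf_int  <- confint(model_simple, level = 0.95)

print(coef_table)
print(r_squared)
print(conf_int)
```

The confidence intervals from confint() complement the p-values by showing the range of slope values that are compatible with the data.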

4. Multiple Linear Regression

Predicting PurchaseAmt based on Age, Gender, and ProductCategory:

# Fit model
model_multiple <- lm(PurchaseAmt ~ Age + Gender + ProductCategory, data = data)

# Summary
summary(model_multiple)
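Two things are useful to know when reading a multiple regression summary: categorical predictors are expanded into indicator (dummy) columns, and nested models can be compared with an F-test. The sketch below illustrates both on simulated data; the factor levels and effect sizes are invented for the example.

```r
# Simulated stand-in data (hypothetical levels and effect sizes)
set.seed(7)
n <- 300
data <- data.frame(
  Age = round(runif(n, 18, 70)),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  ProductCategory = factor(sample(c("Books", "Clothing", "Electronics"),
                                  n, replace = TRUE))
)
category_effect <- c(Books = 0, Clothing = 30, Electronics = 120)
data$PurchaseAmt <- 40 + 1.5 * data$Age +
  unname(category_effect[as.character(data$ProductCategory)]) +
  rnorm(n, sd = 25)

model_simple   <- lm(PurchaseAmt ~ Age, data = data)
model_multiple <- lm(PurchaseAmt ~ Age + Gender + ProductCategory, data = data)

# Each factor becomes indicator columns; the first level is the reference
design_columns <- colnames(model.matrix(model_multiple))
print(design_columns)

# F-test: do the extra predictors improve on the Age-only model?
comparison <- anova(model_simple, model_multiple)
print(comparison)
```

Each indicator coefficient is interpreted relative to the reference level, for example the difference in expected PurchaseAmt between Electronics and Books at the same Age and Gender.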

5. Making Predictions

# Predict purchase amounts
predictions <- predict(model_multiple, newdata = data)

# View predicted vs actual
head(data.frame(Actual = data$PurchaseAmt, Predicted = predictions))
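In practice, predictions are usually wanted for new observations rather than for the rows the model was fitted on. The sketch below shows one way to do this with prediction intervals; the simulated data, the two-predictor model, and the new_customers data frame are all hypothetical.

```r
# Simulated stand-in data (hypothetical values)
set.seed(11)
n <- 200
data <- data.frame(
  Age = round(runif(n, 18, 70)),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
data$PurchaseAmt <- 60 + 2 * data$Age +
  15 * (data$Gender == "Male") + rnorm(n, sd = 20)

model <- lm(PurchaseAmt ~ Age + Gender, data = data)

# New observations must use the same column names and factor levels
new_customers <- data.frame(
  Age = c(25, 40, 60),
  Gender = factor(c("Female", "Male", "Female"),
                  levels = levels(data$Gender))
)

# Point predictions plus 95% prediction intervals (columns fit, lwr, upr)
predicted <- predict(model, newdata = new_customers,
                     interval = "prediction", level = 0.95)
print(cbind(new_customers, predicted))
```

A prediction interval describes the uncertainty for an individual new purchase, so it is wider than a confidence interval for the mean response (interval = "confidence").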

6. Evaluating Model Performance

Common metrics for regression models, computed here on the same data used to fit the model (so they tend to be optimistic):

# Root Mean Squared Error (RMSE)
rmse <- sqrt(mean((predictions - data$PurchaseAmt)^2))
rmse

# Mean Absolute Error (MAE)
mae <- mean(abs(predictions - data$PurchaseAmt))
mae
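For a more honest estimate of how the model performs on unseen data, the same metrics can be computed on a held-out test set. The following is a minimal sketch assuming an 80/20 random split on simulated data; the split ratio and the data-generating values are illustrative choices.

```r
# Simulated stand-in data (hypothetical values)
set.seed(123)
n <- 400
data <- data.frame(Age = round(runif(n, 18, 70)))
data$PurchaseAmt <- 50 + 2 * data$Age + rnorm(n, sd = 20)

# Random 80/20 train/test split
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Fit on the training rows only, then predict the held-out rows
model     <- lm(PurchaseAmt ~ Age, data = train)
test_pred <- predict(model, newdata = test)

# Out-of-sample RMSE and MAE
rmse_test <- sqrt(mean((test_pred - test$PurchaseAmt)^2))
mae_test  <- mean(abs(test_pred - test$PurchaseAmt))
print(c(RMSE = rmse_test, MAE = mae_test))
```

RMSE is never smaller than MAE on the same predictions, and a large gap between the two points to a few large errors.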

7. Visualizing Regression Results

a) Simple Linear Regression Plot

plot(data$Age, data$PurchaseAmt, main="Age vs Purchase Amount", xlab="Age", ylab="Purchase Amount", pch=19, col="blue")
abline(model_simple, col="red", lwd=2)

b) Multiple Regression with ggplot2

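library(ggplot2)

# Note: geom_smooth() fits a separate regression line for each Gender group;
# it does not draw the fitted model_multiple.
ggplot(data, aes(x = Age, y = PurchaseAmt, color = Gender)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Multiple Linear Regression: Age & Gender vs Purchase Amount")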

8. Assumptions of Linear Regression

  • Linearity: Relationship between predictors and outcome is linear
  • Independence: Observations are independent
  • Homoscedasticity: Constant variance of residuals
  • Normality: Residuals are normally distributed
  • No multicollinearity: Predictors are not highly correlated

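Check assumptions using diagnostic plots. The four default plots cover linearity (Residuals vs Fitted), normality (Normal Q-Q), homoscedasticity (Scale-Location), and influential points (Residuals vs Leverage); independence and multicollinearity need separate checks: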

par(mfrow = c(2, 2))
plot(model_multiple)
par(mfrow = c(1, 1))
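The diagnostic plots do not reveal multicollinearity. One common check is the variance inflation factor (VIF), which for each predictor equals 1 / (1 - R²) from regressing that predictor on the others. The sketch below computes it in base R for numeric predictors; the variables (age, income, visits) are simulated and hypothetical, with income deliberately made to depend on age.

```r
# Simulated data (hypothetical): income is built to correlate with age
set.seed(5)
n <- 300
age      <- runif(n, 18, 70)
income   <- 1000 * age + rnorm(n, sd = 5000)
visits   <- rpois(n, lambda = 4)              # unrelated to the others
purchase <- 20 + 1.2 * age + 0.001 * income + 5 * visits + rnorm(n, sd = 15)
data <- data.frame(age, income, visits, purchase)

# VIF for each predictor: regress it on the remaining predictors
predictors <- c("age", "income", "visits")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))
```

Values near 1 indicate little collinearity. Rules of thumb often flag values above 5 or 10, though these thresholds are conventions rather than strict cutoffs.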

9. Advantages of Linear Regression

  • Simple and interpretable
  • Provides insight into relationships between variables
  • Basis for more advanced predictive models
  • Easy to implement and visualize in R

Conclusion

Linear regression in R is a fundamental technique for modeling and predicting continuous outcomes. By fitting simple or multiple regression models, making predictions, evaluating performance, and checking assumptions, you can extract actionable insights and understand relationships in your data. Mastering linear regression is a crucial step in data analysis and machine learning workflows.
