Correlation and regression are key statistical techniques used to analyze relationships between variables. Correlation measures the strength and direction of a relationship, while regression allows prediction of one variable based on another.
1. Correlation Analysis
Correlation quantifies the strength and direction of the association between two variables. The methods below differ in the kind of relationship they capture.
a) Pearson Correlation
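Measures the linear correlation between two continuous variables. Values range from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship); a value of 0 indicates no linear relationship.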
# Sample data
x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 55)

# Pearson correlation
cor(x, y, method = "pearson")
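To make the definition concrete, the sketch below recomputes Pearson's r by hand as the covariance divided by the product of the two standard deviations and compares it with cor(). The names r_manual, r_builtin, y_noisy and test_result are illustrative, and y_noisy is made-up data added only so that cor.test() has a less-than-perfect relationship to test.

```r
x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 55)

# Pearson's r is the covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y, method = "pearson")
print(c(manual = r_manual, builtin = r_builtin))

# Hypothetical noisy version of y (not from the original example)
y_noisy <- c(15, 27, 33, 46, 55)

# cor.test() reports the estimate together with a p-value and a
# confidence interval for the correlation
test_result <- cor.test(x, y_noisy, method = "pearson")
print(test_result$estimate)
print(test_result$p.value)
```

Because y is an exact linear function of x in the sample data, both values of r equal 1. Real data will rarely be this clean.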
b) Spearman Correlation
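A non-parametric, rank-based correlation that measures how well the relationship between two variables can be described by a monotonic function. It is suited to ordinal (ranked) data and to data that are not normally distributed.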
cor(x, y, method = "spearman")
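One way to see what Spearman adds is to compare it with Pearson on data that are monotonic but not linear. The sketch below uses made-up data (y is the cube of x) and also checks that Spearman's rho matches Pearson's r computed on the ranks. All variable names are illustrative.

```r
# Hypothetical data: y increases with x, but not along a straight line
x <- c(1, 2, 3, 4, 5)
y <- x^3

pearson_r <- cor(x, y, method = "pearson")
spearman_rho <- cor(x, y, method = "spearman")

# Spearman's rho is Pearson's r applied to the ranks of the data
rho_from_ranks <- cor(rank(x), rank(y), method = "pearson")

print(c(pearson = pearson_r,
        spearman = spearman_rho,
        from_ranks = rho_from_ranks))
```

Here Spearman's rho is exactly 1 because the ordering of y follows the ordering of x perfectly, while Pearson's r is below 1 because the points do not lie on a straight line.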
c) Correlation Matrix
For multiple variables, use cor() on a data frame:
data <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 65, 75),
  age = c(25, 30, 35, 40)
)

cor(data)
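The result of cor() on a data frame is an ordinary numeric matrix, so it can be rounded for display or indexed by column name. A minimal sketch, assuming the same sample values as above (the name cor_matrix is illustrative):

```r
data <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 65, 75),
  age = c(25, 30, 35, 40)
)

cor_matrix <- cor(data)

# Round for easier reading
print(round(cor_matrix, 2))

# Pull out a single pairwise correlation by row and column name
height_weight <- cor_matrix["height", "weight"]
print(height_weight)
```

The matrix is symmetric with ones on the diagonal, since every variable correlates perfectly with itself.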
2. Simple Linear Regression
Linear regression predicts the value of a dependent variable (Y) based on an independent variable (X).
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit regression model
model <- lm(y ~ x)
summary(model)
Output includes:
- Coefficients (intercept and slope)
- R-squared value (goodness of fit)
- p-values for significance of coefficients
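The quantities listed above can also be pulled out of the fitted model programmatically instead of being read off the printed summary. The following sketch refits the same sample model; the names model_summary, estimates, r_squared and p_values are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
model_summary <- summary(model)

# Coefficients: intercept and slope
estimates <- coef(model)
print(estimates)

# R-squared (proportion of variance in y explained by x)
r_squared <- model_summary$r.squared
print(r_squared)

# p-values for the intercept and slope, taken from the coefficient table
p_values <- coef(model_summary)[, "Pr(>|t|)"]
print(p_values)
```

For this sample data the fitted line is y = 2.2 + 0.6x with an R-squared of 0.6.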
Plotting Regression Line
plot(x, y, main = "Simple Linear Regression", xlab = "X", ylab = "Y")
abline(model, col = "red", lwd = 2)
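Beyond drawing the fitted line, the model can be used to predict Y for new values of X with predict(). The sketch below refits the sample model and predicts at two made-up X values; new_points, predicted and predicted_interval are illustrative names.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# Hypothetical new X values; the column name must match the predictor
new_points <- data.frame(x = c(6, 7))

# Point predictions from the fitted line
predicted <- predict(model, newdata = new_points)
print(predicted)

# The same predictions with 95% prediction intervals
predicted_interval <- predict(model, newdata = new_points,
                              interval = "prediction", level = 0.95)
print(predicted_interval)
```

Both new values lie outside the observed range of X, so these predictions are extrapolations and should be treated with caution.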
3. Multiple Linear Regression
Multiple linear regression predicts a dependent variable from two or more independent variables.
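# X2 must not be an exact linear function of X1; otherwise lm() cannot
# estimate both slopes and reports NA for one of them.
data <- data.frame(
  Y = c(50, 60, 65, 80, 90),
  X1 = c(1, 2, 3, 4, 5),
  X2 = c(2, 1, 4, 3, 6)
)

model_multi <- lm(Y ~ X1 + X2, data = data)
summary(model_multi)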
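As a slightly larger illustration, the sketch below fits a two-predictor model on mtcars, a dataset that ships with R. The choice of mtcars and of the columns mpg, wt and hp is an assumption made for this example and is not part of the original text; model_cars, coefs, adj_r2, new_car and predictor_cor are illustrative names.

```r
# mtcars is a built-in dataset; mpg, wt and hp are three of its columns
model_cars <- lm(mpg ~ wt + hp, data = mtcars)

# One intercept plus one slope per predictor
coefs <- coef(model_cars)
print(coefs)

# Adjusted R-squared penalizes the fit for the number of predictors
adj_r2 <- summary(model_cars)$adj.r.squared
print(adj_r2)

# Check that the predictors are not perfectly correlated with each other
predictor_cor <- cor(mtcars$wt, mtcars$hp)
print(predictor_cor)

# Predict for a hypothetical car (values chosen only for illustration)
new_car <- data.frame(wt = 3.0, hp = 120)
print(predict(model_cars, newdata = new_car))
```

Each slope is read as the expected change in the response for a one-unit change in that predictor while the other predictor is held fixed.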
4. Checking Assumptions of Regression
- Linearity: The relationship between X and Y should be linear
- Normality: Residuals should be normally distributed
- Homoscedasticity: Residuals should have constant variance across fitted values
- Independence: Observations, and therefore residuals, should be independent of one another
par(mfrow = c(2, 2))
plot(model) # Diagnostic plots
par(mfrow = c(1, 1))
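The diagnostic plots can be complemented with a numeric check. As a rough sketch, the code below refits the simple model from section 2, extracts its residuals, and applies the Shapiro-Wilk test as one possible check of the normality assumption. With only five observations the test has very little power, so the result is illustrative only; res and normality_test are illustrative names.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# Residuals are the observed values minus the fitted values
res <- residuals(model)
print(res)

# For a model with an intercept, least-squares residuals sum to zero
print(sum(res))

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
normality_test <- shapiro.test(res)
print(normality_test$p.value)
```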
5. Advantages of Correlation and Regression
- Understand relationships between variables
- Predict outcomes using regression models
- Quantify the strength and direction of associations
- Inform decision-making and strategy based on data
Conclusion
Correlation and regression analysis in R are essential tools for understanding and modeling relationships between variables. Correlation measures the strength and direction of relationships, while regression allows prediction and quantifies effects. Mastering these techniques enables deeper insights, data-driven predictions, and effective analysis of complex datasets.