Hypothesis testing is a fundamental statistical method used to make inferences about populations based on sample data. In R, you can perform various tests to check whether observed data support a specific assumption or hypothesis.
1. Key Concepts in Hypothesis Testing
- Null Hypothesis (H₀): The default assumption, e.g., no difference or effect.
- Alternative Hypothesis (H₁): The assumption that contradicts H₀, e.g., there is a difference or effect.
- Significance Level (α): The threshold for rejecting H₀, commonly 0.05. It is the probability of rejecting H₀ when H₀ is actually true (a Type I error).
- p-value: The probability of observing data at least as extreme as the sample, assuming H₀ is true.
- Decision: Reject H₀ if p-value < α; otherwise, fail to reject H₀.
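The decision rule above can be sketched as a small helper. The function name decide and its default alpha of 0.05 are illustrative assumptions, not part of base R.

```r
# Hypothetical helper: turn a p-value into a decision at significance level alpha.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(0.03)                # p-value below 0.05
decide(0.20)                # p-value above 0.05
decide(0.03, alpha = 0.01)  # same p-value, stricter threshold
```

Note that the same p-value can lead to different decisions depending on the α chosen, which is why α should be fixed before looking at the data.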
2. One-Sample t-Test
Used to compare a sample mean against a hypothesized population mean.
# Sample data
scores <- c(90, 85, 88, 92, 80)
# Test whether the mean equals 85
t.test(scores, mu = 85)
Output:
- The t-statistic, degrees of freedom, p-value, 95% confidence interval for the mean, and sample mean.
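As a minimal sketch, the value returned by t.test() is a list of class "htest" whose components can be read individually, which helps when only one number is needed. The variable name result is an arbitrary choice.

```r
scores <- c(90, 85, 88, 92, 80)
result <- t.test(scores, mu = 85)

result$statistic  # t-statistic
result$parameter  # degrees of freedom (n - 1)
result$p.value    # two-sided p-value
result$conf.int   # confidence interval for the mean (95% by default)
result$estimate   # sample mean
```

The other tests in this article return objects with the same structure, so the same $p.value and $statistic accessors apply to them.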
3. Two-Sample t-Test
Used to compare the means of two independent samples.
group1 <- c(90, 85, 88)
group2 <- c(80, 82, 84)
# Test whether the means are equal, assuming equal variances (pooled t-test)
t.test(group1, group2, var.equal = TRUE)
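The var.equal argument deserves a short illustration. When it is left at its default of FALSE, t.test() runs Welch's t-test, which does not assume equal variances and adjusts the degrees of freedom. The sketch below compares both versions on the same data; the names pooled and welch are arbitrary.

```r
group1 <- c(90, 85, 88)
group2 <- c(80, 82, 84)

pooled <- t.test(group1, group2, var.equal = TRUE)  # assumes equal variances
welch  <- t.test(group1, group2)                    # default: Welch's t-test

# Compare degrees of freedom and p-values side by side
c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
c(pooled_p = pooled$p.value, welch_p = welch$p.value)
```

With equal group sizes, as here, the t-statistic is identical in both versions and only the degrees of freedom (and therefore the p-value) differ.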
4. Paired t-Test
Used when samples are related, e.g., before-and-after measurements.
before <- c(100, 102, 98, 95)
after <- c(105, 100, 97, 96)
t.test(before, after, paired = TRUE)
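One way to see what the paired test does: it is equivalent to a one-sample t-test on the within-pair differences, tested against a mean of 0. The sketch below checks this on the same data; the names paired_result and diff_result are arbitrary.

```r
before <- c(100, 102, 98, 95)
after  <- c(105, 100, 97, 96)

paired_result <- t.test(before, after, paired = TRUE)
diff_result   <- t.test(before - after, mu = 0)  # one-sample test on differences

# Both comparisons should return TRUE
all.equal(unname(paired_result$statistic), unname(diff_result$statistic))
all.equal(paired_result$p.value, diff_result$p.value)
```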
5. Chi-Square Test
Used for categorical data to test independence or goodness-of-fit.
a) Test for Independence
data <- matrix(c(30, 10, 20, 40), nrow=2)
chisq.test(data)
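Two details of the independence test are worth inspecting: the expected counts under H₀, and the Yates continuity correction that chisq.test() applies to 2×2 tables by default. In this sketch the table is named tab rather than data (an arbitrary choice, made to avoid masking the base function data()).

```r
tab <- matrix(c(30, 10, 20, 40), nrow = 2)

corrected   <- chisq.test(tab)                   # Yates correction (default for 2x2)
uncorrected <- chisq.test(tab, correct = FALSE)  # plain Pearson chi-square

corrected$expected     # expected counts if rows and columns were independent
corrected$statistic    # smaller, because of the continuity correction
uncorrected$statistic
```

The approximation behind the test is usually considered reliable when the expected counts are not too small (a common rule of thumb is at least 5 per cell).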
b) Test for Goodness-of-Fit
observed <- c(50, 30, 20)
expected <- c(40, 40, 20)
chisq.test(x = observed, p = expected/sum(expected))
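To make the goodness-of-fit statistic concrete, the sketch below computes it by hand as the sum of (observed - expected)² / expected and compares it with the result from chisq.test(). The names stat, p_val, and res are arbitrary.

```r
observed <- c(50, 30, 20)
expected <- c(40, 40, 20)

# Pearson chi-square statistic computed manually
stat <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
p_val <- pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)

res <- chisq.test(x = observed, p = expected / sum(expected))

# Both comparisons should return TRUE
all.equal(stat, unname(res$statistic))
all.equal(p_val, res$p.value)
```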
6. ANOVA (Analysis of Variance)
Used to compare means across more than two groups.
group <- factor(c("A","A","B","B","C","C"))
score <- c(90, 85, 88, 82, 95, 89)
anova_result <- aov(score ~ group)
summary(anova_result)
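The summary prints an ANOVA table, but the F-statistic and p-value can also be pulled out programmatically. This sketch assumes the usual layout of summary() for a single-stratum aov fit, where the first list element is a data frame with the columns "Df", "F value", and "Pr(>F)"; the names anova_table, f_value, and p_value are arbitrary.

```r
group <- factor(c("A", "A", "B", "B", "C", "C"))
score <- c(90, 85, 88, 82, 95, 89)
anova_result <- aov(score ~ group)

# First element of the summary is the ANOVA table as a data frame
anova_table <- summary(anova_result)[[1]]

f_value <- anova_table[["F value"]][1]  # row 1 is the group effect
p_value <- anova_table[["Pr(>F)"]][1]

anova_table[["Df"]]  # degrees of freedom: between groups, then residuals
f_value
p_value
```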
7. Non-Parametric Tests
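- Wilcoxon Test: Rank-based alternative to the t-test when the data are not normally distributed (rank-sum test for independent samples, signed-rank test for paired samples).
- Kruskal-Wallis Test: Rank-based alternative to one-way ANOVA for non-normal data.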
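# Reuses group1 and group2 from the two-sample t-test example,
# and score and group from the ANOVA example
wilcox.test(group1, group2)
kruskal.test(score ~ group)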
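For convenience, here is a self-contained sketch that redefines the data from the earlier sections and reads the results from both tests. The names w and k are arbitrary.

```r
group1 <- c(90, 85, 88)
group2 <- c(80, 82, 84)
group  <- factor(c("A", "A", "B", "B", "C", "C"))
score  <- c(90, 85, 88, 82, 95, 89)

w <- wilcox.test(group1, group2)  # rank-sum test; exact p-value for small samples without ties
k <- kruskal.test(score ~ group)

w$statistic  # W: number of (group1, group2) pairs where the group1 value is larger
w$p.value
k$statistic  # Kruskal-Wallis chi-squared
k$p.value
```

Every value in group1 exceeds every value in group2, yet with only three observations per group the two-sided exact p-value cannot drop below 0.1. Rank-based tests on very small samples therefore have little power to reach conventional significance levels.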
8. Advantages of Hypothesis Testing
- Supports data-driven decision making
- Evaluates assumptions about populations
- Identifies statistically significant differences
- Forms the basis for inferential statistics
Conclusion
Hypothesis testing in R allows you to make informed decisions about your data. By mastering tests like t-tests, chi-square tests, ANOVA, and non-parametric alternatives, you can evaluate assumptions, compare groups, and draw reliable conclusions from sample data. These techniques are essential for rigorous data analysis and scientific research.