Descriptive Statistics

Descriptive statistics summarize and describe the main features of a dataset. They provide insights into the distribution, central tendency, variability, and overall patterns of data. R offers built-in functions and packages to calculate descriptive statistics efficiently.

1. Measures of Central Tendency

Central tendency indicates the “center” of a dataset. Common measures include mean, median, and mode.

a) Mean

scores <- c(90, 85, 88, 92, 75, 80, 95)
mean(scores) # Calculates average score
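
Real datasets often contain missing values, and mean() returns NA when any element is NA. The sketch below shows the na.rm argument on a small vector; the with_missing vector is invented for illustration and is not part of the original example.

```r
# Hypothetical data: one score is missing (NA)
with_missing <- c(90, 85, NA, 92)

mean(with_missing)                # NA, because one value is missing
mean(with_missing, na.rm = TRUE)  # 89, the mean of the three known scores
```

The same na.rm argument is accepted by median(), var(), sd(), and quantile().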

b) Median

median(scores)  # Middle value when data is sorted
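
To see how the two measures differ, it can help to add a single extreme value and recompute both. The following is a minimal sketch; the extra score of 5 is a made-up outlier used only for illustration.

```r
scores <- c(90, 85, 88, 92, 75, 80, 95)

# Hypothetical extra value: one very low score added for illustration
scores_with_outlier <- c(scores, 5)

mean(scores)                 # about 86.43
mean(scores_with_outlier)    # 76.25
median(scores)               # 88
median(scores_with_outlier)  # 86.5
```

The mean drops by roughly ten points while the median moves by only 1.5, which is why the median is often preferred for skewed data.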

c) Mode

R has no built-in function for the statistical mode (the built-in mode() reports an object's storage type instead), but you can define one:

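get_mode <- function(x) {
  uniq <- unique(x)                          # distinct values, in order of first appearance
  uniq[which.max(tabulate(match(x, uniq)))]  # most frequent value; ties go to the first one
}
get_mode(scores)  # every score occurs once here, so this returns the first value, 90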
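
A function built on which.max() reports only one value even when several are tied for the highest count. One possible variant, sketched here under the hypothetical name get_modes, returns every tied value; the repeated vector is invented for illustration.

```r
# Hypothetical helper: returns all values tied for the highest frequency
get_modes <- function(x) {
  uniq <- unique(x)                       # distinct values, in order of first appearance
  counts <- tabulate(match(x, uniq))      # how often each distinct value occurs
  uniq[counts == max(counts)]             # keep every value that reaches the top count
}

# Hypothetical data with two modes
repeated <- c(85, 90, 85, 92, 90, 75)
get_modes(repeated)  # 85 90
```

When every value occurs exactly once, this variant returns all of them, which signals that the data has no meaningful mode.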

2. Measures of Dispersion

Dispersion indicates how spread out the data is.

a) Range

range(scores)  # Minimum and maximum values
diff(range(scores)) # Width of the range (maximum minus minimum)

b) Variance

var(scores)  # Sample variance (divides by n - 1)

c) Standard Deviation

sd(scores)  # Square root of variance
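
The relationship between the two measures can be checked by computing them by hand. This is a minimal sketch using the sample formula (dividing by n - 1); the variable names are chosen for illustration.

```r
scores <- c(90, 85, 88, 92, 75, 80, 95)

n <- length(scores)
deviations <- scores - mean(scores)        # distance of each score from the mean

manual_var <- sum(deviations^2) / (n - 1)  # sample variance
manual_sd <- sqrt(manual_var)              # standard deviation

all.equal(manual_var, var(scores))  # TRUE
all.equal(manual_sd, sd(scores))    # TRUE
```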

d) Quantiles

quantile(scores)        # Default: minimum, quartiles, and maximum (0%, 25%, 50%, 75%, 100%)
quantile(scores, probs = c(0.25, 0.5, 0.75)) # Only the three quartiles
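
Quartiles also give a spread measure that is less sensitive to extreme values: the interquartile range, the distance between the 25% and 75% quantiles. A short sketch, assuming the default quantile method:

```r
scores <- c(90, 85, 88, 92, 75, 80, 95)

q <- quantile(scores, probs = c(0.25, 0.75))
q[["75%"]] - q[["25%"]]  # 8.5, third quartile minus first quartile
IQR(scores)              # 8.5, the built-in equivalent
```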

3. Summary Function

summary() provides a quick overview of numeric data including min, 1st quartile, median, mean, 3rd quartile, and max.

summary(scores)
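
The result of summary() on a numeric vector is a named object, so individual statistics can be pulled out by name. A small sketch, assuming the default element names R uses for numeric input:

```r
scores <- c(90, 85, 88, 92, 75, 80, 95)

s <- summary(scores)
names(s)                   # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
s[["Median"]]              # 88
s[["Max."]] - s[["Min."]]  # 20, the width of the range
```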

4. Frequency Tables

For categorical data, frequency counts and proportions are useful.

grades <- c("A", "B", "A", "C", "B", "A")
table(grades) # Counts of each category
prop.table(table(grades)) # Proportion of each category
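
Frequency tables are often easier to read when sorted and expressed as percentages. The following sketch builds on the same grades vector; the variable names are chosen for illustration.

```r
grades <- c("A", "B", "A", "C", "B", "A")

counts <- table(grades)
sorted_counts <- sort(counts, decreasing = TRUE)   # most common grade first
percentages <- round(100 * prop.table(counts), 1)  # proportions as percentages

sorted_counts  # A: 3, B: 2, C: 1
percentages    # A: 50, B: 33.3, C: 16.7
```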

5. Using dplyr for Descriptive Statistics

With the dplyr package, summarise() computes several statistics in a single call:

library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Score = c(90, 85, 88, 92)
)

data %>%
  summarise(
    Average = mean(Score),
    Maximum = max(Score),
    Minimum = min(Score),
    SD = sd(Score)
  )
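
The same statistics can be computed per group by adding group_by() before summarise(). The sketch below assumes dplyr is installed; the Section column and the by_section name are invented for illustration and do not appear in the original data.

```r
library(dplyr)

# Hypothetical data: the Section column is added for illustration
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Section = c("X", "Y", "X", "Y"),
  Score = c(90, 85, 88, 92)
)

by_section <- data %>%
  group_by(Section) %>%      # one group per distinct Section value
  summarise(
    Count = n(),             # number of rows in the group
    Average = mean(Score),
    SD = sd(Score)
  )

by_section  # one row per section: X averages 89, Y averages 88.5
```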

6. Advantages of Descriptive Statistics

  • Provides quick insights into the dataset
  • Helps detect patterns, trends, and anomalies
  • Forms the basis for inferential statistics
  • Simplifies decision-making based on data summaries

Conclusion

Descriptive statistics are the foundation of data analysis in R. By calculating measures of central tendency, dispersion, quantiles, and frequencies, you can summarize large datasets effectively. Tools like summary(), table(), and dplyr::summarise() make it easier to explore and understand data before further analysis.
