The mutate() and summarise() functions in R, provided by the dplyr package, are essential for creating new variables, transforming data, and generating summary statistics. Mastering these functions allows you to manipulate datasets efficiently and extract meaningful insights.
1. The mutate() Function
mutate() is used to create new columns or modify existing ones in a data frame or tibble.
Creating New Columns
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

# Add a new column for Score plus bonus points
data <- mutate(data, ScoreBonus = Score + 5)

# Add a logical column to indicate if Score is passing (>= 88)
data <- mutate(data, Passed = Score >= 88)
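mutate() also accepts several expressions in one call, and a later expression can refer to a column created earlier in the same call. The following is a minimal sketch of that behavior; the `scores` data frame, the `BonusPassed` column, and the threshold of 93 are illustrative assumptions, not part of the example above.

```r
library(dplyr)

# Hypothetical example data, similar to the data frame above
scores <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88)
)

# Both columns are created in a single mutate() call.
# BonusPassed is computed from ScoreBonus, which is defined just before it.
scores <- mutate(
  scores,
  ScoreBonus = Score + 5,
  BonusPassed = ScoreBonus >= 93  # illustrative threshold
)

print(scores)
```

Grouping related transformations in one call keeps the steps together and avoids reassigning the data frame repeatedly.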
Modifying Existing Columns
You can also update existing columns:
data <- mutate(data, Age = Age + 1) # Increase all ages by 1
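Note that mutate() returns a modified copy rather than changing the data frame in place, which is why the result is assigned back to `data`. A small sketch of the difference, using a hypothetical `ages` data frame:

```r
library(dplyr)

# Hypothetical example data
ages <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# The result is stored under a new name, so ages itself is left untouched
older <- mutate(ages, Age = Age + 1)

ages$Age   # still 25 30
older$Age  # 26 31
```

Assigning to a new name, as here, is one way to keep the original values available for comparison.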
2. The summarise() Function
summarise() is used to calculate summary statistics for a dataset. It is often combined with group_by() for grouped summaries.
Basic Summary Statistics
# Calculate average score and maximum age
summarise(data, AverageScore = mean(Score), MaxAge = max(Age))
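Without grouping, summarise() collapses the whole data frame into a single row with one column per statistic. The sketch below illustrates this with a hypothetical `students` data frame and adds a row count via dplyr's n() helper; the column name `Count` is an illustrative choice.

```r
library(dplyr)

# Hypothetical example data
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

overall <- summarise(
  students,
  AverageScore = mean(Score),
  MaxAge = max(Age),
  Count = n()  # number of rows in the current group (here, the whole data frame)
)

nrow(overall)  # 1: all rows are reduced to a single summary row
print(overall)
```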
Grouped Summaries with group_by()
data <- mutate(data, Passed = Score >= 88)

# Group by Passed status and calculate average age and score
data %>%
  group_by(Passed) %>%
  summarise(
    AverageAge = mean(Age),
    AverageScore = mean(Score)
  )
This calculates the average age and score for students who passed and those who didn’t.
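A grouped summary returns one row per group, so it is often useful to report how many rows each group contains alongside the averages. A minimal sketch, assuming a hypothetical `students` data frame and an illustrative `Students` count column:

```r
library(dplyr)

# Hypothetical example data
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88)
)

by_status <- students %>%
  mutate(Passed = Score >= 88) %>%
  group_by(Passed) %>%
  summarise(
    Students = n(),            # rows in each group
    AverageScore = mean(Score) # mean computed separately per group
  )

print(by_status)
```

Including the group size makes it easier to judge how much weight an average deserves, since a group with a single row has a mean equal to that one value.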
3. Using mutate() and summarise() Together
You can combine these functions in a pipeline to create new variables and then summarize them:
data %>%
  mutate(ScoreAdjusted = Score + 2) %>%
  summarise(AverageAdjusted = mean(ScoreAdjusted))
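The pipe passes each result as the first argument of the next function, so the pipeline is equivalent to nesting the calls. The sketch below compares both forms on a hypothetical `students` data frame; the names `piped` and `nested` are illustrative.

```r
library(dplyr)

# Hypothetical example data
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88)
)

# Pipeline form: each step receives the previous result as its first argument
piped <- students %>%
  mutate(ScoreAdjusted = Score + 2) %>%
  summarise(AverageAdjusted = mean(ScoreAdjusted))

# Equivalent nested form, read from the inside out
nested <- summarise(
  mutate(students, ScoreAdjusted = Score + 2),
  AverageAdjusted = mean(ScoreAdjusted)
)

# Both forms compute the same average
all.equal(piped$AverageAdjusted, nested$AverageAdjusted)
```

The pipeline form reads in the order the steps are performed, which tends to matter more as the number of steps grows.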
4. Advantages of mutate() and summarise()
- Simplifies creation of new columns and transformations
- Provides powerful tools for grouped statistical analysis
- Works seamlessly with the pipe operator %>% for readable and efficient workflows
- Essential for preparing datasets for visualization or modeling
Conclusion
mutate() and summarise() are core dplyr functions for transforming and analyzing data in R. mutate() adds or modifies columns while keeping every row, whereas summarise() reduces the data to one row of statistics per group (or a single row when the data is ungrouped). Mastering these functions makes your data analysis workflow more efficient and organized.