Mutate and Summarize Functions

The mutate() and summarise() functions, provided by the dplyr package in R, are used to create new variables, transform existing ones, and compute summary statistics (summarize() is an identical alternative spelling of summarise()). The key difference between them is the shape of the result: mutate() returns the same number of rows it receives, while summarise() collapses the rows into one row per group.

1. The mutate() Function

mutate() is used to create new columns or modify existing ones in a data frame or tibble.

Creating New Columns

library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

# Add a new column for Score plus bonus points
data <- mutate(data, ScoreBonus = Score + 5)

# Add a logical column to indicate if Score is passing (>= 88)
data <- mutate(data, Passed = Score >= 88)
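Within a single mutate() call, a column created by an earlier argument can be used by a later one, so related columns can be built in one step. The sketch below assumes the same three-row data frame; the columns Bonus and PassedWithBonus are illustrative names, not part of the original example.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

# Bonus is created first (recycled to every row), then used by the next expressions
data2 <- mutate(
  data,
  Bonus = 5,
  ScoreBonus = Score + Bonus,
  PassedWithBonus = ScoreBonus >= 93  # illustrative threshold
)
print(data2)
```

Each argument is evaluated in order, so ScoreBonus sees Bonus and PassedWithBonus sees ScoreBonus.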

Modifying Existing Columns

You can also update existing columns:

data <- mutate(data, Age = Age + 1)  # Increase all ages by 1
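To change only some values of an existing column, mutate() can be combined with dplyr's if_else(), which picks a value for each row based on a condition. This is a minimal sketch assuming the same example data; the 88-point threshold and the 2-point adjustment are illustrative values, not rules from the text.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

# Overwrite Score: add 2 points where Score < 88, keep other scores unchanged
data <- mutate(data, Score = if_else(Score < 88, Score + 2, Score))
print(data$Score)
```

if_else() requires both branches to have the same type, which is why both are numeric here.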

2. The summarise() Function

summarise() computes summary statistics from a dataset. On an ungrouped data frame it returns a single row; it is often combined with group_by() to produce one row per group.

Basic Summary Statistics

# Calculate average score and maximum age
summarise(data, AverageScore = mean(Score), MaxAge = max(Age))
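Summary functions such as mean() and max() return NA when any input value is missing, so real datasets often need na.rm = TRUE. dplyr's n() counts the rows being summarised. A minimal sketch, using a hypothetical data frame in which one score is missing:

```r
library(dplyr)

# Hypothetical data: Bob's score is missing
scores <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, NA, 88),
  Age = c(25, 30, 28)
)

result <- summarise(
  scores,
  Rows = n(),                                # counts all rows, including the one with NA
  Missing = sum(is.na(Score)),               # number of missing scores
  AverageScore = mean(Score, na.rm = TRUE),  # mean of the non-missing scores only
  MaxAge = max(Age)
)
print(result)
```

Without na.rm = TRUE, AverageScore would be NA.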

Grouped Summaries with group_by()

data <- mutate(data, Passed = Score >= 88)

# Group by Passed status and calculate average age and score
data %>%
  group_by(Passed) %>%
  summarise(
    AverageAge = mean(Age),
    AverageScore = mean(Score)
  )

This returns one row for each value of Passed (TRUE and FALSE), giving the average age and score of the students who passed and of those who did not.
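group_by() also changes how mutate() behaves: instead of collapsing each group to one row as summarise() does, a grouped mutate() keeps every row and evaluates its expressions within each group. The sketch below assumes the same example data and compares each student's score with the average of their Passed group; GroupAverage and DiffFromGroup are illustrative column names.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

result <- data %>%
  mutate(Passed = Score >= 88) %>%
  group_by(Passed) %>%
  mutate(
    GroupAverage = mean(Score),            # mean Score within each Passed group
    DiffFromGroup = Score - GroupAverage   # distance from the group mean
  ) %>%
  ungroup()                                # remove grouping so later steps see all rows

print(result)
```

The result still has three rows, one per student, with the group-level value repeated on each row of its group.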

3. Using mutate() and summarise() Together

You can combine these functions in a pipeline to create new variables and then summarize them:

data %>%
  mutate(ScoreAdjusted = Score + 2) %>%
  summarise(AverageAdjusted = mean(ScoreAdjusted))
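A common variation places group_by() between the two verbs: mutate() derives a grouping variable, and summarise() then computes one row per group. This is a minimal sketch assuming the same example data; the AgeGroup column and its cut-off at 28 are illustrative choices. The .groups = "drop" argument makes it explicit that the result is returned ungrouped.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

summary_by_age <- data %>%
  mutate(
    ScoreAdjusted = Score + 2,
    AgeGroup = if_else(Age < 28, "Under 28", "28 and over")  # illustrative cut-off
  ) %>%
  group_by(AgeGroup) %>%
  summarise(
    Students = n(),                          # rows in each group
    AverageAdjusted = mean(ScoreAdjusted),
    .groups = "drop"                         # return an ungrouped result
  )

print(summary_by_age)
```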

4. Advantages of mutate() and summarise()

  • Create and modify columns by referring to them by name, without repeating data$ for each one
  • Compute per-group statistics when combined with group_by()
  • Chain cleanly with the pipe operator %>% (or R's native |> in R 4.1 and later), so each step reads in the order it runs
  • Prepare datasets for visualization or modeling
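The readability benefit of the pipe is easiest to see side by side. The two expressions below, assuming the same example data, compute the same grouped summary: the nested version has to be read from the inside out, while the piped version reads top to bottom in the order the steps are applied.

```r
library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, 85, 88),
  Age = c(25, 30, 28)
)

# Nested calls: the first step (mutate) is the innermost call
nested <- summarise(
  group_by(mutate(data, Passed = Score >= 88), Passed),
  AverageScore = mean(Score)
)

# Piped: the same steps, written in the order they run
piped <- data %>%
  mutate(Passed = Score >= 88) %>%
  group_by(Passed) %>%
  summarise(AverageScore = mean(Score))

print(nested)
print(piped)
```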

Conclusion

mutate() and summarise() are core dplyr functions for transforming and analyzing data in R. mutate() adds or modifies variables while keeping every row, and summarise() condenses rows into key statistics, one row per group when combined with group_by(). Used together in a pipeline, they make data analysis workflows shorter, more organized, and easier to read.
