Working with tidyr for Data Tidying

tidyr is an R package for cleaning and reshaping messy data into a tidy format, which makes analysis, visualization, and modeling easier. In tidy data, each variable has its own column, each observation has its own row, and each value has its own cell.

1. Installing and Loading tidyr

Before using tidyr, install and load the package:

install.packages("tidyr")  # Install tidyr
library(tidyr) # Load tidyr

2. Key tidyr Functions

a) gather() / pivot_longer()

pivot_longer() (the modern replacement for gather()) converts wide-format data into long-format data. This is useful when column names are really values of a variable (here, the subject names Math and Science) rather than variables in their own right.

data <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(90, 85),
  Science = c(88, 92)
)

# Convert wide data to long format
long_data <- pivot_longer(data, cols = c(Math, Science),
                          names_to = "Subject",
                          values_to = "Score")

Result: Each row represents one subject score per student.
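To make the reshaped result concrete, the following sketch rebuilds the same example data and inspects the output. Selecting columns with cols = -Name (every column except Name) is shown as one alternative to listing Math and Science by hand; it is an illustration, not the only way to pick columns.

```r
library(tidyr)

# Same example data as above
data <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(90, 85),
  Science = c(88, 92)
)

# cols = -Name pivots every column except Name
long_data <- pivot_longer(data, cols = -Name,
                          names_to = "Subject",
                          values_to = "Score")

print(long_data)

# Two students times two subjects gives four rows;
# the columns are Name, Subject, and Score
dim(long_data)
```

The long form has one row per student and subject, which is the shape that grouping and plotting functions usually expect.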

b) spread() / pivot_wider()

pivot_wider() (the replacement for spread()) converts long-format data into wide-format data. It is the inverse of pivot_longer().

wide_data <- pivot_wider(long_data, names_from = Subject, values_from = Score)

Result: Each subject becomes a separate column again.
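As a self-contained sketch, the block below builds the long data directly (so it does not depend on earlier code) and widens it. It also shows what happens when one Name/Subject combination is missing; the `incomplete` data frame is a made-up variation used only for illustration.

```r
library(tidyr)

# Long-format data, one row per student and subject
long_data <- data.frame(
  Name = c("Alice", "Alice", "Bob", "Bob"),
  Subject = c("Math", "Science", "Math", "Science"),
  Score = c(90, 88, 85, 92)
)

wide_data <- pivot_wider(long_data, names_from = Subject, values_from = Score)
print(wide_data)

# Hypothetical variation: drop Bob's Science score.
# The missing cell would be NA by default; values_fill supplies a replacement.
incomplete <- long_data[-4, ]
filled <- pivot_wider(incomplete, names_from = Subject, values_from = Score,
                      values_fill = 0)
print(filled)
```

Whether 0 is a sensible fill value depends on the data; leaving the gap as NA is often the safer choice.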

c) separate()

separate() splits a single column into multiple columns based on a separator.

data <- data.frame(ID = c("A-01", "B-02"))
separate(data, col = ID, into = c("Letter", "Number"), sep = "-")
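The sketch below stores the result and shows two optional arguments of separate(): convert, which asks tidyr to guess column types, and remove, which controls whether the original column is kept. The variable names `parts` and `typed` are illustrative.

```r
library(tidyr)

data <- data.frame(ID = c("A-01", "B-02"))

# Default: both new columns are character, and ID is dropped
parts <- separate(data, col = ID, into = c("Letter", "Number"), sep = "-")
print(parts)

# convert = TRUE converts "01" and "02" to the integers 1 and 2;
# remove = FALSE keeps the original ID column alongside the new ones
typed <- separate(data, col = ID, into = c("Letter", "Number"), sep = "-",
                  convert = TRUE, remove = FALSE)
str(typed)
```

Note that converting "01" to an integer drops the leading zero, so keep the default when the digits are an identifier rather than a number. In tidyr 1.3.0 and later, separate() is marked as superseded in favor of separate_wider_delim() and related functions, although it continues to work.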

d) unite()

unite() combines multiple columns into one.

data <- data.frame(Letter = c("A","B"), Number = c("01","02"))
unite(data, col = "ID", Letter, Number, sep = "-")
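Because unite() and separate() are inverses, a quick round trip is a reasonable way to check the result. This sketch assumes the separator does not appear inside the values themselves; `united` and `restored` are illustrative names.

```r
library(tidyr)

data <- data.frame(Letter = c("A", "B"), Number = c("01", "02"))

# Combine Letter and Number into a single ID column
united <- unite(data, col = "ID", Letter, Number, sep = "-")
print(united)

# separate() reverses the operation and recovers the original columns
restored <- separate(united, col = ID, into = c("Letter", "Number"), sep = "-")
print(restored)
```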

e) drop_na() and fill()

  • drop_na() removes rows that contain missing values
  • fill() fills missing values with the previous non-missing value (the default) or the next one

data <- data.frame(Name = c("Alice","Bob","Charlie"), Score = c(90, NA, 88))
drop_na(data)     # Removes Bob's row, which has NA in Score
fill(data, Score) # Fills Bob's NA with the previous value, 90
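Both functions take optional arguments that narrow their behavior. The sketch below restricts drop_na() to a single column and uses the .direction argument of fill() to fill upward instead of downward; the result names are illustrative.

```r
library(tidyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(90, NA, 88)
)

# Only rows with NA in Score are dropped; other columns are not checked
dropped <- drop_na(data, Score)

# Default .direction = "down": Bob takes Alice's score (90)
filled_down <- fill(data, Score)

# .direction = "up": Bob takes Charlie's score (88)
filled_up <- fill(data, Score, .direction = "up")

print(dropped)
print(filled_down)
print(filled_up)
```

Filling is only meaningful when row order carries information, for example repeated measurements sorted by time, so sort the data first if needed.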

3. Advantages of Using tidyr

  • Converts messy datasets into tidy format for easier analysis
  • Works seamlessly with dplyr and the tidyverse
  • Handles missing data and reshaping tasks efficiently
  • Simplifies data preparation for visualization and modeling

Conclusion

tidyr is essential for cleaning and reshaping data in R. By mastering pivot_longer(), pivot_wider(), separate(), and unite(), along with drop_na() and fill() for missing values, you can prepare datasets in a tidy, consistent format, which makes downstream analysis simpler and less error-prone.
