GroupBy and Aggregation

GroupBy and Aggregation are powerful features in pandas used to summarize and analyze data by categories.

They help you answer questions like:

What is the total sales per department?
What is the average salary by job role?
How many employees are in each city?

First, import pandas:

import pandas as pd

Create a sample dataset:

data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
}
df = pd.DataFrame(data)

What is GroupBy?

GroupBy splits the data into groups based on one or more columns.

Syntax:

df.groupby("ColumnName")

Example:

df.groupby("Department")

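This groups the rows by Department. On its own it returns a DataFrameGroupBy object and computes nothing; the calculation happens only when you apply an aggregation function to it.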
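To see what the "split" step actually produces, you can iterate over the grouped object. The sketch below repeats the sample data so it runs on its own; the names grouped and members are just illustrative variables. Each iteration yields a (group key, sub-DataFrame) pair, and by default the keys come out in sorted order.

```python
import pandas as pd

# Same sample data as above, repeated so this block is self-contained
data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
}
df = pd.DataFrame(data)

grouped = df.groupby("Department")

# No calculation has happened yet; this is a lazy grouping object
print(type(grouped).__name__)  # DataFrameGroupBy
print(grouped.ngroups)         # 3

# Each group is a sub-DataFrame holding only that department's rows
members = {name: list(group["Employee"]) for name, group in grouped}
print(members)
```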

Aggregation Functions

Aggregation means performing a calculation on each group that reduces it to a single value, such as a total or an average.

Common aggregation functions:

sum()
mean()
count()
min()
max()
std()
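The examples that follow use sum(), mean(), and count(). As a minimal sketch, the remaining three functions from the list work the same way on the sample data; note that std() in pandas computes the sample standard deviation (ddof=1) by default. The variable names are illustrative only.

```python
import pandas as pd

data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
}
df = pd.DataFrame(data)

salary_by_dept = df.groupby("Department")["Salary"]

lowest = salary_by_dept.min()    # Finance 70000, HR 50000, IT 60000
highest = salary_by_dept.max()   # Finance 75000, HR 52000, IT 65000
spread = salary_by_dept.std()    # sample standard deviation (ddof=1)

print(lowest)
print(highest)
print(spread.round(2))
```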

Example 1: Total Salary by Department

df.groupby("Department")["Salary"].sum()

Example 2: Average Salary by Department

df.groupby("Department")["Salary"].mean()

Example 3: Count Employees per Department

df.groupby("Department")["Employee"].count()
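count() counts non-missing values, so it can differ from the number of rows when data is incomplete. With the sample data nothing is missing, so the sketch below assumes a hypothetical gap (Sara's salary recorded as None) to show the difference between count() and size(), which counts every row in the group.

```python
import pandas as pd

# Hypothetical variant of the sample data: Sara's salary is missing
data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, None, 65000, 70000, 52000, 75000],
}
df_missing = pd.DataFrame(data)

grouped = df_missing.groupby("Department")

non_null = grouped["Salary"].count()  # missing values are skipped
rows = grouped.size()                 # every row is counted

print(non_null)  # HR shows 1
print(rows)      # HR shows 2
```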

Multiple Aggregations

You can apply multiple functions at once:

df.groupby("Department")["Salary"].agg(["sum", "mean", "max"])
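The list form labels the result columns sum, mean, and max. pandas also supports "named aggregation", where each keyword becomes an output column name and maps to a (column, function) pair, which lets you aggregate different columns in one call. A minimal sketch; the output names total_salary, avg_salary, and headcount are chosen for illustration.

```python
import pandas as pd

data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
}
df = pd.DataFrame(data)

# keyword = (source column, aggregation function)
summary = df.groupby("Department").agg(
    total_salary=("Salary", "sum"),
    avg_salary=("Salary", "mean"),
    headcount=("Employee", "count"),
)
print(summary)
```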

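GroupBy with Multiple Columns

Pass a list of column names to group by more than one key. The result is indexed by every combination of the keys (a MultiIndex):

df.groupby(["Department", "Employee"])["Salary"].sum()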
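Grouping by two columns gives a result with two index levels. Because each employee appears only once in the sample data, every group here holds exactly one row. The sketch below shows how you might select from the two-level result: a tuple picks one (Department, Employee) pair, and a single outer key returns all rows for that department.

```python
import pandas as pd

data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
}
df = pd.DataFrame(data)

result = df.groupby(["Department", "Employee"])["Salary"].sum()

# One index level per grouping column
print(list(result.index.names))  # ['Department', 'Employee']

# A tuple of keys selects a single group
print(result.loc[("IT", "Ahmed")])  # 65000

# An outer key alone selects every employee in that department
print(result.loc["HR"])
```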

Resetting Index

After an aggregation, the grouping column becomes the index of the result.
To turn it back into a regular column and get a normal DataFrame, call reset_index():

df.groupby("Department")["Salary"].sum().reset_index()
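An alternative, assuming you know up front that you want a flat table, is to pass as_index=False to groupby(), which keeps the grouping column as an ordinary column. The sketch below compares both approaches on the sample data; the variable names flat and via_reset are illustrative.

```python
import pandas as pd

data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
}
df = pd.DataFrame(data)

# Keep Department as a column from the start
flat = df.groupby("Department", as_index=False)["Salary"].sum()

# Aggregate first, then move the index back into a column
via_reset = df.groupby("Department")["Salary"].sum().reset_index()

print(flat)
print(flat.equals(via_reset))  # True
```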

Practical Uses in Data Analytics

GroupBy is used to:

Analyze sales by region
Calculate revenue by product
Find average marks by class
Summarize expenses by category
Measure performance by department
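As an illustration of the first two uses, here is a minimal sketch on a small sales table. The data is entirely hypothetical, invented only for this example; sort_values() is added so the largest totals come first, as you would typically want in a report.

```python
import pandas as pd

# Hypothetical sales records, made up purely for illustration
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "East", "South", "East", "North"],
    "Product": ["A", "A", "B", "B", "B", "A", "A"],
    "Revenue": [1200, 800, 450, 900, 650, 300, 700],
})

# Analyze sales by region: total revenue per region, largest first
by_region = sales.groupby("Region")["Revenue"].sum().sort_values(ascending=False)

# Calculate revenue by product
by_product = sales.groupby("Product")["Revenue"].sum()

print(by_region)   # North 2350, South 1450, East 1200
print(by_product)  # A 3000, B 2000
```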

Key Takeaway

GroupBy splits data into categories, and Aggregation performs calculations on those groups.
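To make the split-then-calculate idea concrete, the sketch below reproduces the total-salary-by-department result by hand with a plain dictionary and checks it against groupby(). This is only a teaching aid under the assumption of small data; groupby() itself is far faster and handles missing values and multiple keys for you.

```python
import pandas as pd

data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
}
df = pd.DataFrame(data)

# Split: collect each department's salaries into its own list
groups = {}
for dept, salary in zip(df["Department"], df["Salary"]):
    groups.setdefault(dept, []).append(salary)

# Apply and combine: reduce each list to one number, one entry per group
manual = {dept: sum(values) for dept, values in groups.items()}

builtin = df.groupby("Department")["Salary"].sum()

print(manual)
print(manual == builtin.to_dict())  # True
```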

Together, they are essential tools for summarizing and analyzing structured data in real-world analytics projects.
