GroupBy and aggregation are pandas features for summarizing and analyzing data by category.
They help you answer questions like:
What is the total sales per department?
What is the average salary by job role?
How many employees are in each city?
First, import pandas:
import pandas as pd
Create a sample dataset:
data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000]
}
df = pd.DataFrame(data)
What is GroupBy?
GroupBy splits the data into groups based on one or more columns.
Syntax:
df.groupby("ColumnName")
Example:
df.groupby("Department")
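This returns a DataFrameGroupBy object that describes how the rows are split by Department. No result is calculated until you apply an aggregation to it.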
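To look inside a grouped object, here is a minimal sketch that reuses the sample data from above. The variable names (grouped, group_sizes, it_rows) are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
})

grouped = df.groupby("Department")

# Number of distinct groups found in the Department column
n_groups = grouped.ngroups

# Iterating yields (group key, sub-DataFrame) pairs
group_sizes = {name: len(group) for name, group in grouped}

# get_group returns only the rows that belong to one group
it_rows = grouped.get_group("IT")

print(n_groups)
print(group_sizes)
print(it_rows)
```

With this data there are three groups (Finance, HR, IT), each holding two rows.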
Aggregation Functions
Aggregation reduces each group to a single summary value, such as a total or an average.
Common aggregation functions:
sum()
mean()
count()
min()
max()
std()
Example 1: Total Salary by Department
df.groupby("Department")["Salary"].sum()
Example 2: Average Salary by Department
df.groupby("Department")["Salary"].mean()
Example 3: Count Employees per Department
df.groupby("Department")["Employee"].count()
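The three examples above can be run together as one sketch. It also shows a detail that is easy to miss: count() counts only non-missing values, while size() counts every row in the group. The copy with a blanked-out name (df_missing) is an illustrative assumption, not part of the sample dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
})

total = df.groupby("Department")["Salary"].sum()
average = df.groupby("Department")["Salary"].mean()
headcount = df.groupby("Department")["Employee"].count()

print(total)      # Finance 145000, HR 102000, IT 125000
print(average)    # Finance 72500.0, HR 51000.0, IT 62500.0
print(headcount)  # 2 employees in every department

# Illustration only: remove one employee name in a copy of the data
df_missing = df.copy()
df_missing.loc[1, "Employee"] = None

# count() ignores the missing name, size() still counts the row
count_non_null = df_missing.groupby("Department")["Employee"].count()
rows_per_group = df_missing.groupby("Department").size()

print(count_non_null)
print(rows_per_group)
```

If the goal is the number of rows per group regardless of missing values, size() is usually the safer choice.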
Multiple Aggregations
You can apply multiple functions at once:
df.groupby("Department")["Salary"].agg(["sum", "mean", "max"])
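When the aggregations touch different columns, or when clearer column names are wanted, pandas also supports named aggregation. The sketch below is one possible way to write it; the output column names (total_salary, average_salary, headcount) are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
})

# Each keyword is an output column: name=(source column, function)
summary = df.groupby("Department").agg(
    total_salary=("Salary", "sum"),
    average_salary=("Salary", "mean"),
    headcount=("Employee", "count"),
)

print(summary)
```

The result is a DataFrame with one row per department and one column per named aggregation.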
GroupBy with Multiple Columns
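df.groupby(["Department", "Employee"])["Salary"].sum()
Passing a list of columns creates one group per unique combination of values, and the result is indexed by both columns (a MultiIndex). In the sample data every employee appears only once, so each group here contains a single row.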
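Grouping by two columns is more informative when both columns contain repeated values. The sketch below adds a hypothetical City column to the sample data (the city values are invented for illustration) and computes the average salary per department and city.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
    # Hypothetical column, not part of the original dataset
    "City": ["Lahore", "Lahore", "Karachi", "Karachi", "Lahore", "Karachi"],
})

# One group per (Department, City) combination that occurs in the data
by_dept_city = df.groupby(["Department", "City"])["Salary"].mean()
print(by_dept_city)

# unstack() pivots the inner index level (City) into columns;
# combinations that never occur show up as NaN
wide = by_dept_city.unstack()
print(wide)
```

Only combinations that actually occur in the data form a group, so this result has four rows rather than six.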
Resetting Index
After an aggregation, the group keys become the index of the result.
To turn them back into a regular column and get a DataFrame with a default integer index:
df.groupby("Department")["Salary"].sum().reset_index()
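An alternative to calling reset_index() afterwards is to pass as_index=False to groupby. A short sketch comparing the two approaches on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Employee": ["Ali", "Sara", "Ahmed", "John", "Ayesha", "David"],
    "Salary": [60000, 50000, 65000, 70000, 52000, 75000],
})

# Option 1: aggregate first, then move the index into a column
via_reset = df.groupby("Department")["Salary"].sum().reset_index()

# Option 2: ask groupby to keep the keys as a regular column
via_flag = df.groupby("Department", as_index=False)["Salary"].sum()

print(via_reset)
print(via_flag)
```

Both produce a DataFrame with Department and Salary columns, which is often the more convenient shape for merging or exporting.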
Practical Example in Data Analytics
GroupBy is used to:
Analyze sales by region
Calculate revenue by product
Find average marks by class
Summarize expenses by category
Measure performance by department
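As a rough illustration of the first two use cases, the sketch below works on a small made-up sales table. The column names (Region, Product, Revenue) and all values are hypothetical and exist only to show the pattern.

```python
import pandas as pd

# Made-up data for illustration only
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South"],
    "Product": ["A", "A", "B", "B"],
    "Revenue": [100, 150, 200, 50],
})

# Total revenue per region
revenue_by_region = sales.groupby("Region")["Revenue"].sum()

# Total revenue per product
revenue_by_product = sales.groupby("Product")["Revenue"].sum()

print(revenue_by_region)   # North 300, South 200
print(revenue_by_product)  # A 250, B 250
```

The same two-step pattern (pick the grouping column, then aggregate a value column) applies to the other use cases in the list.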
Key Takeaway
GroupBy splits data into categories, and Aggregation performs calculations on those groups.
Together, they are essential tools for summarizing and analyzing structured data in real-world analytics projects.