Merging and Joining are used to combine multiple datasets into one.
In real-world Data Analytics projects, data is often stored in different tables, so combining them is an essential skill.
First, import pandas:
import pandas as pd
Why Merging is Important
You may have:
- One table with employee details
- Another table with salary information
- A third table with department data
To perform complete analysis, you must combine them.
Sample Data
Employee Table
employees = pd.DataFrame({
"Emp_ID": [1, 2, 3, 4],
"Name": ["Ali", "Sara", "Ahmed", "Ayesha"],
"Department": ["IT", "HR", "Finance", "IT"]
})
Salary Table
salaries = pd.DataFrame({
"Emp_ID": [1, 2, 3, 5],
"Salary": [60000, 50000, 70000, 45000]
})
1. Merge in Pandas
The merge() function is similar to SQL JOIN.
Syntax:
pd.merge(left_df, right_df, on="column_name", how="type")
Types of Merge
Inner Join
Returns only matching records from both tables.
pd.merge(employees, salaries, on="Emp_ID", how="inner")
Only Emp_ID present in both tables will appear.
Left Join
Returns all records from the left table and matching from the right.
pd.merge(employees, salaries, on="Emp_ID", how="left")
All employees will appear. Missing salaries will show NaN.
Right Join
Returns all records from the right table.
pd.merge(employees, salaries, on="Emp_ID", how="right")
Outer Join
Returns all records from both tables.
pd.merge(employees, salaries, on="Emp_ID", how="outer")
2. Joining Using join()
join() is mainly used to join on index.
Set index first:
employees.set_index("Emp_ID", inplace=True)
salaries.set_index("Emp_ID", inplace=True)
Join:
employees.join(salaries, how="left")
3. Concatenation (Stacking Data)
Concatenate rows:
pd.concat([employees, employees])
Concatenate columns:
pd.concat([employees, salaries], axis=1)
When to Use What?
Use merge() when joining based on a common column.
Use join() when joining based on index.
Use concat() when stacking data vertically or horizontally.
Why Merging and Joining are Important
In Data Analytics, you often:
Combine sales data with customer data
Merge employee records with payroll data
Join product data with inventory records
Combine multiple monthly reports
Understanding merging helps you build complete and accurate datasets.
Key Takeaway
Merging and Joining allow you to combine multiple datasets into one meaningful dataset.
Mastering these techniques is essential for real-world Data Analytics projects.