Merging and Joining

Merging and Joining are used to combine multiple datasets into one.
In real-world Data Analytics projects, data is often stored in different tables, so combining them is an essential skill.

First, import pandas:

import pandas as pd

Why Merging is Important

You may have:

  • One table with employee details
  • Another table with salary information
  • A third table with department data

To perform complete analysis, you must combine them.

Sample Data

Employee Table

employees = pd.DataFrame({
"Emp_ID": [1, 2, 3, 4],
"Name": ["Ali", "Sara", "Ahmed", "Ayesha"],
"Department": ["IT", "HR", "Finance", "IT"]
})

Salary Table

salaries = pd.DataFrame({
"Emp_ID": [1, 2, 3, 5],
"Salary": [60000, 50000, 70000, 45000]
})

1. Merge in Pandas

The merge() function is similar to SQL JOIN.

Syntax:

pd.merge(left_df, right_df, on="column_name", how="type")

Types of Merge

Inner Join

Returns only matching records from both tables.

pd.merge(employees, salaries, on="Emp_ID", how="inner")

Only Emp_ID present in both tables will appear.

Left Join

Returns all records from the left table and matching from the right.

pd.merge(employees, salaries, on="Emp_ID", how="left")

All employees will appear. Missing salaries will show NaN.

Right Join

Returns all records from the right table.

pd.merge(employees, salaries, on="Emp_ID", how="right")

Outer Join

Returns all records from both tables.

pd.merge(employees, salaries, on="Emp_ID", how="outer")

2. Joining Using join()

join() is mainly used to join on index.

Set index first:

employees.set_index("Emp_ID", inplace=True)
salaries.set_index("Emp_ID", inplace=True)

Join:

employees.join(salaries, how="left")

3. Concatenation (Stacking Data)

Concatenate rows:

pd.concat([employees, employees])

Concatenate columns:

pd.concat([employees, salaries], axis=1)

When to Use What?

Use merge() when joining based on a common column.
Use join() when joining based on index.
Use concat() when stacking data vertically or horizontally.

Why Merging and Joining are Important

In Data Analytics, you often:

Combine sales data with customer data
Merge employee records with payroll data
Join product data with inventory records
Combine multiple monthly reports

Understanding merging helps you build complete and accurate datasets.

Key Takeaway

Merging and Joining allow you to combine multiple datasets into one meaningful dataset.
Mastering these techniques is essential for real-world Data Analytics projects.

Home » PYTHON FOR DATA ANALYTICS (PYDA) > Pandas > Merging and Joining