Pandas is an open-source Python library for data manipulation and analysis. It is one of the most widely used tools in data analytics and data science.
Pandas makes it easy to work with structured data such as tables, spreadsheets, and CSV files.
Why Use Pandas?
- Handles structured data efficiently
- Works with CSV, Excel, SQL, and more
- Provides powerful data filtering and transformation tools
- Built on top of NumPy
- Widely used in industry
Installing Pandas
If pandas is not already installed, install it with pip:
pip install pandas
Import Pandas:
import pandas as pd
pd is the common alias for pandas.
Core Data Structures in Pandas
Pandas mainly uses two data structures:
1. Series
A Series is a one-dimensional labeled array.
import pandas as pd
data = pd.Series([10, 20, 30, 40])
print(data)
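The word "labeled" is easier to see with an explicit index. The following is a minimal sketch; the labels "a" through "d" and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0, 1, 2, ... index.
# The labels are hypothetical, picked only for this example.
scores = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = scores["b"]         # look up a value by its label
by_position = scores.iloc[0]   # look up a value by its integer position
total = scores.sum()           # aggregate over all values at once

print(by_label, by_position, total)
```

Using scores["b"] for labels and scores.iloc[0] for positions keeps the two kinds of lookup unambiguous.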
2. DataFrame
A DataFrame is a two-dimensional table with rows and columns.
data = {
    "Name": ["Ali", "Sara", "Ahmed"],
    "Age": [25, 28, 30],
    "Salary": [50000, 60000, 70000]
}
df = pd.DataFrame(data)
print(df)
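To make the "rows and columns" description concrete, the sketch below rebuilds the same small table and inspects its structure. The variable names n_rows, n_cols, column_names, and age_column are assumptions introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Ahmed"],
    "Age": [25, 28, 30],
    "Salary": [50000, 60000, 70000],
})

n_rows, n_cols = df.shape        # shape is a (rows, columns) tuple
column_names = list(df.columns)  # column labels in order
age_column = df["Age"]           # a single column is itself a Series

print(n_rows, n_cols)
print(column_names)
print(type(age_column).__name__)
```

Each column of a DataFrame is a Series, so the two data structures work together rather than being separate tools.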
Loading Data into Pandas
From a CSV file:
df = pd.read_csv("data.csv")
From an Excel file (reading .xlsx files requires an Excel engine such as openpyxl to be installed):
df = pd.read_excel("data.xlsx")
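If no data.csv file is at hand, the loading step can be tried with in-memory text. This is a sketch under the assumption that the file has Name, Age, and Salary columns like the earlier example; the CSV content here is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """Name,Age,Salary
Ali,25,50000
Sara,28,60000
Ahmed,30,70000
"""

# read_csv accepts a path or any file-like object, such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# usecols loads only the listed columns, which helps with wide files.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"])

print(df)
print(subset)
```

Replacing io.StringIO(csv_text) with a file path gives the same result for a file stored on disk.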
Viewing Data
First 5 rows (pass a number, for example df.head(10), to change the count):
df.head()
Last 5 rows:
df.tail()
Check data information:
df.info()
Statistical summary:
df.describe()
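The viewing methods above can be tried together on a small table. This sketch reuses the sample data from earlier; the variable names first_two, last_one, summary, and mean_age are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Ahmed"],
    "Age": [25, 28, 30],
    "Salary": [50000, 60000, 70000],
})

first_two = df.head(2)   # first 2 rows instead of the default 5
last_one = df.tail(1)    # last row only

# describe() summarizes numeric columns by default and returns a DataFrame,
# so individual statistics can be looked up by row and column label.
summary = df.describe()
mean_age = summary.loc["mean", "Age"]

print(first_two)
print(last_one)
print(summary)
```

Because describe() returns a DataFrame, its results can be filtered and selected like any other table.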
Selecting Data
Select a column:
df["Age"]
Select multiple columns:
df[["Name", "Salary"]]
Filter data:
df[df["Age"] > 26]
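Filters can also be combined. The sketch below shows one common way to do it, using the same sample data; the thresholds and variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Ahmed"],
    "Age": [25, 28, 30],
    "Salary": [50000, 60000, 70000],
})

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
filtered = df[(df["Age"] > 26) & (df["Salary"] < 70000)]

# .loc filters rows and selects a column in one step.
names = df.loc[df["Age"] > 26, "Name"]

print(filtered)
print(names)
```

Python's and/or keywords do not work element-wise on columns, which is why & and | are used here.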
Why Pandas is Important in Data Analytics
Pandas helps you:
- Clean data
- Filter and sort records
- Handle missing values
- Group and summarize data
- Prepare data for visualization and machine learning
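Several of the tasks listed above can be sketched in a few lines. The Department and Salary data below is made up for illustration, and filling a missing value with the column mean is only one possible choice, not a recommended default.

```python
import pandas as pd

# Hypothetical data with one missing salary.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT"],
    "Salary": [50000, None, 70000, 60000],
})

# Handle missing values: count them, then fill with the column mean.
missing_count = int(df["Salary"].isna().sum())
cleaned = df.fillna({"Salary": df["Salary"].mean()})

# Sort records from highest to lowest salary.
sorted_df = cleaned.sort_values("Salary", ascending=False)

# Group and summarize: average salary per department.
avg_by_dept = cleaned.groupby("Department")["Salary"].mean()

print(missing_count)
print(sorted_df)
print(avg_by_dept)
```

Each step returns a new object and leaves df unchanged, which makes intermediate results easy to inspect.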
Key Takeaway
Pandas is the backbone of data analysis in Python. Mastering DataFrames and Series will allow you to efficiently load, clean, analyze, and transform real-world datasets.