Working with Datasets

Working with datasets is one of the most important skills in Data Analytics. A dataset is a collection of structured information, usually stored in CSV or Excel files or in a database.

In Python, we commonly use the pandas library to handle datasets efficiently.

What is a Dataset?

A dataset is a structured collection of data organized in rows and columns.

Rows represent records
Columns represent fields or features

Example:

Name | Age | Salary
Ali | 25 | 50000
Sara | 28 | 60000
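
The example table above can also be built directly as a pandas DataFrame. This is a minimal sketch using only the two rows shown; the variable name df is just a common convention, not a requirement.

```python
import pandas as pd

# Each dictionary key becomes a column; each list position becomes a row.
df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara"],
        "Age": [25, 28],
        "Salary": [50000, 60000],
    }
)

print(df.shape)          # (2, 3): 2 rows (records), 3 columns (fields)
print(list(df.columns))  # ['Name', 'Age', 'Salary']
```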

Common Dataset File Formats

CSV (Comma Separated Values)
Excel (.xlsx)
JSON
SQL Databases

CSV is one of the most widely used formats in analytics because it is plain text and almost every tool can read it.

Loading a Dataset in Python

First, install pandas if it is not already installed:

pip install pandas

Import pandas and load a CSV file:

import pandas as pd
df = pd.read_csv("data.csv")

For Excel files (reading .xlsx requires an Excel engine package such as openpyxl to be installed):

df = pd.read_excel("data.xlsx")
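
To try loading without a file on disk, the CSV text can be fed to read_csv through an in-memory buffer. The sketch below assumes the sample table from earlier in this lesson as its data; csv_text is a stand-in for the contents of a real file.

```python
import io

import pandas as pd

# Stand-in for the contents of a CSV file (sample data for illustration).
csv_text = "Name,Age,Salary\nAli,25,50000\nSara,28,60000\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Age and Salary are inferred as integers, Name as text.
print(df.dtypes)
```

With a real file, replace io.StringIO(csv_text) with the file path.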

Viewing Data

Display first 5 rows:

df.head()

Display last 5 rows:

df.tail()

Check shape (rows and columns):

df.shape

Check column names:

df.columns

Get summary information (column types and non-null counts):

df.info()

Get a statistical summary of the numeric columns:

df.describe()
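
The viewing commands above can be tried together on a small table. This is only a sketch: the rows for Omar and Lina are made-up sample data added so there is something to summarize.

```python
import pandas as pd

# Small sample table; Omar and Lina are hypothetical extra rows.
df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar", "Lina"],
        "Age": [25, 28, 31, 22],
        "Salary": [50000, 60000, 55000, 45000],
    }
)

first_two = df.head(2)    # head and tail take an optional row count
last_two = df.tail(2)
n_rows, n_cols = df.shape  # shape is an attribute, not a method

# By default describe() summarizes only the numeric columns here.
summary = df.describe()
print(summary.loc["mean", "Age"])  # 26.5
```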

Selecting Data

Select a single column:

df["Age"]

Select multiple columns:

df[["Name", "Salary"]]

Select rows using a condition:

df[df["Age"] > 25]
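
Conditions can be combined, and row filtering can be paired with column selection in one step. The sketch below assumes the Name, Age, and Salary columns from the example table; the Omar row is hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar"],  # Omar is a hypothetical extra row
        "Age": [25, 28, 31],
        "Salary": [50000, 60000, 55000],
    }
)

# Wrap each condition in parentheses; use & for AND and | for OR.
mask = (df["Age"] > 25) & (df["Salary"] >= 60000)

# .loc takes the row filter first, then the columns to keep.
selected = df.loc[mask, ["Name", "Salary"]]
print(selected)
```

Python's plain and / or keywords do not work on whole columns, which is why & and | are used here.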

Handling Missing Data

Check missing values:

df.isnull().sum()

Remove rows that contain missing values (returns a new DataFrame; df itself is unchanged):

df.dropna()

Fill missing values with 0 (returns a new DataFrame):

df.fillna(0)
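
Because dropna() and fillna() return new DataFrames, their results need to be assigned to a variable to be kept. The sketch below uses made-up sample data with two gaps, and fills each column with a different value; using the mean for Age is just one possible choice, not a recommendation.

```python
import pandas as pd

nan = float("nan")  # marker for a missing numeric value

df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar"],  # hypothetical sample rows
        "Age": [25, nan, 31],
        "Salary": [50000, 60000, nan],
    }
)

missing_counts = df.isnull().sum()  # missing values per column

dropped = df.dropna()               # keeps only rows with no gaps

# A dictionary sets a different fill value for each column.
filled = df.fillna({"Age": df["Age"].mean(), "Salary": 0})

print(missing_counts["Age"])        # 1
print(len(dropped))                 # 1 (only Ali's row is complete)
print(filled["Age"].tolist())       # [25.0, 28.0, 31.0]
```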

Sorting Data

Sort by a column (ascending by default):

df.sort_values("Salary")

Sort in descending order:

df.sort_values("Salary", ascending=False)
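
sort_values() can also sort by several columns at once, with a separate direction for each. A minimal sketch, assuming the example columns and one hypothetical extra row (Omar) so that two people share the same age:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar"],  # Omar is a hypothetical extra row
        "Age": [25, 28, 25],
        "Salary": [50000, 60000, 55000],
    }
)

# Youngest first; within the same age, highest salary first.
sorted_df = df.sort_values(["Age", "Salary"], ascending=[True, False])

# Sorting keeps the old row labels; reset_index renumbers them from 0.
sorted_df = sorted_df.reset_index(drop=True)

print(sorted_df["Name"].tolist())  # ['Omar', 'Ali', 'Sara']
```

Like the other methods in this lesson, sort_values() returns a new DataFrame and leaves df in its original order.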

Saving the Dataset

Save as CSV:

df.to_csv("new_data.csv", index=False)

Save as Excel (writing .xlsx requires an Excel writer package such as openpyxl):

df.to_excel("new_data.xlsx", index=False)
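
A quick way to check a saved file is to read it back and compare. The sketch below writes to a temporary folder so that it leaves nothing behind; the folder and the sample data are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara"],
        "Age": [25, 28],
        "Salary": [50000, 60000],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "new_data.csv"

    # index=False leaves out the row labels; without it they would be
    # written as an extra unnamed first column.
    df.to_csv(out_path, index=False)

    reloaded = pd.read_csv(out_path)

print(list(reloaded.columns))  # ['Name', 'Age', 'Salary']
```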

Why Working with Datasets is Important

Data analysis starts with understanding and cleaning data.
Proper dataset handling helps you:

Understand data structure
Identify errors
Clean missing values
Filter useful information
Prepare data for visualization and modeling
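
The steps listed above can be chained into one short workflow. This is a sketch rather than a fixed recipe: the CSV text, the Omar and Lina rows, and the choice to drop incomplete rows are all assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file; Omar and Lina are hypothetical rows with gaps.
csv_text = (
    "Name,Age,Salary\n"
    "Ali,25,50000\n"
    "Sara,28,60000\n"
    "Omar,31,\n"
    "Lina,,58000\n"
)

df = pd.read_csv(io.StringIO(csv_text))  # load
print(df.isnull().sum())                 # explore: count the gaps

clean = df.dropna()                      # clean: drop incomplete rows

# Filter to people over 25, then sort by salary, highest first.
result = clean[clean["Age"] > 25].sort_values("Salary", ascending=False)

# With no path given, to_csv returns the CSV text instead of writing a file.
output = result.to_csv(index=False)
print(output)
```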

Key Takeaway

Working with datasets means loading, exploring, cleaning, filtering, and saving data. Mastering these steps is essential for building a strong foundation in Data Analytics.
