Extracting data from APIs is a common step in ETL and data pipelines.
APIs (Application Programming Interfaces) allow applications to communicate and share data over the internet.
In simple terms:
Client → Sends a request to the API
Server → Returns data (usually in JSON format)
This data can then be stored, transformed, and analyzed.
What is an API?
An API is a service that provides access to data through endpoints.
Example endpoint:
When you send a request, the server returns structured data.
Most APIs return:
JSON
XML
JSON is the most common format.
Common HTTP Methods
GET → Retrieve data
POST → Send data
PUT/PATCH → Update data
DELETE → Remove data
For extracting data, we usually use GET.
Extracting Data Using Python
The most popular library is requests.
Install:
pip install requests
Basic example:
import requests

url = "https://jsonplaceholder.typicode.com/posts"
response = requests.get(url)

if response.status_code == 200:
    data = response.json()
    print(data[:2])
else:
    print("Error:", response.status_code)
Explanation:
requests.get() sends a GET request to the URL
response.status_code holds the HTTP status code (200 means success)
response.json() parses the JSON body into Python objects (a dictionary or a list)
Working with JSON Data
API responses are usually dictionaries or lists.
Example:
for post in data:
    print(post["title"])
You can convert the JSON data to a Pandas DataFrame:
import pandas as pd

df = pd.DataFrame(data)
print(df.head())
This is useful for analysis.
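Real responses are often nested, and pd.DataFrame(data) leaves a nested object as a dictionary inside a single column. The following is a minimal sketch, assuming a hypothetical response shape in which each record carries a nested "author" object, of flattening such data with pandas.json_normalize:

```python
import pandas as pd

# Hypothetical response: each record carries a nested "author" object.
data = [
    {"id": 1, "title": "First post", "author": {"name": "Ada", "country": "UK"}},
    {"id": 2, "title": "Second post", "author": {"name": "Lin", "country": "SG"}},
]

# json_normalize expands nested dictionaries into dotted column names
# such as "author.name" and "author.country".
df = pd.json_normalize(data)
print(sorted(df.columns))
```

Each nested field becomes its own column, so it can be filtered or grouped like any other.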
Handling API Authentication
Some APIs require authentication.
Common methods:
API Key
Bearer Token
OAuth
Example with a bearer token:
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN"
}
response = requests.get(url, headers=headers)
Always keep API keys secure: never hardcode them in source code or commit them to version control.
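One common way to keep a token out of the code is to read it from an environment variable. This is a minimal sketch; the helper name build_auth_headers and the variable name API_TOKEN are assumptions for illustration, not part of any particular API:

```python
import os


def build_auth_headers(env_var="API_TOKEN"):
    """Build an Authorization header from an environment variable.

    The variable name API_TOKEN is a hypothetical choice; use whatever
    name your deployment defines.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {token}"}
```

The result can be passed straight to the request, for example requests.get(url, headers=build_auth_headers()).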
Handling Query Parameters
APIs often accept parameters.
Example:
params = {
    "userId": 1
}
response = requests.get(url, params=params)
print(response.json())
The filtering happens on the server, so only the matching records are returned.
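Under the hood, the parameters are encoded into the URL as a query string. The sketch below uses the standard library's urllib.parse to show the URL that such a request ends up targeting; it is an illustration of the encoding, not of how requests is implemented internally:

```python
from urllib.parse import urlencode

url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1}

# Parameters are appended to the URL as a query string after "?".
full_url = f"{url}?{urlencode(params)}"
print(full_url)  # https://jsonplaceholder.typicode.com/posts?userId=1
```

With requests, the final URL of a completed call can be inspected through response.url.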
Handling Errors and Exceptions
Use try-except to prevent crashes:
try:
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.RequestException as e:
    print("API Error:", e)
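Transient failures such as timeouts often succeed on a second attempt, so a pipeline may retry before giving up. The following is a minimal sketch of retrying with exponential backoff; the helper fetch_with_retries and its parameters are hypothetical names, and the function takes any callable so that it stays independent of the HTTP library:

```python
import time


def fetch_with_retries(fetch, attempts=3, base_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as exc:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            # Double the wait after each failure: 1s, 2s, 4s, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)
```

With requests, this could be called as fetch_with_retries(lambda: requests.get(url, timeout=10), retry_on=(requests.exceptions.RequestException,)), so that only network-level errors trigger a retry.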
Handling Pagination
APIs that expose large datasets usually return the data in pages.
Example:
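page = 1
while True:
    # The parameter name ("page" here) varies between APIs
    response = requests.get(url, params={"page": page})
    data = response.json()
    if not data:  # an empty page means there is nothing left to fetch
        break
    print("Processing page:", page)
    page += 1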
Looping through every page ensures complete data extraction.
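In practice, the records from each page need to be accumulated, and a page cap protects against an endless loop if the API never returns an empty page. This is a minimal sketch under those assumptions; fetch_all_pages, fetch_page, and max_pages are hypothetical names, and the fake page data stands in for a real API:

```python
def fetch_all_pages(fetch_page, max_pages=1000):
    """Collect records from every page until an empty page is returned."""
    records = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # An empty page signals the end of the data.
        records.extend(batch)
    return records


# Stand-in for a real API: three records split over two pages.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
all_records = fetch_all_pages(lambda page: fake_pages.get(page, []))
print(len(all_records))  # 3
```

With requests, fetch_page could be a small function that calls requests.get(url, params={"page": page}) and returns response.json().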
Saving API Data
Save to CSV:
df.to_csv("api_data.csv", index=False)
Save to database:
Use INSERT queries inside a loop, or a bulk insert for larger volumes.
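As an illustration of the bulk-insert approach, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table layout and the sample records are assumptions chosen to resemble the posts data used earlier:

```python
import sqlite3

# Sample records shaped like the posts returned by the example API.
posts = [
    {"id": 1, "userId": 1, "title": "First post"},
    {"id": 2, "userId": 1, "title": "Second post"},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)

# executemany runs the INSERT for every record in a single call;
# the named placeholders are filled from each dictionary's keys.
conn.executemany(
    "INSERT INTO posts (id, user_id, title) VALUES (:id, :userId, :title)",
    posts,
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(row_count)  # 2
```

Using placeholders instead of string formatting also avoids SQL injection when values come from an external API.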
Real-World Example
E-commerce analytics:
Extract product data from API
Transform data
Load into data warehouse
Build dashboard
This process is automated in data pipelines.
Best Practices
Check status codes
Handle exceptions
Manage pagination
Secure API keys
Respect rate limits
Log errors
Automate extraction
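Respecting rate limits can be as simple as spacing out requests. The following is a minimal sketch of a client-side throttle; the Throttle class is a hypothetical helper, and the right interval depends on the limits documented by the API being called:

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling throttle.wait() immediately before each request keeps the request rate at or below one call per min_interval seconds.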
Common Mistakes
Ignoring API rate limits
Hardcoding credentials
Not handling errors
Loading all data without pagination
Not validating JSON structure
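To illustrate the last point, a response can be checked before it is loaded anywhere. This is a minimal sketch, assuming a hypothetical schema in which every record must have an integer "id" and a string "title"; validate_records and REQUIRED_FIELDS are illustrative names:

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "title": str}


def validate_records(records, required=REQUIRED_FIELDS):
    """Split records into valid ones and (index, reason) problems."""
    if not isinstance(records, list):
        raise ValueError(
            f"Expected a list of records, got {type(records).__name__}"
        )
    valid, problems = [], []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append((index, "record is not an object"))
            continue
        missing = [key for key in required if key not in record]
        wrong = [
            key for key, expected in required.items()
            if key in record and not isinstance(record[key], expected)
        ]
        if missing or wrong:
            problems.append((index, f"missing={missing} wrong_type={wrong}"))
        else:
            valid.append(record)
    return valid, problems
```

The valid records can continue through the pipeline, while the problems list can be logged for later inspection.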
Key Takeaway
Extracting data from APIs involves sending HTTP requests, receiving JSON responses, and converting them into usable formats.
APIs are a powerful source of real-time and external data in modern ETL and data engineering workflows.