Amazon Web Services offers S3 (Simple Storage Service), a scalable object storage service used to store files, datasets, backups, and logs.
Using Python, you can easily interact with Amazon S3 to upload, download, list, and delete files.
What is Amazon S3?
Amazon S3 is an object storage service that stores data as:
- Buckets → Containers for files
- Objects → Actual files stored inside buckets
Example structure:
my-bucket/
data.csv
reports/sales.xlsx
Why Use S3 in Data Engineering?
- Store raw data
- Store processed datasets
- Backup databases
- Store logs
- Build data lakes
S3 is highly scalable and cost-effective.
Step 1: Install Required Library
Python connects to AWS services through boto3, the official AWS SDK for Python. Install it with pip:
pip install boto3
Step 2: Configure AWS Credentials
You need:
- AWS Access Key
- Secret Access Key
- Region
Configure using:
aws configure
Or set environment variables.
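boto3 automatically picks up credentials from these standard environment variables. A minimal example (the key values and region below are placeholders):

```shell
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
export AWS_DEFAULT_REGION="us-east-1"
```

In production, prefer IAM roles over long-lived keys wherever possible.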
Step 3: Connect to S3 Using Python
import boto3

s3 = boto3.client('s3')
Upload a File to S3
s3.upload_file('local_file.csv', 'my-bucket', 'data/local_file.csv')
Parameters (in order):
- Local file path
- Bucket name
- S3 object key
Download a File from S3
s3.download_file('my-bucket', 'data/local_file.csv', 'downloaded.csv')
List Files in a Bucket
response = s3.list_objects_v2(Bucket='my-bucket')
for obj in response.get('Contents', []):
    print(obj['Key'])
Delete a File
s3.delete_object(Bucket='my-bucket', Key='data/local_file.csv')
Reading CSV Directly from S3 Using Pandas
import pandas as pd

df = pd.read_csv('s3://my-bucket/data/local_file.csv')
For this to work, you need the s3fs library installed, which pandas uses under the hood to read s3:// paths.
Uploading DataFrame to S3
df.to_csv('output.csv', index=False)
s3.upload_file('output.csv', 'my-bucket', 'processed/output.csv')
Best Practices
- Never hardcode credentials
- Use IAM roles in production
- Organize bucket folders properly
- Enable versioning
- Set proper access permissions
- Use lifecycle policies for cost control
Real-World Use Case Example
ETL Workflow:
- Extract API data
- Save raw data to S3
- Transform data
- Store processed data back to S3
- Load into data warehouse
Interview Answer (Short Version)
Using Python with AWS S3 involves using the boto3 library to upload, download, and manage files in S3 buckets. It is commonly used in data engineering workflows for storing raw and processed datasets.
Final Summary
Python with AWS S3 allows you to:
- Automate file uploads and downloads
- Build cloud-based data pipelines
- Store large datasets
- Create scalable data lake architectures
It is an essential skill for modern cloud-based Data Engineering projects.