Generators and Iterators

Generators and Iterators are tools in Python used to handle data efficiently, especially large datasets.

They allow you to process data one item at a time instead of loading everything into memory.

This improves:

Memory efficiency
Performance
Scalability

What is an Iterator?

An Iterator is an object that allows you to traverse (loop through) a collection one element at a time.

An object is an iterator if it implements:

__iter__()
__next__()

The built-in functions iter() and next() call these methods for you.

Example:

numbers = [1, 2, 3]
iterator = iter(numbers)

print(next(iterator))
print(next(iterator))
print(next(iterator))

Output:

1
2
3

When the elements are exhausted, the next call to next() raises StopIteration.
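
For example, a fourth call to next() on the three-element list above raises the exception:

numbers = [1, 2, 3]
iterator = iter(numbers)

for _ in range(3):
    next(iterator)          # consumes 1, 2, 3

try:
    next(iterator)          # nothing left to return
except StopIteration:
    print("Iterator is exhausted")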

How Iterators Work

An iterator follows this process:

  1. iter() returns the iterator object
  2. next() returns the next value
  3. StopIteration is raised when the values are exhausted
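
A for loop runs this exact protocol for you. As a minimal sketch, here is a for loop written out by hand with iter() and next():

numbers = [10, 20, 30]

# Roughly equivalent to: for value in numbers: print(value)
iterator = iter(numbers)         # step 1: get the iterator
while True:
    try:
        value = next(iterator)   # step 2: fetch the next value
    except StopIteration:        # step 3: stop when exhausted
        break
    print(value)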

You can also create custom iterators.

Example:

class Counter:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current < self.limit:
            self.current += 1
            return self.current
        else:
            raise StopIteration
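
The class can then be used like any other iterable, for example:

counter = Counter(3)
for number in counter:
    print(number)    # prints 1, 2, 3

Note that this iterator is single-use: because __iter__() returns self, a second loop over the same object yields nothing, and you must create a new Counter.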

What is a Generator?

A Generator is a simpler way to create iterators using the yield keyword.

Instead of writing __iter__() and __next__() yourself, you use yield.

Example:

def count_up_to(limit):
    for i in range(1, limit + 1):
        yield i

gen = count_up_to(5)
for num in gen:
    print(num)

Generators raise StopIteration automatically when the function finishes.

Yield vs Return

return:

Ends the function completely and discards its local state

yield:

Pauses the function and remembers its state
Resumes from where it left off on the next call

This makes generators memory-efficient.
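
A minimal sketch of this pause-and-resume behaviour (the function name demo is just illustrative): each call to next() runs the function until the next yield, then freezes it there.

def demo():
    print("start")
    yield 1              # pause here, state is remembered
    print("resumed")
    yield 2              # pause again

gen = demo()
print(next(gen))         # prints "start", then 1
print(next(gen))         # prints "resumed", then 2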

Generator Expressions

Generator expressions look like list comprehensions but use parentheses instead of square brackets.

List comprehension (stores all values):

numbers = [x*x for x in range(1000000)]

Generator expression (memory efficient):

numbers = (x*x for x in range(1000000))

Generator expressions do not store the full result in memory; each value is produced only when it is requested.
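
Because values are produced one at a time, a generator expression can feed a function like sum() without ever materialising the full sequence:

squares = (x * x for x in range(1000000))
total = sum(squares)     # consumes one square at a time
print(total)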

Why Generators Are Important

They:

Use less memory
Handle large datasets efficiently
Improve performance
Support lazy evaluation

Used in:

Reading large files
Streaming data
Data pipelines (see the sketch below)
Big data processing
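
As an illustration of the data-pipeline use, here is a minimal sketch in which each stage is a generator consuming the previous one (the stage names and sample data are made up for the example):

def clean(records):
    for record in records:
        yield record.strip().lower()

def keep_non_empty(records):
    for record in records:
        if record:
            yield record

raw = ["  Alice ", "", " BOB "]
for record in keep_non_empty(clean(raw)):
    print(record)        # alice, bob

No intermediate lists are built: each record flows through the whole pipeline before the next one is read.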

Iterator vs Generator

Iterator:

More code
Manual implementation
Requires __iter__() and __next__()

Generator:

Short and simple
Uses yield
Automatically creates iterator

Generators are easier and cleaner.
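
To make the difference concrete, the Counter iterator from earlier can be rewritten as a short generator that behaves the same way:

def counter(limit):
    current = 0
    while current < limit:
        current += 1
        yield current

for number in counter(3):
    print(number)        # prints 1, 2, 3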

Real-World Example

Reading large file line by line:

with open("large_file.txt") as file:
    for line in file:
        process(line)

File objects are iterators: they read one line at a time rather than loading the entire file into memory.

When to Use Generators

Large datasets
Streaming data
Infinite sequences (see the sketch below)
Memory-sensitive applications
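
For the infinite-sequence case, a sketch (the function name naturals is illustrative): the generator can yield values forever, and the consumer decides when to stop. itertools.islice takes just the first few:

import itertools

def naturals():
    n = 1
    while True:          # never ends on its own
        yield n
        n += 1

# Take only the first five values from the infinite stream
for n in itertools.islice(naturals(), 5):
    print(n)             # 1, 2, 3, 4, 5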

Key Takeaway

Iterators allow sequential access to data.
Generators are a simple and memory-efficient way to create iterators using yield.

They are powerful tools for handling large data efficiently in Python.
