When you first learn Python, you’re introduced to functions and loops. As you go deeper, you meet one of Python's coolest features: generators.
But what is a generator?
Why do developers say it’s memory efficient or lazy?
In this blog, we'll look at what generators are and where they can help us.
Imagine a Buffet vs. a Waiter
Let's say you visit two types of restaurants:
- Buffet Restaurant – All food is laid out at once. You take what you want.
- Waiter-Served Restaurant – You order food, and the waiter brings one dish at a time.
In programming terms,
- Buffet = List – Everything is prepared and loaded in memory at once.
- Waiter = Generator – Items are delivered one by one, only when you ask.
This is exactly what generators do: they wait for you to ask for the next value, and then give it.
So What is a Generator?
A generator is a special type of Python function that remembers its state and yields values one at a time using the yield keyword.
From the docs: https://wiki.python.org/moin/Generators
This makes them lazy (they only compute when needed) and memory efficient.
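A quick way to see this laziness in action is a generator expression, a compact sibling of generator functions (a minimal sketch; the variable names are just for illustration):

squares = (x * x for x in range(5))   # nothing is computed yet
print(squares)        # <generator object ...>
print(next(squares))  # 0, computed only when asked for
print(list(squares))  # [1, 4, 9, 16], the remaining values, computed on demand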
Basic Example – Countdown
Let's see a normal function first,
def countdown_list(n):
    result = []
    while n > 0:
        result.append(n)
        n -= 1
    return result

print(countdown_list(5))  # [5, 4, 3, 2, 1]
Now, let's do the same with a generator,
def countdown_gen(n):
    while n > 0:
        yield n
        n -= 1

for number in countdown_gen(5):
    print(number)
Output
5
4
3
2
1
Notice: We did not return a full list, but we still got values one by one.
What's the Benefit?
Imagine n = 1000000. A normal function stores a list of 1 million numbers in memory.
But the generator only generates one number at a time, using much less memory.
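To make that concrete, here is a rough sketch using the two countdown functions above (sys.getsizeof only measures the container object itself, so treat the numbers as indicative, not an exact benchmark):

import sys

big_list = countdown_list(1_000_000)
big_gen = countdown_gen(1_000_000)

print(sys.getsizeof(big_list))  # roughly 8 MB just for the list object
print(sys.getsizeof(big_gen))   # around 100-200 bytes, no matter how large n is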
What's yield?
- Think of yield like return, but smarter.
- When a function hits yield, it pauses.
- The next time you ask for a value, it resumes from where it left off.
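Here is a small sketch that makes the pause-and-resume behaviour visible with print statements (the function name is just for illustration):

def demo():
    print("started")
    yield 1
    print("resumed after the first yield")
    yield 2

gen = demo()
print(next(gen))  # prints "started", then 1
print(next(gen))  # prints "resumed after the first yield", then 2

Note that nothing is printed when demo() is called; the body only starts running on the first next().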

Behind the Scenes
You can drive a generator manually with the next() function,
gen = countdown_gen(3)
print(next(gen)) # 3
print(next(gen)) # 2
print(next(gen)) # 1
# print(next(gen)) # Raises StopIteration
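Under the hood, a for loop does roughly the same thing: it keeps calling next() and stops when StopIteration is raised. A hand-written equivalent looks like this:

gen = countdown_gen(3)
while True:
    try:
        number = next(gen)
    except StopIteration:
        break
    print(number)  # 3, 2, 1, same output as the for loop earlier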
Real World Analogy
Imagine you’re delivering water bottles
- With a list, you carry all 100 bottles at once – heavy and inefficient.
- With a generator, you carry one bottle at a time – lighter and scalable.
Use Case – Reading Large Files
Imagine you're reading a 2 GB log file. Don't load the whole thing!
Use a generator to read line by line,
def read_large_file(file_path):
    with open(file_path) as f:
        for line in f:
            yield line

for log in read_large_file("biglog.txt"):
    print(log)
Only one line is loaded at a time. Fast and memory efficient.
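Because lines arrive one at a time, you can also filter or count on the fly without ever holding the file in memory (the file name and the "ERROR" marker below are just placeholders):

error_count = sum(1 for line in read_large_file("biglog.txt") if "ERROR" in line)
print("Errors found:", error_count)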
Infinite Sequence Generator
def infinite_counter(start=0):
    while True:
        yield start
        start += 1
counter = infinite_counter()
print(next(counter)) # 0
print(next(counter)) # 1
print(next(counter)) # 2
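You can't call list() on an infinite generator (it would never finish), but you can lazily take a slice of it with itertools.islice:

from itertools import islice

counter = infinite_counter(10)
print(list(islice(counter, 5)))  # [10, 11, 12, 13, 14]; only five values are ever produced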
Processing Large Data Streams (Logs, CSV, JSONL, etc.)
Imagine processing a 1 GB CSV file. Reading it all at once into memory is dangerous.
def read_csv_line_by_line(file_path):
    with open(file_path) as f:
        for line in f:
            yield line.strip().split(',')
You can now process one row at a time without exhausting your RAM.
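A sketch of consuming it, assuming the first line of the file is a header (for real-world CSVs with quoted commas, Python's built-in csv module is the safer choice):

rows = read_csv_line_by_line("sales.csv")  # "sales.csv" is a placeholder
header = next(rows)                        # the first yielded row is the header
for values in rows:
    record = dict(zip(header, values))
    # process one record at a time here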
Streaming API Responses (e.g., Tweets, Logs, Events)
You can stream paginated or real-time API responses,
import requests

def paginated_fetch(base_url):
    page = 1
    while True:
        resp = requests.get(f"{base_url}?page={page}")
        data = resp.json()
        if not data:
            break
        for item in data:
            yield item
        page += 1
Useful for APIs with pagination or streaming endpoints.
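Because pages are fetched lazily, a consumer that stops early never requests the remaining pages. The URL below is a placeholder:

from itertools import islice

items = paginated_fetch("https://api.example.com/events")
for item in islice(items, 10):  # only as many pages as needed for 10 items are fetched
    print(item)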
Video Frame Streaming (Using OpenCV)
Stream video frames for real-time processing,
import cv2

def video_frame_stream(path):
    cap = cv2.VideoCapture(path)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        yield frame
    cap.release()
Helps in object detection, surveillance, or motion tracking systems.
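A sketch of consuming the stream, e.g. converting each frame to grayscale before further processing (assumes OpenCV is installed; "video.mp4" is a placeholder path):

for frame in video_frame_stream("video.mp4"):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # only one frame is held in memory
    # run detection / tracking on `gray` here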
Use Case: Processing Large Datasets with Generators
One of the most common challenges in data science is working with datasets that are too large to fit into memory.
Instead of loading an entire 10 GB CSV file into a pandas DataFrame (which might crash your system), you can use generators to process the data line-by-line, efficiently and incrementally.
Scenario
You have a 10 GB CSV file of global sales data. You need to filter and process only records from the "South Asia" region.
# Not recommended for huge files
import pandas as pd
df = pd.read_csv('sales.csv') # This loads the entire file in memory!
Generator Pipeline to the Rescue
def read_large_csv(file_path):
    with open(file_path) as f:
        header = next(f).strip().split(',')  # Read the header line first
        for line in f:
            yield dict(zip(header, line.strip().split(',')))

def filter_region(rows, region_name):
    for row in rows:
        if row.get("region") == region_name:
            yield row

def total_sales(rows):
    for row in rows:
        yield float(row.get("sales", 0.0))

# Composing the pipeline
file_path = "sales.csv"
rows = read_large_csv(file_path)
filtered_rows = filter_region(rows, "South Asia")
sales_values = total_sales(filtered_rows)

# Calculate total sales for the region
print("Total South Asia Sales:", sum(sales_values))
Why is this better?
- Memory efficient: Loads and processes one row at a time.
- Composable: Steps are cleanly separated and reusable.
- Scalable: Works even with 50 GB+ files.
Why Generators Are Data Science Friendly
| Problem | Generator Benefit |
|---|---|
| Large CSV/JSON files | Avoids memory crashes |
| ETL pipelines | Composable, readable, stream-friendly |
| Lazy filtering and transformations | Only processes what’s needed |
| Real-time sensor/data ingestion | Processes data as it arrives |
Stack Overflow Survey – Count Python Developers from India
https://colab.research.google.com/drive/1nFqkBu1zb-QPZXGSnNGrgV00CoZjK0Cp?usp=sharing
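The notebook walks through that example end to end. Below is a minimal sketch of the same idea; the file name and column names ("Country", "LanguageHaveWorkedWith") are assumptions about the survey CSV, so adjust them to the actual dataset:

import csv

def survey_rows(file_path):
    with open(file_path, newline='', encoding='utf-8') as f:
        yield from csv.DictReader(f)  # one row dict at a time, never the whole file

count = sum(
    1 for row in survey_rows("survey_results_public.csv")
    if row.get("Country") == "India" and "Python" in row.get("LanguageHaveWorkedWith", "")
)
print("Python developers from India:", count)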
