When you first learn Python, you’re introduced to functions and loops. As you go deeper, you meet one of Python's coolest features: generators.
But what is a generator?
Why do developers say it’s memory efficient or lazy?
In this blog, we'll look at what generators are and where they can help us.
Imagine a Buffet vs. a Waiter
Let's say you visit two types of restaurants:
- Buffet Restaurant – All food is laid out at once. You take what you want.
- Waiter-Served Restaurant – You order food, and the waiter brings one dish at a time.
In programming terms,
- Buffet = List – Everything is prepared and loaded in memory at once.
- Waiter = Generator – Items are delivered one by one, only when you ask.
This is exactly what generators do: they wait for you to ask for the next value, and then give it.
So What is a Generator?
A generator is a special type of Python function that remembers its state and yields values one at a time using the yield keyword.
From the docs: https://wiki.python.org/moin/Generators
This makes them lazy (they only compute when needed) and memory efficient.
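A quick way to see this laziness in action is a generator expression, a compact sibling of generator functions (a minimal sketch; the variable names are just for illustration):

squares = (x * x for x in range(5))   # nothing is computed yet
print(squares)        # <generator object ...>
print(next(squares))  # 0, computed only when asked for
print(list(squares))  # [1, 4, 9, 16], the remaining values, computed on demand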
Basic Example – Countdown
Let's see a normal function first,
def countdown_list(n):
    result = []
    while n > 0:
        result.append(n)
        n -= 1
    return result

print(countdown_list(5))  # [5, 4, 3, 2, 1]
Now, let's do the same with a generator,
def countdown_gen(n):
    while n > 0:
        yield n
        n -= 1

for number in countdown_gen(5):
    print(number)
Output
5
4
3
2
1
Notice: We did not return a full list, but we still got values one by one.
What's the Benefit?
Imagine n = 1000000. A normal function stores a list of 1 million numbers in memory.
But the generator only generates one number at a time, using much less memory.
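To make that concrete, here is a rough sketch using the two countdown functions above (sys.getsizeof only measures the container object itself, so treat the numbers as indicative, not an exact benchmark):

import sys

big_list = countdown_list(1_000_000)
big_gen = countdown_gen(1_000_000)

print(sys.getsizeof(big_list))  # roughly 8 MB just for the list object
print(sys.getsizeof(big_gen))   # around 100-200 bytes, no matter how large n is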
What's yield?
- Think of yield like return, but smarter.
- When a function hits yield, it pauses.
- The next time you ask for a value, it resumes from where it left off.
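Here is a small sketch that makes the pause-and-resume behaviour visible with print statements (the function name is just for illustration):

def demo():
    print("started")
    yield 1
    print("resumed after the first yield")
    yield 2

gen = demo()
print(next(gen))  # prints "started", then 1
print(next(gen))  # prints "resumed after the first yield", then 2

Note that nothing is printed when demo() is called; the body only starts running on the first next().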

Behind the Scenes
You can drive a generator manually with the next() function,
gen = countdown_gen(3)
print(next(gen)) # 3
print(next(gen)) # 2
print(next(gen)) # 1
# print(next(gen)) # Raises StopIteration
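Under the hood, a for loop does roughly the same thing: it keeps calling next() and stops when StopIteration is raised. A hand-written equivalent looks like this:

gen = countdown_gen(3)
while True:
    try:
        number = next(gen)
    except StopIteration:
        break
    print(number)  # 3, 2, 1, same output as the for loop earlier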
Real World Analogy
Imagine you’re delivering water bottles
- With a list, you carry all 100 bottles at once – heavy and inefficient.
- With a generator, you carry one bottle at a time – lighter and scalable.
Use Case – Reading Large Files
Imagine you're reading a 2 GB log file. Don't load the whole thing!
Use a generator to read line by line,
def read_large_file(file_path):
    with open(file_path) as f:
        for line in f:
            yield line

for log in read_large_file("biglog.txt"):
    print(log)
Only one line is loaded at a time. Fast and memory efficient.
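Because lines arrive one at a time, you can also filter or count on the fly without ever holding the file in memory (the file name and the "ERROR" marker below are just placeholders):

error_count = sum(1 for line in read_large_file("biglog.txt") if "ERROR" in line)
print("Errors found:", error_count)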
Infinite Sequence Generator
def infinite_counter(start=0):
    while True:
        yield start
        start += 1
counter = infinite_counter()
print(next(counter)) # 0
print(next(counter)) # 1
print(next(counter)) # 2
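You can't call list() on an infinite generator (it would never finish), but you can lazily take a slice of it with itertools.islice:

from itertools import islice

counter = infinite_counter(10)
print(list(islice(counter, 5)))  # [10, 11, 12, 13, 14]; only five values are ever produced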
Processing Large Data Streams (Logs, CSV, JSONL, etc.)
Imagine processing a 1 GB CSV file. Reading it all at once into memory is dangerous.
def read_csv_line_by_line(file_path):
    with open(file_path) as f:
        for line in f:
            yield line.strip().split(',')
You can now process one row at a time without exhausting your RAM.
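A sketch of consuming it, assuming the first line of the file is a header (for real-world CSVs with quoted commas, Python's built-in csv module is the safer choice):

rows = read_csv_line_by_line("sales.csv")  # "sales.csv" is a placeholder
header = next(rows)                        # the first yielded row is the header
for values in rows:
    record = dict(zip(header, values))
    # process one record at a time here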
Streaming API Responses (e.g., Tweets, Logs, Events)
You can stream paginated or real-time API responses,
import requests

def paginated_fetch(base_url):
    page = 1
    while True:
        resp = requests.get(f"{base_url}?page={page}")
        data = resp.json()
        if not data:
            break
        for item in data:
            yield item
        page += 1
Useful for APIs with pagination or streaming endpoints.
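Because pages are fetched lazily, a consumer that stops early never requests the remaining pages. The URL below is a placeholder:

from itertools import islice

items = paginated_fetch("https://api.example.com/events")
for item in islice(items, 10):  # only as many pages as needed for 10 items are fetched
    print(item)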
Video Frame Streaming (Using OpenCV)
Stream video frames for real-time processing,
import cv2

def video_frame_stream(path):
    cap = cv2.VideoCapture(path)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        yield frame
    cap.release()
Helps in object detection, surveillance, or motion tracking systems.
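A sketch of consuming the stream, e.g. converting each frame to grayscale before further processing (assumes OpenCV is installed; "video.mp4" is a placeholder path):

for frame in video_frame_stream("video.mp4"):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # only one frame is held in memory
    # run detection / tracking on `gray` here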
Use Case: Processing Large Datasets with Generators
One of the most common challenges in data science is working with datasets that are too large to fit into memory.
Instead of loading an entire 10 GB CSV file into a pandas DataFrame (which might crash your system), you can use generators to process the data line-by-line, efficiently and incrementally.
Scenario
You have a 10 GB CSV file of global sales data. You need to filter and process only records from the "South Asia" region.
# Not recommended for huge files
import pandas as pd
df = pd.read_csv('sales.csv') # This loads the entire file in memory!
Generator Pipeline to the Rescue
def read_large_csv(file_path):
    with open(file_path) as f:
        header = next(f).strip().split(',')  # Read the header line first
        for line in f:
            yield dict(zip(header, line.strip().split(',')))

def filter_region(rows, region_name):
    for row in rows:
        if row.get("region") == region_name:
            yield row

def total_sales(rows):
    for row in rows:
        yield float(row.get("sales", 0.0))

# Composing the pipeline
file_path = "sales.csv"
rows = read_large_csv(file_path)
filtered_rows = filter_region(rows, "South Asia")
sales_values = total_sales(filtered_rows)

# Calculate total sales for the region
print("Total South Asia Sales:", sum(sales_values))
Why is this better?
- Memory efficient: Loads and processes one row at a time.
- Composable: Steps are cleanly separated and reusable.
- Scalable: Works even with 50 GB+ files.
Why Generators Are Data Science Friendly
| Problem | Generator Benefit |
|---|---|
| Large CSV/JSON files | Avoids memory crashes |
| ETL pipelines | Composable, readable, stream-friendly |
| Lazy filtering and transformations | Only processes what’s needed |
| Real-time sensor/data ingestion | Processes data as it arrives |
Stack Overflow Survey – Count Python Developers from India
https://colab.research.google.com/drive/1nFqkBu1zb-QPZXGSnNGrgV00CoZjK0Cp?usp=sharing
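The notebook walks through that example end to end. Below is a minimal sketch of the same idea; the file name and column names ("Country", "LanguageHaveWorkedWith") are assumptions about the survey CSV, so adjust them to the actual dataset:

import csv

def survey_rows(file_path):
    with open(file_path, newline='', encoding='utf-8') as f:
        yield from csv.DictReader(f)  # one row dict at a time, never the whole file

count = sum(
    1 for row in survey_rows("survey_results_public.csv")
    if row.get("Country") == "India" and "Python" in row.get("LanguageHaveWorkedWith", "")
)
print("Python developers from India:", count)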
