Generator Functions

Python provides a tool called the generator function, which… well, it’s hard to describe everything it gives you in one sentence. Of its many talents, I’ll first focus on how it’s a very useful shortcut for creating iterators.

A generator function looks a lot like a regular function. But instead of saying return, it uses a new and different keyword: yield. Here’s a simple example:

Example
def gen_nums():
    n = 0
    while n < 4:
        yield n
        n += 1

Use it in a for loop like this:

Ex.
>>> for num in gen_nums():
...     print(num)
0
1
2
3

Let’s go through and understand this. When you call gen_nums() like a function, it immediately returns a generator object:

Ex.
>>> sequence = gen_nums()
>>> type(sequence)
<class 'generator'>

The generator function is gen_nums: what we define and then call. A function is a generator function if and only if its body contains a yield. (It may also contain a return statement, which simply ends the iteration.) The generator object is what that generator function returns when called: sequence, in this case. Calling a generator function always returns a generator object; it can’t return anything else. And this generator object is an iterator, which means you can iterate through it using next() or a for loop:

Ex.
>>> sequence = gen_nums()
>>> next(sequence)
0
>>> next(sequence)
1
>>> next(sequence)
2
>>> next(sequence)
3
>>> next(sequence)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Ex.
>>> # Or in a for loop:
... for num in gen_nums(): print(num)
...
0
1
2
3
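
One way to check the distinction between the generator function and the generator object is with the standard library's inspect module. The sketch below re-declares gen_nums so it runs on its own; the variable names are only for illustration.

```python
import inspect

def gen_nums():
    n = 0
    while n < 4:
        yield n
        n += 1

sequence = gen_nums()

# gen_nums is the generator function; sequence is the generator object.
is_gen_function = inspect.isgeneratorfunction(gen_nums)
is_gen_object = inspect.isgenerator(sequence)

# An iterator returns itself from iter(), and a generator object does too.
is_own_iterator = iter(sequence) is sequence

print(is_gen_function, is_gen_object, is_own_iterator)  # True True True
```

Swapping the arguments gives False in both cases: the function is not a generator object, and the object is not a generator function.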

The flow of code works like this: when next() is called the first time, or the for loop first starts, the body of gen_nums starts executing at the beginning. It runs until it reaches the yield, then hands back the value to the right of the yield.

So far, this is much like a regular function. But the next time next() is called – or, equivalently, the next time through the for loop – the function doesn’t start at the beginning again. It starts on the line after the yield statement. Look at the source of gen_nums() again:

Ex.
def gen_nums():
    n = 0
    while n < 4:
        yield n
        n += 1

gen_nums is more general than a function or subroutine. It’s actually a coroutine, in the classic computer-science sense of the word (not to be confused with the coroutines Python creates with async def). You see, a regular function can have several exit points (otherwise known as return statements). But it has only one entry point: each time you call a function, it always starts at the first line of the function body.

A coroutine is like a function, except it has several possible entry points. It starts with the first line, like a normal function. But when it “returns”, the coroutine isn’t exiting, so much as pausing. Subsequent calls with next() – or equivalently, the next time through the for loop – resume right where it left off: the re-entry point is the line after the yield statement.

And that’s the key: Each yield statement simultaneously defines an exit point, and a re-entry point.

For generator objects, each time a new value is requested, the flow of control picks up on the line after the yield statement. In this case, the next line increments the variable n, then continues with the while loop.
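
To watch the pause and resume happen, here is a minimal sketch that adds logging to the same loop. The name gen_nums_traced and the log list are hypothetical, introduced only for this illustration.

```python
def gen_nums_traced(log):
    # Same logic as gen_nums, plus log entries marking each pause and resume.
    n = 0
    while n < 4:
        log.append(f"pausing at yield with n={n}")
        yield n
        log.append(f"resumed after yield, n={n}")
        n += 1

log = []
sequence = gen_nums_traced(log)

# Calling the generator function has not run any of the body yet.
entries_before = len(log)  # 0

first = next(sequence)   # runs from the top until the first yield
second = next(sequence)  # resumes after the yield, loops, pauses again

for entry in log:
    print(entry)
# pausing at yield with n=0
# resumed after yield, n=0
# pausing at yield with n=1
```

The second call to next() produces the "resumed" entry before the next "pausing" entry, which is the re-entry point in action.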

Notice we do not raise StopIteration anywhere in the body of gen_nums(). When the function body finally exits – after it exits the while loop, in this case – the generator object automatically raises StopIteration.
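
It may help to see roughly what a for loop does with that StopIteration. The sketch below spells out the loop by hand; it is an approximation of the iteration protocol, not the interpreter's actual implementation, and the name collected is just for illustration.

```python
def gen_nums():
    n = 0
    while n < 4:
        yield n
        n += 1

# Roughly what "for num in gen_nums(): ..." does behind the scenes.
iterator = iter(gen_nums())
collected = []
while True:
    try:
        num = next(iterator)
    except StopIteration:
        # Raised automatically once the body of gen_nums finishes.
        break
    collected.append(num)

print(collected)  # [0, 1, 2, 3]
```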

Again: each yield statement simultaneously defines an exit point, and a re-entry point. In fact, you can have multiple yield statements in a generator:

Ex.
def gen_extra_nums():
    n = 0
    while n < 4:
        yield n
        n += 1
    yield 42 # Second yield

Here’s the output when you use it:

Ex.
>>> for num in gen_extra_nums():
...     print(num)
0
1
2
3
42

The second yield is reached after the while loop exits. When the function reaches the implicit return at the end, the iteration stops. Reason through the code above, and convince yourself it makes sense.

Let’s revisit the earlier example, of cycling through a sequence of squares. This is how we first did it:

Ex.
def fetch_squares(max_root):
    squares = []
    for n in range(max_root):
        squares.append(n**2)
    return squares

MAX = 5
for square in fetch_squares(MAX):
    do_something_with(square)

As an exercise, pause here, open up a new Python file, and see if you can write a gen_squares generator function that accomplishes the same thing.

Done? Great. Here’s what it looks like:

Ex.
>>> def gen_squares(max_num):
...     for num in range(max_num):
...         yield num ** 2
...
>>> MAX = 5
>>> for square in gen_squares(MAX):
...     print(square)
0
1
4
9
16

Now, this gen_squares has a problem in Python 2, but not Python 3. Can you spot it?

Here it is: in Python 3, range returns a lazy range object that produces its values on demand, but in Python 2 it returns a list. If MAX is huge, that creates a huge list inside, killing scalability. So if you are using Python 2, your gen_squares needs to use xrange instead, which acts much like Python 3’s range.

The larger point here affects all versions of Python. Generator functions potentially have a small memory footprint, but only if you code intelligently. When writing generator functions, be watchful for hidden bottlenecks.
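
As an illustration of such a hidden bottleneck, the sketch below contrasts the lazy gen_squares with a hypothetical eager variant, gen_squares_eager, which is invented here as an anti-pattern. It assumes Python 3 and uses itertools.islice to take only the first few values.

```python
from itertools import islice

def gen_squares(max_num):
    # Lazy: each square is computed only when it is requested.
    for num in range(max_num):
        yield num ** 2

def gen_squares_eager(max_num):
    # Hypothetical anti-pattern: builds every value before yielding any,
    # so memory use grows with max_num even though it is a generator.
    squares = [num ** 2 for num in range(max_num)]
    for square in squares:
        yield square

# Python 3 assumed: range is lazy, so only three squares are ever computed.
first_three = list(islice(gen_squares(10 ** 12), 3))
print(first_three)  # [0, 1, 4]
```

Passing the same huge argument to gen_squares_eager would try to build the entire list on the first next() call, which is exactly the kind of problem to watch for.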

Now, strictly speaking, we don’t need generator functions for iteration. We just want them, because they make certain patterns of scalability far easier. Now that we’re in a position to understand it, let’s look at the SquaresIterator class again:

Ex.
# Same code we saw earlier.
class SquaresIterator:
    def __init__(self, max_root_value):
        self.max_root_value = max_root_value
        self.current_root_value = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.current_root_value >= self.max_root_value:
            raise StopIteration
        square_value = self.current_root_value ** 2
        self.current_root_value += 1
        return square_value

# You can use it like this:
for square in SquaresIterator(5):
    print(square)

Each value is obtained by invoking the iterator’s __next__ method, until it raises StopIteration. This produces the same output; but look at the source code for the SquaresIterator class, and compare it to the source for the generator function above. Which is easier to read? Which is easier to maintain? And when requirements change, which is easier to modify without introducing errors? Most people find the generator solution easier and more natural.

Authors often use the word “generator” by itself, to mean either the generator function, or the generator object returned when you call it. Typically the writer thinks it’s obvious by the context which they are referring to; sometimes it is, sometimes not. Sometimes the writer is not even clear on the distinction to begin with. But it’s important: just as there is a big difference between a function, and the value it returns when you call it, so is there a big difference between the generator function, and the generator object it returns.
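
A short sketch can make the difference tangible: the generator function can be called as often as you like, while each generator object it returns keeps its own position and can be consumed only once. The variable names below are only for illustration.

```python
def gen_nums():
    n = 0
    while n < 4:
        yield n
        n += 1

first = gen_nums()
second = gen_nums()

# Each call returns a new, independent generator object.
a = next(first)   # 0
b = next(first)   # 1
c = next(second)  # 0; second keeps its own position

# A generator object can be consumed only once.
rest = list(first)   # [2, 3]
again = list(first)  # []; already exhausted

# The generator function is reusable: call it again for a fresh object.
fresh = list(gen_nums())  # [0, 1, 2, 3]
```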

In your own thought and speech, I encourage you to only use the phrases “generator function” and “generator object”, so you are always clear in your own mind, and in your communication. (Which also helps your teammates be more clear.) The only exception: when you truly mean “generator functions and objects”, lumping them together, then it’s okay to just say “generators”. I’ll lead by example in this book.