Iteration in Python

This for loop seems simple:

Ex.
for item in items:
    do_something_with(item)

And yet, miracles hide here. As you probably know, the act of efficiently going through a collection, one element at a time, is called iteration. But few understand how Python’s iteration system really works… how deep and well-thought-out it is. This post makes you one of those people, giving you the ability to naturally write highly scalable Python applications… able to handle ever-larger data sets in performant, memory-efficient ways.

Iteration is also core to one of Python’s most powerful tools: the generator function. Generator functions are not just a convenient way to create useful iterators. They enable some exquisite patterns of code organization, in a way that – by their very nature – intrinsically encourages excellent coding habits.

This is special, because understanding it threatens to make you a permanently better programmer in every language. Mastering Python generators tends to do that, because of the distinctions and insights you gain along the way. Let’s dive in.

Python has a built-in function called iter(). When you pass it a collection, you get back an iterator object:

Ex.
>>> numbers = [7, 4, 11, 3]
>>> iter(numbers)
<list_iterator object at 0x101a9c400>

Just as in other languages, a Python iterator produces the values in a sequence, one at a time. You probably know an iterator is like a moving pointer over the collection:

Ex.
>>> numbers_iter = iter(numbers)
>>> for num in numbers_iter: print(num)
7
4
11
3
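
By the way, this “moving pointer” is one-way: once the iterator reaches the end, it stays exhausted. For example, pour whatever is left into list() and you get nothing back:

Ex.
>>> # numbers_iter was fully consumed by the loop above.
>>> list(numbers_iter)
[]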

You don’t normally need to call iter() yourself, though. If you instead write for num in numbers, what Python effectively does under the hood is call iter() on that collection. This happens automatically. Whatever object it gets back is used as the iterator for that for loop:

Ex.
# This ...
for num in numbers:
    print(num)

# ... is effectively just like this:
numbers_iter = iter(numbers)
for num in numbers_iter:
    print(num)

An iterator over a collection is a separate object, with its own identity – which you can verify with id():

Ex.
>>> # id() returns a unique number for each object.
... # Different objects will always have different IDs.
>>> id(numbers)
4330133896
>>> id(numbers_iter)
4330216640

How does iter() actually get the iterator? It can do this in several ways, but one relies on a magic method called __iter__. This is a method any class (including yours) may define; when called with no arguments, it must return a fresh iterator object. Lists have it, for example:

Ex.
>>> numbers.__iter__
<method-wrapper '__iter__' of list object at 0x10130e4c8>
>>> numbers.__iter__()
<list_iterator object at 0x10130e588>

Python makes a distinction between objects which are iterators, and objects which are iterable. We say an object is iterable if and only if you can pass it to iter(), and get back a ready-to-use iterator. If that object has an __iter__ method, iter() will call it to get the iterator. Python lists and tuples are iterable. So are strings, which is why you can write for char in my_str: to iterate over my_str’s characters. Any container you might use in a for loop is iterable.
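
To see how easily your own class joins this club, here is a minimal sketch (the Bookshelf class is made up for illustration). It becomes iterable simply by defining __iter__ and returning an iterator; here, it delegates to the iterator of an internal list:

Ex.
class Bookshelf:
    """A hypothetical container, iterable because it defines __iter__."""
    def __init__(self, titles):
        self.titles = titles
    def __iter__(self):
        # Delegate: iter() on the internal list returns a fresh iterator.
        return iter(self.titles)

# Because Bookshelf is iterable, iter() and for loops just work:
for title in Bookshelf(["Dune", "Emma", "Ulysses"]):
    print(title)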

A for loop is the most common way to step through a sequence. But sometimes your code needs to step through in a more fine-grained way. For this, use the built-in function next(). You normally call it with a single argument, which is an iterator. Each time you call it, next(my_iterator) fetches and returns the next element:

Ex.
>>> names = ["Tom", "Shelly", "Garth"]
>>> # Create a fresh iterator ...
... names_it = iter(names)
>>> next(names_it)
'Tom'
>>> next(names_it)
'Shelly'
>>> next(names_it)
'Garth'

What happens if you call next(names_it) again? next() will raise a special built-in exception, called StopIteration:

Ex.
>>> next(names_it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

This is part of Python’s iterator protocol. Raising this specific exception is, by design, how an iterator signals the sequence is done. You rarely have to raise or catch this exception yourself, though we’ll see some patterns later where it’s useful to do so. A good mental model for how a for loop works is to imagine it calling next() each time through the loop, exiting when StopIteration gets raised.
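
To make that mental model concrete, here is a rough sketch (not meant as CPython’s literal implementation) of what the for loop at the top of this post effectively does, spelled out with iter(), next(), and StopIteration:

Ex.
# Roughly equivalent to: for item in items: do_something_with(item)
items_iter = iter(items)
while True:
    try:
        item = next(items_iter)   # Fetch the next element.
    except StopIteration:         # The iterator signals the sequence is done.
        break
    do_something_with(item)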

When using next() yourself, you can provide a second argument: a default value. If you do, next() will return that default instead of raising StopIteration at the end:

Ex.
>>> names = ["Tom", "Shelly", "Garth"]
>>> new_names_it = iter(names)
>>> next(new_names_it, "Rick")
'Tom'
>>> next(new_names_it, "Rick")
'Shelly'
>>> next(new_names_it, "Rick")
'Garth'
>>> next(new_names_it, "Rick")
'Rick'
>>> next(new_names_it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(new_names_it, "Jane")
'Jane'

Now, let’s consider a different situation. What if you aren’t working with a simple sequence of numbers or strings, but something more complex? What if you are calculating or reading or otherwise obtaining the sequence elements as you go along? Let’s start with a simple example (so it’s easy to reason about). Suppose you need to write a function creating a list of square numbers, which will be processed by other code:

Ex.
def fetch_squares(max_root):
    squares = []
    for n in range(max_root):
        squares.append(n**2)
    return squares

MAX = 5
for square in fetch_squares(MAX):
    do_something_with(square)

This works. But there is a potential problem lurking here. Can you spot it?

Here’s one: what if MAX is not 5, but 10,000,000? Or 10,000,000,000? Or more? Your memory footprint is pointlessly dreadful: the code here creates a massive list, uses it once, then throws it away. On top of that, the second for loop cannot even start until the entire list of squares has been fully calculated. If some poor human is using this program, they’ll wonder if it’s stuck.

Even worse: what if you aren’t doing arithmetic to get each element – which is fast and cheap – but making a truly expensive calculation? Or making an API call over the network? Or reading from a database? Your program becomes sluggish, even unresponsive, and might crash with an out-of-memory error. Its users will think you’re a terrible programmer.

The solution is to create an iterator to start with, lazily computing each value only when needed. Then each cycle through the loop happens just in time.

For the record, here is how you create an equivalent iterator class, which fully complies with Python’s iterator protocol:

Ex.
class SquaresIterator:
    def __init__(self, max_root_value):
        self.max_root_value = max_root_value
        self.current_root_value = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.current_root_value >= self.max_root_value:
            raise StopIteration
        square_value = self.current_root_value ** 2
        self.current_root_value += 1
        return square_value

# You can use it like this:
for square in SquaresIterator(5):
    print(square)
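
And because SquaresIterator follows the full protocol, you can also step through it by hand with next(), exactly like the list iterators earlier. A quick check, shown here with a small maximum of 3:

Ex.
>>> squares_it = SquaresIterator(3)
>>> next(squares_it)
0
>>> next(squares_it)
1
>>> next(squares_it)
4
>>> next(squares_it, "done")
'done'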

Holy crap, that’s horrible. There’s got to be a better way.

Good news: there’s a better way. It’s called a generator function, and you’re going to love it!