Today we're going to take a closer look at collections in Python, and two special features (iterators and generators) that let you create your own collection-like objects with special properties.
1. The Collection Protocol (again)
You probably recall from an earlier lab exercise that there are some special methods you can implement to emulate the behaviour of sequences. Here they are once more:
len(x) can be implemented by x.__len__() x[i] can be implemented by x.__getitem__(i) x[i] = y can be implemented by x.__setitem__(i, y) del x[i] can be implemented by x.__delitem__(i) y in x can be implemented by x.__contains__(y)
This time, i'll actually ask you to give this a whirl.
Have a look at the following, which is just the beginning of a class whose instances represent a collection of files in a directory, using dictionary-like behaviour.
import os class DictDir: def __init__(self, path): self.path = path if not os.path.exists(path): os.makedirs(path) # see help(os.makedirs) def keys(self): return os.listdir(self.path) ...
DictDir instance should be keyed on the names of
files in its directory; getting or setting the value associated
with a key should get or set the contents of the file.
For example, if the directory
contains a text file named
and the file contains the string “Barcelona”,
we would want the following behaviour:
>>> dd = DictDir('/tmp/spam') >>> dd['manuel'] 'Barcelona' >>> 'basil' in dd 0 >>> dd['basil'] = 'Fawlty' >>> 'basil' in dd 1 >>> dd['basil'] 'Fawlty' >>> del dd['manuel'] >>> dd.keys() ['basil'] >>>
(And at this point,
/tmp/spam/manuel would be gone,
and there would be a new text file at
/tmp/spam/basil containing the string
Q1. Fill in the
by implementing the methods of the collection protocol.
Tip: Instead of manipulating strings yourself to
assemble the directory name and filename into a path,
use the handy
which works on a variety of platforms.
2. The Default Iteration Protocol
What happens when you try to loop over a
If your implementation is anything like mine, you'll probably get an error.
>>> dd = DictDir('/tmp/spam') >>> for filename in dd: ... print filename ... Traceback (most recent call last): File "<stdin>", line 2, in ? File "dictdir.py", line 13, in __getitem__ path = os.path.join(self.path, name) File "/usr/lib/python2.2/posixpath.py", line 48, in join if b[:1] == '/': TypeError: unsubscriptable object >>>
What's happening here?
What is causing this error?
Tip: Try adding
Check back at the bottom of Explore 6
for a description of what the
for loop is doing here.
For situations like this,
Python provides a special iterator protocol
to give you more control over how
operate on your objects.
The iterator protocol also affects iteration performed
by other built-in functions,
In Python, an iterator is an object whose job is to obtain the next item from a collection of items. Iterators pretty much have to be mutable, because every time an iterator returns an item, it has to move forward to the next item. And iterators are usually a separate object from the collection being iterated over, since you usually want to be able to iterate over a collection without destroying it.
When Python starts a loop over an object,
it looks for a method named
on the object. It calls this method to obtain
an iterator. Then, each time through the loop,
it calls the
next() method on the
iterator to get the next item.
Iteration stops when the iterator raises a
In order to be a proper iterator,
an object must implement just two methods:
which should just return
(since an iterator is its own iterator),
Here's an example of an iterator that would work
for the above
class ListIterator: def __init__(self, items): self.items = items[:] # make a copy of the list self.index = -1 def __iter__(self): return self def next(self): self.index += 1 if self.index < len(self.items): return self.items[self.index] else: raise StopIteration
DictDirobject like this:
class DictDir: ... def __iter__(self): return ListIterator(self.keys()) ...
for filename in ddwhere
DictDirobject, and thereby loop over the filenames in the directory.
I showed you
so you would see an example of how iterators are implemented.
But note that in real life,
is actually unnecessary, because it implements exactly
the normal iteration behaviour of a list.
Python has a built-in function
that already implements the default iteration behaviour.
So it would also work to add just this to
class DictDir: ... def __iter__(self): return iter(self.keys()[:]) ...
In general, there are two ways to use any iterator.
- Loop over it:
for item in iterator: ....
- Call its
while blah: ... iterator.next() ...
The first way is simpler; the second way gives you more control. You might want to use the second way if you have to go off and do something else before coming back to deal with the next item.
Okay. Now here's a little iterator exercise.
Recall that the built-in
produces a list of numbers, which you can then iterate over.
But this can be wasteful of time and space;
if you write
for i in range(1000000): print i,
Python will actually construct a list with a million numbers
in it before starting the loop.
Iterators are one way to improve the situation.
If you have an object that stores only the necessary
information about the range
range() takes three arguments:
start, stop, and step), it can easily compute the
numbers one by one as they are needed,
instead of wasting a lot of space on the entire list.
Here's a start.
class Range: def __init__(self, start, stop, step): self.start, self.stop, self.step = start, stop, step def __iter__(self): ... class RangeIterator: def __init__(self, ...): ... def __iter__(self): return self def next(self): ...
Fill in the parts marked with
in the above example,
for i in Range(x, y, z)
has the same effect as
for i in range(x, y, z),
only without ever constructing the entire list of numbers.
Q4. Write an iterator that takes any sequence or iterator as an argument, and produces every other item of the given sequence (starting with the first item, then the third, and so on).
Sometimes when you're trying to write an iterator,
it isn't practical to design a
that can return after each item.
There may be too much going on to store and
remember in the iterator object.
So Python recently gained the ability to do something truly remarkable: it can suspend functions in the middle of execution, and resume them later. This really changes what you can do with an iterator. It also means you can have multiple suspended functions waiting around, and you can pick which one you want to resume.
This ability is triggered by the
It is aptly named, for it both yields a value to
its caller, and yields control back to its caller.
A function which generates values in this way is called
Generators are defined using
def just like
normal functions, but if the keyword
appears anywhere in the function, you get a generator
object instead of a normal function.
And generator objects can be used as iterators.
For example, here is a generator that produces the sequence 1, -1, 2, -2, 3, -3, 4, -4, 5, -5.
def alternate(): for i in range(1, 6): yield i yield -i
Writing the same thing without generators would be much more tedious, in part because the class now has to remember whether the next number should be positive or negative:
class alternate(): def __init__(self): self.i = 1 self.negative = 0 def __iter__(self): return self def next(self): if self.negative: item = -self.i self.i += 1 else: if self.i >= 6: raise StopIteration item = self.i self.negative = not self.negative return item
The generator gains a lot of simplicity because
the interpreter remembers where it was in the generator
each time the generator pauses; it remembers that it
was inside a loop, and it remembers which
statement it was on. In the non-generator version,
both of these things have to be explicitly stored as variables.
Note that generators are such a new feature that they are enabled by default only in Python 2.3. To use generators in Python 2.2, you need to say
from __future__ import generators
Q5. Write an generator that takes any sequence or iterator as an argument, and produces every other item of the given sequence (starting with the first item, then the third, and so on).
Generators are particularly powerful when you want to do iteration over a hierarchy instead of a list, since generators can call other generators recursively.
Q6. Write a generator that returns the list of all files in a subdirectory tree.
Here's a start:
import os def allfiles(directory): for filename in os.listdir(directory): path = os.path.join(directory, filename) if os.path.isdir(path): ... if os.path.isfile(path): ...
Fill in the parts marked with
There's no assignment this week. Please spend the time working out your project's design and completing the test suite, which will be due on April 7. On April 7, the following are due:
- Outline of the modules and classes in your project.
- Documentation explaining what each function and method is supposed to do. (It doesn't have to be a lot. But write enough so that I can see how all the pieces work and fit together.)
- A test suite (in
unitteststyle) containing tests for any functions and methods that can be tested independently of the whole project. (You don't have to write tests for things that can only be tested by a user. But try to separate as much functionality as possible out of the user interface and into modules.) There should be one test file corresponding to each module (if the module is called
spam.py, put the tests in