Beautiful Code, Spring 2003: Explore 6

For a long time, six has been my favourite small number. It's the first perfect number (it equals the sum of its proper divisors: 1, 2, and 3). I've always felt there was something magical about six.

This week we'll delve into a bit of the magic behind the object-oriented syntax you've already been using. Because objects are everywhere in Python, you've been using objects and calling methods in your programs. Now you'll create your own kinds of objects.

1. Shared State

Procedural programming consists of defining behaviours. This is what you have been doing so far by writing your own functions. Each function performs some operation, but operates in its own local namespace, apart from all other functions. Furthermore, each call to the function sees a fresh, new local namespace, unaffected by any other calls to the same function. The only way that functions can see things in common is to use outer namespaces, such as the global namespace of a module.

>>> x = 0
>>> def next():
...     global x
...     x = x + 1
...     return x
... 
>>> next()
1
>>> next()
2
>>> next()
3
>>>

The above is an example of using the global keyword to allow reassignment of a global variable. You need the global keyword for reassignment, but you don't need it just to refer to a global variable. When you refer to a global variable, you can still cause visible outside effects using mutation, if the global variable has a mutable value.

>>> y = [1, 2, 3]
>>> def more():
...     y.append(y[-1] + 1)
... 
>>> more()
>>> y
[1, 2, 3, 4]
>>> more()
>>> more()
>>> more()
>>> y
[1, 2, 3, 4, 5, 6, 7]
>>>

In this example, the function more() is not changing what the variable y refers to. (The function can't do that, because y is global and there is no global y declaration.) The function is using y to refer to the list, and telling the list: "please change yourself".
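
To see why the global declaration matters for reassignment, here is a small sketch of what goes wrong without it. The function name broken_next is made up for this illustration, and the code is written as a script rather than an interpreter session (print is given parentheses so that it also runs on newer Python versions).

```python
x = 0

def broken_next():
    # There is no "global x" here, so the assignment below makes x a
    # local variable. Reading it before it has a value is an error.
    x = x + 1
    return x

try:
    broken_next()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

print(outcome)   # UnboundLocalError
print(x)         # 0 (the global x was never touched)
```

The mutation in more() avoids this problem because it never assigns to the name y; it only asks the existing list to change.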

A function can also maintain external state in a non-global scope. You can create other enclosing scopes, besides the global scope. This isn't done very often in most Python programs, but you should understand it.

>>> def fountain():
...     numbers = range(5)
...     def pick():
...         number = numbers.pop(0)
...         numbers.append(number)
...         return number
...     return pick
... 
>>> getnumber = fountain()
>>> getnumber()
0
>>> getnumber()
1
>>> getnumber()
2
>>> getnumber()
3
>>> getnumber()
4
>>> getnumber()
0
>>>

The above example defines a function within an inner scope, then returns the inner function. The inner scope is created when we call the outer function. Because we only called fountain() once, there is only one variable named numbers, and every call to getnumber() affects the same variable.

Q1. Suppose the line numbers = range(5) were moved one line down, inside the pick() function. How could you exhibit a change in the behaviour of fountain()?

Q2. Suppose the line numbers = range(5) were moved one line up, outside the fountain() function. How could you exhibit a change in the behaviour of fountain()?

If you have any difficulty understanding how to answer these two questions, please ask me for an explanation before proceeding.

2. Object Concepts

In object-oriented programming, the behaviour and state are bound together. An object is an encapsulation of behaviour and data. The behaviour is specified for entire classes of objects at once, whereas the data has to do with each individual object. For example, the upper() method is defined to mean the same thing for all strings; you do get different results, though, depending on what's in the string.

Although there are a few ways of binding a function together with a scope, as we just saw, Python has a specific convention for defining objects. The standard way is more convenient because it lets you bundle together potentially many functions and many pieces of data. To describe the behaviour of a class of objects, you define a class; the class contains methods that give the objects their behaviour. An object that belongs to a particular class is usually called an instance of the class.

Here is how you would write the first example, above, using a class instead of a global variable. There's a bunch of new stuff here, so it may look puzzling. It will all be explained soon; I just wanted you to see an early example so you have some idea what we're talking about.

>>> class Counter:
...     def __init__(self):
...         self.x = 0
...     def next(self):
...         self.x = self.x + 1
...         return self.x
... 
>>> c = Counter()
>>> c.next()
1
>>> c.next()
2
>>> c.next()
3
>>>

Calling the class by saying Counter() produces a new instance of the Counter class with the specified behaviour. Then we can call methods on the instance object.

Admittedly, the class definition is a little more verbose than the version with the global variable. But it's also a great deal more flexible. One major advantage is that we can now easily create many independent counters.

>>> d = Counter()
>>> e = Counter()
>>> d.next()
1
>>> d.next()
2
>>> c.next()
4
>>> e.next()
1
>>> d.next()
3
>>> e.next()
2
>>>

Before we get to the details of defining your own classes, we'll talk a bit about how Python's object system works.

3. Attributes

As you know, you can use the dot to retrieve an attribute of something, such as a variable from a module's namespace.

>>> import math
>>> math.sin
<built-in function sin>
>>>

You can use the dot to retrieve attributes from an instance object in the same way.

>>> c = Counter()
>>> 
>>> c.next()
1
>>> c.next()
2
>>> c.x
2
>>>

You can also assign to attributes of things. For example, you can stash anything you want into the namespace of a Counter instance, or into the namespace of a module.

>>> c.x
2
>>> c.x = 8
>>> c.next()
9
>>> c.blargh = 29
>>> print c.blargh
29
>>> c.ziggy
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: Counter instance has no attribute 'ziggy'
>>> 
>>> math.blargh = 42
>>> print math.blargh
42
>>> math.ziggy
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'ziggy'
>>>

However, you can't assign to attributes of most built-in types, because they're read-only.

>>> n = 5
>>> n.a = 6
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'int' object has only read-only attributes (assign to .a)
>>>
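
One way to picture the namespace of an instance is as a dictionary mapping attribute names to values. The sketch below peeks at that dictionary with the built-in vars() function; it is only meant as an illustration of the idea, and it is written as a script (with parentheses on print) so that it also runs on newer Python versions.

```python
class Counter:
    def __init__(self):
        self.x = 0
    def next(self):
        self.x = self.x + 1
        return self.x

c = Counter()
before = sorted(vars(c).items())
c.next()
c.blargh = 29                      # stash an extra attribute
after = sorted(vars(c).items())

print(before)                      # [('x', 0)]
print(after)                       # [('blargh', 29), ('x', 1)]
print(hasattr(c, 'ziggy'))         # False
```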

4. Bound Methods

You also use the dot to call a method on an object. Let's take a closer look at how method calls work.

You already know that you can take a string and call methods on it to produce results:

>>> refrain = 'And a partridge in a pear tree.'
>>> refrain.upper()
'AND A PARTRIDGE IN A PEAR TREE.'
>>> refrain.title()
'And A Partridge In A Pear Tree.'
>>>

But you can also walk away with that method without calling it, just like you can manipulate functions without calling them:

>>> method = refrain.upper
>>> method
<built-in method upper of str object at 0x8150e28>
>>> method()
'AND A PARTRIDGE IN A PEAR TREE.'
>>>

So, when you call a method as in refrain.upper(), the effect is to first retrieve the attribute named "upper" from the variable refrain, and then to take that retrieved result and call it like a function. The retrieved method retains knowledge of the object it's associated with; we call it a bound method because it's bound to the object. The representation that's displayed for the bound method, above, hints at this association:

>>> method
<built-in method upper of str object at 0x8150e28>
>>> hex(id(refrain))
'0x8150e28'

(The hex() function simply displays a number in hexadecimal; here the number is the object's identifier, as returned by id().) You can see here that the method we retrieved above remembers that it's associated with object 0x8150e28, which is the string in the refrain variable.
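
The association can also be checked directly rather than by comparing addresses. This is a small sketch that assumes the bound method exposes its object through an attribute named __self__, which is the spelling used by current Python versions.

```python
refrain = 'And a partridge in a pear tree.'
method = refrain.upper

# The bound method carries a reference to the object it came from.
print(method.__self__ is refrain)        # True

# Calling it later needs no arguments; the string is already attached.
print(method())                          # AND A PARTRIDGE IN A PEAR TREE.
```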

Just like other functions, of course, methods can mutate things. Some methods mutate their objects and some don't.

>>> numbers = [1, 2, 3]
>>> foo = numbers.append
>>> foo
<built-in method append of list object at 0x814fa4c>
>>> foo(5)
>>> foo(6)
>>> numbers
[1, 2, 3, 5, 6]
>>>

Q3. Suppose that stuff = [1, 3, 5, 2, 7, 6, 8, 0] and indices = [1, 2, 4, 0]. Without the help of the computer, figure out the result of map(stuff.pop, indices). Then call me over and tell me your answer.

There are many similarities between module lookups and instance lookups. In both cases, a dot is used to retrieve an attribute. When you say math.sin, you retrieve the function referenced by the sin variable within the math module. Functions retain their knowledge of the global namespace in which they were defined, which is how the next() function in the very first example remembers the next number to return. When you say refrain.upper you retrieve the value referenced by the upper attribute of the refrain object, and the method retains its association to the object.

The difference is that, once you define a module named math, there can only be one module named math. With a class, you can create as many objects as you like, all with the same behaviour but carrying different data.

5. Defining Classes

Now let's go back and take apart that example. Here it is again:

class Counter:
    def __init__(self):
        self.x = 0
    def next(self):
        self.x = self.x + 1
        return self.x

The first line says that we're defining a class called Counter. By convention, class names begin with a capital letter, and the names usually consist of CapitalizedWordsSmushedTogether. Within this class we've defined two methods: __init__ and next.

__init__ is a special method that gets run immediately when an object is first created. We call this method the constructor. In this case, it takes the place of the line x = 0 that used to be outside the function, setting up the value of x. There are many other special methods, but we'll get to those later. The convention is that any special method has a name beginning with two underscores and ending with two underscores, and by "special" we mean that the method is called automatically: you don't name the __init__ method in an explicit call, but it gets called anyway.

Both methods have an argument called self. And instead of manipulating x, we manipulate self.x. That's because we're working with a variable inside the object: we call this an instance variable. We say self to identify the object namespace in which the variable x resides. All the methods are passed the object instance as the first argument. By convention, the first argument to any method is always called self (since it refers to the object itself).

In other languages you may have encountered, the object may have a different special name. For example, in C++ and Java the object is called this. Also, in other languages, you don't have to identify the object namespace. In C++ and Java, you can just name variables inside the object, such as x in this example, and the compiler will guess that you are talking about an instance variable. The compiler can only do this because in C++ and Java, classes must declare all of their instance variables. In Python, you must explicitly refer to instance variables by putting self. in front of the variable name.

When we call Counter() to create an instance, the instance object begins with an empty namespace. The __init__ method is called on the new instance, where the instance is passed as the first argument, and the arguments to Counter() become the rest of the arguments. (There are no arguments to Counter() here, so __init__ just gets one argument.) Then __init__ sets the instance variable self.x to zero.

When we later call the next() method, the extra argument is again inserted in front. We call the method with no arguments, but the method gets the instance as the first argument. The method then increments and returns self.x.

The extra argument self is inserted in front of the argument list for any method call on an instance. This is quite apparent in some error messages; be careful not to let it confuse you:

>>> c = Counter()
>>> c.next(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: next() takes exactly 1 argument (2 given)
>>>

The complaint here is that next() was expecting just one argument (self), but it got two arguments (self and 5).
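
The same rule applies to the constructor: whatever you pass when calling the class arrives in __init__ after self. Here is a minimal sketch using a made-up class called Greeter (not part of the exercises), written as a script so that it also runs on newer Python versions. The exact wording of the error message varies between versions, so only the kind of error is shown.

```python
class Greeter:
    def __init__(self, name, punctuation):
        # self is the new, empty instance; name and punctuation are
        # the arguments that were given in the call to Greeter(...).
        self.name = name
        self.punctuation = punctuation

    def greet(self):
        return 'Hello, ' + self.name + self.punctuation

g = Greeter('world', '!')
print(g.greet())                 # Hello, world!

try:
    Greeter('world')             # punctuation is missing
    complaint = None
except TypeError:
    # __init__ wanted three arguments (self, name, punctuation)
    # but received only two (self and 'world').
    complaint = 'TypeError'

print(complaint)                 # TypeError
```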

6. Unbound Methods

Classes are objects too, and they also have attributes. In particular, the methods are attributes of the class.

>>> c
<__main__.Counter instance at 0x815891c>
>>> c.next
<bound method Counter.next of <__main__.Counter instance at 0x815891c>>
>>> Counter.next
<unbound method Counter.next>
>>>

Counter.next is an unbound method. It's the method as you really defined it, with one argument, and it's not associated with any particular instance.

When you retrieve the next() method through an instance, as when you say c.next, the unbound method is transformed into a bound method. Now you understand what this "binding" does: it merely holds on to the instance, and inserts the instance as the first argument when making the call.

The following two calls have exactly the same effect:

Counter.next(c)
c.next()
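
Here is that equivalence as a runnable sketch. It also saves a bound method in a variable to show that the binding happens when the attribute is retrieved, not when the call is made. The __self__ attribute used at the end is the spelling in current Python versions and is an assumption about the interpreter you are running.

```python
class Counter:
    def __init__(self):
        self.x = 0
    def next(self):
        self.x = self.x + 1
        return self.x

c = Counter()
first = Counter.next(c)     # instance passed explicitly
second = c.next()           # instance inserted by the bound method
bound = c.next              # binding happens here, at retrieval
third = bound()             # ...so no argument is needed here

print([first, second, third])    # [1, 2, 3]
print(bound.__self__ is c)       # True
```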

For the next few questions, have a look at this slightly enhanced version of the Counter class:

class FancyCounter:
    def __init__(self, start):
        self.x = start

    def next(self, inc=1):
        self.x = self.x + inc
        return self.x

    def set(self, value):
        self.x = value

Q4. Adjust the class definition so we can construct a new counter starting at zero when we call FancyCounter() with no arguments.

Q5. Add another argument to the constructor that lets us set the default increment when we create a new instance. This default increment would be used if no increment is specified when we call next(). The default increment should be 1 if we construct an instance of FancyCounter with no arguments. Here's the expected behaviour:

>>> a = FancyCounter()
>>> a.x
0
>>> a.next()
1
>>> a.next()
2
>>> a.next(3)
5
>>> a.next(3)
8
>>> b = FancyCounter(5, 7)
>>> b.next()
12
>>> b.next()
19
>>> b.next(1)
20
>>> b.next(1)
21
>>> b.next()
28
>>>

Q6. Adjust the set() method so it can be used to set the default increment as well as the value.

Q7. In your final FancyCounter class, what are the minimum and maximum number of arguments allowed for each of the following calls? Assume that c is an instance of FancyCounter.

FancyCounter(...)
FancyCounter.__init__(...)
FancyCounter.next(...)
FancyCounter.set(...)
c.__init__(...)
c.next(...)
c.set(...)

7. Common Special Methods

I mentioned earlier that there are many special method names. Here are two that are used particularly often.

As you know, when you enter an expression, the interpreter displays a representation of the result. The default representation for an instance object is pretty dull. It just shows the module name and class name of the instance, and the instance's unique identifier in hexadecimal:

>>> c = Counter()
>>> c
<__main__.Counter instance at 0x8156e54>
>>>

You can control this display by defining a __repr__ method that returns a string. Having a more meaningful representation can make debugging a lot easier. Remember that the convention is for the representation to show what you would have to type in to produce the value, if possible. So if your object is simple enough that you could create an identical object just by calling the constructor with some set of arguments, the representation could show exactly how to construct the object. For example:

class Counter:
    def __init__(self, start):
        self.x = start

    def __repr__(self):
        return 'Counter(%d)' % self.x

    def next(self, inc=1):
        self.x = self.x + inc
        return self.x

Only do this when you can regenerate the entire state of the object just from the constructor. For anything more complicated, you can just make a simplified explanation of what's in the object, and put that in angle brackets. (Angle brackets are the convention for representing something that can't be typed in.)
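
As an illustration of the angle-bracket convention, here is a sketch using a made-up class called Playlist. Its contents are built up through method calls rather than through the constructor, so the representation gives a summary instead of a constructor call.

```python
class Playlist:
    def __init__(self):
        self.songs = []

    def add(self, title):
        self.songs.append(title)

    def __repr__(self):
        # The songs can't be supplied through the constructor, so show
        # a short summary in angle brackets instead.
        return '<Playlist with %d songs>' % len(self.songs)

p = Playlist()
p.add('Greensleeves')
p.add('Scarborough Fair')
print(repr(p))      # <Playlist with 2 songs>
```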

Whenever an instance object x is converted to a string, the x.__str__() method is called. Conversion to a string can be caused by an explicit call to str(x), or by formatting, as in '%s' % x, or just by printing, as in print x. Defining a __str__ method lets you control this.

If you define __repr__ but not __str__, conversion to a string will use __repr__.
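
The following sketch compares the two cases side by side. The class names QuietCounter and TalkativeCounter are invented for this illustration; the first defines only __repr__, the second defines both methods.

```python
class QuietCounter:
    def __init__(self, start):
        self.x = start
    def __repr__(self):
        return 'QuietCounter(%d)' % self.x

class TalkativeCounter:
    def __init__(self, start):
        self.x = start
    def __repr__(self):
        return 'TalkativeCounter(%d)' % self.x
    def __str__(self):
        return 'a counter currently at %d' % self.x

q = QuietCounter(3)
t = TalkativeCounter(3)

print(str(q))       # QuietCounter(3), borrowed from __repr__
print(str(t))       # a counter currently at 3
print('%s' % t)     # a counter currently at 3
print(repr(t))      # TalkativeCounter(3)
```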

8. The Number Protocol

Some special methods can be used to implement your own behaviour for arithmetic operators. For a binary operator, the behaviour is determined by the left operand; if the left operand is an instance, a special method is called to determine the result.

Assuming that x is an instance of a class called XClass, the three expressions in each row are equivalent:

x + y        x.__add__(y)             XClass.__add__(x, y)
x - y        x.__sub__(y)             XClass.__sub__(x, y)

-x           x.__neg__()              XClass.__neg__(x)
abs(x)       x.__abs__()              XClass.__abs__(x)

x * y        x.__mul__(y)             XClass.__mul__(x, y)
x ** y       x.__pow__(y)             XClass.__pow__(x, y)

x / y        x.__div__(y)             XClass.__div__(x, y)
x % y        x.__mod__(y)             XClass.__mod__(x, y)
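
To make the table concrete, here is a small sketch of a made-up Money class that stores an amount in cents and implements a few of these methods. It assumes the right operand is always another Money instance, which a more careful version would check.

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __repr__(self):
        return 'Money(%d)' % self.cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

    def __sub__(self, other):
        return Money(self.cents - other.cents)

    def __neg__(self):
        return Money(-self.cents)

    def __abs__(self):
        return Money(abs(self.cents))

price = Money(250)
payment = Money(1000)

print(repr(price + payment))                  # Money(1250)
print(repr(price - payment))                  # Money(-750)
print(repr(-price))                           # Money(-250)
print(repr(abs(price - payment)))             # Money(750)
print(repr(Money.__add__(price, payment)))    # Money(1250)
```

Each method returns a new Money instance rather than changing an operand, which matches how arithmetic on ordinary numbers behaves.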

Q8. Extra for experts (or for bored people). Define a class for complex numbers.

9. The Comparison Protocol

The special method __cmp__ defines how your object compares to other objects. It should return a number less than, equal to, or greater than zero if the left operand is less than, equal to, or greater than the right operand. This affects the result of all the comparison operators, as well as the sorting behaviour when you call .sort() on a list containing instance objects.

Assuming that x is an instance of a class called XClass, the three expressions in each row are equivalent:

x < y        x.__cmp__(y) < 0        XClass.__cmp__(x, y) < 0
x > y        x.__cmp__(y) > 0        XClass.__cmp__(x, y) > 0
x <= y       x.__cmp__(y) <= 0       XClass.__cmp__(x, y) <= 0
x >= y       x.__cmp__(y) >= 0       XClass.__cmp__(x, y) >= 0
x == y       x.__cmp__(y) == 0       XClass.__cmp__(x, y) == 0
x != y       x.__cmp__(y) != 0       XClass.__cmp__(x, y) != 0

(In recent versions of Python, you can also define each of these operations separately. The corresponding methods are called __lt__, __gt__, __le__, __ge__, __eq__, and __ne__. Each one is expected to return a true or false value. If these are defined, __cmp__ is not used.)

The default comparison behaviour for instances is just to compare the identifiers. This means that they won't sort in any meaningful order, though it will be consistent. It also means that, unless you define a special comparison method, comparing two instances for equality is the same as comparing their identities.
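
Here is a sketch of instances that sort in a meaningful order. It uses the separate methods __eq__ and __lt__ mentioned in the parenthetical note above rather than __cmp__, because newer Python versions no longer consult __cmp__ at all. The Student class and its grades are invented for the example.

```python
class Student:
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade

    def __repr__(self):
        return 'Student(%r, %d)' % (self.name, self.grade)

    def __eq__(self, other):
        # Two students compare equal when their grades match.
        return self.grade == other.grade

    def __lt__(self, other):
        # Sorting relies on this "less than" test.
        return self.grade < other.grade

students = [Student('Ann', 90), Student('Bob', 72), Student('Cy', 85)]
students.sort()
names = [s.name for s in students]
print(names)                        # ['Bob', 'Cy', 'Ann']

dee = Student('Dee', 85)
print(dee == students[1])           # True: same grade as Cy
print(dee is students[1])           # False: still two distinct objects
```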

10. The Collection Protocol

You can also make your objects behave like sequences or dictionaries by responding to the usual operators.

Assuming that x is an instance of a class called XClass, the three expressions in each row are equivalent:

len(x)       x.__len__()             XClass.__len__(x)
x[i]         x.__getitem__(i)        XClass.__getitem__(x, i)
x[i] = y     x.__setitem__(i, y)     XClass.__setitem__(x, i, y)
del x[i]     x.__delitem__(i)        XClass.__delitem__(x, i)
y in x       x.__contains__(y)       XClass.__contains__(x, y)

This is how the CGI module produces a dictionary-like object containing the values of the form fields. If you want to support dictionary behaviour completely, you also need to implement get(), keys(), values(), and items().

Once you implement __getitem__(), it will be possible to use an instance of your class in a for loop (like for item in x). When a for loop executes over your instance, your instance will get a call to __getitem__(0), then __getitem__(1), then __getitem__(2), and so on until your __getitem__ method raises an IndexError.

Also, if you implement __getitem__ but not __contains__, Python will take care of the in operator for you. If someone asks y in x, it will have the effect of running a for loop over x, comparing y with each retrieved item.
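
The sketch below shows both effects with a made-up Countdown class that defines only __len__ and __getitem__. The for loop and the in operator work even though the class defines neither an iteration method nor __contains__.

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, i):
        # Raising IndexError is what tells a for loop to stop.
        if i < 0 or i >= self.start:
            raise IndexError('Countdown index out of range')
        return self.start - i

launch = Countdown(3)

items = []
for item in launch:         # calls __getitem__(0), (1), (2), (3)
    items.append(item)

print(items)                # [3, 2, 1]
print(len(launch))          # 3
print(launch[0])            # 3
print(2 in launch)          # True
print(7 in launch)          # False
```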

You don't need to memorize all these special method names; you can always look them up in help under the topic SPECIALMETHODS, or on the Web (a Google search for "Python special methods" does the trick). There are many others that haven't been mentioned here. All you really need to know is that it is possible to create an object that completely emulates the behaviour of just about anything in Python, if you find yourself wanting to do so.

Here's the sixth assignment.

If you have any questions about these exercises or the assignment, feel free to send me e-mail at bczestyca.