For a long time, six has been my favourite small number. It's the first perfect number (it equals the sum of its proper divisors: 1, 2, and 3). I've always felt there was something magical about six.
This week we'll delve into a bit of the magic behind the object-oriented syntax you've already been using. Because objects are everywhere in Python, you've been using objects and calling methods in your programs. Now you'll create your own kinds of objects.
1. Shared State
Procedural programming consists of defining behaviours. This is what you have been doing so far by writing your own functions. Each function performs some operation, but operates in its own local namespace, apart from all other functions. Furthermore, each call to the function sees a fresh, new local namespace, unaffected by any other calls to the same function. The only way that functions can see things in common is to use outer namespaces, such as the global namespace of a module.
    >>> x = 0
    >>> def next():
    ...     global x
    ...     x = x + 1
    ...     return x
    ...
    >>> next()
    1
    >>> next()
    2
    >>> next()
    3
    >>>
The above is an example of using the global keyword to allow reassignment of a global variable. You need the global keyword for reassignment, but you don't need it just to refer to a global variable. When you refer to a global variable, you can still cause visible outside effects using mutation, if the global variable has a mutable value.
    >>> y = [1, 2, 3]
    >>> def more():
    ...     y.append(y[-1] + 1)
    ...
    >>> more()
    >>> y
    [1, 2, 3, 4]
    >>> more()
    >>> more()
    >>> more()
    >>> y
    [1, 2, 3, 4, 5, 6, 7]
    >>>
In this example, the function more() is not changing what the variable y refers to. (The function can't do that, because y is global and there is no global y declaration.) The function is using y to refer to the list, and telling the list: "please change yourself".
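To see the difference between reassignment with and without the global declaration, here is a minimal sketch. The names total, shadow, and rebind are made up for this illustration and do not appear elsewhere in these notes.

```python
# Hypothetical names, chosen only for this illustration.
total = 10

def shadow():
    # No "global" declaration: this assignment creates a new local
    # variable named total; the global one is left untouched.
    total = 99
    return total

def rebind():
    # With the declaration, the assignment rebinds the global name.
    global total
    total = 99
    return total

local_result = shadow()
after_shadow = total   # the global is still 10 here
rebind()
after_rebind = total   # the global is now 99
```

Calling shadow() returns 99 but leaves the global alone; only rebind() changes what the global name refers to.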
A function can also maintain external state in a non-global scope. You can create other enclosing scopes, besides the global scope. This isn't done very often in most Python programs, but you should understand it.
    >>> def fountain():
    ...     numbers = range(5)
    ...     def pick():
    ...         number = numbers.pop(0)
    ...         numbers.append(number)
    ...         return number
    ...     return pick
    ...
    >>> getnumber = fountain()
    >>> getnumber()
    0
    >>> getnumber()
    1
    >>> getnumber()
    2
    >>> getnumber()
    3
    >>> getnumber()
    4
    >>> getnumber()
    0
    >>>
The above example defines a function within an inner scope, then returns the inner function. The inner scope is created when we call the outer function. Because we only called fountain() once, there is only one variable named numbers, and every call to getnumber() affects the same variable.
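As a separate illustration of the same idea, here is a small sketch in which the outer function is called twice, so two enclosing scopes exist side by side. The names make_counter, state, and step are invented for this example.

```python
def make_counter():
    # A one-element list holds the state, so the inner function can
    # change it by mutation without rebinding any name.
    state = [0]
    def step():
        state[0] = state[0] + 1
        return state[0]
    return step

first = make_counter()    # creates one enclosing scope
second = make_counter()   # creates another, separate scope

# The two inner functions count independently of each other.
results = [first(), first(), second(), first()]
```

Here results ends up as [1, 2, 1, 3]: each call to make_counter() produced its own state list.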
Q1. Suppose the line numbers = range(5) were moved one line down, inside the pick() function. How could you exhibit a change in the behaviour of fountain()?
Q2. Suppose the line numbers = range(5) were moved one line up, outside the fountain() function. How could you exhibit a change in the behaviour of fountain()?
If you have any difficulty understanding how to answer these two questions, please ask me for an explanation before proceeding.
2. Object Concepts
In object-oriented programming, the behaviour and state are bound together. An object is an encapsulation of behaviour and data. The behaviour is specified for entire classes of objects at once, whereas the data has to do with each individual object. For example, the upper() method is defined to mean the same thing for all strings; you do get different results, though, depending on what's in the string.
Although there are a few ways of binding a function together with a scope, as we just saw, Python has a specific convention for defining objects. The standard way is more convenient because it lets you bundle together potentially many functions and many pieces of data. To describe the behaviour of a class of objects, you define a class; the class contains methods that give the objects their behaviour. An object that belongs to a particular class is usually called an instance of the class.
Here is how you would write the first example, above, using a class instead of a global variable. There's a bunch of new stuff here, so it may look puzzling. It will all be explained soon; I just wanted you to see an early example so you have some idea what we're talking about.
    >>> class Counter:
    ...     def __init__(self):
    ...         self.x = 0
    ...     def next(self):
    ...         self.x = self.x + 1
    ...         return self.x
    ...
    >>> c = Counter()
    >>> c.next()
    1
    >>> c.next()
    2
    >>> c.next()
    3
    >>>
Calling the class by saying Counter() produces a new instance of the Counter class with the specified behaviour. Then we can call methods on the instance object.
Admittedly, the class definition is a little more verbose than the version with the global variable. But it's also a great deal more flexible. One major advantage is that we can now easily create many independent counters.
    >>> d = Counter()
    >>> e = Counter()
    >>> d.next()
    1
    >>> d.next()
    2
    >>> c.next()
    4
    >>> e.next()
    1
    >>> d.next()
    3
    >>> e.next()
    2
    >>>
Before we get to the details of defining your own classes, we'll talk a bit about how Python's object system works.
3. Attributes
As you know, you can use the dot to retrieve an attribute of something, such as a variable from a module's namespace.
    >>> import math
    >>> math.sin
    <built-in function sin>
    >>>
You can use the dot to retrieve attributes from an instance object in the same way.
    >>> c = Counter()
    >>> c.next()
    1
    >>> c.next()
    2
    >>> c.x
    2
    >>>
You can also assign to attributes of things. For example, you can stash anything you want into the namespace of a Counter instance, or into the namespace of a module.
    >>> c.x
    2
    >>> c.x = 8
    >>> c.next()
    9
    >>> c.blargh = 29
    >>> print c.blargh
    29
    >>> c.ziggy
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: Counter instance has no attribute 'ziggy'
    >>>
    >>> math.blargh = 42
    >>> print math.blargh
    42
    >>> math.ziggy
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'module' object has no attribute 'ziggy'
    >>>
However, you can't assign to attributes of most built-in types, because they're read-only.
    >>> n = 5
    >>> n.a = 6
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: 'int' object has only read-only attributes (assign to .a)
    >>>
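If you'd like to look inside an instance's namespace directly, the built-in functions vars(), hasattr(), and getattr() can help. The following is only a sketch, written for Python 3; it repeats the Counter class so that it runs on its own.

```python
class Counter:
    def __init__(self):
        self.x = 0
    def next(self):
        self.x = self.x + 1
        return self.x

c = Counter()
c.next()
c.blargh = 29                          # stash an extra attribute on the instance

namespace = vars(c)                    # the instance's attribute dictionary
has_ziggy = hasattr(c, 'ziggy')        # False: nothing named ziggy was stored
fallback = getattr(c, 'ziggy', None)   # supply a default instead of an error
```

After this runs, namespace is {'x': 1, 'blargh': 29}, which is exactly the set of names that were assigned through the dot.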
4. Bound Methods
You also use the dot to call a method on an object. Let's take a closer look at how method calls work.
You already know that you can take a string and call methods on it to produce results:
    >>> refrain = 'And a partridge in a pear tree.'
    >>> refrain.upper()
    'AND A PARTRIDGE IN A PEAR TREE.'
    >>> refrain.title()
    'And A Partridge In A Pear Tree.'
    >>>
But you can also walk away with that method without calling it, just like you can manipulate functions without calling them:
    >>> method = refrain.upper
    >>> method
    <built-in method upper of str object at 0x8150e28>
    >>> method()
    'AND A PARTRIDGE IN A PEAR TREE.'
    >>>
So, when you call a method as in refrain.upper(), the effect is to first retrieve the attribute named "upper" from the variable refrain, and then to take that retrieved result and call it like a function. The retrieved method retains knowledge of the object it's associated with; we call it a bound method because it's bound to the object. The representation that's displayed for the bound method, above, hints at this association:
    >>> method
    <built-in method upper of str object at 0x8150e28>
    >>> hex(id(refrain))
    '0x8150e28'
(The hex() function simply displays a hexadecimal number.) You can see here that the method we retrieved above remembers that it's associated with object 0x8150e28, which is the string in the refrain variable.
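Another way to check the association, instead of comparing addresses by eye, is to ask the bound method for its object. In current Python versions a bound method carries the object in an attribute named __self__; this sketch assumes Python 3.

```python
refrain = 'And a partridge in a pear tree.'

method = refrain.upper                   # retrieve the method without calling it
same_object = method.__self__ is refrain # the method holds on to the string
result = method()                        # calling it later still uses that string
```

Here same_object is True, and result is the upper-case refrain.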
Just like other functions, of course, methods can mutate things. Some methods mutate their objects and some don't.
    >>> numbers = [1, 2, 3]
    >>> foo = numbers.append
    >>> foo
    <built-in method append of list object at 0x814fa4c>
    >>> foo(5)
    >>> foo(6)
    >>> numbers
    [1, 2, 3, 5, 6]
    >>>
Q3. Suppose that stuff = [1, 3, 5, 2, 7, 6, 8, 0] and indices = [1, 2, 4, 0]. Without the help of the computer, figure out the result of map(stuff.pop, indices). Then call me over and tell me your answer.
There are many similarities between module lookups and instance lookups. In both cases, a dot is used to retrieve an attribute. When you say math.sin, you retrieve the function referenced by the sin variable within the math module. Functions retain their knowledge of the global namespace in which they were defined, which is how the next() function in the very first example remembers the next number to return. When you say refrain.upper you retrieve the value referenced by the upper attribute of the refrain object, and the method retains its association to the object.
The difference is that, once you define a module named math, there can only be one module named math. With a class, you can create as many objects as you like, all with the same behaviour but carrying different data.
5. Defining Classes
Now let's go back and take apart that example. Here it is again:
    class Counter:
        def __init__(self):
            self.x = 0
        def next(self):
            self.x = self.x + 1
            return self.x
The first line says that we're defining a class called Counter. By convention, class names begin with a capital letter, and the names usually consist of CapitalizedWordsSmushedTogether.
Within this class we've defined two methods: __init__ and next. __init__ is a special method that gets run immediately when an object is first created. We call this method the constructor. In this case, it takes the place of the line x = 0 that used to be outside the function, setting up the value of x.
There are many other special methods, but we'll get to those later. The convention is that any special method has a name beginning with two underscores and ending with two underscores, and by "special" we mean that the method is called automatically: you don't name the __init__ method in an explicit call, but it gets called anyway.
Both methods have an argument called self. And instead of manipulating x, we manipulate self.x. That's because we're working with a variable inside the object: we call this an instance variable. We say self to identify the object namespace in which the variable x resides. All the methods are passed the object instance as the first argument. By convention, the first argument to any method is always called self (since it refers to the object itself).
In other languages you may have encountered, the object may have a different special name. For example, in C++ and Java the object is called this. Also, in other languages, you don't have to identify the object namespace. In C++ and Java, you can just name variables inside the object, such as x in this example, and the compiler will guess that you are talking about an instance variable. The compiler can only do this because in C++ and Java, classes must declare all of their instance variables. In Python, you must explicitly refer to instance variables by putting self. in front of the variable name.
When we call Counter() to create an instance, the instance object begins with an empty namespace. The __init__ method is called on the new instance, where the instance is passed as the first argument, and the arguments to Counter() become the rest of the arguments. (There are no arguments to Counter() here, so __init__ just gets one argument.) Then __init__ sets the instance variable self.x to zero.
When we later call the next() method, the extra argument is again inserted in front. We call the method with no arguments, but the method gets the instance as the first argument. The method then increments and returns self.x.
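To make the argument passing concrete, here is a small sketch with a made-up class named Point (it is not part of the exercises). The comments describe which arguments each method receives.

```python
class Point:
    def __init__(self, x, y):
        # self is the new, still-empty instance;
        # x and y are the arguments given to Point(...).
        self.x = x
        self.y = y

    def shifted_x(self, dx):
        # Called as p.shifted_x(2): self is p, dx is 2.
        return self.x + dx

p = Point(3, 4)          # __init__ receives (p, 3, 4)
moved = p.shifted_x(2)   # shifted_x receives (p, 2)
```

In both calls we wrote one fewer argument than the method definition lists, because the instance is supplied automatically.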
The extra argument self is inserted in front of the argument list for any method call on an instance. This is quite apparent in some error messages; be careful not to let it confuse you:
    >>> c = Counter()
    >>> c.next(5)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: next() takes exactly 1 argument (2 given)
    >>>
The complaint here is that next() was expecting just one argument (self), but it got two arguments (self and 5).
6. Unbound Methods
Classes are objects too, and they also have attributes. In particular, the methods are attributes of the class.
    >>> c
    <__main__.Counter instance at 0x815891c>
    >>> c.next
    <bound method Counter.next of <__main__.Counter instance at 0x815891c>>
    >>> Counter.next
    <unbound method Counter.next>
    >>>
Counter.next is an unbound method. It's the method as you really defined it, with one argument, and it's not associated with any particular instance. When you retrieve the next() method through an instance, as when you say c.next, the unbound method is transformed into a bound method. Now you understand what this "binding" does: it merely holds on to the instance, and inserts the instance as the first argument when making the call. The following two calls have exactly the same effect:
    Counter.next(c)
    c.next()
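Here is a runnable sketch of that equivalence, written for Python 3 and repeating the Counter class so it stands alone. One difference to be aware of: in Python 3, Counter.next is displayed as a plain function rather than as an "unbound method", but calling it with the instance as the first argument works the same way.

```python
class Counter:
    def __init__(self):
        self.x = 0
    def next(self):
        self.x = self.x + 1
        return self.x

c = Counter()
first = c.next()            # bound call: the instance is supplied for us
second = Counter.next(c)    # call through the class: we pass the instance

bound = c.next              # retrieve the bound method without calling it
holds_instance = bound.__self__ is c   # the binding holds on to c
```

Both calls advance the same counter, so first is 1 and second is 2.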
For the next few questions, have a look at this slightly enhanced version of the Counter class:
    class FancyCounter:
        def __init__(self, start):
            self.x = start
        def next(self, inc=1):
            self.x = self.x + inc
            return self.x
        def set(self, value):
            self.x = value
Q4. Adjust the class definition so we can construct a new counter starting at zero when we call FancyCounter() with no arguments.
Q5. Add another argument to the constructor that lets us set the default increment when we create a new instance. This default increment would be used if no increment is specified when we call next(). The default increment should be 1 if we construct an instance of FancyCounter with no arguments. Here's the expected behaviour:
    >>> a = FancyCounter()
    >>> a.x
    0
    >>> a.next()
    1
    >>> a.next()
    2
    >>> a.next(3)
    5
    >>> a.next(3)
    8
    >>> b = FancyCounter(5, 7)
    >>> b.next()
    12
    >>> b.next()
    19
    >>> b.next(1)
    20
    >>> b.next(1)
    21
    >>> b.next()
    28
    >>>
Q6. Adjust the set() method so it can be used to set the default increment as well as the value.
Q7. In your final FancyCounter class, what are the minimum and maximum number of arguments allowed for each of the following calls? Assume that c is an instance of FancyCounter.
    FancyCounter(...)
    FancyCounter.__init__(...)
    FancyCounter.next(...)
    FancyCounter.set(...)
    c.__init__(...)
    c.next(...)
    c.set(...)
7. Common Special Methods
I mentioned earlier that there are many special method names. Here are two that are used particularly often.
As you know, when you enter an expression, the interpreter displays a representation of the result. The default representation for an instance object is pretty dull. It just shows the module name and class name of the instance, and the instance's unique identifier in hexadecimal:
    >>> c = Counter()
    >>> c
    <__main__.Counter instance at 0x8156e54>
    >>>
You can control this display by defining a __repr__ method that returns a string. Having a more meaningful representation can make debugging a lot easier. Remember that the convention is for the representation to show what you would have to type in to produce the value, if possible. So if your object is simple enough that you could create an identical object just by calling the constructor with some set of arguments, the representation could show exactly how to construct the object. For example:
    class Counter:
        def __init__(self, start):
            self.x = start
        def __repr__(self):
            return 'Counter(%d)' % self.x
        def next(self, inc=1):
            self.x = self.x + inc
            return self.x
Only do this when you can regenerate the entire state of the object just from the constructor. For anything more complicated, you can just make a simplified explanation of what's in the object, and put that in angle brackets. (Angle brackets are the convention for representing something that can't be typed in.)
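The sketch below tries out both conventions. The Counter class is the one shown above; the Session class is a hypothetical example of an object whose state can't be rebuilt from a constructor call, so its representation goes in angle brackets.

```python
class Counter:
    def __init__(self, start):
        self.x = start
    def __repr__(self):
        return 'Counter(%d)' % self.x
    def next(self, inc=1):
        self.x = self.x + inc
        return self.x

class Session:
    # Hypothetical class, used only to illustrate the angle-bracket style.
    def __init__(self):
        self.events = []
    def __repr__(self):
        return '<Session with %d events>' % len(self.events)

c = Counter(5)
c.next()
text = repr(c)            # 'Counter(6)'
clone = eval(text)        # typing the representation back in rebuilds the state
session_text = repr(Session())
```

Because the representation of a Counter is a valid constructor call, evaluating it gives a new counter whose x is also 6.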
Whenever an instance object x is converted to a string, the x.__str__() method is called. Conversion to a string can be caused by an explicit call to str(x), or by formatting, as in '%s' % x, or just by printing, as in print x. Defining a __str__ method lets you control this. If you define __repr__ but not __str__, conversion to a string will use __repr__.
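A short sketch of the fallback rule, using two made-up classes (Plain and Labelled) that exist only for this illustration:

```python
class Plain:
    # Defines __repr__ only.
    def __repr__(self):
        return 'Plain()'

class Labelled:
    # Defines both __repr__ and __str__.
    def __repr__(self):
        return 'Labelled()'
    def __str__(self):
        return 'a labelled thing'

plain_text = str(Plain())            # no __str__, so __repr__ is used
labelled_text = '%s' % Labelled()    # %s formatting uses __str__
labelled_repr = '%r' % Labelled()    # %r formatting still uses __repr__
```

So plain_text is 'Plain()', labelled_text is 'a labelled thing', and labelled_repr is 'Labelled()'.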
8. The Number Protocol
Some special methods can be used to implement your own behaviour for arithmetic operators. For a binary operator, the behaviour is determined by the left operand; if the left operand is an instance, a special method is called to determine the result.
Assuming that x is an instance of a class called XClass, the three expressions in each row are equivalent:
    x + y     x.__add__(y)    XClass.__add__(x, y)
    x - y     x.__sub__(y)    XClass.__sub__(x, y)
    -x        x.__neg__()     XClass.__neg__(x)
    abs(x)    x.__abs__()     XClass.__abs__(x)
    x * y     x.__mul__(y)    XClass.__mul__(x, y)
    x ** y    x.__pow__(y)    XClass.__pow__(x, y)
    x / y     x.__div__(y)    XClass.__div__(x, y)
    x % y     x.__mod__(y)    XClass.__mod__(x, y)
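As an illustration, here is a minimal sketch of a two-dimensional vector class that implements a few of these methods. The class name Vector and its design are my own choices for this example. It avoids the / operator, since Python 3 looks for __truediv__ rather than __div__ there.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)
    def __add__(self, other):
        # Called for self + other
        return Vector(self.x + other.x, self.y + other.y)
    def __sub__(self, other):
        # Called for self - other
        return Vector(self.x - other.x, self.y - other.y)
    def __neg__(self):
        # Called for -self
        return Vector(-self.x, -self.y)
    def __abs__(self):
        # Called for abs(self); returns the length of the vector
        return (self.x ** 2 + self.y ** 2) ** 0.5
    def __mul__(self, factor):
        # Called for self * factor, where factor is a plain number
        return Vector(self.x * factor, self.y * factor)

a = Vector(3, 4)
b = Vector(1, 2)
total = a + b                     # same as a.__add__(b)
same_total = Vector.__add__(a, b) # same as calling through the class
length = abs(a)                   # same as a.__abs__()
```

Here total displays as Vector(4, 6) and length is 5.0.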
Q8. Extra for experts (or for bored people). Define a class for complex numbers.
9. The Comparison Protocol
The special method __cmp__ defines how your object compares to other objects. It should return a number less than, equal to, or greater than zero if the left operand is less than, equal to, or greater than the right operand. This affects the result of all the comparison operators, as well as the sorting behaviour when you call .sort() on a list containing instance objects.
Assuming that x is an instance of a class called XClass, the three expressions in each row are equivalent:
    x < y     x.__cmp__(y) < 0     XClass.__cmp__(x, y) < 0
    x > y     x.__cmp__(y) > 0     XClass.__cmp__(x, y) > 0
    x <= y    x.__cmp__(y) <= 0    XClass.__cmp__(x, y) <= 0
    x >= y    x.__cmp__(y) >= 0    XClass.__cmp__(x, y) >= 0
    x == y    x.__cmp__(y) == 0    XClass.__cmp__(x, y) == 0
    x != y    x.__cmp__(y) != 0    XClass.__cmp__(x, y) != 0
(In recent versions of Python, you can also define each of these operations separately. The corresponding methods are called __lt__, __gt__, __le__, __ge__, __eq__, and __ne__. Each one is expected to return a true or false value. If these are defined, __cmp__ is not used.)
The default comparison behaviour for instances is just to compare the identifiers. This means that they won't sort in any meaningful order, though it will be consistent. It also means that, unless you define a special comparison method, comparing two instances for equality is the same as comparing their identities.
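Here is a sketch of the separate comparison methods in action. It is written for Python 3, where __cmp__ is no longer consulted and only the separate methods are used. The Version class is a made-up example.

```python
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor
    def __repr__(self):
        return 'Version(%d, %d)' % (self.major, self.minor)
    def __eq__(self, other):
        # Called for self == other
        return (self.major, self.minor) == (other.major, other.minor)
    def __lt__(self, other):
        # Called for self < other; list sorting relies on this method
        return (self.major, self.minor) < (other.major, other.minor)

versions = [Version(2, 1), Version(1, 5), Version(2, 0)]
versions.sort()                              # uses __lt__ to order the items
equal = Version(1, 5) == Version(1, 5)       # True, thanks to __eq__
identical = Version(1, 5) is Version(1, 5)   # False: two distinct objects
```

After sorting, versions is [Version(1, 5), Version(2, 0), Version(2, 1)]. Without the __eq__ method, the equality test would fall back to comparing identities and give False.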
10. The Collection Protocol
You can also make your objects behave like sequences or dictionaries by responding to the usual operators.
Assuming that x is an instance of a class called XClass, the three expressions in each row are equivalent:
    len(x)      x.__len__()            XClass.__len__(x)
    x[i]        x.__getitem__(i)       XClass.__getitem__(x, i)
    x[i] = y    x.__setitem__(i, y)    XClass.__setitem__(x, i, y)
    del x[i]    x.__delitem__(i)       XClass.__delitem__(x, i)
    y in x      x.__contains__(y)      XClass.__contains__(x, y)
This is how the CGI module produces a dictionary-like object containing the values of the form fields. If you want to support dictionary behaviour completely, you also need to implement get(), keys(), values(), and items().
Once you implement __getitem__(), it will be possible to use an instance of your class in a for loop (like for item in x). When a for loop executes over your instance, your instance will get a call to __getitem__(0), then __getitem__(1), then __getitem__(2), and so on until your __getitem__ method raises an IndexError.
Also, if you implement __getitem__ but not __contains__, Python will take care of the in operator for you. If someone asks y in x, it will have the effect of running a for loop over x, comparing y with each retrieved item.
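The following sketch shows a sequence-like class that defines only __len__ and __getitem__, and still works in a for loop and with the in operator. The Countdown class is invented for this illustration.

```python
class Countdown:
    def __init__(self, start):
        self.start = start
    def __len__(self):
        # Called for len(self)
        return self.start
    def __getitem__(self, i):
        # Called for self[i]; raising IndexError tells a for loop to stop
        if i < 0 or i >= self.start:
            raise IndexError(i)
        return self.start - i

launch = Countdown(3)
items = [item for item in launch]   # calls __getitem__(0), (1), (2), then (3)
found = 2 in launch                 # loops over the items looking for 2
missing = 7 in launch               # runs to the end without finding 7
```

Here items is [3, 2, 1], found is True, and missing is False, even though the class defines neither an iterator nor __contains__.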
You don't need to memorize all these special method names; you can always look them up in help under the topic SPECIALMETHODS, or on the Web (a Google search for "Python special methods" does the trick). There are many others that haven't been mentioned here. All you really need to know is that it is possible to create an object that completely emulates the behaviour of just about anything in Python, if you find yourself wanting to do so.
Here's the sixth assignment.
If you have any questions about these exercises or the assignment, feel free to send me e-mail at bczestyca.