Threeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee.
Make sure you find your partner before starting this. You and your partner will need to arrange a time to meet before the next homework assignment is due.
1. Optional and Keyword Arguments
This is a feature of function calls in Python that isn't present in most other programming languages. When you define a function, the arguments have names:
>>> def parrot(first, second, third):
...     print first, second, third
...
>>>
You can use those names when you call the function to specify the arguments in any order you want. You simply write the argument name and an equals sign before each value:
>>> parrot(1, 2, 3)
1 2 3
>>> parrot(first='we', third='pythons', second='like')
we like pythons
>>>
This can help make your program more resilient to change when you're using a function that has lots of arguments. If someone else later rearranges the function's arguments, calls that name their arguments will still work.
You can use the same syntax when defining a function to specify an optional argument. The value after the equals sign is the default value that the argument will get if the caller doesn't specify that argument. You can have as many optional arguments as you like, but they must go at the end of the list.
In the following example, the exponent argument is optional and defaults to 2.
>>> def root(x, exponent=2):
...     return x ** (1.0 / exponent)
...
>>> root(4)
2.0
>>> root(5)
2.2360679774997898
>>> root(5, 3)
1.7099759466766968
>>>
These two features are very handy together when you're describing an operation that has many small options. For example, if you were writing a drawing program, it might have a function like this:
def draw_circle(x, y, radius=100, thickness=1, colour='black', fill='white'): ...
Then you could call this function as simply draw_circle(200, 200), or as draw_circle(200, 200, colour='red'), or as draw_circle(200, 200, radius=50, fill='blue'), depending on how specific you wanted to be.
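If you want to watch the defaults and keywords combine without a drawing program, here is a sketch. This draw_circle is a hypothetical stand-in: it returns its settings in a dictionary (dictionaries are covered in section 5) and does not draw anything.

```python
def draw_circle(x, y, radius=100, thickness=1, colour='black', fill='white'):
    """Hypothetical stand-in: return the settings instead of drawing."""
    return {'x': x, 'y': y, 'radius': radius, 'thickness': thickness,
            'colour': colour, 'fill': fill}

plain = draw_circle(200, 200)                             # every option at its default
red = draw_circle(200, 200, colour='red')                 # only colour overridden
small_blue = draw_circle(200, 200, radius=50, fill='blue')  # two options, named
print(small_blue['radius'])
```

Any option you leave out keeps its default. The options you do give can appear in any order, because each one is named.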
The default values of the arguments can be expressions. Each expression is evaluated once, when the function is defined, and the resulting value is stored with the function. It is not evaluated again each time the function is called.
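Here is a small sketch of what "evaluated when the function is defined" means in practice. The names greet, add_item and add_item_safe are made up for illustration. The first function shows that changing a variable after the def has no effect on the default. The second shows the classic surprise with a list default, where one list is shared by every call. The third is a common way to avoid that.

```python
greeting = 'hello'

def greet(name, word=greeting):
    # The default for 'word' was computed when this def ran,
    # so it stays 'hello' whatever greeting is rebound to later.
    return word + ', ' + name

greeting = 'goodbye'    # does not affect the stored default


def add_item(item, items=[]):
    # This empty list is created once, at definition time, and the same
    # list object is reused by every call that leaves out 'items'.
    items.append(item)
    return items

first = add_item(1)     # the shared default list now holds [1]
second = add_item(2)    # the same list now holds [1, 2]


def add_item_safe(item, items=None):
    # Use None as a marker and build a fresh list on each call instead.
    if items is None:
        items = []
    items.append(item)
    return items

print(greet('Brian'))   # hello, Brian
```

Here first and second are the very same list object. That is why both of them end up holding [1, 2].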
2. Iteration
The for keyword lets you run a loop that iterates over the elements of any sequence.
>>> for c in 'abcd':
...     print c
...
a
b
c
d
>>> for item in [3, [], 'ooga', 5.7]:
...     print item
...
3
[]
ooga
5.7
>>>
In the first example above, we're iterating over the four elements of a string. (Each character is a string of length 1.) In the second example, we're iterating over the four elements of a list.
We can also use for to iterate over the lines of a text file. Files are opened using the built-in open function.
Suppose the file foo.txt contains:
Ah. I'd like to have an argument, please.
Certainly sir. Have you been here before?
No, I haven't, this is my first time.
I see. Well, do you want to have just one argument,
or were you thinking of taking a course?
Then we could read the contents like this:
>>> file = open('foo.txt')
>>> for line in file:
...     print line
...
Ah. I'd like to have an argument, please.

Certainly sir. Have you been here before?

No, I haven't, this is my first time.

I see. Well, do you want to have just one argument,

or were you thinking of taking a course?

>>> line
'or were you thinking of taking a course?\n'
>>>
Notice a couple of things here. The printed output is double-spaced because each line we read from the file is a string with a newline character ('\n') on the end. The print statement adds its own newline character at the end, so we get an extra blank line.

Also notice that the variable line retains its last value after the for loop has ended. You can see from the above that it contains a trailing newline character.
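If you want the lines without that trailing newline, one option is the string method rstrip() from section 4. The sketch below uses a made-up function, read_lines, to collect the stripped lines of a file into a list. It first writes a small temporary file, so it can run on its own.

```python
import os
import tempfile

def read_lines(filename):
    """Return the lines of a text file with their trailing newlines removed."""
    lines = []
    with open(filename) as f:
        for line in f:
            # Each line normally ends in '\n'; rstrip('\n') removes
            # only newline characters from the right-hand end.
            lines.append(line.rstrip('\n'))
    return lines

# Try it on a small throwaway file.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w') as f:
    f.write("Ah. I'd like to have an argument, please.\n")
    f.write('Certainly sir. Have you been here before?\n')
result = read_lines(path)
os.remove(path)
print(result)
```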
Iterating with a for statement always requires a sequence to iterate over. So, in order to run a loop that counts through numbers, we have to generate a sequence of numbers. The range() function takes care of this for us.
This function can take one, two, or three arguments:
range(end) produces the list of integers from 0 to end-1.
range(start, end) produces the list of integers from start to end-1.
range(start, end, step) produces the list of integers starting from start and adding step each time, stopping before it reaches end. The step can be negative, to count down.
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(3, 8)
[3, 4, 5, 6, 7]
>>> range(3, 30, 7)
[3, 10, 17, 24]
>>> range(20, 10, -1)
[20, 19, 18, 17, 16, 15, 14, 13, 12, 11]
>>>
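Here are two small loops driven by range(), one counting up and one counting down with a negative step. The function names squares and countdown are just for illustration.

```python
def squares(n):
    """Return [(1, 1), (2, 4), ..., (n, n*n)]."""
    table = []
    for i in range(1, n + 1):   # 1, 2, ..., n (range stops before n + 1)
        table.append((i, i * i))
    return table

def countdown(n):
    """Return [n, n-1, ..., 1] using a step of -1."""
    result = []
    for i in range(n, 0, -1):   # stops before reaching 0
        result.append(i)
    return result

print(squares(4))     # [(1, 1), (2, 4), (3, 9), (4, 16)]
print(countdown(5))   # [5, 4, 3, 2, 1]
```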
Q1. What happens if you ask for the range() of a single negative number?
Q2. What happens if the start argument is bigger than the end argument?
Q3. What happens if some of the arguments are fractions (floating-point numbers such as 2.5)?
When you get to this point, get out of your chair, find someone else in the room, and ask them if they have any Grey Poupon. No, really. I mean it. Take a break.
3. List methods
You've already seen the .append() method on lists: it alters the list in place, adding one element on the end. You can use dir() to discover the other methods on lists:
>>> dir([])
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__delslice__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
'__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__',
'__init__', '__le__', '__len__', '__lt__', '__mul__', '__ne__',
'__new__', '__reduce__', '__repr__', '__rmul__', '__setattr__',
'__setitem__', '__setslice__', '__str__', 'append', 'count', 'extend',
'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>>
For now, you can ignore all of the names that start and end with double underscores. Those are special methods. (Special methods are methods that Python calls automatically, instead of you calling them by name. For example, some of them correspond to operators: x + y corresponds to calling x.__add__(y). But we won't worry about this until later.)
The ones we care about are the nine "normal" methods:
x.append(item) adds a single item onto the end of list x, increasing the length of x by 1.
x.count(y) tells you how many times y occurs in the list x.
x.extend(items) expects another sequence as the argument, and appends all the elements of that sequence onto the end of x.
x.index(y) tells you the first position at which y occurs as an element of x.
x.insert(index, item) inserts an item into the list x at a particular index, and shoves everything after it to the right.
x.pop(index) removes the item at position index from the list x, and returns it. If you leave out index, it removes and returns the last item.
x.remove(y) finds the first occurrence of y in the list and removes it.
x.reverse() reverses the list x in place.
x.sort() sorts the list x in place.
Give them all a whirl.
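Here is one possible whirl, written as a script rather than an interactive session. The starting list is arbitrary, and the comments show the result of each step.

```python
x = [5, 3, 8]
x.append(3)          # x is [5, 3, 8, 3]
n = x.count(3)       # n is 2
x.extend([1, 9])     # x is [5, 3, 8, 3, 1, 9]
i = x.index(8)       # i is 2
x.insert(1, 7)       # x is [5, 7, 3, 8, 3, 1, 9]
last = x.pop()       # last is 9; x is [5, 7, 3, 8, 3, 1]
first = x.pop(0)     # first is 5; x is [7, 3, 8, 3, 1]
x.remove(3)          # only the first 3 goes: x is [7, 8, 3, 1]
x.reverse()          # x is [1, 3, 8, 7]
returned = x.sort()  # x is [1, 3, 7, 8]; sort() itself returns None
print(x)
```

Methods that work in place, like sort() and reverse(), change the list and hand back None. So writing x = x.sort() would throw your list away.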
Q4. What happens if you try to insert at a negative index?
Q5. What happens if you try to append a list to itself?
Q6. What happens if you try to extend a list with itself?
Q7. Write a little function to find the median of a list of numbers.
4. String methods
Strings also have a lot of methods, which will come in handy as you're doing the assignment. Again, you can get a list of them using dir():
>>> dir('')
['__add__', '__class__', '__contains__', '__delattr__', '__eq__',
'__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__',
'__hash__', '__init__', '__le__', '__len__', '__lt__', '__mul__',
'__ne__', '__new__', '__reduce__', '__repr__', '__rmul__', '__setattr__',
'__str__', 'capitalize', 'center', 'count', 'decode', 'encode',
'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha',
'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
'lower', 'lstrip', 'replace', 'rfind', 'rindex', 'rjust', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper']
>>>
Here are the most commonly used ones:
x.strip() produces a new string by removing all the whitespace characters (spaces, tabs, newlines) from the beginning and end of x. The similar method x.lstrip() strips whitespace only off the beginning, and x.rstrip() strips whitespace only off the end.
x.split() produces a list of words by splitting x on whitespace. Any whitespace at the beginning and end of the string is ignored.
x.split(substring) splits the string x on every occurrence of substring.
x.join(list) takes a list of strings and joins them all together with repetitions of x. If the list has length n, then n - 1 copies of x will be used to join it together.
x.replace(old, new) replaces every occurrence of old with new, and returns the new string.
x.lower() and x.upper() return a copy of x with all the characters changed to lowercase or uppercase.
x.startswith(string) and x.endswith(string) check whether x begins or ends with a given string.
x.find(substring) searches for a substring within the string x, and returns the index of the first occurrence. It returns -1 if the substring is not found. x.rfind(substring) searches from right to left, returning the index of the last occurrence. Both these methods also accept an optional second argument, an index at which to start looking.
Try them out.
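Here are several of these methods in one place, with the result of each call in a comment. None of them changes s itself. Strings can't be modified, so each method hands back a new string (or a list, or a number).

```python
s = '  spam and  eggs\n'

stripped = s.strip()                        # 'spam and  eggs'
words = s.split()                           # ['spam', 'and', 'eggs']
tidy = ' '.join(s.split())                  # 'spam and eggs': squeezes the spaces
fields = 'a,b,,c'.split(',')                # ['a', 'b', '', 'c']: empty field kept
dashed = '-'.join(['x', 'y', 'z'])          # 'x-y-z': 3 items, 2 copies of '-'
eggs = 'spam spam'.replace('spam', 'eggs')  # 'eggs eggs'
shout = 'Spam'.upper()                      # 'SPAM'
is_text = 'notes.txt'.endswith('.txt')      # true
first_an = 'banana'.find('an')              # 1
last_an = 'banana'.rfind('an')              # 3
next_an = 'banana'.find('an', 2)            # 3: starts looking at index 2
missing = 'banana'.find('x')                # -1
print(tidy)
```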
There is also an operator called in that you can use to test whether something is a member of a sequence.
>>> 'a' in 'abc'
1
>>> 'd' in 'abc'
0
>>> 3 in [1, 2, 3]
1
>>> 3 in [1, 2, 4]
0
>>>
Curiously enough, the opposite of in is not in.
>>> 3 not in [1, 2, 4]
1
>>>
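One difference is worth trying for yourself. Starting with Python 2.3, when the right-hand side is a string, in checks for a substring of any length. Older versions only allow a single character on the left, which is all the examples above use. When the right-hand side is a list, in checks whether some element is equal to the left-hand side. Python 2.3 and later also print True and False here instead of 1 and 0.

```python
substring_found = 'ab' in 'abc'        # true: 'ab' appears inside 'abc'
list_found = [1, 2] in [1, 2, 3]       # false: no single element equals [1, 2]
nested_found = [1, 2] in [[1, 2], 3]   # true: the first element is [1, 2]
not_a_vowel = 'z' not in 'aeiou'       # true
print(list_found)
```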
Q8. Write a little function that takes a sentence and converts it (very simplistically) into Pig Latin. Each word that begins with a vowel should have "way" appended in the resulting sentence; each word that begins with a consonant should have the consonant moved to the end of the word and "ay" appended after that. Don't worry about punctuation, combined consonants, or capital letters; just handle these two simple cases:
>>> def piglatin(sentence):
...     <you fill this in>
...
>>> piglatin(' ethel the aardvark goes quantity surveying.')
'ethelway hetay aardvarkway oesgay uantityqay urveying.say'
>>>
When you get here, find someone you haven't met yet and show them your Pig Latin program.
5. Dictionaries
The dictionary type in Python is a totally new type of collection. Like lists, dictionaries contain things, but they don't order them in sequence. Instead, dictionaries contain associations between pairs of things. Each pair consists of a key and a value. You look up things in a dictionary by providing the key, and the dictionary returns the value.
To write a dictionary, you use curly braces, and join each key-value pair with a colon.
>>> d = {'a': 'aardvark', 'b': 'balloon'}
>>> print d
{'a': 'aardvark', 'b': 'balloon'}
>>> len(d)
2
>>> d['a']
'aardvark'
>>> d['b']
'balloon'
>>> d['c']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: c
>>> d['x'] = 'xylophone'
>>> print d
{'a': 'aardvark', 'b': 'balloon', 'x': 'xylophone'}
>>> d['x']
'xylophone'
>>>
As you can see from the example, len() returns the number of pairs in the dictionary. You use square brackets to look up things in a dictionary, just like a list, except the thing inside the square brackets is a key instead of a numeric index. Looking up a key that isn't there raises a KeyError, as with 'c' above. You also use square brackets to add a new pair to a dictionary, as we did with the key 'x' and the value 'xylophone'.
There will be more to say about dictionaries later, but for now you only need to know two methods:
>>> d.keys()
['a', 'b']
>>> print d.get('c')
None
>>> print d.get('c', 'flugelhorn')
flugelhorn
>>>
The keys() method returns a list of the keys in a dictionary, which is helpful for getting at the contents. The keys will be returned in no particular order. Printing a dictionary also prints the key-value pairs in no particular order.
The get() method gives you a safe way of looking up a value when you don't know whether the key is present. If you simply ask for d.get(x) and x is not a key in the dictionary, you will get None instead of a KeyError. You can also give get() a second argument, which will be the default value returned if the key is missing.
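The two-argument form of get() is handy when a missing key should fall back to something sensible. This sketch expands abbreviations in a sentence. The abbreviations table and the expand function are hypothetical. Any word that isn't a key passes through unchanged, because get() returns the word itself as the default.

```python
# A hypothetical table of abbreviations to expand.
abbreviations = {'dr': 'doctor', 'st': 'street', 'mr': 'mister'}

def expand(sentence):
    """Replace each known abbreviation; leave every other word alone."""
    words = []
    for word in sentence.split():
        # get() falls back to the word itself when it is not a key,
        # so unknown words pass through instead of raising KeyError.
        words.append(abbreviations.get(word, word))
    return ' '.join(words)

print(expand('mr smith lives on baker st'))   # mister smith lives on baker street
```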
You can also test whether a key is in a dictionary using the in operator. Note that this only checks for the existence of a key; it doesn't look at the values.
>>> 'a' in d
1
>>> 'c' in d
0
>>> 'aardvark' in d
0
>>> for key in d: print key
...
a
b
>>>
The looping statement for x in d loops over the keys. It does the same thing as for x in d.keys().
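Because in only looks at keys, finding the keys that go with a particular value takes a loop. This sketch combines that with the "no particular order" point by sorting the result. The function name keys_for_value is made up. The last line uses values(), a dictionary method not introduced above, which returns the values rather than the keys.

```python
def keys_for_value(d, value):
    """Return, in sorted order, every key in d whose value equals value."""
    found = []
    for key in d:               # same as: for key in d.keys()
        if d[key] == value:
            found.append(key)
    found.sort()                # keys come out in no particular order
    return found

d = {'a': 'aardvark', 'b': 'balloon', 'x': 'xylophone', 'z': 'aardvark'}
by_value = keys_for_value(d, 'aardvark')   # ['a', 'z']
as_key = 'aardvark' in d                   # false: it is a value, not a key
as_value = 'aardvark' in d.values()        # true
print(by_value)
```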
Q9. Write a little function that will count how many times each element occurs in a sequence. The result should be a dictionary; in each pair, the key should be one of the elements, and the value should be the number of times it occurred.
>>> count([3,7,6,5,5,6,7,3,5])
{3: 2, 5: 3, 6: 2, 7: 2}
>>> count('bcbabbebcbbabeba')
{'a': 3, 'c': 2, 'b': 9, 'e': 2}
>>>
6. Regular Expressions
The re module lets you search for more interesting patterns in strings, using the regular expressions we described in class.
A regular expression is a string that specifies a pattern for matching against other strings. Regular expressions use a special syntax (for example, to allow for wildcards).
Here are some of the common constructs in regular expressions:
. matches any single character (except a newline)
spam matches the exact string "spam"
[abc] matches the character a, b, or c
[a-m] matches any lowercase letter from a to m
[^aeiou] matches any single character except a lowercase vowel
You can specify repetitions or optional parts using the following operators:
x* matches zero or more repetitions of x
x? matches zero or one occurrence of x
x+ matches one or more repetitions of x
To make these operators apply to more than one character, group that part of the expression in parentheses:
abc+ matches ab followed by one or more repetitions of c
(abc)+ matches one or more repetitions of abc
You can also give alternatives, separated by a vertical bar:
spam|eggs matches spam or eggs
Finally, you can specify that your pattern has to occur at the beginning or end of the string:
^pres will match any string that starts with pres
ing$ will match any string that ends with ing
There are more features and operators in regular expressions, but these should suffice for now. Here are some examples.
[aeiou]+ will match the entire string eeeee
[aeiou]+ will match the first three letters of the string auireu
[aeiou]+$ will match the last two letters of the string auireu
^[aeiou]+$ will not match the string auireu
[aeiou]+ will match the last two letters of the string zoo
^[aeiou]+ will not match the string zoo
To use regular expressions in a program, you import the re module and use its compile() function to compile your regular expression. This produces a pattern object, which has a search() method that you can call to search a string for a match.
Here are some of the above examples in Python:
>>> import re
>>> pat = re.compile('[aeiou]+')
>>> pat.search('eeeee')
<_sre.SRE_Match object at 0x81c2b90>
>>> pat.search('auireu')
<_sre.SRE_Match object at 0x8193b38>
>>>
The weird-looking SRE_Match object represents the result of the match:
>>> match = pat.search('auireu')
>>> match.start()
0
>>> match.end()
3
>>> match.group()
'aui'
>>> match = pat.search('zoo')
>>> match.start()
1
>>> match.end()
3
>>> match.group()
'oo'
>>>
The group() method tells you what part matched, and the start() and end() methods return its position in the string. Note that end() is the index just past the last matched character.
The search() method on a pattern returns None if there is no match.
>>> pat = re.compile('^[aeiou]+')
>>> print pat.search('zoo')
None
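Putting compile(), search(), the match methods and the None case together, here is a small helper. It checks each example from the list of regular-expression examples earlier in this section. The name first_match is made up for this sketch.

```python
import re

def first_match(pattern_text, string):
    """Return (matched text, start, end) for the first match, or None."""
    pat = re.compile(pattern_text)
    match = pat.search(string)
    if match is None:       # search() gives None when nothing matches
        return None
    return match.group(), match.start(), match.end()

print(first_match('[aeiou]+', 'eeeee'))      # ('eeeee', 0, 5)
print(first_match('[aeiou]+', 'auireu'))     # ('aui', 0, 3)
print(first_match('[aeiou]+$', 'auireu'))    # ('eu', 4, 6)
print(first_match('^[aeiou]+$', 'auireu'))   # None
print(first_match('[aeiou]+', 'zoo'))        # ('oo', 1, 3)
print(first_match('^[aeiou]+', 'zoo'))       # None
```

Checking for None before calling group() matters. If search() finds nothing and you call a method on its result anyway, you get an error instead of a match.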
Patterns also have a method called sub(replacement, string) that finds all occurrences of the pattern in the string and substitutes the replacement for each one.
>>> pat = re.compile('[aeiou]+')
>>> pat.sub('', 'zozoozieiou')
'zzz'
>>> pat.sub('ee', 'zozoozieiou')
'zeezeezee'
>>>
The replacement string can refer to what was matched in the original string. Every time you use a pair of parentheses to group together part of a regular expression, that part is called a group. When a pattern search is performed, the part of the original string that matches each group is saved in the match object. The first left-parenthesis starts group 1, the second left-parenthesis starts group 2, and so on.
Wherever the replacement string contains a backslash followed by a number, the result will substitute a copy of the group referenced by that number. An example will probably help make this clearer:
>>> pat = re.compile('(a..) (b..)')
>>> pat.sub('\\2 \\1', 'art bat cat ack bop')
'bat art cat bop ack'
>>>
In the above example, the pattern looks for a three-letter word starting with 'a', followed by a space, followed by a three-letter word starting with 'b'. The pattern matches the string "art bat cat ack bop" twice: it matches "art bat" and also matches "ack bop". Both of these matches are replaced in the result. The replacement string is the second group, followed by a space, followed by the first group. So the effect is to swap the two words "art" and "bat", and to swap the two words "bop" and "ack".
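The same group-swapping trick works on other text. This sketch turns "Surname, First" into "First Surname" wherever it appears. The pattern and the function name swap_names are made up for illustration. The r'...' prefix makes a raw string, in which backslashes are kept as written. So r'\2 \1' means the same thing as '\\2 \\1' in the example above.

```python
import re

# Group 1 is a surname; group 2, after a comma and a space, is a first name.
name_pat = re.compile(r'(\w+), (\w+)')

def swap_names(text):
    """Turn every 'Surname, First' in text into 'First Surname'."""
    # \2 is the second group (first name), \1 the first group (surname).
    return name_pat.sub(r'\2 \1', text)

print(swap_names('Cleese, John and Palin, Michael'))   # John Cleese and Michael Palin
```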
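The special pattern '\b' matches the boundary at the beginning or end of a word, and the special pattern '\w' matches a character in a word (a letter, a digit, or an underscore). Inside an ordinary Python string you should double the backslashes ('\\b', '\\w'), because '\b' on its own means the backspace character; that is why the pattern in the example below is written with doubled backslashes.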
Given this, you can do the Pig Latin transformation of a sentence in just two steps. First, you handle all the words that start with vowels; then you handle all the words that start with consonants. Here's the first step:
>>> aardvark = 'ethel the aardvark goes quantity surveying.'
>>> pat = re.compile('\\b([aeiouAEIOU]\\w*)')
>>> pat.sub('\\1way', aardvark)
'ethelway the aardvarkway goes quantity surveying.'
>>>
Q10. Try writing the regular expression and substitution for the second step, words that start with consonants.
Q11. (Optional, open-ended.) Adjust your regular expression to handle more interesting cases, like words that start with "tr" or "qu".
I know regular expressions are kind of hairy-looking. Please feel free to ask me about them if you find them confusing.
Onward to the third assignment.
If you have any questions about these exercises or the assignment, feel free to send me e-mail at bczestyca.