Beautiful Code, Spring 2003: Explore 5

This week, we shift from learning the language into a particular application: CGI scripting.

1. Putting Pages on the Web

All of you said that you'd had some experience with HTML, so you'll be putting that to good use here.

On the EECS machines, you put web pages in a directory named public_html in your home directory. For visitors to see your pages, this directory must be world-readable and world-executable; running chmod 755 public_html takes care of this.

cory% mkdir public_html
cory% chmod 755 public_html
cory% ls -ald public_html
drwxr-xr-x   2 cs198-aa cs198        512 Feb 18 19:32 public_html/
cory%

If your account name is cs198-xx and you put a page called foo.html in this directory, then you will be able to view it using the URL:

http://inst.eecs.berkeley.edu/~cs198-xx/foo.html

2. The Common Gateway Interface

CGI is simply a standard for allowing web servers to call programs (instead of only serving static files on the disk). The basic idea is that when you request a certain kind of file (a CGI script), the web server executes it instead of sending the file over to the client. If the output from the CGI script is correctly formatted, it appears in the user's web browser.

By the output we mean any text that the script prints on standard output (the Python print statement does this). To be correctly formatted, the output must begin with some headers, followed by a blank line, and then the content to deliver. Each header is a line made of a header name (which may not contain spaces, but may contain hyphens), a colon, a space, and the header's value. Most headers are optional, but if any content is sent, the Content-Type header is mandatory. This header specifies the MIME media type of the content; for HTML this is text/html.
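To make that layout concrete, here is a minimal sketch of a helper that assembles a CGI response as a string: header lines, then the blank line, then the content. The function name make_response and its parameters are hypothetical, made up for illustration; they are not part of any library.

```python
# Hypothetical helper (not part of any library): build CGI output as a string.
def make_response(body, content_type='text/html', headers=()):
    """Return header lines, a blank line, then the body.

    `headers` is an optional sequence of (name, value) pairs for extra headers.
    """
    lines = ['Content-Type: ' + content_type]
    for name, value in headers:
        # Header names may contain hyphens but not spaces.
        if ' ' in name:
            raise ValueError('header name contains a space: %r' % name)
        lines.append(name + ': ' + value)
    # Joining ends every header line; the extra newline makes the blank line.
    return '\n'.join(lines) + '\n\n' + body
```

A script would then write the result to standard output, for example with sys.stdout.write(make_response('<h1>Hello!</h1>')).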

Putting this all together gives us a simple CGI script:

#!/usr/bin/env python

print 'Content-Type: text/html'
print
print '<h1>Hello!</h1>'

For a file to work as a CGI script on the EECS web server, the filename must end in .cgi and the file must be world-executable. The first line (the #! line) tells the system to run the script with Python; /usr/bin/env looks up the python program on the search path.

Try putting the above text into a file called hello.cgi in your public_html directory. Set the permissions with chmod 711 hello.cgi. Now you should be able to go to the following URL in your web browser:

http://inst.eecs.berkeley.edu/~cs198-xx/hello.cgi

Congratulations. You've written your first Python CGI script.

3. Error Handling

There are a few different ways that a CGI script can fail:

1. The script file doesn't have an acceptable name.
2. The script file is not executable.
3. The directory containing the script file is not executable.
4. The first line doesn't start with #! and the name of a program.
5. The program named in the first line doesn't exist or won't run.
6. The script has incorrect syntax.
7. The script fails to produce a Content-Type line.
8. The script fails to produce a blank line after the headers.
9. The script encounters an error while running.
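Several items on this checklist can be tested mechanically. The sketch below is a hypothetical helper named check_cgi_script (not a standard tool) that looks at a script's name, permissions and first line and returns a list of the problems it finds. It covers only items 1 through 4; it cannot tell whether the program on the #! line actually runs, whether the syntax is valid, or what the script prints.

```python
import os
import stat

# Hypothetical helper (not a standard tool): checks checklist items 1-4.
def check_cgi_script(path):
    """Return a list of problems found with the CGI script at `path`."""
    problems = []
    # 1. The EECS server only runs files whose names end in .cgi.
    if not path.endswith('.cgi'):
        problems.append('name does not end in .cgi')
    # 2. The file itself must be world-executable.
    if not os.stat(path).st_mode & stat.S_IXOTH:
        problems.append('file is not world-executable')
    # 3. The directory holding it must be world-executable too.
    directory = os.path.dirname(os.path.abspath(path))
    if not os.stat(directory).st_mode & stat.S_IXOTH:
        problems.append('directory is not world-executable')
    # 4. The first line must be #! followed by a program name.
    f = open(path)
    try:
        first = f.readline()
    finally:
        f.close()
    if not first.startswith('#!') or not first[2:].strip():
        problems.append('first line is not #! followed by a program')
    return problems
```

Running it on hello.cgi after the chmod commands above should give an empty list.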

You may want to keep this checklist handy so you can go through it when you have a problem.

In cases 1 through 6, the problem prevents the script from even starting. You will get an error message from the server like "Permission Denied" or "Forbidden" or "Internal Server Error". Or you might just get the source code of the script dumped on you.

To avoid problem 6, it's a good idea to check your script from the command line. Just run it by typing python hello.cgi. Python will tell you if there's a syntax error.
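If you would rather check the syntax without running the script at all, one option (an alternative to the command above, not something the server does) is to pass the source to Python's built-in compile() function, which parses the code and raises SyntaxError on a mistake without executing anything. The helper name syntax_error is made up for this sketch. Note that it checks against the syntax of whichever Python runs the helper, so run it with the same Python that your #! line names.

```python
# Hypothetical helper: report the first syntax error in a file, without running it.
def syntax_error(path):
    """Return a description of the first syntax error in `path`, or None."""
    f = open(path)
    try:
        source = f.read()
    finally:
        f.close()
    try:
        # compile() parses the whole file but does not execute it.
        compile(source, path, 'exec')
    except SyntaxError as e:
        return 'line %s: %s' % (e.lineno, e.msg)
    return None
```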

In case 7 you might get an "Internal Server Error" or you might see the output from the script as text instead of HTML.

In case 8 you'll definitely get a server error.
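Cases 7 and 8 can be caught by examining the script's output before a server ever sees it. Here is a minimal sketch, using a hypothetical helper check_cgi_output, that applies the rules from section 2 to a string of captured output: headers first, then a blank line, then the content, with Content-Type required whenever there is content. It is deliberately simple: it splits on '\n' line endings, compares header names without regard to case, and does not validate header values.

```python
# Hypothetical helper: check captured CGI output against the header rules.
def check_cgi_output(output):
    """Return a list of problems with CGI output (empty if it looks OK)."""
    lines = output.split('\n')
    if '' not in lines:
        return ['no blank line after the headers']
    end = lines.index('')              # the first blank line ends the headers
    problems = []
    names = []
    for line in lines[:end]:
        name, sep, value = line.partition(':')
        if not sep or not name or ' ' in name:
            problems.append('not a header line: %r' % line)
        else:
            names.append(name.lower())
    body = '\n'.join(lines[end + 1:])
    if body.strip() and 'content-type' not in names:
        problems.append('content sent without a Content-Type header')
    return problems
```

You could capture a script's output with python hello.cgi > out.txt and pass the file's contents to this function.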

Case 9 has to do with the logic of your program, and is by far the most complex and challenging type to solve.

Usually, when you run a Python program and it encounters an error, you'll see a traceback on your terminal telling you what kind of error occurred and where in the program it happened. In a CGI script, however, the traceback goes to standard error, and the web server does not send standard error to the browser, so the message never reaches you.

Try inserting the line print x before the last print in your script. Since x is not defined, this raises a NameError. If you now visit your script in a Web browser, you'll see that the Hello! heading disappears, but there's no indication of an error. The program just stops when it hits the error, and nothing after that point is printed.
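To see where the error message actually goes, you can run a script roughly the way a server does, capturing standard output and standard error separately. The sketch below does this with the subprocess module; the child program is a stand-in for hello.cgi with the bad line added, written with print() calls so it runs under any Python version, and the helper name run_like_a_server is hypothetical. Only standard output becomes the page; the NameError traceback lands in standard error, which a server typically writes to its error log instead.

```python
import subprocess
import sys

# Stand-in for hello.cgi with the bad `print x` line added before the last print.
SCRIPT = (
    "print('Content-Type: text/html')\n"
    "print('')\n"
    "print(x)\n"
    "print('<h1>Hello!</h1>')\n"
)

# Hypothetical helper: run Python source in a child process.
def run_like_a_server(source):
    """Run `source` in a fresh Python; return (stdout, stderr) as text."""
    child = subprocess.Popen([sys.executable, '-c', source],
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = child.communicate()
    return out.decode('ascii', 'replace'), err.decode('ascii', 'replace')

out, err = run_like_a_server(SCRIPT)
# `out` holds only the header and the blank line; the traceback is in `err`.
```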

To help you diagnose these problems, there is a module called cgitb that displays these tracebacks in the browser, nicely formatted. Take your altered script and add these two lines at the top, after the #! line:

import cgitb
cgitb.enable()

Now if you try visiting the page again, you'll see a detailed explanation of the error in your Web browser.

It's generally a good idea to use this module whenever you're developing CGI scripts. The information it provides about errors can save you a lot of time.

When your script is in real use and you don't want users to see dumps of your source code, you can turn off cgitb, or you can have it save the error reports in files instead. For example, the command

cgitb.enable(display=0, logdir='/tmp')

tells cgitb not to display error reports in the Web browser, and to store them in files in /tmp instead.
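To give an idea of what cgitb is doing for you, here is a much smaller sketch of the same idea built on the standard traceback module. This is not how cgitb is implemented; the function run_with_report and its display and logdir parameters only imitate the options above. It catches any exception from a main() function, escapes the traceback so it shows up as text in HTML, and either displays it or saves it to a new file. It assumes the headers were already printed before the error happened.

```python
import os
import sys
import tempfile
import traceback

def html_escape(text):
    """Escape the characters that are special in HTML."""
    return (text.replace('&', '&amp;').replace('<', '&lt;')
                .replace('>', '&gt;'))

# Hypothetical imitation of cgitb's display/logdir options (not cgitb itself).
def run_with_report(main, display=True, logdir=None, out=None):
    """Call main(); if it raises, report the traceback.

    display: write the traceback to `out` (default sys.stdout) as HTML.
    logdir:  also save the traceback to a new file in this directory.
    """
    if out is None:
        out = sys.stdout
    try:
        main()
    except Exception:
        report = traceback.format_exc()
        if logdir is not None:
            fd, path = tempfile.mkstemp(suffix='.txt', dir=logdir)
            os.write(fd, report.encode('utf-8'))
            os.close(fd)
        if display:
            out.write('<pre>' + html_escape(report) + '</pre>\n')
        else:
            out.write('<p>A problem occurred; it has been logged.</p>\n')
```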

Q1. Write a CGI script that displays the current time. Have a look at the time module for functions that will help you get the time (import time, then use help(time) to examine the module).

Q2. Write a CGI script that displays a chess board by generating the HTML for a table with eight rows and eight columns. To make the squares black and white, set the background colour of each cell (for example, <td bgcolor="#000000">). You can put whatever you want in each cell to fill it out (such as your initials, or a couple of stars, etc.). Produce each row by adding together eight cells and then putting <tr> and </tr> tags around the row; produce the table by adding together all the rows and putting <table> and </table> tags around the entire table. Save this script as chess.cgi in your public_html directory.

When you have completed Q2, pronounce this place name: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch. Then I'll come over and look at your chess board.

4. Forms

The main way that you make CGI scripts interactive is to accept input using HTML forms. A form in HTML is enclosed with <form> and </form> tags. The starting <form> tag should have an attribute named action that gives the URL to which the form input will be sent.

The purpose of the HTML form is to let the user enter the values for some form fields. Each field has a name and a string value. When the user submits the form, your CGI script gets all these field names and values.

The form can contain text and normal HTML tags, as well as various input elements for the form fields, most notably <input>. Every form should have a submit button (made with <input type=submit>) so that the user can submit the form.

I won't go through all of the form elements here. There's a pretty good overview of forms at w3schools.com. All we're going to use here are text fields, but you are welcome to get as fancy as you want.

Let's suppose we wanted to provide a Web page that would calculate the square of any number you entered. Here's a simple example of an HTML form:

<form action="square.cgi">
Please enter a number:
<input type=text name=number>
<input type=submit value="Okay.">
</form>

Now you are probably wondering how a Python script receives the input from a form. The cgi module has some nice utilities to take care of this for you. There are a few ways, but one that I like is the SvFormContentDict class. Calling SvFormContentDict() gives you an object that maps the field names to their string values. It behaves like a dictionary rather than being an actual dictionary, but most of the things you would do with a dictionary work fine. (The "Sv" stands for "single value": it assumes that each field name occurs in the form only once.)
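Under the hood, when a form like the one above is submitted with the default GET method, the browser appends the field values to the URL as a query string (for example square.cgi?number=3), and the server hands that string to the script in the QUERY_STRING environment variable. The sketch below imitates the single-value dictionary for that case using the standard query-string parser. The helper name single_value_form is hypothetical, and unlike the cgi module it ignores POST requests. Note that parse_qs drops fields the user left blank unless told otherwise.

```python
import os

try:
    from urllib.parse import parse_qs, urlencode    # Python 3
except ImportError:
    from urlparse import parse_qs                    # Python 2.6+
    from urllib import urlencode

# Hypothetical helper: a GET-only stand-in for a single-value form dictionary.
def single_value_form(query_string=None):
    """Map each field name to one string value from a GET query string."""
    if query_string is None:
        query_string = os.environ.get('QUERY_STRING', '')
    fields = parse_qs(query_string)        # e.g. {'number': ['3']}
    # parse_qs gives a list per name; keep the first value of each field.
    return dict((name, values[0]) for name, values in fields.items())
```

For example, urlencode({'number': '3'}) produces the query string number=3, and single_value_form('number=3') turns it back into {'number': '3'}.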

If you were to display the above form, and the user entered "3" and pressed the "Okay" button, the CGI script named square.cgi would get executed. Calling SvFormContentDict() in that script would yield a dictionary containing {'number': '3'}. So a script for square.cgi might look like this:

#!/usr/bin/env python

import cgitb                      # Always remember to do this first.
cgitb.enable()

import cgi
form = cgi.SvFormContentDict()

x = int(form['number'])           # Values are strings, so we need to convert.

print 'Content-Type: text/html'
print
print 'The square of %d is %d.' % (x, x*x)
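As written, square.cgi stops with a KeyError if the number field is missing (which can happen when it is left blank) and with a ValueError if the input is not a whole number. cgitb will show you the traceback, but a visitor would rather see a message. One way to handle this, sketched below, is to move the page logic into a function that takes the form dictionary and returns the HTML. The names square_page and escape are made up for this example; escaping the user's input keeps any HTML they type from being interpreted by the browser.

```python
def escape(text):
    """Escape the characters that are special in HTML."""
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

# Hypothetical page function for square.cgi.
def square_page(form):
    """Return the HTML body for square.cgi, given a mapping of form fields."""
    if 'number' not in form:
        return '<p>Please enter a number.</p>'
    try:
        x = int(form['number'])        # form values are always strings
    except ValueError:
        return '<p>Sorry, %s is not a whole number.</p>' % escape(form['number'])
    return '<p>The square of %d is %d.</p>' % (x, x * x)
```

The script would then print the Content-Type header and the blank line as before, followed by square_page(form).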

Q3. Create a little HTML form that will call your chessboard script. The form should let the user enter a number. Adjust your chess.cgi so it will produce a chessboard of the given size.

5. Redirection

Sometimes it's useful to be able to redirect the user to another URL. This is easy; just print a Location header with the URL you want the user to follow. In order for the Location line to be understood as a header, your program must not print anything before this line.

print 'Location: http://zesty.ca/'
print

Of course, your program could calculate this URL in any way you want, instead of just printing a static address.

Remember to include the blank line afterwards. Even if there is no content, you must print the blank line.
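A small helper can keep the Location header and the required blank line together so neither is forgotten. This is a hypothetical function, shown as a sketch; it returns the text rather than printing it so it is easy to check. Since the URL may be calculated at run time, it also rejects line breaks, which would otherwise end the header early and let the rest of the URL be read as extra headers.

```python
# Hypothetical helper: build the output for a redirect.
def redirect(url):
    """Return CGI output that sends the browser to `url`."""
    if '\n' in url or '\r' in url:
        raise ValueError('URL must not contain line breaks')
    # The header line, then the blank line that must follow even with no content.
    return 'Location: ' + url + '\n\n'
```

In a script you would write the result with sys.stdout.write(redirect('http://zesty.ca/')).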

Q4. The random module provides a handy function, random.choice(), that picks a random item from a list. Choose some news websites that you like, then use this function to create a CGI script that redirects the user to one of the news sites at random.

6. Persistence

A drawback of CGI scripts is that, when you load the Web page, your program runs once and quits. Each time you submit a form or click on a link to your script, your script starts fresh from the beginning. This makes it harder to write programs that remember what to do as you go from page to page.

To keep data around between runs, you can save it on the disk. Writing to files is pretty straightforward: first you open a file for writing with the open() function, passing 'w' as the second argument.

file = open('whatever.txt', 'w')

Then you call the write() method on the file with a string to write to the file.

file.write('lovely spam')

Finally you close the file using the close() method.

file.close()

This would rewrite the file whatever.txt, replacing whatever it previously contained with the string lovely spam. Alternatively, if you want, you can open the file with 'a' (for "append") as the second argument. If you do this, anything you write() to the file will be appended to the end.
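As an example of keeping a value between runs, here is a sketch of a page-hit counter that stores a single number in a text file. The helper name count_visit and the file name counter.txt are assumptions for illustration; the file's directory must be writable by whatever account the script runs under. If two visitors load the page at the same moment, both runs can read the same old value, so a counter like this can occasionally miss a visit; file locking or a database avoids that.

```python
import os

# Hypothetical helper: a hit counter kept in a small text file.
def count_visit(path='counter.txt'):
    """Add one to the number stored in `path` and return the new total."""
    count = 0
    if os.path.exists(path):
        f = open(path)                 # open for reading (the default mode)
        try:
            text = f.read().strip()
        finally:
            f.close()
        if text:
            count = int(text)
    count = count + 1
    f = open(path, 'w')                # 'w' replaces the old contents
    try:
        f.write(str(count))
    finally:
        f.close()
    return count
```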

What if the data you want to store isn't a string, or if you want to store more than one string? Python has a useful module called pickle that will help you store data of practically any type. When you call pickle.dump(data, file), it encodes the data as a string and saves it in the file, just like a pickle jar. Later you can open the file and reconstruct the original data with pickle.load(file). You can call dump() several times to put many objects into one file, and you can add objects to the end of an existing file by opening it with 'a' before calling dump(). The objects come back in the same order when you call load().

>>> import pickle
>>> data = {3: 8, 'a': [1, 3.2, None], ('pq', 'g'): (1, (2, (3,)))}
>>> data
{'a': [1, 3.2000000000000002, None], 3: 8, ('pq', 'g'): (1, (2, (3,)))}
>>> data.keys()
['a', 3, ('pq', 'g')]
>>>
>>> file = open('/tmp/test.pkl', 'w')
>>> pickle.dump(data, file)
>>> file.close()
>>>
>>> import os
>>> os.path.getsize('/tmp/test.pkl')
101
>>>
>>> file = open('/tmp/test.pkl')
>>> file.read()
"(dp0\nS'a'\np1\n(lp2\nI1\naF3.2000000000000002\naNasI3\nI8\ns(S'pq'\np3\nS'g'\np4\ntp5\n(I1\n(I2\n(I3\ntp6\ntp7\ntp8\ns."
>>>
>>> file = open('/tmp/test.pkl', 'a')
>>> pickle.dump(-5, file)
>>> file.close()
>>>
>>> file = open('/tmp/test.pkl')
>>> pickle.load(file)
{'a': [1, 3.2000000000000002, None], 3: 8, ('pq', 'g'): (1, (2, (3,)))}
>>> pickle.load(file)
-5
>>> pickle.load(file)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/pickle.py", line 977, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.2/pickle.py", line 592, in load
    dispatch[key](self)
  File "/usr/lib/python2.2/pickle.py", line 606, in load_eof
    raise EOFError
EOFError
>>>

"EOF" stands for "end of file". After both objects have been loaded from the file, trying to load another object fails because there's nothing more to read, so we get an EOFError above.

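Since load() signals the end of the file with EOFError, a loop that catches it is a simple way to read back everything that was dumped, whether in one session or appended over many runs of a script. This is a sketch; the names dump_all and load_all are hypothetical. It opens files in binary mode ('wb', 'ab', 'rb'), which is safe for every pickle format; the text-mode session above works on Unix with Python 2's default format.

```python
import pickle

# Hypothetical helpers for appending and reading back many pickled objects.
def dump_all(path, items, mode='ab'):
    """Pickle each item onto the end of the file at `path`."""
    f = open(path, mode)
    try:
        for item in items:
            pickle.dump(item, f)
    finally:
        f.close()

def load_all(path):
    """Return a list of every object pickled into `path`, in order."""
    results = []
    f = open(path, 'rb')
    try:
        while True:
            try:
                results.append(pickle.load(f))
            except EOFError:           # nothing more to read
                break
    finally:
        f.close()
    return results
```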
I've tried to keep the lab a little shorter today because the exercises may involve debugging. CGI is a pretty wide open topic, so you can delve into that as much as you like. In the meantime, here's the fifth assignment.

If you have any questions about these exercises or the assignment, feel free to send me e-mail at bczestyca.