Assignment 3 Part 1 (submitted by Michael on Feb 10, 15:34)

Beautiful Code
Ka-Ping Yee

Feb 13, 23:52 - Karl (line 1): Import error: leave off the ".py" and import the module by name.

Feb 13, 23:53 - Karl (line 3): Syntax error: the docstring can't share a line with the def when the body continues on indented lines below; move it to its own line.

Feb 13, 23:53 - Karl (line 5): NameError: need "speling." in front of the function name.

Feb 13, 23:53 - Karl (line 16): NameError: need "speling." here too, and also on line 20.

Feb 13, 23:54 - Karl (line 23): I think you tried to say something like "count how many times each word occurs in misspelled" but that doesn't work (and would be an asymptotically inefficient algorithm)
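
A one-pass count keyed by word is linear in the number of words, instead of rescanning the list once per distinct word. A minimal sketch, assuming the misspelled words are already collected in a plain list of strings; `count_words` is a hypothetical helper name, not something the assignment's `speling` module provides:

```python
def count_words(words):
    """Return a dict mapping each word to how many times it occurs.

    Hypothetical helper: a single pass over the list, so O(N) on
    average, rather than one full scan per distinct word.
    """
    counts = {}
    for word in words:
        # dict.get returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["teh", "recieve", "teh"]))  # {'teh': 2, 'recieve': 1}
```

The standard library's `collections.Counter(words)` builds the same mapping in one call.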

Feb 13, 23:56 - Karl (line 19): If you use a list here, use append() instead of inserting at index 0. Inserting at index 0 of a Python list (essentially a dynamic array) shifts every existing element, so each insert is O(N) and the whole loop becomes O(N^2), while append() is amortized O(1).
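
A sketch of the collection loop rewritten with append(). The real `speling.spellcheck_text` is not shown in this review, so `find_next_misspelling` and `KNOWN_WORDS` below are hypothetical stand-ins; the sketch assumes, as the submitted loop does, that the function returns the index of the next misspelled word at or after `start`, or -1 when there are none:

```python
KNOWN_WORDS = {"the", "cat", "sat"}  # hypothetical stand-in for dict.txt


def find_next_misspelling(words, start):
    """Hypothetical stand-in for speling.spellcheck_text.

    Assumed behavior: return the index of the first word at or after
    `start` that is not in KNOWN_WORDS, or -1 if there is none.
    """
    for i in range(start, len(words)):
        if words[i] not in KNOWN_WORDS:
            return i
    return -1


def collect_misspelled(words):
    """Collect misspelled words in text order using append()."""
    misspelled = []
    index = find_next_misspelling(words, 0)
    while index != -1:
        misspelled.append(words[index])  # amortized O(1) per call
        index = find_next_misspelling(words, index + 1)
    return misspelled


print(collect_misspelled(["the", "cta", "sat", "teh"]))  # ['cta', 'teh']
```

Unlike insert(0, ...), append() keeps the words in text order, though that stops mattering once the list is sorted or counted.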

Feb 13, 23:57 - Karl (line 19): Using a dictionary instead of a list here would simplify lines 22-30: keyed by word, it gives you the counts directly, and sorting its keys gives you the alphabetical order.
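
Given a word-to-count dictionary, lines 22-30 reduce to one sorted pass over its items. A minimal sketch; `format_counts` is a hypothetical name, and the "word (n)" output format follows the submitted code:

```python
def format_counts(counts):
    """Return 'word (n)' lines sorted alphabetically by word.

    A dict hands back the counts directly but does not keep its keys
    sorted, so sorted() supplies the alphabetical order.
    """
    return [f"{word} ({n})" for word, n in sorted(counts.items())]


for line in format_counts({"teh": 2, "recieve": 1}):
    print(line)  # prints "recieve (1)", then "teh (2)"
```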

Feb 13, 23:59 - Karl (line 23): It's a bad idea to reuse variables with different meanings. If you are trying to save memory (which doesn't really matter here), you can let Python reclaim an old value by setting its variable to None once you're done with it.
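
The cleanup in lines 6-14 can give each stage its own descriptive name instead of reassigning `input`. A minimal sketch, assuming the speech has already been read into a string; `extract_words` is a hypothetical name, and its regexes approximate the intent of the submitted code (join "U.S.", drop tokens like "20th", strip punctuation) rather than reproduce it exactly. Because the intermediates are locals, they become unreachable when the function returns, so nothing needs to be reset to None by hand:

```python
import re


def extract_words(raw_text):
    """Turn raw speech text into a list of words.

    Each stage gets its own name instead of reusing one variable.
    The cleanup rules are a hypothetical approximation of the
    assignment's intent, not its exact regexes.
    """
    # "U.S." -> "US.": drop periods that sit between two word characters
    joined_abbrevs = re.sub(r"(?<=\w)\.(?=\w)", "", raw_text)
    # Drop tokens that start with a digit, such as "20th" or "1990"
    no_numbers = re.sub(r"\b\d\w*", " ", joined_abbrevs)
    # Replace every remaining run of non-letters (apostrophes kept) with a space
    letters_only = re.sub(r"[^A-Za-z']+", " ", no_numbers)
    return letters_only.split()


print(extract_words("The U.S. grew in the 20th century, by 3 percent."))
```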

  1 import
  3 def misspell():  """returns a list of words which appear in Bush's speech but are not in our dictionary"""
  4     import re
  5     set_dictionary('dict.txt')
  6     input = open('input.txt')
  7     input =                       #input is now a string of words
  8     pat = re.compile('[\n(--).,?";!:/$1234567890]')   
  9     input = pat.sub(' ', input)                #input is now stripped of punctuation marks
 10     pat = re.compile('[.]')              #for words like U.S.
 11     input = pat.sub('', input)
 12     pat = re.compile('[1234567890]*\\w*]')       #for words like 20th
 13     input = pat.sub('', input)
 14     input = input.split()                      #input is now a sequence/list of words
 15     misspelled = []                            #misspelled is the list of misspelled words
 16     index = spellcheck_text(input, index)  #index is the index of the next misspelled word
 18     while index !=-1:                          #as long as there is one more misspelled word
 19         misspelled.insert(0, input[index])     #add the misspelled word into the misspelled words list 
 20         index = spellcheck_text(input, index + 1) #obtain the index of the next misspelled word
 22     misspelled.sort()
 23     misspelled = count(misspelled)            #misspelled is a dictionary of keys (words) and values (occurrences)
 24     sortedmisspelled = []      
 25     for key in misspelled.keys():
 26         sortedmisspelled.append(key + ' (' + str(misspelled[key]) + ')' )   #formats the list of words and values
 27     sortedmisspelled.sort()                   #sorts the lists of words and values
 28     for element in sortedmisspelled:          #prints each word and its value on its own line
 29         print element
 30         print '\n'