Assignment 3 Part 1 (submitted by Michael on Feb 10, 15:34)

Beautiful Code
Ka-Ping Yee

AUTHORS:   Adam   Calvin   Chris   David   Derek   Hunter   Jacob   Jason   Jun   Karl   Kevin   Michael   Morgan   Nadia   Nerissa   Omair   Peter   Peterson   Ping   Richard   Rosie   Scott   Thanh   Varun


COMMENTS

Feb 13, 23:52 - Karl (line 1): syntax error, don't use ".py"

Feb 13, 23:53 - Karl (line 3): Syntax error, not sure if you can put that on the same line like that
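
On the line 3 question: a def may legally be followed by one simple statement on the same line, but the indented lines after it are then rejected as an unexpected indent. A minimal sketch of the conventional layout follows, with the docstring on its own line; the placeholder body is an assumption that stands in for the real work.

```python
def misspell():
    """Return a list of words that appear in the speech but not in the dictionary."""
    # Placeholder body (an assumption for this sketch); the real logic goes here.
    return []

# The docstring is attached to the function object.
print(misspell.__doc__)
```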

Feb 13, 23:53 - Karl (line 5): Syntax error, need the "speling." prefix on set_dictionary

Feb 13, 23:53 - Karl (line 16): Syntax error, need the "speling." prefix on spellcheck_text. Same on line 20
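
A sketch of the rule behind the comments on lines 1, 5, 16 and 20, shown with the standard-library math module as a stand-in because speling is not available here. A plain import names the module without ".py" and binds only the module name, so its functions have to be qualified. Strictly speaking, Python reports an unqualified call as a NameError at run time rather than as a syntax error.

```python
import math   # the module name only, never "math.py"

# "import math" binds just the name math, so its functions are reached as math.<name>.
root = math.sqrt(16.0)
print(root)

try:
    sqrt(16.0)   # unqualified: no name "sqrt" was ever bound in this namespace
    unqualified_ok = True
except NameError:
    unqualified_ok = False
print(unqualified_ok)
```

By the same rule, the submission would need "import speling" and calls written as speling.set_dictionary(...) and speling.spellcheck_text(...).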

Feb 13, 23:54 - Karl (line 23): I think you tried to say something like "count how many times each word occurs in misspelled" but that doesn't work (and would be an asymptotically inefficient algorithm)

Feb 13, 23:56 - Karl (line 19): If you use a list here, you should use append() instead of inserting at index 0. Inserting at index 0 in what is basically an array is O(N), which makes the entire loop O(N^2), while appending is amortized O(1).
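
A small sketch of the difference, using made-up sample words. Both helper names are hypothetical. The only visible change in the result is the order, which does not matter here because the list is sorted afterwards on line 22.

```python
def collect_with_insert(words):
    result = []
    for word in words:
        result.insert(0, word)   # shifts every existing element: O(N) per call
    return result

def collect_with_append(words):
    result = []
    for word in words:
        result.append(word)      # amortized O(1) per call
    return result

sample = ['teh', 'recieve', 'teh']
print(collect_with_insert(sample))   # newest word first (reverse order)
print(collect_with_append(sample))   # original order preserved
```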

Feb 13, 23:57 - Karl (line 19): Using a dictionary instead of a list here would simplify lines 22-30, since it automatically gives you counting and sorting
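
One way the dictionary suggestion could look, as a sketch with hypothetical helper names and made-up sample words. It counts in a single pass, which also covers the line 23 comment. Note that the keys still need an explicit sorted() call to come out in alphabetical order.

```python
def count_words(words):
    """Map each word to the number of times it occurs, in one pass."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

def format_counts(counts):
    """Return 'word (n)' strings, ordered alphabetically by word."""
    return ['%s (%d)' % (word, counts[word]) for word in sorted(counts)]

sample = ['teh', 'recieve', 'teh']
for line in format_counts(count_words(sample)):
    print(line)
```

The standard library's collections.Counter does the same counting, if it is available in the Python version being used.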

Feb 13, 23:59 - Karl (line 23): It's a bad idea to reuse variables with different meanings. If you are trying to save memory (which doesn't really matter here) I think you can help Python garbage collect by setting old variables to None.
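
A sketch of the same pipeline with one name per stage, so that no variable changes meaning. The function name and the regular expression are assumptions for illustration, not the submission's own patterns. Choosing a name other than input also avoids shadowing Python's built-in input function.

```python
import re

def tokenize(raw_text):
    """Split raw text into words, giving each intermediate result its own name."""
    # Hypothetical pattern: replace anything that is not a letter or whitespace.
    without_punctuation = re.sub(r'[^A-Za-z\s]', ' ', raw_text)
    words = without_punctuation.split()
    return words

print(tokenize('Hello, world -- it is 2006!'))
```

On the memory point, rebinding a name already drops the reference to the old object, and the locals of a function are released when it returns, so separate names cost little here.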


  1 import speling.py
  2 
  3 def misspell():  """returns a list of words which appear in Bush's speech but are not in our dictionary"""
  4     import re
  5     set_dictionary('dict.txt')
  6     input = open('input.txt')
  7     input = input.read()                       #input is now a string of words
  8     pat = re.compile('[\n(--).,?";!:/$1234567890]')   
  9     input = pat.sub(' ', input)                #input is now stripped of punctuation marks
 10     pat = re.compile('[.]')              #for words like U.S.
 11     input = pat.sub('', input)
 12     pat = re.compile('[1234567890]*\\w*]')       #for words like 20th
 13     input = pat.sub('', input)
 14     input = input.split()                      #input is now a sequence/list of words
 15     misspelled = []                            #misspelled is the list of misspelled words
 16     index = spellcheck_text(input, index)  #index is the index of the next misspelled word
 17 
 18     while index !=-1:                          #as long as there is one more misspelled word
 19         misspelled.insert(0, input[index])     #add the misspelled word into the misspelled words list 
 20         index = spellcheck_text(input, index + 1) #obtain the index of the next misspelled word
 21 
 22     misspelled.sort()
 23     misspelled = count(misspelled)            #misspelled is a dictionary of keys (words) and values (occurrences)
 24     sortedmisspelled = []      
 25     for key in misspelled.keys():
 26         sortedmisspelled.append(key + ' (' + str(misspelled[key]) + ')' )   #formats the list of words and values
 27     sortedmisspelled.sort()                   #sorts the lists of words and values
 28     for element in sortedmisspelled:          #prints each word and its value on its own line
 29         print element
 30         print '\n'
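
A sketch of how the loop on lines 16-20 might read once the search starts from index 0 (line 16 passes index before it has been assigned) and append() replaces insert(). The spellcheck_text below is a hypothetical stand-in whose behaviour is inferred from those lines, since the real speling module is not shown; KNOWN_WORDS and the sample words are likewise made up.

```python
KNOWN_WORDS = {'the', 'people', 'of', 'america'}   # made-up stand-in for dict.txt

def spellcheck_text(words, start):
    """Hypothetical stand-in for speling.spellcheck_text: return the index of the
    first word at or after start that is not in KNOWN_WORDS, or -1 if none."""
    for i in range(start, len(words)):
        if words[i].lower() not in KNOWN_WORDS:
            return i
    return -1

def find_misspelled(words):
    misspelled = []
    index = spellcheck_text(words, 0)              # begin at the first word
    while index != -1:                             # -1 means no more misspellings
        misspelled.append(words[index])
        index = spellcheck_text(words, index + 1)  # resume after the last hit
    return misspelled

print(find_misspelled(['the', 'peeple', 'of', 'Amerika', 'the', 'peeple']))
```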