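Feb 13, 23:52 - Karl (line 1): Import error: don't use ".py" here; write "import speling".
Feb 13, 23:53 - Karl (line 3): Syntax error: I'm not sure you can put the docstring on the same line as the "def" like that.
Feb 13, 23:53 - Karl (line 5): Name error: this call needs the "speling." prefix, i.e. speling.set_dictionary(...).
Feb 13, 23:53 - Karl (line 16): Name error: this call needs the "speling." prefix, i.e. speling.spellcheck_text(...). Same on line 20.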
Feb 13, 23:54 - Karl (line 23): I think you tried to say something like "count how many times each word occurs in misspelled", but that doesn't work as written (and would be an asymptotically inefficient algorithm).
Feb 13, 23:56 - Karl (line 19): If you use a list here, you should use append() instead of inserting at index 0. Inserting at index 0 in what is basically an array is O(N) and makes the entire loop O(N^2), while appending is amortized O(1).
Feb 13, 23:57 - Karl (line 19): Using a dictionary instead of a list here would simplify lines 22-30, since it gives you the counting directly and leaves only its keys to sort.
Feb 13, 23:59 - Karl (line 23): It's a bad idea to reuse variables with different meanings. If you are trying to save memory (which doesn't really matter here), I think you can help Python garbage collect by setting old variables to None.
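A minimal sketch of the dictionary approach suggested in the comments on lines 19 and 23, assuming the misspelled words have already been collected into a list. The helper names count_words and format_counts and the sample words are made up for illustration and are not part of the reviewed file. The dictionary does the counting; sorting its keys is still one explicit sorted() call.

```python
def count_words(words):
    """Return a dict mapping each word to the number of times it occurs."""
    counts = {}
    for word in words:
        # One dictionary lookup per word, so the whole pass is O(N).
        counts[word] = counts.get(word, 0) + 1
    return counts


def format_counts(counts):
    """Return 'word (n)' strings, ordered alphabetically by word."""
    return ['%s (%d)' % (word, counts[word]) for word in sorted(counts)]


if __name__ == '__main__':
    misspelled = ['teh', 'nucular', 'teh']  # made-up sample data
    for line in format_counts(count_words(misspelled)):
        print(line)
```

Keeping the counts in their own variable, rather than reassigning misspelled, also addresses the comment about reusing a name for two different things.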
 1 | import speling.py
 2 |
 3 | def misspell(): """returns a list of words which appear in Bush's speech but are not in our dictionary"""
 4 |     import re
 5 |     set_dictionary('dict.txt')
 6 |     input = open('input.txt')
 7 |     input = input.read() #input is now a string of words
 8 |     pat = re.compile('[\n(--).,?";!:/$1234567890]')
 9 |     input = pat.sub(' ', input) #input is now stripped of punctuation marks
10 |     pat = re.compile('[.]') #for words like U.S.
11 |     input = pat.sub('', input)
12 |     pat = re.compile('[1234567890]*\\w*]') #for words like 20th
13 |     input = pat.sub('', input)
14 |     input = input.split() #input is now a sequence/list of words
15 |     misspelled = [] #misspelled is the list of misspelled words
16 |     index = spellcheck_text(input, index) #index is the index of the next misspelled word
17 |
18 |     while index !=-1: #as long as there is one more misspelled word
19 |         misspelled.insert(0, input[index]) #add the misspelled word into the misspelled words list
20 |         index = spellcheck_text(input, index + 1) #obtain the index of the next misspelled word
21 |
22 |     misspelled.sort()
23 |     misspelled = count(misspelled) #misspelled is a dictionary of keys (words) and values (occurences)
24 |     sortedmisspelled = []
25 |     for key in misspelled.keys():
26 |         sortedmisspelled.append(key + ' (' + str(misspelled[key]) + ')' ) #formats the list of words and values
27 |     sortedmisspelled.sort() #sorts the lists of words and values
28 |     for element in sortedmisspelled: #prints each word and its value on its own line
29 |         print element
30 |     print '\n'
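The loop on lines 16-20 might look like the following with the append() comment applied. This is a sketch under assumptions: spellcheck_text below is a hypothetical stand-in for the function in the speling module, whose real behavior is not shown on this page. It is assumed to return the index of the next misspelled word at or after the given position, or -1 when none remain, as the comments in the listing describe. KNOWN_WORDS and collect_misspelled are likewise made up for illustration.

```python
KNOWN_WORDS = {'the', 'state', 'of', 'union'}  # made-up stand-in dictionary


def spellcheck_text(words, start):
    """Hypothetical stand-in for speling.spellcheck_text.

    Assumed contract: return the index of the first word at or after
    `start` that is not in the dictionary, or -1 if there is none.
    """
    for i in range(start, len(words)):
        if words[i].lower() not in KNOWN_WORDS:
            return i
    return -1


def collect_misspelled(words):
    """Return the misspelled words in the order they appear."""
    misspelled = []
    # Start the search at 0; `index` has no value before this call.
    index = spellcheck_text(words, 0)
    while index != -1:
        misspelled.append(words[index])  # amortized O(1), unlike insert(0, ...)
        index = spellcheck_text(words, index + 1)
    return misspelled
```

Because the result is sorted or counted afterwards, the order in which words are collected does not matter, so nothing is lost by appending instead of inserting at the front.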