Weird output from .itemgetter for list sorting by values python

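So I'm working through the Google Python Class and trying to do the "wordcount.py" exercise. The goal is to build a dictionary of words (keys) and their counts (values), sort it by count, and return it as tuples for printing.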

I created a helper function to build the dictionary:

    def dict_creator(filename): #helper function to create a dictionary each 'word' is a key and the 'wordcount' is the value
        input_file = open(filename, 'r') #open file as read
        for line in input_file: #for each line of text in the input file
            words = line.split() #split each line into individual words
            for word in words: #for each word in the words list(?)
                word = word.lower() #make each word lower case.
                if word not in word_count: #if the word hasn't been seen before
                    word_count[word] = 1 #create a dictionary key with the 'word' and assign a value of 1
                else: word_count[word] += 1 #if 'word' seen before, increase value by 1
        return word_count #return word_count dictionary
        word_count.close()

I'm now sorting the dictionary by value using the .itemgetter approach described in this post: link. Here is my code:

    def print_words(filename):
        word_count = dict_creator(filename) #run dict_creator on input file (creating dictionary)
        print sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True)
        #print dictionary in total sorted descending by value. Values have been doubled compared to original dictionary?
        for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
            #create sorted list of tuples using operator module functions sorted in an inverse manner
            a = word
            b = word_count[word]
            print a, b #print key and value

However, when I run the code on my test file and on a smaller file, it raises a KeyError (shown below).

    Traceback (most recent call last):
      File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 74, in <module>
        print_words(lorem_ipsum) #run input file through print_words
      File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 70, in print_words
        b = word_count[word]
    KeyError: ('in', 3)

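I've printed both the original dictionary and the sorted one, and it looks as if every value has doubled once the dictionary is sorted. I've read several threads about this kind of problem and gone through the .itemgetter documentation, but I can't find anyone else with a similar problem.

Can someone point out what is making my code iterate over the dictionary a second time in the word_count function, which causes the values to increase?

Thanks!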

Answer

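(1) You don't define word_count inside dict_creator. I would have expected to see

    word_count = {}

at the start. That means word_count is defined somewhere else, at global scope, so every call to dict_creator adds to the same word_count dictionary and the values keep growing. You have only one word_count, at least judging from the code you've shown.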
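To see the effect described in (1), here is a minimal Python 3 sketch. The functions `dict_creator_global` and `dict_creator_local` are hypothetical stand-ins for `dict_creator` that take a list of lines instead of a filename, so the example runs without a file on disk; the only difference between them is where `word_count` lives.

```python
# Module-level dict: every call to dict_creator_global mutates this same object.
word_count = {}

def dict_creator_global(lines):
    """Hypothetical stand-in for the question's dict_creator (shared global dict)."""
    for line in lines:
        for word in line.split():
            word = word.lower()
            if word not in word_count:
                word_count[word] = 1
            else:
                word_count[word] += 1
    return word_count

def dict_creator_local(lines):
    """Same counting logic, but with a fresh dict created on every call."""
    word_count = {}  # local name shadows the global one inside this function
    for line in lines:
        for word in line.split():
            word = word.lower()
            word_count[word] = word_count.get(word, 0) + 1
    return word_count

text = ["in the beginning", "in the end"]
dict_creator_global(text)                # first call: "in" -> 2
print(dict_creator_global(text)["in"])   # 4: the second call adds on top of the first
print(dict_creator_local(text)["in"])    # 2: each call starts from an empty dict
```

Calling the global version a second time doubles every count, which matches the doubled values in the question; the local version returns the same counts however many times it is called.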

(2) As for the KeyError:

    for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
        #create sorted list of tuples using operator module functions sorted in an inverse manner
        a = word
        b = word_count[word]

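iteritems() yields tuples, so word is already something like ('dict_creator', 1). You can just print it as it is. Calling word_count[word] tries to use the (key, value) tuple as a key. Even though you've called the variable word, it actually holds word_and_count, i.e. word, count = word_and_count.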
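A short Python 3 sketch of point (2), assuming a small hand-made dictionary in place of the real word counts. Python 3 has no `iteritems()`; `items()` plays the same role here. Sorting with `key=operator.itemgetter(1)` produces `(word, count)` tuples, and unpacking them in the `for` statement avoids indexing the dictionary with a tuple:

```python
import operator

# Made-up counts standing in for the output of dict_creator.
word_count = {"in": 3, "lorem": 1, "ipsum": 2}

# items() yields (key, value) tuples; itemgetter(1) selects the value as the sort key.
pairs = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)
print(pairs)  # [('in', 3), ('ipsum', 2), ('lorem', 1)]

# Unpack each tuple directly instead of looking it up in the dict again.
for word, count in pairs:
    print(word, count)

# Using the whole tuple as a key reproduces the error from the question.
try:
    word_count[pairs[0]]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: ('in', 3)
```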

(3) In this part:

    return word_count #return word_count dictionary
    word_count.close()

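I think you meant input_file.close() (word_count is a dict, which has no close() method), but closing the file after you return is pointless, because that line is never executed. An alternative is the with idiom: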

    with open(filename) as input_file:
        code_goes_here = True
    return word_count

Here, the file is closed automatically.
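A small Python 3 illustration of the with idiom, using a throwaway file created with the standard tempfile module (the file contents are made up for the example). The file object reports closed as False inside the block and True once the block has exited, with no explicit close() call:

```python
import os
import tempfile

# Write a throwaway file so the example has something to open.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("in the beginning\nin the end\n")
    path = tmp.name

with open(path) as input_file:
    lines = input_file.readlines()
    print(input_file.closed)  # False: still inside the with block

print(input_file.closed)  # True: closed as soon as the block exited
os.remove(path)           # clean up the temporary file
```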

After making the changes above, your code seems to work for me.
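Putting the three fixes together, one possible corrected version might look like the sketch below. It is written for Python 3 (print() and items() instead of the Python 2 print statement and iteritems()), uses dict.get(word, 0) in place of the original if/else (same counts), and runs on a made-up temporary file; the `return pairs` line is an addition so the result can be checked.

```python
import operator
import os
import tempfile

def dict_creator(filename):
    """Count lower-cased words in a file and return a {word: count} dict."""
    word_count = {}  # fix (1): local dict, so repeated calls never accumulate
    with open(filename) as input_file:  # fix (3): file closed automatically
        for line in input_file:
            for word in line.split():
                word = word.lower()
                word_count[word] = word_count.get(word, 0) + 1
    return word_count

def print_words(filename):
    """Print each word and its count, most frequent first."""
    word_count = dict_creator(filename)
    pairs = sorted(word_count.items(), key=operator.itemgetter(1), reverse=True)
    for word, count in pairs:  # fix (2): unpack the (word, count) tuple
        print(word, count)
    return pairs  # added only so the result is easy to inspect

# Sample input written to a temporary file (contents are illustrative only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("In the beginning\nin the end\n")
    path = tmp.name

try:
    result = print_words(path)
    again = print_words(path)  # a second call prints the same counts, not doubled
finally:
    os.remove(path)
```

Because sorted() is stable, words with equal counts keep the order in which they were first seen in the file.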