Stopwords Removal with Python
I don't understand why this code isn't working. When I click run, it prints "After stopwords removal: None". Can anyone help me fix this? Many thanks.

    stop_words = ["the","of","a","to","be","from","or"]
    last = lower_words.split()

    for i in stop_words:
        lastone = last.remove(i)

    print" AAfter stopwords removal: ",lastone

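list.remove() modifies the list in place and returns None. So when you do lastone = last.remove(i), the first occurrence of i is removed from last and lastone is set to None, which is why None gets printed.
For what you are trying to do, you probably want to remove every occurrence of each stop word, so last.remove() is not the most efficient approach. Instead, I would use a list comprehension: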

    stop_words = set(["the","of","a","to","be","from","or"])
    last = lower_words.split()
    last = [word for word in last if word not in stop_words]

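Converting stop_words to a set makes the membership test more efficient, but you would get the same result if you left it as a list.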
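As a quick check of that claim, here is a minimal Python 3 sketch that runs the same comprehension against a list and against a set. The value of lower_words is a made-up sample, since the question does not show how it was built.

```python
# Sample input (an assumption; the question never shows the real lower_words).
lower_words = "the cat sat on a mat from the shop"

stop_list = ["the", "of", "a", "to", "be", "from", "or"]
stop_set = set(stop_list)

# Same comprehension, once with the list and once with the set.
from_list = [word for word in lower_words.split() if word not in stop_list]
from_set = [word for word in lower_words.split() if word not in stop_set]

print(from_list)  # ['cat', 'sat', 'on', 'mat', 'shop']
print(from_set)   # ['cat', 'sat', 'on', 'mat', 'shop']
```

Both versions keep the same words in the same order. The set only changes how fast each membership test runs, which starts to matter once the stop word collection is large.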
For completeness, here is how you would need to do this with remove():

    stop_words = ["the","of","a","to","be","from","or"]
    last = lower_words.split()
    for word in stop_words:
        try:
            while True:
                last.remove(word)
        except ValueError:
            pass

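To see why the remove() version needs both the while loop and the try/except, here is a small Python 3 sketch of how list.remove() behaves. The sample list is made up for illustration.

```python
words = ["the", "cat", "and", "the", "dog"]

# remove() mutates the list and returns None, which is what the question printed.
result = words.remove("the")
print(result)  # None
print(words)   # ['cat', 'and', 'the', 'dog'] (only the first "the" is gone)

# Removing a value that is not in the list raises ValueError.
try:
    words.remove("of")
except ValueError:
    print("'of' is not in the list")
```

So the loop keeps calling remove() until the word is gone, and the ValueError is the signal that no occurrence is left.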
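Here is a function that takes a text and returns it without the stop words. It does this by skipping every word that appears in the stopwords collection. I call .lower() on each word i because most stopwords packages are all lowercase, but our text may not be.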
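
    def cut_stop_words(text, stopwords):
        new_text = ''
        for i in text.split():
            # Compare in lowercase, but keep the word's original casing in the output.
            if i.lower() not in stopwords:
                new_text = new_text.strip() + ' ' + i
        # strip() drops the leading space left behind when only one word is kept.
        return new_text.strip()
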
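As a usage sketch, the same idea can also be written with a list comprehension and str.join, which avoids the repeated strip() calls. This is an alternative variant in Python 3, not the answer's original code, and the sample sentence and stop word set are made up for illustration.

```python
def cut_stop_words(text, stopwords):
    # Keep words whose lowercase form is not a stop word; original casing is preserved.
    kept = [word for word in text.split() if word.lower() not in stopwords]
    return " ".join(kept)


# Hypothetical stop word set and input, chosen only for this example.
stopwords = {"the", "of", "a", "to", "be", "from", "or"}
cleaned = cut_stop_words("The Tale of a Fox", stopwords)
print(cleaned)  # Tale Fox
```

Note that "The" is dropped even though it is capitalized, because the comparison uses the lowercase form.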