How to split a string into a list?
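I want my Python function to split a sentence (the input) and store each word in a list. My current code splits the sentence but does not store the words as a list. How do I do that?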

    def split_line(text):
        # split the text
        words = text.split()
        # for each word in the line:
        for word in words:
            # print the word
            print(words)


    text.split()

This should be enough to store each word in a list.
Second, this may just be a typo, but your loop is a little mixed up. If you really did want to use append, it would be:

    words.append(word)

not

    word.append(words)

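Putting these points together, here is a minimal sketch of the function returning the list instead of printing inside the loop. The name split_line comes from the question; returning the list is an assumption about what the asker ultimately wants.

```python
def split_line(text):
    """Return the whitespace-separated words of text as a list."""
    # str.split() already builds the list, so no loop or append is needed
    words = text.split()
    return words


words = split_line("a sentence with a few words")
print(words)
```

The caller can then loop over, index, or extend the returned list as needed.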
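Split the string in text on any run of consecutive whitespace:

    words = text.split()

Split the string in text on a delimiter such as ",":

    words = text.split(",")

The words variable will be a list containing the pieces of text produced by splitting on the delimiter.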
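One thing worth knowing when splitting on a delimiter: the pieces keep any surrounding whitespace, and adjacent delimiters produce empty strings. A small sketch of cleaning that up, using a made-up sample string:

```python
text = "red, green,,blue"  # sample input, chosen only for illustration

# Splitting on "," keeps the leading space and the empty field
raw_fields = text.split(",")
print(raw_fields)

# Strip each piece and drop the ones that end up empty
fields = [field.strip() for field in raw_fields if field.strip()]
print(fields)
```

Whether empty fields should be dropped depends on the data; for CSV-like input they are often meaningful and should be kept.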
str.split()
Return a list of the words in the string, using sep as the delimiter
... If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

    >>> line = "a sentence with a few words"
    >>> line.split()
    ['a', 'sentence', 'with', 'a', 'few', 'words']

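The difference the documentation describes shows up as soon as the input has extra whitespace. A short sketch comparing split() with an explicit single-space separator (the sample string is made up):

```python
line = "  a sentence   with extra spaces "  # sample input for illustration

# No separator: runs of whitespace collapse, no empty strings at the ends
default_words = line.split()
print(default_words)

# Explicit " ": every single space is a boundary, so empty strings appear
space_words = line.split(" ")
print(space_words)
```

For splitting a sentence into words, calling split() with no argument is usually the safer choice.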
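Depending on what you plan to do with your sentence-as-a-list, you may want to look at the Natural Language Toolkit (NLTK). It deals heavily with text processing and evaluation. You can also use it to solve your problem:

    import nltk
    words = nltk.word_tokenize(raw_sentence)
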
This has the added benefit of splitting out punctuation.
Example:

    >>> import nltk
    >>> s = "The fox's foot grazed the sleeping dog, waking it."
    >>> words = nltk.word_tokenize(s)
    >>> words
    ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog', ',', 'waking', 'it', '.']

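This allows you to filter out any punctuation you don't want and use only the words.
Note that if you don't plan on doing any complex manipulation of the sentence, the other solutions using split() are the better choice.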
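To illustrate the filtering step without requiring NLTK to be installed, the sketch below starts from the token list shown in the example and drops tokens that consist only of punctuation. Using string.punctuation as the filter is an assumption; other criteria are possible.

```python
import string

# Token list copied from the nltk.word_tokenize() output in the example
tokens = ['The', 'fox', "'s", 'foot', 'grazed', 'the', 'sleeping', 'dog',
          ',', 'waking', 'it', '.']

# Keep a token unless every character in it is punctuation
words_only = [
    token for token in tokens
    if not all(char in string.punctuation for char in token)
]
print(words_only)
```

The possessive "'s" is kept because it contains a letter; only the bare comma and period are removed.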
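[Edit]
How about this algorithm? Split the text on whitespace, then trim punctuation. This carefully removes punctuation from the edges of words without harming apostrophes inside words such as we're.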

    >>> text
    "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'"
    >>> text.split()
    ["'Oh,", 'you', "can't", 'help', "that,'", 'said', 'the', 'Cat:', "'we're", 'all', 'mad', 'here.', "I'm", 'mad.', "You're", "mad.'"]
    >>> import string
    >>> [word.strip(string.punctuation) for word in text.split()]
    ['Oh', 'you', "can't", 'help', 'that', 'said', 'the', 'Cat', "we're", 'all', 'mad', 'here', "I'm", 'mad', "You're", 'mad']

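The same idea can be wrapped in a reusable function. This is only a sketch: the name split_words is made up, and dropping tokens that become empty after stripping (for example a lone dash) is an extra step that is not part of the session shown.

```python
import string


def split_words(text):
    """Split on whitespace, then trim punctuation from the edges of each word."""
    stripped = (word.strip(string.punctuation) for word in text.split())
    # A token made only of punctuation strips down to "", so skip it
    return [word for word in stripped if word]


print(split_words("'we're all mad here' - said the Cat."))
```

Note that string.punctuation covers ASCII punctuation only, so typographic quotes or dashes would need to be added to the strip set separately.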
I want my python function to split a sentence (input) and store each word in a list
The str.split() method does this: it takes a string and splits it into a list.

    >>> the_string = "this is a sentence"
    >>> words = the_string.split(" ")
    >>> print(words)
    ['this', 'is', 'a', 'sentence']
    >>> type(words)
    <type 'list'> # or <class 'list'> in Python 3

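Your problem is caused by a typo: you wrote print(words) instead of print(word).
With the word variable renamed to current_word, this is what you had:

    def split_line(text):
        words = text.split()
        for current_word in words:
            print(words)

...when you should have done:

    def split_line(text):
        words = text.split()
        for current_word in words:
            print(current_word)
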
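If for some reason you want to build a list manually in the for loop, you would use the list append() method, for example to lower-case every word:

    my_list = []  # make empty list
    for current_word in words:
        my_list.append(current_word.lower())

Or, a bit neater, using a list comprehension:

    my_list = [current_word.lower() for current_word in words]
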
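shlex has a split() function. Unlike str.split(), it treats a quoted phrase as a single word and does not keep the quotes:

    >>> import shlex
    >>> shlex.split("sudo echo 'foo && bar'")
    ['sudo', 'echo', 'foo && bar']
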
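For comparison, a short sketch of how the same command line comes out with str.split() and with shlex.split(); the quoted argument survives as one item only in the second case.

```python
import shlex

command = "sudo echo 'foo && bar'"

# Plain whitespace splitting breaks the quoted argument apart
plain = command.split()
print(plain)

# shlex applies shell-like quoting rules and removes the quotes
shell_like = shlex.split(command)
print(shell_like)
```

This matters mainly when the text is a command line or otherwise uses shell-style quoting; for ordinary sentences, an apostrophe such as the one in can't would be read as an opening quote and make shlex.split() raise a ValueError.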
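I think you are confused because of a typo.
Replace print(words) with print(word) inside your loop so that each word is printed on its own line.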
If you want all the characters of a word or sentence in a list, do this:

    print(list("word"))
    # ['w', 'o', 'r', 'd']

    print(list("some sentence"))
    # ['s', 'o', 'm', 'e', ' ', 's', 'e', 'n', 't', 'e', 'n', 'c', 'e']

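For contrast, a brief sketch showing list() next to split() on the same string, and how str.join() reverses the character split:

```python
sentence = "some sentence"

chars = list(sentence)      # one item per character, spaces included
words = sentence.split()    # one item per whitespace-separated word

print(len(chars), len(words))

# Joining the characters with an empty separator restores the string
restored = "".join(chars)
print(restored == sentence)
```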