Python: How to find all occurrences of a substring?

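Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() that returns all found indexes (not only the first from the beginning or the first from the end).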

For example:

string = "test test test test"

print(string.find('test'))   # 0
print(string.rfind('test'))  # 15

# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
# [0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
# [0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
# [1]

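re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.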
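One caveat with the regex approach: the substring is interpreted as a pattern, so characters such as . or ( change its meaning. A minimal sketch, assuming a hypothetical helper named find_all_re (it is not part of the re module), that escapes the substring with re.escape and optionally wraps it in a lookahead for overlapping matches:

```python
import re

def find_all_re(text, sub, overlapping=False):
    """Return start indexes of sub in text (find_all_re is a hypothetical helper)."""
    pattern = re.escape(sub)  # treat sub as literal text, not as a regex
    if overlapping:
        # A zero-width lookahead matches at every position where sub starts.
        pattern = '(?=' + pattern + ')'
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_re('test test test test', 'test'))  # [0, 5, 10, 15]
print(find_all_re('ttt', 'tt', overlapping=True))  # [0, 1]
print(find_all_re('a.b axb a.b', 'a.b'))           # [0, 8]
```

Without re.escape, the last call would also report index 4, because the unescaped . matches the x in 'axb'.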

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
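One edge case worth noting: with an empty sub, str.find returns the current start position and len(sub) is 0, so the loop above never advances. A hedged variant (the name find_all_safe and the overlapping parameter are illustrative, not from the answer) that rejects an empty substring and exposes the step as a flag:

```python
def find_all_safe(a_str, sub, overlapping=False):
    """Yield start indexes of sub in a_str (illustrative variant)."""
    if not sub:
        # An empty substring matches everywhere and would never advance the search.
        # Because this is a generator, the error surfaces on first iteration.
        raise ValueError("sub must not be empty")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all_safe('spam spam spam spam', 'spam')))     # [0, 5, 10, 15]
print(list(find_all_safe('ababa', 'aba')))                    # [0]
print(list(find_all_safe('ababa', 'aba', overlapping=True)))  # [0, 2]
```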

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
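To check the "very inefficient" claim on your own data, a rough timing sketch with timeit can compare the startswith scan against a str.find loop. The function names and the sample input are assumptions made for illustration, and the timings depend on the machine and the input:

```python
import timeit

def scan_startswith(text, sub):
    # Tests every index, so it costs one startswith call per character.
    return [i for i in range(len(text)) if text.startswith(sub, i)]

def scan_find(text, sub):
    # Jumps from match to match using str.find (overlapping matches included).
    result = []
    i = text.find(sub)
    while i != -1:
        result.append(i)
        i = text.find(sub, i + 1)
    return result

sample = 'test ' * 200  # hypothetical input; substitute your own text
assert scan_startswith(sample, 'test') == scan_find(sample, 'test')

for func in (scan_startswith, scan_find):
    seconds = timeit.timeit(lambda: func(sample, 'test'), number=200)
    print(func.__name__, round(seconds, 4))
```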

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example:

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

But it won't work for overlapping matches:

In [1]: aString = "ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
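If overlapping matches are needed here, a lookahead pattern is one way around this. Because a lookahead match is zero-width, a.end() equals a.start(), so the end has to be computed from the substring length. A small sketch (the helper name overlapping_spans is hypothetical):

```python
import re

def overlapping_spans(text, sub):
    """Return (start, end) pairs for every occurrence, overlaps included."""
    pattern = '(?=' + re.escape(sub) + ')'
    # Each lookahead match is empty, so derive the end from len(sub).
    return [(m.start(), m.start() + len(sub)) for m in re.finditer(pattern, text)]

print(overlapping_spans('ababa', 'aba'))  # [(0, 3), (2, 5)]
```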

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

If you're just looking for a single character, this would work:

from functools import reduce  # reduce is not a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) performs particularly well.

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1  # change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

This thread is a little old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        position = numberString.index(testString, marker)
        print(position)
        marker = position + 1
    except ValueError:
        # index() raises ValueError once no further match exists
        print("String not found")
        marker = len(numberString)

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+(len("test"))] == "test":
...         print(index)
...
0
5
10
15

This does the trick for me, using re.finditer:

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

# find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
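Note that the pattern 'as' matches anywhere it occurs, including inside longer words such as "was" or "has". If only the standalone word is wanted, one option (an assumption about the intent, not something the answer states) is to add \b word boundaries:

```python
import re

sample = 'as was has as'  # hypothetical sample text

# Plain substring search: matches 'as' anywhere, including inside words.
anywhere = [m.start() for m in re.finditer('as', sample)]

# \b anchors the match at word boundaries, so only the standalone word matches.
whole_words = [m.start() for m in re.finditer(r'\bas\b', sample)]

print(anywhere)     # [0, 4, 8, 11]
print(whole_words)  # [0, 11]
```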

Whatever solutions the others have provided are all based on the available find() method or other available methods.

What is the core basic algorithm to find all the occurrences of a
substring in a string?

def find_all(string, substring):
    """
    Function: Return all the indexes of substring in string
    Arguments: The string and the search string
    Return: A list of indexes
    """
    length = len(substring)
    c = 0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c = c + 1
    return indexes

You can also subclass str and use the function below as part of the new class.

class newstr(str):
    def find_all(string, substring):
        """
        Function: Return all the indexes of substring in string
        Arguments: The string and the search string
        Return: A list of indexes
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method:

newstr.find_all('Do you find this answer helpful? then upvote this!', 'this')
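Since the class subclasses str, the same logic can also be written as an ordinary instance method and called on the string itself. A sketch under that assumption (the class name SearchableStr is made up for illustration):

```python
class SearchableStr(str):
    """A str subclass with a find_all method (illustrative name)."""

    def find_all(self, substring):
        """Return a list of all indexes where substring starts, overlaps included."""
        length = len(substring)
        return [i for i in range(len(self)) if self[i:i + length] == substring]

text = SearchableStr('Do you find this answer helpful? then upvote this!')
print(text.find_all('this'))  # [12, 45]
```

Keep in mind that built-in str methods such as upper() return a plain str, so their results no longer have find_all.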

You can easily use:

string.count('test')

https://www.programmiz.com/python-programming/methods/string/count

Cheers!
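Keep in mind that str.count returns how many non-overlapping occurrences there are, not where they are, so it answers a slightly different question than the one asked. A short illustration (the variable names are only for this example):

```python
text = 'ababa'
sub = 'aba'

# str.count reports the number of non-overlapping occurrences only.
print(text.count(sub))  # 1

# Collecting indexes with startswith also reveals the overlapping match.
positions = [i for i in range(len(text)) if text.startswith(sub, i)]
print(positions)        # [0, 2]
```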

When looking for a large number of keywords in a document, use FlashText:
from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

FlashText runs faster than regex when searching for a large list of words.

The Pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c, s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the search string
# c represents the character string

find_all(mystring, 'o')  # will return all positions of 'o'
# [4, 7, 20, 26]
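The lambda above compares one character at a time (c[x] == s), so it only finds matches when s is a single character. A sketch of the difference, where find_all_slices is a hypothetical variant that compares a slice of len(s) characters instead:

```python
mystring = 'Hello World, this should work!'

# Single character: enumerate avoids indexing by hand.
print([i for i, ch in enumerate(mystring) if ch == 'o'])  # [4, 7, 20, 26]

# The original lambda compares c[x] (one character) with s, so a longer s never matches.
find_all = lambda c, s: [x for x in range(c.find(s), len(c)) if c[x] == s]
print(find_all(mystring, 'or'))  # []

# Comparing a slice of len(s) characters instead handles longer substrings.
find_all_slices = lambda c, s: [x for x in range(len(c)) if c[x:x + len(s)] == s]
print(find_all_slices(mystring, 'or'))  # [7, 26]
```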

Please see the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result

if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))