How to find all occurrences of a substring?
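I'm wondering whether there is something like string.find_all() that returns all the indexes found, not only the first from the beginning (find) or the first from the end (rfind).

For example:

    string = "test test test test"

    print(string.find('test'))   # 0
    print(string.rfind('test'))  # 15

    # this is the goal
    print(string.find_all('test'))  # [0, 5, 10, 15]
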
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

    import re
    [m.start() for m in re.finditer('test', 'test test test test')]
    # [0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

    [m.start() for m in re.finditer('(?=tt)', 'ttt')]
    # [0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

    search = 'tt'
    [m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
    # [1]

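One caveat with the regex approach: the search text is interpreted as a pattern, so characters such as . or + change what it matches. A minimal sketch, assuming a literal substring search is what you want; find_all_literal is a hypothetical helper name used only for illustration, while re.escape and re.finditer are standard-library calls:

```python
import re

def find_all_literal(text, sub):
    """Return the start index of every non-overlapping literal match of sub.

    Hypothetical helper, not part of the re module.
    """
    # re.escape makes regex metacharacters in sub match themselves.
    return [m.start() for m in re.finditer(re.escape(sub), text)]

text = 'a.b a+b a.b'

# Unescaped, the dot matches any character, so 'a+b' is also reported.
print([m.start() for m in re.finditer('a.b', text)])  # [0, 4, 8]

# Escaped, only the literal 'a.b' is reported.
print(find_all_literal(text, 'a.b'))                   # [0, 8]
```
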
    >>> help(str.find)
    Help on method_descriptor:

    find(...)
        S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

    def find_all(a_str, sub):
        start = 0
        while True:
            start = a_str.find(sub, start)
            if start == -1: return
            yield start
            start += len(sub) # use start += 1 to find overlapping matches

    list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.

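The comment about start += 1 can be turned into a parameter. This is only a sketch of one possible variation: the overlapping flag and the empty-string check are my own additions, not part of the answer above. The check is there because an empty sub would never advance start when the step is len(sub), so the loop would not terminate.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in a_str."""
    if not sub:
        # Raised on first iteration, since this is a generator function.
        raise ValueError("sub must be a non-empty string")
    # Step past the whole match, or by one character to allow overlaps.
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all('ttt', 'tt')))                    # [0]
print(list(find_all('ttt', 'tt', overlapping=True)))  # [0, 1]
```
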
Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

    >>> string = "test test test test"
    >>> [i for i in range(len(string)) if string.startswith('test', i)]
    [0, 5, 10, 15]

Again, old thread, but here's my solution using a generator and plain str.find.

    def findall(p, s):
        '''Yields all the positions of
        the pattern p in the string s.'''
        i = s.find(p)
        while i != -1:
            yield i
            i = s.find(p, i+1)

Example

    x = 'banananassantana'
    [(i, x[i:i+2]) for i in findall('na', x)]

returns

    [(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

You can use re.finditer() for non-overlapping matches:

    >>> import re
    >>> aString = 'this is a string where the substring "is" is repeated several times'
    >>> print([(a.start(), a.end()) for a in list(re.finditer('is', aString))])
    [(2, 4), (5, 7), (38, 40), (42, 44)]

but it won't work for overlapping matches:

    In [1]: aString = "ababa"

    In [2]: print([(a.start(), a.end()) for a in list(re.finditer('aba', aString))])
    Output: [(0, 3)]

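For the "ababa" case, one way to recover the overlapping spans is to wrap the substring in a lookahead. A sketch under that assumption, reusing the same input; because a lookahead match is zero-width, m.end() equals m.start(), so the end of each span is computed from the substring length instead:

```python
import re

text = 'ababa'
sub = 'aba'

# The lookahead consumes no characters, so the scan resumes one position
# later rather than after the whole match.
spans = [(m.start(), m.start() + len(sub))
         for m in re.finditer('(?=%s)' % re.escape(sub), text)]

print(spans)  # [(0, 3), (2, 5)]
```
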
Come, let us recurse together.

    def locations_of_substring(string, substring):
        """Return a list of locations of a substring."""

        substring_length = len(substring)
        def recurse(locations_found, start):
            location = string.find(substring, start)
            if location != -1:
                return recurse(locations_found + [location], location+substring_length)
            else:
                return locations_found

        return recurse([], 0)

    print(locations_of_substring('this is a test for finding this and this', 'this'))
    # prints [0, 27, 36]

No regular expressions needed this way.

If you're just looking for a single character, this would work:

    from functools import reduce  # reduce is no longer a builtin in Python 3

    string = "dooobiedoobiedoobie"
    match = 'o'
    reduce(lambda count, char: count + 1 if char == match else count, string, 0)
    # produces 7

Also,

    string = "test test test test"
    match = "test"
    len(string.split(match)) - 1
    # produces 4

My hunch is that neither of these (especially #2) is terribly performant.

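That hunch can be checked with the standard timeit module instead of guessed at. The sketch below times both snippets, plus str.count for reference, on made-up test data. Timings vary by machine and input, so no numbers are shown.

```python
from functools import reduce
import timeit

string = "dooobiedoobiedoobie" * 1000  # made-up test data
match = 'o'

def count_with_reduce():
    return reduce(lambda count, char: count + 1 if char == match else count, string, 0)

def count_with_split():
    return len(string.split(match)) - 1

def count_with_count():
    return string.count(match)

# All three should agree before their speed is compared.
assert count_with_reduce() == count_with_split() == count_with_count()

for func in (count_with_reduce, count_with_split, count_with_count):
    seconds = timeit.timeit(func, number=20)
    print('%-20s %.4f s' % (func.__name__, seconds))
```
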
This is an old thread, but I got interested and wanted to share my solution.

    def find_all(a_string, sub):
        result = []
        k = 0
        while k < len(a_string):
            k = a_string.find(sub, k)
            if k == -1:
                return result
            else:
                result.append(k)
                k += 1 #change to k += len(sub) to not search overlapping results
        return result

It should return a list of the positions where the substring was found. Please comment if you see an error or room for improvement.

This thread is a little old, but this worked for me:

    numberString = "onetwothreefourfivesixseveneightninefiveten"
    testString = "five"

    marker = 0
    while marker < len(numberString):
        try:
            # index() raises ValueError once there is no further match
            position = numberString.index(testString, marker)
            print(position)
            marker = position + 1
        except ValueError:
            # note: this is also printed after the last match has been found
            print("String not found")
            marker = len(numberString)

You can try:

    >>> string = "test test test test"
    >>> for index, value in enumerate(string):
    ...     if string[index:index+(len("test"))] == "test":
    ...         print(index)
    ...
    0
    5
    10
    15

This does the trick for me, using re.finditer:

    import re

    text = 'This is sample text to test if this pythonic '\
           'program can serve as an indexing platform for '\
           'finding words in a paragraph. It can give '\
           'values as to where the word is located with the '\
           'different examples as stated'

    # find all occurrences of the word 'as' in the above text

    find_the_word = re.finditer('as', text)

    for match in find_the_word:
        print('start {}, end {}, search string \'{}\''.
              format(match.start(), match.end(), match.group()))

Whatever solutions others have provided are entirely based on the available find() method or some other available method.

What is the core basic algorithm to find all the occurrences of a substring in a string?

    def find_all(string, substring):
        """
        Function: Return all the indexes of substring in a string
        Arguments: String and the search string
        Return: A list of indexes
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

You can also subclass str and add this function to the new class:

    class newstr(str):
        # the first parameter, named string here, plays the role of self
        def find_all(string, substring):
            """
            Function: Return all the indexes of substring in a string
            Arguments: String and the search string
            Return: A list of indexes
            """
            length = len(substring)
            c = 0
            indexes = []
            while c < len(string):
                if string[c:c+length] == substring:
                    indexes.append(c)
                c = c + 1
            return indexes

Calling the method:

    newstr.find_all('Do you find this answer helpful? then upvote this!', 'this')

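The call above goes through the class and passes a plain string as the first argument. If the text is wrapped in the subclass first, the method can be called on the instance instead. A compact sketch of that usage; the list-comprehension body and the parameter name self are my own restatement, not the answer's code:

```python
class newstr(str):
    def find_all(self, substring):
        """Return every index at which substring starts (overlaps included)."""
        length = len(substring)
        return [c for c in range(len(self)) if self[c:c + length] == substring]

s = newstr('Do you find this answer helpful? then upvote this!')
print(s.find_all('this'))  # [12, 45]
```

One thing to keep in mind with this design: the inherited str methods such as upper() return a plain str, so their results no longer have find_all.
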
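You can easily use str.count() to get the number of non-overlapping occurrences (it does not return their positions):

    string.count('test')

https://www.programmiz.com/python-programming/methods/string/count

Cheers!
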
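Since the question asks for positions, it may help to see how count relates to them. A small sketch with made-up inputs, using the startswith scan from an earlier answer for the positions:

```python
text = 'ttt'

# count() reports how many non-overlapping matches there are, not where.
print(text.count('tt'))  # 1

# Scanning every index also finds the overlapping match.
positions = [i for i in range(len(text)) if text.startswith('tt', i)]
print(positions)         # [0, 1]

# When matches cannot overlap, both views agree on the total.
string = 'test test test test'
print(string.count('test'))  # 4
```
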
When looking for a large number of keywords in a document, use FlashText:

    from flashtext import KeywordProcessor

    words = ['test', 'exam', 'quiz']
    txt = 'this is a test'
    kwp = KeywordProcessor()
    kwp.add_keywords_from_list(words)
    result = kwp.extract_keywords(txt, span_info=True)

FlashText runs faster than regex on a large list of search words.

The Pythonic way would be:

    mystring = 'Hello World, this should work!'
    find_all = lambda c, s: [x for x in range(c.find(s), len(c)) if c[x] == s]

    # s is the character to search for (this only works for a single character)
    # c is the string to search in

    find_all(mystring, 'o')    # returns all positions of 'o'

    [4, 7, 20, 26]

See the code below:

    #!/usr/bin/env python
    # coding:utf-8
    '''黄哥Python'''


    def get_substring_indices(text, s):
        # test every index; overlapping matches are included
        result = [i for i in range(len(text)) if text.startswith(s, i)]
        return result


    if __name__ == '__main__':
        text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
        s = 'wood'
        print(get_substring_indices(text, s))