Python regex find all overlapping matches?

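I am trying to find every 10-digit series of numbers within a larger series of numbers using re in Python 2.6.

I can easily grab the non-overlapping matches, but I want every match in the number series. For example,

in "123456789123456789"

I should get the following list:

[1234567891,2345678912,3456789123,4567891234,5678912345,6789123456,7891234567,8912345678,9123456789]

I have found references to "lookahead", but the examples I have seen only show pairs of numbers rather than larger groupings, and I have not been able to adapt them to more than two digits.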

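Use a capturing group inside a lookahead. The lookahead captures the text you are interested in, but the actual match is technically the zero-width substring before the lookahead, so the matches are technically non-overlapping: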

import re
s = "123456789123456789"
matches = re.finditer(r'(?=(\d{10}))', s)
results = [int(match.group(1)) for match in matches]
# results:
# [1234567891,
# 2345678912,
# 3456789123,
# 4567891234,
# 5678912345,
# 6789123456,
# 7891234567,
# 8912345678,
# 9123456789]
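
The same lookahead trick can be wrapped in a small helper for other patterns. This is only a sketch: the name `find_overlapping` is hypothetical, and it assumes the pattern is a valid regex that can safely be embedded in a group. Note that at each start position only one match is reported, so a variable-length pattern still yields at most one result per position.

```python
import re

def find_overlapping(pattern, text):
    """Return the matches of pattern in text, one per start position, overlaps included."""
    # Wrap the pattern in a capturing group inside a zero-width lookahead,
    # so the scan advances one character at a time instead of past each match.
    wrapped = re.compile(r'(?=(' + pattern + r'))')
    return [m.group(1) for m in wrapped.finditer(text)]

# Works on mixed input too: only positions followed by 10 digits produce a match.
print(find_overlapping(r'\d{10}', "ab1234567890123cd"))
# ['1234567890', '2345678901', '3456789012', '4567890123']
```

The helper returns strings; convert with `int(...)` as in the answer above if numbers are needed.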

You can also try the third-party regex module (rather than re), which supports overlapping matches.

>>> import regex as re
>>> s = "123456789123456789"
>>> matches = re.findall(r'\d{10}', s, overlapped=True)
>>> for match in matches: print(match)
...
1234567891
2345678912
3456789123
4567891234
5678912345
6789123456
7891234567
8912345678
9123456789

I am fond of regexes, but they are not needed here.

Simply:

s = "123456789123456789"

n = 10
# Slide a window of width n across s; range works in both Python 2 and 3.
li = [s[i:i+n] for i in range(len(s) - n + 1)]
print('\n'.join(li))

Result

1234567891
2345678912
3456789123
4567891234
5678912345
6789123456
7891234567
8912345678
9123456789
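
One difference from the regex answers: plain slicing does not check that a window contains only digits. If the input may contain other characters, a filter can be added. The following is a minimal sketch under that assumption; `digit_windows` is a hypothetical name, and `str.isdigit` also accepts some non-ASCII digit characters, which may or may not be wanted.

```python
def digit_windows(s, n):
    """Return every length-n substring of s that consists only of digits."""
    # Generate all overlapping windows, then keep the all-digit ones.
    windows = (s[i:i + n] for i in range(len(s) - n + 1))
    return [w for w in windows if w.isdigit()]

print(digit_windows("12a345", 2))
# ['12', '34', '45']
```

For an all-digit string such as the one in the question, this returns the same nine substrings as the list comprehension above.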
