Identify groups of continuous numbers in a list
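I'd like to identify groups of continuous numbers in a list, so that:

```python
myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])
```

returns:

```python
[(2, 5), (12, 17), 20]
```

I was wondering what the best way to do this is (particularly if there is something built into Python).

Edit: Note that I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.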
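Edit 2: To answer the OP's new requirement:

```python
from itertools import groupby
from operator import itemgetter

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
ranges = []
for key, group in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
    group = list(map(itemgetter(1), group))
    if len(group) > 1:
        # range() excludes its stop value, so add 1 to cover the last item
        ranges.append(range(group[0], group[-1] + 1))
    else:
        ranges.append(group[0])
```

Output:

```
[range(2, 6), range(12, 18), 20]
```

You can replace `range` with any other custom class.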
The Python documentation has a very neat recipe for this:

```python
from operator import itemgetter
from itertools import groupby

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
    print(list(map(itemgetter(1), g)))
```

Output:

```
[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]
```
If you want to get exactly the same output as in the question, you can do this:

```python
ranges = []
for k, g in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))
```

Output:

```
[(2, 5), (12, 17)]
```
Edit: The example is already explained in the documentation, but maybe I should explain it in more detail:

> The key to the solution is differencing with a range so that consecutive numbers all appear in same group.
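If the data is `[2, 3, 4, 5, 12, 13, 14, 15, 16, 17]`, then the `groupby(enumerate(data), ...)` call is equivalent to:

```python
groupby(
    [(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
     (5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
    lambda pair: pair[0] - pair[1]
)
```

The lambda subtracts each element's value from its index. So when you apply the lambda to each item, you get the following keys for `groupby`:

```
[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]
```

`groupby` groups elements by equal key value, so the first 4 elements are grouped together, and so on.

I hope this makes it more readable.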
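To see the grouping key at work, here is a minimal sketch that computes the index-minus-value keys explicitly and then groups on them. The variable names `keys` and `runs` are chosen here for illustration only.

```python
from itertools import groupby

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]

# Key for each element: index minus value. It stays constant for as long
# as the values increase by exactly 1 per step.
keys = [i - x for i, x in enumerate(data)]
print(keys)  # [-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]

# Collect the values that share a key into one run.
runs = [
    [value for _, value in pairs]
    for _, pairs in groupby(enumerate(data), key=lambda pair: pair[0] - pair[1])
]
print(runs)  # [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17]]
```

Note that `groupby` only merges adjacent items with equal keys, so this relies on the input being sorted and free of duplicates.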
Import the required libraries first:

```python
from itertools import groupby
from operator import itemgetter

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]  # sample input
ranges = []

for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
    group = map(itemgetter(1), g)
    group = list(map(int, group))
    ranges.append((group[0], group[-1]))
```
`more_itertools.consecutive_groups` was added in version 4.0.

Demo

```python
import more_itertools as mit

iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
```

Code
Applying this tool, we can write a generator function that finds ranges of consecutive numbers.

```python
def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]
```
The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).
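For readers without the package, a rough standard-library sketch of the same idea might look like the following. The name `consecutive_runs` is made up for this example, and this is not the actual `more_itertools` source, only an approximation built on the `groupby` recipe.

```python
from itertools import groupby


def consecutive_runs(iterable):
    """Yield lists of consecutive integers (hypothetical helper, not library code)."""
    # Value minus index is constant within a run of consecutive integers.
    for _, pairs in groupby(enumerate(iterable), key=lambda pair: pair[1] - pair[0]):
        yield [value for _, value in pairs]


print(list(consecutive_runs([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])))
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]
```

Like the recipe it is based on, this assumes sorted integer input without duplicates.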
Note: `more_itertools` is a third-party package; install it with `pip install more_itertools`.

The "naive" solution, which I find at least somewhat readable.

```python
x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]

def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last:  # Part of the group, bump the end
            last = n
        else:  # Not part of the group, yield current group and start a new
            yield first, last
            first = last = n
    yield first, last  # Yield the last group

print(list(group(x)))
# [(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
```
Assuming your list is sorted:

```python
>>> from itertools import groupby
>>> def ranges(lst):
...     pos = (j - i for i, j in enumerate(lst))
...     t = 0
...     for i, els in groupby(pos):
...         l = len(list(els))
...         el = lst[t]
...         t += l
...         yield range(el, el+l)
...
>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
```
Here is something that should work, without any import needed:

```python
def myfunc(lst):
    ret = []
    a = b = lst[0]                            # a and b are range's bounds
    for el in lst[1:]:
        if el == b + 1:
            b = el                            # range grows
        else:                                 # range ended
            ret.append(a if a == b else (a, b))  # is a single or a range?
            a = b = el                        # let's start again with a single
    ret.append(a if a == b else (a, b))       # corner case for last single/range
    return ret
```
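Note that in Python 3 the `groupby` recipe has to be written without the tuple-unpacking lambda, and `map` has to be wrapped in `list`:

```python
from itertools import groupby
from operator import itemgetter

ranges = []  # data is the input list from the question
for k, g in groupby(enumerate(data), lambda x: x[0] - x[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))
```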
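This doesn't use a standard function; it just iterates over the input, but it should work:

```python
def myfunc(l):
    r = []
    p = q = None  # p and q are the start and end of the current run
    for x in l + [-1]:  # the trailing -1 is a sentinel that flushes the last run
        if x - 1 == q:
            q += 1
        else:
            if p:
                if q > p:
                    r.append('%s-%s' % (p, q))
                else:
                    r.append(str(p))
            p = q = x
    return '(%s)' % ', '.join(r)
```

Note that it requires the input to contain only positive numbers in ascending order. You should validate the input, but that code is omitted here for clarity.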
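The omitted validation could look something like the sketch below. The function name `validate_input` and the choice to reject duplicates (strictly ascending order) are assumptions made for this example, not part of the answer above.

```python
def validate_input(numbers):
    """Raise ValueError unless numbers are positive ints in strictly ascending order.

    Hypothetical helper; rejecting duplicates is an assumption of this sketch.
    """
    previous = 0
    for value in numbers:
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"expected a positive integer, got {value!r}")
        if value <= previous:
            raise ValueError(f"{value!r} does not come after {previous!r}")
        previous = value


validate_input([2, 3, 4, 5, 12, 13, 20])  # passes silently
```

Calling it at the top of the function keeps the grouping loop itself free of error handling.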
```python
import numpy as np

myarray = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
# Split wherever the difference between neighbours is greater than 1
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
    if len(s) > 1:
        # int() converts NumPy scalars so they print as plain numbers
        l.append((int(np.min(s)), int(np.max(s))))
    else:
        l.append(int(s[0]))
print(l)
```

Output:

```
[(2, 5), (12, 17), 20]
```
This is the answer I came up with. I wrote the code for other people to understand, so I'm fairly verbose with variable names and comments.

First, a quick helper function:

```python
def getpreviousitem(mylist, myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item
            return mylist[position-1]
```
And then the actual code:

```python
def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist, item)
        if previousitem == item - 1:
            # We're in a range
            if inrange:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:
                # We've found a new range.
                newrange = [item-1, item]
                # Update to show we are now in a range
                inrange = True
        else:
            # We were in a range but now it just ended
            if inrange:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range
            inrange = False
    # Add the final range found to our list
    if inrange:
        rangelist.append(newrange)
    return rangelist
```
Example run:

```python
getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])
```

returns:

```
[[2, 5], [12, 17]]
```
A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:

```python
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
```

Example:

```python
>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]

>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]

>>> ranges(range(100))
[(0, 99)]

>>> ranges([0])
[(0, 0)]

>>> ranges([])
[]
```
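This is the same as @dansalmo's solution, which I found amazing, albeit a bit hard to read and apply (as it is not given as a function).

Note that it can easily be modified to spit out "traditional" half-open ranges `[start, end)`, for example by altering the return statement:

```python
return [(s, e+1) for s, e in zip(edges, edges)]
```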
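For completeness, here is a sketch of the whole function with that half-open return statement in place. The name `half_open_ranges` is invented here to keep it distinct from the inclusive version.

```python
def half_open_ranges(nums):
    """Return (start, stop) pairs where stop is one past the last member.

    Sketch only; the function name is made up for this example.
    """
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return [(s, e + 1) for s, e in zip(edges, edges)]


pairs = half_open_ranges([2, 3, 4, 7, 8, 9, 15])
print(pairs)  # [(2, 5), (7, 10), (15, 16)]

# Each pair can be passed straight to range() to recover the members.
expanded = [list(range(start, stop)) for start, stop in pairs]
print(expanded)  # [[2, 3, 4], [7, 8, 9], [15]]
```

The half-open form is convenient when the pairs feed into `range()` or slicing, since no `+ 1` adjustment is needed at the call site.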
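I copied this answer over from another question that was marked as a duplicate of this one, with the intention of making it easier to find (I had just searched for this topic again, found only this question at first, and was not satisfied with the answers given).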
Using numpy and a list comprehension: with numpy's `diff` function, consecutive entries of the input vector whose difference is not equal to one can be identified. The start and end of the input vector need to be considered as well.

```python
import numpy as np

data = np.array([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])

d = [i for i, df in enumerate(np.diff(data)) if df != 1]
d = np.hstack([-1, d, len(data)-1])  # add first and last elements
d = np.vstack([d[:-1]+1, d[1:]]).T

print(data[d])
```

Output:

```
[[ 2  5]
 [12 17]
 [20 20]]
```
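Note: The request that individual numbers be treated differently (returned as individual numbers, not ranges) was omitted. This can be achieved by further post-processing the results. Usually it makes things more complex without gaining any benefit.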
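If the single-number form is wanted after all, the post-processing can be a one-line comprehension over the start/end pairs. In this sketch, `pairs` is a hand-written stand-in for the rows printed above (something like `data[d].tolist()` would produce it), so the snippet runs without numpy.

```python
# Start/end rows as produced by the numpy approach, written out by hand here.
pairs = [[2, 5], [12, 17], [20, 20]]

# Collapse rows whose start equals their end into a bare number.
result = [start if start == end else (start, end) for start, end in pairs]
print(result)  # [(2, 5), (12, 17), 20]
```

The result mixes tuples and bare integers, which is what the question asks for but also means callers have to check the type of each entry.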