Join list of lists in Python
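Is there a short syntax in Python for joining a list of lists into a single list (or iterator)?

For example, I have a list as follows, and I want to iterate over a, b and c.

    x = [["a","b"], ["c"]]

The best I can come up with is as follows.

    result = []
    [ result.extend(el) for el in x]
    for el in result:
        print(el)

    # Chain the sublists together, then materialize the result as a list.
    import itertools
    a = [["a","b"], ["c"]]
    print(list(itertools.chain.from_iterable(a)))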
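Since the question also asks about an iterator, it may be worth noting that the `list()` call is optional. A minimal sketch, reusing the same sample data, that pulls items from the chained iterator on demand (the names `flat_iter`, `first` and `rest` are only for illustration):

```python
import itertools

a = [["a", "b"], ["c"]]

# chain.from_iterable is lazy: no combined list is built up front.
flat_iter = itertools.chain.from_iterable(a)

first = next(flat_iter)  # items are produced one at a time
rest = list(flat_iter)   # whatever has not been consumed yet

print(first)  # a
print(rest)   # ['b', 'c']
```

If the result is only looped over once, iterating over `flat_iter` directly avoids the extra copy.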
If you're only going one level deep, a nested comprehension will also work:

    >>> x = [["a","b"], ["c"]]
    >>> [inner
    ...     for outer in x
    ...         for inner in outer]
    ['a', 'b', 'c']

On one line, that becomes:

    >>> [j for i in x for j in i]
    ['a', 'b', 'c']

    # Concatenate the sublists by summing them, starting from an empty list.
    x = [["a","b"], ["c"]]
    result = sum(x, [])
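This is known as flattening, and there are a lot of implementations out there:

- More on Python flatten
- Python tricks
- Flattening lists in Python

How about this, although it will only work for nesting that is 1 level deep:

    >>> x = [["a","b"], ["c"]]
    >>> for el in sum(x, []):
    ...     print(el)
    ...
    a
    b
    c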
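To illustrate the one-level limitation, here is a small sketch (the nested input `deep` is made up for this example). `sum` only concatenates the top-level sublists, so anything nested further down is left untouched:

```python
deep = [["a", ["b", "c"]], ["d"]]

# sum(deep, []) evaluates [] + deep[0] + deep[1]: only the outer level is joined.
one_level = sum(deep, [])

print(one_level)  # ['a', ['b', 'c'], 'd']
```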
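From those links, apparently the most complete, fast, elegant, etc. implementation is the following:

    def flatten(l, ltypes=(list, tuple)):
        ltype = type(l)
        l = list(l)
        i = 0
        while i < len(l):
            # Splice nested sequences into place until l[i] is a leaf.
            while isinstance(l[i], ltypes):
                if not l[i]:
                    # Drop an empty sequence and re-examine this index.
                    l.pop(i)
                    i -= 1
                    break
                else:
                    l[i:i + 1] = l[i]
            i += 1
        return ltype(l)

    # Relies on Python 2's eager map(): extends l once per sublist.
    l = []
    map(l.extend, list_of_lists)

Shortest!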
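One caveat, assuming Python 3: `map` returns a lazy iterator there, so the two-line version does nothing until that iterator is consumed. A minimal sketch with sample data (the names `untouched`, `pending` and `sub` are made up here) showing the difference and a plain loop that behaves the same on Python 2 and 3:

```python
list_of_lists = [["a", "b"], ["c"]]

# Python 3: map() only builds a lazy iterator, so nothing is appended yet.
untouched = []
pending = map(untouched.extend, list_of_lists)

# A plain loop extends the list immediately on both Python 2 and 3.
l = []
for sub in list_of_lists:
    l.extend(sub)

print(l)  # ['a', 'b', 'c']
```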
This works recursively for arbitrarily deeply nested elements:

    def iterFlatten(root):
        if isinstance(root, (list, tuple)):
            for element in root:
                for e in iterFlatten(element):
                    yield e
        else:
            yield root

Result:

    >>> b = [["a", ("b","c")],"d"]
    >>> list(iterFlatten(b))
    ['a', 'b', 'c', 'd']
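On Python 3.3 and later, the inner loop can be replaced with `yield from`. The sketch below is a variation on the generator above rather than the answer's own code, and the name `iter_flatten` is made up here. Because it only descends into lists and tuples, strings are yielded whole instead of being split into characters:

```python
def iter_flatten(root):
    """Yield the leaves of arbitrarily nested lists/tuples, depth first."""
    if isinstance(root, (list, tuple)):
        for element in root:
            # Delegate to the recursive generator for each child.
            yield from iter_flatten(element)
    else:
        yield root


b = [["a", ("b", "c")], "d"]
print(list(iter_flatten(b)))  # ['a', 'b', 'c', 'd']
```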
Performance comparison:

    import itertools
    import timeit
    big_list = [[0]*1000 for i in range(1000)]
    timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
    timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
    timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
    timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
    [100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]

Producing:

    >>> import itertools
    >>> import timeit
    >>> big_list = [[0]*1000 for i in range(1000)]
    >>> timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
    [3.016212113769325, 3.0148865239060227, 3.0126415732791028]
    >>> timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
    [3.019953987082083, 3.528754223385439, 3.02181439266457]
    >>> timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
    [1.812084445152557, 1.7702404451095965, 1.7722977998725362]
    >>> timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
    [5.409658160700605, 5.477502077679354, 5.444318360412744]
    >>> [100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]
    [399.27587954973444, 400.9240571138051, 403.7521153804846]
This was with Python 2.7.1 on Windows XP 32-bit, but @temoto in the comments above got …
Stay away from …
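Because such numbers depend on the platform, the Python version and the shape of the input, it may be worth re-running the comparison locally. The following is a rough harness, assuming Python 3 and a much smaller input than the 1000x1000 list used above so that it finishes quickly. The `candidates` dictionary, its labels and the `map_extend` helper are made up for this sketch:

```python
import collections
import itertools
import timeit

big_list = [[0] * 100 for _ in range(100)]  # smaller than the original input


def map_extend(lists):
    out = []
    # In Python 3 map() is lazy; a zero-length deque consumes it cheaply.
    collections.deque(map(out.extend, lists), maxlen=0)
    return out


candidates = {
    "chain.from_iterable": lambda: list(itertools.chain.from_iterable(big_list)),
    "chain(*lists)": lambda: list(itertools.chain(*big_list)),
    "map + extend": lambda: map_extend(big_list),
    "comprehension": lambda: [el for list_ in big_list for el in list_],
    "sum(lists, [])": lambda: sum(big_list, []),
}

for label, func in candidates.items():
    # Report the best of three runs of 10 calls each.
    best = min(timeit.repeat(func, number=10, repeat=3))
    print("%-20s %.4f s" % (label, best))
```

No timings are quoted here on purpose; the ranking may differ from the Python 2.7.1 figures above.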
Late to the party, but…

I'm new to Python and come from a Lisp background. This is what I came up with (check out the variable names for lulz):

    def flatten(lst):
        if lst:
            car, *cdr = lst  # head and tail, Lisp style (Python 3 syntax)
            if isinstance(car, (list, tuple)):
                if cdr:
                    return flatten(car) + flatten(cdr)
                return flatten(car)
            if cdr:
                return [car] + flatten(cdr)
            return [car]
        return []  # empty input flattens to an empty list

Seems to work. Test:

    flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

Returns:

    [1, 2, 3, 4, 5, 6, 7, 8, 1, 2]
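A head/tail recursion like this goes one call deeper for every element, so a long flat list can exceed Python's recursion limit. As an alternative sketch (not the answer author's code; the name `flatten_iterative` is made up here), an explicit stack of iterators avoids recursion altogether:

```python
def flatten_iterative(lst):
    """Flatten nested lists/tuples without recursion, preserving order."""
    result = []
    stack = [iter(lst)]  # iterators still to be drained, innermost last
    while stack:
        for item in stack[-1]:
            if isinstance(item, (list, tuple)):
                stack.append(iter(item))  # descend into the nested sequence
                break
            result.append(item)
        else:
            stack.pop()  # current iterator exhausted; resume its parent
    return result


print(flatten_iterative((1, 2, 3, (4, 5, 6, (7, 8, (((1, 2))))))))
# [1, 2, 3, 4, 5, 6, 7, 8, 1, 2]
```

The `for`/`else` pairing is what makes this work: the `else` branch runs only when the innermost iterator finishes without hitting a nested sequence.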
If you need a list rather than a generator, use list():

    from itertools import chain
    x = [["a","b"], ["c"]]
    y = list(chain(*x))

What you're describing is known as flattening a list, and with this new knowledge you'll be able to find many solutions on Google (there is no built-in flatten method). Here is one of them, from http://www.daniel-lemire.com/blog/archives/2006/05/10/flattning-lists-in-python/:

    def flatten(x):
        ans = []
        for i in x:
            if isinstance(i, list):
                # Recurse into the sublist and append its flattened items.
                ans.extend(flatten(i))
            else:
                ans.append(i)
        return ans
There's always reduce (the built-in is deprecated in favor of functools.reduce):

    >>> x = [ [ 'a', 'b'], ['c'] ]
    >>> for el in reduce(lambda a,b: a+b, x, []):
    ...     print(el)
    ...
    __main__:1: DeprecationWarning: reduce() not supported in 3.x; use functools.reduce()
    a
    b
    c
    >>> import functools
    >>> for el in functools.reduce(lambda a,b: a+b, x, []):
    ...     print(el)
    ...
    a
    b
    c
    >>>

Unfortunately, the plus operator for list concatenation can't itself be passed as a function; or fortunately, if you prefer lambdas to be ugly for improved visibility.
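For what it's worth, the standard library's `operator` module exposes `+` in function form (`operator.add`, or `operator.concat` for sequences), so the lambda can be dropped. A minimal sketch with the same sample data:

```python
import functools
import operator

x = [["a", "b"], ["c"]]

# operator.concat(a, b) is equivalent to a + b for sequences.
flat = functools.reduce(operator.concat, x, [])

print(flat)  # ['a', 'b', 'c']
```

Like `sum(x, [])`, this builds a new list at every step, so it is probably best kept to short inputs.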
Or a recursive operation:

    def flatten(items):
        ret = []
        # A non-sequence argument is wrapped in a one-element list.
        if not isinstance(items, (list, tuple)):
            return [items]
        for i in items:
            if isinstance(i, (list, tuple)):
                ret.extend(flatten(i))  # recurse into nested sequences
            else:
                ret.append(i)
        return ret
I had a similar problem when I had to create a dictionary containing the elements of an array and their counts. The answer is relevant because I flatten a list of lists, get the elements I need, and then do a group and count. I used Python's map function to produce a tuple of each element and its count, with groupby over the array. Note that the groupby takes the array element itself as the keyfunc. As a relatively new Python coder, I find this easier to understand, while also being Pythonic.

Before I discuss the code, here is a sample of the data I had to flatten first:

    {"_id" : ObjectId("4fe3a90783157d765d000011"),"status" : ["opencalais" ],
     "content_length" : 688,"open_calais_extract" : {"entities" : [
      {"type" :"Person","name" :"Iman Samdura","rel_score" : 0.223 },
      {"type" :"Company", "name" :"Associated Press", "rel_score" : 0.321 },
      {"type" :"Country", "name" :"Indonesia", "rel_score" : 0.321 }, ... ]},
     "title" :"Indonesia Police Arrest Bali Bomb Planner","time" :"06:42 ET",
     "filename" :"021121bn.01","month" :"November","utctime" : 1037836800,
     "date" :"November 21, 2002","news_type" :"bn","day" :"21" }

It is a query result from Mongo. The code below flattens a collection of such lists.

    def flatten_list(items):
        # Walk every item's entity list and collect the names, sorted.
        return sorted(entity['name']
                      for item in items
                      for entity in item['open_calais_extract']['entities'])

First, I extract all of the "entities" collections, and then, for each entities collection, I iterate over the dictionaries and extract the name attribute.
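The grouping and counting step described in this answer is not shown. A minimal sketch of how it might look, assuming documents shaped like the sample; the `docs` data and the `count_names` helper are made up for illustration, and `flatten_list` is repeated so the block runs on its own:

```python
import itertools


def flatten_list(items):
    # Collect every entity name across all documents, sorted so that
    # equal names end up adjacent (groupby only merges adjacent items).
    return sorted(entity['name']
                  for item in items
                  for entity in item['open_calais_extract']['entities'])


def count_names(items):
    # groupby with no key function groups by the element itself.
    return {name: len(list(group))
            for name, group in itertools.groupby(flatten_list(items))}


# Hypothetical documents with the same shape as the Mongo result.
docs = [
    {"open_calais_extract": {"entities": [
        {"type": "Company", "name": "Associated Press"},
        {"type": "Country", "name": "Indonesia"}]}},
    {"open_calais_extract": {"entities": [
        {"type": "Country", "name": "Indonesia"}]}},
]

print(count_names(docs))  # {'Associated Press': 1, 'Indonesia': 2}
```

`collections.Counter(flatten_list(docs))` would give the same counts with less code; `groupby` is used here only because the answer mentions it.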
For one-level flattening, if you care about speed, this is faster than any of the previous answers under all conditions I tried. (That is, if you need the result as a list. If you only need to iterate through it on the fly, then the chain example is probably better.) It works by pre-allocating a list of the final size and copying the parts in by slice (which is a lower-level block copy than any of the iterator methods):

    def join(a):
        """Joins a sequence of sequences into a single sequence.  (One-level flattening.)
        E.g., join([(1,2,3), [4, 5], [6, (7, 8, 9), 10]]) = [1,2,3,4,5,6,(7,8,9),10]
        This is very efficient, especially when the subsequences are long.
        """
        n = sum([len(b) for b in a])
        l = [None]*n
        i = 0
        for b in a:
            j = i+len(b)
            l[i:j] = b
            i = j
        return l

Sorted list of times, with comments:

    [(0.5391559600830078, 'flatten4b'), # join() above.
     (0.5400412082672119, 'flatten4c'), # Same, with sum(len(b) for b in a)
     (0.5419249534606934, 'flatten4a'), # Similar, using zip()
     (0.7351131439208984, 'flatten1b'), # list(itertools.chain.from_iterable(a))
     (0.7472689151763916, 'flatten1'),  # list(itertools.chain(*a))
     (1.5468521118164062, 'flatten3'),  # [i for j in a for i in j]
     (26.696547985076904, 'flatten2')]  # sum(a, [])
Sadly, Python doesn't have a simple way to flatten lists. Try this:

    def flatten(some_list):
        for element in some_list:
            if type(element) in (tuple, list):
                for item in flatten(element):
                    yield item
            else:
                yield element

This will recursively flatten a list; you can then do

    result = []
    [ result.extend(el) for el in x]
    for el in flatten(result):
        print(el)