Divide a list into multiple lists based on a bin size
I have a list with more than 100000 values.
I need to divide the list into multiple smaller lists based on a specific bin width, say 0.1. Can anybody help me write a Python program to do this?
My list looks like this:

    -0.234
    -0.04325
    -0.43134
    -0.315
    -0.6322
    -0.245
    -0.5325
    -0.6341
    -0.5214
    -0.531
    -0.124
    -0.0252

and I would like output like this:

    list1 = [-0.04325, -0.0252]
    list2 = [-0.124]
    list3 = [-0.234, -0.245]
    list4 = [-0.315]
    list5 = [-0.43134]
    list6 = [-0.5325, -0.5214, -0.531]
    list7 = [-0.6322, -0.6341]
Here is a simple and nice way using numpy's digitize:

    >>> import numpy as np
    >>> mylist = np.array([-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
    ...                    -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252])
    >>> bins = np.arange(0,-1,-0.1)
    >>> for i in range(1,10):
    ...     mylist[np.digitize(mylist,bins)==i]
    ...
    array([-0.04325, -0.0252 ])
    array([-0.124])
    array([-0.234, -0.245])
    array([-0.315])
    array([-0.43134])
    array([-0.5325, -0.5214, -0.531 ])
    array([-0.6322, -0.6341])
    array([], dtype=float64)
    array([], dtype=float64)

digitize returns an array containing, for each element, the index of the bin it falls into.
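The loop above calls digitize once per bin. As a minimal sketch (the names `values`, `indices` and `groups` are illustrative, not from the answer), the bin indices can be computed once and then used to collect the non-empty bins into a dict:

```python
import numpy as np

values = np.array([-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
                   -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252])
bins = np.arange(0, -1, -0.1)  # decreasing edges: 0, -0.1, ..., -0.9

# A single call gives the bin index of every element.
indices = np.digitize(values, bins)

# Collect the values of each non-empty bin, keyed by bin index.
groups = {int(i): values[indices == i].tolist() for i in np.unique(indices)}

for i in sorted(groups):
    print(i, groups[i])
```

Empty bins simply do not appear as keys, so there is no need to know the number of bins in advance.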
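This will create a dict where each value is a list of the elements that fit in a bin.

    import collections

    # vals is the input list of floats
    bins = collections.defaultdict(list)
    binId = lambda x: int(x*10)   # bin width 0.1; int() truncates toward zero
    for val in vals:
        bins[binId(val)].append(val)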
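One caveat with `int(x*10)`: because `int()` truncates toward zero, values in (-0.1, 0) and [0, 0.1) all land in bin 0. That does not matter for the all-negative data in the question. If the data can straddle zero, a variant using `math.floor` keeps every bin the same width. This is only a sketch; the function name `bin_values` and its `width` parameter are made up for illustration:

```python
import collections
import math


def bin_values(vals, width=0.1):
    """Group vals into bins of the given width, keyed by floor(val / width)."""
    bins = collections.defaultdict(list)
    for val in vals:
        # floor() rounds toward minus infinity, so negative and positive
        # values next to zero end up in different bins.
        bins[math.floor(val / width)].append(val)
    return dict(bins)


result = bin_values([-0.04325, 0.04325, -0.234, -0.245])
print(result)
```

Values that sit exactly on a bin edge can still fall on either side because of floating-point rounding in the division.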
This works:

    l=[-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
       -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]

    d={}
    for k,v in zip([int(i*10) for i in l],l):
        d.setdefault(k,[]).append(v)

    LoL=[d[e] for e in sorted(d.keys(), reverse=True)]

    for i,l in enumerate(LoL,1):
        print('list',i,l)

Prints:

    list 1 [-0.04325, -0.0252]
    list 2 [-0.124]
    list 3 [-0.234, -0.245]
    list 4 [-0.315]
    list 5 [-0.43134]
    list 6 [-0.5325, -0.5214, -0.531]
    list 7 [-0.6322, -0.6341]

How it works:

    1: The list
    >>> l=[-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
    ... -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]

    2: Produce the keys:
    >>> [int(i*10) for i in l]
    [-2, 0, -4, -3, -6, -2, -5, -6, -5, -5, -1, 0]

    3: Produce tuples to put in the dict:
    >>> list(zip([int(i*10) for i in l],l))
    [(-2, -0.234), (0, -0.04325), (-4, -0.43134), (-3, -0.315), (-6, -0.6322),
    (-2, -0.245), (-5, -0.5325), (-6, -0.6341), (-5, -0.5214), (-5, -0.531),
    (-1, -0.124), (0, -0.0252)]

    4: Unpack the tuples into k,v and loop over the list
    >>> for k,v in zip([int(i*10) for i in l],l):

    5: Add key k to the dict (if not there) and append the float value to the
       list associated with that key:
        d.setdefault(k,[]).append(v)

I suggest working through the Python tutorial on these statements.
Is this what you want? (Sample output would have been helpful :)

    f = [-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
         -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]

    import numpy as np
    data = np.array(f)
    hist, edges = np.histogram(data, bins=10)
    print(hist)

Yields:

    [2 3 0 1 0 1 2 0 1 2]

The related question on assigning points to bins might also be helpful.
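Note that `bins=10` splits the range between the smallest and largest value into ten equal bins, so the bin width there is not 0.1. As a rough sketch, explicit edges can be passed instead; the choice of `np.linspace(-0.7, 0.0, 8)` below is an assumption that fits this particular sample, not something from the answer:

```python
import numpy as np

data = np.array([-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
                 -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252])

# Eight edges from -0.7 to 0.0 give seven bins of width 0.1.
edges = np.linspace(-0.7, 0.0, 8)

# Counts per bin, from the most negative bin upward.
hist, _ = np.histogram(data, bins=edges)
print(hist)

# np.histogram only counts; np.digitize recovers which values are in each bin.
idx = np.digitize(data, edges)
members = [data[idx == i].tolist() for i in range(1, len(edges))]
print(members)
```

Here `members[0]` holds the values between -0.7 and -0.6, and the last entry holds those between -0.1 and 0.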
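You can use itertools.groupby:

    import itertools as it

    iterable = ['-0.234', '-0.04325', '-0.43134', '-0.315', '-0.6322', '-0.245',
                '-0.5325', '-0.6341', '-0.5214', '-0.531', '-0.124', '-0.0252']

    # Sort first: groupby only groups consecutive items with equal keys.
    a,b,c,d,e,f,g = [list(g) for k, g in it.groupby(sorted(iterable), key=lambda x: x[:4])]
    c
    # ['-0.234', '-0.245']

Note: this simple key function assumes the values in the iterable are strings between -0.0 and -10.0. The general case calls for a different key function.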
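One possible key function for the general case, sketched here under the assumption that the values are floats rather than strings (the names `width` and `keyfunc` are illustrative and not necessarily what the answer had in mind), is the floor of the value divided by the bin width:

```python
import itertools as it
import math

values = [-0.234, -0.04325, -0.43134, -0.315, -0.6322, -0.245,
          -0.5325, -0.6341, -0.5214, -0.531, -0.124, -0.0252]
width = 0.1

# The key is the integer index of the bin that contains x.
keyfunc = lambda x: math.floor(x / width)

# Sorting the values also sorts the keys, so equal keys are adjacent,
# which is what groupby requires.
groups = [list(g) for _, g in it.groupby(sorted(values), key=keyfunc)]

for group in groups:
    print(group)
```

The groups come out in ascending order, from the most negative bin to the one nearest zero.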
We can also use a third-party library, more_itertools.

Given

    iterable = (
        "-0.234 -0.04325 -0.43134 -0.315 -0.6322 -0.245 "
        "-0.5325 -0.6341 -0.5214 -0.531 -0.124 -0.0252"
    ).split()

    iterable
    # ['-0.234', '-0.04325', '-0.43134', '-0.315', '-0.6322', '-0.245', '-0.5325', '-0.6341', '-0.5214', '-0.531', '-0.124', '-0.0252']

Code

    import more_itertools as mit

    keyfunc = lambda x: float("{:.1f}".format(float(x)))
    bins = mit.bucket(iterable, key=keyfunc)

    keys = [-0.0,-0.1,-0.2, -0.3,-0.4,-0.5,-0.6]
    a,b,c,d,e,f,g = [list(bins[k]) for k in keys]
    c
    # ['-0.234', '-0.245']
Details

We can bin by a key function, which we define to format each number to one decimal place, i.e.

    keyfunc = lambda x: float("{:.1f}".format(float(x)))
    bins = mit.bucket(iterable, key=keyfunc)

Access these bins by the keys defined by the key function:

    c = list(bins[-0.2])
    c
    # ['-0.234', '-0.245']

Access all bins by iterating over the keys:

    keyfunc = lambda x: float("{:.1f}".format(float(x)))
    bins = mit.bucket(iterable, key=keyfunc)
    keys = [-0.0,-0.1,-0.2, -0.3,-0.4,-0.5,-0.6]

    for k in keys:
        print("{} --> {}".format(k, list(bins[k])))

Output

    -0.0 --> ['-0.04325', '-0.0252']
    -0.1 --> ['-0.124']
    -0.2 --> ['-0.234', '-0.245']
    -0.3 --> ['-0.315']
    -0.4 --> ['-0.43134']
    -0.5 --> ['-0.5325', '-0.5214', '-0.531']
    -0.6 --> ['-0.6322', '-0.6341']

List comprehension and unpacking are another option (see the Code example).
For more details, see the more_itertools documentation.