How can I count the occurrences of a list item?
Given an item, how can I count its occurrences in a list in Python?
If you only want a single item's count, use the `count` method:
```python
>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3
```
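Don't use this if you want to count multiple items: calling `count` in a loop makes a separate pass over the whole list for every call, which gets slow for long lists.
If you're using Python 2.7 or 3.x and you want the number of occurrences of each element: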
```python
>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})
```
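When you need counts for several specific items, a minimal sketch of the single-pass alternative is to build one `Counter` and then look each item up in it. `count_items` is a hypothetical helper name for illustration, not part of `collections`.

```python
from collections import Counter

def count_items(seq, targets):
    """Count several target items with one pass over seq (hypothetical helper)."""
    tally = Counter(seq)  # a single O(n) pass over the whole sequence
    # Counter returns 0 for keys it has never seen, so absent targets are safe.
    return {t: tally[t] for t in targets}

z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
print(count_items(z, ['blue', 'red', 'green']))
# {'blue': 3, 'red': 2, 'green': 0}
```

Each lookup after the initial pass is a dictionary access, so adding more targets does not add more passes over the list.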
Counting the occurrences of one item in a list
For counting the occurrences of just one list item, you can use `count()`:
```python
>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2
```
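Counting the occurrences of all items in a list is also known as "tallying" a list, or creating a tally counter.
Counting all items with count()
To count the occurrences of items in `l`, you can use a list comprehension and the `count()` method: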
```python
[[x,l.count(x)] for x in set(l)]
```
(or similarly with a dictionary: `dict((x,l.count(x)) for x in set(l))`)
Example:
```python
>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}
```
Counting all items with Counter()
Alternatively, there's the `Counter` class from the `collections` library:
```python
Counter(l)
```
Example:
```python
>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})
```
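A `Counter` can also hand back its tally ordered by frequency through `most_common()`. A short sketch on a made-up list:

```python
from collections import Counter

l = ["a", "b", "b", "c", "c", "c"]
tally = Counter(l)

# most_common() lists (element, count) pairs from the highest count down;
# passing n keeps only the n most frequent elements.
print(tally.most_common())   # [('c', 3), ('b', 2), ('a', 1)]
print(tally.most_common(1))  # [('c', 3)]
```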
How much faster is Counter?
I checked how much faster `Counter` is for tallying lists.
Here is the script I used:
```python
from __future__ import print_function
import timeit

t1 = timeit.Timer('Counter(l)',
                  'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                  )

t2 = timeit.Timer('[[x,l.count(x)] for x in set(l)]',
                  'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                  )

print("Counter():", t1.repeat(repeat=3, number=10000))
print("count():  ", t2.repeat(repeat=3, number=10000))
```
Output:
```
Counter(): [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count():   [7.779430688009597, 7.962715800967999, 8.420845870045014]
```
Another way to get the number of occurrences of each item, in a dictionary:
```python
dict((i, a.count(i)) for i in a)
```
See http://docs.python.org/tutorial/datastructures.html for more on lists.
Given an item, how can I count its occurrences in a list in Python?
Here's an example list:
```python
>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']
```
There's the `list.count` method:
```python
>>> l.count('b')
4
```
This works fine for any list. Tuples have this method as well:
```python
>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6
```
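And then there's `collections.Counter`. You can dump any iterable into a `Counter`, not just a list, and the `Counter` will retain a data structure of the counts of the elements.
Usage: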
```python
>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4
```
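Counters are based on Python dictionaries; their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.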
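Because the elements become dictionary keys, unhashable elements such as lists cannot be counted directly. A small sketch, assuming you want to count identical rows of a list of lists; converting each row to a tuple is one common workaround:

```python
from collections import Counter

rows = [[1, 2], [3, 4], [1, 2]]

try:
    Counter(rows)
    lists_ok = True
except TypeError as exc:
    # Lists are unhashable, so they cannot be used as Counter (dict) keys.
    lists_ok = False
    print("TypeError:", exc)

# Converting each inner list to a tuple makes the elements hashable.
row_counts = Counter(tuple(r) for r in rows)
print(row_counts[(1, 2)])  # 2
```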
You can add to or subtract from your counter with iterables:
```python
>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4
```
And you can do multiset operations with the counter as well:
```python
>>> c2 = Counter(list('aabbxyz'))
>>> c - c2                   # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2                   # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2                   # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2                   # set intersection
Counter({'a': 2, 'b': 2})
```
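One difference worth knowing between the `-` operator and `subtract()`: the operator drops zero and negative counts from its result, while `subtract()` works in place and keeps them. A small sketch with made-up counters:

```python
from collections import Counter

c = Counter('aab')   # Counter({'a': 2, 'b': 1})
d = Counter('abbb')  # Counter({'b': 3, 'a': 1})

# The - operator keeps only positive counts.
print(c - d)         # Counter({'a': 1})

# subtract() updates c in place and keeps zero and negative counts.
c.subtract(d)
print(c)             # Counter({'a': 1, 'b': -2})

# Unary + drops the non-positive entries again.
print(+c)            # Counter({'a': 1})
```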
Why not pandas?
Another answer suggests:
Why not use pandas?
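Pandas is a common library, but it's not in the standard library. Adding it as a requirement is non-trivial.
There are built-in solutions for this use case in the list object itself as well as in the standard library.
If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.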
If you want to count all values at once, you can use numpy arrays and `bincount`:
```python
import numpy as np
a = np.array([1, 2, 3, 4, 1, 4, 1])
np.bincount(a)
```
which gives
```
array([0, 3, 1, 1, 2])
```
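In the `bincount` result, index `i` holds how many times `i` occurs, so values that never appear still get a 0 slot. A sketch of turning it into a plain `{value: count}` dict that keeps only the values actually present:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 1, 4, 1])
counts = np.bincount(a)  # counts[i] is how often the integer i occurs in a

# Keep only the values that actually occur, as plain Python ints.
present = np.nonzero(counts)[0]
tally = {int(v): int(counts[v]) for v in present}
print(tally)  # {1: 3, 2: 1, 3: 1, 4: 2}
```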
I've compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).
Counting one item: for large enough arrays, it turns out that
```python
numpy.sum(numpy.array(a) == 1)
```
is slightly faster than the other solutions.
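The comparison `numpy.array(a) == 1` builds a boolean array, and summing it counts the `True` entries. A small sketch; note that converting the list to an array is itself a full pass, so it pays off mainly when you reuse the array for several counts. `np.count_nonzero` is shown as an equivalent way to count the matches.

```python
import numpy as np

a = [1, 2, 3, 4, 1, 4, 1]
arr = np.array(a)  # convert once; the conversion walks the whole list

# arr == 1 is a boolean array; summing it counts the True entries.
ones = int(np.sum(arr == 1))
fours = int(np.count_nonzero(arr == 4))  # equivalent alternative
print(ones, fours)  # 3 2
```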
Counting all items: as established before,
```python
numpy.bincount(a)
```
is what you want.
Code to reproduce the plots:
```python
from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )
```
If you can use `pandas`, then `value_counts` does this for you:
```python
>>> import pandas as pd
>>> a = [1, 2, 3, 4, 1, 4, 1]
>>> pd.Series(a).value_counts()
1    3
4    2
3    1
2    1
dtype: int64
```
It also sorts the result by frequency automatically.
If you want the result as a list of lists, do the following:
```python
>>> pd.Series(a).value_counts().reset_index().values.tolist()
[[1, 3], [4, 2], [3, 1], [2, 1]]
```
Why not use pandas?
```python
import pandas as pd

l = ['a', 'b', 'c', 'd', 'a', 'd', 'a']

# converting the list to a Series and counting the values
my_count = pd.Series(l).value_counts()
my_count
```
Output:
```
a    3
d    2
b    1
c    1
dtype: int64
```
If you are looking for the count of a particular element, say `a`, try:
```python
my_count['a']
```
Output:
```
3
```
```python
# Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
from __future__ import print_function
from collections import defaultdict
def count_unsorted_list_items(items):
    """
    :param items: iterable of hashable items to count
    :type items: iterable

    :returns: dict of counts like Py2.7 Counter
    :rtype: dict
    """
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


# Python >= 2.2 (generators)
def count_sorted_list_items(items):
    """
    :param items: sorted iterable of items to count
    :type items: sorted iterable

    :returns: generator of (item, count) tuples
    :rtype: generator
    """
    if not items:
        return
    elif len(items) == 1:
        yield (items[0], 1)
        return
    prev_item = items[0]
    count = 1
    for item in items[1:]:
        if prev_item == item:
            count += 1
        else:
            yield (prev_item, count)
            count = 1
            prev_item = item
    yield (item, count)
    return


import unittest
class TestListCounters(unittest.TestCase):
    def test_count_unsorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = count_unsorted_list_items(inp)
            print(inp, exp_outp, counts)
            self.assertEqual(counts, dict( exp_outp ))

        inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )


    def test_count_sorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = list( count_sorted_list_items(inp) )
            print(inp, exp_outp, counts)
            self.assertEqual(counts, exp_outp)

        inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
        # The sorted-input counter miscounts unsorted input:
        # it yields [(2,2), (4,1), (2,1)] instead of [(2,3), (4,1)].
        self.assertNotEqual(exp_outp, list( count_sorted_list_items(inp) ))


if __name__ == '__main__':
    unittest.main()
```
I had this problem today and rolled my own solution before I thought to check for existing answers. This:
```python
dict((i,a.count(i)) for i in a)
```
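is really, really slow for large lists, because it calls `count` (a full pass over the list) once for every element. My solution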
```python
def occurDict(items):
    d = {}
    for i in items:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d
```
is actually a bit faster than the `Counter` solution, at least for Python 2.7.
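The same single-pass tally can be written a little more compactly with `dict.get` and a default of 0. This is a sketch of an equivalent variant (the name `occur_dict_get` is hypothetical), not a claim about which is faster:

```python
def occur_dict_get(items):
    """Same tally as occurDict, using dict.get with a default of 0."""
    d = {}
    for i in items:
        d[i] = d.get(i, 0) + 1  # missing keys start from 0
    return d

print(occur_dict_get(['a', 'b', 'b']))  # {'a': 1, 'b': 2}
```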
To count how many different elements share a common pattern:
```python
li = ['A0','c5','A8','A2','A5','c2','A3','A9']

print(sum(1 for el in li if el[0]=='A' and el[1] in '01234'))
```
gives `3`.
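Because `bool` is a subclass of `int` (`True == 1`), the condition itself can be summed instead of generating `1` for each match. A sketch of that equivalent form on the same list:

```python
li = ['A0', 'c5', 'A8', 'A2', 'A5', 'c2', 'A3', 'A9']

# Each condition evaluates to True or False; summing them counts the matches.
n = sum(el[0] == 'A' and el[1] in '01234' for el in li)
print(n)  # 3
```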
```python
from collections import Counter
country = ['Uruguay', 'Mexico', 'Uruguay', 'France', 'Mexico']
count_country = Counter(country)
output_list = []

for i in count_country:
    output_list.append([i, count_country[i]])
print(output_list)
```
Output list (the element order may differ between Python versions):
```
[['Mexico', 2], ['France', 1], ['Uruguay', 2]]
```
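The explicit loop can be replaced by a list comprehension over `Counter.items()`, which already yields `(element, count)` pairs. A sketch of that variant:

```python
from collections import Counter

country = ['Uruguay', 'Mexico', 'Uruguay', 'France', 'Mexico']
count_country = Counter(country)

# items() yields (element, count) tuples; turn each into a two-item list.
output_list = [list(pair) for pair in count_country.items()]
print(output_list)
```

On Python 3.7 and later, dicts (and therefore `Counter`) keep insertion order, so the pairs come out in order of first appearance.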
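Using numpy's `bincount` was suggested, but it only works for 1-D arrays of non-negative integers. Also, the resulting array can be confusing: it holds the counts of every integer from 0 up to the maximum of the original list, with 0 for the integers that are missing.
A better way to do it with numpy is to use the `unique` function with the `return_counts` argument set to `True`. It returns the sorted unique values together with an array of their counts: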
```python
import numpy as np

a = [1, 1, 0, 2, 1, 0, 3, 3]
a_uniq, counts = np.unique(a, return_counts=True)  # array([0, 1, 2, 3]), array([2, 3, 1, 2])
```
Then we can pair them into a dict:
```python
dict(zip(a_uniq, counts))  # {0: 2, 1: 3, 2: 1, 3: 2}
```
It also works with other data types and "2D lists", e.g.
```python
>>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
>>> dict(zip(*np.unique(a, return_counts=True)))
{'a': 3, 'b': 3, 'c': 2}
```
Yet another way to count all the elements of a list is with `itertools.groupby()`.
With "duplicate" counts:
```python
from itertools import groupby

L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']  # Input list

counts = [(i, len(list(c))) for i, c in groupby(L)]  # Create value-count pairs as list of tuples
print(counts)
```
Returns
```
[('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]
```
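Notice how it grouped the first three `a`'s together, while other groups of `a` appear further down the list. This happens because `groupby` only groups consecutive equal elements, and the input list `L` was not sorted.
With unique counts:
If unique group counts are desired, just sort the input list: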
```python
counts = [(i, len(list(c))) for i, c in groupby(sorted(L))]
print(counts)
```
Returns
```
[('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]
```
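As a consistency check, the sorted `groupby` tally should match what `Counter` produces. This sketch also counts each group with `sum(1 for _ in g)` so the group is not materialized as a list:

```python
from collections import Counter
from itertools import groupby

L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']

# Sorting first puts equal elements next to each other, so each group is complete.
grouped = {k: sum(1 for _ in g) for k, g in groupby(sorted(L))}
print(grouped == dict(Counter(L)))  # True
```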
You can also use the `countOf` function from the built-in `operator` module:
```python
>>> import operator
>>> operator.countOf([1, 2, 3, 4, 1, 4, 1], 1)
3
```
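Unlike `list.count`, which is a method of lists (and tuples), `operator.countOf` takes any iterable as its first argument, including a generator. A small sketch:

```python
import operator

# A generator has no .count() method, but countOf can still iterate over it.
letters = (ch for ch in 'banana')
n_a = operator.countOf(letters, 'a')
print(n_a)  # 3
```

Keep in mind that a generator is consumed by the call, so counting a second value needs a fresh generator.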
Here are three solutions:
The fastest is using a `for` loop and storing the counts in a dict.
```python
import time
from collections import Counter

def countElement(a):
    g = {}
    for i in a:
        if i in g:
            g[i] += 1
        else:
            g[i] = 1
    return g

z = [1,1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4]


#Solution 1 - Faster
st = time.monotonic()
for i in range(1000000):
    b = countElement(z)
et = time.monotonic()
print(b)
print('Simple for loop and storing it in dict - Duration: {}'.format(et - st))

#Solution 2 - Fast
st = time.monotonic()
for i in range(1000000):
    a = Counter(z)
et = time.monotonic()
print(a)
print('Using collections.Counter - Duration: {}'.format(et - st))

#Solution 3 - Slow
st = time.monotonic()
for i in range(1000000):
    g = dict([(x, z.count(x)) for x in set(z)])
et = time.monotonic()
print(g)
print('Using list comprehension - Duration: {}'.format(et - st))
```
Results:
```
#Solution 1 - Faster
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 234: 3, 23: 10, 12: 2, 123: 1, 31: 1, 13: 1, 42: 5, 34: 4, 423: 3}
Simple for loop and storing it in dict - Duration: 12.032000000000153

#Solution 2 - Fast
Counter({23: 10, 4: 6, 2: 5, 42: 5, 1: 4, 3: 4, 34: 4, 234: 3, 423: 3, 5: 2, 12: 2, 123: 1, 31: 1, 13: 1})
Using collections.Counter - Duration: 15.889999999999418

#Solution 3 - Slow
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 34: 4, 423: 3, 234: 3, 42: 5, 12: 2, 13: 1, 23: 10, 123: 1, 31: 1}
Using list comprehension - Duration: 33.0
```
This might not be the most efficient; it needs an extra pass to remove duplicates.
Functional implementation:
```python
import numpy as np

arr = np.array(['a','a','b','b','b','c'])
print(set(map(lambda x : (x , list(arr).count(x)) , arr)))
```
Returns:
```
{('c', 1), ('b', 3), ('a', 2)}
```
or as a `dict`:
```python
print(dict(map(lambda x : (x , list(arr).count(x)) , arr)))
```
Returns:
```
{'b': 3, 'c': 1, 'a': 2}
```
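```python
sum(1 for elem in your_list if elem == your_value)  # your_list and your_value are placeholders
```
This returns the number of occurrences of `your_value` in `your_list`.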
If you want the number of occurrences of a specific element:
```python
>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> single_occurrences = Counter(z)
>>> print(single_occurrences.get("blue"))
3
>>> print(single_occurrences.values())
dict_values([3, 2, 1])
```
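For an element that never occurs, indexing a `Counter` and calling `.get()` behave differently: indexing returns 0, while `.get()` falls back to the plain dict default of `None`. A sketch with the same list:

```python
from collections import Counter

z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
tally = Counter(z)

print(tally['green'])      # 0: indexing a missing key returns 0
print(tally.get('green'))  # None: get() uses the plain dict default
print('green' in tally)    # False: the [] lookup did not insert the key
```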
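```python
def countfrequncyinarray(arr1):
    # Counts only the integers 1..len(arr1); any other values are ignored,
    # and each value in that range costs one full pass via count().
    r = len(arr1)
    return {i: arr1.count(i) for i in range(1, r+1)}

arr1 = [4, 4, 4, 4]
a = countfrequncyinarray(arr1)
print(a)  # {1: 0, 2: 0, 3: 0, 4: 4}
```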
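If the goal really is the frequency of every integer from 1 to `len(arr1)`, including zeros, a single `Counter` pass avoids calling `count()` once per value. A sketch under that assumption; `count_frequency_in_range` is a hypothetical name:

```python
from collections import Counter

def count_frequency_in_range(arr1):
    """Frequency of each integer 1..len(arr1), including zeros (hypothetical helper)."""
    tally = Counter(arr1)  # one pass instead of one count() call per value
    return {i: tally[i] for i in range(1, len(arr1) + 1)}

print(count_frequency_in_range([4, 4, 4, 4]))  # {1: 0, 2: 0, 3: 0, 4: 4}
```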