Python: Why is it slower to iterate over a small string than a small list?

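I was playing around with timeit and noticed that doing a simple list comprehension over a small string took longer than doing the same operation on a list of small single-character strings. Any explanation? It's almost 1.35 times as much time.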

>>> from timeit import timeit
>>> timeit("[x for x in 'abc']")
2.0691067844831528
>>> timeit("[x for x in ['a', 'b', 'c']]")
1.5286479570345861

What's happening on a lower level that's causing this?
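For anyone wanting to repeat the measurement outside the REPL, here is a minimal sketch. It takes the best of several runs to cut down on noise. The helper name best_time and the run counts are my own illustrative choices, not part of the question, and the absolute numbers will depend on the interpreter version and the machine.

```python
from timeit import repeat


def best_time(stmt, number=200_000, runs=5):
    """Return the fastest total time in seconds over several timing runs."""
    return min(repeat(stmt, number=number, repeat=runs))


if __name__ == "__main__":
    t_str = best_time("[x for x in 'abc']")
    t_list = best_time("[x for x in ['a', 'b', 'c']]")
    print(f"string: {t_str:.4f} s")
    print(f"list:   {t_list:.4f} s")
    print(f"ratio string/list: {t_str / t_list:.2f}")
```

Taking the minimum rather than the mean is the usual timeit advice, since slower runs mostly reflect interference from other processes.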

TL;DR

- The actual speed difference is closer to 70% (or more) once a lot of the overhead is removed, for Python 2.
- Object creation is not at fault. Neither method creates a new object, as one-character strings are cached.
- The difference is unobvious, but is likely created from a greater number of checks on string indexing, with regards to the type and well-formedness. It is also quite likely thanks to the need to check what to return.
- List indexing is remarkably fast.

[code listing missing]

This disagrees with what you've found...

You must be using Python 2, then.

[code listing missing]

Let's explain the difference between the versions. I'll examine the compiled code.

For Python 3:

[code listing missing]

You see here that the list variant is likely to be slower due to the building of the list each time.

This is the

[code listing missing]

part. The string variant only has

[code listing missing]

You can check that this does seem to make a difference:

[code listing missing]

This produces just

[code listing missing]

as tuples are immutable. Test:

[code listing missing]

Great, back up to speed.

For Python 2:

[code listing missing]

The odd thing is that we have the same building of the list, but it's still faster for this. Python 2 is acting strangely fast.

Let's remove the comprehensions and re-time. The [inline code missing] is to prevent it getting optimised out.

[code listing missing]

We can see that initialization is not significant enough to account for the difference between the versions (those numbers are small)! We can thus conclude that Python 3 has slower comprehensions. This makes sense as Python 3 changed comprehensions to have safer scoping.

Well, now improve the benchmark (I'm just removing overhead that isn't iteration). This removes the building of the iterable by pre-assigning it:

[two code listings missing]

We can check whether calling [inline code missing] is the overhead:

[two code listings missing]

No. No it is not. The difference is too small, especially for Python 3.

So let's remove yet more unwanted overhead... by making the whole thing slower! The aim is just to have a longer iteration so the time hides overhead.

[two code listings missing]

This hasn't actually changed much, but it's helped a little.

So remove the comprehension. It's overhead that's not part of the question:

[two code listings missing]

That's more like it! We can get slightly faster still by using deque to iterate. It's basically the same, but it's faster:

[two code listings missing]

What impresses me is that Unicode is competitive with bytestrings. We can check this explicitly by trying bytes and unicode in both:

bytes:

[two code listings missing]

Here you see Python 3 actually faster than Python 2.

unicode:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = u"".join( chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 800 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = [ chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 394 usec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable = u"".join(unichr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 1.07 msec per loop

>>> python2 -m timeit -s 'import random; from collections import deque; iterable = [unichr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 469 usec per loop

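Again, Python 3 is faster, although this is to be expected (str has had a lot of attention in Python 3).

In fact, this unicode-bytes difference is very small, which is impressive.

So let's analyse this one case, seeing as it's fast and convenient for me: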

>>> python3 -m timeit -s 'import random; from collections import deque; iterable ="".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop
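The same comparison can be run from inside a script rather than from the command line. This is only a sketch under my own assumptions: the helper names consume and compare are illustrative, and the lambda wrappers add a small constant cost to both measurements.

```python
import random
from collections import deque
from timeit import timeit


def consume(iterable):
    """Exhaust an iterable at C speed without storing any items."""
    deque(iterable, maxlen=0)


def compare(size=100_000, number=200):
    """Return (seconds per pass over a str, seconds per pass over a list)."""
    chars = [chr(random.randint(0, 127)) for _ in range(size)]
    as_str = "".join(chars)
    t_str = timeit(lambda: consume(as_str), number=number)
    t_list = timeit(lambda: consume(chars), number=number)
    return t_str / number, t_list / number


if __name__ == "__main__":
    per_str, per_list = compare()
    print(f"str:  {per_str * 1e6:.0f} usec per loop")
    print(f"list: {per_list * 1e6:.0f} usec per loop")
```

A deque with maxlen=0 discards every item as it arrives, so nearly all of the measured time is the iteration itself.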

We can actually rule out Tim Peters' 10-times-upvoted answer!

>>> foo = iterable[123]
>>> iterable[36] is foo
True

These are not new objects!
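A slightly more systematic version of that check, as a sketch: it indexes every position of an ASCII string and verifies that equal characters always come back as the very same object. This relies on a CPython implementation detail (the cache of one-character strings), so other interpreters may behave differently. The function names are my own.

```python
import random


def sample_text(size=1000, seed=0):
    """Build an ASCII-only string at runtime (so nothing is constant-folded)."""
    rng = random.Random(seed)
    return "".join(chr(rng.randint(0, 127)) for _ in range(size))


def same_char_same_object(text):
    """Return True if indexing always yields one shared object per character."""
    seen = {}
    for i in range(len(text)):
        ch = text[i]
        first = seen.setdefault(ch, ch)  # first object seen for this character
        if first is not ch:
            return False
    return True


if __name__ == "__main__":
    print(same_char_same_object(sample_text()))
```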

But this is worth mentioning: indexing costs. The difference will likely be in the indexing, so remove the iteration and just index:

>>> python3 -m timeit -s 'import random; iterable ="".join(chr(random.randint(0, 127)) for _ in range(100000))' 'iterable[123]'
10000000 loops, best of 3: 0.0397 usec per loop

>>> python3 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable[123]'
10000000 loops, best of 3: 0.0374 usec per loop

The difference seems small, but at least half of the cost is overhead:

>>> python3 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable; 123'
100000000 loops, best of 3: 0.0173 usec per loop

So the speed difference is sufficient to decide to blame it. I think.
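The overhead subtraction can be done in one go. A rough sketch, assuming the 'name; 123' statement is a fair baseline for the name lookup and loop cost; indexing_costs is a hypothetical helper, and at these timescales the results are noisy enough that a net value can even come out negative.

```python
from timeit import timeit

SETUP = """
import random
chars = [chr(random.randint(0, 127)) for _ in range(100000)]
as_list = chars
as_str = ''.join(chars)
"""


def indexing_costs(number=1_000_000):
    """Return (str, list) seconds per index operation, net of the baseline."""
    baseline = timeit("as_list; 123", setup=SETUP, number=number)
    t_str = timeit("as_str[123]", setup=SETUP, number=number)
    t_list = timeit("as_list[123]", setup=SETUP, number=number)
    return (t_str - baseline) / number, (t_list - baseline) / number


if __name__ == "__main__":
    net_str, net_list = indexing_costs()
    print(f"str index:  {net_str * 1e6:.4f} usec net")
    print(f"list index: {net_list * 1e6:.4f} usec net")
```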

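So why is indexing a list so much faster?

Well, I'll come back to you on that, but my guess is that it's down to the check for interned strings (or cached characters if it's a separate mechanism). This will be less fast than optimal. But I'll go check the source (although I'm not comfortable in C...) :).

So here's the source: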

static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
    void *data;
    enum PyUnicode_Kind kind;
    Py_UCS4 ch;
    PyObject *res;

    if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
        PyErr_BadArgument();
        return NULL;
    }
    if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
        PyErr_SetString(PyExc_IndexError, "string index out of range");
        return NULL;
    }
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    ch = PyUnicode_READ(kind, data, index);
    if (ch < 256)
        return get_latin1_char(ch);

    res = PyUnicode_New(1, ch);
    if (res == NULL)
        return NULL;
    kind = PyUnicode_KIND(res);
    data = PyUnicode_DATA(res);
    PyUnicode_WRITE(kind, data, 0, ch);
    assert(_PyUnicode_CheckConsistency(res, 1));
    return res;
}

Walking from the top, we'll have some checks. These are boring. Then some assigns, which should also be boring. The first interesting line is

ch = PyUnicode_READ(kind, data, index);

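but we'd hope that is fast, as we're reading from a contiguous C array by indexing it. The result, ch, will be less than 256 so we'll return the cached character in get_latin1_char(ch).

So we'll run (dropping the first checks)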

kind = PyUnicode_KIND(self);
data = PyUnicode_DATA(self);
ch = PyUnicode_READ(kind, data, index);
return get_latin1_char(ch);

Where

#define PyUnicode_KIND(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)), \
     ((PyASCIIObject *)(op))->state.kind)

(which is boring because asserts are compiled out unless it's a debug build [so I can check that they're fast] and ((PyASCIIObject *)(op))->state.kind is (I think) an indirection and a C-level cast);

#define PyUnicode_DATA(op) \
    (assert(PyUnicode_Check(op)), \
     PyUnicode_IS_COMPACT(op) ? _PyUnicode_COMPACT_DATA(op) : \
     _PyUnicode_NONCOMPACT_DATA(op))

(which is also boring for similar reasons, assuming the macros (Something_CAPITALIZED) are all fast),

#define PyUnicode_READ(kind, data, index) \
    ((Py_UCS4) \
    ((kind) == PyUnicode_1BYTE_KIND ? \
        ((const Py_UCS1 *)(data))[(index)] : \
        ((kind) == PyUnicode_2BYTE_KIND ? \
            ((const Py_UCS2 *)(data))[(index)] : \
            ((const Py_UCS4 *)(data))[(index)] \
        ) \
    ))

(which involves indexes but really isn't slow at all) and

static PyObject*
get_latin1_char(unsigned char ch)
{
    PyObject *unicode = unicode_latin1[ch];
    if (!unicode) {
        unicode = PyUnicode_New(1, ch);
        if (!unicode)
            return NULL;
        PyUnicode_1BYTE_DATA(unicode)[0] = ch;
        assert(_PyUnicode_CheckConsistency(unicode, 1));
        unicode_latin1[ch] = unicode;
    }
    Py_INCREF(unicode);
    return unicode;
}

Which confirms my suspicion that:

This is cached:

PyObject *unicode = unicode_latin1[ch];

This should be fast. The if (!unicode) is not run, so it's literally equivalent in this case to

PyObject *unicode = unicode_latin1[ch];
Py_INCREF(unicode);
return unicode;
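To make the caching logic easier to follow for non-C readers, here is a toy model of it in Python. It is only an illustration of the control flow in the C source shown above, not CPython code: OneChar, latin1_cache and getitem are stand-in names I made up, and the "string" is modelled as a plain list of code points.

```python
class OneChar:
    """Stand-in for a one-character string object."""

    def __init__(self, code_point):
        self.code_point = code_point


# Plays the role of the unicode_latin1[] table: filled lazily, one slot per code point.
latin1_cache = [None] * 256


def get_latin1_char(code_point):
    cached = latin1_cache[code_point]
    if cached is None:  # slow path: only on the first use of a code point
        cached = OneChar(code_point)
        latin1_cache[code_point] = cached
    return cached  # fast path: a single table lookup


def getitem(code_points, index):
    """Model of the indexing function: cached object below 256, fresh one otherwise."""
    if index < 0 or index >= len(code_points):
        raise IndexError("string index out of range")
    code_point = code_points[index]
    if code_point < 256:
        return get_latin1_char(code_point)
    return OneChar(code_point)


if __name__ == "__main__":
    text = [97, 98, 97]  # code points of "aba"
    print(getitem(text, 0) is getitem(text, 2))      # both come from the cache slot for 97
    print(getitem([300], 0) is getitem([300], 0))    # 300 >= 256, so two separate objects
```

In the model, once a slot has been filled, every later lookup of that code point is just a list index, which is why the cached path looks cheap.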

Honestly, after testing that the asserts are fast (by disabling them [I think it works on the C-level asserts...]), the only plausibly-slow parts are:

PyUnicode_IS_COMPACT(op)
_PyUnicode_COMPACT_DATA(op)
_PyUnicode_NONCOMPACT_DATA(op)

Which are:

#define PyUnicode_IS_COMPACT(op) \
    (((PyASCIIObject*)(op))->state.compact)

(fast, as before),

#define _PyUnicode_COMPACT_DATA(op) \
    (PyUnicode_IS_ASCII(op) ? \
     ((void*)((PyASCIIObject*)(op) + 1)) : \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

(fast if the macro IS_ASCII is fast), and

#define _PyUnicode_NONCOMPACT_DATA(op) \
    (assert(((PyUnicodeObject*)(op))->data.any), \
     ((((PyUnicodeObject *)(op))->data.any)))

(also fast as it's an assert plus an indirection plus a cast).

So we're down (the rabbit hole) to:

PyUnicode_IS_ASCII

Which is

#define PyUnicode_IS_ASCII(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)), \
     ((PyASCIIObject*)op)->state.ascii)

Hmm... that seems fast too...
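The kind, compact and ascii fields these macros read come from CPython's flexible string representation, where a string stores 1, 2 or 4 bytes per character depending on its widest code point. That can be observed from Python with a small sketch, assuming CPython 3.3 or later. bytes_per_char is a hypothetical helper name, and sys.getsizeof results are an implementation detail.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string by one character."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


if __name__ == "__main__":
    samples = [
        ("ASCII", "a"),
        ("Latin-1", "\xe9"),
        ("BMP", "\u0100"),
        ("astral", "\U00010000"),
    ]
    for label, ch in samples:
        print(f"{label}: {bytes_per_char(ch)} byte(s) per character")
```

This is why PyUnicode_READ has to branch on the kind before it can index the buffer, whereas a list always stores pointers of one fixed size.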

Well, OK, but let's compare it to PyList_GetItem. (Yeah, thanks Tim Peters for giving me more work to do :P.)

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

We can see that on non-error cases this is just going to run:

PyList_Check(op)
Py_SIZE(op)
((PyListObject *)op) -> ob_item[i]

where PyList_Check is

[code listing missing]

(Tabs! Tabs!) (issue21587) That got fixed and merged in 5 minutes. Like... yeah. Damn. They put Skeet to shame.

[three code listings missing]

So this is normally really trivial (two indirections and a couple of boolean checks) unless [inline code missing] is on, in which case... ???

Then there's the indexing and a cast (((PyListObject *)op) -> ob_item[i]) and we're done.

So there are definitely fewer checks for lists, and the small speed differences certainly imply that it could be relevant.

I think in general, there's just more type-checking and indirection for Unicode. It seems I'm missing a point, but what?

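Comments

You present the code as self-explanatory, and you even present snippets of code as conclusions. Unfortunately for me, I can't really follow it. I'm not saying that your approach to finding out what's wrong isn't solid, but it would be nice if it were easier to follow.

I've tried improving it, but I'm not sure how to make it clearer. Note that I don't write C, so this is a high-level analysis of the code and only the overall concepts are important.

It would be nice if you could add a short summary paragraph.

I've added one. Tell me if it feels lacking. Unfortunately, it also highlights that I don't actually know the answer.

I'll give it another day before I accept your answer (I'd love to see something more concrete pop up), but thank you for the very interesting and well-researched answer.

Note that you're shooting at a moving target ;-) This implementation doesn't just differ between Python 2 and Python 3, but also between different releases. For example, on the current development trunk, the get_latin1_char() trick no longer exists in unicode_getitem(), but in the lower-level unicode_char. So there's another level of function call now, or not (depending on the compiler and optimization flags used). At this level of detail, there simply are no reliable answers ;-)

By the way, show the code for PyList_GetItem() too: after the error checking, it's just return ((PyListObject *)op) -> ob_item[i]. Very short and lean. String indexing isn't slow, but the code is much more involved than list indexing.

@TimPeters Yeah, OK, fine :P. Done.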

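When you iterate over most container objects (lists, tuples, dicts, ...), the iterator delivers the objects in the container.
But when you iterate over a string, a new object has to be created for each character delivered. A string is not "a container" in the same sense a list is a container. The individual characters in a string don't exist as distinct objects before iteration creates those objects.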

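Comments

I don't think this is true, actually. You can check with is. It sounds right, but I really don't think it can be.

Take a look at @Veedrac's answer.

stringobject.c shows that __getitem__ for strings just retrieves the result from a table of stored 1-character strings, so allocation costs for those are only incurred once.

@user2357112, yes, for plain strings in Python 2 that's a vital point. In Python 3, all strings are "officially" Unicode and a lot more details are involved (see Veedrac's answer). For example, in Python 3, after s = chr(256), s is chr(256) returns False. Knowing the type alone isn't enough, because mounds of special cases exist under the covers, triggering on the data values.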

Creating an iterator for a string may incur overhead, whereas an array already contains an iterator upon instantiation.

EDIT:
>>> timeit("[x for x in ['a','b','c']]")
0.3818681240081787
>>> timeit("[x for x in 'abc']")
0.3732869625091553

This was run using 2.7, but on my MacBook Pro i7. This could be the result of a system configuration difference.

Comments

Even just using the straight iterators, the string is still significantly slower. timeit("[x for x in it]", "it = iter('abc')") = 0.34543599384033535; timeit("[x for x in it]", "it = iter(list('abc'))") = 0.27911691380446508
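One caveat if you try to reproduce this: an iterator created in the setup string is consumed by the first pass, so every later pass of the timed statement loops over nothing. A sketch that builds a fresh iterator inside the statement instead (time_iteration is an illustrative name of mine, and timings vary by version and machine):

```python
from timeit import timeit


def time_iteration(number=200_000):
    """Time a comprehension over a fresh iterator of a str and of a list."""
    # iter(...) is called inside the timed statement, so each pass really
    # walks all three items instead of hitting an exhausted iterator.
    t_str = timeit("[x for x in iter(s)]", setup="s = 'abc'", number=number)
    t_list = timeit("[x for x in iter(l)]", setup="l = list('abc')", number=number)
    return t_str, t_list


if __name__ == "__main__":
    it = iter("abc")
    print(list(it))  # first pass yields the three characters
    print(list(it))  # second pass yields nothing: the iterator is exhausted
    t_str, t_list = time_iteration()
    print(f"str: {t_str:.4f} s, list: {t_list:.4f} s")
```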
