How slow is Python's string concatenation vs. str.join?

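Because of my comments in an answer to this question, I was wondering what the speed difference between the += operator and ''.join() is.

So, what is the speed comparison between the two?

From: Efficient String Concatenation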

Method 1:

def method1():
    out_str = ''
    for num in xrange(loop_count):  # loop_count is a module-level setting in the article
        out_str += str(num)
    return out_str

Method 4:

def method4():
    str_list = []
    for num in xrange(loop_count):  # loop_count is a module-level setting in the article
        str_list.append(str(num))
    return ''.join(str_list)

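Now, I realise they are not strictly representative, as the 4th method appends to a list before iterating through and joining each item, but it's a fair indication.

String join is significantly faster than concatenation.

Why? Strings are immutable and can't be changed in place. To alter one, a new representation needs to be created (a concatenation of the two).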
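A minimal Python 3 sketch of Method 1 and Method 4, timed with the standard timeit module. `range` replaces `xrange`, and the names `concat_plus`, `concat_join` and `LOOP_COUNT` are illustrative assumptions, not from the article. Absolute timings depend on the interpreter and hardware, so no numbers are claimed here.

```python
import timeit

LOOP_COUNT = 20_000  # illustrative size, not taken from the original article


def concat_plus(loop_count=LOOP_COUNT):
    """Method 1: grow a string with += on every iteration."""
    out_str = ''
    for num in range(loop_count):
        out_str += str(num)
    return out_str


def concat_join(loop_count=LOOP_COUNT):
    """Method 4: collect the pieces in a list, then join once at the end."""
    str_list = []
    for num in range(loop_count):
        str_list.append(str(num))
    return ''.join(str_list)


if __name__ == '__main__':
    # Both methods must build the same string before their speed is compared.
    assert concat_plus() == concat_join()
    for fn in (concat_plus, concat_join):
        seconds = timeit.timeit(fn, number=20)
        print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

Checking that both functions return the same string first guards against comparing two loops that do different work.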


My original code was wrong. It appears that + concatenation is usually faster (especially with newer versions of Python on newer hardware).

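Timings are as follows (1,000,000 iterations per string; in each row the first figure is listcat, i.e. ''.join, and the second is strcat, i.e. +=):

Python 3.3 on Windows 7, Core i7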

String of len: 1 took: 0.5710 0.2880 seconds
String of len: 4 took: 0.9480 0.5830 seconds
String of len: 6 took: 1.2770 0.8130 seconds
String of len: 12 took: 2.0610 1.5930 seconds
String of len: 80 took: 10.5140 37.8590 seconds
String of len: 222 took: 27.3400 134.7440 seconds
String of len: 443 took: 52.9640 170.6440 seconds

Python 2.7 on Windows 7, Core i7

String of len: 1 took: 0.7190 0.4960 seconds
String of len: 4 took: 1.0660 0.6920 seconds
String of len: 6 took: 1.3300 0.8560 seconds
String of len: 12 took: 1.9980 1.5330 seconds
String of len: 80 took: 9.0520 25.7190 seconds
String of len: 222 took: 23.1620 71.3620 seconds
String of len: 443 took: 44.3620 117.1510 seconds

Python 2.7 on Linux Mint, on a somewhat slower processor

String of len: 1 took: 1.8840 1.2990 seconds
String of len: 4 took: 2.8394 1.9663 seconds
String of len: 6 took: 3.5177 2.4162 seconds
String of len: 12 took: 5.5456 4.1695 seconds
String of len: 80 took: 27.8813 19.2180 seconds
String of len: 222 took: 69.5679 55.7790 seconds
String of len: 443 took: 135.6101 153.8212 seconds

Here is the code:

from __future__ import print_function
import time

def strcat(string):
    # Build the result one character at a time with +=
    newstr = ''
    for char in string:
        newstr += char
    return newstr

def listcat(string):
    # Collect the characters in a list, then join them once
    chars = []
    for char in string:
        chars.append(char)
    return ''.join(chars)

def test(fn, times, *args):
    # Call fn(*args) `times` times and return the elapsed wall-clock time
    start = time.time()
    for x in range(times):
        fn(*args)
    return "{:>10.4f}".format(time.time() - start)

def testall():
    strings = ['a', 'long', 'longer', 'a bit longer',
               '''adjkrsn widn fskejwoskemwkoskdfisdfasdfjiz oijewf sdkjjka dsf sdk siasjk dfwijs''',
               '''this is a really long string that's so long
it had to be triple quoted and contains lots of
superflous characters for kicks and gigles
@!#(*_#)(*$(*!#@&)(*E\xc4\x32\xff\x92\x23\xDF\xDFk^%#$!)%#^(*#''',
               '''I needed another long string but this one won't have any new lines or crazy characters in it, I'm just going to type normal characters that I would usually write blah blah blah blah this is some more text hey cool what's crazy is that it looks that the str += is really close to the O(n^2) worst case performance, but it looks more like the other method increases in a perhaps linear scale? I don't know but I think this is enough text I hope.''']

    for string in strings:
        # First figure: listcat (''.join); second figure: strcat (+=)
        print("String of len:", len(string), "took:", test(listcat, 1000000, string), test(strcat, 1000000, string), "seconds")

testall()
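One plausible contributor to += keeping up in these numbers is a CPython implementation detail: when the left-hand string has no other references, the interpreter can often extend it in place instead of copying it. PEP 8 describes this optimization as fragile and CPython-specific. The sketch below contrasts a loop where the string has a single reference with one where an extra alias defeats the optimization; the function names are hypothetical, and any gap you observe depends on the interpreter version.

```python
import timeit


def concat_sole_ref(n):
    # `s` is the only reference to the string, so CPython may extend it in place.
    s = ''
    for _ in range(n):
        s += 'x'
    return s


def concat_with_alias(n):
    # A second reference (`alias`) forces += to allocate a new string
    # and copy the old contents on every iteration.
    s = ''
    for _ in range(n):
        alias = s  # noqa: F841 (kept only to hold an extra reference)
        s += 'x'
    return s


if __name__ == '__main__':
    n = 20_000
    for fn in (concat_sole_ref, concat_with_alias):
        seconds = timeit.timeit(lambda: fn(n), number=5)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

On other implementations (or if the optimization does not apply) both loops may behave alike, which is one reason ''.join() is the portable choice.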


The existing answers are very well written and researched, but here's another answer for the Python 3.6 era, since now we have literal string interpolation (AKA f-strings):

>>> import timeit
>>> timeit.timeit('f\'{"a"}{"b"}{"c"}\'', number=1000000)
0.14618930302094668
>>> timeit.timeit('"".join(["a","b","c"])', number=1000000)
0.23334730707574636
>>> timeit.timeit('a = "a"; a += "b"; a += "c"', number=1000000)
0.14985873899422586

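Tested using CPython 3.6.5 on a 2012 Retina MacBook Pro with a 2.3 GHz Intel Core i7.

This is by no means a formal benchmark, but it looks like using f-strings is roughly as performant as using += concatenation; any improved metrics or suggestions are, of course, welcome.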
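A possible refinement, offered as a sketch: the statements above combine only string literals, so it may be fairer to interpolate variables created in timeit's `setup`. The names `SETUP` and `STATEMENTS` are illustrative, and results will vary by machine and CPython version.

```python
import timeit

# Variables are created once in setup, so each statement combines runtime values.
SETUP = 'a, b, c = "a", "b", "c"'

STATEMENTS = {
    'f-string': 'f"{a}{b}{c}"',
    'str.join': '"".join([a, b, c])',
    '+=': 's = a; s += b; s += c',
}

if __name__ == '__main__':
    for label, stmt in STATEMENTS.items():
        seconds = timeit.timeit(stmt, setup=SETUP, number=1_000_000)
        print(f"{label:>8}: {seconds:.4f} s")
```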


I rewrote the last answer; could you please share your opinion on the way I tested?

import time

# time.clock() was used originally; it was removed in Python 3.8, so perf_counter() is used here
start1 = time.perf_counter()
for x in range(10000000):
    dog1 = ' and '.join(['spam', 'eggs', 'spam', 'spam', 'eggs', 'spam', 'spam', 'eggs', 'spam', 'spam', 'eggs', 'spam'])

end1 = time.perf_counter()
print("Time to run Joiner =", end1 - start1, "seconds")

start2 = time.perf_counter()
for x in range(10000000):
    dog2 = 'spam' + ' and ' + 'eggs' + ' and ' + 'spam' + ' and ' + 'spam' + ' and ' + 'eggs' + ' and ' + 'spam' + ' and ' + 'spam' + ' and ' + 'eggs' + ' and ' + 'spam' + ' and ' + 'spam' + ' and ' + 'eggs' + ' and ' + 'spam'

end2 = time.perf_counter()
print("Time to run + =", end2 - start2, "seconds")

Note: this example is written in Python 3.5, where range() acts like the former xrange().

The output I got:

Time to run Joiner = 27.086106206103153 seconds
Time to run + = 69.79100515996426 seconds

Personally, I prefer ''.join([]) over the 'plusser way' because it's cleaner and more readable.
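On the testing method: a sketch of the same comparison using the standard timeit module, which chooses a suitable clock and can repeat each trial. The function names and the repetition counts are illustrative assumptions, and wrapping each approach in a function adds a little call overhead to both sides equally.

```python
import timeit

WORDS = ['spam', 'eggs', 'spam', 'spam', 'eggs', 'spam',
         'spam', 'eggs', 'spam', 'spam', 'eggs', 'spam']


def with_join(words=WORDS):
    return ' and '.join(words)


def with_plus(words=WORDS):
    # Build the same string with +, inserting the separator between items.
    result = words[0]
    for word in words[1:]:
        result = result + ' and ' + word
    return result


if __name__ == '__main__':
    assert with_join() == with_plus()
    for fn in (with_join, with_plus):
        # repeat() runs several trials; the minimum is usually the least noisy figure.
        best = min(timeit.repeat(fn, number=100_000, repeat=5))
        print(f"{fn.__name__}: {best:.4f} s per 100,000 calls")
```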

This is what silly programs are designed to test :)

Using plus:

import time

if __name__ == '__main__':
    start = time.clock()
    for x in range(1, 10000000):
        dog = "a" + "b"

    end = time.clock()
    print "Time to run Plusser =", end - start, "seconds"

Output:

Time to run Plusser = 1.16350010965 seconds

Now join...

import time

if __name__ == '__main__':
    start = time.clock()
    for x in range(1, 10000000):
        dog = "a".join("b")

    end = time.clock()
    print "Time to run Joiner =", end - start, "seconds"

Output:

Time to run Joiner = 21.3877386651 seconds

So on Python 2.6 on Windows, I would say + is about 18 times faster than join. :)
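A quick way to check what each loop above actually computes, assuming CPython: `"a".join("b")` treats `"b"` as an iterable of characters and uses `"a"` as the separator, and the compiler folds the literal expression `"a" + "b"` into the single constant `"ab"`. The two loops may therefore not be doing comparable work. The variable names here are illustrative.

```python
# CPython-specific check of what the two benchmarked statements really do.
joined = "a".join("b")  # iterates over the characters of "b"; one item, so no separator is used
print(repr(joined))     # 'b'

# The compiler folds the constant expression "a" + "b" into one constant,
# so the "Plusser" loop performs no concatenation at run time.
code = compile('dog = "a" + "b"', '<plusser>', 'exec')
print('ab' in code.co_consts)
```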
