Python string formatting: % vs. .format
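Python 2.6 introduced the str.format() method, whose syntax differs slightly from the existing % operator.
The following uses each method and produces the same result, so what is the difference?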
```python
#!/usr/bin/python
sub1 = "python string!"
sub2 = "an arg"

a = "i am a %s" % sub1
b = "i am a {0}".format(sub1)

c = "with %(kwarg)s!" % {'kwarg': sub2}
d = "with {kwarg}!".format(kwarg=sub2)

print a    # "i am a python string!"
print b    # "i am a python string!"
print c    # "with an arg!"
print d    # "with an arg!"
```
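Also, when does string formatting happen in Python? For example, if my logging level is set high, do I still pay the cost of performing the following % operation? If so, is there a way to avoid it?

```python
log.debug("some debug info: %s" % some_info)
```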
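To answer your first question: one annoyance with % is that it accepts either a single value or a tuple. You would expect the following to always work:

```python
"hi there %s" % name
```

However, if name happens to be a tuple, it raises a TypeError. To guarantee that it always works, you have to write

```python
"hi there %s" % (name,)   # supply the single argument as a single-item tuple
```

which is just ugly. .format does not have this problem.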
Why would you not use it?
- not knowing about it (me, before reading this)
- having to stay compatible with Python 2.5
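The tuple pitfall described above is easy to reproduce. The following is a minimal sketch in Python 3 syntax; the value assigned to name is purely illustrative.

```python
name = (1, 2, 3)  # illustrative value: a tuple where a single value was expected

# % treats a tuple on the right-hand side as the complete argument list,
# so three values are offered to a single %s placeholder.
try:
    broken = "hi there %s" % name
except TypeError as exc:
    broken = None
    error_message = str(exc)

# Wrapping the value in a one-item tuple makes % treat it as one argument.
wrapped = "hi there %s" % (name,)

# .format receives its arguments positionally, so no wrapping is needed.
formatted = "hi there {}".format(name)

print(error_message)
print(wrapped)
print(formatted)
```

Both working variants produce the same string; only the % form needs the extra tuple.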
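To answer your second question: string formatting happens at the same time as any other operation, namely when the string formatting expression is evaluated. Python is not a lazy language; it evaluates expressions before calling functions. So in your log.debug example, the expression "some debug info: %s" % some_info is evaluated first, and the resulting string is then passed to log.debug().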
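The eager evaluation described above can be observed directly. This is a small sketch in Python 3 syntax; CountingInfo and discard are hypothetical helpers invented for the illustration, not part of any library.

```python
class CountingInfo:
    """Hypothetical helper that counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "details"


def discard(message):
    """Stand-in for a logging call that ends up ignoring its message."""
    return None


some_info = CountingInfo()

# The % expression is evaluated before discard() is even entered,
# so the formatting cost is paid although the result is thrown away.
discard("some debug info: %s" % some_info)

print(some_info.str_calls)
```

The counter is incremented once even though discard() never looks at its argument.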
Something the modulo operator (%) cannot do, as far as I know:

```python
tu = (12, 45, 22222, 103, 6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)
```

Result:

```
12 22222 45 22222 103 22222 6 22222
```

Very useful.
Another point: format(), being a method, can be passed as an argument to other functions:

```python
li = [12, 45, 78, 784, 2, 69, 1254, 4785, 984]
print map('the number is {}'.format, li)

from datetime import datetime, timedelta

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)

gen = (once_upon_a_time + x * delta for x in xrange(20))

print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))
```

Result:

```
['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']
2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00
```
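The example above is written for Python 2. A shortened Python 3 sketch of the same idea follows; the only changes are that map() returns an iterator (so it is wrapped in list()), xrange becomes range, and print is a function. The shorter input lists are chosen here just to keep the output small.

```python
from datetime import datetime, timedelta

li = [12, 45, 78]

# A bound str.format method is an ordinary callable, so map() can apply it.
labels = list(map('the number is {}'.format, li))

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)

# The format spec after the colon is handed to datetime.__format__.
stamps = list(map('{:%Y-%m-%d %H:%M:%S}'.format,
                  (once_upon_a_time + x * delta for x in range(3))))

print(labels)
print('\n'.join(stamps))
```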
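Assuming you are using Python's logging module, you can pass the string formatting arguments as arguments to the .debug() method instead of doing the formatting yourself:

```python
log.debug("some debug info: %s", some_info)
```

This avoids doing the formatting unless the logger actually logs something.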
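The difference between the two call styles can be checked with a counter. This is a sketch in Python 3 syntax, assuming a logger whose level is set above DEBUG; the logger name and the CountingInfo helper are made up for the illustration.

```python
import logging


class CountingInfo:
    """Hypothetical helper that counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "details"


log = logging.getLogger("format_demo")  # hypothetical logger name
log.setLevel(logging.WARNING)           # DEBUG messages are disabled

eager = CountingInfo()
lazy = CountingInfo()

# Formatted by the caller before log.debug() runs.
log.debug("some debug info: %s" % eager)

# Arguments handed over separately; logging only formats if it will emit.
log.debug("some debug info: %s", lazy)

print(eager.str_calls, lazy.str_calls)
```

With DEBUG disabled, only the first object is ever converted to a string.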
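As of Python 3.6 (2016) you can use f-strings to substitute variables:

```
>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'
```

Note the f prefix on the string literal. On Python 3.5 or earlier, this is a SyntaxError.
See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings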
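PEP 3101 proposes replacing the % operator with the new, advanced string formatting in Python 3.
But be careful: I just ran into a problem while replacing every % with .format in existing code. '{}'.format(unicode_string) tries to encode unicode_string and will probably fail.
Just look at this Python interactive session log:

```
Python 2.7.2 (default, Aug 27 2012, 19:52:55)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'
```

Here s is a plain (byte) str and u is a unicode string:

```
; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'
```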
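When a unicode object is passed as the argument to %, the result is a unicode string even if the original string was not unicode. The .format method, however, raises UnicodeEncodeError in that case:

```
; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)
```

However, if the format string is itself unicode, it works:

```
; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'
```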
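So .format handles a unicode argument without error only if the original string is unicode, or if the argument can be converted to a str (the so-called "byte array"), as here:

```
; '{}'.format(u'i')
'i'
```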
Another advantage of .format is that it can access attributes of its arguments:

```
In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:

In [13]: a = A(2, 3)

In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'
```

Or, as a keyword argument:

```
In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'
```

As far as I know, this is not possible with %.
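Attribute access is one form of the field-name syntax that .format supports; item access with square brackets is another. The sketch below (Python 3 syntax, with illustrative names Point, settings, and letters) shows both. It also shows that % has no attribute syntax of its own, although handing it the object's attribute dictionary via vars() gets close for simple objects.

```python
class Point:
    """Illustrative class with two plain attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(2, 3)
settings = {"host": "localhost", "port": 8080}
letters = ["a", "b", "c"]

# Attribute access inside the replacement field.
by_attribute = "x is {0.x}, y is {0.y}".format(p)

# Item access: dictionary keys are written without quotes.
by_key = "{cfg[host]}:{cfg[port]}".format(cfg=settings)

# Item access with a non-negative list index.
by_index = "first={0[0]}, last={0[2]}".format(letters)

# Closest % equivalent: pass a mapping of the attributes.
with_percent = "x is %(x)s, y is %(y)s" % vars(p)

print(by_attribute)
print(by_key)
print(by_index)
print(with_percent)
```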
As I discovered today, the old way of formatting strings via % does not support Decimal out of the box.
Example (using Python 3.3.5):

```python
#!/usr/bin/env python3

from decimal import Decimal, getcontext

getcontext().prec = 50
d = Decimal('3.12375239e-24')  # no magic number, I rather produced it by banging my head on my keyboard

print('%.50f' % d)
print('{0:.50f}'.format(d))
```

Output:

```
0.00000000000000000000000312375239000000009907464850
0.00000000000000000000000312375239000000000000000000
```

There may well be workarounds, but you might still consider using the format() method right away.
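The stray digits in the first output line come from % converting the Decimal to a binary float before formatting. The sketch below makes that visible by comparing against an explicit float() conversion. It uses decimal.localcontext() instead of changing the global context, which is a choice made for this example only.

```python
from decimal import Decimal, localcontext

d = Decimal('3.12375239e-24')

with localcontext() as ctx:
    ctx.prec = 50
    via_percent = '%.50f' % d          # %f converts d to float first
    via_format = '{0:.50f}'.format(d)  # delegates to Decimal.__format__

# Formatting the explicitly converted float gives the same digits as %.
via_float = '%.50f' % float(d)

print(via_percent)
print(via_float)
print(via_format)
```

Only the .format() result keeps the exact decimal value.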
From my tests, % gives better performance than format.
Test code:
Python 2.7.2:

```python
import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")
```

Result:

```
format: 0.470329046249
%: 0.357107877731
```

Python 3.5.2:

```python
import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
```

Result:

```
format: 0.5864730989560485
%: 0.013593495357781649
```

In Python 2 the difference is small, whereas in Python 3 the % version is much faster than format in this test.
Thanks to @Chris Cogdon for the sample code.
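The very large gap in the Python 3 numbers may partly reflect CPython folding the all-literal % expression into a constant at compile time, so that nothing is formatted inside the timed loop. A hedged way to check this on your own interpreter is to format variables instead of literals. The sketch below does that; the loop counts are arbitrary, and the f-string entry requires Python 3.6 or later.

```python
import timeit

# Values come from variables so the compiler cannot precompute the result.
setup = "a, b, c = 1, 1.23, 'hello'"

statements = {
    "format": "'{}{}{}'.format(a, b, c)",
    "%": "'%s%s%s' % (a, b, c)",
    "f-string": "f'{a}{b}{c}'",
}

# Take the best of three runs to reduce noise from other processes.
timings = {
    label: min(timeit.repeat(stmt, setup=setup, number=100000, repeat=3))
    for label, stmt in statements.items()
}

for label, seconds in timings.items():
    print("{:>8}: {:.4f} s".format(label, seconds))
```

The absolute numbers depend on the machine and Python version, so no expected output is shown.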
As a side note, you do not have to take a performance hit to use new-style formatting with logging. You can pass any object that implements the __str__ magic method to logging.debug, logging.info, and so on. When the logging module decides that it must emit your message object, it calls str() on it first. So you could do something like this:

```python
import logging


class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        # Call any callable argument only now, when the message is rendered.
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())

        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither of these messages is formatted (or calculated) until it is
# needed

# Emits "Lazily formatted log entry: 123 foo" in log (when DEBUG is enabled)
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))


def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log (when DEBUG is enabled)
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))
```

This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it also works with Python 2.6 (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).
One advantage of this technique is that it allows lazy values, such as the function expensive_func above.
If your Python is >= 3.6, the f-string formatted literal is your new friend.
It is simpler, cleaner, and performs better.

```
In [1]: params=['Hello', 'adam', 42]

In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
```
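One situation where % may help is when you are formatting regex expressions. For example,

```python
'{type_names} [a-z]{2}'.format(type_names='triangle|square')
```

raises IndexError, because {2} is read as a replacement field. In this situation you can use:

```python
'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}
```

This avoids writing the regex as '{type_names} [a-z]{{2}}'.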
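The three variants discussed above can be compared side by side. The sketch below (Python 3 syntax, variable names chosen for illustration) confirms that the unescaped .format call fails, and that the brace-doubled .format version and the % version build the same pattern.

```python
import re

alternatives = 'triangle|square'

# Unescaped: .format interprets {2} as positional argument number 2.
try:
    '{type_names} [a-z]{2}'.format(type_names=alternatives)
    format_error = None
except IndexError as exc:
    format_error = exc

# Doubling the braces makes them literal for .format.
escaped = '{type_names} [a-z]{{2}}'.format(type_names=alternatives)

# With %, regex braces need no escaping at all.
percent = '%(type_names)s [a-z]{2}' % {'type_names': alternatives}

pattern = re.compile(percent)

print(escaped == percent)
print(bool(pattern.fullmatch('square ab')))
```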
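I would add that since version 3.6 we can use f-strings like this:

```python
foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")
```

which gives

```
My name is john smith
```

Everything is converted to strings:

```python
mylist = ["foo", "bar"]
print(f"mylist = {mylist}")
```

Result:

```
mylist = ['foo', 'bar']
```

You can call functions, as in the other formatting methods:

```python
import time

print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')
```

giving, for example:

```
Hello, here is the date : 16/04/2018
```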
For Python versions >= 3.6 (see PEP 498):

```python
s1 = 'albha'
s2 = 'beta'

f'{s1}{s2:>10}'

# output
'albha      beta'
```
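The same padding can be written in all three styles, which makes the relationship between their format specifications easier to see. This is a minimal sketch; the numeric example is added here for illustration only.

```python
s1 = 'albha'
s2 = 'beta'

# Right-align s2 in a field of width 10 using each style.
old_style = '%s%10s' % (s1, s2)             # % right-aligns by default
new_style = '{}{:>10}'.format(s1, s2)       # > requests right alignment
f_string = f'{s1}{s2:>10}'                  # same spec as .format

# Numeric specs carry over almost unchanged between the styles.
pi_old = '%05.1f' % 3.14159
pi_new = '{:05.1f}'.format(3.14159)

print(repr(old_style), repr(new_style), repr(f_string))
print(pi_old, pi_new)
```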
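One more thing: if the string contains literal curly braces around the fields, it will not work with format as written (the literal braces would have to be doubled), but % will work.
Example:

```
>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
>>>
```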
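For completeness, .format and f-strings can produce the same output if every literal brace is doubled. The following sketch shows the failing call from the session above next to the doubled-brace forms; the variable names are illustrative.

```python
# '{{' is an escaped literal brace, so '0' is plain text and the
# following single '}' has nothing to close.
try:
    '{{0}, {1}}'.format(1, 2)
    single_brace_error = None
except ValueError as exc:
    single_brace_error = str(exc)

# Outer braces doubled (literal), inner braces single (replacement fields).
doubled = '{{{0}, {1}}}'.format(1, 2)

# % needs no escaping for braces.
percent = '{%s, %s}' % (1, 2)

# f-strings follow the same doubling rule as .format.
a, b = 1, 2
f_string = f'{{{a}, {b}}}'

print(single_brace_error)
print(doubled, percent, f_string)
```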
Python 3.6.7 comparison:

```python
#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper


@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")


@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))


@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)


def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1, 2, 3, 4, 5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),)
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")


if __name__ == '__main__':
    main()
```

Output:

```
new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----
```
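Strictly speaking, we are drifting quite far from the original topic, but why not:
When using the gettext module to provide, for example, a localized GUI, old-style and new-style strings are the only options; f-strings cannot be used there. I feel that the new style is the best choice for this case. There is a question about this here.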
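The reason f-strings do not fit here is that an f-string is interpolated before the lookup function ever sees it, so the translation catalog would be asked for an already filled-in sentence instead of the template. A minimal sketch of the working pattern follows. It assumes no real catalog is installed and therefore uses gettext.NullTranslations, which returns each message unchanged; the message text and the user value are invented for the example.

```python
import gettext

# NullTranslations performs no translation; a real application would
# load a catalog, for example with gettext.translation(...).
translations = gettext.NullTranslations()
_ = translations.gettext

user = "Ada"  # illustrative value

# Look up the template first, then fill in the values.
new_style = _("Welcome back, {name}!").format(name=user)
old_style = _("Welcome back, %(name)s!") % {"name": user}

print(new_style)
print(old_style)
```

Named fields are used in both variants so that a translator can reorder them freely.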