Exception catching performance in Python
I know that exceptions in Python are …

Does this mean that:

    try:
        some code
    except MyException:
        pass

is faster than this?

    try:
        some code
    except MyException as e:
        pass

In addition to Francesco's answer, it seems that one of the (relatively) expensive parts of catching is the exception matching:

    >>> import timeit
    >>> timeit.timeit('try:\n    raise KeyError\nexcept KeyError:\n    pass', number=1000000 )
    1.1587663322268327
    >>> timeit.timeit('try:\n    raise KeyError\nexcept:\n    pass', number=1000000 )
    0.9180641582179874

Looking at the (CPython 2) disassembly:

    >>> import dis
    >>> def f():
    ...     try:
    ...         raise KeyError
    ...     except KeyError:
    ...         pass
    ...
    >>> def g():
    ...     try:
    ...         raise KeyError
    ...     except:
    ...         pass
    ...
    >>> dis.dis(f)
      2           0 SETUP_EXCEPT            10 (to 13)

      3           3 LOAD_GLOBAL              0 (KeyError)
                  6 RAISE_VARARGS            1
                  9 POP_BLOCK
                 10 JUMP_FORWARD            17 (to 30)

      4     >>   13 DUP_TOP
                 14 LOAD_GLOBAL              0 (KeyError)
                 17 COMPARE_OP              10 (exception match)
                 20 POP_JUMP_IF_FALSE       29
                 23 POP_TOP
                 24 POP_TOP
                 25 POP_TOP

      5          26 JUMP_FORWARD             1 (to 30)
            >>   29 END_FINALLY
            >>   30 LOAD_CONST               0 (None)
                 33 RETURN_VALUE
    >>> dis.dis(g)
      2           0 SETUP_EXCEPT            10 (to 13)

      3           3 LOAD_GLOBAL              0 (KeyError)
                  6 RAISE_VARARGS            1
                  9 POP_BLOCK
                 10 JUMP_FORWARD             7 (to 20)

      4     >>   13 POP_TOP
                 14 POP_TOP
                 15 POP_TOP

      5          16 JUMP_FORWARD             1 (to 20)
                 19 END_FINALLY
            >>   20 LOAD_CONST               0 (None)
                 23 RETURN_VALUE

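Note that the catch block loads the exception anyway and matches it against the class named in the except clause (KeyError in f, via LOAD_GLOBAL and COMPARE_OP). Now the version that binds a name with "as":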

    >>> def f2():
    ...     try:
    ...         raise KeyError
    ...     except KeyError as ke:
    ...         pass
    ...
    >>> dis.dis(f2)
      2           0 SETUP_EXCEPT            10 (to 13)

      3           3 LOAD_GLOBAL              0 (KeyError)
                  6 RAISE_VARARGS            1
                  9 POP_BLOCK
                 10 JUMP_FORWARD            19 (to 32)

      4     >>   13 DUP_TOP
                 14 LOAD_GLOBAL              0 (KeyError)
                 17 COMPARE_OP              10 (exception match)
                 20 POP_JUMP_IF_FALSE       31
                 23 POP_TOP
                 24 STORE_FAST               0 (ke)
                 27 POP_TOP

      5          28 JUMP_FORWARD             1 (to 32)
            >>   31 END_FINALLY
            >>   32 LOAD_CONST               0 (None)
                 35 RETURN_VALUE

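The only difference is a single STORE_FAST (offset 24) where the version without "as" has a POP_TOP; it stores the exception value when the match succeeds. With several except clauses: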

    >>> def f():
    ...     try:
    ...         raise ValueError
    ...     except KeyError:
    ...         pass
    ...     except IOError:
    ...         pass
    ...     except SomeOtherError:
    ...         pass
    ...     except:
    ...         pass
    ...
    >>> dis.dis(f)
      2           0 SETUP_EXCEPT            10 (to 13)

      3           3 LOAD_GLOBAL              0 (ValueError)
                  6 RAISE_VARARGS            1
                  9 POP_BLOCK
                 10 JUMP_FORWARD            55 (to 68)

      4     >>   13 DUP_TOP
                 14 LOAD_GLOBAL              1 (KeyError)
                 17 COMPARE_OP              10 (exception match)
                 20 POP_JUMP_IF_FALSE       29
                 23 POP_TOP
                 24 POP_TOP
                 25 POP_TOP

      5          26 JUMP_FORWARD            39 (to 68)

      6     >>   29 DUP_TOP
                 30 LOAD_GLOBAL              2 (IOError)
                 33 COMPARE_OP              10 (exception match)
                 36 POP_JUMP_IF_FALSE       45
                 39 POP_TOP
                 40 POP_TOP
                 41 POP_TOP

      7          42 JUMP_FORWARD            23 (to 68)

      8     >>   45 DUP_TOP
                 46 LOAD_GLOBAL              3 (SomeOtherError)
                 49 COMPARE_OP              10 (exception match)
                 52 POP_JUMP_IF_FALSE       61
                 55 POP_TOP
                 56 POP_TOP
                 57 POP_TOP

      9          58 JUMP_FORWARD             7 (to 68)

     10     >>   61 POP_TOP
                 62 POP_TOP
                 63 POP_TOP

     11          64 JUMP_FORWARD             1 (to 68)
                 67 END_FINALLY
            >>   68 LOAD_CONST               0 (None)
                 71 RETURN_VALUE

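This duplicates the exception and tries to match it against each listed exception class, one at a time, until a match is found. That is (possibly) what is being hinted at by "poor catch performance".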
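
The in-order matching can also be observed from Python itself, without reading bytecode. The following is a minimal sketch and not part of the original answer: `probe` is a hypothetical helper that records each except expression when the interpreter evaluates it, and `match_order` is an illustrative name.

```python
checked = []  # names of the exception classes the interpreter tried, in order


def probe(exc_type):
    """Hypothetical helper: record that this except clause was evaluated."""
    checked.append(exc_type.__name__)
    return exc_type


def match_order():
    """Raise ValueError and report which except clauses were tried."""
    del checked[:]
    try:
        raise ValueError
    except probe(KeyError):
        pass
    except probe(OSError):
        pass
    except probe(ValueError):
        pass
    except probe(TypeError):  # never reached: the clause above matched
        pass
    return list(checked)


print(match_order())
```

Clauses after the first match are never evaluated, so putting the most likely exception first should save a few comparisons, although the saving is small next to the cost of raising.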
I think the two are the same in terms of speed:

    >>> import timeit
    >>> timeit.timeit('try:\n    raise KeyError\nexcept KeyError:\n    pass', number=1000000 )
    0.7168641227143269
    >>> timeit.timeit('try:\n    raise KeyError\nexcept KeyError as e:\n    pass', number=1000000 )
    0.7733279216613766

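
A single timeit run is noisy, so a difference of a few percent is hard to interpret. As a rough sketch (the names `PLAIN`, `BOUND` and `compare` are illustrative and not from the answer), the comparison can be repeated and the best time of several runs kept for each form:

```python
import timeit

# The two forms under comparison, written as multi-line statements.
PLAIN = """
try:
    raise KeyError
except KeyError:
    pass
"""

BOUND = """
try:
    raise KeyError
except KeyError as e:
    pass
"""


def compare(number=100000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each form."""
    return {
        "plain": min(timeit.repeat(PLAIN, number=number, repeat=repeat)),
        "bound": min(timeit.repeat(BOUND, number=number, repeat=repeat)),
    }
```

Calling `compare()` and printing the two values gives a steadier comparison than one run. The absolute numbers depend on the machine and the Python version.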
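Catching is not expensive. The parts that appear relatively slow are the creation of the stack trace itself and, if required, the subsequent unwinding of the stack.
All the stack-based languages I know of that let you capture a stack trace have to perform these operations.
Compared with those two operations, the cost of the catch is negligible. Here is some code that demonstrates how performance degrades as the stack depth increases.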

    #!/usr/bin/env python
    import time
    import pytest

    max_depth = 10
    time_start = [0] * (max_depth + 1)
    time_total = [0] * (max_depth + 1)
    depths = list(range(max_depth))

    # Needs the pytest-benchmark plugin, which provides the `benchmark` fixture.
    @pytest.mark.parametrize('i', depths)
    def test_stack(benchmark, i):
        benchmark.pedantic(catcher2, args=(i, i), rounds=10, iterations=1000)

    #@pytest.mark.parametrize('d', depths)
    #def test_recursion(benchmark, d):
    #    benchmark.pedantic(catcher, args=(d,), rounds=50, iterations=50)

    def catcher(i, depth):
        try:
            ping(i, depth)
        except Exception:
            # Time elapsed between the raise in thrower() and this handler.
            time_total[depth] += time.perf_counter() - time_start[depth]

    def recurse(i, depth):
        # Fixed: the original tested an undefined name `d` and used `--i`,
        # which does not decrement in Python.
        if i > 0:
            recurse(i - 1, depth)
        thrower(depth)

    def catcher2(i, depth):
        try:
            ping(i, depth)
        except Exception:
            time_total[depth] += time.perf_counter() - time_start[depth]

    def thrower(depth):
        # Record the start time just before raising.
        # time.perf_counter() replaces time.clock(), removed in Python 3.8.
        time_start[depth] = time.perf_counter()
        raise Exception('wtf')

    def ping(i, depth):
        # Fixed: thrower() takes one argument; the original passed two.
        if i < 1:
            thrower(depth)
        return pong(i, depth)

    def pong(i, depth):
        if i < 0:
            thrower(depth)
        return ping(i - 4, depth)

    if __name__ == "__main__":
        rounds = 200000
        class_start = time.perf_counter()
        for _ in range(rounds):
            ex = Exception()
        class_time = time.perf_counter() - class_start
        print("%d ex = Exception()'s %f" % (rounds, class_time))

        for depth in range(max_depth):
            for _ in range(rounds):
                catcher(depth, depth)

        for rep in range(max_depth):
            print("depth=%d time=%f" % (rep, time_total[rep] / 1000000))

The output is the time (times are relative) taken by the calls:

    200000 ex = Exception()'s 0.040469
    depth=0 time=0.103843
    depth=1 time=0.246050
    depth=2 time=0.401459
    depth=3 time=0.565742
    depth=4 time=0.736362
    depth=5 time=0.921993
    depth=6 time=1.102257
    depth=7 time=1.278089
    depth=8 time=1.463500
    depth=9 time=1.657082

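Someone better at Python than I am might be able to get …
Note that a very similar question about Java was asked a few weeks ago. It is a very informative thread whatever language you use:
Which part of throwing an exception is expensive?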
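
One way to see why stack depth matters in CPython is to count the traceback entries attached to an exception, since one entry is recorded for every frame the exception passes through. This is a minimal sketch for Python 3 (it relies on the `__traceback__` attribute); `raise_at_depth` and `traceback_length` are illustrative names, not code from the answer.

```python
def raise_at_depth(n):
    """Recurse n levels down, then raise."""
    if n == 0:
        raise RuntimeError("boom")
    raise_at_depth(n - 1)


def traceback_length(n):
    """Raise n levels below this frame and count the traceback entries."""
    try:
        raise_at_depth(n)
    except RuntimeError as exc:
        tb = exc.__traceback__
        count = 0
        while tb is not None:  # walk the linked list of traceback objects
            count += 1
            tb = tb.tb_next
        return count


for depth in (0, 1, 5, 10):
    print(depth, traceback_length(depth))
```

The count grows by one for each extra level of recursion, which fits the claim that the cost of raising and unwinding grows with the depth of the stack, while the catch itself stays the same.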
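A Python program is constructed from code blocks. A block is a piece of Python program text that is executed as a unit. In CPython, a basic block is represented by struct basicblock:
cpython/Python/compile.c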

    typedef struct basicblock_ {
        /* Each basicblock in a compilation unit is linked via b_list in the
           reverse order that the block are allocated.  b_list points to the next
           block, not to be confused with b_next, which is next by control flow. */
        struct basicblock_ *b_list;
        /* number of instructions used */
        int b_iused;
        /* length of instruction array (b_instr) */
        int b_ialloc;
        /* pointer to an array of instructions, initially NULL */
        struct instr *b_instr;
        /* If b_next is non-NULL, it is a pointer to the next
           block reached by normal control flow. */
        struct basicblock_ *b_next;
        /* b_seen is used to perform a DFS of basicblocks. */
        unsigned b_seen : 1;
        /* b_return is true if a RETURN_VALUE opcode is inserted. */
        unsigned b_return : 1;
        /* depth of stack upon entry of block, computed by stackdepth() */
        int b_startdepth;
        /* instruction offset for block, computed by assemble_jump_offsets() */
        int b_offset;
    } basicblock;

Loops, try/except statements and try/finally statements are handled differently. For these three kinds of statement, frame blocks are used:
cpython/Python/compile.c

    enum fblocktype { LOOP, EXCEPT, FINALLY_TRY, FINALLY_END };

    struct fblockinfo {
        enum fblocktype fb_type;
        basicblock *fb_block;
    };

A code block is executed in an execution frame.
cpython/Include/frameobject.h

    typedef struct _frame {
        PyObject_VAR_HEAD
        struct _frame *f_back;      /* previous frame, or NULL */
        PyCodeObject *f_code;       /* code segment */
        PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
        PyObject *f_globals;        /* global symbol table (PyDictObject) */
        PyObject *f_locals;         /* local symbol table (any mapping) */
        PyObject **f_valuestack;    /* points after the last local */
        /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
           Frame evaluation usually NULLs it, but a frame that yields sets it
           to the current stack top. */
        PyObject **f_stacktop;
        PyObject *f_trace;          /* Trace function */

        /* In a generator, we need to be able to swap between the exception
           state inside the generator and the exception state of the calling
           frame (which shouldn't be impacted when the generator "yields"
           from an except handler).
           These three fields exist exactly for that, and are unused for
           non-generator frames. See the save_exc_state and swap_exc_state
           functions in ceval.c for details of their use. */
        PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;
        /* Borrowed reference to a generator, or NULL */
        PyObject *f_gen;

        int f_lasti;                /* Last instruction if called */
        /* Call PyFrame_GetLineNumber() instead of reading this field
           directly.  As of 2.3 f_lineno is only valid when tracing is
           active (i.e. when f_trace is set).  At other times we use
           PyCode_Addr2Line to calculate the line from the current
           bytecode index. */
        int f_lineno;               /* Current line number */
        int f_iblock;               /* index in f_blockstack */
        char f_executing;           /* whether the frame is still executing */
        PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
        PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
    } PyFrameObject;

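
Several of these fields are visible from Python as attributes of frame objects. As a small sketch (it relies on `sys._getframe`, which is CPython-specific; `inner` and `outer` are illustrative names), a function can inspect its own frame and the frame of its caller:

```python
import sys


def inner():
    frame = sys._getframe()   # the frame that is executing inner()
    caller = frame.f_back     # previous frame, i.e. the caller
    return {
        "code": frame.f_code.co_name,
        "caller": caller.f_code.co_name,
        "line": frame.f_lineno,
        "locals": sorted(frame.f_locals),
    }


def outer():
    return inner()


print(outer())
```

Here `f_back`, `f_code`, `f_lineno` and `f_locals` correspond to the struct members of the same names in the listing.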
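A frame contains some administrative information (used for debugging) and determines where and how execution continues after the code block's execution has completed. When you use "as" (in an "import something as" or "except Exception as" statement), you are simply performing a name-binding operation. That is, Python just adds a reference to the object to the local symbol table of the frame object (*f_locals). So it adds essentially no overhead at run time.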
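
The name binding can be checked on a current interpreter with the `dis` module. This is a hedged sketch for Python 3.4 or later (it uses `dis.get_instructions`); `without_name`, `with_name` and `store_ops` are illustrative names, and the exact opcodes vary between CPython versions.

```python
import dis


def without_name():
    try:
        raise KeyError
    except KeyError:
        pass


def with_name():
    try:
        raise KeyError
    except KeyError as ke:
        pass


def store_ops(func):
    """Return the store-to-local opcodes compiled into func."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("STORE_FAST")]


print(without_name.__code__.co_varnames, store_ops(without_name))
print(with_name.__code__.co_varnames, store_ops(with_name))
```

Only the version with "as" has a local variable and store instructions for it. On Python 3 the name is also unbound again when the except clause ends, which accounts for any extra store or delete that shows up in the list.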
But you do incur some overhead at parse time.
cpython/Modules/parsermodule.c

    static int
    validate_except_clause(node *tree)
    {
        int nch = NCH(tree);
        int res = (validate_ntype(tree, except_clause)
                   && ((nch == 1) || (nch == 2) || (nch == 4))
                   && validate_name(CHILD(tree, 0), "except"));

        if (res && (nch > 1))
            res = validate_test(CHILD(tree, 1));
        if (res && (nch == 4))
            res = (validate_name(CHILD(tree, 2), "as")
                   && validate_ntype(CHILD(tree, 3), NAME));

        return (res);
    }

But in my opinion, this is negligible.
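
The size of the syntactic difference can be seen with the standard `ast` module on Python 3. This is a minimal sketch, with `PLAIN`, `BOUND` and `handler_name` as illustrative names; it inspects the parsed tree and does not measure parse time.

```python
import ast

PLAIN = """
try:
    pass
except KeyError:
    pass
"""

BOUND = """
try:
    pass
except KeyError as e:
    pass
"""


def handler_name(source):
    """Return the name bound by the first except clause, or None."""
    try_node = ast.parse(source).body[0]
    return try_node.handlers[0].name


print(handler_name(PLAIN))
print(handler_name(BOUND))
```

The two trees differ only in the `name` field of the except handler, so the extra parsing work for "as" amounts to one more name token.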