How can you profile a Python script?

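Project Euler and other coding contests often have a maximum running time, or people boast about how fast their particular solution runs. With Python, the approaches are sometimes a bit kludgey, i.e. adding timing code to __main__.

What is a good way to profile how long a Python program takes to run?

Python includes a profiler called cProfile. It not only gives the total running time, but also times each function separately and tells you how many times each function was called, making it easy to determine where you should optimize.

You can call it from within your code, or from the interpreter, like this:

    import cProfile
    cProfile.run('foo()')  # foo is whatever function you want to profile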

Even more usefully, you can invoke cProfile when running a script:

    python -m cProfile myscript.py

To make it even easier, I made a little batch file called "profile.bat":

    python -m cProfile %1

So all I have to do is run:

    profile euler048.py

And I get this:

    1007 function calls in 0.061 CPU seconds

    Ordered by: standard name
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    0.061    0.061 <string>:1(<module>)
      1000    0.051    0.000    0.051    0.000 euler048.py:2(<lambda>)
         1    0.005    0.005    0.061    0.061 euler048.py:2(<module>)
         1    0.000    0.000    0.061    0.061 {execfile}
         1    0.002    0.002    0.053    0.053 {map}
         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
         1    0.000    0.000    0.000    0.000 {range}
         1    0.003    0.003    0.003    0.003 {sum}
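To make the tottime and cumtime columns in a table like this more concrete, here is a small sketch that reads them programmatically. The functions inner and outer are invented for the illustration, and get_stats_profile() assumes Python 3.9 or later. tottime is the time spent in a function itself, while cumtime also includes everything it calls.

```python
import cProfile
import pstats


def inner(n):
    """Do the actual arithmetic so that time accrues here."""
    return sum(i * i for i in range(n))


def outer():
    """Spend almost no time itself; nearly all of it is inside inner()."""
    return [inner(2000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.runcall(outer)

# get_stats_profile() (Python 3.9+) exposes the table as plain Python objects.
profile = pstats.Stats(profiler).get_stats_profile()
for name in ("outer", "inner"):
    row = profile.func_profiles[name]
    print(f"{name}: ncalls={row.ncalls} tottime={row.tottime:.3f} cumtime={row.cumtime:.3f}")
```

A function like outer, with a small tottime but a large cumtime, is usually just a caller; the place to optimize is further down the call chain.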

EDIT: Updated the link to a video resource from PyCon 2013 titled Python Profiling, which is also available via YouTube.


A while ago I made pycallgraph, which generates a visualisation from your Python code. Edit: I've updated the example to work with 3.3, the latest release as of this writing.

After a pip install pycallgraph and installing GraphViz, you can run it from the command line:

    pycallgraph graphviz -- ./mypythonscript.py

Or, you can profile particular parts of your code:

    from pycallgraph import PyCallGraph
    from pycallgraph.output import GraphvizOutput

    with PyCallGraph(output=GraphvizOutput()):
        code_to_profile()

Either of these will generate a pycallgraph.png file.


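It's worth pointing out that the profiler only works (by default) on the main thread, and you won't get any information from other threads if you use them. This can be a bit of a gotcha, as it is completely unmentioned in the profiler documentation.

If you also want to profile threads, you'll want to look at the threading.setprofile() function in the docs.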
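As a rough illustration of what threading.setprofile() does, the sketch below installs a hook that counts Python-level calls per thread. The names count_calls, call_counts and work are made up for this example, and a real profiler would record timings rather than just counts.

```python
import threading
from collections import Counter

call_counts = Counter()
counts_lock = threading.Lock()


def count_calls(frame, event, arg):
    """Profile hook: tally every Python function call by thread and name."""
    if event == "call":
        key = (threading.current_thread().name, frame.f_code.co_name)
        with counts_lock:
            call_counts[key] += 1


def work():
    return sum(i * i for i in range(1000))


# The hook is installed in every thread started *after* this call.
threading.setprofile(count_calls)
worker = threading.Thread(target=work, name="worker")
worker.start()
worker.join()
threading.setprofile(None)  # stop installing the hook in new threads

for (thread_name, func_name), calls in sorted(call_counts.items()):
    print(f"{thread_name}: {func_name} called {calls} time(s)")
```

Note that the hook only reaches threads created through the threading module after setprofile() has been called; threads that are already running are not affected.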

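You could also create your own threading.Thread subclass to do it:

    import cProfile
    import threading

    class ProfiledThread(threading.Thread):
        # Overrides threading.Thread.run()
        def run(self):
            profiler = cProfile.Profile()
            try:
                return profiler.runcall(threading.Thread.run, self)
            finally:
                # One stats file per thread, named after the thread identifier
                profiler.dump_stats('myprofile-%d.profile' % (self.ident,))

Use that ProfiledThread class instead of the standard one. It might give you more flexibility, but I'm not sure it's worth it, especially if you are using third-party code, which wouldn't use your class.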
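One possible way to use such per-thread dump files afterwards is to load them together with pstats, which adds the numbers from several files into one report. The sketch below is a self-contained variation under a few assumptions of my own: the dump files are named after the thread name instead of the identifier, they go into a temporary directory, and the workload crunch is invented.

```python
import cProfile
import os
import pstats
import tempfile
import threading


class ProfiledThread(threading.Thread):
    """Thread that writes its own cProfile dump when run() finishes."""

    def __init__(self, dump_dir, **kwargs):
        super().__init__(**kwargs)
        # Hypothetical naming scheme: one file per thread name.
        self.dump_path = os.path.join(dump_dir, f"{self.name}.profile")

    def run(self):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(super().run)
        finally:
            profiler.dump_stats(self.dump_path)


def crunch():
    return sum(i * i for i in range(20000))


dump_dir = tempfile.mkdtemp()
dump_paths = []
for index in range(2):
    thread = ProfiledThread(dump_dir, target=crunch, name=f"worker-{index}")
    thread.start()
    thread.join()  # run the workers one at a time to keep the sketch simple
    dump_paths.append(thread.dump_path)

# Stats accepts several dump files and adds their numbers together.
combined = pstats.Stats(*dump_paths)
combined.strip_dirs().sort_stats("cumulative").print_stats(5)
```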


The Python wiki has a great page of profiling resources: http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code

as do the Python docs: http://docs.python.org/library/profile.html

As shown by Chris Lawlor, cProfile is a great tool and can easily be used to print to the screen:

    python -m cProfile -s time mine.py

or to a file:

    python -m cProfile -o output.file mine.py
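A file written with -o is a binary dump, so it needs the pstats module (or one of the viewers discussed in this thread) to be read. The sketch below shows one way to load and sort such a file. To stay self-contained it creates the dump itself with a made-up workload, build_table, instead of running mine.py.

```python
import cProfile
import os
import pstats
import tempfile


def build_table(n):
    """Hypothetical workload standing in for mine.py."""
    return {i: str(i) for i in range(n)}


# Stand-in for `python -m cProfile -o output.file mine.py`.
stats_path = os.path.join(tempfile.mkdtemp(), "output.file")
profiler = cProfile.Profile()
profiler.runcall(build_table, 50000)
profiler.dump_stats(stats_path)

# Load the dump, shorten the file paths, and print the ten entries
# with the largest cumulative time.
stats = pstats.Stats(stats_path)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
```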

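PS: If you are using Ubuntu, make sure to install python-profiler:

    sudo apt-get install python-profiler

If you output to a file, you can get nice visualizations using the following tools.

PyCallGraph: a tool to create call graph images. Install:

    sudo pip install pycallgraph

Run:

    pycallgraph mine.py args

View:

    gimp pycallgraph.png

You can use whatever you like to view the png file; I used gimp. Unfortunately I often get

    dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.257079 to fit

which makes my images unusably small. So I generally create SVG files:

    pycallgraph -f svg -o pycallgraph.svg mine.py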

PS: Make sure to install graphviz (which provides the dot program):

    sudo apt-get install graphviz

Alternative graphing using gprof2dot, via @maxy / @quotlibetor:

    sudo pip install gprof2dot
    python -m cProfile -o profile.pstats mine.py
    gprof2dot -f pstats profile.pstats | dot -Tsvg -o mine.svg


@Maxy's comment on this answer helped me out enough that I think it deserves its own answer: I already had cProfile-generated .pstats files and I didn't want to re-run things with pycallgraph, so I used gprof2dot and got pretty SVGs:

    $ sudo apt-get install graphviz
    $ git clone https://github.com/jrfonseca/gprof2dot
    $ ln -s "$PWD"/gprof2dot/gprof2dot.py ~/bin
    $ cd $PROJECT_DIR
    $ gprof2dot.py -f pstats profile.pstats | dot -Tsvg -o callgraph.svg

and BLAM!

It uses dot (the same thing that pycallgraph uses), so the output looks similar. I get the impression that gprof2dot loses less information, though.


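I ran into a handy tool called SnakeViz when researching this topic. SnakeViz is a web-based profiling visualization tool. It is very easy to install and use. The usual way I use it is to generate a stat file with %prun and then do the analysis in SnakeViz.

The main visualization technique used is the sunburst chart, in which the hierarchy of function calls is arranged as layers of arcs, with the time information encoded in their angular widths.

The best thing is that you can interact with the chart. For example, to zoom in you can click on an arc, and the arc and its descendants will be enlarged as a new sunburst to display more details.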

I think that cProfile is great for profiling, while kcachegrind is great for visualizing the results. The pyprof2calltree in between handles the file conversion.

    python -m cProfile -o script.profile script.py
    pyprof2calltree -i script.profile -o script.calltree
    kcachegrind script.calltree

To install the required tools (on Ubuntu, at least):

    apt-get install kcachegrind
    pip install pyprof2calltree


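Also worth mentioning is the GUI cProfile dump viewer RunSnakeRun. It allows you to sort and select, thereby zooming in on the relevant parts of the program. The sizes of the rectangles in the picture are proportional to the time taken. If you mouse over a rectangle, it highlights that call in the table and everywhere on the map. When you double-click on a rectangle, it zooms in on that portion. It will show you who calls that portion and what that portion calls.

The descriptive information is very helpful. It shows you the code for that bit, which can be helpful when you are dealing with built-in library calls. It tells you in which file and on which line to find the code.

I also want to point out that the OP said "profiling", but it appears he meant "timing". Keep in mind that programs will run slower when profiled.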

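A nice profiling module is line_profiler (called using the script kernprof.py). It can be downloaded here.

My understanding is that cProfile only gives information about the total time spent in each function, so individual lines of code are not timed. This is an issue in scientific computing, since often one single line can take a lot of time. Also, as I remember, cProfile didn't catch the time I was spending in, say, numpy.dot.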

pprofile

line_profiler (already presented here) also inspired pprofile, which is described as:

Line-granularity, thread-aware deterministic and statistic pure-python
profiler

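It provides line granularity like line_profiler, is pure Python, can be used as a standalone command or as a module, and can even generate callgrind-format files that can be easily analyzed with [k|q]cachegrind.

vprof

There is also vprof, a Python package described as:

[...] providing rich and interactive visualizations for various Python program characteristics such as running time and memory usage.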

The simplest and quickest way to find where all the time is going:

1. pip install snakeviz

2. python -m cProfile -o temp.dat <PROGRAM>.py

3. snakeviz temp.dat

This draws a pie chart in a browser. The biggest piece is the problem function. Very simple.

I recently created tuna for visualizing Python runtime and import profiles; this may be helpful here.

Install with

    pip3 install tuna

Create a runtime profile

    python -m cProfile -o program.prof yourfile.py

or an import profile (Python 3.7+ required)

    python -X importtime yourfile.py 2> import.log

Then just run tuna on the file

    tuna program.prof

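There are a lot of great answers, but they either use the command line or some external program for profiling and/or sorting the results.

I really missed a way to do this from within my IDE (Eclipse with PyDev) without touching the command line or installing anything. So here it is.

Profiling without the command line

    def count():
        from math import sqrt
        for x in range(10**5):
            sqrt(x)

    if __name__ == '__main__':
        import cProfile, pstats
        # Write the raw stats next to the script, then load and print them
        cProfile.run("count()", "{}.profile".format(__file__))
        s = pstats.Stats("{}.profile".format(__file__))
        s.strip_dirs()
        s.sort_stats("time").print_stats(10)

See the docs or the other answers for more info.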


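Following up on Joe Shaw's answer about multi-threaded code not working as expected, I figured that the runcall method in cProfile is merely doing self.enable() and self.disable() calls around the profiled function call, so you can simply do that yourself and have whatever code you want in between, with minimal interference with existing code.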
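A minimal sketch of that idea, assuming an invented workload called parse: only the code between enable() and disable() ends up in the statistics, so setup and teardown stay out of the report.

```python
import cProfile
import io
import pstats


def parse(lines):
    return [line.split(",") for line in lines]


lines = ["a,b,c"] * 10000   # setup we do not want in the profile

profiler = cProfile.Profile()
profiler.enable()           # start collecting
rows = parse(lines)         # only this region is measured
profiler.disable()          # stop collecting

# Send the report to a string buffer instead of straight to stdout.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats()
report = buffer.getvalue()
print(report)
```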


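In Virtaal's source there's a very useful class and decorator that can make profiling (even for specific methods/functions) very easy. The output can then be viewed very comfortably in KCacheGrind.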


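cProfile is great for quick profiling, but most of the time it ended with errors for me. The runctx function solves this problem by initializing the environment and variables correctly. I hope it can be useful for someone:

    import cProfile
    cProfile.runctx('foo()', None, locals())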
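The errors in question are typically a NameError: cProfile.run() evaluates its statement in the __main__ namespace, so names that only exist inside a function are not visible to it. The sketch below shows runctx() resolving a local variable. The names scale, main and data are invented for the example, and it passes globals() explicitly rather than None.

```python
import cProfile
import io
import pstats


def scale(values, factor):
    return [v * factor for v in values]


def main():
    data = list(range(100000))   # local variable, invisible to cProfile.run()
    profiler = cProfile.Profile()
    # The statement is executed with these namespaces, so `data` resolves.
    profiler.runctx("result = scale(data, 3)", globals(), locals())
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
    return buffer.getvalue()


report = main()
print(report)
```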

My way is to use yappi (https://code.google.com/p/yappi/). It's especially useful combined with an RPC server where (even just for debugging) you register methods to start, stop and print profiling information, e.g. like this:

    # Assumes `import yappi` and a configured logger named `log`
    @staticmethod
    def startProfiler():
        yappi.start()

    @staticmethod
    def stopProfiler():
        yappi.stop()

    @staticmethod
    def printProfiler():
        stats = yappi.get_stats(yappi.SORTTYPE_TTOT, yappi.SORTORDER_DESC, 20)
        statPrint = '\n'
        namesArr = [len(str(stat[0])) for stat in stats.func_stats]
        log.debug("namesArr %s", str(namesArr))
        maxNameLen = max(namesArr)
        log.debug("maxNameLen: %s", maxNameLen)

        for stat in stats.func_stats:
            nameAppendSpaces = [' ' for i in range(maxNameLen - len(stat[0]))]
            log.debug('nameAppendSpaces: %s', nameAppendSpaces)
            blankSpace = ''
            for space in nameAppendSpaces:
                blankSpace += space

            log.debug("adding spaces: %s", len(nameAppendSpaces))
            statPrint = statPrint + str(stat[0]) + blankSpace + "" + str(stat[1]).ljust(8) + "\t" + str(
                round(stat[2], 2)).ljust(8 - len(str(stat[2]))) + "\t" + str(round(stat[3], 2)) + "\n"

        log.log(1000, "\nname" + ''.ljust(maxNameLen - 4) + " ncall \tttot \ttsub")
        log.log(1000, statPrint)

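Then, while your program is running, you can start the profiler at any time by calling the startProfiler RPC method, and dump the profiling information to a log file by calling printProfiler (or modify the RPC method to return it to the caller), and get output like this: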

    2014-02-19 16:32:24,128-|SVR-MAIN |-(Thread-3 )-Level 1000:
    name ncall ttot tsub
    2014-02-19 16:32:24,128-|SVR-MAIN |-(Thread-3 )-Level 1000:
    C:\Python27\lib\sched.py.run:80 22 0.11 0.05
    M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\xmlRpc.py.iterFnc:293 22 0.11 0.0
    M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\serverMain.py.makeIteration:515 22 0.11 0.0
    M:\02_documents\_repos\09_aheadRepos\apps\ahdModbusSrv\pyAheadRpcSrv\PicklingXMLRPC.py._dispatch:66 1 0.0 0.0
    C:\Python27\lib\BaseHTTPServer.py.date_time_string:464 1 0.0 0.0
    c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py._get_raw_meminfo:243 4 0.0 0.0
    C:\Python27\lib\SimpleXMLRPCServer.py.decode_request_content:537 1 0.0 0.0
    c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\_psmswindows.py.get_system_cpu_times:148 4 0.0 0.0
    <string>.__new__:8 220 0.0 0.0
    C:\Python27\lib\socket.py.close:276 4 0.0 0.0
    C:\Python27\lib\threading.py.__init__:558 1 0.0 0.0
    <string>.__new__:8 4 0.0 0.0
    C:\Python27\lib\threading.py.notify:372 1 0.0 0.0
    C:\Python27\lib\rfc822.py.getheader:285 4 0.0 0.0
    C:\Python27\lib\BaseHTTPServer.py.handle_one_request:301 1 0.0 0.0
    C:\Python27\lib\xmlrpclib.py.end:816 3 0.0 0.0
    C:\Python27\lib\SimpleXMLRPCServer.py.do_POST:467 1 0.0 0.0
    C:\Python27\lib\SimpleXMLRPCServer.py.is_rpc_path_valid:460 1 0.0 0.0
    C:\Python27\lib\SocketServer.py.close_request:475 1 0.0 0.0
    c:\users\zasiec~1\appdata\local\temp\easy_install-hwcsr1\psutil-1.1.2-py2.7-win32.egg.tmp\psutil\__init__.py.cpu_times:1066 4 0.0 0.0

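It may not be very useful for short scripts, but it helps to optimize server-type processes, especially given that the printProfiler method can be called multiple times over time to profile and compare, for example, different program usage scenarios.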


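A new tool to handle profiling in Python is PyVmMonitor: http://www.pyvmmonitor.com/

It has some unique features, such as:

Attaching the profiler to a running (CPython) program
On-demand profiling with Yappi integration
Profiling on a different machine
Multiple-process support (multiprocessing, django...)
Live sampling/CPU view (with time-range selection)
Deterministic profiling through cProfile/profile integration
Analyzing existing PStats results
Opening DOT files
Programmatic API access
Grouping samples by method or line
PyDev integration
PyCharm integration

Note: it's commercial, but free for open source.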

Ever want to know what the hell that python script is doing? Enter the
Inspect Shell. Inspect Shell lets you print/alter globals and run
functions without interrupting the running script. Now with
auto-complete and command history (only on linux).

Inspect Shell is not a pdb-style debugger.

https://github.com/amoffat/inspect-shell

You could use that (and your wristwatch).

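It depends on what you want to get out of profiling. Simple time metrics can be obtained with (bash):

    time python python_prog.py

Even '/usr/bin/time' can output detailed metrics when given the '--verbose' flag.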
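When the thing to time is a single function rather than a whole script, the standard library's timeit module gives the same kind of simple measurement from inside Python. This is only a sketch: solve is an invented stand-in for a contest solution, and the repeat and number values are arbitrary.

```python
import timeit


def solve():
    """Hypothetical stand-in for a contest solution."""
    return sum(pow(n, n, 10**10) for n in range(1, 1001)) % 10**10


# repeat() returns one total per run; the minimum is the run least
# disturbed by other activity on the machine.
runs = timeit.repeat(solve, number=10, repeat=5)
best_per_call = min(runs) / 10
print(f"best of 5: {best_per_call * 1000:.3f} ms per call")
```

Unlike a profiler, this adds almost no overhead, but it only tells you how long the call took, not where the time went.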

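To check the time metrics for each function, and to better understand how much time is spent in functions, you can use the built-in cProfile in Python.

Going into more detailed metrics like performance, time is not the only metric. You may also care about memory, threads, etc. Profiling options:

1. line_profiler is another profiler, commonly used to find timing metrics line by line.
2. memory_profiler is a tool to profile memory usage.
3. heapy (from the Guppy project) profiles how objects in the heap are used.

These are some of the common ones I tend to use. But if you want to find out more, try reading this book. It is a pretty good book on starting out with performance in mind. You can then move on to advanced topics such as using Cython and JIT (just-in-time) compiled Python.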

To add on to https://stackoverflow.com/a/582337/1070617,

I wrote this module that allows you to use cProfile and view its output easily. More info here: https://github.com/ymichael/cprofilev

    $ python -m cprofilev /your/python/program
    # Go to http://localhost:4000 to view collected statistics.

Also see http://ymichael.com/2014/03/08/profileing-python-with-cprofile.html on how to make sense of the collected statistics.

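There's also a statistical profiler called statprof. It's a sampling profiler, so it adds minimal overhead to your code and gives line-based (not just function-based) timings. It's more suited to soft real-time applications like games, but it may have less precision than cProfile.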
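To illustrate the sampling idea in general (this is not how statprof itself is implemented), here is a simplified sketch in which a background thread periodically looks at which line the main thread is executing. All names are invented for the example, and it relies on sys._current_frames(), which is a CPython-specific helper.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, interval, stop_event, counts):
    """Periodically record which line the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[(frame.f_code.co_name, frame.f_lineno)] += 1
        time.sleep(interval)


def busy_work(duration):
    """Spin for roughly `duration` seconds so there is something to sample."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


counts = collections.Counter()
stop_event = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), 0.001, stop_event, counts),
)
sampler.start()
busy_work(0.3)
stop_event.set()
sampler.join()

# Lines that were hit most often are where the program spent its time.
for (name, lineno), hits in counts.most_common(3):
    print(f"{name}:{lineno} sampled {hits} times")
```

Because the profiled code is only observed from the outside at intervals, the overhead stays small, at the price of statistical rather than exact counts.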

The statprof version on PyPI is a bit old, so you can install it with pip by specifying the git repository:

    pip install git+git://github.com/bos/statprof.py@1a33eba91899afe17a8b752c6dfdec6f05dd0c01

You can run it like this:

    import statprof

    with statprof.profile():
        my_questionable_function()

See also https://stackoverflow.com/a/10333592/320036

When I'm not root on the server, I use lsprofcalltree.py and run my program like this:

    python lsprofcalltree.py -o callgrind.1 test.py

Then I can open the report with any callgrind-compatible software, like qcachegrind.