Code runs much faster in C than in NumPy
我在Python中使用NoMPY编写物理仿真代码,而不是将它重写到C++。在C++中,在Python大约40秒的时候只需要0.5秒。有人能帮我找到我犯了什么可怕的错误吗?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | import numpy as np def myFunc(i): uH = np.copy(u) for j in range(1, xmax-1): u[i][j] = a*uH[i][j-1]+(1-2*a)*uH[i][j]+a*uH[i][j+1] u[i][0] = u[i][0]/b for x in range(1, xmax): u[i][x] = (u[i][x]+a*u[i][x-1])/(b+a*c[x-1]) for x in range(xmax-2,-1,-1): u[i][x]=u[i][x]-c[x]*u[i][x+1] xmax = 101 tmax = 2000 #All other variables are defined here but I removed that for visibility uH = np.zeros((xmax,xmax)) u = np.zeros((xmax,xmax)) c = np.full(xmax,-a) uH[50][50] = 10000 for t in range(1, tmax): if t % 2 == 0: for i in range(0,xmax): myFunc(i) else: for i in range(0, xmax): myFunc(i) |
如果有人想运行它,这里有完整的代码:http://pastebin.com/20zspbqq编辑:所有变量都是在整个代码中定义的,可以在Pastebin上找到。不好意思弄混了,我想清除所有的混乱会使代码更容易理解
从根本上讲,C是一种编译语言,当Python是解释语言时,它的速度与易用性相反。
numpy可以填补这个空白,但必须避免在项目上循环,这通常需要一些技巧。
例如,
1 2 3 4 | def block1(): for i in range(xmax): for j in range(1, xmax-1): u[i][j] = a*uH[i][j-1]+(1-2*a)*uH[i][j]+a*uH[i][j+1] |
处于麻木状态:
1 2 | def block2(): u[:,1:-1] += a*np.diff(u,2) |
与更短更快(和更容易阅读和理解?):
1 2 3 4 5 | In [37]: %timeit block1() 10 loops, best of 3: 25.8 ms per loop In [38]: %timeit block2() 10000 loops, best of 3: 123 μs per loop |
最后,您可以通过及时编译来加速numpy代码,numba允许这样做。只需将代码的开头更改为:
1 2 3 4 | import numba @numba.jit def myFunc(u,i): ... |
脚本末尾的
所以当我运行您的numpy python代码时,运行需要4分钟,一旦我删除了numpy代码并用标准的python代码替换它,只需要1分钟!(我有一台速度不太快的电脑)
这是代码:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | #import numpy as np def impl(i,row): if row: uH = u[:][:] # this copys the array 'u' for j in range(1, xmax-1): u[i][j] = a*uH[i][j-1]+(1-2*a)*uH[i][j]+a*uH[i][j+1] u[i][0] = u[i][0]/b for x in range(1, xmax): u[i][x] = (u[i][x]+a*u[i][x-1])/(b+a*c[x-1]) for x in range(xmax-2,-1,-1): u[i][x]=u[i][x]-c[x]*u[i][x+1] else: uH = u[:][:] # this copys the array 'u' for j in range(1, xmax-1): u[j][i]= a*uH[j-1][i]+(1-2*a)*uH[j][i]+a*uH[j+1][i] u[0][i] = u[0][i]/b for y in range(1, xmax): u[y][i] = (u[y][i]+a*u[y-1][i])/(b+a*c[y-1]) for y in range(xmax-2,-1,-1): u[y][i]=u[y][i]-c[y]*u[y+1][i] #Init xmax = 101 tmax = 2000 D = 0.5 l = 1 tSec = 0.1 uH = [[0.0]*xmax]*xmax #np.zeros((xmax,xmax)) u = [[0.0]*xmax]*xmax #np.zeros((xmax,xmax)) dx = l / xmax dt = tSec / tmax a = (D*dt)/(dx*dx); b=1+2*a print("dx=="+str(dx)) print("dt=="+str(dt)) print(" a=="+str(a)) #koeficient c v trojdiagonalnej matici c = [-a]*xmax #np.full(xmax,-a) c[0]=c[0]/b for i in range(1, xmax): c[i]=c[i]/(b+a*c[i-1]) uH[50][50] = 10000 u = uH for t in range(1, tmax): if t % 2 == 0: for i in range(0,xmax): impl(i,False) else: for i in range(0, xmax): impl(i,True) |
我相信,如果你使用了numpy的正确方式而不是数组的替代品,这可能会更快,但是,不使用numpy数组会将时间缩短到原来的1/4。