Self-written os.walk-alike is much slower than os.walk itself - why?
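Unfortunately, this code runs slower than `os.walk`, but why?
Is it the `for` loop that makes it slow?
The `os.walk`-like code (it does what the `os.walk` function does):
Note: I'm writing this to improve myself!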
```python
import os, time
from os.path import *

x = ""
y = []
z = []
var = 0

def walk(xew):
    global top, var, x,y,z
    if not var: var = [xew]
    for i in var:
        try:
            for ii in os.listdir(i):
                y.append(ii) if isdir(i+os.sep+ii) else z.append(ii)
            x = top = i
            var = [top+os.sep+i for i in os.listdir(top) if isdir(top+os.sep+i)]
        except:
            continue
        yield x,y,z
        yield from walk(var)
        var.clear();y.clear();z.clear()
```
For example:
This finishes after 2 seconds:
```python
for x,y,z in walk(path):
    print(x)
```
This finishes after 0.5 seconds:
```python
for x,y,z in os.walk(path):
    print(x)
```
Using `scandir()` instead of `listdir()` can significantly increase the performance of code that also needs file type or file attribute information, because `os.DirEntry` objects expose this information if the operating system provides it when scanning a directory. All `os.DirEntry` methods may perform a system call, but `is_dir()` and `is_file()` usually only require a system call for symbolic links; `os.DirEntry.stat()` always requires a system call on Unix but only requires one for symbolic links on Windows.
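The difference described above can be sketched roughly as follows. This is a minimal illustration rather than a benchmark: the helper names `split_with_listdir` and `split_with_scandir` are made up for this example, and both assume `path` is a readable directory.

```python
import os


def split_with_listdir(path):
    """Classify entries via os.listdir(); each isdir() check is an extra stat call."""
    dirs, files = [], []
    for name in os.listdir(path):
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files


def split_with_scandir(path):
    """Classify entries via os.scandir(); is_dir() can usually reuse cached type info."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return dirs, files
```

Both helpers should return the same names for the same directory; the `scandir()` version simply tends to need fewer system calls per entry.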
Next, your code calls `os.listdir()` too often.
When
Next, you should get rid of the global variables and use proper variable names.
You can study the `os.walk()` source code; a simplified version looks like this:
```python
import os

def walk(top):
    dirs = []
    nondirs = []

    with os.scandir(top) as scandir_it:
        for entry in scandir_it:
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                nondirs.append(entry.name)

    yield top, dirs, nondirs

    for dirname in dirs:
        new_path = os.path.join(top, dirname)
        yield from walk(new_path)
```
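Note that no global variables are used; there is simply no need for any in this algorithm. There is only one `os.scandir()` call per directory.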
This code is almost as fast as `os.walk()`.
```python
import os, time
from os.path import *

def walk(top):
    x = top;y=[];z=[]
    try:
        for i in os.listdir(top):
            y.append(i) if isdir(top+os.sep+i) else z.append(i)
    except: pass
    else:
        yield x,y,z
        for q in y:
            yield from walk(top+os.sep+q)
```
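To check timings like the ones quoted in the question on your own machine, a small harness along these lines may help. It is only a sketch: `walk_scandir`, `time_walk` and `build_tree` are hypothetical names introduced here, the test tree is tiny and synthetic, and real numbers will depend heavily on the file system and on caching.

```python
import os
import tempfile
import time


def walk_scandir(top):
    """Simplified top-down walk: one os.scandir() call per directory."""
    dirs, nondirs = [], []
    with os.scandir(top) as scandir_it:
        for entry in scandir_it:
            (dirs if entry.is_dir() else nondirs).append(entry.name)
    yield top, dirs, nondirs
    for dirname in dirs:
        yield from walk_scandir(os.path.join(top, dirname))


def time_walk(walk_func, top):
    """Return (elapsed seconds, number of directories visited) for one traversal."""
    start = time.perf_counter()
    count = sum(1 for _ in walk_func(top))
    return time.perf_counter() - start, count


def build_tree(root, depth=3, width=3):
    """Create a small synthetic tree with `width` files and subdirectories per level."""
    if depth == 0:
        return
    for i in range(width):
        with open(os.path.join(root, f"file{i}.txt"), "w"):
            pass
        sub = os.path.join(root, f"dir{i}")
        os.mkdir(sub)
        build_tree(sub, depth - 1, width)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        for func in (os.walk, walk_scandir):
            seconds, count = time_walk(func, root)
            print(f"{func.__name__}: {count} directories in {seconds:.4f}s")
```

Running each traversal several times and comparing the best results usually gives a fairer picture than a single run, since the first pass over a tree often pays for cold file-system caches.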