Python: Quicker to os.walk or glob?

Tags: directory-walk, glob, os.walk, python, traversal

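I'm searching for files on a large hard drive with Python. I've been looking at os.walk and glob. I usually use os.walk because I find it neater, and it also seems to be quicker (for typically sized directories).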

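Has anyone had experience with both who could say which is more efficient? As I said, glob seems to be slower, but you can use wildcards and so on, whereas with walk you have to filter the results yourself. Here is an example of looking for core dumps: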

import os
import re

core = re.compile(r"core\.\d*")
for root, dirs, files in os.walk("/path/to/dir/"):
    for file in files:
        if core.search(file):
            path = os.path.join(root, file)
            print("Deleting: " + path)
            os.remove(path)

or

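import os
from glob import iglob
# Only matches files directly inside /path/to/dir, not in its subdirectories
for file in iglob("/path/to/dir/core.*"):
    print("Deleting: " + file)
    os.remove(file)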
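The two filters above are not quite equivalent. `core.search(file)` finds `core.` anywhere in the name, so a file such as `score.txt` would also be deleted, while the glob `core.*` accepts any suffix, such as `core.txt`. A minimal sketch of a stricter name check, assuming core dumps are named `core.` followed by one or more digits (for example `core.12345`); `is_core_dump` and `is_core_dump_glob` are hypothetical helper names, and the pattern should be adjusted to your system's naming:

```python
import fnmatch
import re

# Assumption: core dumps are named "core." followed only by digits, e.g. "core.12345"
CORE_RE = re.compile(r"core\.\d+")

def is_core_dump(name):
    """Return True only if the whole file name looks like core.<digits>."""
    return CORE_RE.fullmatch(name) is not None

def is_core_dump_glob(name):
    # Glob-style alternative: "[0-9]*" only requires the first suffix
    # character to be a digit, so "core.1abc" still matches
    return fnmatch.fnmatch(name, "core.[0-9]*")

for name in ["core", "core.123", "core.txt", "score.txt", "core.1abc"]:
    print(name, is_core_dump(name), is_core_dump_glob(name))
```

Either check can replace `core.search(file)` inside the os.walk loop; `fullmatch` anchors the pattern at both ends, which `search` does not.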

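I did a small study on a web cache spread across 1000 directories. The task was to count the total number of files in those directories. The output was: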

os.listdir: 0.7268s, 1326786 files found
os.walk: 3.6592s, 1326787 files found
glob.glob: 2.0133s, 1326786 files found

As you can see, os.listdir is the fastest of the three. For this task, glob.glob was also faster than os.walk.

Source code:

import glob
import os
import time

# Count entries in ./0 ... ./999 with os.listdir
n, t = 0, time.time()
for i in range(1000):
    n += len(os.listdir("./%d" % i))
t = time.time() - t
print("os.listdir: %.4fs, %d files found" % (t, n))

# Count files under ./ recursively with os.walk
n, t = 0, time.time()
for root, dirs, files in os.walk("./"):
    for file in files:
        n += 1
t = time.time() - t
print("os.walk: %.4fs, %d files found" % (t, n))

# Count entries in ./0 ... ./999 with glob.glob
n, t = 0, time.time()
for i in range(1000):
    n += len(glob.glob("./%d/*" % i))
t = time.time() - t
print("glob.glob: %.4fs, %d files found" % (t, n))
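The figures above come from one particular cache tree, and os.walk also recurses into every level, so the comparison is not strictly like-for-like. A minimal, self-contained sketch for repeating the measurement on a synthetic tree (the tree size is an arbitrary assumption), using time.perf_counter and adding os.scandir (the `with` form needs Python 3.6+) as a fourth contender. Since Python 3.5, os.walk is itself built on os.scandir (PEP 471), so results on a modern interpreter may differ noticeably from the numbers above:

```python
import glob
import os
import tempfile
import time

def build_tree(root, n_dirs, files_per_dir):
    """Create root/0 .. root/<n_dirs-1>, each containing files_per_dir empty files."""
    for i in range(n_dirs):
        d = os.path.join(root, str(i))
        os.mkdir(d)
        for j in range(files_per_dir):
            open(os.path.join(d, "f%d" % j), "w").close()

def count_listdir(root, n_dirs):
    # One os.listdir call per known subdirectory (no recursion)
    return sum(len(os.listdir(os.path.join(root, str(i)))) for i in range(n_dirs))

def count_walk(root):
    # os.walk recurses through the whole tree and separates dirs from files
    return sum(len(files) for _dirpath, _dirs, files in os.walk(root))

def count_glob(root, n_dirs):
    # One glob per known subdirectory; "*" skips names starting with "."
    return sum(len(glob.glob(os.path.join(root, str(i), "*"))) for i in range(n_dirs))

def count_scandir(root):
    # Manual recursion with os.scandir; DirEntry.is_dir() can often answer
    # from the directory listing itself, without an extra stat() call
    n, stack = 0, [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    n += 1  # counts every non-directory entry
    return n

def timed(label, func, *args):
    start = time.perf_counter()
    n = func(*args)
    print("%-8s %.4fs, %d files found" % (label, time.perf_counter() - start, n))
    return n

if __name__ == "__main__":
    N_DIRS, FILES_PER_DIR = 200, 50  # arbitrary sizes; scale up to match your data
    with tempfile.TemporaryDirectory() as root:
        build_tree(root, N_DIRS, FILES_PER_DIR)
        timed("listdir", count_listdir, root, N_DIRS)
        timed("walk", count_walk, root)
        timed("glob", count_glob, root, N_DIRS)
        timed("scandir", count_scandir, root)
```

Only count_walk and count_scandir discover the tree on their own; the listdir and glob variants rely on knowing the directory names in advance, as the original benchmark does.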

Don't waste time optimizing before you have measured/profiled. Focus on making your code simple and maintainable.

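For example, in your code you precompile the regex, which won't make it noticeably faster, because the re module keeps an internal cache of compiled patterns (re._cache).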
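To check this on your own interpreter rather than take it on faith, a small timeit sketch comparing the module-level `re.search` (which goes through re's internal cache on each call) with a precompiled pattern. The file names and repetition counts are arbitrary assumptions, and the timings depend on your Python version, so only the results are compared, not the speed:

```python
import re
import timeit

PATTERN = r"core\.\d*"
NAMES = ["core.123", "score.txt", "notes.md"] * 1000  # arbitrary sample names
COMPILED = re.compile(PATTERN)

def with_module_function():
    # re.search looks the pattern up in re's internal cache on every call
    return [bool(re.search(PATTERN, name)) for name in NAMES]

def with_precompiled():
    # The compiled pattern object skips that per-call cache lookup
    return [bool(COMPILED.search(name)) for name in NAMES]

assert with_module_function() == with_precompiled()
for func in (with_module_function, with_precompiled):
    print(func.__name__, timeit.timeit(func, number=100))
```

Whatever difference shows up comes from that per-call lookup only; the matching work itself is identical.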

Keep it simple.

If it is slow, profile it.

Once you know exactly what needs to be optimized, make your tweaks, and always document them.

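Note that some optimizations made years ago can make code run slower than the "unoptimized" version. This applies especially to modern JIT-based languages.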
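As a concrete starting point for the "profile it" step, a minimal sketch using the standard-library cProfile and pstats modules. `find_matches` and the synthetic directory tree are hypothetical stand-ins for your real search code and disk:

```python
import cProfile
import fnmatch
import os
import pstats
import tempfile

def find_matches(root, pattern):
    """Walk root recursively and return paths whose file name matches pattern."""
    return [os.path.join(dirpath, name)
            for dirpath, _dirs, files in os.walk(root)
            for name in fnmatch.filter(files, pattern)]

with tempfile.TemporaryDirectory() as root:
    # Small synthetic tree standing in for the real disk (assumption)
    for i in range(50):
        d = os.path.join(root, "d%d" % i)
        os.mkdir(d)
        for name in ("core.%d" % i, "data.txt"):
            open(os.path.join(d, name), "w").close()

    profiler = cProfile.Profile()
    profiler.enable()
    matches = find_matches(root, "core.*")
    profiler.disable()

    print(len(matches), "matches")
    # Show the 10 entries with the highest cumulative time
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

The cumulative column shows where the time actually goes (directory listing, name matching, or your own code) before you decide what to change.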

You can use os.walk and still use glob-style matching.

import fnmatch
import os

for root, dirs, files in os.walk(DIRECTORY):
    for file in files:
        if fnmatch.fnmatch(file, PATTERN):
            print(file)

Not sure about speed, but obviously they do different things, since os.walk is recursive.
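Wrapped up as a reusable generator, the os.walk plus fnmatch approach might look like this. `iter_matching_files` is a hypothetical name, and the pattern is matched against file names only, not against directory names:

```python
import fnmatch
import os

def iter_matching_files(root, pattern):
    """Yield full paths of files under root (recursively) whose names match a glob pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter applies the glob-style pattern to each file name
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)

if __name__ == "__main__":
    for path in iter_matching_files("/path/to/dir", "core.*"):
        print(path)
```

Because it is a generator, matches are produced while the walk is still running, so you can stop early or process huge trees without building a full list first.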

I think that even with glob you would still need os.walk, unless you know in advance how deep your subdirectory tree goes.
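Since Python 3.5, glob itself can recurse: with `recursive=True`, the pattern `**` matches zero or more directory levels, so the depth does not need to be known in advance. `pathlib.Path.rglob` offers the same thing. A minimal sketch using the question's `core.*` pattern as an example; `find_with_glob` and `find_with_pathlib` are hypothetical helper names:

```python
import glob
import os
from pathlib import Path

def find_with_glob(root, pattern):
    # "**" with recursive=True matches zero or more nested directories,
    # so files directly inside root are found as well
    return glob.glob(os.path.join(root, "**", pattern), recursive=True)

def find_with_pathlib(root, pattern):
    # rglob searches root and all of its subdirectories
    return [str(p) for p in Path(root).rglob(pattern)]

if __name__ == "__main__":
    for path in find_with_glob("/path/to/dir", "core.*"):
        print(path)
```

Both versions also return directories whose names match the pattern, so filter with `os.path.isfile` if that matters for your tree.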

By the way, the glob documentation says:

"*, ?, and character ranges expressed with [] will be correctly
matched. This is done by using the os.listdir() and fnmatch.fnmatch()
functions"
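The documented idea can be illustrated with a minimal single-directory sketch: list the directory, then filter the names with fnmatch. `simple_glob` is a hypothetical helper, not the real glob implementation (it supports no wildcards in the directory part and no recursion). Like glob, it skips names starting with a dot unless the pattern itself starts with one, because fnmatch on its own would match them:

```python
import fnmatch
import os

def simple_glob(directory, pattern):
    """Hypothetical one-level glob built from os.listdir and fnmatch.filter."""
    names = os.listdir(directory)
    if not pattern.startswith("."):
        # glob's "*" does not match a leading dot; fnmatch alone would,
        # so hidden names are dropped unless the pattern asks for them
        names = [n for n in names if not n.startswith(".")]
    return [os.path.join(directory, n) for n in fnmatch.filter(names, pattern)]
```

This also shows why a single glob call cannot be much faster than os.listdir for one directory: it does the listing plus a name filter.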

I would just use something like:

import fnmatch, os, shutil
# path, search_str and dest come from the surrounding code
for dirpath, subdirs, files in os.walk(path):
    for name in fnmatch.filter(files, search_str):
        shutil.copy(os.path.join(dirpath, name), dest)
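Packaged as a function, the same loop might look like the sketch below. `copy_matching` is a hypothetical name; it returns the destination paths so the caller can check what was copied. Two caveats follow from the flat copy: files with the same name in different subdirectories overwrite each other in `dest`, and `dest` should lie outside `src_root`, otherwise the walk may pick up the copies:

```python
import fnmatch
import os
import shutil

def copy_matching(src_root, pattern, dest):
    """Copy every file under src_root whose name matches pattern into dest (flat).

    Same-named files from different subdirectories overwrite each other in dest.
    dest is assumed to be outside src_root.
    """
    copied = []
    for dirpath, _subdirs, files in os.walk(src_root):
        for name in fnmatch.filter(files, pattern):
            # shutil.copy returns the path of the newly created file
            copied.append(shutil.copy(os.path.join(dirpath, name), dest))
    return copied
```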