How do I list all files of a directory?
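How can I list all the files in a directory in Python and add them to a list?
os.listdir() returns everything in a directory, both files and directories. If you only need files, you can filter its result with os.path.isfile():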
```python
from os import listdir
from os.path import isfile, join

onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
```
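Alternatively, you can use os.walk(), which yields the directory path, the subdirectory names and the file names for each directory it visits. If you only want the top-level directory, break after the first iteration:
```python
from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break
```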
Finally, as that example shows, to add one list to another you can use extend(), or concatenate the two lists:
```python
>>> q = [1, 2, 3]
>>> w = [4, 5, 6]
>>> q = q + w
>>> q
[1, 2, 3, 4, 5, 6]
```
Personally, I prefer extend().
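I prefer using the glob module, as it does pattern matching and expansion:
```python
import glob

print(glob.glob("/home/adam/*.txt"))
```
It returns a list of the matching files:
```
['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
```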
```python
import os

os.listdir("somedirectory")
```
will return a list of all files and directories in "somedirectory".
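Getting a list of files with Python 2 and 3
I have also made a short video: Python: how to get a list of files in a directory
os.listdir()
Or: how to get all the files (and directories) in the current directory (Python 3)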
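In Python 3, the simplest way to get the files in the current directory is this. It is really simple: use the os module and its listdir() function, which lists the current directory when called with no argument:
```python
>>> import os
>>> arr = os.listdir()
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
```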
Using glob
I found glob easier for selecting files of the same type or with something in common. Look at the following example:
```python
import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
```
Using a list comprehension
```python
import glob

mylist = [f for f in glob.glob("*.txt")]
```
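Getting the full path name with os.path.abspath
As you noticed, the code above does not give you the full path of the files. If you need the absolute path, you can use os.path.abspath:
```python
>>> import os
>>> files_path = [os.path.abspath(x) for x in os.listdir()]
>>> files_path
['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']
```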
Using os.walk
I find this very useful for finding things across many directories, and it helped me find a file whose name I could not remember:
```python
import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))
```
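os.listdir(): getting the files in the current directory
The heading of this part mentions Python 2, but the snippet as written needs Python 3: in Python 2, os.listdir() requires a path argument such as '.', and the built-in open() has no encoding parameter.
```python
import os

mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
```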
Example: a text file listing all the files on a hard drive
```python
"""We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding="utf-8") as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

# On Windows, passing a file name to os.system opens it with its default program.
os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
```
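All the files of C:\ in one text file
This is a shorter version of the previous code. Change the starting folder if you need to begin from another location. On my computer this code generated a 50 MB text file with just under 500,000 lines, each holding the complete path of a file.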
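The shorter code this paragraph refers to is not included here. As a rough sketch of what such a version could look like (the function name save_all_paths and its parameters are hypothetical, chosen only for illustration):

```python
import os


def save_all_paths(start_dir, output_file):
    """Write the full path of every file under start_dir, one per line.

    The name and parameters are illustrative assumptions, not the
    original author's code. Returns the number of paths written.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(start_dir):
            for name in files:
                out.write(os.path.join(root, name) + "\n")
                count += 1
    return count
```

Calling save_all_paths("C:\\", "all_files.txt") would scan a whole drive; passing "." restricts the scan to the current directory.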
A one-line solution to get only the list of files (no subdirectories):
```python
filenames = next(os.walk(path))[2]
```
or absolute pathnames:
```python
paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]
```
Getting full file paths from a directory and all its subdirectories
```python
import os


def get_filepaths(directory):
    """
    Walk the directory tree rooted at directory (with os.walk) and
    return a list with the full path of every file found in it.
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.


# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")
```
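- The path I passed to the function above contains 3 files: two of them in the root directory and another in a subfolder called "SUBFOLDER". You can now do things like:
print(full_file_paths), which will print the list: ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']
If you like, you can open and read the contents, or focus only on files with the extension ".dat", as in the code below:
```python
for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)
```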
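Since version 3.4 there are built-in iterators for this that are more efficient than os.listdir(). The first is pathlib:
```python
>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]
```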
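According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them. Since version 3.5 there is also os.scandir():
```python
>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]
```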
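Note that from version 3.5 on, os.walk() uses os.scandir() instead of os.listdir(), which according to PEP 471 made it 2 to 20 times faster.
Let me also recommend reading ShadowRanger's comment below.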
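The os.scandir() approach above can be wrapped in a small function. This is only a sketch: the name list_files_scandir is made up for illustration, and the with-statement form of os.scandir() assumes Python 3.6 or later.

```python
import os


def list_files_scandir(path="."):
    """Return the sorted names of the regular files directly inside path."""
    # The context-manager form of os.scandir() needs Python 3.6 or later;
    # it closes the underlying directory iterator promptly.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

Each entry is an os.DirEntry, so is_file() can usually answer without an extra system call, which is where the speed advantage over os.listdir() plus os.path.isfile() comes from.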
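Preliminary notes
- Although there is a clear distinction between the terms file and directory in the question text, some may argue that directories are actually special files.
- The statement "all files of a directory" can be interpreted in two ways:
  - All direct (or level 1) descendants only
  - All descendants in the whole directory tree (including the ones in subdirectories)
When the question was asked, I imagine that Python 2 was the LTS version. However, the code samples will be run by Python 3(.5) (I will keep them as Python 2 compliant as possible; also, any code belonging to Python that I post is from v3.5.4, unless otherwise specified). That has consequences for another keyword in the question, "add them to a list":
- In versions before Python 2.2, sequences (iterables) were mostly represented by lists (tuples, sets, ...)
- In Python 2.2, the concept of generator was introduced, courtesy of the yield statement. As time passed, generator counterparts started to appear for functions that returned or worked with lists.
- In Python 3, generators are the default behavior.
- I am not sure whether returning a list is still mandatory (or whether a generator would do as well), but passing a generator to the list constructor creates a list out of it (and also consumes it). The example below illustrates the differences on map(function, iterable, ...):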
```python
>>> import sys
>>> sys.version
'2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
>>> m, type(m)
([1, 2, 3], <type 'list'>)
>>> len(m)
3
```
```python
>>> import sys
>>> sys.version
'3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])
>>> m, type(m)
(<map object at 0x000001B4257342B0>, <class 'map'>)
>>> len(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()
>>> lm0 = list(m)  # Build a list from the generator
>>> lm0, type(lm0)
([1, 2, 3], <class 'list'>)
>>>
>>> lm1 = list(m)  # Build a list from the same generator
>>> lm1, type(lm1)  # Empty list now - generator already consumed
([], <class 'list'>)
```
The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I used the same tree on Lnx as well):
```
E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
Folder PATH listing for volume Work
Volume serial number is 00000029 3655:6FED
E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
|   file0
|   file1
|
+---dir0
|   +---dir00
|   |   |   file000
|   |   |
|   |   +---dir000
|   |           file0000
|   |
|   +---dir01
|   |       file010
|   |       file011
|   |
|   +---dir02
|       +---dir020
|           +---dir0200
+---dir1
|       file10
|       file11
|       file12
|
+---dir2
|   |   file20
|   |
|   +---dir20
|           file200
|
+---dir3
```
Solutions (programmatic approaches):
os.listdir(path='.')
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' ...
```python
>>> import os
>>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
>>>
>>> os.listdir(root_dir)  # List all the items in root_dir
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
['file0', 'file1']
```
A more elaborate example (code_os_listdir.py):
```python
import os
from pprint import pformat


def _get_dir_content(path, include_folders, recursive):
    entries = os.listdir(path)
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                    yield sub_entry
        else:
            yield entry_with_path


def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    for item in _get_dir_content(path, include_folders, recursive):
        yield item if prepend_folder_name else item[path_len:]


def _get_dir_content_old(path, include_folders, recursive):
    entries = os.listdir(path)
    ret = list()
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                ret.append(entry_with_path)
            if recursive:
                ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
        else:
            ret.append(entry_with_path)
    return ret


def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]


def main():
    root_dir = "root_dir"
    ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
    lret0 = list(ret0)
    print(ret0, len(lret0), pformat(lret0))
    ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
    print(len(ret1), pformat(ret1))


if __name__ == "__main__":
    main()
```
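Notes:
- There are two implementations: one that uses generators (get_dir_content) and a classic one that builds lists (get_dir_content_old)
- Both use recursion to descend into subfolders, and each consists of a private helper (leading underscore) that does the work plus a public wrapper that optionally strips the initial path from the returned entries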
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
```python
>>> import os
>>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
>>> root_dir
'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
>>>
>>> walk_generator = os.walk(root_dir)
>>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
>>> root_dir_entry
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
>>>
>>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
>>>
>>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
...     print(entry)
...
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
```
Notes:
- Under the hood, it uses os.scandir (os.listdir on older versions)
- It does the heavy lifting by recursing into subfolders
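Because os.walk visits subfolders on its own, it is sometimes useful to tell it which ones to skip. With topdown=True (the default), the dirnames list may be edited in place to prune the walk. A minimal sketch, where the function name and the default skip list are assumptions made for illustration:

```python
import os


def files_skipping_dirs(top, skip=(".git", "__pycache__")):
    """Yield full paths of files under top, not descending into skipped dirs."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Editing dirnames in place tells os.walk which subdirectories to visit.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

The slice assignment matters here: rebinding dirnames to a new list would leave the list that os.walk holds untouched, so nothing would be pruned.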
glob.glob(pathname, *, recursive=False) (glob.iglob(pathname, *, recursive=False))
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell). ... Changed in version 3.5: Support for recursive globs using "**".
```python
>>> import glob, os
>>> wildcard_pattern = "*"
>>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
>>> root_dir
'root_dir\\*'
>>>
>>> glob_list = glob.glob(root_dir)
>>> glob_list
['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
>>>
>>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from beginning
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> for entry in glob.iglob(root_dir + "*", recursive=True):
...     print(entry)
...
root_dir\
root_dir\dir0
root_dir\dir0\dir00
root_dir\dir0\dir00\dir000
root_dir\dir0\dir00\dir000\file0000
root_dir\dir0\dir00\file000
root_dir\dir0\dir01
root_dir\dir0\dir01\file010
root_dir\dir0\dir01\file011
root_dir\dir0\dir02
root_dir\dir0\dir02\dir020
root_dir\dir0\dir02\dir020\dir0200
root_dir\dir1
root_dir\dir1\file10
root_dir\dir1\file11
root_dir\dir1\file12
root_dir\dir2
root_dir\dir2\dir20
root_dir\dir2\dir20\file200
root_dir\dir2\file20
root_dir\dir3
root_dir\file0
root_dir\file1
```
Notes:
- Uses os.listdir
- For large trees (especially if recursive is on), iglob is preferred
- Allows advanced filtering based on name (due to the wildcard)
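The last two notes can be combined: iglob yields matches lazily, and the wildcard does the name filtering. A small sketch under those assumptions (iter_matching_files is a hypothetical helper name; recursive globbing with "**" needs Python 3.5 or later):

```python
import glob
import os


def iter_matching_files(root, pattern="*.txt"):
    """Lazily yield files under root (recursively) whose name matches pattern."""
    # "**" matches any number of nested directories when recursive=True.
    for path in glob.iglob(os.path.join(root, "**", pattern), recursive=True):
        if os.path.isfile(path):
            yield path
```

Since nothing is collected up front, the caller can stop iterating early on a large tree without paying for a full listing.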
class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)
```python
>>> import os, pathlib
>>> root_dir = "root_dir"
>>> root_dir_instance = pathlib.Path(root_dir)
>>> root_dir_instance
WindowsPath('root_dir')
>>> root_dir_instance.name
'root_dir'
>>> root_dir_instance.is_dir()
True
>>>
>>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
['root_dir\\file0', 'root_dir\\file1']
```
Notes:
- This is one way of achieving our goal
- It is the OOP style of handling paths
- Offers lots of functionalities
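As one illustration of those functionalities, Path.iterdir() and Path.rglob() cover the two readings of "all files of a directory" from the preliminary notes. The helper below is only a sketch with a made-up name:

```python
from pathlib import Path


def list_files_pathlib(root, recursive=False):
    """Return sorted Path objects for the files under root."""
    root = Path(root)
    # rglob("*") walks the whole tree; iterdir() stays at the first level.
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in candidates if p.is_file())
```

The results are Path objects, so attributes such as .name, .suffix and .parent are available without further string handling.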
dircache.listdir(path) (Python 2 only)
- However, according to [GitHub]: python/cpython - (2.7) cpython/Lib/dircache.py, it is just a (thin) wrapper over os.listdir with caching
```python
def listdir(path):
    """List directory contents, using cache."""
    try:
        cached_mtime, list = cache[path]
        del cache[path]
    except KeyError:
        cached_mtime, list = -1, []
    mtime = os.stat(path).st_mtime
    if mtime != cached_mtime:
        list = os.listdir(path)
        list.sort()
    cache[path] = mtime, list
    return list
```
- os.listdir and os.scandir use opendir / readdir / closedir ([MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function) (via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c)
- pywin32's win32file module uses those (Win-specific) functions as well (via [GitHub]: mhammond/pywin32 - (master) pywin32/win32/src/win32file.i)
- _get_dir_content (from approach #1) can be implemented using any of these approaches (some will require more work and some less)
- Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func), which would be a function that takes a path as an argument and decides whether the entry is kept (a default that accepts every path does not strip out anything); inside _get_dir_content, an entry for which the function fails would be skipped. However, the more complex the code becomes, the longer it takes to execute.
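The filter_func idea from the last note could look roughly like this. It is a sketch, not the answer author's code: the function name is hypothetical, and it makes one design choice the note leaves open, namely that the filter decides only what is yielded while subfolders are always descended into.

```python
import os


def get_dir_content_filtered(path, filter_func=lambda entry: True, recursive=True):
    """Yield every entry under path for which filter_func(entry_path) is true."""
    for entry in os.listdir(path):
        entry_with_path = os.path.join(path, entry)
        if filter_func(entry_with_path):
            yield entry_with_path
        # Assumption: recursion does not depend on the filter result.
        if recursive and os.path.isdir(entry_with_path):
            for sub_entry in get_dir_content_filtered(entry_with_path, filter_func, recursive):
                yield sub_entry
```

For example, passing filter_func=os.path.isfile reproduces the include_folders=False behavior, while the default keeps everything.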
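Nota bene! Since recursion is used, I must mention that I ran some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level reached values somewhere in the (990 .. 1000) range (recursionlimit is 1000 by default), I got StackOverflow :). If the directory tree exceeds that limit (I am not a filesystem expert, so I don't know whether that is even possible), that could be a problem. I must also mention that I did not try to increase recursionlimit, because I have no experience in the area (how far can it be raised before the stack at OS level has to be increased as well?), but in theory there will always be the possibility of failure if the directory depth is larger than the highest possible recursionlimit (on that machine).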
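One way to sidestep the recursion limit discussed above is to keep an explicit stack of directories instead of making recursive calls. A minimal sketch (the function name is hypothetical, and skipping symlinked directories is an assumption made here to avoid cycles):

```python
import os


def iter_files_no_recursion(top):
    """Yield the full path of every file under top without recursive calls."""
    pending = [top]  # explicit stack of directories still to visit
    while pending:
        current = pending.pop()
        for entry in os.listdir(current):
            full_path = os.path.join(current, entry)
            if os.path.isdir(full_path):
                # Symlinked directories are neither followed nor yielded.
                if not os.path.islink(full_path):
                    pending.append(full_path)
            else:
                yield full_path
```

The depth of the tree then only affects the length of the pending list, not the depth of the Python call stack.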
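The code samples are for demonstrative purposes only. That means I did not take error handling into account (I don't think there is any try / except / else / finally block), so the code is not robust (the reason is to keep it as simple and short as possible). For production, error handling should be added as well.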
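As a hedged illustration of what such error handling might involve: os.walk silently skips directories it cannot read unless an onerror callback is supplied, so collecting those errors is one simple option. The function name below is an assumption made for the example.

```python
import os


def list_files_reporting_errors(top):
    """Return (file_paths, errors) for the tree rooted at top."""
    errors = []
    file_paths = []
    # os.walk ignores directories it cannot read unless onerror is given;
    # the callback receives the OSError instance.
    for dirpath, dirnames, filenames in os.walk(top, onerror=errors.append):
        for name in filenames:
            file_paths.append(os.path.join(dirpath, name))
    return file_paths, errors
```

The caller can then decide whether a non-empty errors list is fatal or merely worth logging.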
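I really liked adamk's answer, which suggests using glob(), from the module of the same name.
But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest the join() and expanduser() functions from the os.path module, and perhaps the getcwd() function from the os module as well.
For example:
```python
from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\\Users\\admin\\*\\wlp')
```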
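The above is terrible, though: the path is hardcoded and will only ever work on Windows, because of the drive name and the backslashes written into it.
```python
from glob import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))
```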
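The above works better, but it relies on the folder name Users, which is common on Windows and less so on other operating systems. It also relies on the user having a specific name, admin.
```python
from glob import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))
```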
This works perfectly across all platforms.
Another great example that works perfectly across platforms and does something a bit different:
```python
from glob import glob
from os import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))
```
I hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
```python
import os


def list_files(path):
    # returns a list of names (with extension, without full path) of all files
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files
```
If you are looking for a Python implementation of find, this is a recipe I use rather frequently:
```python
from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print(found_file)
```
So I made a PyPI package out of it, and there is also a GitHub repository. I hope someone finds it useful for their code.
Returning a list of absolute file paths, without recursing into subdirectories
```python
import os

L = [os.path.join(os.getcwd(), f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(), f))]
```
```python
import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir, item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list
```
Here I use a recursive structure.
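I am assuming that all your files are in *.txt format and are stored inside a directory with the path data/.
You can use the glob module to list all the files of that directory and add them to a list named fnames, in the following manner:
```python
import glob

fnames = glob.glob("data/*.txt")  # fnames: list data type
```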
For greater results, you can use the listdir() method of the os module along with a generator (a generator is a powerful iterator that keeps its state, remember?). The following code works fine with both versions: Python 2 and Python 3.
Here is the code:
```python
import os


def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file


for file in files("."):
    print(file)
```
Hope this helps.
```python
# -*- coding: utf-8 -*-
import os
import traceback


def start():
    address = "/home/ubuntu/Desktop"
    try:
        Folders = []
        Id = 1
        for item in os.listdir(address):
            endaddress = os.path.join(address, item)
            Folders.append({'Id': Id, 'TopId': 0, 'Name': item, 'Address': endaddress})
            Id += 1
            # Descend only into non-empty directories
            # (os.listdir raises an error when called on a regular file).
            if os.path.isdir(endaddress) and os.listdir(endaddress):
                Id = FolderToList(endaddress, Id, Id - 1, Folders)
        return Folders
    except Exception:
        print("___________________________ ERROR ___________________________\n" + traceback.format_exc())


def FolderToList(address, Id, TopId, Folders):
    for item in os.listdir(address):
        endaddress = os.path.join(address, item)
        Folders.append({'Id': Id, 'TopId': TopId, 'Name': item, 'Address': endaddress})
        Id += 1
        if os.path.isdir(endaddress) and os.listdir(endaddress):
            Id = FolderToList(endaddress, Id, Id - 1, Folders)
    return Id


print(start())
```
Using generators
```python
import os


def get_files(search_path):
    for (dirpath, _, filenames) in os.walk(search_path):
        for filename in filenames:
            yield os.path.join(dirpath, filename)


list_files = get_files('.')
for filename in list_files:
    print(filename)
```
You can use this code to get an iterator that runs over the full paths (directory + file name) of the files.
```python
import os


def get_iterator_all_files_name(dir_path):
    for (dirpath, dirnames, filenames) in os.walk(dir_path):
        for f in filenames:
            yield os.path.join(dirpath, f)
```
Or use this to get them in a list.
```python
import os


def get_list_all_files_name(dir_path):
    all_files_path = []

    for (dirpath, dirnames, filenames) in os.walk(dir_path):
        for f in filenames:
            all_files_path.append(os.path.join(dirpath, f))

    return all_files_path
```
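Another very readable variant for Python 3.4+ is using pathlib.Path.glob:
```python
from pathlib import Path

folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]
```
It is simple to make this more specific, e.g. to look only for Python source files that are not symbolic links, in all subdirectories as well:
```python
[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]
```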
If you want to use a different file type or get the full directory, use this function:
```python
import os


def createList(foldername, fulldir=True, suffix=".jpg"):
    file_list_tmp = os.listdir(foldername)
    # print(len(file_list_tmp))
    file_list = []
    if fulldir:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(os.path.join(foldername, item))
    else:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(item)
    return file_list
```
```python
# Python 2 only: the dircache module does not exist in Python 3.
# pathname must be set to the directory to list before running this.
import dircache

list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
    if len(list[i]) != check:
        temp.append(list[i-1])
        check = len(list[i])
    else:
        i = i + 1
        count = count - 1

print temp
```
This is my general-purpose function. It returns a list of file paths rather than filenames, since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.
```python
import os
import fnmatch


def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]
```
For Python 2: pip install rglob
```python
import rglob

file_list = rglob.rglob("/home/base/dir/", "*")
print file_list
```
A wise teacher once told me:
When there are several established ways to do something, none of them is good for all cases.
So I will add a solution for a subset of the problem: quite often, we only want to check whether a file name matches a start string and an end string, without going into subdirectories. We would therefore like a function that returns a list of filenames, like:
```python
filenames = dir_filter('foo/baz', radical='radical', extension='.txt')
```
If you are willing to declare two functions first, this can be done:
```python
import os


def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return (filename.startswith(radical) and filename.endswith(extension))


def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
            if file_filter(filename, radical, extension)]
```
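This solution can easily be generalized with regular expressions (and you might want to add a pattern argument if you do not want your patterns to always stick to the start or end of the filename).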
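The regular-expression generalization mentioned above might look like the following sketch. The function name dir_filter_regex and its pattern parameter are assumptions made for illustration, not part of the answer.

```python
import os
import re


def dir_filter_regex(dirname=".", pattern=".*"):
    """Return the names in dirname that match the regular expression pattern."""
    regex = re.compile(pattern)
    # search() matches anywhere in the name; anchor with ^ and $ as needed.
    return [name for name in os.listdir(dirname) if regex.search(name)]
```

A call such as dir_filter_regex('foo/baz', r'^radical.*\.txt$') would then play the role of the radical and extension arguments.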
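I will provide a sample one-liner where the source path and the file type can be provided as input. The code returns a list of filenames with the csv extension. Use . in case all files need to be returned. This will also recursively scan the subdirectories.
Modify the file extension and the source path as needed.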
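The one-liner itself is not present in this copy. A sketch of how such a recursive search by extension could be written, with the caveat that the function name and parameters below are hypothetical and this is not necessarily the answer's original code:

```python
import glob
import os


def find_by_extension(source_path, extension=".csv"):
    """Return paths under source_path (recursively) whose name ends in extension."""
    # os.walk supplies every directory; glob matches the names inside each one.
    return [path
            for dirpath, _, _ in os.walk(source_path)
            for path in glob.glob(os.path.join(dirpath, "*" + extension))]
```

Because the body is a single comprehension, it can be pasted inline as a one-liner when a helper function is not wanted.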
Getting all the files from the specified folder (including subdirectories).
```python
import glob
import os

print([entry for entry in glob.iglob("{}/**".format("FILE_PATH"), recursive=True) if os.path.isfile(entry)])
```
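To display the full path and filter by extension, use:
```python
import os
from os.path import isfile, join

# file is the path of the directory to search
onlyfiles = [f for f in os.listdir(file) if len(f) >= 5 and f[-5:] == ".json" and isfile(join(file, f))]
```
Change the number 5 to match the number of characters in the extension, counting the leading dot.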
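The length has to be adjusted by hand only because the extension is compared through a slice. A variant based on str.endswith avoids the magic number; the sketch below uses a hypothetical function name and returns full paths, which is an assumption about what "full path" above is meant to produce.

```python
import os


def files_with_extension(directory, extension=".json"):
    """Return full paths of the files in directory whose name ends with extension."""
    return [os.path.join(directory, name)
            for name in os.listdir(directory)
            if name.endswith(extension)
            and os.path.isfile(os.path.join(directory, name))]
```

Switching to another file type is then a matter of passing a different extension string, with no length to keep in sync.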