Recursively counting files in a Linux directory

How can I recursively count the files in a Linux directory?

I found this:

find DIR_NAME -type f ¦ wc -l

But when I run it, it returns the following error:

find: paths must precede expression: ¦

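This should work:

find DIR_NAME -type f | wc -l

Explanation:

-type f restricts the results to regular files.
| (not ¦) redirects the standard output of find to the standard input of wc.
wc (short for word count) counts the newlines, words, and bytes in its input.
-l counts only the newlines.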

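Notes:

Replace DIR_NAME with . to run the command in the current folder.
You can also remove -type f to include directories (and symlinks) in the count.
This command may overcount if file names can contain newline characters.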
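
As a rough illustration of the first two notes, the sketch below wraps the command in two helper functions and runs them on a throwaway directory. The names count_files, count_entries, and demo_dir are made up for this example and do not come from the answer.

```shell
#!/usr/bin/env sh
# Count regular files below a directory (default: the current directory).
count_files() {
    find "${1:-.}" -type f | wc -l
}

# Count everything find reports: files, directories, symlinks,
# and the starting directory itself.
count_entries() {
    find "${1:-.}" | wc -l
}

# Demonstration on a temporary directory holding three files and one subdirectory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/sub/c"
count_files "$demo_dir"     # the three regular files
count_entries "$demo_dir"   # also counts sub and the starting directory
rm -rf "$demo_dir"
```

Dropping -type f therefore adds every directory, including the starting point, to the total.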

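Explanation of why your example does not work:

In the command you showed, you do not use the pipe (|) to connect two commands. Instead you use the broken bar (¦), which the shell does not recognize as a command or anything similar. That is why you get the error message.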

For the current directory:

find . -type f | wc -l

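If you want a breakdown of how many files are in each directory under the current directory:

# Note: the $(find ...) word list still breaks on directory names that contain whitespace.
for i in $(find . -maxdepth 1 -type d) ; do
    echo -n "$i: " ;
    (find "$i" -type f | wc -l) ;
done

This can all go on one line, of course. The parentheses clarify whose output wc -l is supposed to be counting (find $i -type f in this case).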
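
The loop above splits the output of find on whitespace, so a directory called "my dir" is treated as two names. One possible way around that, sketched here with a shell glob instead of the answer's $(find ...) list, is shown below. count_each_subdir is a hypothetical helper name, and unlike the original loop it skips hidden directories and the starting directory itself.

```shell
#!/usr/bin/env sh
# Print "<dir>: <count>" for each immediate, non-hidden subdirectory of $1.
# The glob expands to exactly one word per directory, so spaces in names are safe.
count_each_subdir() {
    for dir in "${1:-.}"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        dir=${dir%/}                   # drop the trailing slash added by the glob
        printf '%s: %s\n' "$dir" "$(find "$dir" -type f | wc -l)"
    done
}

count_each_subdir .
```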

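You can use

$ tree

after installing the tree package with

$ sudo apt-get install tree

(on a Debian/Mint/Ubuntu Linux machine).

The command shows not only the count of files but also, separately, the count of directories. The -L option can be used to specify the maximum display level (by default, the maximum depth of the directory tree).

Hidden files can be included too by supplying the -a option.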

On my computer, rsync is a little faster than the find | wc -l in the accepted answer. For example, you can count the files in /Users/joe/ like this:

[joe:~] $ rsync --stats --dry-run -ax /Users/joe/ /xxx

Number of files: 173076
Number of files transferred: 150481
Total file size: 8414946241 bytes
Total transferred file size: 8414932602 bytes

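The second line gives the number of files, 150481 in the example above. As a bonus, you also get the total size (in bytes).

Remarks:

The first line is the total count of files, directories, symlinks, etc., which is why it is larger than the second line.
The --dry-run option (-n for short) is important so that no files are actually transferred!
The /xxx parameter can be any empty or non-existent folder. Do not use / here.
I used the -x option to "not cross filesystem boundaries", which means that if you run it for / and you have external hard disks attached, it only counts the files on the root partition.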
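
If only the number is wanted, the statistics can be filtered. The following is a minimal sketch that assumes the exact output format quoted above; other rsync versions may word or punctuate these lines differently, so the pattern may need adjusting. rsync_file_count is a hypothetical helper name.

```shell
#!/usr/bin/env sh
# Read rsync --stats text on stdin and print the figure from the
# "Number of files transferred" line, with any non-digit characters removed.
rsync_file_count() {
    awk -F': ' '/^Number of files transferred:/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Demonstration using the sample output quoted above.
printf '%s\n' \
    'Number of files: 173076' \
    'Number of files transferred: 150481' \
    'Total file size: 8414946241 bytes' |
    rsync_file_count
```

With a real run, the function would sit at the end of the pipeline: rsync --stats --dry-run -ax /Users/joe/ /xxx | rsync_file_count.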

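Combining several of the answers here, the most useful solution seems to be:

find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" -printf "\n" | wc -l) "{}"' | sort -n

It can handle odd things such as file names that contain spaces, parentheses, and even newlines. It also sorts the output by the number of files.

You can increase the number after -maxdepth to get subdirectories counted too. Keep in mind that this can take a long time, particularly if you combine a deeply nested directory structure with a high -maxdepth number.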

Since file names in UNIX may contain newlines (yes, newlines), wc -l might count too many files. I would print a dot for every file and then count the dots:

find DIR_NAME -type f -printf "." | wc -c

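If you want to know how many files and subdirectories exist under the current working directory, you can use this one-liner:

find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo -e $(find {} | wc -l) {}' | sort -n

This works with the GNU tools; on BSD-style systems (e.g. OS X), just omit the -e from the echo command.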
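
The xargs one-liners above paste each directory name into the sh -c string through {}, so a name containing quotes or $( ) would be interpreted by the inner shell. A hedged sketch of the same idea that hands the name over as a positional parameter instead follows. count_per_dir is a hypothetical helper name; it counts regular files only (-type f) and skips the starting directory (-mindepth 1), both of which differ from the commands above.

```shell
#!/usr/bin/env sh
# Print "<count> <dir>" for each immediate subdirectory of $1, smallest count first.
# The directory name reaches the inner shell as "$1" and is never re-parsed as code.
count_per_dir() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -I{} sh -c 'printf "%s %s\n" "$(find "$1" -type f | wc -l)" "$1"' sh {} |
        sort -n
}

count_per_dir .
```

Using printf rather than echo -e also sidesteps the GNU/BSD difference mentioned above.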

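If you want to avoid error cases, don't let wc -l see file names that contain newlines (it counts each of them as 2 or more files).

For example, consider a case where we have a single file with a single EOL character in its name:

> mkdir emptydir && cd emptydir
> touch $'file with EOL(\n) character in it'
> find -type f
./file with EOL(?) character in it
> find -type f | wc -l
2

Since at least GNU wc does not appear to have an option to read or count a null-terminated list (except from a file), the easiest solution is not to pass it file names at all, but to emit a static output each time a file is found, e.g. in the same directory as above:

> find -type f -exec printf '\n' \; | wc -l
1

Or, if your find supports it:

> find -type f -printf '\n' | wc -l
1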
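
Another variant, not taken from the answers above, keeps the file names but terminates each one with a NUL byte and counts those bytes. A NUL can never be part of a file name, so each file contributes exactly one. This is only a sketch: count_files_nul and demo_dir are made-up names, and it assumes a find with -print0 and a tr that accepts the \000 escape (both true of the GNU tools).

```shell
#!/usr/bin/env sh
# Count regular files under $1 by counting the NUL terminators that -print0 emits.
count_files_nul() {
    find "${1:-.}" -type f -print0 | tr -cd '\000' | wc -c
}

# Demonstration: one file whose name contains a newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf 'file with EOL(\n) character in it')"
find "$demo_dir" -type f | wc -l    # line-based count sees two lines
count_files_nul "$demo_dir"         # NUL-based count sees one file
rm -rf "$demo_dir"
```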

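To determine how many files there are in the current directory, put in ls -1 | wc -l. This uses wc to count the number of lines (-l) in the output of ls -1. It doesn't count dotfiles. Please note that ls -l (that's an "L" rather than a "1" as in the previous examples), which I used in previous versions of this HOWTO, will actually give you a file count one greater than the actual count. Thanks to Kam Nejad for this point.

If you want to count only files and not include symbolic links (just an example of what else you could do), you could use ls -l | grep -v ^l | wc -l (that's an "L", not a "1", this time; we want a "long" listing here). grep checks for any line beginning with "l" (indicating a link) and discards that line (-v).

Relative speed: "ls -1 /usr/bin/ | wc -l" takes about 1.03 seconds on an unloaded 486SX25 (/usr/bin/ on this machine has 355 files). "ls -l /usr/bin/ | grep -v ^l | wc -l" takes about 1.19 seconds.

Source: http://www.tldp.org/howto/bash-prompt-howto/x700.html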
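
The off-by-one mentioned above comes from the "total" line that ls -l prints first, and that line also survives grep -v ^l. The sketch below compares the variants on a throwaway directory. The function names are made up for this example, and count_regular is a swapped-in alternative (counting lines that start with "-") rather than the HOWTO's own command.

```shell
#!/usr/bin/env sh
# Entries listed by ls -1, one per line (dotfiles are not listed).
count_short() { ls -1 "$1" | wc -l; }

# The same directory through ls -l: the leading "total" line adds one.
count_long() { ls -l "$1" | wc -l; }

# Regular files only: in a long listing their lines start with "-", so
# symlinks ("l"), directories ("d"), and the "total" line are all skipped.
count_regular() { ls -l "$1" | grep -c '^-'; }

demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/.hidden"
ln -s a "$demo_dir/link"
count_short "$demo_dir"     # a, b, link
count_long "$demo_dir"      # one more than count_short
count_regular "$demo_dir"   # a and b only
rm -rf "$demo_dir"
```

Like every approach that parses ls output, this miscounts names containing newlines.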

You can use the command ncdu. It recursively counts how many files a Linux directory contains, and it shows a progress bar while scanning, which is convenient if you have many files.

To install it on Ubuntu:

sudo apt-get install -y ncdu

Benchmark: I used https://archive.org/details/cv_corpus_v1.tar (380390 files, 11 GB) as the folder whose files had to be counted.

find . -type f | wc -l: about 120 s to complete
ncdu: about 120 s to complete

If you need to count a specific file type recursively, you can do:

find YOUR_PATH -name '*.html' -type f | wc -l

-l makes wc print only the number of lines in the output.
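
When it is not yet clear which file types a tree contains, a tally per extension can help decide what to pass to -name. The following is only a sketch: count_by_extension is a hypothetical helper, it assumes no newlines in file names, and it treats the text after the last dot of the file name as the extension (so a dotfile such as .bashrc is tallied under "bashrc").

```shell
#!/usr/bin/env sh
# Print "<count> <extension>" for the regular files under $1, most frequent first.
# Names without a dot, or ending in a dot, are grouped under "(none)".
count_by_extension() {
    find "${1:-.}" -type f |
        awk -F/ '{
            n = split($NF, parts, ".")      # $NF is the file name without its directory
            ext = (n > 1 && parts[n] != "") ? parts[n] : "(none)"
            count[ext]++
        }
        END { for (e in count) print count[e], e }' |
        sort -k1,1nr -k2
}

count_by_extension .
```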

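With bash:

Create an array of entries with ( ) and get the count with #.

FILES=(./*); echo ${#FILES[@]}

OK, that doesn't recursively count files, but I wanted to show the simple option first. A common use case might be creating rollover backups of a file. This will create logfile.1, logfile.2, logfile.3, etc.

CNT=(./logfile*); mv logfile logfile.${#CNT[@]}

To get the count of files recursively, we can still use find in the same way.

FILES=(`find . -type f`); echo ${#FILES[@]}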
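
The backtick form splits the output of find on whitespace, so a file named "a b" becomes two array elements. A possible bash-only refinement reads NUL-terminated names into the array instead; this is a sketch, and collect_files, files, and demo_dir are illustrative names that are not part of the answer.

```shell
#!/usr/bin/env bash
# Fill the global array "files" with the regular files under $1,
# one element per file, even when names contain spaces or newlines.
collect_files() {
    files=()
    while IFS= read -r -d '' path; do
        files+=("$path")
    done < <(find "${1:-.}" -type f -print0)
}

# Demonstration on a temporary directory with a space in one file name.
demo_dir=$(mktemp -d)
touch "$demo_dir/a b" "$demo_dir/c"
collect_files "$demo_dir"
echo "${#files[@]}"    # one element per file
rm -rf "$demo_dir"
```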

For directories with spaces in their names (based on various answers above), recursively print each directory name with the number of files it contains:

find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n "$i: " ; ls -p "$i" | grep -v / | wc -l ; done

Example (formatted for readability):

pwd
/mnt/Vancouver/Programming/scripts/claws/corpus

ls -l
total 8
drwxr-xr-x 2 victoria victoria 4096 Mar 28 15:02 'Catabolism - Autophagy; Phagosomes; Mitophagy'
drwxr-xr-x 3 victoria victoria 4096 Mar 29 16:04 'Catabolism - Lysosomes'

ls 'Catabolism - Autophagy; Phagosomes; Mitophagy'/ | wc -l
138

## 2 dir (one with 28 files; other with 1 file):
ls 'Catabolism - Lysosomes'/ | wc -l
29

The directory structure is easier to visualize with tree:

tree -L 3 -F .
.
├── Catabolism - Autophagy; Phagosomes; Mitophagy/
│   ├── 1
│   ├── 10
│   ├── [ ... SNIP! (138 files, total) ... ]
│   ├── 98
│   └── 99
└── Catabolism - Lysosomes/
    ├── 1
    ├── 10
    ├── [ ... SNIP! (28 files, total) ... ]
    ├── 8
    ├── 9
    └── aaa/
        └── bbb

3 directories, 167 files

man find | grep mindep
-mindepth levels
Do not apply any tests or actions at levels less than levels
(a non-negative integer). -mindepth 1 means process all files
except the starting-points.

ls -p | grep -v / (used below) is from answer 2 at https://unix.stackexchange.com/questions/48492/list-only-regular-files-but-not-directories-in-current-directory.

find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n "$i: " ; ls -p "$i" | grep -v / | wc -l ; done
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Catabolism - Lysosomes: 28
./Catabolism - Lysosomes/aaa: 1

Application: I wanted to find the maximum number of files among several hundred directories (all at depth 1). The output below is again formatted for readability:

date; pwd
Fri Mar 29 20:08:08 PDT 2019
/home/victoria/Mail/2_RESEARCH - NEWS

time find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n "$i: " ; ls -p "$i" | grep -v / | wc -l ; done > ../../aaa
0:00.03

[victoria@victoria 2_RESEARCH - NEWS]$ head -n5 ../../aaa
./RNA - Exosomes: 26
./Cellular Signaling - Receptors: 213
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Stress - Physiological, Cellular - General: 261
./Ancient DNA; Ancient Protein: 34

[victoria@victoria 2_RESEARCH - NEWS]$ sed -r 's/(^.*): ([0-9]{1,8}$)/\2: \1/g' ../../aaa | sort -V | (head; echo ''; tail)

0: ./Genomics - Gene Drive
1: ./Causality; Causal Relationships
1: ./Cloning
1: ./GenMAPP 2
1: ./Pathway Interaction Database
1: ./Wasps
2: ./Cellular Signaling - Ras-MAPK Pathway
2: ./Cell Death - Ferroptosis
2: ./Diet - Apples
2: ./Environment - Waste Management

988: ./Genomics - PPM (Personalized & Precision Medicine)
1113: ./Microbes - Pathogens, Parasites
1418: ./Health - Female
1420: ./Immunity, Inflammation - General
1522: ./Science, Research - Miscellaneous
1797: ./Genomics
1910: ./Neuroscience, Neurobiology
2740: ./Genomics - Functional
3943: ./Cancer
4375: ./Health - Disease

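sort -V is a natural sort. So the maximum number of files in any of those directories is 4375. If I left-pad (https://stackoverflow.com/a/55409116/1904943) those file names (they are all named numerically, starting at 1 in each directory) to a total of 5 digits, I should be OK.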
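
The reordering step above (swap "name: count" into "count: name", then sort) can be tried in isolation. This sketch uses sort -n in place of sort -V, which is enough here because every line starts with a plain integer. rank_by_count is a hypothetical helper name, and the sample lines are taken from the output above.

```shell
#!/usr/bin/env sh
# Read "name: count" lines on stdin, rewrite them as "count: name",
# and sort numerically so the largest directory comes last.
rank_by_count() {
    sed -E 's/^(.*): ([0-9]+)$/\2: \1/' | sort -n
}

printf '%s\n' \
    './RNA - Exosomes: 26' \
    './Cancer: 3943' \
    './Genomics - Gene Drive: 0' |
    rank_by_count
```

Piping the result through tail -n 1 would then show only the directory with the most files.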

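I wrote ffcnt to speed up recursive file counting under specific circumstances: rotational disks and filesystems that support extent mapping.

It can be an order of magnitude faster than ls-based or find-based approaches, but YMMV.

tree "$DIR_PATH" | tail -1

Sample output: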

5309 directories, 2122 files
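
To use only the file count from that summary line in a script, the number can be cut out with sed. This is a minimal sketch that assumes the English summary format shown above; tree_file_count is a hypothetical helper name.

```shell
#!/usr/bin/env sh
# Read tree output on stdin and print the file count from its last line,
# e.g. "5309 directories, 2122 files" or "0 directories, 1 file".
# Prints nothing if the last line does not have that shape.
tree_file_count() {
    tail -n 1 | sed -nE 's/^.*, ([0-9]+) files?$/\1/p'
}

echo '5309 directories, 2122 files' | tree_file_count
```

With tree installed, the call would be tree "$DIR_PATH" | tree_file_count.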

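There are many correct answers here. Here's another one!

find . -type f | sort | uniq -w 10 -c

where . is the folder to look in and 10 is the number of characters by which to group the directories.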
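
Grouping on a fixed number of characters works best when directory names share a length. A variation, sketched here as an alternative and not part of the answer, groups on the first path component instead. count_by_top_level is a hypothetical helper name; files that sit directly in the starting folder are left out (-mindepth 2), and file names are assumed to contain no newlines.

```shell
#!/usr/bin/env sh
# Print "<count> <name>" for each top-level subdirectory of $1,
# counting the regular files anywhere beneath it.
count_by_top_level() {
    (
        cd "${1:-.}" || exit 1
        # Paths look like ./top/rest..., so field 2 is the top-level name.
        find . -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sed 's/^ *//'
    )
}

count_by_top_level .
```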

You can try:

find `pwd` -type f -exec ls -l {} \; | wc -l

This alternative, with a filtering pattern, counts all the available GRUB kernel modules:

ls -l /boot/grub/*.mod | wc -l

find -type f | wc -l

find . -type f | wc -l

This works perfectly well. It is simple and short if you want to count the number of files present in a folder.

ls | wc -l

ls -l | grep -e -x -e -dr | wc -l

Long listing

Filter files and directories

Count the filtered lines