How does grep run so fast?

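I am really amazed by what grep can do in the shell. Earlier I used the substring method in Java for this, but now I use grep instead, and it finishes in a matter of seconds. It is far faster than the Java code I used to write. (That is my experience; I might be wrong.)

That said, I have not been able to figure out how it manages this, and there is not much material about it on the web.

Can anyone help me with this?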


Regarding your question, here is a note from the author of GNU grep, Mike Haertel:

GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH
BYTE that it does look at.

GNU grep uses the well-known Boyer-Moore algorithm, which looks first
for the final letter of the target string, and uses a lookup table to
tell it how far ahead it can skip in the input whenever it finds a
non-matching character.

GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the
Boyer-Moore delta table entries in such a way that it doesn't need to
do the loop exit test at every unrolled step. The result of this is
that, in the limit, GNU grep averages fewer than 3 x86 instructions
executed for each input byte it actually looks at (and it skips many
bytes entirely).

GNU grep uses raw Unix input system calls and avoids copying data
after reading it. Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO
LINES. Looking for newlines would slow grep down by a factor of
several times, because to find the newlines it would have to look at
every byte!

So instead of using line-oriented input, GNU grep reads raw data into
a large buffer, searches the buffer using Boyer-Moore, and only when
it finds a match does it go and look for the bounding newlines.
(Certain command line options like -n disable this optimization.)

This is an excerpt from the note linked here.
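The "search the raw buffer first, find the line boundaries only on a match" idea from the note can be sketched in Python. This is only an illustration, not GNU grep's code: `search_buffer` and `grep_stream` are hypothetical names, `bytes.find` stands in for the Boyer-Moore search, the needle is assumed to be non-empty and to contain no newline, and unlike grep this sketch does copy data when it slices and concatenates buffers.

```python
import io


def search_buffer(buf: bytes, needle: bytes):
    """Yield each line of buf that contains needle, without splitting buf into lines."""
    pos = 0
    while True:
        hit = buf.find(needle, pos)            # stand-in for the Boyer-Moore search
        if hit == -1:
            return
        # Only now look for the newlines that bound the matching line.
        start = buf.rfind(b"\n", 0, hit) + 1   # rfind gives -1 if none, so start becomes 0
        end = buf.find(b"\n", hit)
        if end == -1:
            end = len(buf)
        yield buf[start:end]
        pos = end + 1                          # resume after the matching line


def grep_stream(stream, needle: bytes, bufsize: int = 1 << 16):
    """Read large chunks from a binary stream and yield the matching lines."""
    carry = b""                                # trailing partial line from the last chunk
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        buf = carry + chunk
        cut = buf.rfind(b"\n") + 1             # search only up to the last complete line
        yield from search_buffer(buf[:cut], needle)
        carry = buf[cut:]
    yield from search_buffer(carry, needle)    # final line without a trailing newline


if __name__ == "__main__":
    sample = io.BytesIO(b"alpha\nbeta needle here\ngamma\nlast needle")
    for line in grep_stream(sample, b"needle"):
        print(line.decode())
```

Lines that do not contain the needle are never delimited at all; the only newline lookups are one backward scan per chunk and two short scans per match.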


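To add to Steve's excellent answer:

It may not be widely known, but grep is almost always faster when searching for a longer pattern than for a short one, because with a longer pattern Boyer-Moore can skip forward in longer strides and achieve even better sublinear speed: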

Example:

# after running these twice to ensure apples-to-apples comparison
# (everything is in the buffer cache)

$ time grep -c 'tg=f_c' 20140910.log
28
0.168u 0.068s 0:00.26

$ time grep -c ' /cc/merchant.json tg=f_c' 20140910.log
28
0.100u 0.056s 0:00.17

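The longer form is about 35% faster!

How come? Boyer-Moore builds a skip-forward table from the pattern string. Before comparing anything else, it looks up a single input character in that table, and whenever there is a mismatch it skips ahead by the longest distance possible. The longer the pattern, the longer the maximum skip (comparison runs from the last pattern character toward the first).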
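To make the skip table concrete, here is a minimal Python sketch. It uses the Horspool simplification of Boyer-Moore (bad-character table only), not GNU grep's actual implementation, and the function names and sample data are made up for illustration. It counts how many byte reads the search performs so the effect of pattern length is visible.

```python
def build_skip_table(pattern: bytes) -> list:
    """Map each byte value to the shift allowed when it is the window's last byte."""
    m = len(pattern)
    table = [m] * 256                      # bytes absent from the pattern: skip a full length
    for i in range(m - 1):                 # the last pattern byte is deliberately excluded
        table[pattern[i]] = m - 1 - i
    return table


def horspool_search(text: bytes, pattern: bytes):
    """Return (index of the first match or -1, number of text byte reads)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    skip = build_skip_table(pattern)
    reads = 0
    pos = 0
    while pos <= n - m:
        last = text[pos + m - 1]           # look at the window's final byte first
        reads += 1
        if last == pattern[-1]:
            j = m - 2                      # verify the rest from right to left
            while j >= 0:
                reads += 1
                if text[pos + j] != pattern[j]:
                    break
                j -= 1
            if j < 0:
                return pos, reads
        pos += skip[last]                  # jump ahead without reading the bytes in between
    return -1, reads


if __name__ == "__main__":
    # Hypothetical input: filler bytes followed by one matching log fragment.
    text = b"x" * 1000 + b" /cc/merchant.json tg=f_c"
    for pattern in (b"tg=f_c", b" /cc/merchant.json tg=f_c"):
        index, reads = horspool_search(text, pattern)
        print(pattern, "found at", index, "after", reads, "byte reads of", len(text))
```

On this sample input both searches read far fewer bytes than the input contains, and the longer pattern reads fewer than the short one because each mismatch lets it jump a full pattern length.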

Here is a video explaining Boyer-Moore.

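Another common misconception (for GNU grep) is that fgrep is faster than grep. The f in fgrep doesn't stand for 'fast', it stands for 'fixed' (see the man page), and since both are the same program, and both use Boyer-Moore, there's no difference in speed between them when searching for fixed strings without regexp special chars. The only reason I use fgrep is when the pattern contains a regexp special char (like ., [], or *) that I don't want interpreted as such. And even then, the more portable/standard form grep -F is preferred over fgrep.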
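The difference between interpreting a pattern as a regexp and as a fixed string can be shown with a small Python analogy. This uses Python's `re` module and the `in` operator as stand-ins for `grep` and `grep -F`; it is not how grep itself is implemented, and the sample strings are made up.

```python
import re

line = "price: 3x14"
pattern = "3.14"

# Regexp interpretation: "." matches any single character, so "3x14" matches.
regex_hit = re.search(pattern, line) is not None

# Fixed-string interpretation: the pattern must appear literally.
fixed_hit = pattern in line

# Escaping the special characters makes the regexp behave like the fixed string.
escaped_hit = re.search(re.escape(pattern), line) is not None

print(regex_hit, fixed_hit, escaped_hit)
```

Here the regexp search reports a match that the literal search does not, which is the kind of surprise the fixed-string mode avoids.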