Blazing Fast Grep

It was unexpected news, and an accidental finding: today I discovered that Perl-compatible grep can be much faster than the default mode. I was trying to grep a 145MB text file:

grep -i 'someword' largefile.log - 14 seconds
grep -iP 'someword' largefile.log - under 1 second
grep -iP 'someword.*?' largefile.log - under 1 second

In this test, Perl-compatible regexp search was more than an order of magnitude faster!
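For anyone who wants to repeat the comparison, here is a minimal sketch. It assumes GNU grep built with PCRE support (for -P) and GNU date (for %N); the generated file and the `timed` helper are made up for illustration and are not the 145MB log from the measurement above. It also times the default mode under LC_ALL=C, on the assumption that the locale is one more variable worth ruling out. Numbers will differ by grep version and machine.

```shell
#!/bin/sh
# Sketch: compare grep matching modes on a generated file.
# Assumes GNU grep (for -P) and GNU date (for %N nanoseconds).

logfile=$(mktemp)   # hypothetical stand-in for largefile.log

# 200,000 lines; every other line contains the search word in mixed case.
awk 'BEGIN {
    for (i = 0; i < 100000; i++) {
        print "line " i ": nothing to see here"
        print "line " i ": SomeWord shows up here"
    }
}' > "$logfile"

# Run a command: its stdout (the match count) passes through,
# and the elapsed time in milliseconds goes to stderr.
timed() {
    start=$(date +%s%N)
    "$@"
    end=$(date +%s%N)
    echo "  $(( (end - start) / 1000000 )) ms: $*" >&2
}

plain_count=$(timed grep -ic 'someword' "$logfile")
pcre_count=$(timed grep -icP 'someword' "$logfile")
c_locale_count=$(timed env LC_ALL=C grep -ic 'someword' "$logfile")

# All three variants should agree on the number of matching lines.
echo "default: $plain_count  -P: $pcre_count  LC_ALL=C: $c_locale_count"

rm -f "$logfile"
```

If the three counts differ, the timings are not comparing like with like, so check that first.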

It is not surprising that the two modes may be using different algorithms, however... Since the Perl-compatible mode is more generic, more complex, and inclusive of the simpler cases, it makes you wonder why they would bother. Why not just map the simpler case onto the more generic, Perl-compatible one and have both of them fast? I guess this is one more glaring example of over-engineering waste, in this case in a Linux classic :) I, for one, am going to always use the "-P" option from now on.

P.S. On the same subject, for the hopeless geeks amongst you, it is not just much easier to write:

egrep -irl "someword" path

than

find path -type f -exec grep -li "someword" {} \;

but the former is much faster, too :)

You post about command line

You post about command line stuff and you must've known I would have something to say...

I believe you meant for the first word in your last example to be "find" and not "grep". Using "find ... -exec grep..." is going to be slower because find will spawn a new grep process for each found result. The "egrep -r" method uses just one grep process, and "find ... -print | xargs..." uses one per batch of file names, which usually means one or a few. I wrote more about this a couple of years ago.
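The difference described above can be sketched like this. The directory layout and file names are invented for the demo; -print0 and -0 are common find/xargs extensions rather than strict POSIX, and "grep -r" is assumed to be available as in the post.

```shell
#!/bin/sh
# Sketch: four ways to list files containing a word, case-insensitively.
# The directory and file names below are made up for the demo.

dir=$(mktemp -d)
mkdir -p "$dir/sub"
echo "has SomeWord inside" > "$dir/a.log"
echo "nothing relevant"    > "$dir/b.log"
echo "someword again"      > "$dir/sub/c.log"

# 1. One grep process per file: find runs grep once for every file it visits.
per_file=$(find "$dir" -type f -exec grep -li 'someword' {} \; | sort)

# 2. Batched: '+' makes find pass many file names to each grep invocation.
batched=$(find "$dir" -type f -exec grep -li 'someword' {} + | sort)

# 3. xargs: also batches file names; the NUL separators keep odd names intact.
via_xargs=$(find "$dir" -type f -print0 | xargs -0 grep -li 'someword' | sort)

# 4. Recursive grep: a single grep process walks the tree itself.
recursive=$(grep -rli 'someword' "$dir" | sort)

echo "$recursive"
rm -rf "$dir"
```

All four should print the same two paths. On a tree with thousands of files, the first variant pays the process start-up cost thousands of times, which is where the slowdown comes from.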

All that said, the -P option is completely new to me and I find it interesting that it's so much faster. Did you do any further research to understand why this is so?

good catch.

Good catch! Yes, the first word was meant to be "find". Corrected, thanks.

Yes, of course, I was just teasing in the P.S. :)

My assumption as to why -P mode is so much faster is that they probably wrote the default mode first, with some brute-force algorithm, then somebody added the Perl mode with a much better algorithm, and nobody thought to compare the two?
