Unexpected: timing a grep for all patterns
Sep. 6th, 2022 08:51 am
Recently I needed to grep an input stream for lines that match multiple patterns: not just any one of them (modern grep implementations can do that on their own), but all of them. So, an AND instead of an OR. Tired of building pipelines of multiple greps on the command line, I automated that with a simple shell script.
#!/bin/sh
# grep stdin for lines that match all arguments
arg="$1"
shift
if [ "$*" ]; then
    # more patterns left: filter on this one, then recurse on the rest
    grep "$arg" | "$0" "$@"
else
    grep "$arg"
fi
This worked fine. But after a while I thought, why not do it right, and what language would be better for that than Perl?
#!/usr/bin/perl
# grep stdin for lines that match all arguments
use warnings;
use strict;

LINE: while (defined($_ = <STDIN>)) {
    for my $pat (@ARGV) {
        next LINE unless /$pat/;
    }
    print;
}
I have been mostly off Perl for a while now, doing things in Python instead. We all know Python is slower than Perl, and Perl is optimised for regular expression matching in particular. But I was curious how much Python would lag behind, so I wrote that thing in Python, too.
#!/usr/bin/env python3
# grep stdin for lines that match all arguments
import re
import sys

def match_line(line):
    for pat in sys.argv[1:]:
        if not re.search(pat, line):
            return False
    return True

for line in sys.stdin:
    if match_line(line):
        print(line, end="")
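As an aside, one variant worth trying is to compile the patterns once up front instead of handing the raw strings to `re.search` for every line. This is only a sketch and was not part of the measurements below; the helper names `compile_patterns`, `matches_all` and `main` are made up for illustration, and unlike the script above it does nothing when called without patterns. Since the `re` module keeps a cache of recently compiled patterns, the gain is probably modest.

```python
#!/usr/bin/env python3
# sketch: grep stdin for lines that match all arguments, patterns compiled once
import re
import sys


def compile_patterns(patterns):
    """Compile each pattern string once (hypothetical helper)."""
    return [re.compile(pat) for pat in patterns]


def matches_all(line, compiled):
    """Return True only if every compiled pattern is found in the line."""
    return all(rx.search(line) for rx in compiled)


def main(argv, infile, outfile):
    """Copy the lines of infile that match all patterns in argv[1:] to outfile."""
    compiled = compile_patterns(argv[1:])
    for line in infile:
        if matches_all(line, compiled):
            outfile.write(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # assumption: only act when at least one pattern was given
    main(sys.argv, sys.stdin, sys.stdout)
```

Whether this changes the ranking would have to be measured, of course.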
So, now, time to do the measurements, starting with Perl, which I assumed to be the fastest.
$ time all-grep.pl a e i o u < /usr/share/dict/words > /dev/null
0.494 usr 0.008 sys 0m0.50s total 99.33 %
Half a second, well, yeah. So, now in Python.
$ time all-grep.py a e i o u < /usr/share/dict/words > /dev/null
0.209 usr 0.006 sys 0m0.21s total 98.85 %
Wot? This is faster, and not even by a small margin? Well, now in shell. This is bound to be the slowest, needing to create all those processes.
$ time all-grep.sh a e i o u < /usr/share/dict/words > /dev/null
0.069 usr 0.010 sys 0m0.03s total 100.00 %
Now, that was unexpected: shell is the fastest, much faster than Python, and Perl actually comes last. So this is it:
M1 Pro | user (s) | runtime (s) |
---|---|---|
Shell | 0.069 | 0.03 |
Python | 0.209 | 0.21 |
Perl | 0.494 | 0.50 |
This is not a peculiarity of the platform (macOS 12.5.1 on a MacBook with an M1 Pro). It is the same, even a bit more pronounced, on an Intel Xeon E2-1220 running Ubuntu 20.04:
Xeon | user (s) | runtime (s) |
---|---|---|
Shell | 0.013 | 0.00 |
Python | 0.146 | 0.15 |
Perl | 0.254 | 0.25 |
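To put "a bit more pronounced" into numbers, the user times from the two tables can be expressed as multiples of the shell version's user time. The sketch below uses only the figures reported above; the runtime column is left out because it rounds to 0.00 for the shell on the Xeon. The name `ratios_to_shell` is made up for illustration.

```python
# user times in seconds, copied from the two tables above
user_times = {
    "M1 Pro": {"Shell": 0.069, "Python": 0.209, "Perl": 0.494},
    "Xeon": {"Shell": 0.013, "Python": 0.146, "Perl": 0.254},
}


def ratios_to_shell(times):
    """Express each user time as a multiple of the shell version's user time."""
    base = times["Shell"]
    return {name: round(t / base, 1) for name, t in times.items()}


for machine, times in user_times.items():
    print(machine, ratios_to_shell(times))
```

By this measure the Python version needs roughly 3 times and the Perl version roughly 7 times the shell version's user time on the M1 Pro, versus roughly 11 and 20 times on the Xeon.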
Well, then. Without trying to explain these results, I guess it is prudent not to assume speed or efficiency without measuring it first.
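In that spirit, a single `time` run can be backed up by repeating the measurement a few times. The following is a minimal sketch, not what was used for the numbers above; the function name `time_command`, the default repeat count and the choice of reporting the fastest run are assumptions for illustration. Note that it measures wall-clock time including process startup, not user CPU time.

```python
#!/usr/bin/env python3
# sketch: run a command several times and report the fastest wall-clock time
import subprocess
import sys
import time


def time_command(cmd, input_path=None, repeats=5):
    """Return the shortest wall-clock time in seconds over `repeats` runs.

    cmd is an argument list such as ["all-grep.py", "a", "e"]; if input_path
    is given, that file is connected to the command's stdin.
    """
    best = float("inf")
    for _ in range(repeats):
        stdin = open(input_path, "rb") if input_path else subprocess.DEVNULL
        try:
            start = time.perf_counter()
            # output is discarded, like the "> /dev/null" in the runs above
            subprocess.run(cmd, stdin=stdin, stdout=subprocess.DEVNULL)
            elapsed = time.perf_counter() - start
        finally:
            if input_path:
                stdin.close()
        best = min(best, elapsed)
    return best


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: bench.py INPUT_FILE COMMAND [ARGS...]
    print(f"{time_command(sys.argv[2:], sys.argv[1]):.3f} s")
```

Saved under a hypothetical name like `bench.py`, it could be called as `bench.py /usr/share/dict/words all-grep.pl a e i o u`, provided the scripts are on the PATH as in the runs above.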