jyrgenn: Blurred head shot from 2007 (Default)
[personal profile] jyrgenn

Recently I needed to grep an input stream for lines that match multiple patterns: not just any one of them (modern grep implementations can do that on their own), but all of them. So, an AND instead of an OR. Tired of building pipelines of multiple greps on the command line, I automated that with a simple shell script.
#!/bin/sh
# grep stdin for lines that match all arguments

arg="$1"
shift

# More patterns left: filter on this one, then recurse on the rest.
if [ $# -gt 0 ]; then
    grep "$arg" | "$0" "$@"
else
    grep "$arg"
fi

This worked fine. But after a while I thought, why not do it right, and what language would be better for that than Perl?
#!/usr/bin/perl
# grep stdin for lines that match all arguments

use warnings;
use strict;

LINE: while (defined($_ = <STDIN>)) {
        for my $pat (@ARGV) {
                next LINE unless /$pat/;
        }
        print;
}

I have been mostly off Perl for a while now, doing things in Python. We all know Python is slower than Perl, and Perl is optimised for regular expression matching in particular. But I was curious how far Python would lag behind, so I wrote that thing in Python, too.
#!/usr/bin/env python3
# grep stdin for lines that match all arguments

import re
import sys


def match_line(line):
    for pat in sys.argv[1:]:
        if not re.search(pat, line):
            return False
    return True


for line in sys.stdin:
    if match_line(line):
        print(line, end="")
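
For comparison, here is a variant that compiles each pattern once up front instead of handing the pattern string to re.search for every line. This is only a sketch and was not part of the measurements below; the function names compile_patterns and filter_lines are made up for illustration. Python's re module keeps an internal cache of compiled patterns, so the difference may well be small.

```python
# grep an iterable of lines for those that match all patterns
# (precompiled variant; helper names are hypothetical)

import re


def compile_patterns(patterns):
    # Compile each pattern once, before any input is read.
    return [re.compile(pat) for pat in patterns]


def filter_lines(lines, regexes):
    # Yield only the lines in which every regex finds a match.
    for line in lines:
        if all(rx.search(line) for rx in regexes):
            yield line
```

To use it like the script above, loop over filter_lines(sys.stdin, compile_patterns(sys.argv[1:])) and print each line with end="".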

So, now, time to do the measurements, starting with Perl, which I assumed to be the fastest.
$ time all-grep.pl a e i o u < /usr/share/dict/words > /dev/null

0.494 usr 0.008 sys 0m0.50s total 99.33 %

Half a second, well, yeah. So, now in Python.
$ time all-grep.py a e i o u < /usr/share/dict/words > /dev/null

0.209 usr 0.006 sys 0m0.21s total 98.85 %

Wot? This is faster, and not even by a small margin? Well, now in shell. This is bound to be the slowest, needing to create all those processes.
$ time all-grep.sh a e i o u < /usr/share/dict/words > /dev/null

0.069 usr 0.010 sys 0m0.03s total 100.00 %

Now, that was unexpected: shell is the fastest, much faster than Python, and Perl actually comes last. So this is it:

M1 Pro    user (s)   runtime (s)
Shell     0.069      0.03
Python    0.209      0.21
Perl      0.494      0.50


This is not a peculiarity of the platform, macOS 12.5.1 on a MacBook M1 Pro. It is the same, even a bit more pronounced, on an Intel Xeon E2-1220 running Ubuntu 20.04:

Xeon      user (s)   runtime (s)
Shell     0.013      0.00
Python    0.146      0.15
Perl      0.254      0.25


Well, then. Without trying to explain these results, I guess it is prudent not to assume speed or efficiency without measuring it first.
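
If you want to repeat such a measurement from a script rather than with the shell's time, something like the following could serve as a starting point. It is a minimal sketch, not what I used above: the function name time_command is made up, and it measures wall-clock time including the cost of feeding the input through a pipe, not the user CPU time shown in the tables.

```python
import subprocess
import time


def time_command(argv, input_bytes, repeats=5):
    # Run argv `repeats` times with input_bytes on stdin and return
    # the shortest wall-clock time in seconds. The exit status is
    # ignored on purpose: grep exits with 1 when nothing matches.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, input=input_bytes, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best
```

With the contents of /usr/share/dict/words read into a bytes object named data, a call like time_command(["all-grep.py", "a", "e", "i", "o", "u"], data) would time one of the scripts, assuming it is on the PATH.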


September 2022
