Greg Hewgill (ghewgill) wrote,

premature optimization

For (literally) years, I've been pulling monthly web server logs from my server with a simple script that I run by hand. I have been using the scp -C option, which is supposed to compress the data before sending it across the wire. I'm sure it does so, but it sure doesn't seem very effective. Have a look at these timings:

$ time scp -C occam.hewgill.net:logs/hewgill.com/access_log access_log
     4882.68 real        29.09 user        20.99 sys
$ time ssh occam.hewgill.net 'bzip2 -c logs/hewgill.com/access_log' >/dev/null
     859.47 real         0.50 user         0.22 sys
$ time ssh occam.hewgill.net 'gzip -c logs/hewgill.com/access_log' >/dev/null
     108.90 real         0.67 user         0.33 sys

These numbers are all counterintuitive. I would have expected the scp -C option to perform better (the man page says it uses the same algorithm as gzip, so why does it do so badly here?). I would also have expected bzip2 to be faster overall than gzip, because there would be less data to transfer. In reality, the server couldn't compress the data as fast as the link could carry it, so compression became the bottleneck.
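One way to see which end of the pipe is the bottleneck is to time the compressors on their own, with no network involved. The sketch below is only an illustration and not the script used for this post. It uses Python's standard gzip and bz2 modules and assumes the data fits in memory. The function name measure is made up for the example. gzip is run at level 6 to mirror the gzip command's default, and bz2 at its default level 9, which matches the bzip2 command. Python's zlib-based gzip is not byte-for-byte the gzip binary, so expect ballpark figures only.

```python
import bz2
import gzip
import time


def measure(data: bytes) -> dict:
    """Compress `data` in memory with gzip and bzip2; report size and CPU time.

    Only the compression side is measured. Comparing input_mb_per_s with the
    link speed shows whether the compressor or the network will be the limit.
    """
    compressors = (
        ("gzip", lambda d: gzip.compress(d, compresslevel=6)),   # gzip CLI default
        ("bzip2", lambda d: bz2.compress(d, compresslevel=9)),   # bzip2 CLI default
    )
    results = {}
    for name, compress in compressors:
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {
            "bytes": len(out),
            "ratio": len(data) / len(out),
            "seconds": elapsed,
            # Rate at which uncompressed input is consumed, in MB/s.
            "input_mb_per_s": (len(data) / 1e6 / elapsed) if elapsed > 0 else float("inf"),
        }
    return results
```

For example, `measure(open("access_log", "rb").read())` on a copy of the log returns the output size, compression ratio, and throughput for each tool. A compressor whose throughput is below the link speed will slow the transfer down no matter how small its output is.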

This month's access_log file is just over 250 MB. The gzip-compressed version is 6.2 MB and the bzip2-compressed version is 3.7 MB. It doesn't seem to pay to wait nearly an order of magnitude longer for output that isn't even half the size of gzip's.
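The trade-off in that paragraph can be checked with a few divisions using the figures reported above. This is a back-of-the-envelope sketch only. "Just over 250 MB" is rounded down to 250 here, so the compression ratios are slight underestimates, and the timings are the wall-clock ("real") values from the runs shown earlier.

```python
# Figures as reported in this post: sizes in MB, times in wall-clock seconds.
# LOG_MB is an assumption: "just over 250 MB" rounded down to 250.
LOG_MB = 250.0
GZIP_MB, BZIP2_MB = 6.2, 3.7
GZIP_S, BZIP2_S, SCP_S = 108.90, 859.47, 4882.68


def summarize() -> dict:
    return {
        "gzip_ratio": LOG_MB / GZIP_MB,         # times smaller gzip made the log
        "bzip2_ratio": LOG_MB / BZIP2_MB,       # times smaller bzip2 made the log
        "bzip2_size_gain": GZIP_MB / BZIP2_MB,  # gzip output size / bzip2 output size
        "bzip2_time_cost": BZIP2_S / GZIP_S,    # bzip2 run time / gzip run time
        "scp_vs_gzip_time": SCP_S / GZIP_S,     # scp -C run time / gzip run time
    }


for key, value in summarize().items():
    print(f"{key:18} {value:6.1f}")
```

Rounded to one decimal, bzip2's output is about 1.7 times smaller than gzip's, but the run took about 7.9 times as long. The scp -C run took roughly 45 times as long as the gzip pipeline.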

This exercise has just reinforced the point that it pays to actually measure performance in different situations, instead of going with what "feels" right. Premature optimization is the root of all evil [Hoare].
