Back to Squid performance questions (LONG)

From: Clifton Royston <cliftonr@dont-contact.us>
Date: Mon, 21 Jun 1999 12:38:57 -1000 (HST)

  I've been following the performance thread with a lot of interest.
Can someone with enough experience tell me whether the following
nuggets I've gleaned from it are roughly correct?

1) A properly configured Pentium II-based *BSD box (single or
   dual-processor, 350+MHz CPU) running Squid with moderate tuning can
   reasonably be expected to put out 10 Mbit/s or 100 requests/sec
   total "down-stream" traffic. Over 12 Mbit/s or 120 requests/sec is
   probably not sustainable on a single Squid server, regardless of
   processor speed, RAM, or disk speed, so for over 10 Mbit/s one should
   start clustering multiple servers.

2) The Squid box should be running no other applications, except its
   own caching nameserver to reduce DNS lookup overhead, and the
   minimum required for system maintenance (sshd or telnetd, cron,
   etc.).

3) Up to a certain (unknown?) point of diminishing returns, maximizing
   the real RAM available to Squid for caching will be the most
   effective performance improvement. The cache RAM setting should be
   kept below 1/3 * (real RAM - (OS and other RAM needs)) - e.g. on a
   512 MB machine the Squid cache parameter should be set to no more
   than about 160 MB (160 * 3 = 480 MB, leaving 32 MB for the OS,
   etc.). A back-of-the-envelope sketch of this follows the list.

4) Correct use of any available OS file system tuning options is going
   to be the next most important factor in maximizing throughput. This
   would include enabling "softupdates" or any similar fast file system
   option available under your OS, setting the noatime/noaccesstime
   option on the file systems used for the cache spool, setting an
   "optimize for time" parameter in tunefs if available, and increasing
   the number of directory inodes cached in RAM.

5) A related factor is the performance gain from spreading file
   accesses, and hence seek times, across multiple disk spindles,
   i.e. spreading the cache across many drives, maybe even across
   multiple SCSI controllers. In other words, achieved cache
   performance will be significantly greater (maybe nearly doubled)
   with six 9 GB drives than with three 18 GB drives.

6) Finally, peak performance with a standard UNIX file system
   counter-intuitively requires the drives to be kept permanently
   partly empty. Given a certain size of drive or file system, the
   system will actually perform better if Squid is told to use at
   most 50% of that space rather than 80-90%, because the lower
   access times on the half-empty file system will greatly outweigh
   the slightly higher hit rate the fuller file system would give.
   (Optimum percent full = unknown?)
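
To make the arithmetic in point 3 concrete, here is the sort of
back-of-the-envelope sketch I have in mind. The 1/3 ratio and the
512 MB example just restate the rule of thumb above; the variable
names are mine, and cache_mem is the squid.conf directive I assume
the rule is aimed at:

    #!/usr/bin/env python
    # Rule-of-thumb sizing for Squid's memory cache (point 3 above).
    # Nothing here is measured; it only restates the 1/3 heuristic.

    real_ram_mb     = 512   # total physical RAM in the box
    os_and_other_mb = 32    # reserved for the OS, sshd, cron, etc.
    spare_mb        = real_ram_mb - os_and_other_mb

    # Squid's total footprint (index, in-transit objects, etc.) tends
    # to run to several times cache_mem, hence the divide-by-three.
    cache_mem_mb = spare_mb // 3

    print("suggested cache_mem: %d MB" % cache_mem_mb)   # -> 160 MB

Presumably the leftover RAM isn't wasted - it ends up holding Squid's
per-object index and the OS's own buffer cache, which is the point of
leaving the margin.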
    
Have I got this right? In general the performance issues sound very
similar to the issues in tuning INN news servers for maximum
throughput.

This brings up a few additional "tweak" questions:

Rules of thumb for directory hashing:

  If you're dedicating a series of 9 GB drives to cache, how many
  top-level directories should each be broken into? Is it better to
  just go for 256 each, to minimize the size of the leaf directories,
  or is some smaller number optimal? Is there any advantage (as there
  is in some hashing schemes) to using a prime number of hash buckets
  (directories), or to using or avoiding powers of two?
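
  For what it's worth, here is the sort of estimate I've been making
  of how full the leaf directories get. The 13 KB mean object size
  and the 50% fill level are pure assumptions plugged in for
  illustration, and 16/256 are just the usual first/second-level
  defaults:

      #!/usr/bin/env python
      # Rough estimate of objects per leaf (second-level) directory in
      # a Squid cache_dir, assuming objects spread evenly over L1 * L2
      # directories.  Object size and fill level are guesses, not data.

      drive_gb       = 9      # one dedicated 9 GB spool drive
      fill_fraction  = 0.5    # per point 6, only fill it half way
      mean_object_kb = 13     # assumed average size of a cached object

      l1 = 16                 # first-level directories  (cache_dir L1)
      l2 = 256                # second-level directories (cache_dir L2)

      objects  = drive_gb * 1024 * 1024 * fill_fraction / mean_object_kb
      per_leaf = objects / (l1 * l2)

      print("objects on the drive: ~%d" % objects)    # -> ~362000
      print("objects per leaf dir: ~%d" % per_leaf)   # -> ~88

  With those (assumed) numbers even the default split leaves well
  under a hundred objects per leaf directory, which makes me suspect
  the exact top-level count matters less than keeping the leaves
  small - but that's exactly the kind of rule of thumb I'm hoping
  someone can confirm or shoot down.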

Performance drop-off with using RAID 5 vs. standalone disks:

  Depending on the access patterns of particular applications, using a
  RAID system can lead to anything from a sharp increase in performance
  (due to the striping spreading sectors across drives), to a slight
  fall-off, to a sharp decline. However, the big benefit IMHO is that
  a failed disk can't take the SCSI bus down and hence can't take down
  the server. (The protection against loss of data obviously isn't
  very important for caching.) Normally with apps like Squid which do
  their own hashing to distribute workload across disks, the result is
  some decline in performance along with the increase in cost.

  If it's a slight performance decline, I'll take that in exchange for
  the reliability - as I do on our main news server - but if it's a
  huge decline, it might be cheaper to get the reliability by deploying
  multiple Squid servers with non-RAID disk systems. Anyone have any
  perspective on this?

  -- Clifton

-- 
 Clifton Royston  --  LavaNet Systems Architect --  cliftonr@lava.net
        "An absolute monarch would be absolutely wise and good.  
           But no man is strong enough to have no interest.  
             Therefore the best king would be Pure Chance.  
              It is Pure Chance that rules the Universe; 
          therefore, and only therefore, life is good." - AC