On Wed, 10 Dec 1997, Anthony Green wrote:
> Look at squids store log, and produce output on how much data in megabytes
> squid is releasing from the disk cache over a specified time period
Please find a Perl script attached. The script accepts store.log on the
standard input and measures the total swap* and release traffic along with
corresponding mean object sizes. If you want to measure release activity over
a week, just pipe all store.log* for a week through the script.
$ gunzip -c logs/nlanr/sv.store.19971020.log.gz | store.traffic.meter.pl
swap-in: count: 438224 ( 38.24 %) MBs: 3815 ( 74.11 %) Mean: 8.92 KB
swap-out: count: 396603 ( 34.61 %) MBs: 5148 ( 100.00 %) Mean: 13.29 KB
release: count: 311141 ( 27.15 %) MBs: 2264 ( 43.98 %) Mean: 7.45 KB
other: count: 0 ( 0.00 %) MBs: 0 ( 0.00 %) Mean: 0.00 KB
total: count: 1145968 ( 100.00 %) MBs: 11228 ( 218.08 %) Mean: 10.03 KB
See the comments in the script on why it shows 218.08% for total traffic
volume. Do not trust the means too much; medians would differ quite a bit. Many
release entries have zero-length objects, which may skew the statistics.
Also, I have just put this script together, so it has not been tested
intensively or optimized for speed...
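For anyone who wants medians instead of means, here is a tiny helper sketch. It reuses the same size-extraction regex as the meter script below; everything else (field layout, the Median name) is my own assumption, not anything Squid defines.

```perl
#!/usr/local/bin/perl -w
# Sketch: median RELEASE object size from store.log on stdin.
# Assumes the same size field layout as the meter script below;
# zero-length objects are counted, so they pull the median down too.
use strict;

sub Median {
    my @sorted = sort { $a <=> $b } @_;
    return undef unless @sorted;
    return $sorted[int(@sorted/2)];   # upper median for even counts
}

my @sizes;
while (<STDIN>) {
    next unless / RELEASE /;
    my ($size) = (m|\s\d+/(\d+)\s|);
    push @sizes, $size if defined $size;
}
if (@sizes) {
    printf("release median: %.2f KB (%d entries)\n",
        Median(@sizes)/1024., scalar @sizes);
} else {
    print "no RELEASE entries found\n";
}
```

Pipe store.log* through it the same way as the meter script.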
> I would like to somehow do this ...
> because with this information you can see if when the proxy cache is full and doing
> LRU replacement how much data it is actually discarding, and therefore judge if
> you need to add more disk cache for a busy proxy.
Let gurus on the list correct me if I am wrong, but I would not rely much on
the RELEASE traffic to estimate "optimal" cache size. If you look through the
logs, objects are often released for reasons _other_ than LRU replacement
(e.g., updates and "reloads"). And even if you filter out those "exceptions"
with a smart script, you are unlikely to guess how many hits you have lost
because those objects were purged from the cache!
Estimating the "best" cache size is very tricky, IMHO. After a certain
[relatively small] threshold, cache "utilization" does not increase with the
cache size. That is, you are getting fewer and fewer hits per GB you add.
Nevertheless, people continue to increase the size of their caches because,
they say, "it will pay off in the long run". In other words, you are buying disk
space once, but getting hits from it every day. Thus, to find the optimum
size you have to estimate the benefits you are getting from a single hit and
then calculate how much time it will take to pay for added disk capacity.
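As a back-of-the-envelope illustration of that payback calculation, here is a sketch. Every constant in it (disk cost, hits per added GB, savings per MB) is a made-up assumption you would replace with your own figures; only the roughly 13 KB mean object size is taken from the meter output above.

```perl
#!/usr/local/bin/perl -w
# Back-of-the-envelope payback estimate for added cache disk.
# All constants are made-up assumptions; plug in your own numbers.
use strict;

my $DiskCostPerGB   = 100.0;  # $ per GB of added cache space (assumed)
my $HitsPerGBPerDay = 50;     # extra hits/day one more GB buys you (assumed)
my $MeanHitSizeKB   = 13.0;   # roughly the swap-out mean from the meter output
my $SavingsPerMB    = 0.05;   # $ saved per MB not fetched upstream (assumed)

my $savedPerDay = $HitsPerGBPerDay * $MeanHitSizeKB/1024.0 * $SavingsPerMB;
printf("savings: %.4f \$/day/GB; payback: %.0f days per added GB\n",
    $savedPerDay, $DiskCostPerGB / $savedPerDay);
```

With these particular numbers the added GB never pays for itself within the life of the disk, which is exactly the kind of conclusion the calculation is meant to expose.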
To estimate how many hits a given cache size generates, you probably need a
trace-driven program that will simulate LRU replacement and other things for
a given cache size (unless you know somebody who already maintains such a
cache in a similar environment). Any better ideas?
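To make the idea concrete, here is a minimal sketch of such a trace-driven simulator. The SimulateLRU routine, the [url, size] trace format, and the toy numbers are all my inventions, not anything Squid provides; a real run would replay URLs and sizes parsed out of your access.log or store.log, once per candidate cache size.

```perl
#!/usr/local/bin/perl -w
# Sketch: trace-driven LRU cache simulation for a given cache size.
# SimulateLRU and the [url, size] trace format are hypothetical;
# replay real URLs and object sizes from your own logs.
use strict;

sub SimulateLRU {
    my ($capacity, @trace) = @_;    # capacity in bytes; trace of [url, size]
    my %size;                       # url -> cached object size
    my @lru;                        # urls, least-recently-used first
    my $used = 0;                   # bytes currently cached
    my ($hits, $misses) = (0, 0);
    for my $req (@trace) {
        my ($url, $objSize) = @$req;
        if (exists $size{$url}) {              # hit: refresh recency
            $hits++;
            @lru = grep { $_ ne $url } @lru;   # O(n) move-to-tail; fine for a sketch
            push @lru, $url;
            next;
        }
        $misses++;
        next if $objSize > $capacity;          # too big to cache at all
        while ($used + $objSize > $capacity) { # evict LRU objects until it fits
            my $victim = shift @lru;
            $used -= delete $size{$victim};
        }
        $size{$url} = $objSize;
        $used += $objSize;
        push @lru, $url;
    }
    return ($hits, $misses);
}

# toy trace: three 40-byte objects requested twice, round-robin
my @trace = map { [ "u$_", 40 ] } (1, 2, 3, 1, 2, 3);
my ($hits, $misses) = SimulateLRU(130, @trace);
printf("capacity 130: %d hits, %d misses\n", $hits, $misses);  # 3 hits, 3 misses
```

Running the same trace through a range of capacities gives you the hits-per-added-GB curve discussed above; note that a cyclic trace through a cache just slightly too small for its working set yields zero hits, which is the classic LRU worst case.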
Alex.
#!/usr/local/bin/perl -w
use strict;
no integer;
#
# store traffic meter
# Alex Rousskov (rousskov@plains.nodak.edu)
#
# (count, volume) pairs
my @SwapInMeter = (0) x 2;
my @SwapOutMeter = (0) x 2;
my @ReleaseMeter = (0) x 2;
my @OtherMeter = (0) x 2;
# collect statistics
while (<STDIN>) {
    my ($size) = (m|\s\d+/(\d+)\s|);
    $size = 0 unless defined $size;   # guard against entries with no size field
    if (/ SWAPIN /) { &NoteAction(\@SwapInMeter, $size); }
    elsif (/ SWAPOUT /) { &NoteAction(\@SwapOutMeter, $size); }
    elsif (/ RELEASE /) { &NoteAction(\@ReleaseMeter, $size); }
    else { &NoteAction(\@OtherMeter, $size); }
}
# get totals
my @TotalMeter = (
    $ReleaseMeter[0]+$SwapInMeter[0]+$SwapOutMeter[0]+$OtherMeter[0],
    $ReleaseMeter[1]+$SwapInMeter[1]+$SwapOutMeter[1]+$OtherMeter[1]);
# Note: volume percentages are relative to what was written into the cache
# (swap-out), which is why the "total" row exceeds 100%; change
# $SwapOutMeter[1] to $TotalMeter[1] below if you think that makes more sense
my ($TotalCount, $TotalVolume) = ( $TotalMeter[0], $SwapOutMeter[1] );
# report results
&Report('swap-in', \@SwapInMeter);
&Report('swap-out', \@SwapOutMeter);
&Report('release', \@ReleaseMeter);
&Report('other', \@OtherMeter);
&Report('total', \@TotalMeter);
#exit
exit($TotalMeter[0] == 0);
#
# handy routines
#
sub NoteAction {
    my ($meter, $objSize) = @_;
    $meter->[0]++;
    $meter->[1] += $objSize;
}
# Note: "means" are bad for skewed data ("medians" are much better)
sub Report {
    my ($label, $meter) = @_;
    printf("%-10s count: %10d ( %6.2f %%) MBs: %7d ( %6.2f %%) Mean: %5.2f KB\n",
        "$label:",
        $meter->[0], &Percent($meter->[0], $TotalCount),
        $meter->[1]/(1024*1024), &Percent($meter->[1], $TotalVolume),
        $meter->[1] / ($meter->[0] || 1) / 1024.);
}
sub Percent {
    my ($part, $whole) = @_;
    return -1 unless $whole;
    return 100.*$part/$whole;
}
Received on Wed Dec 10 1997 - 08:58:44 MST