Re: Re: [squid-users] Squid high bandwidth IO issue (ramdisk SSD)

From: smaugadi <adi_at_binat.net.il>
Date: Tue, 4 Aug 2009 00:35:06 -0700 (PDT)

Dear Adrian,
I will try an alternative disk controller and report back with the results.
Regards,
Adi.

Adrian Chadd-3 wrote:
>
> How much disk IO is actually going on when the CPU shows 70% IOWAIT?
> That is far too much - the time spent in IOWAIT shouldn't be that high.
> I think you really should consider trying an alternative disk controller.
>
>
>
>
> adrian
>
> 2009/8/4 smaugadi <adi_at_binat.net.il>:
>>
>> Dear Adrian and Heinz,
>> Sorry for the delayed reply, and thanks for all the help so far.
>> I have tried changing the file system (ext2 and ext3) and changed the
>> partitioning geometry (fdisk -H 224 -S 56), as I read that this would
>> improve performance with SSDs.
>> I tried ufs, aufs and even coss (after downgrading to 2.6). (By the way,
>> the average object size is 13 KB.)
>> And it all failed!
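>>
>> (For the record, the variants I tried looked roughly like the following;
>> the mount point, cache size and L1/L2 values here are illustrative, not
>> the exact figures from my config:)
>>
>>   # repartition the SSD with the adjusted geometry, then e.g. ext2
>>   fdisk -H 224 -S 56 /dev/sdb
>>   mkfs.ext2 /dev/sdb1
>>
>>   # squid.conf cache_dir variants (one at a time)
>>   cache_dir ufs  /cache1 20000 16 256
>>   cache_dir aufs /cache1 20000 16 256
>>   # coss, after downgrading to squid-2.6
>>   cache_dir coss /cache1/coss 20000 max-size=65536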
>>
>> From system monitoring during the squid degradation I saw:
>>
>> /usr/local/bin/iostat -dk -x 1 1000 sdb
>>
>> Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz    await  svctm  %util
>> sdb        0.00    0.00  0.00  4.00   0.00  72.00    36.00   155.13 25209.75 250.25 100.10
>> sdb        0.00    0.00  0.00  4.00   0.00  16.00     8.00   151.50 26265.50 250.50 100.20
>> sdb        0.00    0.00  0.00  3.00   0.00  12.00     8.00   147.49 27211.33 333.33 100.00
>> sdb        0.00    0.00  0.00  4.00   0.00  32.00    16.00   144.54 28311.25 250.25 100.10
>> sdb        0.00    0.00  0.00  4.00   0.00 100.00    50.00   140.93 29410.25 250.25 100.10
>> sdb        0.00    0.00  0.00  4.00   0.00  36.00    18.00   137.00 30411.25 250.25 100.10
>> sdb        0.00    0.00  0.00  2.00   0.00   8.00     8.00   133.29 31252.50 500.50 100.10
>>
>> As soon as the service time (svctm) rises above 200 ms the problems start,
>> and the total time per request (time in queue plus service time, the await
>> column) climbs all the way to about 32 seconds.
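>>
>> (A rough sanity check on those numbers: with about 140 requests sitting in
>> the queue (avgqu-sz) and about 250 ms to service each one (svctm), a newly
>> queued request has to wait on the order of 140 x 0.25 s = 35 s, which is
>> the same order of magnitude as the 25-31 second await values above.)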
>>
>> This is from mpstat at the same time:
>>
>> 09:33:56 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>> 09:33:58 AM  all    3.00    0.00    2.25   84.02    0.12    2.75    0.00    7.87   9782.00
>> 09:33:58 AM    0    3.98    0.00    2.99   72.64    0.00    3.98    0.00   16.42   3971.00
>> 09:33:58 AM    1    2.01    0.00    1.01   80.40    0.00    1.51    0.00   15.08   1542.00
>> 09:33:58 AM    2    2.51    0.00    2.01   92.96    0.00    2.51    0.00    0.00   1763.50
>> 09:33:58 AM    3    3.02    0.00    3.02   90.95    0.00    3.02    0.00    0.00   2506.00
>>
>> 09:33:58 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>> 09:34:00 AM  all    0.50    0.00    0.25   74.12    0.00    0.62    0.00   24.50   3833.50
>> 09:34:00 AM    0    0.50    0.00    0.50    0.00    0.00    1.00    0.00   98.00   2015.00
>> 09:34:00 AM    1    0.50    0.00    0.00   98.51    0.00    1.00    0.00    0.00    544.50
>> 09:34:00 AM    2    0.50    0.00    0.00   99.50    0.00    0.00    0.00    0.00    507.00
>> 09:34:00 AM    3    0.50    0.00    0.00   99.00    0.00    0.50    0.00    0.00    766.50
>>
>> 09:34:00 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>> 09:34:02 AM  all    0.12    0.00    0.25   74.53    0.00    0.12    0.00   24.97   1751.50
>> 09:34:02 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1155.50
>> 09:34:02 AM    1    0.00    0.00    0.50   99.50    0.00    0.00    0.00    0.00    230.50
>> 09:34:02 AM    2    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    220.00
>> 09:34:02 AM    3    0.00    0.00    0.50   99.50    0.00    0.00    0.00    0.00    146.00
>>
>> 09:34:02 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
>> 09:34:04 AM  all    1.25    0.00    1.50   74.97    0.00    0.00    0.00   22.28   1607.50
>> 09:34:04 AM    0    5.47    0.00    5.47    0.00    0.00    0.00    0.00   89.05   1126.00
>> 09:34:04 AM    1    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    158.50
>> 09:34:04 AM    2    0.00    0.00    0.50   98.51    0.50    0.50    0.00    0.00    175.50
>> 09:34:04 AM    3    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    147.00
>>
>> Well, sometimes you eat the bear and sometimes the bear eats you.
>>
>> Do you have any more ideas?
>> Regards,
>> Adi.
>>
>>
>>
>>
>> Adrian Chadd-3 wrote:
>>>
>>> 2009/8/2 Heinz Diehl <htd_at_fancy-poultry.org>:
>>>
>>>> 1. Change cache_dir in squid from ufs to aufs.
>>>
>>> That is almost always a good idea for decent performance under any sort
>>> of concurrent load. I'd like to see proof otherwise - if someone finds a
>>> case where it isn't, it indicates something which should be fixed.
>>>
>>>> 2. Format /dev/sdb1 with "mkfs.xfs -f -l lazy-count=1,version=2
>>>>    -i attr=2 -d agcount=4"
>>>> 3. Mount it afterwards using
>>>>    "rw,noatime,logbsize=256k,logbufs=2,nobarrier" in fstab.
>>>
>>>> 4. Use cfq as the standard scheduler with the linux kernel
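>>>>
>>>> For example, an fstab line and scheduler setting along those lines
>>>> (the /cache mount point is just a placeholder; /dev/sdb1 is from
>>>> step 2):
>>>>
>>>>   /dev/sdb1  /cache  xfs  rw,noatime,logbsize=256k,logbufs=2,nobarrier  0 0
>>>>   echo cfq > /sys/block/sdb/queue/scheduler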
>>>
>>> Just out of curiosity, why these settings? Do you have any research
>>> which shows this?
>>>
>>>> (Btw: on my systems, squid-2.7 is noticeably _a lot_ slower than
>>>> squid-3, if the object is not in cache...)
>>>
>>> This is an interesting statement. I can't think of any specific reason
>>> why squid-2.7 should perform worse than Squid-3 in this instance. This
>>> is the kind of "works by magic" stuff which deserves investigation so
>>> the issue(s) can be fully understood. Otherwise you may find that a
>>> regression creeps into later Squid-3 versions because the issues weren't
>>> fully understood and documented, and some coder makes a change which
>>> they think won't have as much of an effect as it does. It has certainly
>>> happened before in Squid. :)
>>>
>>> So, "more information please."
>>>
>>>
>>>
>>> Adrian
>>>
>>>
>>
>>
>>
>
>

-- 
View this message in context: http://www.nabble.com/Squid-high-bandwidth-IO-issue-%28ramdisk-SSD%29-tp24775448p24803612.html
Sent from the Squid - Users mailing list archive at Nabble.com.
Received on Tue Aug 04 2009 - 07:35:17 MDT
