Re: [squid-users] Re: Cache Windows Updates ONLY

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 13 Apr 2014 20:56:15 +1200

On 13/04/2014 7:08 a.m., Nick Hill wrote:
> I have been ironing out issues with my windows updates set-up for
> Squid. I have been through my squid.conf file to de-cruft it.
>
> The following squid.conf should be self-documenting. I have found this
> works well in a multi-computer environment where you can expect a lot
> of Windows machines to perform updates. A computer shop is a good
> example. Of course, you will want to configure a DHCP server with a
> wpad.dat address so that your client machines will auto-configure to
> use your proxy.
>
> The principal difference between this and other configurations is that
> it will cache windows updates even where a query string operates on a
> cab, exe, or other non-dynamic response. I find the query string does
> not change the file contents. (I know - it is possible that it
> could...)
>
> The other feature is that Microsoft conveniently include SHA1 hashes
> in URLs for static content files. Often, these static content files
> will be found at differing locations, and will often be called with
> query strings! Web cache hell! This configuration represents the data
> internally to squid based purely on the SHA1 hash where available. If
> two content items really have a SHA1 match, then you can guarantee
> they are identical. Any successive file accesses from any of the
> windows update domains which match the general SHA1 pattern used in
> windows updates will generate a cache HIT, even where the URL is quite
> different, and irrespective of any cache-bashing query string.
>
> I will monitor the configurations over the next week. Empirically, so
> far, it all works!
> If anyone can see howlers, let me know. Thanks!
>
> #squid.conf file for Squid Cache: Version 3.4.4
> #compiled on Ubuntu with configure options: '--enable-async-io=8'
> '--enable-storeio=ufs,aufs,diskd' '--enable-removal-policies=lru,heap'
> #'--enable-delay-pools' '--enable-underscores' '--enable-icap-client'
> '--enable-follow-x-forwarded-for' '--with-logdir=/var/log/squid3'
> #'--with-pidfile=/var/run/squid3.pid' '--with-filedescriptors=65536'
> '--with-large-files' '--with-default-user=proxy'
> #'--enable-linux-netfilter' '--enable-storeid-rewrite-helpers=file'
>
> #Recommendations: in full production, you may want to set debug
> options from 2 to 1 or 0.
> #You may also want to comment out strip_query_terms off for user privacy
>
> #Explicitly define logs for my compiled version
> cache_store_log /var/log/squid3/store.log
> access_log /var/log/squid3/access.log
> cache_log /var/log/squid3/cache.log
>
> #Lets have a fair bit of debugging info
> debug_options ALL,2
> #Include query strings in logs
> strip_query_terms off
>
> acl all src all
> acl windowsupdate dstdomain .windowsupdate.microsoft.com
> acl windowsupdate dstdomain .c.microsoft.com
> acl windowsupdate dstdomain .ws.microsoft.com
> acl windowsupdate dstdomain .update.microsoft.com
> acl windowsupdate dstdomain images.metaservices.microsoft.com
> acl windowsupdate dstdomain .download.windowsupdate.com
> acl windowsupdate dstdomain wustat.windows.com
> acl windowsupdate dstdomain swcdn.apple.com
> acl windowsupdate dstdomain data-cdn.mbupdates.com
> acl QUERY urlpath_regex cgi-bin \?
>
> #I'm behind a NAT firewall, so I don't need to restrict access
> http_access allow all
>
> #Uncomment these if you have web apps on the local server which auth
> through local ip
> #acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
> #http_access deny to_localhost
>
> visible_hostname myclient.hostname.com
> http_port 3128
>
> #Always optimise bandwidth over hits
> cache_replacement_policy heap LFUDA
> #200Mb max object if not windowsupdate
> maximum_object_size 200000 KB
> #Set these according to your file system
> cache_dir ufs /home/smb/squid/squid 70000 16 256
> coredump_dir /home/smb/squid/squid
>
> refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
> refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
> refresh_pattern -i windows.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private

Did your tests find any actual benefit from these "override-lastmod
override-expire ignore-reload ignore-must-revalidate ignore-private"
settings?

My tests earlier showed the reload-into-ims option was all that was
needed to make update caching behave nicely. It is also the only one of
those options which produces RFC compliant behaviour by the proxy.

> #Default refresh patterns last if no others match
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern . 0 20% 4320
>
> #Directive sets I have been experimenting with
> #override-lastmod override-expire ignore-reload ignore-must-revalidate
> ignore-private
> #reload-into-ims
>
> #Windows updates use a lot of range requests. The only way to deal with this
> #in Squid is to fetch the whole file as soon as requested
> range_offset_limit -1 windowsupdate
> quick_abort_min -1 KB windowsupdate
>
> #Windows update files are HUGE! I have set this to 6Gb.
> #A recent (as of Apr 2014) windows 8 update file is 4Gb
> maximum_object_size 6000000 KB windowsupdate

NP: Squid understands byte units whenever you see "KB" being used in config.

So:
 maximum_object_size 200 MB
 maximum_object_size 6 GB

Which is the first "howler". That directive does not take an access
list, and only the last value set matters. So adding " windowsupdate" to
the 6GB line and setting the 200MB value are both just useless text in
the config file.
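
To make the unit handling concrete, here is a small Python sketch (not
Squid code) of the arithmetic. It assumes the KB/MB/GB suffixes are binary
multiples (1 KB = 1024 bytes), which is my understanding of the parser;
to_bytes and effective_limit are made-up names for illustration. It also
models the "last value wins" behaviour described above.

```python
# Illustrative only: models how byte-unit suffixes and repeated
# maximum_object_size lines combine.
# Assumption: units are binary multiples (1 KB = 1024 bytes).
UNITS = {"bytes": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30}


def to_bytes(value, unit):
    """Convert a '<value> <unit>' pair from the config into bytes."""
    return value * UNITS[unit]


def effective_limit(settings):
    """Return the limit in force when the directive is set several times.

    Only the last value matters, so earlier entries are simply ignored.
    """
    return to_bytes(*settings[-1])


if __name__ == "__main__":
    # The two values from the posted config, in the order they appear.
    posted = [(200000, "KB"), (6000000, "KB")]
    print(effective_limit(posted))  # limit actually applied to all objects
    print(to_bytes(6, "GB"))        # what "6 GB" would mean instead
```

Under that assumption 6000000 KB comes out a little under 6 GB, so the two
spellings are close but not byte-for-byte identical.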

>
> #My internet connection is not just used for Squid. I want to leave
> #responsive bandwidth for other services. This limits D/L speed
> delay_pools 1
> delay_class 1 1
> delay_access 1 allow all
> delay_parameters 1 1200000/1200000

It is better to use QoS controls in the system network settings that
limit Squid (usually by PID number) than applying a class-1 delay pool
to everything.

>
> #We use the store_id helper to convert windows update file hashes to bare URLs.
> #This way, any fetch for a given hash embedded in the URL will deliver
> #the same data
> #You must make your own /etc/squid3/storeid_rewrite; instructions at end.
> #Change the helper program location from
> #/usr/local/squid/libexec/storeid_file_rewrite to wherever yours is
> #It is written in Perl, so on most Linux systems, put it somewhere
> #convenient, chmod 755 filename
> store_id_program /usr/local/squid/libexec/storeid_file_rewrite /etc/squid3/storeid_rewrite
> store_id_children 10 startup=5 idle=3 concurrency=0
> store_id_access allow windowsupdate
> store_id_access deny all
>

concurrency=0 is bad, although I see this is due to a lack of
concurrency support in the helper. That's a bug which should get fixed.
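
For anyone wanting to have a go at that, here is a rough Python sketch of
what a concurrency-aware Store-ID helper loop could look like. With
concurrency above 0, Squid prefixes each request line with a channel ID and
the helper must echo that ID at the start of its reply. The single rewrite
rule and the wupdate.squid.local target are borrowed from the config above;
the function names are invented and this is not the shipped helper.

```python
#!/usr/bin/env python3
import re
import sys

# Hypothetical single rule, modelled on the windowsupdate.com pattern
# from the posted config (40-character hash before the file extension).
RULE = re.compile(r"^http://[^/]+\.windowsupdate\.com/.+_([0-9a-z]{40})\.\w+")


def store_id_for(url):
    """Map a URL to its hash-based store ID, or None if no rule matches."""
    m = RULE.match(url)
    return "http://wupdate.squid.local/" + m.group(1) if m else None


def handle_line(line):
    """Answer one request line of the form '<channel-ID> <URL> [extras...]'."""
    parts = line.split()
    if len(parts) < 2:
        # Malformed input: echo whatever ID we got and report an error.
        return " ".join(parts[:1] + ["ERR"])
    channel, url = parts[0], parts[1]
    store_id = store_id_for(url)
    if store_id is None:
        return channel + " ERR"
    return "{} OK store-id={}".format(channel, store_id)


def main():
    for line in sys.stdin:
        sys.stdout.write(handle_line(line) + "\n")
        sys.stdout.flush()  # replies must not sit in a buffer


if __name__ == "__main__":
    main()
```

Because every reply carries its channel ID, Squid can keep several lookups
in flight per helper process, so store_id_children could then be given a
concurrency value above 0.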

> #We want to cache windowsupdate URLs which include queries
> #but only those queries which act on an installable file.
> #we don't want to cache queries on asp files as this is a genuine server
> #side query as opposed to a cache breaker
> acl wupdatecachablequery urlpath_regex (cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|appxbundle|esd)\?
>
> #Deny caching for URLs matching query but not windowsupdate
> cache deny QUERY !windowsupdate
> #Deny caching for URLs matching query and windowsupdate but not cachable updates
> cache deny QUERY windowsupdate !wupdatecachablequery

What does this help with exactly? Current Squid versions are perfectly
capable of caching despite query-string presence.
In fact we recommend dropping acl QUERY entirely and adding this right
above the '.' refresh_pattern:
 refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
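
The position matters because refresh_pattern rules are tried top to bottom
and the first regex that matches the URL is the one used. A small Python
sketch of that lookup, purely to illustrate: the URLs are invented, Python's
re module stands in for Squid's regex engine, and pick_refresh_rule is a
made-up name.

```python
import re

# (regex, case-insensitive?, min minutes, percent, max minutes),
# listed in the order they would appear in squid.conf.
PATTERNS = [
    (r"^ftp:", False, 1440, 20, 10080),
    (r"^gopher:", False, 1440, 0, 1440),
    (r"(/cgi-bin/|\?)", True, 0, 0, 0),  # dynamic-looking URLs
    (r".", False, 0, 20, 4320),          # catch-all, must stay last
]


def pick_refresh_rule(url):
    """Return (min, percent, max) of the first pattern matching the URL."""
    for regex, ignore_case, lo, pct, hi in PATTERNS:
        flags = re.IGNORECASE if ignore_case else 0
        if re.search(regex, url, flags):
            return (lo, pct, hi)
    return None


if __name__ == "__main__":
    for url in ("http://example.com/cgi-bin/report",
                "http://example.com/page?id=1",
                "http://example.com/static/logo.png"):
        print(url, pick_refresh_rule(url))
```

If the '.' line came first it would match everything and the cgi-bin/query
rule would never be consulted.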

>
> #Given windows update is un-cooperative towards third party
> #methods to reduce network bandwidth, it is safe to presume
> #cache-specific headers or dates significantly differing from
> #system date will be unhelpful
> reply_header_access Date deny windowsupdate
> reply_header_access Age deny windowsupdate

The "given" actually is not true IME. So not a safe assumption.

Bad behaviour in HTTP/1.1 revalidation by clients is a common side
effect of the override-* and ignore-* options being used on refresh_pattern.
The overrides used above make Squid ignore the caching boundary
conditions about when objects become stale or expire. So the client
fetch can a) MISS earlier than necessary, or b) HIT on a stale object
with headers indicating it was obsolete well before delivery time.
Clients DO resolve (b) by re-fetching with a forced reload. In (a)
refreshing uses full-object bandwidth more frequently than necessary; in
(b) repairing the corrupted objects costs 2x the bandwidth a normal MISS
would have cost.

When reload-into-ims is used Squid translates annoying reload behaviour
into friendlier refresh behaviour. At worst Squid is required to do a
revalidation (almost no cost in bandwidth) to update the timestamps on
content delivered to the client. Avoiding problem (b) above entirely is
well worth that (very small) extra time delay on occasional WU.

Caching and revalidation seem, in my experience, to be performed properly
by the Windows Update tools. At least in Windows XP SP2 and Windows 7,
which I have tested on.

>
> #Put the two following lines in /etc/squid3/storeid_rewrite, omitting
> #the starting hash
> #^http:\/\/.+?\.ws\.microsoft\.com\/.+?_([0-9a-z]{40})\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd)
> http://wupdate.squid.local/$1
> #^http:\/\/.+?\.windowsupdate\.com\/.+?_([0-9a-z]{40})\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd)
> http://wupdate.squid.local/$1
>
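
As a sanity check on those two rules, a quick Python sketch of the mapping
they describe. It uses Python's re module instead of the Perl helper, the
example URLs and hash are invented, and store_id is just a name picked for
illustration. It also shows a side effect of writing ms[i|u|f]: inside a
bracket expression the | is a literal character, so ms[iuf] would be the
tighter spelling.

```python
import re

# Extension group copied as posted, including the [i|u|f] bracket forms.
EXT = r"(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd)"
RULES = [
    re.compile(r"^http:\/\/.+?\.ws\.microsoft\.com\/.+?_([0-9a-z]{40})\." + EXT),
    re.compile(r"^http:\/\/.+?\.windowsupdate\.com\/.+?_([0-9a-z]{40})\." + EXT),
]


def store_id(url):
    """Return the hash-only store ID for a matching URL, else None."""
    for rule in RULES:
        m = rule.search(url)
        if m:
            return "http://wupdate.squid.local/" + m.group(1)
    return None


if __name__ == "__main__":
    h = "0123456789abcdef0123456789abcdef01234567"  # made-up 40-char hash
    a = "http://au.download.windowsupdate.com/msdownload/update/pkg_" + h + ".cab"
    b = "http://mirror2.download.windowsupdate.com/other/pkg_" + h + ".cab?token=123"
    print(store_id(a) == store_id(b))  # same hash, same cache key
    # The literal '|' inside the brackets is accepted as an "extension":
    print(store_id("http://x.ws.microsoft.com/p/f_" + h + ".ms|"))
```

Two URLs on different hosts, one with a query string, end up with the same
store ID as long as the embedded hash is the same.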

Amos
Received on Sun Apr 13 2014 - 08:56:47 MDT
