Re: [squid-users] Strange misses of cacheable objects [SOLVED]

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Tue, 22 Apr 2014 16:43:47 +1200

On 22/04/2014 11:04 a.m., Anatoli wrote:
> OK, found the problem. All the "problematic" objects are from multi-IP
> domains, and sometimes the browser resolves them and sends the request to an
> IP that is not in the currently resolved list (this is in intercept mode).
>
> So, in the browser with http_watch I see that the request for
> http://www.googleadservices.com/pagead/conversion_async.js is sent to
> 173.194.118.122, but in nslookup with set debug option I see:
>
> Name: pagead.l.doubleclick.net
> Addresses: 173.194.118.45
> 173.194.118.58
> 173.194.118.57
> Aliases: www.googleadservices.com
>
> The IP resolved by the browser is not in the list!
>
> So, squid interprets this as a destination IP forgery and doesn't cache the
> response. This behavior is documented under the host_verify_strict option. By
> default it's set to off, which is why the reason is difficult to discover.
> If you set it to on and try to download a problematic object, squid will
> return URI Host Conflict (409 Conflict) and in the access.log you'll see
> TAG_NONE/409 (additionally, with increased debug levels, you'll also see
> security alerts).

The beta releases optimistically had strict verification enabled by
default. Sadly, we had to disable it due to a high number of issues seen
with Google- and Akamai-hosted sites.
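
For anyone wanting to surface these failures explicitly, the toggle is a
single line (a minimal sketch; only enable it temporarily while diagnosing):

  # reject Host/destination-IP mismatches with a 409 instead of
  # silently treating the response as uncacheable
  host_verify_strict on

The failing requests then appear as TAG_NONE/409 in access.log, exactly as
described above.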

>
> This should partly explain the numerous complaints about more-than-expected
> misses.
>
> This is actually a problem, as the IP mismatches are not due to an
> artificially crafted request, but to the normal functioning of the DNS and
> the different levels of its caching. The likely reason for the IP mismatch
> is the frequency of DNS updates for these multi-IP domains. You can actually
> see with nslookup in debug mode that www.googleadservices.com has a default
> TTL of just 5 min, cdn.clicktale.net of 2 min, google.com of 1 min 25 sec
> and global.ssl.fastly.net of 25 sec. When I restart the DNS Client service,
> I get a HIT from squid for almost all of the originally published
> problematic objects, without any security alerts, until the IP discrepancies
> start to appear again.
>
> So, it looks like the destination IP forgery check should be relaxed somehow
> (for example, with a /24 mask, as the majority of the IP mismatches are in
> the last octet), or squid should cache all the IPs for all the domains for a
> long time, just for this forgery check.
>

Unfortunately we are already walking a very thin security line between
safe and unsafe actions.

NP: It took over 2 years with multiple people getting involved and
counter-checking each other on use-cases and testing on live traffic to
reach the state we have today. So do not be discouraged by what I'm
about to say below.

> Another option (at least as a temporary workaround) would be to disable this
> check completely, as it actually poses very little risk for a correctly
> configured squid with trusted clients. At the same time, an untrusted client
> could request a virus in place of some known file via his own host, making
> squid cache and distribute an infected file to the rest of the clients.

This is not an option. The biggest hurdle in resolving this vulnerability
is that *all* clients can be hijacked or subverted - so there are no
trusted clients at all.

>
> The best option, I think, would be for requests considered forgeries to
> overwrite the destination IP provided by the client with one of the resolved
> IPs for the domain in the Host field (like with client_dst_passthru off).
>

Doing this action is exactly the vulnerability described in CVE-2009-0801.
Any client can send a forged Host header and cause the proxy to resolve
the IP to a different one, bypassing *firewall* IP-level protections.
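
The attack requires nothing more than a mismatched Host header; an
illustrative (hypothetical) raw request:

  GET /update.exe HTTP/1.1
  Host: downloads.example.com

delivered over a TCP connection to an attacker-chosen IP. If the proxy
trusted that Host header for routing or caching, the attacker would control
what gets stored under the victim domain's URL.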

How do you know the Host header contains accurate data?
There are only two guarantees:
 1) that the client was *definitely* fetching from the TCP IP:port.
 2) that the IP:port in #1 does *not* match the server DNS records.

The implication is that this is either a hijacking, or the server has moved.
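
You can see the ambiguity with any resolver tool; a hypothetical re-check of
the example above:

  $ dig +short www.googleadservices.com
  pagead.l.doubleclick.net.
  173.194.118.45
  173.194.118.58
  173.194.118.57

If the intercepted connection's destination (173.194.118.122 in the trace
above) is not in the answer Squid currently holds, the proxy cannot tell a
stale-but-honest client apart from a hijacker.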

> And here is a patch for this. Please note I haven't done extensive
> verification for security issues,

Please do that before posting patches to bypass security restrictions.
Particularly security restrictions which are so obviously annoying to
many people. We don't exactly like being annoying, so there is always a
good reason for it when we are.
<snip>

>
> After applying this patch the hit rate increased significantly for all types
> of objects, not only for those that match refresh_pattern options. No more
> random misses, then hits, then misses again.

NOTE: All clients behind your network are now vulnerable to a 15-line
javascript, or 6-line flash applet, which can be embedded in any web page.
All it takes is one client with scripting enabled to run it and the
entire network is hijacked.

As you found already, at least one of the major sources of verification
failures is an advertising service (googleadservices). Given that ad
services commonly present scriptlets written by unknown third-parties...

There are infections out there which use this vulnerability. Also, a
forwarding loop DoS is just as easy to trigger as cache corruption and
has far more immediate side effects - this effect is used by at least
one piece of security scanning software (by Trend Micro) to detect
vulnerable proxies [by crashing them].

Since you seem to have the ability to find and make patches:
 The only way we know of to safely cache these files is to add the
destination IP+port of the server where the object was fetched to the
cache key. That is expected to raise the HIT ratio somewhat by allowing
"bad" clients to get HITs without corrupting anything for "good"
clients. Lack of time to focus on it has been the main blocker in adding
that.
 Note this will still cause some extra MISSes when the DNS used by Squid
and the client are out of sync - as the "bad" objects get cached once
for each untrusted origin.
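
Conceptually the change is small. Squid's public cache key today is
(roughly) a hash over the request method plus URI; the idea is to extend it
for unverified traffic only - a sketch, not actual Squid code:

  key = MD5(method, URI)               # current behaviour
  key = MD5(method, URI, dst-IP:port)  # proposed for unverified requests

Two clients whose DNS answers disagree would then populate separate cache
entries instead of one being able to poison the other's.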

Also note that the verification should not place any restrictions on HITs
against content already in the cache. A "bad" fetch can safely be served a
HIT cached by an earlier "good" fetch. So sites which are cache-friendly
to begin with have a much reduced likelihood of encountering a MISS from
this problem even if they do move IPs.

Unfortunately there are prices to be paid for violating protocols (in
this case TCP). Extra MISSes on some traffic are one of them, just like
losing the ability to authenticate users.

>
> Still, the adobe .exe file was not being cached. So I decided to continue
> the investigation and finally found what the problem was.
>
> With adequate debug_options enabled, squid was saying that the object size
> was too big (I've added the CL (Content-Length), SMOS (store_maxobjsize) and
> EO (endOffset) variables to the log line).
>
> 2014/04/21 00:35:35.429| store.cc(1020) checkCachable:
> StoreEntry::checkCachable: NO: too big (CL = 33560984; SMOS = 4194304; EO =
> 268)
>
> Clearly, something was wrong with maxobjsize: it was set in the config to
> 1 GB, but the log was reporting it as 4 MB (which I later discovered to be
> the default value).
>
> After some additional research, I found that in the src/cf_parser.cci file
> (generated by make) there are 2 calls to the configuration initialization
> functions for almost all the configuration options - the first one is for
> the predefined (default) values and the second one is for the config file
> values. There is a function parse_cachedir (defined in src/cache_cf.cc) that
> initializes the store data structure with the store-related options (like
> maxobjsize). It is called when the config parser finds the cache_dir option
> in the config, but it is not called again when the parser finds the other
> cache-related options. So, if you put in your config something like this
> (like it was in mine):
>
> cache_dir aufs /var/cache 140000 16 256
> maximum_object_size 1 GB
>
> then the maximum_object_size option is processed and you see it on the
> cachemgr config page, but it has no effect: the store data structure
> parameter maxobjsize was already initialized (with the default value) by
> parse_cachedir before the "maximum_object_size 1 GB" line was parsed, so the
> effective maximum_object_size is the 4 MB default.
>
> If we have a config with
>
> maximum_object_size 1 GB
> cache_dir aufs /var/cache 140000 16 256
>
> we get the effective maximum_object_size for the store set to 1 GB, as
> expected.

Aha. Thank you for tracking this one down. That is a behaviour we have
been looking for for a while.
 I'm still a little unfamiliar with the store internals though. Can you
please point me at the place you found the early initialization being done?

>
> There are warnings in the documentation that the order of config options is
> important, but this is only explained in the context of ACLs and other
> unrelated settings. In my opinion, this is a huge problem, as it is not at
> all obvious what should precede what. There should be at least a note in
> the documentation for each option affected by the order of config
> processing, there should be a final "all effective values" output at squid
> initialization (maybe with -d 2 and higher), and of course the cachemgr
> config page should show the correct (effective) values.

Some people complain that dumping over 16KB to the logs (possibly
syslog) on each daemon startup is a bit unfriendly.

The cachemgr "config" report should contain all finalized configuration
settings. Unfortunately that does not show toggle-like and repeated
configuration values nicely.
If it is showing anything inaccurate for the cache_dir max-size=
parameters, that is a bug that needs fixing.
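
For reference, that report can be pulled with the bundled tool (assuming the
manager interface is reachable; the grep is just for convenience):

  squidclient mgr:config | grep maximum_object_size

If the value printed there disagrees with what the store actually enforces,
as in your case below, that is the bug in question.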

>
> Now it is:
> maximum_object_size @ cachemgr config page: 2147483648 bytes
> Effective maximum_object_size: 4194304 bytes
>
> And a better solution would be to call parse_cachedir (and similar
> functions) at the end of the config file processing (an extremely simple fix
> in the src/cf_parser.cci generation).

FYI: parse_*() and similar *are* the config file processing.
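
Until that generation order changes, the reliable workaround is the ordering
you found: declare the global store limits before the first cache_dir line
that snapshots them, e.g.

  maximum_object_size 1 GB
  cache_dir aufs /var/cache 140000 16 256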

>
> Now, with the patch and the "correct" order of maximum_object_size and
> cache_dir (put cache_dir after all the cache-related options, including the
> memory cache ones), all "problematic" objects are cached as expected and
> there is a huge increase in the hit rate (around 10-fold on average, and
> more than 100-fold for WU and similar). Rock-solid caching!
>
> Regards,
> Anatoli
>

Cheers
Amos