(Cross-posted to squid-users to get more minds on the problem.)
How does one get around the discrepancy caused by IP-based versus name-based
cache object storage when working in a hierarchy of mixed transparent and
proxy-based caches?
As a prime example, suppose I have a transparent cache at my site. After a
transparently delivered request, every object in my cache (here, a
theoretical one) "looks" like this:
--------
856211777.474 2205 206.131.27.68 TCP_MISS/200 1415 GET http://206.79.203.152/news.html - DIRECT/206.79.203.152 text/html
--------
Now, if someone on my network has their browser "correctly" configured to
use my cache as a proxy, they will generate a request that looks like this:
--------
856211822.733 5362 206.205.169.42 TCP_MISS/200 1415 GET http://www.softwareforum.org/news.html - DIRECT/www.softwareforum.org text/html
--------
So I have two different "objects" that contain exactly the same data, but
their cache keys (the URLs) differ, so the same document takes up disk
space twice in my cache.
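To make the duplication concrete, here is a minimal Python sketch that pulls the method and URL out of each access.log line above and derives a cache key from them. It assumes, as a simplification, that the cache keys an object by an MD5 hash of the request method plus the URL. That is close in spirit to how Squid builds its public store keys, but the exact byte layout here is illustrative, not Squid's. The helper names `url_from_log_line` and `store_key` are hypothetical.

```python
import hashlib

# The two access.log entries from above, each on a single line.
TRANSPARENT = ("856211777.474 2205 206.131.27.68 TCP_MISS/200 1415 GET "
               "http://206.79.203.152/news.html - DIRECT/206.79.203.152 text/html")
PROXIED = ("856211822.733 5362 206.205.169.42 TCP_MISS/200 1415 GET "
           "http://www.softwareforum.org/news.html - DIRECT/www.softwareforum.org text/html")


def url_from_log_line(line: str) -> tuple[str, str]:
    """Return (method, URL) from a native-format access.log line.

    Fields: time, elapsed, client, code/status, bytes, method, URL, ...
    """
    fields = line.split()
    return fields[5], fields[6]


def store_key(method: str, url: str) -> str:
    """Hypothetical cache key: MD5 over method and URL (illustrative layout)."""
    return hashlib.md5(f"{method} {url}".encode("ascii")).hexdigest()


keys = [store_key(*url_from_log_line(line)) for line in (TRANSPARENT, PROXIED)]
# Same 1415-byte document, two different keys: it is stored twice on disk.
print(keys[0] == keys[1])  # False
```

Because the key depends only on the URL string, nothing in the cache can tell that the two entries are the same document.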
I know how to make this work in a "broken" fashion: by putting an
"intercept" routine in front of proxy requests in my Squid cache that turns
the hostname into an IP address, I can store everything consistently by
address within the cache. However, that may be seen as a suboptimal
solution. (Discussions of RFC compliance for transparent proxy servers will
be left out of this message, though that war may force its way into any
replies...)
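A minimal sketch of that "intercept" step, assuming it does nothing more than rewrite the hostname in the URL to a dotted-quad address before the cache computes its key. The function name `normalize_to_ip` and its `resolve` parameter are hypothetical. By default it calls Python's `socket.gethostbyname` (a real DNS lookup), and a stub resolver can be passed in for testing.

```python
import socket
from typing import Callable
from urllib.parse import urlsplit, urlunsplit


def normalize_to_ip(url: str,
                    resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Rewrite the URL's hostname to an IPv4 address so that proxied and
    transparently intercepted requests share one cache key.

    Note: any user:password@ part of the netloc is dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        return url  # nothing to normalize
    ip = resolve(host)  # an address passes through gethostbyname unchanged
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Usage with a stub resolver instead of live DNS:
stub = lambda h: {"www.softwareforum.org": "206.79.203.152"}.get(h, h)
print(normalize_to_ip("http://www.softwareforum.org/news.html", stub))
```

This is exactly why the approach is "broken": a name that resolves to several addresses (round-robin DNS) may get a different key on each request, and one address serving several name-based virtual hosts collapses distinct sites onto the same key.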
I _could_ theoretically do a reverse lookup on the IP address, check that
the resulting name has a matching forward record (e.g., is it correctly
in-addr'ed?), and then store the object under that name if the check
succeeds. However, this is also suboptimal, since many web servers do not
have their reverse records correctly provisioned (e.g., look at
www.netscape.com's broken reverse records as a glaring case in point).
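The check described above is a forward-confirmed reverse lookup. Here is a minimal Python sketch, assuming it falls back to the bare IP whenever the PTR record is missing or does not resolve back to the same address. The function name `name_for_ip` is hypothetical. It uses the real `socket.gethostbyaddr` and `socket.gethostbyname_ex` calls by default, and stubs can be injected for testing.

```python
import socket
from typing import Callable


def name_for_ip(ip: str,
                reverse: Callable[[str], tuple] = socket.gethostbyaddr,
                forward: Callable[[str], tuple] = socket.gethostbyname_ex) -> str:
    """Return a hostname for `ip` only if its PTR name resolves back to `ip`;
    otherwise return `ip` unchanged so the object is keyed by address."""
    try:
        name, aliases, _ = reverse(ip)
    except OSError:  # socket.herror / gaierror: no usable PTR record
        return ip
    for candidate in [name, *aliases]:
        try:
            _, _, addresses = forward(candidate)
        except OSError:
            continue  # PTR name does not resolve forward at all
        if ip in addresses:
            return candidate  # forward-confirmed
    return ip  # reverse and forward records disagree
```

Even when the check succeeds, the confirmed name is whatever the PTR record says (a machine name such as a numbered web host), which need not be the name users type into their browsers, so it still may not match the keys of proxied requests.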
Further complicating the matter, and this is the real heart of my question:
how do you communicate with a cache hierarchy of mixed-method caches?
Duplication of objects will be the end result unless someone has a magic
bullet that solves these problems. The "ugly" magic bullet is to turn
everything into IP addresses and then hope that your peer/parent caches hold
a good population of objects keyed by IP address. If they instead store
objects solely by hostname, there will be significant duplication, and your
transparency will disastrously reduce the overall effectiveness of your
cache in a hierarchy.
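A toy model of the hierarchy problem, assuming each cache applies its own URL normalization policy and that a peer query (ICP, for instance) carries only the URL string. The `Cache` class, the `by_ip`/`by_name` policies and the `HOSTS` table standing in for DNS are all hypothetical. The child here only knows the IP-form URL, as in the transparent log line above.

```python
from typing import Callable

HOSTS = {"www.softwareforum.org": "206.79.203.152"}  # stand-in for DNS


def by_ip(url: str) -> str:
    """Policy of an IP-keyed (transparent-style) cache."""
    for name, ip in HOSTS.items():
        url = url.replace(f"//{name}/", f"//{ip}/")
    return url


def by_name(url: str) -> str:
    """Policy of a name-keyed cache: store exactly what the client asked for."""
    return url


class Cache:
    """Toy cache that stores objects under its own canonical form of the URL."""

    def __init__(self, canonical: Callable[[str], str]):
        self.canonical = canonical
        self.store: dict[str, bytes] = {}

    def put(self, url: str, body: bytes) -> None:
        self.store[self.canonical(url)] = body

    def has(self, url: str) -> bool:
        # A peer query carries only the URL; the peer applies its own policy.
        return self.canonical(url) in self.store


parent = Cache(by_name)
parent.put("http://www.softwareforum.org/news.html", b"...")

child = Cache(by_ip)
query = child.canonical("http://www.softwareforum.org/news.html")
print(parent.has(query))  # False: the parent only holds the name-keyed copy
print(parent.has("http://www.softwareforum.org/news.html"))  # True
```

The miss happens even though the parent already has the document; the child then fetches and stores its own IP-keyed copy, which is the duplication described above. Only when every cache in the hierarchy applies the same policy do the keys line up.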
JT
Received on Tue Feb 17 1998 - 16:05:15 MST