Going back to the MD5 thoughts last week, which didn't seem conclusive:
One of the suggestions on the Todo list is 'Separate storage of headers
and content' (no reason given).
If this is done, I can see advantages in storing the content in a file
named by the MD5 checksum of that content, with a pointer to it held in
the headers file.
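
To make the proposed layout concrete, here is a minimal Python sketch. It is not Squid code: the function name store_object, the content/ and headers/ directories, and the X-Content-File pointer line are all assumptions invented for illustration.

```python
import hashlib
import os
import tempfile


def store_object(cache_dir, url, headers, body):
    """Store body under its MD5 hex digest, and headers in a separate file.

    Hypothetical layout: <cache_dir>/content/<md5 of body> holds the body,
    <cache_dir>/headers/<md5 of url> holds the headers plus a pointer line.
    """
    digest = hashlib.md5(body).hexdigest()
    content_path = os.path.join(cache_dir, "content", digest)
    os.makedirs(os.path.dirname(content_path), exist_ok=True)
    # Identical bodies map to the same file name, so write it only once.
    if not os.path.exists(content_path):
        with open(content_path, "wb") as f:
            f.write(body)

    # The headers file is keyed by URL and carries the pointer to the content.
    header_name = hashlib.md5(url.encode("utf-8")).hexdigest()
    header_path = os.path.join(cache_dir, "headers", header_name)
    os.makedirs(os.path.dirname(header_path), exist_ok=True)
    with open(header_path, "w", encoding="utf-8") as f:
        f.write("X-Content-File: %s\n" % digest)  # made-up pointer field
        for name, value in headers.items():
            f.write("%s: %s\n" % (name, value))
    return digest


if __name__ == "__main__":
    cache = tempfile.mkdtemp()
    hdrs = {"Content-Type": "text/plain"}
    store_object(cache, "http://mirror1.example/file", hdrs, b"same bytes")
    store_object(cache, "http://mirror2.example/file", hdrs, b"same bytes")
    # Two headers files, but only one content file.
    print(os.listdir(os.path.join(cache, "content")))
```

In this sketch two mirrors serving the same bytes end up with separate headers files that both point at a single content file.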
Advantages:
- Identical content is only stored once (saving 10% of files in my cache).
- Identical content is only transferred once if a Content-MD5 header is
  provided.
- Less effort is required with rewrite scripts for multiple mirror sites
  if Content-MD5 is provided.
- Could perform a HEAD request when the headers contain Set-Cookie, but
  keep the content.
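
A rough Python sketch of the "only transferred once" check follows. It assumes the Content-MD5 value is the base64 encoding of the 16-byte MD5 digest, as RFC 1864 defines it, and that the cache keeps a set of hex digests for content it already holds. The names content_md5_value and already_cached are hypothetical, and the header lookup here is a plain case-sensitive dict lookup for brevity.

```python
import base64
import hashlib


def content_md5_value(body):
    """Return the Content-MD5 header value for body (base64 of the MD5 digest)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def already_cached(response_headers, known_digests):
    """Return True if the Content-MD5 header names content we already hold.

    known_digests is assumed to be a set of MD5 hex strings, one per stored
    content file.
    """
    value = response_headers.get("Content-MD5")
    if value is None:
        return False  # server did not supply the header
    try:
        digest = base64.b64decode(value, validate=True)
    except ValueError:
        return False  # malformed header value, so fetch the body as usual
    return digest.hex() in known_digests


if __name__ == "__main__":
    body = b"archive contents"
    known = {hashlib.md5(body).hexdigest()}
    headers = {"Content-MD5": content_md5_value(body)}
    print(already_cached(headers, known))
```

A mirror that sends the same Content-MD5 for the same file would match the stored digest whatever its URL, which is where the saving on rewrite scripts would come from.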
Disadvantages:
- Large transfers have to be held in memory until complete, since the
  checksum (and so the file name) is not known until then. This happens
  anyway - Squid has to detect if the transfer fails partway through and
  not keep the file.
- Content deletion has a dangling pointer problem. Only delete when none
  of the headers suggest the file should be kept, and then delete all
  versions of the headers for the file? Hard links might help, but people
  use multiple cache partitions. Delete headers and frequently prune for
  headerless/logless files? Delete when any reference looks old and
  re-request when the file is not found?
- Few web servers (and no FTP servers!) supply Content-MD5. It might be
  possible to persuade large archive sites to do so - at the moment,
  nobody uses it as it costs them MIPS for no gain. Perhaps Apache could
  provide Content-MD5 only if the request has a Via: line?
- Extra MIPS needed. Most caches don't seem to exercise their CPUs
  heavily though.
- Can only use the Content-MD5 header by aborting the request - rather
  unfriendly to the web server. There's no IMS-like (If-Modified-Since
  style) request for it.
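
The "delete headers and frequently prune" option for the dangling pointer problem could look something like the Python sketch below. It assumes a hypothetical layout in which each headers file contains a line "X-Content-File: <name>" naming its content file. The function name and the field name are invented for the example, and a real cache would also need locking against concurrent writers.

```python
import os
import tempfile


def prune_unreferenced(headers_dir, content_dir):
    """Delete content files that no headers file points to.

    Returns the sorted list of content file names that were removed.
    """
    # Mark: collect every content name that some headers file refers to.
    referenced = set()
    for name in os.listdir(headers_dir):
        with open(os.path.join(headers_dir, name), encoding="utf-8") as f:
            for line in f:
                if line.startswith("X-Content-File:"):
                    referenced.add(line.split(":", 1)[1].strip())

    # Sweep: remove content files that nothing refers to.
    removed = []
    for name in sorted(os.listdir(content_dir)):
        if name not in referenced:
            os.remove(os.path.join(content_dir, name))
            removed.append(name)
    return removed


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    hdir = os.path.join(root, "headers")
    cdir = os.path.join(root, "content")
    os.makedirs(hdir)
    os.makedirs(cdir)
    for n in ("kept", "orphan"):
        open(os.path.join(cdir, n), "wb").close()
    with open(os.path.join(hdir, "h1"), "w", encoding="utf-8") as f:
        f.write("X-Content-File: kept\n")
    print(prune_unreferenced(hdir, cdir))
```

With this approach deleting a headers file never has to touch the content, at the cost of a periodic scan over both directories.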
Any comments? Is it worth it for a 10% storage improvement and only a
potential bandwidth improvement (though storage does correspond to
bandwidth or we wouldn't be caching anyway)? It just seems a waste to be
holding multiple copies of things.
Ian Redfern (redferni@logica.com).
Received on Tue Jul 29 2003 - 13:15:43 MDT