Re: CARP for squid 1.2

From: Eric Stern <estern@dont-contact.us>
Date: Sun, 31 May 1998 15:49:31 -0400 (EDT)


On Sun, 31 May 1998, Alex Rousskov wrote:

> On Sun, 31 May 1998, Eric Stern wrote:
>
> > CARP (Cache Array Routing Protocol) intelligently divides load
> > up between a number of proxy servers.
>
> Just to clarify:
>
> CARP divides the load absolutely un-intelligently. The redirection is based
> on a hash value of a URL. For _example_, URLs with MD5 in the first 30% of
> MD5 range [0, 0.30*2^128] go to proxy1, next 50% go to proxy2, and last 20%
> [0.80*2^128, 2^128] go to proxy3.

Well, I consider it intelligent in that it is consistent: it doesn't just
blindly hand off requests in a round-robin fashion, or some other
similarly mindless technique. I guess you could say that it divides the
load, rather than simply spreading it.

> In general, CARP redirector knows nothing about the popularity (frequency of
> accesses) of URLs and _actual_ load on the proxies. Thus, CARP does not do
> "smart" load balancing, resource allocation and access control, and such. If
> the load on individual proxies changes asynchronously OR if you guessed the
> "carp-load-factor"s wrong, CARP may overload one proxy while the other will
> be under-loaded.

True. I've pondered having it automatically adjust the load factors during
operation. You couldn't have them changing all the time or it would probably
defeat the entire mechanism, but it may be possible to change the
load factors based on actual metrics (response time?) if it detects a
consistent imbalance over a period of time. Or, possibly even better, the
sibling caches could periodically report their current load to the CARP
redirector, helping it adjust the load factors.

> However, as most simple but general ideas, CARP is a very good solution for
> dividing a stream of requests between proxies when both
> - capabilities of proxies are stable and known a priori
> (these capabilities are reflected in the carp-load-factor in the
> patch),
> - load variation on each of the proxies is synchronized
> or negligible.
>
>
>
> A few questions about the patch:
>
> After quick checking, I failed to find any modifications to
> cf.data.pre in the patch. Please consider moving your notes in README.CARP to
> that file.

Certainly.

> In README.CARP you use "carp-load-factor". The parser checks for
> "carp_load_factor". Also, the last parameter of strncasecmp() looks strange.

I always get the -'s and _'s mixed up. :) The strncasecmp() is wrong; that
came from cut-and-pasted code. Fixed it.

> Can we use MD5s instead of computing hash values from scratch? This
> would (a) eliminate quite expensive loop through each character of a url, (b)
> improve distribution of hash values (sum of characters with a shift is not a
> good hashing function). Does CARP specs fix the method of computing that hash
> value?

I considered doing that, but CARP does specify the hash function, so I
implemented it as specified. However, unless you need to interoperate with
other CARP products, you don't really need to stick to the spec. I think
I'll add a compile-time option to choose between MD5 and the CARP hash.

> Current implementation uses CARP to select among siblings and no
> parents or non-carp siblings are allowed in squid.conf. The idea, as I
> understand it, is to use CARP-enabled Squid as a CARP redirector only rather
> than a caching proxy. I am not sure, but it seems to me that using CARP for

That's right. In fact, I'm working on a product that is exactly that (a
redirector only), which is why I wrote the CARP patch in the first place.

> selecting parents makes sense as well. That is, CARP could be used in a more
> general form in a _caching_ proxy, similar to the round-robin option... What
> do you think?

I could see using CARP to select parents if you were previously using
round-robin (in which case each parent is equally "expensive"), but
probably not otherwise; e.g. using CARP instead of picking the closest
parent seems silly.

In fact, I can picture a situation where there is no front-end, just 3
caches, where the client load is divided up using some other mechanism. In
this case, you could configure each unit to use CARP, and the hash
function would select either itself, or one of the two other caches.
This would eliminate duplication of data in the caches, and each cache
would know which sibling to fetch an object from without doing an ICP
query. This situation is covered in the white paper if I recall correctly.
It shouldn't be too hard to add.

/-----------------------------------------------------------------------/
/ Eric Stern - PacketStorm Technologies - (519) 837-0824 /
/ http://www.packetstorm.on.ca /
/ WebSpeed - a transparent web caching server - available now! /
/-----------------------------------------------------------------------/

Received on Tue Jul 29 2003 - 13:15:50 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:11:48 MST