What do you think of allowing cache_dir selection to be controlled
by configuration, and specifically by ACLs?
Ideally we should also be able to select a cache_dir based on object
size, but for now that is not possible.
This would allow for some quite interesting options.
Disks are pretty fast at the outer areas of the spindle and when
seeks are limited. Large disks are getting cheaper, so it will soon
not be uncommon to have squid boxes with 150GB of disk. The problem
is that squid consumes a lot of RAM per object, and for that much
disk it would need about 2-4GB of RAM.
This can be worked around if the average object size in squid is
fairly large. Yet as large objects are relatively rare, it is
desirable to keep them separate from typical web objects.
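(To put rough numbers on that 2-4GB estimate: with the often-quoted
rule of thumb of about 100 bytes of index metadata per object and a
~13KB average object size,

    150GB / ~13KB per object  ~= 12 million objects
    12M objects * ~100 bytes  ~= 1.2GB of index RAM alone,

before cache_mem and per-connection overhead are added on top.)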
In other words, it is desirable to be able to split large objects
and small objects into different cache_dirs.
Regex-based ACLs for selecting a cache_dir are easy to implement,
and that alone is already quite a helpful feature.
Here we have cheap local bandwidth and expensive international
bandwidth; I guess this is very common. People tend to use the cache
for everything, and we try hard to keep the cache from getting
polluted by local objects. Yet it would be nice to have some store
for local objects too, as long as it is strictly controlled.
We could have an ACL that matches an object and places it into a
separate cache_dir of limited size. Or we could configure the
hottest objects to be placed on a ramdrive.
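As a sketch of how the first idea might look in squid.conf, using
the cache_dir_access directive proposed below (the directive, the
sizes and the paths are made up for illustration; dstdomain is an
existing acl type):

    acl local dstdomain .ee
    # small, strictly limited store; could live on a ramdrive
    cache_dir ufs /ram/cache 100 16 256
    cache_dir ufs /cache1 20000 16 256
    cache_dir_access /ram/cache allow local
    cache_dir_access /ram/cache deny all
    cache_dir_access /cache1 deny local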
We could reserve a separate cache_dir for .com objects and keep the
rest apart, on less optimal places on the disks (the inner parts of
the spindle). Being referenced less frequently, those objects have
less impact on overall performance, while keeping the hottest
objects tightly together increases performance for the most often
used stuff.
Also, on very large disks (like 73GB), we can force known large
objects (mp3, zip, exe, cab, etc.) onto the vast inner area of the
disk, increasing the average object size and keeping those large
files around for a longer time. Infrequent access to them, together
with their large size, reduces the performance impact of their not
being cached by the OS filesystem.
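With the same hypothetical directive, the large-file case might be
expressed like this (urlpath_regex is an existing acl type; the
rest is illustration):

    acl bigfiles urlpath_regex -i \.(mp3|zip|exe|cab)$
    # slow inner-disk partition reserved for large objects
    cache_dir ufs /cache_inner 50000 16 256
    cache_dir_access /cache_inner allow bigfiles
    cache_dir_access /cache_inner deny all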
Ideally, we should be able to distinguish between objects under
16KB and over 16KB. This would allow us to force small objects into
a squid-fs that is optimised for small objects (like fifo-fs) and
let the rest be handled by UFS.
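One possible spelling, assuming hypothetical min-size/max-size
options and a fifofs store type (fifo-fs would take no L1/L2
arguments):

    cache_dir fifofs /cache_small 2000 max-size=16384
    cache_dir ufs /cache_large 60000 16 256 min-size=16385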
In the ideal case we wouldn't even need to bother with
fragmentation in squid-fs: we'd simply use several FSes with
differing block sizes and let squid place each object on the FS
whose block size fits it best. We could make one FS with a block
size of 512 bytes, another with 2KB, one more with 8KB, etc. We
wouldn't even need to handle sub-block fragments, multi-block
objects, and so on. We'd have a direct mapping between file number
and object location on disk.
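A small C sketch of the direct-mapping idea (all names here are
mine, not from the squid source): with one object per fixed-size
block, the file number alone determines the on-disk location, and
FS selection is a simple best fit on block size.

    #include <stddef.h>
    #include <sys/types.h>

    /* hypothetical descriptor for one direct-mapped FS */
    struct dmfs {
        size_t block_size;  /* fixed slot size: 512, 2048, 8192, ... */
        off_t start;        /* byte offset where this FS begins */
    };

    /* filenumber maps straight to a disk offset; no allocation
     * maps, no sub-block fragments, no multi-block objects */
    static off_t
    dmfs_offset(const struct dmfs *fs, int filenumber)
    {
        return fs->start + (off_t) filenumber * (off_t) fs->block_size;
    }

    /* pick the FS with the smallest block that still fits the
     * whole object; assumes fss[] is sorted by block_size */
    static const struct dmfs *
    dmfs_select(const struct dmfs *fss, int nfs, size_t object_size)
    {
        int i;
        for (i = 0; i < nfs; i++)
            if (object_size <= fss[i].block_size)
                return &fss[i];
        return NULL;    /* too big for any of them: leave it to UFS */
    }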
To implement this, we need to buffer at least some amount of an
object before we start the swapout. Ideally, this size should be an
option in the config file. We'd buffer that much of the object in
RAM, and if the object exceeds the maximum size, we start swapping
it out to the FS that is configured to handle large files. If the
object fits fully into this RAM buffer, we can use ACLs to match
min/max size, select the cache_dir on that basis, and do the swapout
in one shot into an optimal FS.
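As a C sketch (my names, not squid's), the decision made at the
moment the object either completes in RAM or overflows the buffer
boils down to:

    #include <stddef.h>

    /* where the swapout should go */
    enum swap_dest { DEST_LARGEFILE_FS, DEST_BY_SIZE_ACL };

    /* buffer_max would be the new squid.conf option; buffered is
     * how much of the object has arrived so far */
    static enum swap_dest
    swapout_dest(size_t buffered, size_t buffer_max)
    {
        if (buffered > buffer_max)
            /* overran the RAM buffer before the object ended: its
             * final size is unknown, so stream it to the FS that
             * is configured for large files */
            return DEST_LARGEFILE_FS;
        /* the whole object is in RAM, so its exact size is known
         * and min/max-size ACLs can pick the optimal cache_dir
         * for a one-shot swapout */
        return DEST_BY_SIZE_ACL;
    }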
Comments and criticism welcome.
I'm planning to implement the ACL-based selection, and while
looking at the code I see several things to solve.
First I'd need to add squid.conf directives; cache_dir_access seems
logical for the ACLs. Currently the FS type is configured on the
same line as cache_dir. Maybe we should split this, as fifo-fs has
no L1/L2 configuration, and we might want different config
directives depending on the FS type: for example, the starting and
ending disk block for a direct-mapped FS, max and min object size,
etc. Would it therefore be reasonable to add a cache_dir_conf
directive that defines all the specifics for a cache_dir? Or should
we make the configuration directives optional on the cache_dir
line, as in the cache_peer configuration?
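For concreteness, the two alternatives might look something like
this (every directive and option here is a hypothetical proposal,
not existing syntax):

    # alternative 1: a separate cache_dir_conf directive
    cache_dir fifofs /cache_small 2000
    cache_dir_conf /cache_small block-size=512 max-size=16384

    # alternative 2: optional options on the cache_dir line,
    # cache_peer style
    cache_dir fifofs /cache_small 2000 block-size=512 max-size=16384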
Another problem is disk loadsharing. Currently, disk selection
builds a list of cache_dirs and loadshares between them. When
adding ACL selection to that, should we build the list of matched
cache_dirs before loadsharing, after it, or even both before and
after?
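For the "before" variant, a sketch (stand-in types only, not the
real SwapDir code):

    /* stand-in for a cache_dir; acl_match is the precomputed
     * result of this request against the dir's ACLs */
    typedef struct {
        int acl_match;
        double load;    /* whatever metric loadsharing uses now */
    } dir_t;

    /* ACL filter first, then least-load among the matches */
    static int
    select_dir(const dir_t *dirs, int ndirs)
    {
        int best = -1, i;
        for (i = 0; i < ndirs; i++) {
            if (!dirs[i].acl_match)
                continue;
            if (best < 0 || dirs[i].load < dirs[best].load)
                best = i;
        }
        return best;    /* -1: nothing matched, don't cache */
    }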
How would placing objects on different cache_dirs of differing
sizes and content interact with the replacement policies? I'm
planning to play with this on 2.3; or should I reconsider and try
it on 2.4 instead?
The ability to force objects onto different disks could easily
result in installations with dozens of cache_dirs. How good is
squid at handling very many cache_dirs?
thanks,
------------------------------------
Andres Kroonmaa <andre@online.ee>
Delfi Online
Tel: 6501 731, Fax: 6501 708
Pärnu mnt. 158, Tallinn,
11317 Estonia