Re: apache style squid?

From: Michael O'Reilly <michael@dont-contact.us>
Date: 09 Oct 1997 10:12:00 +0800

"Andres Kroonmaa" <andre@ml.ee> writes:
> > being used. I'm proposing ONLY using thread_create() to create a pool of
> > threads. mutex() locks can then be used to control scheduling of those
> > threads when their context areas have been updated.
>
> I think keeping few thousands of threads stand-by will create lots
> of wasteful
> overhead as they are still considered by scheduler even if idling.

Only if it's a REALLY bad scheduler. If they are sleeping on some
mutex, then they shouldn't be on the run queue.

> Right. There's the difference depending on thread type. Here it
> came up as usermode threads vs kernel threads, that's (un)bound
> threads in solaris terms. Usermode thread switch is very fast,
> basically as fast as library call, while kernel level thread switch
> is as fast as context switch, but it is still faster than process
> switch, I think.

Hmm. The only difference between switching processes and switching
kernel threads is that a kernel thread switch may not need to load a
different VM map. A pretty small difference at best.

Yes, usermode context switches are fast, but kernel threads and
processes are about the same on all the systems I've seen measured.

> I'm afraid single control thread will be a bottleneck and quite
> limiting in other ways. Basically you serialize access
> algoritmically, thus loose concurrency. There should be other ways
> possible to minimize lock contention without loosing concurrency.

It's only a bottleneck if
        a) it's relatively slow and
        b) you need to wait for the return value

Neither of these should be the case here as far as I can see.

 
> > Like I said, EXCLUSIVE rights to on object for a sub thread. Any
> > readers piggy backing on can just poll for updates or select on
> > the incoming
> > socket and get woken up when more MAY be there. They can just read the
> > "size of mem_obj" so far. It's an atomic write by the writer. If
> > they get in

    
> Don't be sure. even "Count++" is not atomic and can trash alot if
> not surrounded by mutexes.

It doesn't matter if it's not atomic, as long as the write is
atomic. There's only one writer so when doing
        x = var
        ++x
        var = x <----- atomic

only the last line needs to be atomic, and I don't know of any
machines that allow a context switch in between writing the bytes of
a word variable. :)

> Of course, in your case it would not
> matter for reader...But keep in mind that when 2 threads both do
> count++ at the same time without locks, the result may become
> count+1 instead of count+2. Although very unlikely, probabilty is
> not zero and will happen from time to time. eg of this could be
> object_lock, it would be fatal to miss here.

I think you're missing the point. There is only one writer; it's a
precondition. You simply set it up so there is only ever one writer
for an object, and voila! it happens.

> > > I believe not.
> >
> > I believe so. Read above. Write locking is implicit as it is
> > controlled by the parent.
 
> I'm basing my thoughts on some experiments with threads, where I
> stress-tested interthread communications with locks and hit very
> high context-switch rates, then tried to avoid readlocks. After
> that I got some very bizarre headaches with threads crashing, even
> thread lib crashing taking along whole system. I tried to find
> algoritmic ways to solve this but always either loosed speed or
> stability. Of course, I'm no way cool programmer, thus happy to
> learn more.

In this case we get it easy because there is very little thread
communication.
 
> > Process model -
> > Heavy-weight.
> > Can kill machines under large loads.
> > Expensive context switching. Worse when we have to
> > add locking.
> > Need to carefully manage mmap() shared memory.
> > May not work on all OS's that don't do optimistic swap alloc.
> > Most portable mechanism.
> > Shared memory may cause portability issues.

> simple code. good news for contributors.
> No benefit - no speedup, perprocess limits are changed to per
> system process limits.
> IMHO - dead-end.

To be honest, I'm (like some other people here :) focused on my
situation, which is linux boxen. This means for me:
        processes are as fast as kernel threads.
        The upper limit is huge (8000 odd).
        I can do 45,000 context switches / second on a mid-range box.

This means that for me the process model is definitely viable.

> > Thread model (with locking)
> > Light weight.
> > Still have FD resource limitations.
> > Possible issue with locking killing benefits.
> > Portability problems.
> > Maps easily to a process model.
> > Doesn't require shared memory.
>
> also simple, although harder to understand and to be careful.
> perprocess limits still here.
> in theory fastest possible.

No, in theory there's no difference. The issues between kernel
threads / user threads / processes are _ALL_ issues of the underlying
OS. There is little inherent speed difference. It's purely the
implementation details.

Also, lockless models are almost always faster than locking
models. Locking is only done because lockless designs are normally
bloody complicated and thus difficult to get bug-free.
  
> > Mixed process/thread model
[ .. ]
> most complex.
[ .. ]
> My favourite. ;)

Note that as far as I can see, there is very little code difference
between the process / thread / kernel thread implementations. If you
code for the hardest (process), then the rest will be trivial in the
extreme.

I say process is the hardest because it assumes the least amount of
sharing, and you need to explicitly track what's shared. If you
implement that, then the rest are a piece of cake. A little
infrastructure, but everything else that needs to be done has been
done.

Michael.
Received on Tue Jul 29 2003 - 13:15:43 MDT