Re: ICAP connections under heavy loads

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 07 Sep 2012 08:15:27 -0600

On 09/07/2012 02:32 AM, Alexander Komyagin wrote:
> OK. I agree. It sounds rather reasonable to avoid excess code complexity
> and CPU consuming in order to gain performance for the common case.

I am very glad that we are in agreement here. Will you work on a patch
to fix ConnOpener?

> However, as I stated earlier, the comm.cc problem (actually semantics
> problem) persists. I think it should be documented that second and
> subsequent calls to comm_connect_addr() do not guarantee connection
> establishment unless there was a correct select() notification.

Agreed: We should document what comm_connect_addr() does. However, does
it provide any guarantees even _after_ select() notification? According
to Stevens, the current code may still report that there is no problem
when in fact there is one (and we will detect it later during I/O), right?

Thank you,

Alex.

> On Thu, 2012-09-06 at 11:21 -0600, Alex Rousskov wrote:
>> On 09/06/2012 02:35 AM, Alexander Komyagin wrote:
>>> On Wed, 2012-09-05 at 09:59 -0600, Alex Rousskov wrote:
>>>> On 09/05/2012 09:27 AM, Alexander Komyagin wrote:
>>>>
>>>>> So you think that it's ok for comm_coonect_addr() to return COMM_OK if
>>>>> it was called before the appropriate select() notification. Am I right?
>>>>
>>>> Hard to say for sure since comm_coonect_addr() lacks an API description,
>>>> and there are at least three similar but different ways the function is
>>>> being used by Squid.
>>>>
>>>> One natural way to define this API is to say that it should return
>>>> COMM_OK if and only if the socket is connected (making any select
>>>> notifications irrelevant). However, this definition may be too
>>>> CPU-expensive and/or too unportable to support. And this level of
>>>> certainty may not actually be needed for current Squid needs!
>>>
>>> Maybe you're right, Alex. However, I would prefer this function to have
>>> the very strict semantics: it should return COMM_OK if and only if the
>>> socket is connected (just like you said). Because this way any
>>> upper-level code can rely on it.
>>
>> That would be my preference as well, but I would like to see the costs
>> of doing that first. If strict semantics is not currently needed and is
>> more CPU/portability-expensive than the current code, then we should not
>> perfect the code (beyond documentation). But again and again, I
>> recommend resolving the higher-level (ConnOpener) issue before
>> discussing any of this low-level stuff.
>>
>>
>>>> I would not be surprised if there is some gray area where we cannot
>>>> really tell for sure (without too much additional overhead or
>>>> portability risk) whether the async socket is connected. The Stevens
>>>> book seem to imply that much. Inside that gray area, the function
>>>> should probably return COMM_OK so that the rest of the code works: If we
>>>> guessed wrong, we will get failures during I/O, but the code has to deal
>>>> with those anyway.
>>>
>>> I guess those I/O failures would cost us CPU cycles
>>
>> I/O failures are rare exceptions. We need to optimize the common case or
>> at least not make it worse. The common case does not deal with timeouts
>> and errors.
>>
>>
>>> and make connection problems very-very-very hard to debug.
>>
>> Would not the error be the same, regardless of whether it is discovered
>> during "connect" time or during "write" time?
>>
>>
>>> Also all connection timeouts
>>> become useless if we can't tell for sure whether the socket is really
>>> connected.
>>
>> Why would they become useless? If Squid fails to connect (from Squid
>> point of view!) in X seconds, it should treat this as a timeout and
>> proceed accordingly. There will be vary rare events where Squid point of
>> view would differ from OS point of view, but I do not see (a) why we
>> should care about those very rare events and (b) how we can avoid them
>> completely without implementing Squid inside the kernel.
>>
>>
>>
>>>>
>>>> In other words, if you want to work on this, consider defining the API
>>>> based on current Squid needs and then make sure we support those minimum
>>>> requirements, keeping overheads and portability in mind. However, I
>>>> would _start_ by fixing the calling code first (as it may affect the
>>>> minimum requirements) -- your ConnOpener patch was a step in that direction.
>>>
>>> Amos wrote that calling code is actually OK.
>>
>> My interpretation is that Amos had explained what the code is doing. I
>> did not hear Amos being convinced that the code does what it should. And
>> I think that the code does not do what it should.
>>
>>
>> Cheers,
>>
>> Alex.
>>
>
Received on Fri Sep 07 2012 - 14:15:45 MDT

This archive was generated by hypermail 2.2.0 : Fri Sep 07 2012 - 12:00:10 MDT