Michael Pelletier wrote:
> On Mon, 15 Sep 1997, John Line wrote:
> > Duane's analysis of the original problem was that a connection problem
> > occurred at a stage in connection setup when Squid didn't have a timeout
> > set to allow it to clean up after problems, leaving things in a mess. It
> > looks as if the retry patch may introduce a similar problem.
>
> Duane, or someone, do you have the details of what was done to correct the
> problem originally?
Here are the patches which Duane sent me to evaluate, which were incorporated
in 1.NOVM.15 - this should make it clearer which changes were for the FD
problem. (Both are for http.c)
=====
Index: http.c
===================================================================
RCS file: /surf1/CVS/squid/src/http.c,v
retrieving revision 1.143.2.14
diff -w -u -r1.143.2.14 http.c
--- http.c 1997/07/11 21:51:17 1.143.2.14
+++ http.c 1997/07/17 23:23:26
@@ -918,6 +918,11 @@
comm_add_close_handler(httpState->fd,
httpStateFree,
(void *) httpState);
+ commSetSelect(httpState->fd,
+ COMM_SELECT_TIMEOUT,
+ httpReadReplyTimeout,
+ (void *) httpState,
+ Config.connectTimeout);
request->method = orig_request->method;
xstrncpy(request->host, e->host, SQUIDHOSTNAMELEN);
request->port = e->http_port;
Index: http.c
===================================================================
RCS file: /surf1/CVS/squid/src/http.c,v
retrieving revision 1.143.2.14
diff -w -u -r1.143.2.14 http.c
--- http.c 1997/07/11 21:51:17 1.143.2.14
+++ http.c 1997/07/25 22:25:28
@@ -1011,6 +1016,11 @@
comm_add_close_handler(httpState->fd,
httpStateFree,
(void *) httpState);
+ commSetSelect(httpState->fd,
+ COMM_SELECT_TIMEOUT,
+ httpReadReplyTimeout,
+ (void *) httpState,
+ Config.connectTimeout);
httpState->ip_lookup_pending = 1;
ipcache_nbgethostbyname(request->host,
httpState->fd,
=====
Duane noted "So when you're done, there should be a call to
commSetSelect(httpState->fd,
COMM_SELECT_TIMEOUT,
...
in both functions httpStart() and proxyhttpStart()."
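For readers unfamiliar with the mechanism, the idea behind these patches can be sketched roughly as follows: arm a per-descriptor timeout as soon as the descriptor is set up, so that something is always responsible for cleaning up if the connection stalls. This is an illustrative sketch only, not Squid's code. `fd_table`, `set_timeout`, `check_timeouts` and `count_timeout` are hypothetical names, and the real commSetSelect() interface is only known here from the calls shown in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_FD 64

typedef void (*timeout_handler)(int fd, void *data);

/* Hypothetical per-descriptor record; deadline == 0 means "no timeout armed". */
struct fd_entry {
    timeout_handler on_timeout;
    void *data;
    time_t deadline;
};

static struct fd_entry fd_table[MAX_FD];

/* Arm (or re-arm) a timeout on fd, expiring secs seconds after now. */
int set_timeout(int fd, timeout_handler h, void *data, time_t now, int secs)
{
    if (fd < 0 || fd >= MAX_FD || h == NULL || secs <= 0)
        return -1;
    fd_table[fd].on_timeout = h;
    fd_table[fd].data = data;
    fd_table[fd].deadline = now + secs;
    return 0;
}

/* Call the handler of every descriptor whose deadline has passed.
 * Each entry is disarmed before its handler runs, so a handler fires once.
 * Returns the number of handlers called. */
int check_timeouts(time_t now)
{
    int fired = 0;
    for (int fd = 0; fd < MAX_FD; fd++) {
        struct fd_entry *e = &fd_table[fd];
        if (e->deadline != 0 && e->deadline <= now) {
            timeout_handler h = e->on_timeout;
            void *data = e->data;
            e->on_timeout = NULL;
            e->data = NULL;
            e->deadline = 0;
            h(fd, data);
            fired++;
        }
    }
    return fired;
}

/* Example handler: counts expiries; a real one would close the descriptor. */
void count_timeout(int fd, void *data)
{
    (void) fd;
    ++*(int *) data;
}
```

A descriptor that never gets a call like set_timeout() during setup has nothing to expire it, which matches the failure described in Duane's analysis.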
> I thought I was taking care of timeouts correctly in
> the connection-retry patch, but maybe not. The timeouts I'm setting
> relate to the connection establishment, and I think that the connection
> timeout handler is set when the filehandle is first initialized, right?
I can't comment on that but the patches may answer the question. I tried
comparing Duane's patches with what the retry patch does, but the retry patches
appear to be working "at a lower level", updating timeouts in data structures
rather than as arguments to connection setup functions.
> Perhaps when the fd is being dup2()'d for the next attempt, something odd
> is happening.
Hmm... that rings alarm bells for me (on Solaris 2.5). A recent change in the
Apache web server hit problems with Solaris 2 because dup-ing sockets leads to
problems (in some unspecified circumstances, maybe not in general) - the
comments in the Apache source say:
/* Solaris (probably versions 2.4, 2.5, and 2.5.1 with various levels
* of tcp patches) has some really weird bugs where if you dup the
* socket now it breaks things across SIGHUP restarts. It'll either
* be unable to bind, or it won't respond.
*/
I don't know if that is relevant, or totally unrelated. (Apache appears to dup
the socket FD using fcntl, though, not dup2 - don't know if that makes a
difference.)
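For reference, the two duplication calls differ as sketched below. Both are standard POSIX calls, but the wrapper names `dup_onto` and `dup_at_or_above` are made up for illustration. This only shows the documented difference in behaviour; it does not reproduce the Solaris problem, and whether the difference matters there is unknown.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* dup2(): duplicate fd onto exactly target_fd. If target_fd is already
 * open (and differs from fd), it is silently closed first. */
int dup_onto(int fd, int target_fd)
{
    return dup2(fd, target_fd);
}

/* fcntl(F_DUPFD): duplicate fd onto the lowest free descriptor numbered
 * min_fd or higher. Nothing that is already open gets closed. */
int dup_at_or_above(int fd, int min_fd)
{
    return fcntl(fd, F_DUPFD, min_fd);
}
```

In both cases the new descriptor refers to the same open file (or socket) as the original, so the visible difference is in how the descriptor number is chosen and whether an existing descriptor may be closed as a side effect.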
Is anyone seeing the problem on anything other than Solaris 2?
> One important question is: when you see these stuck write fd's, do you
> see anything in the log file pertaining to connection retries on that
> address?
After my earlier messages, I tried 1.NOVM.16 + retry patch (Oskar Pearson's
version) for a few hours, but backed off to 1.NOVM.16 without the patch as it
rapidly became clear the problem was still there. cache.log does not show *any*
retries during the time I was running with the retry patch, but the problem
still happened...
John Line
--
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk
Received on Mon Sep 15 1997 - 07:50:02 MDT