[PATCH] Master transaction state object

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 11 May 2013 03:05:18 +1200

Bumping this old design conversation back into gear. I would like to
start implementation and rollout of this master transaction code,
effective immediately.

Attached is a patch implementing a basic MasterXaction class. It is
created by TcpAcceptor when a new client connects and gets passed out
for use when constructing a ConnStateData object to manage the
connection. The ConnStateData object does need to retain its own
pointers to the data received in that initial transaction - so that when
receiving pipelined requests on a persistent connection it can spawn new
MasterXaction objects with the same TCP details as the original.

I've marked all the branching points where MasterXaction could extend
out of ConnStateData with XXX right now instead of rolling MasterXaction
down through the entire codebase. As you can see from this patch already
simply rolling it fully into ConnStateData is causing a lot of polishing
changes to happen. That is likely to be the norm rather than the
exception, so I'm intending to submit each individual step as a separate
patch on trunk with SourceLayout bundled alongside Xaction rollout.

I've also created an array in ConnStateData which will hold a vector of
the active transactions, one for each pipelined request on the
connection. This is intended to replace the "currentobject" list of
ClientSocketContext's currently used for pipeline state. In the final
product the MasterXaction object passed down to each child Job will be
the authoritative source of information about the transaction, and
ConnStateData will be freed up to pre-parse as many pipelined requests
as we want it to and juggle them as internal private state. I've also
polished the pipeline_prefetch code to use a #define macro which can be
altered at compile-time for experiments instead of hard-coded magic
numbers. And added a few XXX to start documenting what needs fixing there.

If you like the design I will commit before passing it on to the
sub-Jobs ConnStateData makes use of.

Amos

On 21/05/2011 5:16 a.m., Alex Rousskov wrote:
> On 05/20/2011 01:31 AM, Amos Jeffries wrote:
>
>> For starters. This is probably 3.3. We can continue to hack our way
>> around data passing limitations in 3.2. Although with Alex emphasis on
>> optimization going into 3.2, some of this may help that.
>>
>>
>> THE RANT:
>>
>> The more I've been looking at the client-side the more I see duplicated
>> copies and re-copies of transaction details. I know you all agree there
>> is too much of it.
>>
>> What I've seen myself...
>>
>> ACLChecklist - storing a copy of as many currently known transaction
>> details as people have asked to be checked so far.
>>
>> HttpRequest - storing copy of *almost* all transaction details.
>>
>> ClientHttpRequest - copy of the transaction details needed by client-side
>>
>> ConnSateData - copy of the transaction TCP details and some
>> HttpRequest details needed by pinning, 1xx and other weird state handling.
>>
>> AccessLogEntry - copy of all most known transaction details.
> There is also the opposite side of the same coin -- code that does not
> care about the HTTP request details but cares about some master
> transaction information is currently forced to deal with the whole
> HttpRequest structure, increasing the number of dependencies and
> complications associated with changing the request.
>
>
>> ... and by "copy" I mean complete duplicate. xstrdup() galore etc.
>> With a bunch of code doing nothing but checking the local copy is up to
>> date with whatever other copy or function parameter it sourced the
>> detail from. A bunch of other code *assuming* that its getting the right
>> details (sometimes wrongly).
>>
>> * Have not yet got a good look at the reply handling path in detail yet.
>> Overall it seems to be using the request-path objects in reverse. So no
>> worse or better.
> Yes, the XXXs and TODOs associated with the lack of "master transaction"
> class exist on all Squid sides.
>
>
>> IDEAS:
>>
>> Note that ClientHttpRequest has a member copy of AccessLogEntry. This is
>> *already* available and unique on a per-request basis from the very
>> start of the HTTP request arrival and parsing. Persists across the whole
>> transaction lifetime and is used for logging at the end.
> I suspect it would be easier and overall better to implement and review
> a good master transaction class from scratch than to modify and existing
> class, especially such rotten class as AccessLogEntry.
>
> The focus of the master transaction class is on persistence
> (refcounting) and isolation of information (each module does not need to
> know about all others). This is very different from AccessLogEntry.
>
>
>> I propose that the first thing we do is clean up its internal structure
>> design. To make sure it has all the fields we will need in the net step.
>>
>> I propose then to rename as a general-purpose transaction storage area
>> (TransactionDetails?). To avoid people ignoring it as a "logging-only"
>> thing.
>>
>> I propose then to roll each step/object along the transaction pathway
>> to using it as their primary storage area for transaction details and
>> history.
>>
>> - incremental so can be done in the background for low impact starting
>> immediately.
>> - will soon lead to removal of several useless copies.
>> - will mean component/Jobs updated are guaranteed to have *all* details
>> for the current state of the transaction available should they need it.
> I would rather see a new MasterXact class added. It may have
> AccessLogEntry as a member, but it will not immediately inherit all the
> AccessLogEntry problems.
>

Ack. With my now slightly better knowledge of where and how
AccessLogEntry is used, I am thinking AccessLogEntry should probably
inherit from MasterXact. Such that it grows all the master classes
fields and the old rotten ALE structure can slowly be superceded.

>
>> NOTE: little fine-detail processing pathways like ident will only need
>> a selected refcount/cbdata/locked sub-child of the whole slab object.
>> This is fine and will help drop dependencies. Thus the proposed modular
>> hierarchy structure below.
> The proposed structure does not work for isolating the components from
> each other. And one cannot refcount/cbdata/lock a sub-structure.
>
> If we want to truly isolate components, we should probably follow this
> sketch:
>
> class Component1;
> class Component2;
> class Component3;
> class Component4;
>
> class MasterXaction: public RefCountable {
> public:
> Component1 *component1;
> Component2 *component2;
> Component3 *component3;
> Component4 *component4;
> ...
>
> const InstanceId<MasterXact> id;
> };
>
>
> The above can be compiled and linked without knowing anything about
> individual Component classes. The code that needs access to componentN,
> will include ComponentN.h and MasterXact.h. "ComponentN" is not a real
> name; the actual name will be specific to the component purpose, of
> course. For example, there may be AdaptationDetails, AuthenticationInfo,
> and similar components.
>
> Simple components that do not use complex data member types can be
> inlined to avoid additional memory allocation, like you show in your
> sketch below.
>
> An important goal here would be to eventually move everything not
> related to a single HTTP request from HttpRequest to MasterXaction and
> change most of the code not related to handling HTTP requests to use
> MasterXaction as the primary source of information.
>
> With the above sketch, it is not necessary to lock individual
> components. We can pass MasterXaction objects around instead.
>
> I am not sure exactly how to make the master transaction both refcounted
> and cbdata-safe, but it is probably possible.

I'm not sure if we need to CBDATA it. We can look at that later if we
find a reason to of course. We can RefCount and CbDataCount an object if
necessary, it is just super messy.

My current thinking is that we need AsyncJob to take MasterXact as a
parameter to constructor for all objects, each object then retains a
pointer to it and a pointer to its own state data ComponentX object. The
pointer to MasterXact is only necessary to pass on to child Jobs or
access other components data. For some Jobs this may involve abstracting
their member fields out into what you descrie as ComponentX objects
which can be pointed to be the MasterXaction and Job without RefCount loops.
  We can RefCount MasterXact and guarantee that it exists so long as all
Jobs using it are running.

>
>> To kick-start things this is what I've been thinking we need its
>> structure to look like:
>>
>> class TransactionDetails {
>>
>> class TimeDetails {
>> // all the timing and wait stats we can dream up.
>> // for the transaction as a whole.
>> // specific times stay in their own component.
>> } time;
>>
>> // Details about the TCP links used by this transaction.
>> class TcpDetails {
>> struct { FD, ip, port, eui, ident } client;
>> struct s_ { FP, ip, port, eui, ident } server;
>> vector<s_> serverHistory;
>> // NP: not sure if we want a server-side history
>> // if so it would go here listing all outbound attempts.
>> } tcp;
>>
>> class SquidInternalDetails {
>> // which worker/disker served this request?
>> ext_acl; // details from external ACL tested
>> auth; // details from proxy-auth helpers
>> status; // status flags hit/miss/peer/aborted/timeout etc
>> hier; // heirarchy details, HierarchyLogEntry
>> } squid;
>>
>> // Details about the ICP used by this transaction.
>> class IcpDetails {
>> icp_opcode opcode;
>> }
>>
>> // Details about the HTCP used by this transaction.
>> class HtcpDetails {
>> htcp_opcode? opcode;
>> }
>>
>> // Details about the HTTP used by this transaction.
>> class HttpDetails {
>> // currently to be used request/reply.
>> // points to the later specific objects
>> HttpRequestPointer request;
>> HttpReplyPointer reply;
>>
>> // specific state objects
>> HttpRequestPointer original_request; // original received
>> HttpRequestPointer adapted_request; // after adaptation
>>
>> HttpRequestPointer original_reply; // original received
>> HttpRequestPointer adapted_reply; // after adaptation
>> // NP: original reply may be nil if non-HTTP source.
>> // in which case...
>> HttpRequestPointer generated_reply; // pre-adaptation.
>> } http;
>>
>> // Details about the adaptation used by this transaction.
>> class AdaptationDetails {
>> { ...} icap; // icap state and history, pretty much as-is
>> {...} ecap; // ecap state if we find anythig to log.
>> } adapt;
>>
>> // Details about the FTP used by this transaction.
>> class FtpDetails {
>> vector<String> protoLog; // FTP msgs used in this fetch.
>> }
>>
>> // ... other entries similar to FTP for gopher, wais, etc.
>> }
>>
>> NOTE that "headers", "private" and "cache" are gone.
>> - "headers" blobs are part of HttpRequest (or should be)
>> - "private" is duplicate of HttpRequest details
>> - "cache" is split into whichever component is actually relevant for
>> the particular field.
> Individual parts sound like a good starting point to me, with the
> exception of SquidInternalDetails which has too vague scope and should
> probably be split or merged with others.
>
> Some fields may need stricter descriptions (e.g., there can be many
> adapted requests if there is an adaptation chain). And we will probably
> add more (e.g., request_sent and response_sent?).
>
> We should probably discuss specific fields later.
>
>
> Cheers,
>
> Alex.
>

Received on Fri May 10 2013 - 15:05:27 MDT

This archive was generated by hypermail 2.2.0 : Fri May 10 2013 - 12:00:08 MDT