Any idea as to what could be the issue? Since the last post 2 days ago the memory footprint has increased by 1.2 GB, to 9424592 KB.
There are now 788561 hot objects (up from 651556), while the number of StoreEntry-pool objects (which I assume is the number of real objects in the memory cache) has even decreased (from 173746 to 173624).
I created a full dump of mgr:vm_objects, and in it I find 789431 KEY entries (so in principle close to the number of hot objects).
The question is: could we infer some information from this output?
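For reference, the tallies below were produced along these lines (a minimal sketch, assuming the dump is saved as vm_objects.txt and that every entry starts with a line beginning with "KEY", immediately followed by the store-status line and the entry-flags line):

    from collections import Counter

    keys = 0
    line1 = Counter()  # store status, e.g. "STORE_OK IN_MEMORY SWAPOUT_DONE PING_DONE"
    line2 = Counter()  # entry flags, e.g. "REVALIDATE,CACHABLE,DISPATCHED,VALIDATED"

    with open("vm_objects.txt") as f:
        lines = [l.strip() for l in f]

    for i, l in enumerate(lines):
        if l.startswith("KEY"):
            keys += 1
            if i + 1 < len(lines):
                line1[" ".join(lines[i + 1].split())] += 1
            if i + 2 < len(lines):
                line2[lines[i + 2]] += 1

    print(keys, "KEYS")
    for value, count in line1.most_common():
        print(count, value)
    for value, count in line2.most_common():
        print(count, value)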
Here are some statistics on those objects:
Distribution of 1st line after KEY:
Count line1
718698 STORE_OK IN_MEMORY SWAPOUT_DONE PING_DONE
63156 STORE_OK IN_MEMORY SWAPOUT_DONE PING_NONE
7516 STORE_OK IN_MEMORY SWAPOUT_NONE PING_DONE
51 STORE_OK IN_MEMORY SWAPOUT_NONE PING_NONE
6 STORE_OK NOT_IN_MEMORY SWAPOUT_NONE PING_DONE
3 STORE_PENDING NOT_IN_MEMORY SWAPOUT_NONE PING_DONE
1 STORE_PENDING NOT_IN_MEMORY SWAPOUT_NONE PING_NONE
Distribution of 2nd line after KEY:
Count line2
515372 REVALIDATE,CACHABLE,DISPATCHED,VALIDATED
237538 CACHABLE,DISPATCHED,VALIDATED
28944 CACHABLE,VALIDATED
7048 CACHABLE,DISPATCHED,NEGCACHED,VALIDATED
468 REVALIDATE,CACHABLE,DISPATCHED,NEGCACHED,VALIDATED
51 SPECIAL,CACHABLE,VALIDATED
5 RELEASE_REQUEST,DISPATCHED,PRIVATE,VALIDATED
2 REVALIDATE,RELEASE_REQUEST,DISPATCHED,PRIVATE,VALIDATED
2 CACHABLE,DISPATCHED,PRIVATE,FWD_HDR_WAIT,VALIDATED
1 DELAY_SENDING,RELEASE_REQUEST,PRIVATE,VALIDATED
Here is the count of objects that share the same URL:
Obj_count  number of URL occurrences
720711 1
23276 2
2216 3
1134 4
588 5
283 6
214 7
111 8
72 9
70 10
81 11
37 12
30 13
21 14
4 15
2 16
3 17
5 18
10 19
1 20
1 21
1 22
2 25
2 28
(A count >1 would indicate a Vary policy is in place and we hold multiple variants of that URL.)
Objects with vary_headers in object: 40591
If I sum up the "inmem_hi:" values I get 2918369522 bytes, so about 2.9 GB.
So it seems as if there must be some major overhead for those inmem objects...
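For reference, that sum is just over the "inmem_hi:" lines of the same dump (a minimal sketch, again assuming the dump file is vm_objects.txt):

    total = 0
    with open("vm_objects.txt") as f:
        for line in f:
            line = line.strip()
            if line.startswith("inmem_hi:"):
                total += int(line.split()[1])
    print(total, "bytes =", round(total / 1e9, 1), "GB")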
If I look at the "locks, * clients, * refs" lines, and specifically at the refs value, I get the following distribution:
Obj_count ref_val
12240 0
592487 1
78355 2
25285 3
12901 4
8173 5
5787 6
4100 7
3143 8
2541 9
2318 10
1859 11
1725 12
1470 13
1275 14
1231 15
1042 16
867 17
853 18
723 19
643 20
669 21
631 22
574 23
496 24
469 25
431 26
423 27
464 28
394 29
357 30
368 31
350 32
315 33
330 34
280 35
299 36
239 37
264 38
218 39
...
1 65000
1 65017
1 65028
1 65074
1 65089
1 65183
1 65248
1 65299
1 65364
1 65521
As for expiry times - here the expiry time in days relative to "now" (negative = in the past), with the number of objects in cache:
Obj_count  EXP (days relative to now)
42511 -16267
12585 -6199
1 -209
1 -172
1 -171
2 -169
1 -157
1 -149
1635 -85
2233 -84
701 -83
388 -82
336 -81
234 -80
175 -79
139 -78
88 -77
85 -76
63 -75
82 -74
58 -73
48 -72
49 -71
49 -70
32 -69
50 -68
20 -67
25 -66
32 -65
49 -64
22 -63
39 -62
32 -61
32 -60
19 -59
13 -58
9 -57
10 -56
24 -55
14 -54
47 -53
24 -52
27 -51
24 -50
17 -49
36 -48
75 -47
38 -46
58 -45
61 -44
14 -43
55 -42
23 -41
27 -40
42 -39
53 -38
46 -37
68 -36
101 -35
52 -34
52 -33
35 -32
88 -31
39 -30
39 -29
58 -28
86 -27
77 -26
83 -25
83 -24
77 -23
79 -22
123 -21
123 -20
176 -19
128 -18
170 -17
141 -16
153 -15
144 -14
101 -13
122 -12
342 -11
220 -10
177 -9
212 -8
27001 -7
61767 -6
71550 -5
79084 -4
82293 -3
91091 -2
113077 -1
102432 -0
79068 0
13197 1
1 8
286 168
121 169
57 170
30 171
17 172
114 173
610 174
656 175
325 176
169 177
245 178
198 179
55 180
30 233
7 234
3 246
3 269
1 288
1 317
1 331
1 336
1 340
1 343
3 349
3 350
1 352
1 353
1 355
4 358
2 360
1 361
2 362
3 363
2 364
7 365
3 3376
1 3650
So of all the 789431 objects, 694199 have an EX: (expiry) value in the past - that is 88% of all objects!
And 65904 of those have an expiry date older than the start of the squid process.
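The day buckets above were computed roughly like this (a sketch, assuming the EX: values in the dump are Unix timestamps, with -1/0 meaning "no explicit expiry" - which would explain the huge -16267 bucket, i.e. roughly the Unix epoch):

    import re
    import time
    from collections import Counter

    now = time.time()
    buckets = Counter()

    with open("vm_objects.txt") as f:
        for line in f:
            m = re.search(r"EX:(-?\d+)", line)
            if m:
                # days relative to now; int() truncates toward zero
                days = int((int(m.group(1)) - now) / 86400)
                buckets[days] += 1

    for days in sorted(buckets):
        print(buckets[days], days)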
The LastUpdated values show the following distribution (days relative to now):
Obj_count  last updated
195 0
6056 -0
10468 -1
6085 -2
4321 -3
8896 -4
5172 -5
7925 -6
13158 -7
9479 -8
1368 -9
826 -10
681 -11
376 -12
2489 -13
63 -14
9305 -15
1912 -16
1804 -18
630 -19
2982 -20
11 -21
1171 -22
4629 -23
1 -24
7194 -25
275 -27
4 -28
12798 -29
3024 -30
5054 -32
...
1 -3288
1 -3290
1 -3307
6 -3327
4 -3374
33 -3375
3 -3381
25 -3390
2 -4547
164525 -16267
And last referenced:
Obj_count  days ago
36823 0
140468 -0
127974 -1
104453 -2
86550 -3
87259 -4
79582 -5
77286 -6
49036 -7
In summary, the way I interpret it is:
* it seems as if the memory_overhead per cache_object is quite high
* there seem to be a lot of objects that have expired but have never been evicted from cache
* possibly eviction does not happen because the calculated cache size is only 2.9 GB, with 4 GB configured as the maximum memory...
So the question is: why do we underestimate memory_object sizes by a factor of approximately 2?
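A rough back-of-the-envelope with the numbers above (note it mixes the current vm_objects figures with the pool snapshot from the quoted mail below, so treat it as an estimate only):

    # figures copied from the tables in this thread
    inmem_hi_sum   = 2918369522   # bytes of cached payload (sum of inmem_hi)
    hot_objects    = 788561       # hot objects in the current snapshot
    mem_node_kb    = 3192370      # mem_node pool size in KB (14 July snapshot)
    mem_node_count = 790374       # mem_node pool object count (14 July snapshot)

    print("avg payload per in-memory object: %.0f bytes" % (inmem_hi_sum / hot_objects))
    print("avg allocation per mem_node:      %.2f KB" % (mem_node_kb / mem_node_count))
    # ~3.7 KB of accounted payload per object versus ~4 KB allocated per
    # mem_node, before counting MemObject, HttpReply, HttpHeaderEntry,
    # string and buffer pools - which might be where a good part of the
    # factor-of-two gap comes from.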
Does this help with the analysis?
Thanks,
Martin
-----Original Message-----
From: Martin Sperl
Sent: Monday, 14 July 2014 13:37
To: Amos Jeffries; squid-users_at_squid-cache.org
Subject: RE: [squid-users] squid: Memory utilization higher than expected since moving from 3.3 to 3.4 and Vary: working
Hi!
I did a bit of an analysis of the data gathered so far.
Current status: 8236072 KB of memory allocated by squid since its restart on the 8th, so about 5-6 days.
The following memory pools show the biggest increases over the last 2 days (>100 KB):
Pool KB-20140712 KB-20140714 KB-Delta Cnt-20140712 Cnt-20140714 Cnt-Delta
Total 5629096 7494319 1865223 26439770 33704210 7264440
mem_node 2375038 3192370 817332 588017 790374 202357
4K Buffer 1138460 1499996 361536 284615 374999 90384
Short Strings 456219 606107 149888 11679188 15516319 3837131
16K Buffer 213120 323120 110000 13320 20195 6875
HttpHeaderEntry 312495 415162 102667 5714194 7591516 1877322
2K Buffer 249876 351226 101350 124938 175613 50675
8K Buffer 135944 182360 46416 16993 22795 5802
HttpReply 133991 178174 44183 490023 651607 161584
MemObject 114845 152713 37868 490004 651575 161571
Medium Strings 90893 120859 29966 727141 966866 239725
cbdata BodyPipe (39) 65367 88238 22871 440363 594443 154080
HttpHdrCc 41486 55327 13841 442515 590153 147638
32K Buffer 23584 35360 11776 737 1105 368
cbdata MemBuf (13) 30627 40726 10099 490026 651615 161589
LRU policy node 46068 49871 3803 1965553 2127797 162244
64K Buffer 1664 2240 576 26 35 9
Long Strings 1444 2007 563 2888 4014 1126
StoreEntry 173530 173746 216 1480781 1482628 1847
All of those show linear increases.
They also show similar "wavy" behavior - when one has a "bump", some of the others have a bump as well.
So now there are several "groups":
* pools that stay constant (wordlist,...)
* pools that show variability like our traffic-curves (Comm::Connections)
* pools that increase minimally (at 80% of their current KB 2 days ago): ip_cache_entry, LRU policy node
* a pool that increases a bit (at 35% of its current KB 2 days ago): fqdncache_entry
* pools that increase a lot (at below 20% of their current KB 2 days ago) - which are, sorted from biggest to smallest KB footprint (see the sketch after this list):
** mem_node
** 4K Buffer
** Short Strings
** HttpHeaderEntry
** 2K Buffer
** 16K Buffer
** 8K Buffer
** HttpReply
** MemObject
** Medium Strings
** cbdata BodyPipe (39)
** HttpHdrCc
** cbdata MemBuf (13)
** 32K Buffer
** Long Strings
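As referenced above, here is a small sketch of how this grouping could be automated from two pool snapshots (the tab-separated "pool<TAB>KB" input files are only a hypothetical reduction of the pool report; the thresholds mirror the groups above):

    def load(path):
        pools = {}
        with open(path) as f:
            for line in f:
                name, kb = line.rstrip("\n").rsplit("\t", 1)
                pools[name] = float(kb)
        return pools

    old = load("pools-20140712.tsv")
    new = load("pools-20140714.tsv")

    for name, kb_new in sorted(new.items(), key=lambda item: -item[1]):
        ratio = old.get(name, 0.0) / kb_new if kb_new else 1.0
        if ratio >= 0.8:
            group = "constant / minimal increase"
        elif ratio >= 0.2:
            group = "increases a bit"
        else:
            group = "increases a lot"   # the suspicious group
        print("%-25s %10.0f KB  %3.0f%%  %s" % (name, kb_new, ratio * 100, group))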
So there must be something that links all of those in the last group together.
If you look at the hour-by-hour percentage deltas, you can see that most of those again show a "traffic-curve" pattern in their increase (which is the wavy part I was talking about earlier).
All of the pools in this specific group show (again) similar behavior with similar ratios.
So it seems to me as if we are keeping too much information in our cache which never gets evicted - as I said earlier, my guess would be the extra info needed to manage "Vary", possibly related to some cleanup process not evicting all the "related" objects in cache...
This is when I started to look at the variation reported in some other values.
So here the values of "StoreEntries" for the last few days:
20140709-020001: 1472007 StoreEntries
20140710-020001: 1475545 StoreEntries
20140711-020001: 1478025 StoreEntries
20140712-020001: 1480771 StoreEntries
20140713-020001: 1481721 StoreEntries
20140714-020001: 1482608 StoreEntries
These stayed almost constant...
But looking at "StoreEntries with MemObjects", the picture is totally different.
20140709-020001: 128542 StoreEntries with MemObjects
20140710-020001: 275923 StoreEntries with MemObjects
20140711-020001: 387990 StoreEntries with MemObjects
20140712-020001: 489994 StoreEntries with MemObjects
20140713-020001: 571872 StoreEntries with MemObjects
20140714-020001: 651560 StoreEntries with MemObjects
And "on disk objects":
20140709-020001: 1470163 on-disk objects
20140710-020001: 1472215 on-disk objects
20140711-020001: 1473671 on-disk objects
20140712-020001: 1475614 on-disk objects
20140713-020001: 1475933 on-disk objects
20140714-020001: 1476291 on-disk objects
(constant again)
And " Hot Object Cache Items":
20140709-020001: 128532 Hot Object Cache Items
20140710-020001: 275907 Hot Object Cache Items
20140711-020001: 387985 Hot Object Cache Items
20140712-020001: 489989 Hot Object Cache Items
20140713-020001: 571862 Hot Object Cache Items
20140714-020001: 651556 Hot Object Cache Items
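(These are simply the "Internal Data Structures" counters from nightly mgr:info snapshots; a minimal extraction sketch, with the file naming being just an example:)

    import glob
    import re

    # longest name first so "StoreEntries" does not shadow the longer counter
    counters = ("StoreEntries with MemObjects", "StoreEntries",
                "Hot Object Cache Items", "on-disk objects")

    for path in sorted(glob.glob("info-*.txt")):   # e.g. info-20140714-020001.txt
        values = {}
        with open(path) as f:
            for line in f:
                for name in counters:
                    if name in line and name not in values:
                        m = re.search(r"(\d+)", line)
                        if m:
                            values[name] = int(m.group(1))
                        break
        print(path, values)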
So if you look at the finer details, we again see the traffic pattern for:
* StoreEntries with MemObjects
* Hot Object Cache Items
And these show behavior similar to the pools mentioned above.
The other two counters stay fairly constant (and at times even decrease slightly).
So maybe all this gives additional evidence as to which objects are using so much more memory.
Did this give any hints?
Do you want to see any other data gathered?
Martin