On Tue, Feb 18, 2014 at 9:28 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 02/18/2014 04:11 PM, Rajiv Desai wrote:
>
>> It seems like there is a bug where an object overwrites previously
>> written object:
>
> This is not really a bug -- hash collisions do happen, as you have
> discovered and Rock store resolves them by overwriting older entries if
> possible (or not caching the newer ones otherwise). The smaller your
> cache, the more collisions you will have for the same number of unique
> cache keys.
>
> How many total slots does your rock cache_dir have?
13107198 (200 GB with 16KB slots).
I think I was thrown off by the hit % in mgr:info stats. Are these
stats accurate when using 8 SMP workers?
Is the % underreported? I couldnt find an absolute value of
hits/misses from stats. :/
Any pointers?
>
>
>> Needs a better hash as pointed out by the TODO :)
>> I will send out a patch.
>
> Thank you. No hash function will eliminate collisions though. If
> somebody wants to eliminate side-effects of hash collisions on recently
> added objects, they would have to implement a different anchor
> allocation algorithm that resolves hash collisions through chains (for
> example). I doubt that would be easy though!
>
>
Yes, I realized that as I read the code further. I will try to give it
a shot but I agree that this will be non trivial.
Currently sharding data across multiple cache_dirs will reduce
collisions, correct?
>> Is the key guaranteed to be a null terminated string?
>
> No. The key is a 128-bit opaque buffer, essentially.
>
>
>> I intend to use std::tr1::hash
>
> Please keep in mind that the key is an MD5 sum already. See
> store_key_md5.cc. I am not a hashing expert by any means, but the MD5
> sum should not really need much further hashing.
>
> It is possible that the k[0]+k[1] sum in the code below should be
> removed in favor of either k[1] or k[0] alone, but again, this may not
> have any significant effect (except for any specific case where the two
> given URLs used to collide).
>
I tried k[0] ^ k[1] which seemed to provide fewer collisions (67
collisions in 39129 GETs).
Haven't quantified and compared i detail though.
>
>> <code>
>> sfileno
>> Ipc::StoreMap::anchorIndexByKey(const cache_key *const key) const
>> {
>> const uint64_t *const k = reinterpret_cast<const uint64_t *>(key);
>> // TODO: use a better hash function
>> return (k[0] + k[1]) % shared->limit;
>> }
>> </code>
>
>
> HTH,
>
> Alex.
>
Received on Wed Feb 19 2014 - 06:30:54 MST
This archive was generated by hypermail 2.2.0 : Wed Feb 19 2014 - 12:00:06 MST