Last time I presented a solution to the Thundering Herd problem: the Random Backoff. But I noted a problem with that solution. It is quite possible that when a cpu releases a contended lock, waiting cpus will not wake up immediately. That is, it could take up to a second before a waiting cpu tries to obtain the now-free lock. This solution is thus not optimal.
As a follow-up I’d like to present an SMP refinement that I believe solves this problem: isolating and separating all locks so that there is a single lock per cache line. Why do we do this?
Isolated & Separated Locks
Consider two SMP locks, L1 and L2, both on a single cache line. Note: on a 32 bit cpu, a word is 32 bits and a cache line is commonly 8 words, so a single cache line could hold 8 locks. If a cpu locks L1, its cached copy of L1 is updated and the other cpus’ cached copies of L1 are invalidated. A waiting cpu will thus have to fetch the up to date copy of L1 from main memory, which is an expensive operation in both time and bus traffic. If L2 is also on the same cache line, then L1 traffic will invalidate cached copies of L2 as well. These L2 cache invalidations are not necessary for L2 cache coherency and thus adversely affect system performance.
If we separate L1 and L2 and place them on different cache lines, then L1 activity does not affect L2 activity and vice versa.
The downside of this solution is that memory is wasted. Optimally (as far as memory is concerned) we could pack 8 locks into a single cache line, so for every lock we are wasting 7 words. Given Moore’s Law and how the price of memory chips is constantly dropping, I think this solution is worth the price.
So how much does this solution help? It turns out that cache line isolation improves the situation, but does not completely eliminate the problem. So in my next post, I will present another solution to the SMP Random Backoff problem.