This is my summary of Chapter 29, Lock-Based Concurrent Data Structures, from Operating Systems: Three Easy Pieces. Citation is down below.

29.1 Concurrent Counters

Simple but Not Scalable

You can create a simple but not scalable concurrent counter by giving the counter object a single lock and acquiring and releasing it every time a counter function is called. This is similar to how a monitor works: the lock is acquired at the start of each object method and released at the end. It is not scalable because every thread contends on that one lock. Ideally we want perfect scaling: even as more threads do more total work, the time taken does not increase, because the work completes in parallel.
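A minimal sketch of this simple counter in C with pthreads, in the spirit of the chapter's figure (the names and the demo driver are my own):

```c
#include <pthread.h>
#include <stddef.h>

// Simple (non-scalable) counter: one lock guards every operation.
typedef struct {
    int value;
    pthread_mutex_t lock;
} counter_t;

void counter_init(counter_t *c) {
    c->value = 0;
    pthread_mutex_init(&c->lock, NULL);
}

void counter_increment(counter_t *c) {
    pthread_mutex_lock(&c->lock);   // every thread contends on this one lock
    c->value++;
    pthread_mutex_unlock(&c->lock);
}

int counter_get(counter_t *c) {
    pthread_mutex_lock(&c->lock);
    int v = c->value;
    pthread_mutex_unlock(&c->lock);
    return v;
}

// Demo driver: spawn nthreads threads, each incrementing iters times.
static counter_t g_counter;
static int g_iters;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < g_iters; i++)
        counter_increment(&g_counter);
    return NULL;
}

int run_counter_demo(int nthreads, int iters) {
    pthread_t t[16];                // assumes nthreads <= 16
    counter_init(&g_counter);
    g_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter_get(&g_counter);
}
```

The lock makes the final count exact, but with many threads nearly all time goes into fighting over it.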

Scalable

Now, we can create a scalable concurrent counter: the approximate (sloppy) counter. The counter object has a global count with an associated lock, a local count with an associated lock for each CPU, and a threshold S. To increment, a thread acquires its CPU's local lock and increments the local count; if the local count reaches S, it also takes the global lock, adds the local count to the global count, and resets the local count to zero. To read, just return the global count. This gives a performance-accuracy tradeoff: a smaller threshold means more accuracy but lower speed, and a larger threshold the reverse. The global count can lag the true count by at most number of CPUs * S.
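A sketch of the approximate counter, following the scheme above (the struct layout and names are my own; a fixed NUMCPUS stands in for detecting the running CPU):

```c
#include <pthread.h>
#include <stddef.h>

#define NUMCPUS 4

// Approximate ("sloppy") counter: per-CPU local counts drain into a
// global count once a local count reaches the threshold S.
typedef struct {
    int global;
    pthread_mutex_t glock;
    int local[NUMCPUS];
    pthread_mutex_t llock[NUMCPUS];
    int threshold;                       // S
} approx_t;

void approx_init(approx_t *c, int threshold) {
    c->threshold = threshold;
    c->global = 0;
    pthread_mutex_init(&c->glock, NULL);
    for (int i = 0; i < NUMCPUS; i++) {
        c->local[i] = 0;
        pthread_mutex_init(&c->llock[i], NULL);
    }
}

// Called by a thread running on CPU `cpu`.
void approx_update(approx_t *c, int cpu, int amt) {
    pthread_mutex_lock(&c->llock[cpu]);  // usually the only lock taken
    c->local[cpu] += amt;
    if (c->local[cpu] >= c->threshold) { // transfer to the global count
        pthread_mutex_lock(&c->glock);
        c->global += c->local[cpu];
        pthread_mutex_unlock(&c->glock);
        c->local[cpu] = 0;               // reset local
    }
    pthread_mutex_unlock(&c->llock[cpu]);
}

// Reads only the global count, so the result can lag the true total
// by up to NUMCPUS * threshold.
int approx_get(approx_t *c) {
    pthread_mutex_lock(&c->glock);
    int v = c->global;
    pthread_mutex_unlock(&c->glock);
    return v;
}
```

For example, with S = 5, twelve updates on one CPU drain twice (at 5 and 10), so a read returns 10 while 2 remain local.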

More concurrency isn’t necessarily faster than a simple solution, since the extra machinery itself takes time. If I’m unsure, implement both approaches and compare their performance.

29.2 Concurrent Linked Lists

First Design

The text focuses on insert and lookup. The list has one global lock, and you lock around the bodies of the insert and lookup functions. We can make the code better by calling insert’s malloc outside the lock, so the lock covers only the true critical section. For lookup, allow only one possible return path, which reduces the number of locking and unlocking points and therefore the number of opportunities for bugs.
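A sketch of this improved single-lock list along the lines of the chapter's figures (names are my own; lookup returns 0 if found, -1 otherwise):

```c
#include <pthread.h>
#include <stdlib.h>

// Concurrent linked list with one list-wide lock.
typedef struct node { int key; struct node *next; } node_t;
typedef struct { node_t *head; pthread_mutex_t lock; } list_t;

void list_init(list_t *L) {
    L->head = NULL;
    pthread_mutex_init(&L->lock, NULL);
}

int list_insert(list_t *L, int key) {
    node_t *n = malloc(sizeof(node_t));  // malloc outside the lock
    if (n == NULL)
        return -1;                       // no lock held yet: nothing to undo
    n->key = key;
    pthread_mutex_lock(&L->lock);        // lock only the critical section
    n->next = L->head;
    L->head = n;
    pthread_mutex_unlock(&L->lock);
    return 0;
}

int list_lookup(list_t *L, int key) {
    int rv = -1;
    pthread_mutex_lock(&L->lock);
    for (node_t *cur = L->head; cur != NULL; cur = cur->next) {
        if (cur->key == key) {
            rv = 0;
            break;                       // fall through to the one exit
        }
    }
    pthread_mutex_unlock(&L->lock);      // single unlock point, single return
    return rv;
}
```

Note how the malloc-failure path in insert needs no unlock, and lookup has exactly one unlock/return pair.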

Rarely executed, divergent code branches (such as error-handling paths) cause many of the bugs in concurrent programming.

Watch out for control flow inside locked portions of code that returns early or stops execution, since any locks held (and other state built up) must be undone at those exit points too. Best practice is to minimize this type of control flow.

Scaling Linked Lists

One idea for making the linked list scale is hand-over-hand locking, aka lock coupling. Each node has its own lock; when moving to the next node, you grab the next node’s lock and then release the current node’s lock. However, this has high overhead from acquiring and releasing a lock at every node, and in practice it usually isn’t faster than a single list-wide lock. One approach for further exploration is a hybrid that acquires a new lock only every few nodes.
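A sketch of a hand-over-hand traversal for lookup (my own illustration of the idea; it assumes a non-empty list whose head pointer stays stable for the duration of the call):

```c
#include <pthread.h>
#include <stddef.h>

// Hand-over-hand (lock coupling): every node carries its own lock, and
// the traversal grabs the next node's lock before releasing the current.
typedef struct hnode {
    int key;
    struct hnode *next;
    pthread_mutex_t lock;
} hnode_t;

void hnode_init(hnode_t *n, int key, hnode_t *next) {
    n->key = key;
    n->next = next;
    pthread_mutex_init(&n->lock, NULL);
}

// Returns 0 if key is found, -1 otherwise.
int hoh_lookup(hnode_t *head, int key) {
    int rv = -1;
    hnode_t *cur = head;
    pthread_mutex_lock(&cur->lock);
    while (cur != NULL) {
        if (cur->key == key) {
            rv = 0;
            break;                             // cur is still locked
        }
        hnode_t *next = cur->next;
        if (next != NULL)
            pthread_mutex_lock(&next->lock);   // grab the next lock first...
        pthread_mutex_unlock(&cur->lock);      // ...then release the current
        cur = next;
    }
    if (cur != NULL)
        pthread_mutex_unlock(&cur->lock);      // release the found node
    return rv;
}
```

Every step costs a lock acquire and release, which is exactly the overhead that makes this usually lose to the single-lock list.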

29.3 Concurrent Queues

We can design a concurrent queue with a lock for the head, a lock for the tail, and a dummy node that’s there no matter what, separating the two ends. Enqueue takes only the tail lock and appends the new element at the tail, while dequeue takes only the head lock and removes from the head, so enqueuers and dequeuers don’t contend with each other.
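A sketch of this two-lock queue in the style the chapter describes, with the ever-present dummy node (names are my own; dequeue returns 0 on success, -1 if empty):

```c
#include <pthread.h>
#include <stdlib.h>

// Two-lock concurrent queue: the dummy node separates head operations
// from tail operations so the two locks never guard the same node.
typedef struct qnode { int value; struct qnode *next; } qnode_t;
typedef struct {
    qnode_t *head, *tail;
    pthread_mutex_t head_lock, tail_lock;
} queue_t;

void queue_init(queue_t *q) {
    qnode_t *dummy = malloc(sizeof(qnode_t));  // always present
    dummy->next = NULL;
    q->head = q->tail = dummy;
    pthread_mutex_init(&q->head_lock, NULL);
    pthread_mutex_init(&q->tail_lock, NULL);
}

void queue_enqueue(queue_t *q, int value) {
    qnode_t *n = malloc(sizeof(qnode_t));      // allocate outside the lock
    n->value = value;
    n->next = NULL;
    pthread_mutex_lock(&q->tail_lock);
    q->tail->next = n;                         // link after the old tail
    q->tail = n;
    pthread_mutex_unlock(&q->tail_lock);
}

int queue_dequeue(queue_t *q, int *value) {
    pthread_mutex_lock(&q->head_lock);
    qnode_t *old_dummy = q->head;
    qnode_t *new_head = old_dummy->next;
    if (new_head == NULL) {                    // only the dummy: empty
        pthread_mutex_unlock(&q->head_lock);
        return -1;
    }
    *value = new_head->value;                  // new_head becomes the dummy
    q->head = new_head;
    pthread_mutex_unlock(&q->head_lock);
    free(old_dummy);
    return 0;
}
```

The old dummy is freed and the dequeued node takes over as the new dummy, keeping the invariant that the queue is never physically empty.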

Bounded queues will be discussed in Chapter 30 - Condition Variables.

29.4 Concurrent Hash Table

We can design a concurrent hash table by making each bucket a concurrent linked list with its own lock. When measuring the performance of tens of thousands of inserts, this design scales nearly perfectly, while the single-lock concurrent linked list does not.
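A sketch of the per-bucket design (my own names; a fixed bucket count and a simple modulo hash, assuming non-negative keys):

```c
#include <pthread.h>
#include <stdlib.h>

#define BUCKETS 101

// Concurrent hash table: one lock per bucket, so operations on
// different buckets never contend with each other.
typedef struct bnode { int key; struct bnode *next; } bnode_t;
typedef struct { bnode_t *head; pthread_mutex_t lock; } bucket_t;
typedef struct { bucket_t buckets[BUCKETS]; } hash_t;

void hash_init(hash_t *h) {
    for (int i = 0; i < BUCKETS; i++) {
        h->buckets[i].head = NULL;
        pthread_mutex_init(&h->buckets[i].lock, NULL);
    }
}

int hash_insert(hash_t *h, int key) {
    bucket_t *b = &h->buckets[key % BUCKETS];
    bnode_t *n = malloc(sizeof(bnode_t));    // allocate outside the lock
    if (n == NULL)
        return -1;
    n->key = key;
    pthread_mutex_lock(&b->lock);            // lock only this bucket
    n->next = b->head;
    b->head = n;
    pthread_mutex_unlock(&b->lock);
    return 0;
}

int hash_lookup(hash_t *h, int key) {
    bucket_t *b = &h->buckets[key % BUCKETS];
    int rv = -1;
    pthread_mutex_lock(&b->lock);
    for (bnode_t *cur = b->head; cur != NULL; cur = cur->next) {
        if (cur->key == key) {
            rv = 0;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return rv;
}
```

Each bucket is essentially the single-lock list from 29.2; the scaling comes purely from spreading the keys, and thus the contention, across many locks.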

29.5 Summary

Avoid premature optimization: optimizing before there is a demonstrated problem. Start with the simplest design that is sufficient, and refine it only when measurement shows the need.

Survey on concurrent data structures: https://people.csail.mit.edu/shanir/publications/concurrent-data-structures.pdf

Citation: Operating Systems: Three Easy Pieces. Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. Arpaci-Dusseau Books, November 2023 (Version 1.10).