The Optimization Game
Herb Sutter has written recently about optimizing parallel performance of a concurrent queue as part of his ongoing series of articles about concurrency and parallel programming. In his article, he shows how you can empirically test the performance of a parallel component (in this case, a concurrent queue) and then adjust the implementation to maximize the available parallelism from the solution.
How would you write a fast, internally synchronized queue, one that callers can use without any explicit external locking or other synchronization? Let us count the ways...or four of them, at least, and compare their performance. We'll start with a baseline program and then successively apply three optimization techniques, each time stopping to measure each change's relative performance for queue items of different sizes to see how much each trick really bought us.
from Measuring Parallel Performance: Optimizing a Concurrent Queue
I haven't finished reading through the article yet but I definitely think this is going to be another good read. He mentions the necessity of really avoiding races at any cost (as far as using spinlocks on a supposedly lock-free concurrent queue implementation through atomics) and I for one agree that in the interest of stability and correctness, there are some concessions that have to be made.
One other thing that he follows very prudently is the carpenter's advice which is: measure twice, cut once. He doesn't just go blindly guessing where and how performance improvements can be gained in the implementation, but rather use measurements and small incremental changes and sees what kind of performance gains are made.
More about my thoughts on the article in the coming post.



0 comments:
Post a Comment