The advent of multicore processors as the standard computing platform will force major changes in software design.
Despite the statement "It is easy to see that the lock-free stack is linearizable", in fact the lock-free stack implementation presented here is not linearizable and will not work reliably. Consider that the starting state of the stack 'stack' is a single node which we shall call 'A'. A call to stack.pop() occurs in thread #1. After line 17 of figure 3 the variable 'oldTop' will refer to node 'A' and the variable 'newTop' will be null. Between lines 17 and 18 the scheduler suspends execution of thread 1 and resumes execution of thread 2, which proceeds to execute the following sequence:
x = stack.pop();
y = new Node;
such that the stack now contains the original 'A' node whose 'next' field pointrs to a second node which we shall call 'B'. At this point the scheduler resumes the execution of thread #1. The compareAndSet operation on line 18 will find that 'top' is, indeed, equal to 'oldTop' (since it is not "still", but, rather, "once again" 'A') so it will proceed to replace it with 'newTop', in other words, null. Node 'B' will thus be lost. This might be detected between lines 18 and 19 with
assert(oldTop.next == newTop);
The basic problem here is that the stack data structure is not just the head of the list (the 'top' variable) but also all of the 'next' links in all of the nodes on the stack at any given time. Consequently, it is necessary to synchronize access to *all* of these variables, which is most likely a losing proposition.
Even if you allow that the node objects are only created by push() there is still a very real chance that the node created to hold the content (or "payload") of 'A' in this example will have the same object identity (i.e., ==) the node created by the push() that follows the push() for 'B'.
Thanks Roger. You are correct, this is not working code, which is why I make sure to say it is Java Pseudocode.
The problem you mention is the classic ABA problem with respect to the use of a CAS operation (see section dealing with ABA in the "Art of Multiprocessor Programming.") It is beyond the scope of the article to deal with ABA, as this would require defining the variables as atomicStampedReferences, which have a weird Java interface that would completely clutter the code and presentation.
I hope you will forgive me for not putting in a specific comment about this, another reader also emailed me about this issue. I did not realize it would be an issue given that this is not intended to be working code.
Thanks for your comment. -- Nir Shavit
Is there or will there be a paper describing the "Pool Made of Stacks" data structure?
Seems to me that this structure can also function as a NZI (non-zero indicator) with very little modification.
I am not sure if I misunderstood the argument made by Roger, but stack.push method takes the value, not the node itself as an argument. The node object is created by the push method and passed along to tryPush - this ensures that only the thread doing the push has the reference to that node object, and no other thread. Similarly, tryPop discards the node object, and returns only the value, and not the node itself. This further implies that during the entire lifetime of the stack, a node can appear at only one "position" in the stack - in other words, the node's "next" field never changes after the node is created and placed in the stack for the first time. So, the argument that a different "next" field results in the node B above being lost does not hold.
Of course, nodes don't just get created forever - they are reused since managed environments like Java have automatic memory management. But the very nature of the garbage collector ensures that a node (i.e. a memory location for that node object) can only be reused if there are no more references to that node.
So, as far as I can see, the code in the paper really is working code.
Sorry if I missed something, thanks -- Aleksandar Prokopec
You might be interested in my response to this article "Concurrent Stack Does Not Exist":
The following letter was published in the Letters to the Editor in the June 2011 CACM (http://cacm.acm.org/magazines/2011/6/108669).
Nir Shavit's article "Data Structures in the Multicore Age" (Mar. 2011) was a thrill to read; so, too, was the timely Viewpoint "Managing Time" by Peter J. Denning, the article "Understanding Scam Victims: Seven Principles for Systems Security" by Frank Stajano and Paul Wilson, the Technical Perspective "VL2" by Jennifer Rexford, and the article "VL2: A Scalable and Flexible Data Center Network" by Albert Greenberg et al.
I was especially intrigued by Shavit's exposition on concurrent objects. I first encountered lock-free and wait-free protocols in a 1993 article by Maurice Herlihy,(1) using load-linked and store-conditional instructions. Thanks now to Shavit for explaining additional techniques, including exchangers and balancers.
Also important is to point out that Shavit's pseudocode for the lock-free stack included an unstated restriction — that a node, once popped, never goes back on the stack — the ABA problem; see the related Communications blog page http://cacm.acm.org/magazines/2011/3/105308-data-structures-in-the-multicore-age/comments, as well as Herlihy's and Shavit's 2008 book.(2)
The key point is that concurrent data structures are more difficult than they might first seem, as I discussed in my own 1995 article(3) and that it is easy to introduce subtle bugs, especially when all important assumptions are not stated or enforced; in this case, that a node, once popped, never goes back on the stack.
Joseph P. Skudlarek
Lake Oswego, OR
(1) Herlihy, M. P. A methodology for implementing highly concurrent data objects. ACM Transactions on Programming Languages and Systems (Nov. 1993).
(2) Herlihy, M. P. and Shavit, N. The Art of Multiprocessor Programming. Morgan Kaufmann, San Mateo, CA, 2008.
(3) Skudlarek, J. P. Notes on a methodology for implementing highly concurrent data objects. ACM Transactions on Programming Languages and Systems (Jan. 1995).
The following letter was published in the Letters to the Editor in the May 2011 CACM (http://cacm.acm.org/magazines/2011/5/107681).
The lock-free pop operation Nir Shavit described in his article "Data Structures in the Multicore Age" (Mar. 2011) depends on the semantics of the Java implementation in an important way. The push operation allocates a new node object during the call, and it is this object that is placed on the stack. In addition, the assignment of oldTop at line 13 creates a reference to the top node, keeping it alive until the return from the function.
This is of interest because if any of these constraints is not true, the pop operation would not work. In particular, if one would naively implement a push-and-pop mechanism along these lines in a language like C++, and let the clients provide the object to be pushed, and returned that object to the clients when the pop occurred, the program would be wrong. This is because after fetching oldTop (line 13) and newTop (line 17) other threads could remove the top node, remove or push other nodes, then push the top node again. The compareAnd-Set would then succeed, even though newTop was no longer the correct new value. Similarly, if the implementation allocated a node in push, and freed it in pop, the program would be wrong because the freed-node storage might be reused in a subsequent push, leading to the same error.
The Java implementation also involves hidden costs, including allocation and garbage collection of the node objects and concurrency control required in the memory-allocation system to make it work. These costs must be considered, as they are essential to the correctness of the program. Be warned about not using apparently identical algorithms that do not satisfy the hidden constraints.
Yorktown Heights, NY
Displaying all 7 comments