"Algo.4" is a precursor (older, 2011 version) of Brown et al.'s mentioned above. (I don't know how much better or worse the 2014 version is). "Algo.26" is Herlihy's mentioned above; as you can see it gets trashed on updates, and much worse on the Intel CPUs used here than on the Sun CPUs from the original paper. "Algo.28" is ConcurrentSkipListMap from the JDK; it doesn't do as well as one might have hoped compared to other CAS-based skip list implementations. The winners under high-contention are "Algo.2" a lock-based algorithm (!!) described by Crain et al. in "A Contention-Friendly Binary Search Tree" and "Algo.30" is the "rotating skiplist" from "Logarithmic data structures for
multicores". "Algo.29" is the "No hot spot non-blocking skip
list". Be advised that Gramoli is a co-author to all three of these winner-algorithm papers. "Algo.27" is the C++ implementation of Fraser's skip list.
A skip list is equivalent to a randomly balanced binary search tree (RBST) in the way that is explained in more detail in Dean and Jones' "Exploring the Duality Between Skip Lists and Binary Search Trees".
Contra to what some claim above, you can have implementations of binary search trees (BST) that work well in concurrent programming. A potential problem with the concurrency-focused BSTs is that you can't easily get the same had guarantees about balancing as you would from a red-black (RB) tree. (But "standard", i.e. randomzided, skip lists don't give you these guarantees either.) There's a trade-off between maintaining balancing at all times and good (and easy to program) concurrent access, so relaxed RB trees are usually used when good concurrency is desired. The relaxation consists in not re-balancing the tree right away. For a somewhat dated (1998) survey see Hanke's ''The Performance of Concurrent Red-Black Tree Algorithms'' [ps.gz].
EDIT to add: Pugh's lock-based skip list, which was benchmarked in Fraser and Harris (2007) "Concurrent Programming Without Lock" as coming close to their own lock-free version (a point amply insisted upon in the top answer here), is also tweaked for good concurrent operation, cf. Pugh's "Concurrent Maintenance of Skip Lists", although in a rather mild way. Nevertheless one newer/2009 paper "A Simple Optimistic skip-list Algorithm" by Herlihy et al., which proposes a supposedly simpler (than Pugh's) lock-based implementation of concurrent skip lists, criticized Pugh for not providing a proof of correctness convincing enough for them. Leaving aside this (maybe too pedantic) qualm, Herlihy et al. show that their simpler lock-based implementation of a skip list actually fails to scale as well as the JDK's lock-free implementation thereof, but only for high contention (50% inserts, 50% deletes and 0% lookups)... which Fraser and Harris didn't test at all; Fraser and Harris only tested 75% lookups, 12.5% inserts and 12.5% deletes (on skip list with ~500K elements). The simpler implementation of Herlihy et al. also comes close to the lock-free solution from the JDK in the case of low contention that they tested (70% lookups, 20% inserts, 10% deletes); they actually beat the lock-free solution for this scenario when they made their skip list big enough, i.e. going from 200K to 2M elements, so that the probability of contention on any lock became negligible. It would have been nice if Herlihy et al. had gotten over their hangup over Pugh's proof and tested his implementation too, but alas they didn't do that.
First, you cannot fairly compare a randomized data structure with one that gives you worst-case guarantees.
Gramoli's conclusion is that's much easier to screw-up a CAS-based concurrent tree implementation than it is to screw up a similar skip list. And based on the figures, it's hard to disagree. His explanation for this fact is:
I think it's perhaps also worth summarizing here the fundamental ideas
of the "no hot spot"/contention-friendly (CF) skip list. It addapts an essential idea from the relaxed RB trees (and similar concrrency friedly data structures): the towers are no longer built up immediately upon insertion, but delayed until there's less contention. Conversely, the deletion of a tall tower can create many contentions;
this was observed as far back as Pugh's 1990 concurrent skip-list paper, which is why Pugh introduced pointer reversal on deletion (a tidbit that Wikipedia's page on skip lists still doesn't mention to this day, alas). The CF skip list takes this a step further and delays deleting the upper levels of a tall tower. Both kinds of delayed operations in CF skip lists are carried out by a (CAS based) separate garbage-collector-like thread, which its authors call the "adapting thread".
One of the more recent improvements on these is the so-called chromatic tree (basically you have some weight such that black would be 1 and red would be zero, but you also allow values in between). And how does a chromatic tree fare against skip list? Let's see what Brown et al. "A General Technique for Non-blocking Trees" (2014) have to say:
Overriding this difficulty was a key concern in Brown et al.'s recent work.
They have a whole separate (2013) paper "Pragmatic Primitives for Non-blocking Data Structures"
on building multi-record LL/SC compound "primitives", which they call LLX/SCX, themselves implemented using (machine-level) CAS. Brown et al. used this LLX/SCX building block in their 2014 (but not in their 2011) concurrent tree implementation.
The Synchrobench code (including all algorithms tested) is available at: https://github.com/gramoli/synchrobench.
The latest Brown et al. implementation (not included in the above) is available at http://www.cs.toronto.edu/~tabrown/chromatic/ConcurrentChromaticTreeMap.java Does anyone have a 32+ core machine available? J/K My point is that you can run these yourselves.
The difficulty in designing a tree that is lock-free stems from
the difficulty of modifying multiple references atomically. Skip lists
consist of towers linked to each other through successor pointers and
in which each node points to the node immediately below it. They are
often considered similar to trees because each node has a successor
in the successor tower and below it, however, a major distinction is
that the downward pointer is generally immutable hence simplifying
the atomic modification of a node. This distinction is probably
the reason why skip lists outperform trees under heavy contention
as observed in Figure [above].
The other way around, you can also have deterministic skip lists which guarantee worst case performance, cf. Munro et al.
with 128 threads, our algorithm outperforms Javas non-blocking skiplist
by 13% to 156%, the lock-based AVL tree of Bronson et al. by 63% to 224%, and a RBT that uses software transactional memory (STM) by 13 to 134 times