Java has a lock-free concurrent skip list in the ConcurrentSkipListMap class [1]. Herlihy et al. claim that existing skip list algorithms (including the Java implementation) are complicated, and instead provide a supposedly faster implementation [2].
The two techniques they use are:
- Optimistic traversal of lists without grabbing a lock: only when the traversal reaches the item being sought does it grab the lock and validate that the list is unchanged.
- Lazy deletion: marking a node as deleted and removing it from the list later.
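For illustration, here is a minimal Java sketch of those two ideas on a plain sorted linked list (a single level, not the authors' full skip list); the Node layout, the `marked` flag, and the locking discipline are my own simplification of the general lazy-synchronization pattern, not the paper's code:

    import java.util.concurrent.locks.ReentrantLock;

    final class Node {
        final int key;
        volatile Node next;
        volatile boolean marked;                      // logically deleted?
        final ReentrantLock lock = new ReentrantLock();
        Node(int key) { this.key = key; }
    }

    final class LazyList {
        private final Node head = new Node(Integer.MIN_VALUE);   // sentinel

        boolean remove(int key) {
            while (true) {
                // optimistic traversal: no locks taken while searching
                Node pred = head, curr = head.next;
                while (curr != null && curr.key < key) { pred = curr; curr = curr.next; }
                if (curr == null || curr.key != key) return false;
                pred.lock.lock(); curr.lock.lock();
                try {
                    // validate: nothing was removed or relinked behind our back
                    if (!pred.marked && !curr.marked && pred.next == curr) {
                        curr.marked = true;           // lazy deletion: mark first...
                        pred.next = curr.next;        // ...then physically unlink
                        return true;
                    }
                } finally {
                    curr.lock.unlock(); pred.lock.unlock();
                }
                // validation failed: another thread interfered, so retry
            }
        }
    }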
Redis uses a skip list (doubly linked) implementation for sorted sets. Here are antirez's (the Redis BDFL's) comments from 7 years ago on why:
There are a few reasons:
1) They are not very memory intensive. It's up to you, basically. Changing parameters about the probability of a node having a given number of levels will make them less memory intensive than btrees.
2) A sorted set is often the target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kinds of balanced trees.
3) They are simpler to implement, debug, and so forth. For instance, thanks to the skip list's simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required few changes to the code.
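The augmented skip list from point 3 works by storing, on every forward link, how many bottom-level nodes that link skips over; a rank query then just sums those spans during the usual top-down search. A rough Java sketch of the idea (the node layout and names are mine, not Redis's):

    final class ZNode {
        double score;
        ZNode[] forward;   // forward[i] = next node at level i
        int[] span;        // span[i]    = number of level-0 steps forward[i] jumps over
    }

    // Rank of `score` = number of elements with a strictly smaller score,
    // accumulated while doing the ordinary O(log n) descent from the top level.
    static int rank(ZNode head, int levels, double score) {
        int rank = 0;
        ZNode x = head;
        for (int i = levels - 1; i >= 0; i--) {
            while (x.forward[i] != null && x.forward[i].score < score) {
                rank += x.span[i];
                x = x.forward[i];
            }
        }
        return rank;
    }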
Data structures in Redis have evolved a bit in the last 7 years. One of the main workhorses is indeed something called a "ziplist". The implementation (especially the format description at the top) is worth a read: https://github.com/antirez/redis/blob/90a6f7fc98df849a9890ab...
For those who are familiar with pre-2.6 Redis code, there used to be something called zipmap, but it was deprecated in favor of ziplist.
I have not looked at the Redis code in eons, but I think that's not exactly true - all the structures are still there; it's just that Redis will favor a ziplist for the very small ones.
A ziplist is a list of compact, variable-length integers (or strings) where only the bits that make a difference are stored (I contributed the 24-bit version to Redis[1]). The argument is that a sequential search over a small data structure always wins over more complex things like hash tables, skip lists, etc.
As you get more data, Redis will switch to using skip lists or whatever - there are config settings for that (-max-ziplist-entries and -max-ziplist-value).
Elaborating on gtrubetskoy's comment, ziplists are doubly linked lists (of integers and strings) encoded into an array of bytes that stores the size of both the next and previous element, so you can quickly find the location of the next/previous element. This makes ziplists space efficient and gives them great cache efficiency, but expensive to insert into, since you need to reallocate the array and shift a potentially large number of elements over every time you insert/delete. These properties make ziplists useful when they are small, but not as they get larger, so Redis will switch to a different implementation once a ziplist reaches a certain size.
Interestingly, Redis lists are implemented as a doubly linked list of ziplists, where the size of each ziplist is bounded. This gives you good caching as the quicklist is being traversed, while limiting the cost of an insertion. Redis will also compress the ziplists that are not near the ends of the list in order to save space.
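As a much-simplified illustration of the layout being described (this is not Redis's actual encoding, which uses variable-width headers): entries sit back to back in one byte array, and each entry records the length of the entry before it so the blob can be walked in both directions. Appending means reallocating and copying the whole array, which is exactly the insertion cost mentioned above.

    import java.nio.ByteBuffer;

    final class TinyZiplist {
        // | prevEntryLen (4B) | payloadLen (4B) | payload ... | prevEntryLen | payloadLen | payload ...
        static byte[] append(byte[] zl, int lastEntryLen, byte[] payload) {
            ByteBuffer out = ByteBuffer.allocate(zl.length + 8 + payload.length);
            out.put(zl);
            out.putInt(lastEntryLen);       // lets you step backwards to the previous entry
            out.putInt(payload.length);     // lets you step forwards past this entry
            out.put(payload);
            return out.array();             // the whole blob is copied on every append
        }
    }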
Disk access (reads and writes) still happen in chunks, even with SSDs. It's much more efficient to write entire chunks of data out than to write a few hundred bytes over a dozen different chunks.
The requirement of writing data out to disk in chunks lines up naturally with B-trees, which makes them very popular (and performant) in database implementations.
And this doesn't even cover how fast CPUs are at operating over a contiguous block of data. The complexity may be O(n), but the constant factor is so small that it can outperform an O(log(n)) algorithm that incurs cache misses.
> Disk access (reads and writes) still happen in chunks, even with SSDs. It's much more efficient to write entire chunks of data out than to write a few hundred bytes over a dozen different chunks.
To add to that: memory access also happens in chunks. Currently the chunk (cache line) size is 64 bytes on almost all x86 and ARM CPUs; it's the smallest quantity of data you can read from or write to RAM.
CPUs can also efficiently prefetch when they detect sequential access patterns, so accessing large contiguous blocks of memory is preferable to achieve high performance.
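A quick way to see the effect (numbers vary by machine; this is only a rough illustration): sum the same array once in memory order and once through a shuffled index. The sequential pass is typically several times faster, purely because of cache lines and prefetching.

    import java.util.Random;

    public class ScanDemo {
        public static void main(String[] args) {
            int n = 1 << 24;
            int[] data = new int[n];
            int[] idx = new int[n];
            for (int i = 0; i < n; i++) { data[i] = i; idx[i] = i; }
            Random rnd = new Random(42);
            for (int i = n - 1; i > 0; i--) {            // Fisher-Yates shuffle of the index array
                int j = rnd.nextInt(i + 1);
                int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
            }

            long t0 = System.nanoTime();
            long seq = 0;
            for (int i = 0; i < n; i++) seq += data[i];         // sequential: prefetch-friendly
            long t1 = System.nanoTime();
            long rand = 0;
            for (int i = 0; i < n; i++) rand += data[idx[i]];   // shuffled: ~one cache miss per element
            long t2 = System.nanoTime();

            System.out.printf("sequential %d ms, shuffled %d ms (sums %d / %d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq, rand);
        }
    }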
It's a virtual linear log file which gets split occasionally by appending a sorted list of pointers to the end and "closing" that log file, along with occasional tasks repacking the log files together.
Individual records (either key/value or key/tombstone) have a pointer showing how far back they cast a "shadow" so you know how long you need to keep tombstones for.
Since only the active log file can ever have changes made, you can repack any two adjacent files together into a single compacted log file at any time, removing any duplicates or tombstones that don't cast a shadow outside the range being compacted.
While I really do like this article and the really interesting techniques it showed me, I'm bothered by the fact that they constantly mention performance but there's not a single benchmark on that page. It's not that I don't believe that these are huge theoretical gains, it's just that I'd like to see how these perform in practice compared to other approaches.
Skip lists are a hidden gem! I first learned about them from a GitHub developer at OSCon a few years ago, and at the time they seemed like black magic.
It's a shame that universities (or at least mine) don't teach more interesting algorithms like this... it would make the algorithms/data structures coursework so much more interesting.
It was part of my normal university computer science data structures course. Was actually one of the structures we had to implement. Also, for all the "hard to get right" comments I remember them being much easier to write than RB Trees or B Trees.
Georgia Tech definitely teaches it in the intro data structures and algorithms course. I don't remember if we had to implement it, but it definitely came up in lecture.
WUSTL does teach skip lists. We spent a good two weeks learning about them: intro, analysis, a lab, and a test on skip lists. They're very cool and hard to get right.
UT Austin's multicore computing course had an optional project to benchmark concurrent skip lists or implement faster skip list algorithms. My group aimed to improve element access times by boosting the height of frequently accessed elements.
Ideas like SkipNet are what differentiate a hack like me from genuinely smart people who can look at the definition of an in-memory data structure and spot properties useful to an Internet-sized system.
Skip lists are probably the easiest way of getting O(log(n)) lookups on ordered lists.
Recently I got rid of a huge bottleneck in an oldish piece of software by moving from a vanilla linked list to a skip list. I also did many of the things suggested in the article (e.g. having a fixed-length vector for the pointers, and thus a fixed height).
Funnily enough, for my case, I managed to do without an RNG just fine. I just make an element of height k every 2^(3k) insertions (e.g. every 8th insertion is 1 level tall; every 64th insertion is 2 levels tall; and so on).
For my particular pattern of insertions, this proved to be more than enough, and it simplified things a little bit.
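In code, that RNG-free scheme amounts to something like this (a sketch; heightFor is my own name for it):

    // Height of the k-th inserted element (k >= 1): the number of times 8 divides k,
    // so insertions 8, 16, 24, ... are at least 1 level tall, 64, 128, ... at least 2, etc.
    static int heightFor(long k) {
        int h = 0;
        while (k > 0 && k % 8 == 0) { h++; k /= 8; }
        return h;
    }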
> Skip lists are probably the easiest way of getting O(log(n)) lookups on ordered lists.
I wouldn't say that. The algorithms for AVL/red-black/splay trees are not especially complex, and they contain much less subtlety than skip lists. If you just look them up on Wikipedia and see the list of operations, it's not particularly hard to implement them. Skip lists, on the other hand, are very easy to get wrong (this is the premise of the linked article, after all).
Using a skip list is frequently the better choice for performance and memory reasons (especially if you're iterating through the list), but they are not necessarily simpler data structures.
I did a brief survey before deciding to go with skip lists, and found them to be easier to grasp and reason about.
(e.g. there is no need to balance trees; a skip list is very similar to a linked list, and linked lists are very simple).
I remember somebody saying that "in a sane world skip lists would always have been discovered before red-black trees".
This historical accident (skip lists were only discovered in 1989, while rb-trees date back to the 1970s) is probably the reason skip list use is not more widespread, and why they are seen as somewhat exotic at times.
Binary search tree rotations are hard to implement correctly, but it is relatively easy to test that you have them right.
There are a lot of subtleties in any stochastic algorithm (like a skip list) that are much harder to test for correctly.
While we're discussing "easy to implement" algorithms, I find Tries (aka radix trees) to be by far the easiest to implement for both ordered and unordered sets/maps. Naive implementations are very space inefficient though.
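For comparison, here is roughly how small a naive trie over byte strings can be; the 256-wide child array per node is exactly where the space inefficiency comes from.

    final class Trie {
        private final Trie[] child = new Trie[256];
        private boolean present;

        void insert(String key) {
            Trie node = this;
            for (int i = 0; i < key.length(); i++) {
                int c = key.charAt(i) & 0xFF;              // treat each char's low byte as the edge label
                if (node.child[c] == null) node.child[c] = new Trie();
                node = node.child[c];
            }
            node.present = true;
        }

        boolean contains(String key) {
            Trie node = this;
            for (int i = 0; i < key.length(); i++) {
                node = node.child[key.charAt(i) & 0xFF];
                if (node == null) return false;
            }
            return node.present;
        }
    }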
> The algorithms for AVL/red-black/splay trees are not especially complex
and can be coded once, then used in millions of programs. That whole point is largely moot.
A red-black tree gives you certain guarantees without any probabilistic arguments. Wikipedia: A skip list does not provide the same absolute worst-case performance guarantees as more traditional balanced tree data structures, because it is always possible (though with very low probability) that the coin-flips used to build the skip list will produce a badly balanced structure.
A skip list algorithm which guards against the worst case will necessarily have to do some rebalancing that requires nodes to move among levels according to some heuristic.
Skip lists are not storage efficient unless the nodes are variably sized. If each node includes enough pointers to participate in any level, then you waste space. Variable size means we cannot draw nodes from a single, compact pool array of nodes for a skip list. Variable size also creates difficulties if we want to use this as an "intrusive container". (A container which works by inserting the necessary node links and other fields into the client's data structure, so that the client data structures directly link into the graph as nodes, rather than being separately allocated and referenced from it.)
This uses the C struct hack for variable node sizing, which makes it rely on malloc for the nodes. A client cannot pass in a pre-allocated node.
A mistake in this type of code can lead to a buffer overflow: accessing the n-th level of a node which only has k < n levels' worth of pointers.
If logic were added which moves a node from one skip level to another, that would require the entire node to be realloc'ed, if its number of links needs to increase. Realloc-ing can change its address; all existing pointers to the node must be rewritten.
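For what it's worth, the Java-ish equivalent of that variable sizing is a per-node link array whose length equals the node's height (my own sketch; it loses the single-allocation property the C struct hack gives you, and reading next[level] for level >= next.length is exactly the out-of-bounds bug described above):

    final class SkipNode<K, V> {
        final K key;
        V value;
        final SkipNode<K, V>[] next;      // length == this node's height; no slots wasted,
                                          // but nodes of different heights have different sizes
        @SuppressWarnings("unchecked")
        SkipNode(K key, V value, int height) {
            this.key = key;
            this.value = value;
            this.next = (SkipNode<K, V>[]) new SkipNode[height];
        }
    }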
Another consideration regarding memory: a binary tree with no parent pointers has two pointers per node. (Traversal then needs temporary space, due to recursion or iteration with an explicit stack.)
However, a binary tree can be traversed in O(n) in both directions.
A skip list with a singly-linked level zero cannot be traversed in reverse.
If we make it doubly linked to meet the requirement for reverse traversal then ... it has two pointers per node and then some. There goes your storage advantage.
Actually, you can. As per the article, you can view the skip list as a tree when it is turned an eighth of a turn.
So you can start at the head, put it on the stack, move to the next node referenced at the highest level, put it on the stack, etc., just like you can start from the top of the tree, put it on the stack, move right, put it on the stack, etc.
The advantage of the skip list over the tree is that you can traverse forward in O(1) space. But both are equally bad in reverse.
The pseudocode + example given for the initial implementation of 'insert' doesn't seem right: no 'below' connection is made between the two '13' nodes.
The way to fix this would be to let 'insert' return the node inserted at the given level. There already are a few returns in the 'insert' pseudocode, but it doesn't look like anything is actually being returned.
Something like this:
    -- Recursive skip list insertion function.
    define insert(elem, root, height, level):
        if right of root < elem:
            return insert(elem, right of root, height, level)
        else:
            if level = 0:
                new ← makenode elem
                old ← right of root
                right of root ← new
                right of new ← old
                return new
            else:
                if level ≤ height:
                    new ← makenode elem
                    old ← right of root
                    right of root ← new
                    right of new ← old
                    below new ← insert(elem, below root, height, level - 1)
                    return new
                else:
                    return insert(elem, below root, height, level - 1)
Or more simply:
    define insert(elem, root, height, level):
        if right of root < elem:
            return insert(elem, right of root, height, level)
        elif level > height:
            return insert(elem, below root, height, level - 1)
        else:
            new ← makenode elem
            old ← right of root
            right of root ← new
            right of new ← old
            if level > 0:
                below new ← insert(elem, below root, height, level - 1)
            return new
That first chunk will get split, the 3 will get copied to a new chunk, and the list is restructured. I think the selling point here is that you pay this price once but get dividends from reduced cache misses in the future.
Interesting. Whether that approach is efficient depends entirely on the workload. That complicates the analysis. (And one of the major benefits of skiplists is that the analysis is supposed to be simple.)
I just came across skip lists as part of a sliding-window aggregator in Apache Samza's metrics system[0]. So cool! Being able to work with a range of values in somewhere between O(1) and O(log n) time, while still providing ordered key/value mappings, is great. I dug through java.util.concurrent looking for more structures, but this seems to be the most interesting of the bunch.
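For anyone curious, the ranged access that makes it handy for a sliding window looks roughly like this with ConcurrentSkipListMap (the timestamps and values here are made up):

    import java.util.concurrent.ConcurrentNavigableMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class WindowDemo {
        public static void main(String[] args) {
            ConcurrentSkipListMap<Long, Double> samples = new ConcurrentSkipListMap<>();
            samples.put(1_000L, 0.5);
            samples.put(2_000L, 0.25);
            samples.put(3_500L, 0.5);
            samples.put(9_000L, 0.1);

            // all samples with 1_500 <= timestamp < 5_000, already in key order
            ConcurrentNavigableMap<Long, Double> window = samples.subMap(1_500L, 5_000L);
            double sum = window.values().stream().mapToDouble(Double::doubleValue).sum();
            System.out.println(window + "  sum=" + sum);   // {2000=0.25, 3500=0.5}  sum=0.75
        }
    }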
Something to note: you state that BSTs have a logarithmic-time in-order successor operation, but many implementations use threaded binary trees, which provide an amortized constant-time successor operation. Great article though.
It sounds like they mean that, if you have all the data you want for your skiplist and all you need going forward is a read-only data structure for doing lookups, then you can put your data in order in a regular list and then build up the towers of skip links in deterministic fashion to get an optimally balanced distribution.
Doesn't seem like much of a trick, though. If you can do that, then you can just stick it all in an array and bsearch to find things.
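For concreteness, the deterministic build being described could look like this (a sketch, as I read it; each tower height comes from the element's 1-based position in the sorted input instead of a coin flip):

    // Height of the tower for the element at 1-based position i in the sorted data:
    // every 2nd element gets at least 1 extra level, every 4th at least 2, and so on,
    // giving the perfectly balanced links a random skip list only has in expectation.
    static int deterministicHeight(int i) {
        return Integer.numberOfTrailingZeros(i);   // 1->0, 2->1, 3->0, 4->2, 5->0, 6->1, 7->0, 8->3 ...
    }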
I initially thought it might be referring to deterministic skip lists (Munro, Papadakis, Sedgewick, http://www.ic.unicamp.br/~celio/peer2peer/skip-net-graph/det...), but the "linking on traversal" part stumps me. Maybe it's referring to the top-down approach in section 3 of the paper.
A skip list node contains one key and has a random number of forward links (its height); a B-tree node contains 1 to N keys and has fanout equal to <number of keys> + 1.
A B-tree guarantees O(log(N)) lookup/insert/delete time; a skip list achieves O(log(N)) lookup/insert/delete with high probability, but does not guarantee it.
The advantage of skip lists over B-trees is that they can have a smaller constant factor, and they are more cache friendly under many workloads.
Another nice thing about skip lists is that you can traverse them backwards, without adding back-pointers, using a recursive algorithm and an O(log n) size stack.
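A sketch of that recursive trick (Node here is a minimal stand-in, not any particular implementation's type): visit everything after a node before the node itself, dropping one level as you go. The stack depth is the expected O(log n) mentioned, though a badly balanced list can make it worse.

    import java.util.function.Consumer;

    final class Node {
        int key;
        Node[] next;   // forward links; next.length == this node's height
    }

    // Visits, in descending key order, every level-0 node strictly between `from` and `to`.
    // Call as reverseVisit(head, null, topLevel, visit) to walk the whole list backwards.
    static void reverseVisit(Node from, Node to, int level, Consumer<Node> visit) {
        if (level < 0) return;
        Node n = from.next[level];
        if (n == null || n == to) {
            reverseVisit(from, to, level - 1, visit);   // nothing at this level: drop down
        } else {
            reverseVisit(n, to, level, visit);          // everything after n first...
            visit.accept(n);                            // ...then n itself...
            reverseVisit(from, n, level - 1, visit);    // ...then the nodes between from and n
        }
    }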
I always say college was very practical learning: at different points in time I was able to kill a mouse with both algorithms (this very book) and computer architecture.
In Lisp, most people would just, you know, not use a list for that. You can do that, you know...
The only reason I asked about lispers is that lispers really like lisps, and I was curious if any of them had implemented skip lists for some reason, and if so, how.
Yes, I know. Do you? You made the suggestion that skip lists are inferior to ordered lists, not me. (good for "increasing access times on ordered lists") I'm still perplexed as to your rationale for that statement.
This was my understanding of that comment. How else could it be read? I am not able to think of another interpretation of that statement, or of a context in which increased access times would be desired.
Skip lists are container data structures for keeping items in order, an alternative to various balanced trees like red-black.
The funny thing is how rarely you need that sort of thing if you have an unordered associative container, and a sorting function.
It's possible to combine a hash table with a list. E.g. the hash table references not the items directly, but the conses of a list in which those items are stored as the cdr. Then you have keyed random access to the list, as well as a particular order. Finding the spot where to insert is linear, of course.
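In Java terms (the Lisp version would hold conses in the table instead), that combination looks something like this rough sketch: the map gives O(1) keyed access to a node, the links give the explicit order, and finding an ordered insertion point is still a linear walk.

    import java.util.HashMap;
    import java.util.Map;

    final class KeyedList<K, V> {
        static final class Cell<K, V> {
            final K key; V value; Cell<K, V> prev, next;
            Cell(K key, V value) { this.key = key; this.value = value; }
        }
        private final Map<K, Cell<K, V>> index = new HashMap<>();
        private Cell<K, V> head, tail;

        void addLast(K key, V value) {                  // append, preserving arrival order
            Cell<K, V> c = new Cell<>(key, value);
            index.put(key, c);
            if (tail == null) { head = tail = c; }
            else { tail.next = c; c.prev = tail; tail = c; }
        }

        V get(K key) {                                  // keyed random access into the list
            Cell<K, V> c = index.get(key);
            return c == null ? null : c.value;
        }
    }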
A container which keeps items in order is useful primarily for real-time programming, for implementing a priority queue: the situation is that a dynamic set of items is in constant churn, and needs to be in order all the time (where the order is different from the order in which those items come to life or arrive or whatever). For performance, we don't want to just add an item to the end of the queue and then call sort.
If we don't care about this real time aspect, we can just accumulate items and then sort them. Even if we sort a list several times, that may not be of consequence if it is bounded to a small length. Unix treats directories like this: the "dents" are not ordered in any way: globbing expansion and the "ls" utility accumulate names and sort them. Unix could keep directories in sorted order (and there are, I think, some file systems which actually do).
There is an anti-pattern of abusing sorting when people don't have an associative container. An example of this is shell scripts which use sorting to help de-duplicate items, instead of an obvious awk job with an associative array. Another anti-pattern occurred in C++: for a long time C++ had only std::map and std::set, which are ordered containers. These were used all the time, even when order was not required (now there is std::unordered_map).
In the late 1990s, I wrote a pretty nice red-black tree data structure in C, which ended up used in various places, such as in the ext2fs tools. One company I worked for used it as the basis of a "real-time database" for software running on a switch node for real-time packet inspection and processing. The implementation had to be modified to allow for shared memory.
I don't miss that kind of container in everyday Lisp programming somehow. I have a several years old branch where that code is half-added to my Lisp dialect. There isn't much motivation to finish the job; the need for that just somehow doesn't arise. Most of the time, either you're dealing with dynamic sets which are unordered (by definition of "set", practically), or else with lists which are ordered, but don't have to be constructed by out-of-order insertions. A lot of the time, the order of a list doesn't actually follow any predicate: the only specification of order is that list instance itself and its actual order. (The order came from somewhere, but its reason is not necessarily known, only that it should be preserved.)
ANSI CL doesn't have any optimized ordered list container, only hashes. Why? It seems like nothing notable of that sort originated in any of the base dialects from which CL was formed.
Suppose you needed some real-time database for some service application: it would make a lot of sense to just pick one and make Lisp bindings to it.
[1] https://docs.oracle.com/javase/8/docs/api/java/util/concurre...
[2] http://people.csail.mit.edu/shanir/publications/LazySkipList...