Skip List

    Binary search relies on the random-access property of arrays, so it can only be implemented on top of arrays. If the data is stored in a linked list, is it really impossible to use binary search?

    In fact, we only need to modify the linked list slightly to support a search algorithm similar to binary search. We call the transformed data structure a "Skip list" (跳表).

    The Skip list is a dynamic data structure with excellent all-around performance: it supports fast insertion, deletion, and search, it is not complicated to implement, and it can even replace red-black trees.

How to understand Skip list

    For a singly linked list, even if the stored data is ordered, finding a particular value means traversing the list from beginning to end. The search efficiency is therefore very low: the time complexity is O(n).
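
    As a point of reference, here is a minimal sketch of that O(n) search on a sorted singly linked list (the Node and find names here are illustrative, not part of the article's Skip list code):

// Sketch: O(n) search in a sorted singly linked list.
class SortedLinkedListSearch {
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // Walk from the head; stop early once the values exceed the target.
    static Node find(Node head, int value) {
        for (Node p = head; p != null && p.data <= value; p = p.next) {
            if (p.data == value) return p;
        }
        return null; // not found
    }
}
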
[Figure 1]

    What if we build a level of index on the linked list? For every two nodes, we extract one node up to the level above, and we call the extracted level the index or index layer. The down shown in the figure is the down pointer, which points to the corresponding node in the level below.
[Figure 2]

    Suppose we now want to find a node, say 16. We traverse the index layer first. When we reach the node with value 13 in the index layer, we see that its next node is 17, so the node 16 we want must lie between these two nodes. We then descend to the original linked list through the down pointer of the index node and continue traversing there. At this point we only need to traverse 2 more nodes to find the node whose value equals 16. So whereas finding 16 in the original list alone requires traversing 10 nodes, with one level of index we only need to traverse 7 nodes.
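
    To make the "traverse the index, then drop down" idea concrete, here is a rough sketch using explicit down pointers, as described by the figure (the full listing at the end of this article uses a forwards[] array per node instead; the names here are illustrative):

// Sketch: searching a list with index levels linked by "down" pointers.
// Assumes every level starts with a sentinel head node whose value is
// smaller than any stored value, and that heads are linked by down pointers.
class IndexedListSearch {
    static class Node {
        int data;
        Node next;  // next node on the same level
        Node down;  // same value, one level below (null on the bottom level)
        Node(int data) { this.data = data; }
    }

    static Node search(Node topHead, int value) {
        Node p = topHead;
        while (p != null) {
            // Move right while the next node is still smaller than the target.
            while (p.next != null && p.next.data < value) {
                p = p.next;
            }
            if (p.next != null && p.next.data == value) {
                return p.next;  // found on this level
            }
            p = p.down;         // drop to the level below and keep going
        }
        return null;            // fell below the bottom level: not found
    }
}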

    From this example we can see that after adding one level of index, the number of nodes we have to traverse to find a value goes down, which means the search efficiency goes up. So what if we add another level of index? Will the efficiency improve even more?

    Built in the same way as the first-level index, the second-level index takes one node out of every two nodes of the first-level index. If we look up 16 again, we now only need to traverse 6 nodes, so the number of nodes to traverse drops further.
[Figure 3]

    The example above does not contain much data, so even with a two-level index the improvement in search efficiency is not obvious. The following figure shows a linked list of 64 nodes with a five-level index built on it following the same idea.
[Figure 4]

    As can be seen from the figure, without an index, searching for 62 requires traversing 62 nodes; now we only need to traverse 11 nodes, which is much faster. So when the length n of the linked list is fairly large, say 1,000 or 10,000, the efficiency gain from building the index becomes very noticeable.

How fast is the query with Skip list?

    The execution efficiency of an algorithm can be measured by its time complexity. We know that querying a value in a singly linked list takes O(n) time. So in a Skip list with a multi-level index, what is the time complexity of querying a value?

    Let's start with this question: if the linked list has n nodes, how many levels of index will there be?

    Following the construction above, the first-level index has about n/2 nodes, the second-level index about n/4 nodes, and the third-level index about n/8 nodes. In general, the k-th level index has half as many nodes as the (k-1)-th level, so the number of nodes in the k-th level index is n/(2^k).

    Suppose the index has h levels and the topmost level has 2 nodes. From the formula above, n/(2^h) = 2, which gives h = log2(n) - 1. If the original linked list layer is included, the height of the entire Skip list is log2(n). When we query a value in the Skip list, if we have to traverse m nodes at each level, the time complexity of the query is O(m * logn).
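
    Spelling that height calculation out in standard math notation:

\[ \frac{n}{2^{h}} = 2 \;\Longrightarrow\; 2^{h} = \frac{n}{2} \;\Longrightarrow\; h = \log_2 n - 1 \]

    Adding the original linked list layer gives log2(n) levels in total, and traversing at most m nodes on each of them gives the O(m * logn) query cost stated above.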

    What is the value of m? With the index structure described here, each level only requires us to traverse at most 3 nodes, that is, m = 3.

    Suppose the data we are looking for is x. At the k-th level index, after traversing to node y, we find that x is greater than y and smaller than the following node z, so we descend from the k-th level to the (k-1)-th level through y's down pointer. At the (k-1)-th level there are only 3 nodes between y and z (including y and z), so we need to traverse at most 3 nodes at the (k-1)-th level, and by the same reasoning at most 3 nodes at every level.
[Figure 5]

    Through the above analysis we get m = 3, so the time complexity of querying any value in the Skip list is O(logn). This lookup is as fast as binary search; in other words, we have effectively implemented binary search on top of a singly linked list. The improvement in query efficiency, however, rests on having built many levels of index.

Is the Skip list a waste of memory?

    Compared with a plain singly linked list, a Skip list has to store multiple levels of index, which certainly consumes more memory. How much extra storage does it actually take?

    The space complexity analysis of the Skip list is not difficult. Suppose the original linked list has n nodes; then the first-level index has about n/2 nodes, the second-level index about n/4 nodes, and so on, halving at each level until only 2 nodes are left. Written out, the number of nodes at each index level forms a geometric sequence.
[Figure 6]

    The total number of nodes across all index levels is n/2 + n/4 + n/8 + … + 8 + 4 + 2 = n - 2. Therefore the space complexity of the Skip list is O(n). In other words, turning a singly linked list of n nodes into a Skip list requires additional storage for close to n extra nodes.
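
    The sum itself is just a geometric series, which can be checked directly:

\[ \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \dots + 4 + 2 \;=\; \sum_{i=1}^{h} \frac{n}{2^{i}} \;=\; n\left(1 - \frac{1}{2^{h}}\right) \;=\; n - 2 \]

    using the fact that the top index level has n/(2^h) = 2 nodes.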

    What if, instead, we extract one node out of every three (or every five) nodes to the upper-level index? Although the space complexity is still O(n), the index then uses roughly half as much storage for index nodes as the every-two-nodes construction above.
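
    The same calculation for the every-three-nodes scheme shows where that halving comes from:

\[ \frac{n}{3} + \frac{n}{9} + \frac{n}{27} + \dots \;\le\; n \sum_{i=1}^{\infty} \frac{1}{3^{i}} \;=\; \frac{n}{2} \]

    which is about half of the n - 2 index nodes needed when extracting one node out of every two.
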
[Figure 7]

Efficient dynamic insertion and deletion

    We know that in a singly linked list, once the insertion position has been located, inserting a node is very cheap: it takes O(1) time. However, to keep the data in the linked list ordered, we first have to find the position to insert at, and it is this search that is time consuming.
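
    To see why the insertion itself is O(1), here is a minimal sketch of the splice once the predecessor of the insertion point is already known (the names are illustrative, not part of the Skip list code below):

// Sketch: inserting into a sorted singly linked list is O(1)
// once the predecessor of the insertion position is known.
class LinkedListInsertSketch {
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // prev is the node after which the new value should be placed.
    static void insertAfter(Node prev, int value) {
        Node newNode = new Node(value);
        newNode.next = prev.next; // the new node inherits prev's old successor
        prev.next = newNode;      // prev now points to the new node
    }
}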

    For a plain singly linked list, we have to traverse it node by node to find the insertion position. For the Skip list, however, we have already seen that finding a node takes O(logn) time, and finding the position where a value should be inserted works the same way, so it also takes O(logn) time.
[Figure 8]

    As for deletion, if the node to be deleted also appears in the index, then in addition to removing it from the original linked list we must also remove it from each index level. Because deleting from a singly linked list requires the predecessor of the node being deleted (the deletion is completed by redirecting that predecessor's pointer), we must be sure to obtain the predecessor node while searching for the node to delete. Of course, if we use a doubly linked list, this issue does not arise.
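
    A minimal sketch of that pointer operation, assuming the predecessor has already been found (again illustrative names, not part of the listing below):

// Sketch: unlinking a node from a singly linked list via its predecessor.
class LinkedListDeleteSketch {
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // prev is the predecessor of the node to be removed.
    static void deleteAfter(Node prev) {
        if (prev.next != null) {
            prev.next = prev.next.next; // bypass (unlink) the deleted node
        }
    }
}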

Skip list index update

    If we keep inserting data into the Skip list without updating the index, a large amount of data may accumulate between two adjacent index nodes. In the extreme case, the Skip list can even degenerate into a singly linked list.
[Figure 9]
    As a dynamic data structure, the Skip list needs some way of keeping the index in balance with the size of the original linked list. That is, as the linked list gains nodes, index nodes should be added accordingly, so that the complexity does not degrade and drag down the performance of lookup, insertion, and deletion.

    If you are familiar with balanced binary trees such as the red-black tree and the AVL tree, you know that they keep the sizes of the left and right subtrees balanced through left and right rotations; the Skip list instead maintains the "balance" described above through a random function.

    When we insert data into the Skip list, we can choose to insert it into some of the index levels at the same time. How do we decide which index levels it should join?

    We use a random function to decide which levels of the index the node is inserted into. If the random function produces the value k, we add the node to every level from the first-level index up to the k-th level index.
[Figure 10]

    The choice of random function matters a great deal: viewed probabilistically, it has to keep the index size in balance with the data size so that performance does not degrade too badly.
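
    A common way to write such a random function is the textbook coin-flip approach: keep raising the level while a fair coin comes up heads, so a node reaches level k with probability about 1/2^(k-1). Below is a sketch of that idea; the listing in the next section uses a slightly different variant that flips a fixed number of coins.

import java.util.Random;

// Sketch of the textbook "coin flip" level generator for a Skip list:
// each additional level is granted with probability 1/2, capped at MAX_LEVEL.
class RandomLevelSketch {
    private static final int MAX_LEVEL = 16;
    private final Random random = new Random();

    int randomLevel() {
        int level = 1;
        // Keep promoting while the coin comes up heads and the cap is not reached.
        while (random.nextBoolean() && level < MAX_LEVEL) {
            level++;
        }
        return level;
    }
}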

Java code for Skip list

import java.util.Random;

public class SkipList {

    private static final int MAX_LEVEL = 16;

    private int levelCount = 1;

    private Node head = new Node(); // sentinel head node (the list has a dummy head)

    private Random r = new Random();

    public Node find(int value) {
        Node p = head;
        // Walk down from the top level, moving right while the next node is smaller.
        for (int i = levelCount - 1; i >= 0; --i) {
            while (p.forwards[i] != null && p.forwards[i].data < value) {
                p = p.forwards[i];
            }
        }

        if (p.forwards[0] != null && p.forwards[0].data == value) {
            return p.forwards[0];
        } else {
            return null;
        }
    }

    public void insert(int value) {
        int level = randomLevel();
        Node newNode = new Node();
        newNode.data = value;
        newNode.maxLevel = level;
        Node[] update = new Node[level];
        for (int i = 0; i < level; ++i) {
            update[i] = head;
        }

        // Record in update[] the largest node smaller than the inserted value at every level.
        Node p = head;
        for (int i = level - 1; i >= 0; --i) {
            while (p.forwards[i] != null && p.forwards[i].data < value) {
                p = p.forwards[i];
            }
            update[i] = p; // update[] saves the nodes on the search path
        }

        // Splice the new node in: it takes over the forwards (next) pointers
        // of the search-path nodes at each of its levels.
        for (int i = 0; i < level; ++i) {
            newNode.forwards[i] = update[i].forwards[i];
            update[i].forwards[i] = newNode;
        }

        // Update the list height.
        if (levelCount < level) levelCount = level;
    }

    public void delete(int value) {
        Node[] update = new Node[levelCount];
        Node p = head;
        // Record the predecessor of the target value at every level.
        for (int i = levelCount - 1; i >= 0; --i) {
            while (p.forwards[i] != null && p.forwards[i].data < value) {
                p = p.forwards[i];
            }
            update[i] = p;
        }

        if (p.forwards[0] != null && p.forwards[0].data == value) {
            for (int i = levelCount - 1; i >= 0; --i) {
                if (update[i].forwards[i] != null && update[i].forwards[i].data == value) {
                    update[i].forwards[i] = update[i].forwards[i].forwards[i];
                }
            }
        }
    }

    // Flip a fair coin MAX_LEVEL - 1 times; each heads raises the level by 1,
    // so higher levels become increasingly unlikely.
    private int randomLevel() {
        int level = 1;
        for (int i = 1; i < MAX_LEVEL; ++i) {
            if (r.nextInt(2) == 1) { // nextInt(2) gives a fair 0/1 coin flip
                level++;
            }
        }

        return level;
    }

    public void printAll() {
        Node p = head;
        while (p.forwards[0] != null) {
            System.out.print(p.forwards[0] + " ");
            p = p.forwards[0];
        }
        System.out.println();
    }

    public class Node {
        private int data = -1;
        private Node[] forwards = new Node[MAX_LEVEL];
        private int maxLevel = 0;

        @Override
        public String toString() {
            StringBuilder builder = new StringBuilder();
            builder.append("{ data: ");
            builder.append(data);
            builder.append("; levels: ");
            builder.append(maxLevel);
            builder.append(" }");

            return builder.toString();
        }
    }
}
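
    A small usage sketch (a hypothetical driver class, not part of the original listing; it assumes the SkipList class above is compiled alongside it):

// Hypothetical demo driver for the SkipList class above.
public class SkipListDemo {
    public static void main(String[] args) {
        SkipList list = new SkipList();
        for (int v : new int[]{3, 7, 1, 9, 5}) {
            list.insert(v);
        }
        list.printAll();                  // bottom level printed in sorted order

        System.out.println(list.find(7)); // the node containing 7, or null
        list.delete(7);
        System.out.println(list.find(7)); // null after deletion
    }
}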