Table of Contents
- Understanding Data Structures Basics
- Why Build a Custom Data Structure?
- Choosing a Custom Data Structure: Skip List
- Planning the Skip List Structure
- Implementing Core Components
- 5.1 Node Class
- 5.2 SkipList Class Initialization
- 5.3 Random Level Generation
- 5.4 Insert Method
- 5.5 Search Method
- 5.6 Delete Method
- Adding Advanced Features
- 6.1 String Representation (
__str__) - 6.2 Handling Duplicates
- 6.1 String Representation (
- Testing the Skip List
- Optimizations and Considerations
- Conclusion
- References
1. Understanding Data Structures Basics
Before diving into custom structures, let’s recap foundational concepts:
- Data Structure: A specialized format for organizing, processing, retrieving, and storing data.
- Key Operations: Insertion, deletion, search, traversal, and sorting.
- Performance Metrics: Time complexity (worst/average case) and space complexity.
Python’s built-in structures have tradeoffs:
- Lists: O(1) access by index, but O(n) insertion/deletion in the middle.
- Dictionaries: O(1) average search/insert, but unordered (prior to Python 3.7) and no inherent ordering for range queries.
- Sets: O(1) membership checks, but no duplicates and unordered.
When these tradeoffs don’t align with your needs (e.g., fast ordered search/insert), building a custom structure becomes necessary.
2. Why Build a Custom Data Structure?
You might build a custom data structure if:
- Performance: Built-in structures are too slow for your use case (e.g., high-frequency trading needs sub-millisecond lookups).
- Specialized Behavior: You need unique operations (e.g., a priority queue with dynamic reordering).
- Educational Purpose: To deepen understanding of algorithms and data structure internals.
3. Choosing a Custom Data Structure: Skip List
For this tutorial, we’ll implement a Skip List—a probabilistic data structure invented by William Pugh in 1989. It balances the simplicity of linked lists with the performance of balanced trees (e.g., AVL trees).
Why Skip Lists?
- Average O(log n) time for search, insertion, and deletion (same as balanced trees).
- Simplicity: Easier to implement than balanced trees (no complex rotation logic).
- Ordered: Maintains elements in sorted order, enabling range queries.
Use Cases: Databases (e.g., Redis uses Skip Lists for sorted sets), in-memory caches, and applications requiring fast ordered operations.
4. Planning the Skip List Structure
A Skip List consists of multiple levels of linked lists. The bottom level (level 0) is a sorted singly linked list. Higher levels act as “express lanes” for faster traversal, skipping over many nodes.
Key Components:
- Node: Stores a value and an array of
forwardpointers (one per level the node appears in). - Header: A dummy node at the start of the list to simplify traversal.
- Max Level: The maximum number of levels allowed (prevents excessive memory usage).
- Probability: The chance a node is promoted to a higher level (typically 0.5).
5. Implementing Core Components
Let’s build the Skip List step by step.
5.1 Node Class
Each node contains a value and a forward array (pointers to the next node at each level).
class Node:
def __init__(self, value, level):
self.value = value # Value stored in the node
self.forward = [None] * level # Forward pointers for each level
5.2 SkipList Class Initialization
The SkipList class manages the overall structure, including the header, max level, and probability.
import random
class SkipList:
def __init__(self, max_level=16, probability=0.5):
self.max_level = max_level # Maximum number of levels
self.probability = probability # Probability of promoting a node to higher level
self.header = Node(value=None, level=max_level) # Dummy header node
self.level = 1 # Current highest level with nodes (starts at 1)
5.3 Random Level Generation
Nodes are promoted to higher levels randomly to maintain balance. The random_level method generates a level for a new node:
def random_level(self):
level = 1
# Flip a coin (probability) until we exceed max_level or fail the coin flip
while random.random() < self.probability and level < self.max_level:
level += 1
return level
Why? This ensures higher levels have exponentially fewer nodes, keeping traversal efficient.
5.4 Insert Method
To insert a value:
- Traverse from the highest level down to level 0, tracking nodes where the new node will be inserted (
updatearray). - Generate a random level for the new node.
- Insert the node by updating
forwardpointers in theupdatearray.
def insert(self, value):
update = [None] * self.max_level # Tracks nodes to update at each level
current = self.header
# Step 1: Traverse to find insertion point
for i in range(self.level - 1, -1, -1):
# Move forward while next node's value is less than target
while current.forward[i] and current.forward[i].value < value:
current = current.forward[i]
update[i] = current # Save current node as update for level i
# Step 2: Generate random level for the new node
new_level = self.random_level()
# If new level is higher than current max level, update update and current level
if new_level > self.level:
for i in range(self.level, new_level):
update[i] = self.header
self.level = new_level
# Step 3: Create new node and update forward pointers
new_node = Node(value=value, level=new_level)
for i in range(new_level):
new_node.forward[i] = update[i].forward[i] # New node points to update's next
update[i].forward[i] = new_node # Update points to new node
5.5 Search Method
To search for a value:
- Start at the highest level and traverse forward until the next node’s value exceeds the target.
- Move down a level and repeat until level 0.
- If the next node at level 0 has the target value, return it; otherwise, return
None.
def search(self, value):
current = self.header
# Traverse from highest level down to level 0
for i in range(self.level - 1, -1, -1):
# Move forward while next node's value is less than target
while current.forward[i] and current.forward[i].value < value:
current = current.forward[i]
# Move to level 0's next node to check
current = current.forward[0]
# If found, return the value; else, None
if current and current.value == value:
return current.value
return None
5.6 Delete Method
To delete a value:
- Traverse the list to find the node and populate the
updatearray (similar to insertion). - If the node exists, update the
forwardpointers of theupdatenodes to skip the deleted node.
def delete(self, value):
update = [None] * self.max_level
current = self.header
# Step 1: Traverse to find the node to delete
for i in range(self.level - 1, -1, -1):
while current.forward[i] and current.forward[i].value < value:
current = current.forward[i]
update[i] = current
# Check if the node exists at level 0
current = current.forward[0]
if current is None or current.value != value:
print(f"Value {value} not found in the skip list.")
return
# Step 2: Update forward pointers to skip the node
for i in range(self.level):
# If update[i]'s next is not current, no change needed at this level
if update[i].forward[i] != current:
break
update[i].forward[i] = current.forward[i]
# Step 3: Update the current highest level if needed
while self.level > 1 and self.header.forward[self.level - 1] is None:
self.level -= 1
6. Adding Advanced Features
6.1 String Representation (__str__)
Visualizing the Skip List helps debugging. Add a __str__ method to print each level:
def __str__(self):
result = []
for i in range(self.level):
current = self.header.forward[i]
level_str = f"Level {i}: "
while current:
level_str += f"{current.value} -> "
current = current.forward[i]
level_str += "None"
result.append(level_str)
return "\n".join(result)
6.2 Handling Duplicates
To allow duplicates, modify the insert method’s traversal condition from < to <= (insert duplicates after existing values):
# In insert(): Change the while loop condition to allow duplicates
while current.forward[i] and current.forward[i].value <= value: # Use <= instead of <
current = current.forward[i]
7. Testing the Skip List
Let’s test insertion, search, deletion, and visualization:
# Test the Skip List
if __name__ == "__main__":
skip_list = SkipList(max_level=4)
# Insert values
values = [3, 6, 7, 9, 12, 19, 17, 26, 21, 25]
for val in values:
skip_list.insert(val)
print("Skip List after insertions:")
print(skip_list)
# Output (example; levels may vary due to randomness):
# Level 0: 3 -> 6 -> 7 -> 9 -> 12 -> 17 -> 19 -> 21 -> 25 -> 26 -> None
# Level 1: 3 -> 7 -> 12 -> 19 -> 25 -> None
# Level 2: 7 -> 19 -> None
# Level 3: 19 -> None
# Search test
print("\nSearch for 12:", skip_list.search(12)) # Output: 12
print("Search for 10:", skip_list.search(10)) # Output: None
# Delete test
skip_list.delete(12)
print("\nSkip List after deleting 12:")
print(skip_list)
# Level 0: 3 -> 6 -> 7 -> 9 -> 17 -> 19 -> 21 -> 25 -> 26 -> None
# (Higher levels may adjust if 12 was in them)
8. Optimizations and Considerations
- Dynamic Max Level: Instead of a fixed
max_level, adjust it based on the number of elements (e.g.,max_level = log2(n) + 1). - Probability Tuning: Using a probability higher than 0.5 (e.g., 0.75) increases levels but may slow traversal; 0.5 is standard.
- Memory Efficiency: Use a list of pointers per node instead of separate attributes to save space.
9. Conclusion
Building a Skip List from scratch demonstrates how custom data structures balance simplicity and performance. You’ve learned to:
- Design core components (nodes, levels, header).
- Implement key operations (insert, search, delete).
- Handle randomness to maintain efficiency.
Next, explore other structures like B-trees, LRU caches, or segment trees to expand your toolkit!
10. References
- Pugh, W. (1990). Skip Lists: A Probabilistic Alternative to Balanced Trees. Communications of the ACM.
- Python Official Documentation
- Redis Sorted Set Implementation (uses Skip Lists)
- Cormen, T. H., et al. (2009). Introduction to Algorithms (3rd ed.). MIT Press.