codelessgenie guide

How to Build a Custom Data Structure from Scratch in Python

Data structures are the backbone of efficient programming, enabling us to store, organize, and manipulate data effectively. Python provides a rich set of built-in data structures like lists, dictionaries, and sets, but there are scenarios where these tools fall short. For example, if you need faster insertion/deletion than a list, or more flexible indexing than a dictionary, a **custom data structure** might be the solution. Building a custom data structure from scratch is not just a coding exercise—it deepens your understanding of how data structures work under the hood and equips you to solve specialized problems. In this blog, we’ll walk through creating a **Skip List**—a probabilistic data structure that offers average O(log n) time complexity for search, insertion, and deletion (comparable to balanced trees like AVL or Red-Black trees but simpler to implement).

Table of Contents

  1. Understanding Data Structures Basics
  2. Why Build a Custom Data Structure?
  3. Choosing a Custom Data Structure: Skip List
  4. Planning the Skip List Structure
  5. Implementing Core Components
    • 5.1 Node Class
    • 5.2 SkipList Class Initialization
    • 5.3 Random Level Generation
    • 5.4 Insert Method
    • 5.5 Search Method
    • 5.6 Delete Method
  6. Adding Advanced Features
    • 6.1 String Representation (__str__)
    • 6.2 Handling Duplicates
  7. Testing the Skip List
  8. Optimizations and Considerations
  9. Conclusion
  10. References

1. Understanding Data Structures Basics

Before diving into custom structures, let’s recap foundational concepts:

  • Data Structure: A specialized format for organizing, processing, retrieving, and storing data.
  • Key Operations: Insertion, deletion, search, traversal, and sorting.
  • Performance Metrics: Time complexity (worst/average case) and space complexity.

Python’s built-in structures have tradeoffs:

  • Lists: O(1) access by index, but O(n) insertion/deletion in the middle.
  • Dictionaries: O(1) average search/insert, but unordered (prior to Python 3.7) and no inherent ordering for range queries.
  • Sets: O(1) membership checks, but no duplicates and unordered.

When these tradeoffs don’t align with your needs (e.g., fast ordered search/insert), building a custom structure becomes necessary.

2. Why Build a Custom Data Structure?

You might build a custom data structure if:

  • Performance: Built-in structures are too slow for your use case (e.g., high-frequency trading needs sub-millisecond lookups).
  • Specialized Behavior: You need unique operations (e.g., a priority queue with dynamic reordering).
  • Educational Purpose: To deepen understanding of algorithms and data structure internals.

3. Choosing a Custom Data Structure: Skip List

For this tutorial, we’ll implement a Skip List—a probabilistic data structure invented by William Pugh in 1989. It balances the simplicity of linked lists with the performance of balanced trees (e.g., AVL trees).

Why Skip Lists?

  • Average O(log n) time for search, insertion, and deletion (same as balanced trees).
  • Simplicity: Easier to implement than balanced trees (no complex rotation logic).
  • Ordered: Maintains elements in sorted order, enabling range queries.

Use Cases: Databases (e.g., Redis uses Skip Lists for sorted sets), in-memory caches, and applications requiring fast ordered operations.

4. Planning the Skip List Structure

A Skip List consists of multiple levels of linked lists. The bottom level (level 0) is a sorted singly linked list. Higher levels act as “express lanes” for faster traversal, skipping over many nodes.

Key Components:

  • Node: Stores a value and an array of forward pointers (one per level the node appears in).
  • Header: A dummy node at the start of the list to simplify traversal.
  • Max Level: The maximum number of levels allowed (prevents excessive memory usage).
  • Probability: The chance a node is promoted to a higher level (typically 0.5).

5. Implementing Core Components

Let’s build the Skip List step by step.

5.1 Node Class

Each node contains a value and a forward array (pointers to the next node at each level).

class Node:
    def __init__(self, value, level):
        self.value = value  # Value stored in the node
        self.forward = [None] * level  # Forward pointers for each level

5.2 SkipList Class Initialization

The SkipList class manages the overall structure, including the header, max level, and probability.

import random

class SkipList:
    def __init__(self, max_level=16, probability=0.5):
        self.max_level = max_level  # Maximum number of levels
        self.probability = probability  # Probability of promoting a node to higher level
        self.header = Node(value=None, level=max_level)  # Dummy header node
        self.level = 1  # Current highest level with nodes (starts at 1)

5.3 Random Level Generation

Nodes are promoted to higher levels randomly to maintain balance. The random_level method generates a level for a new node:

    def random_level(self):
        level = 1
        # Flip a coin (probability) until we exceed max_level or fail the coin flip
        while random.random() < self.probability and level < self.max_level:
            level += 1
        return level

Why? This ensures higher levels have exponentially fewer nodes, keeping traversal efficient.

5.4 Insert Method

To insert a value:

  1. Traverse from the highest level down to level 0, tracking nodes where the new node will be inserted (update array).
  2. Generate a random level for the new node.
  3. Insert the node by updating forward pointers in the update array.
    def insert(self, value):
        update = [None] * self.max_level  # Tracks nodes to update at each level
        current = self.header

        # Step 1: Traverse to find insertion point
        for i in range(self.level - 1, -1, -1):
            # Move forward while next node's value is less than target
            while current.forward[i] and current.forward[i].value < value:
                current = current.forward[i]
            update[i] = current  # Save current node as update for level i

        # Step 2: Generate random level for the new node
        new_level = self.random_level()

        # If new level is higher than current max level, update update and current level
        if new_level > self.level:
            for i in range(self.level, new_level):
                update[i] = self.header
            self.level = new_level

        # Step 3: Create new node and update forward pointers
        new_node = Node(value=value, level=new_level)
        for i in range(new_level):
            new_node.forward[i] = update[i].forward[i]  # New node points to update's next
            update[i].forward[i] = new_node  # Update points to new node

5.5 Search Method

To search for a value:

  1. Start at the highest level and traverse forward until the next node’s value exceeds the target.
  2. Move down a level and repeat until level 0.
  3. If the next node at level 0 has the target value, return it; otherwise, return None.
    def search(self, value):
        current = self.header

        # Traverse from highest level down to level 0
        for i in range(self.level - 1, -1, -1):
            # Move forward while next node's value is less than target
            while current.forward[i] and current.forward[i].value < value:
                current = current.forward[i]

        # Move to level 0's next node to check
        current = current.forward[0]

        # If found, return the value; else, None
        if current and current.value == value:
            return current.value
        return None

5.6 Delete Method

To delete a value:

  1. Traverse the list to find the node and populate the update array (similar to insertion).
  2. If the node exists, update the forward pointers of the update nodes to skip the deleted node.
    def delete(self, value):
        update = [None] * self.max_level
        current = self.header

        # Step 1: Traverse to find the node to delete
        for i in range(self.level - 1, -1, -1):
            while current.forward[i] and current.forward[i].value < value:
                current = current.forward[i]
            update[i] = current

        # Check if the node exists at level 0
        current = current.forward[0]
        if current is None or current.value != value:
            print(f"Value {value} not found in the skip list.")
            return

        # Step 2: Update forward pointers to skip the node
        for i in range(self.level):
            # If update[i]'s next is not current, no change needed at this level
            if update[i].forward[i] != current:
                break
            update[i].forward[i] = current.forward[i]

        # Step 3: Update the current highest level if needed
        while self.level > 1 and self.header.forward[self.level - 1] is None:
            self.level -= 1

6. Adding Advanced Features

6.1 String Representation (__str__)

Visualizing the Skip List helps debugging. Add a __str__ method to print each level:

    def __str__(self):
        result = []
        for i in range(self.level):
            current = self.header.forward[i]
            level_str = f"Level {i}: "
            while current:
                level_str += f"{current.value} -> "
                current = current.forward[i]
            level_str += "None"
            result.append(level_str)
        return "\n".join(result)

6.2 Handling Duplicates

To allow duplicates, modify the insert method’s traversal condition from < to <= (insert duplicates after existing values):

# In insert(): Change the while loop condition to allow duplicates
while current.forward[i] and current.forward[i].value <= value:  # Use <= instead of <
    current = current.forward[i]

7. Testing the Skip List

Let’s test insertion, search, deletion, and visualization:

# Test the Skip List
if __name__ == "__main__":
    skip_list = SkipList(max_level=4)

    # Insert values
    values = [3, 6, 7, 9, 12, 19, 17, 26, 21, 25]
    for val in values:
        skip_list.insert(val)

    print("Skip List after insertions:")
    print(skip_list)
    # Output (example; levels may vary due to randomness):
    # Level 0: 3 -> 6 -> 7 -> 9 -> 12 -> 17 -> 19 -> 21 -> 25 -> 26 -> None
    # Level 1: 3 -> 7 -> 12 -> 19 -> 25 -> None
    # Level 2: 7 -> 19 -> None
    # Level 3: 19 -> None

    # Search test
    print("\nSearch for 12:", skip_list.search(12))  # Output: 12
    print("Search for 10:", skip_list.search(10))  # Output: None

    # Delete test
    skip_list.delete(12)
    print("\nSkip List after deleting 12:")
    print(skip_list)
    # Level 0: 3 -> 6 -> 7 -> 9 -> 17 -> 19 -> 21 -> 25 -> 26 -> None
    # (Higher levels may adjust if 12 was in them)

8. Optimizations and Considerations

  • Dynamic Max Level: Instead of a fixed max_level, adjust it based on the number of elements (e.g., max_level = log2(n) + 1).
  • Probability Tuning: Using a probability higher than 0.5 (e.g., 0.75) increases levels but may slow traversal; 0.5 is standard.
  • Memory Efficiency: Use a list of pointers per node instead of separate attributes to save space.

9. Conclusion

Building a Skip List from scratch demonstrates how custom data structures balance simplicity and performance. You’ve learned to:

  • Design core components (nodes, levels, header).
  • Implement key operations (insert, search, delete).
  • Handle randomness to maintain efficiency.

Next, explore other structures like B-trees, LRU caches, or segment trees to expand your toolkit!

10. References