Table of Contents
- How Binary Search Works: Core Logic
- 1.1 Prerequisites: Sorted Data
- 1.2 Step-by-Step Example
- Basic Implementations
- 2.1 Iterative Binary Search
- 2.2 Recursive Binary Search
- Time and Space Complexity Analysis
- Common Pitfalls to Avoid
- 4.1 Integer Overflow in Mid Calculation
- 4.2 Handling Duplicates
- 4.3 Off-by-One Errors
- Optimization Techniques
- 5.1 Mid Calculation: Avoid Overflow
- 5.2 Pre-Sorting for Multiple Searches
- 5.3 Interpolation Search (for Uniform Data)
- Advanced Use Cases
- 6.1 Finding the First/Last Occurrence of a Target
- 6.2 Searching in Rotated Sorted Arrays
- 6.3 Finding Square Roots
- Conclusion
- References
How Binary Search Works: Core Logic
Binary search is a divide-and-conquer algorithm that repeatedly divides a sorted dataset into two halves, eliminating the half that cannot contain the target value.
1.1 Prerequisites: Sorted Data
Binary search only works on sorted data. If the input is unsorted, you must sort it first (e.g., with Quicksort or Mergesort) before applying binary search. Sorting takes O(n log n) time, but binary search shines when you need to perform multiple searches on the same dataset (amortizing the sorting cost).
1.2 Step-by-Step Example
Let’s search for the target 23 in the sorted array: [2, 5, 8, 12, 16, 23, 38, 56, 72, 91].
Step 1: Initialize Pointers
low = 0(start of the array)high = len(arr) - 1 = 9(end of the array)
Step 2: Calculate Midpoint
Compute mid = (low + high) // 2 (integer division). For our array:
mid = (0 + 9) // 2 = 4. The element at mid is arr[4] = 16.
Step 3: Compare Mid with Target
- If
arr[mid] == target: Found! Returnmid. - If
arr[mid] < target: Target must be in the right half. Updatelow = mid + 1. - If
arr[mid] > target: Target must be in the left half. Updatehigh = mid - 1.
Here, 16 < 23, so set low = 4 + 1 = 5.
Step 4: Repeat Until Found or Exhausted
Now, low = 5, high = 9. Compute mid = (5 + 9) // 2 = 7. arr[7] = 56, which is > 23. Set high = 7 - 1 = 6.
Next iteration: low = 5, high = 6. mid = (5 + 6) // 2 = 5. arr[5] = 23 (target found!). Return mid = 5.
Basic Implementations
2.1 Iterative Binary Search
The iterative approach uses a loop to narrow the search space, avoiding recursion overhead.
def binary_search_iterative(arr, target):
low = 0
high = len(arr) - 1
while low <= high:
mid = low + (high - low) // 2 # Avoids overflow (explained later)
if arr[mid] == target:
return mid # Target found
elif arr[mid] < target:
low = mid + 1 # Search right half
else:
high = mid - 1 # Search left half
return -1 # Target not found
Why low <= high? The loop continues as long as there’s a valid subarray to search. When low > high, the target is not present.
2.2 Recursive Binary Search
The recursive approach splits the problem into smaller subproblems, with a base case to terminate recursion.
def binary_search_recursive(arr, target, low, high):
# Base case: target not found
if low > high:
return -1
mid = low + (high - low) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
return binary_search_recursive(arr, target, mid + 1, high) # Right half
else:
return binary_search_recursive(arr, target, low, mid - 1) # Left half
# Usage: binary_search_recursive(arr, target, 0, len(arr)-1)
Note: Recursion depth equals O(log n), which is safe for large n (e.g., log₂(1e9) ≈ 30). However, iterative is preferred for memory efficiency (no stack overhead).
Time and Space Complexity Analysis
Time Complexity: O(log n)
Each iteration/recursion halves the search space. For an array of size n, the maximum number of steps is log₂(n) (e.g., n=10 → log₂(10) ≈ 3.32 steps).
Space Complexity
- Iterative: O(1) (constant extra space for
low,high,mid). - Recursive: O(log n) (due to the call stack).
Common Pitfalls to Avoid
4.1 Integer Overflow in Mid Calculation
A classic mistake is computing mid as (low + high) // 2. If low and high are large (e.g., low = 1e9, high = 1e9), low + high may exceed the maximum integer value (overflow), causing incorrect results.
Fix: Use mid = low + (high - low) // 2. This avoids overflow by calculating the offset from low instead of summing low and high.
4.2 Handling Duplicates
If the array has duplicates (e.g., [2, 5, 5, 5, 8]), binary search may return any occurrence of the target, not necessarily the first or last.
Example: Searching for 5 in [2, 5, 5, 5, 8] could return index 1, 2, or 3. To find the first/last occurrence, we need modified logic (see Section 6.1).
4.3 Off-by-One Errors
- Forgetting to update
low/highcorrectly (e.g.,low = midinstead oflow = mid + 1). - Using
low < highinstead oflow <= high, which may skip the final validmid.
Optimization Techniques
5.1 Mid Calculation: Avoid Overflow
As discussed, mid = low + (high - low) // 2 is safer than (low + high) // 2. For example:
low = 1e9,high = 1e9:(low + high) = 2e9(may overflow 32-bit int), butlow + (high - low)//2 = 1e9(safe).
5.2 Pre-Sorting for Multiple Searches
If you need to search the same dataset multiple times, pre-sort it once (O(n log n) time) and then perform O(log n) searches. This is more efficient than linear search for large n and many queries.
5.3 Interpolation Search (for Uniform Data)
For uniformly distributed sorted data (e.g., [10, 20, 30, 40, 50]), interpolation search estimates the target’s position using:
mid = low + ((target - arr[low]) * (high - low)) // (arr[high] - arr[low])
This can achieve O(log log n) time in the best case but degrades to O(n) for skewed data. Use it only when data is uniform.
Advanced Use Cases
6.1 Finding the First/Last Occurrence of a Target
To handle duplicates, modify binary search to find the first or last occurrence:
First Occurrence
def find_first_occurrence(arr, target):
low = 0
high = len(arr) - 1
result = -1
while low <= high:
mid = low + (high - low) // 2
if arr[mid] == target:
result = mid # Track potential result
high = mid - 1 # Search left for earlier occurrences
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1
return result
Last Occurrence
Similar, but update low = mid + 1 when arr[mid] == target to search the right half.
6.2 Searching in Rotated Sorted Arrays
A rotated sorted array (e.g., [4, 5, 6, 7, 0, 1, 2]) has a pivot point where the order resets. To search here:
- Find the pivot (smallest element).
- Search the left or right subarray based on the target.
def search_rotated(arr, target):
low, high = 0, len(arr) - 1
while low <= high:
mid = low + (high - low) // 2
if arr[mid] == target:
return mid
# Left half is sorted
if arr[low] <= arr[mid]:
if arr[low] <= target < arr[mid]:
high = mid - 1
else:
low = mid + 1
# Right half is sorted
else:
if arr[mid] < target <= arr[high]:
low = mid + 1
else:
high = mid - 1
return -1
6.3 Finding Square Roots
Binary search can approximate the integer square root of a number x by searching in [0, x]:
def sqrt_binary_search(x):
if x == 0:
return 0
low, high = 1, x
while low <= high:
mid = low + (high - low) // 2
square = mid * mid
if square == x:
return mid
elif square < x:
low = mid + 1
else:
high = mid - 1
return high # Floor of the square root
Conclusion
Binary search is a workhorse algorithm for efficient data retrieval, with O(log n) time complexity that outperforms linear search for large datasets. By mastering its core logic, avoiding pitfalls like overflow, and adapting it to edge cases (duplicates, rotated arrays), you can solve a wide range of problems.
Key takeaways:
- Always ensure data is sorted before using binary search.
- Use
mid = low + (high - low) // 2to avoid integer overflow. - Iterative implementations are preferred for memory efficiency.
- Adapt binary search to find first/last occurrences or work with rotated data.
References
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press.
- Wikipedia. (2023). Binary search algorithm.
- GeeksforGeeks. (2023). Binary Search - Data Structures and Algorithms Tutorials.
- LeetCode. (2023). Binary Search Problems.