Broken Binary Search

December 28, 2010 at 4:59 pm | Posted in Programming | 1 Comment
Tags:

While randomly googling for binary search, I came across this article where author (I guess one of the employees of Google) says that nearly all binary searches are mergesorts are broken. I don’t know much about Java but here is what he is showing on his blog:

1:     public static int binarySearch(int[] a, int key) {
2:         int low = 0;
3:         int high = a.length - 1;
4:
5:         while (low <= high) {
6:             int mid = (low + high) / 2;
7:             int midVal = a[mid];
8:
9:             if (midVal < key)
10:                 low = mid + 1
11:             else if (midVal > key)
12:                 high = mid - 1;
13:             else
14:                 return mid; // key found
15:         }
16:         return -(low + 1);  // key not found.
17:     }

Author says line 6: int mid =(low + high) / 2; is broken because if sum of low and high is greater than INT_MAX then it will overflow. From author himself:

This bug can manifest itself for arrays whose length (in elements) is 2^30 or greater (roughly a billion elements). This was inconceivable back in the ’80s, when Programming Pearls was written, but it is common these days at Google and other places.

He is right and I am not posting this to say what has already been said. I am interested in line 3: int high = a.length – 1;

Like I said I don’t know much about Java and if we write same line of code in C, it will overflow if a.length is long, unsigned long or anything larger than what size_t type can handle. It will give strange results (UB or Undefined Behavior). If having more than size_t elements is common for places like Google then I think line number 3 needs to be the cause of concern too, not just line number 6. You can know size of array in C with these methods:

const char* arrc = “123”;
int arri[] = {3,2,1};

first is an array of characters while 2nd is an array of integers. You can use sizeof(arri) / sizeof(arri[0]) to know number of elements in arri. To know number of elements in arrc you can use either strlen(arrc) or sizeof(arrc) / sizeof(*arrc) – 1. Both sizeof and strlen() return size_t type which is unsigned int and hence does not fit in int type. I have commented this on that blog, lets see what author says.


Copyright © 2010 Arnuld Uttre, Village – Patti, P.O – Manakpur, Tehsil – Nangal, Distt. – Ropar, Punjab (INDIA)

Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice, and the copyright notice, are preserved.

Copyright does not belong to Java code. Please ask for permission from the author of the following blog before using mentioned Java code: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

1 Comment »

RSS feed for comments on this post. TrackBack URI

  1. I think the line with

    int high = a.length – 1;

    isn’t a real problem. Afaik Java arrays are indexed as integers, so you would never overflow there (in Java). I’m not sure about other programming languages, but at least in C++ using std::vector I believe you’d get a warning about a mismatch of types (long vs. int) as opposed to quiet acceptance (such as in line 6).. (I haven’t looked this up though — never needed such a big array 🙂 )


Leave a comment

Blog at WordPress.com.
Entries and comments feeds.