The other day, I had an interview in which I was asked to "find an integer in a sorted array or integers if it was present." This is obviously a binary search, so I coded up this (but in C/C++):

def binarysearch(iter, val): lo, hi = 0, len(iter)-1 while lo <= hi: mid = lo + (hi - lo) / 2 if iter[mid] >= val : hi = mid - 1 else: lo = mid + 1 return (lo if iter[lo] == val else -1)

There are a couple of things that make this noteworthy:

- Calculation of
*mid*If you calculate the midpoint of the array in the more straightforward way (

*mid = (lo + hi) / 2*), then you risk integer overflow when working with fixed sized integers. - Use of a single comparison in the loop
It seems that you should actually test for an exact match on finding the element you are searching for. This is not so. More below.

- Returning
*iter[lo]*instead of*iter[mid]*It is not necessary to keep track of the last midpoint. When you fall out of the loop, either

*iter[lo]*is the desired value or the value was not found in the array. If it is found, then in an array that has multiple instances of the value,*lo*is the location of the first occurrence.

It was on point two that the interview went a bit astray. The interviewer wanted to see code that looked more like this:

def binarysearch(iter, val): lo, hi = 0, len(iter)-1 while lo <= hi: mid = lo + (hi - lo) / 2 if iter[mid] == val : return mid elif iter[mid] > val: hi = mid - 1 else: lo = mid + 1 return -1

And that's all very well -- it's a correct algorithm -- but it's not as efficient. In my algorithm, there is only one comparison per loop. In the other, there is either one if there is a match or two on a failure. For comparison purposes and to make this concrete, let's imagine we have an array or 63 elements and we look for every element of the array.

### Cost of my code

63 nodes searched * 6 comparisons per search 378 comparisons

### Cost of other code

1 nodes searched * 1 comparisons = 1 2 nodes searched * 3 comparisons = 6 4 nodes searched * 5 comparisons = 20 8 nodes searched * 7 comparisons = 56 16 nodes searched * 9 comparisons = 144 32 nodes searched * 11 comparisons = 352 579 comparisons

So contrary to the expectations of the interviewer, it is actually more efficient to search to the end of the array every time using fewer comparisons than to search less of the array with more comparisons.

I don't think this is a fair comparison; the two code snippets are doing things. Your algorithm is a lower bound, and the interviewer's algorithm is a strict binary search. As you stated above, the task was to find "an" integer in the array, not the first one.

Consider the case of n identical values in the array. Your algorithm will require log n iterations always. The interviewers will complete in 1 iteration if the value being searched for is in the array.

(BTW I tried your OpenID, couldn't get it to work...)

My code does not do anything extra to get the lower bound. In the most likely case, where the searched item is unique, this code is definitely better than the interviewer's solution. I tried out searching lists with substantial duplicates and the lower bound-style search outperforms up to around 30% duplicates.

Still your solution is INCORRECT. Try doing binarysearch([1], 2). Oops, IndexError: list index out of range. I wouldn't hire you.