Binary search coding test

| | No TrackBacks

The other day, I had an interview in which I was asked to "find an integer in a sorted array or integers if it was present." This is obviously a binary search, so I coded up this (but in C/C++):

def binarysearch(iter, val):
	lo, hi = 0, len(iter)-1
	while lo <= hi:
		mid = lo + (hi - lo) / 2
		if iter[mid] >= val :
			hi = mid - 1
		else:
			lo = mid + 1
	return (lo if iter[lo] == val else -1)

There are a couple of things that make this noteworthy:

  • Calculation of mid

    If you calculate the midpoint of the array in the more straightforward way (mid = (lo + hi) / 2), then you risk integer overflow when working with fixed sized integers.

  • Use of a single comparison in the loop

    It seems that you should actually test for an exact match on finding the element you are searching for. This is not so. More below.

  • Returning iter[lo] instead of iter[mid]

    It is not necessary to keep track of the last midpoint. When you fall out of the loop, either iter[lo] is the desired value or the value was not found in the array. If it is found, then in an array that has multiple instances of the value, lo is the location of the first occurrence.

It was on point two that the interview went a bit astray. The interviewer wanted to see code that looked more like this:

def binarysearch(iter, val):
	lo, hi = 0, len(iter)-1
	while lo <= hi:
		mid = lo + (hi - lo) / 2
		if iter[mid] == val :
			return mid
		elif iter[mid] > val:
			hi = mid - 1
		else:
			lo = mid + 1
	return -1

And that's all very well -- it's a correct algorithm -- but it's not as efficient. In my algorithm, there is only one comparison per loop. In the other, there is either one if there is a match or two on a failure. For comparison purposes and to make this concrete, let's imagine we have an array or 63 elements and we look for every element of the array.

Cost of my code

63 nodes searched * 6 comparisons per search
378 comparisons

Cost of other code

1 nodes searched * 1 comparisons = 1
2 nodes searched * 3 comparisons = 6
4 nodes searched * 5 comparisons = 20
8 nodes searched * 7 comparisons = 56
16 nodes searched * 9 comparisons = 144
32 nodes searched * 11 comparisons = 352
579 comparisons

So contrary to the expectations of the interviewer, it is actually more efficient to search to the end of the array every time using fewer comparisons than to search less of the array with more comparisons.

No TrackBacks

TrackBack URL: http://www.iwebthereforeiam.com/cgi-bin/mt/mt-tb.cgi/1506

I don't think this is a fair comparison; the two code snippets are doing things. Your algorithm is a lower bound, and the interviewer's algorithm is a strict binary search. As you stated above, the task was to find "an" integer in the array, not the first one.

Consider the case of n identical values in the array. Your algorithm will require log n iterations always. The interviewers will complete in 1 iteration if the value being searched for is in the array.

(BTW I tried your OpenID, couldn't get it to work...)

My code does not do anything extra to get the lower bound. In the most likely case, where the searched item is unique, this code is definitely better than the interviewer's solution. I tried out searching lists with substantial duplicates and the lower bound-style search outperforms up to around 30% duplicates.

Still your solution is INCORRECT. Try doing binarysearch([1], 2). Oops, IndexError: list index out of range. I wouldn't hire you.

Leave a comment

Verification (needed to reduce spam):

Pages

OpenID accepted here Learn more about OpenID
Powered by Movable Type 4.32-en

About this Entry

This page contains a single entry by Hugh Brown published on July 3, 2009 2:52 PM.

Metric stupidity was the previous entry in this blog.

Recent reading is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.