February 25, 2016

Binary search tricks

Probably array is the most popular collection of data. Everyone have used, is using and will use it. If array is sorted, this means you can simplify your life while processing array. I want to share some simple, but effective trick with binary search algorithm.

Imagine you have sequence of N numbers which splits space for N + 1 segments. Two of them on the edges are infinite. For instance, this is fuzzy thermometer.

The idea is to display single word described weather. In other words, this is function which accepts temperature as an argument (Celsius in this example) and returns string as a single word.

The simplest solution may looks like this:

The algorithm is - iterate over an array and get label by index of first suitable element. It works because of our array is sorted, and we know about this. Complexity of this method is linear - O(n). It's not a problem for 4 values in array, but in general it's not a good solution, everyone should keep this in mind.

Well, let's add another method which is much better that first one:

We're trying to find index of given value in array. What if this value is missed in array? The binarySearch() method will return value of -(insertion point) - 1, where insertion point is defined as the point at which the key would be inserted into the array: the index of the first element greater than the key, or a.length if all elements in the array are less than the specified key - exactly what we need!

Thus, we can check if the result is negative, and than invert it to positive value we needed. After that just return the label by index. The complexity of binary search algorithm is logarithmical - O(logn). This means, even with 1 million of elements you find result by not more than 20 iterations.

Let's test this new method:

Output is here:

We can see both methods works correctly. But second method looks much more elegant and works extremely faster on large arrays! Source code is available here. Thanks for reading!

NOTE: In case of exact matching the value with one from array, we associate it with left category (e.g. -10 is treated like 'ice'). If you want to associate it with right category you need to replace operator  <=  by operator  <  inside getLabel1(), and replace  binarySearch(values, value)  by  binarySearch(values, value + 1)  inside getLabel2(). In this case -10 will be treated like 'frosty'.