# Using the MATCH Function for a Linear Search and a Binary Search

The syntax of the Excel MATCH function is MATCH(lookup_value, lookup_array, [match_type]).  The 3rd argument of the MATCH function, match_type, is a number that can be 1, 0, or -1.  When it is 1 or -1 the range being searched (lookup_array) should be sorted and Excel carries out a binary search.  For a value of 1 the data should be sorted in ascending order and the returned result is the index into lookup_array corresponding to the largest value that is less than or equal to the lookup_value.  For a value of -1 the lookup_array should be in descending order and the returned index refers to the smallest value that is greater than or equal to the lookup_value.  When the 3rd argument is 0, the data do not need to be sorted and Excel executes a linear search, returning the index of an exact match.  In all cases, if no valid index satisfies the criterion MATCH returns the error #N/A.

Of interest are the performance aspects of the linear and the binary search.  Is the performance of a binary search sufficiently superior to a linear search that even when we want an exact match, it justifies sorting a large data set and using an additional formula to simulate an exact match?  Essentially something along the lines of the following: Suppose a 100,000 element unsorted lookup_array is in B with the lookup values in C starting with row 1.  Then, after sorting lookup_array in ascending order, enter in D1 =MATCH(C1,\$B\$1:\$B\$100000,1) and in E1 =IF(INDEX(\$B\$1:\$B\$100000,D1)=C1,D1,NA()).  Cell E1 will now have the same result as if we had used =MATCH(C1,\$B\$1:\$B\$100000,0).

Testing the MATCH function

Expectation and Result Summary

The Exact Match ‘Workaround’

The Result Details

The Code

## Testing the MATCH Function

To get started, I tested the MATCH function in three configurations:

1)       a binary search, i.e., match_type=1,

2)       a linear search, i.e., match_type=0, with the lookup_array in sorted order, and

3)       a linear search with the lookup_array containing unsorted data.

For each of those configurations, I tested 3 sets of lookup values, where each set had 10,000 values generated randomly.  The reason for using three data sets was to work around the possibility that a particular data set was somehow biased.  For each of these sets, I timed how long it took to enter the MATCH formula in all the relevant cells in one step and have Excel calculate the results.  I repeated this last step 3 times to identify any wrinkles in the process of entering the formula.

## Expectation and Result Summary

What did I expect?  Well, the performance of the binary search is of the order of log (N).  So, given that the smallest data set tested had 100,000 elements, I expected it to perform the best.  Also, it should scale very well as the number of data elements increased.  The performance of the linear search is of order N.  So, it should take twice as long if the data set being searched doubled in size.  I also expected that the performance of the linear search to be independent of whether the lookup_array was sorted or not.

As expected the binary search outperformed the linear search.  When searching 100,000 elements the binary search was about 80 times faster and when searching 1 million records it was about 1,650 times faster!  Also, the binary search took less than 10% longer searching a million records vs. searching 100,000 records (0.0486 seconds vs. 0.0534 seconds).  What was unexpected was that the linear search was faster while searching a sorted lookup_array vs. an unsorted array.  I don’t know why.  One possibility is that the random numbers were biased towards the lower values.  Of course, it was to eliminate such a bias that I generated 3 sets of lookup numbers.  So, I cannot account for the performance difference of the linear search when searching a sorted array vs. an unsorted array.

 Time (seconds) Lookup Range Binary Linear (Sorted) Linear (Unsorted) 100,000 0.0486 3.67 4.44 500,000 0.0512 41.58 44.59 1,000,000 0.0534 86.00 92.36

## The Exact Match ‘Workaround’

Given the performance of the binary search algorithm, it was almost a given that sorting an unsorted array and using 2 formulas to find an exact match would be faster than the linear search.  Implementing the algorithm resulted in speeds 10 to 50 times faster than for a linear search, the former for 100,000 records and the latter for 1,000,000 records.

 Simulated Exact Match Lookup Range Time (seconds) 100,000 0.38 500,000 0.92 1,000,000 1.68

## The Result Details

As mentioned above, I tested 3 configurations: Binary Search, Linear Search with the lookup_array sorted, and Linear Search with the lookup_array unsorted.  These algorithms were applied to 3 ranges of lookup_array with 100,000, 500,000, and 1,000,000 million records respectively.  For each, I generated 3 lookup_value data sets, each 10,000 elements large.

In the results below, HowMany indicates the size of the lookup_array, SearchCount the number of elements being looked up, UseBinary indicates if I was using a binary search and OrigSorted indicates whether the lookup_array was sorted or not.  For a binary search, the array had to be sorted and, consequently, the state was predetermined.

The Search Set column indicates the 3 data sets used for the lookup values (10,000 elements each).  The Iteration column is the 3 iterations of entering the MATCH function for the 10,000 elements.  The Time column is the time it took to get the formulas entered.  Excel was in Automatic Calculation mode.  The hardware configuration was a Core 2 Duo processor with 4GB of memory.  The software was Windows 7 64bit and Excel 2010 RC 64bit.

HowMany       SearchCount   UseBinary     OrigSorted

100000        10000        True          N/A

Search Set    Iteration      Time

1             1             6.640625E-02

1             2             0.0546875

1             3             0.046875

2             1             4.296875E-02

2             2             4.296875E-02

2             3             0.046875

3             1             0.046875

3             2             4.296875E-02

3             3             0.046875

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

100000        10000        False         True

Search Set    Iteration      Time

1             1             3.632813

1             2             3.605469

1             3             3.601563

2             1             3.609375

2             2             4.035156

2             3             3.757813

3             1             3.582031

3             2             3.597656

3             3             3.59375

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

100000        10000        False         False

Search Set    Iteration      Time

1             1             4.359375

1             2             4.347656

1             3             4.339844

2             1             4.34375

2             2             4.410156

2             3             4.472656

3             1             4.757813

3             2             4.570313

3             3             4.339844

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

500000        10000        True          N/A

Search Set    Iteration      Time

1             1             5.859375E-02

1             2             5.078125E-02

1             3             0.046875

2             1             5.078125E-02

2             2             5.078125E-02

2             3             5.078125E-02

3             1             5.078125E-02

3             2             0.0546875

3             3             0.046875

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

500000        10000        False         True

Search Set    Iteration      Time

1             1             41.57813

1             2             41.96484

1             3             42.10156

2             1             41.58984

2             2             41.92969

2             3             41.67578

3             1             41.17969

3             2             41.14844

3             3             41.01953

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

500000        10000        False         False

Search Set    Iteration      Time

1             1             44.25391

1             2             43.82422

1             3             44.24609

2             1             44.89844

2             2             44.41406

2             3             44.94922

3             1             44.875

3             2             44.40625

3             3             45.42578

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

1000000       10000        True          N/A

Search Set    Iteration      Time

1             1             0.0546875

1             2             0.0546875

1             3             5.078125E-02

2             1             0.0546875

2             2             5.078125E-02

2             3             5.078125E-02

3             1             5.859375E-02

3             2             0.0546875

3             3             5.078125E-02

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

1000000       10000        False         True

Search Set    Iteration      Time

1             1             86.00781

1             2             85.61328

1             3             85.59375

2             1             85.5625

2             2             85.76563

2             3             85.49219

3             1             86.49219

3             2             86.57422

3             3             86.88672

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

1000000       10000        False         False

Search Set    Iteration      Time

1             1             92.28516

1             2             92.25391

1             3             92.60547

2             1             91.91016

2             2             92.23828

2             3             92.67578

3             1             92.57031

3             2             92.53906

3             3             92.14063

>>> Done

Exact Match   HowMany       SearchCount

1000000       10000

Search Set    Iteration      Time

1             1             1.601563

1             2             1.722656

1             3             1.734375

2             1             1.597656

2             2             1.726563

2             3             1.71875

3             1             1.609375

3             2             1.726563

3             3             1.714844

>>> Done

Exact Match   HowMany       SearchCount

500000        10000

Search Set    Iteration      Time

1             1             0.859375

1             2             0.953125

1             3             0.953125

2             1             0.8359375

2             2             0.9570313

2             3             0.9648438

3             1             0.8398438

3             2             0.9648438

3             3             0.9609375

>>> Done

Exact Match   HowMany       SearchCount

100000        10000

Search Set    Iteration      Time

1             1             0.3085938

1             2             0.4140625

1             3             0.4101563

2             1             0.3203125

2             2             0.40625

2             3             0.4257813

3             1             0.3046875

3             2             0.4101563

3             3             0.3984375

>>> Done

## The Code

The code is modularized.  If someone wants to use it for more tests, it should be fairly easy to reuse components or change it.

Option Explicit

Option Base 0

Sub generateData(ByRef Where As Range, _

ByVal HowMany As Long, ByVal Sorted As Boolean, _

Optional ByVal SimData As Boolean = False)

If SimData Then Sorted = False

Dim Arr: ReDim Arr(HowMany - 1, 0)

Dim I As Long

For I = LBound(Arr, 1) To UBound(Arr, 1)

Arr(I, 0) = IIf(SimData, I * 2, I)

Next I

Dim Dest As Range: Set Dest = Where.Resize(HowMany, 1)

Dest.Value = Arr

If Not Sorted Then

Dest.Offset(0, 1).FormulaR1C1 = "=RAND()"

With Where.Parent

.Sort.SortFields.Clear

Order:=xlAscending, DataOption:=xlSortNormal

With .Sort

.SetRange Dest.Resize(, 2)

.MatchCase = False

.Orientation = xlTopToBottom

.SortMethod = xlPinYin

.Apply

End With

End With

Dest.Offset(0, 1).Delete Shift:=xlToLeft

End If

End Sub

Sub generateTargets(ByRef Where As Range, _

ByVal HowMany As Long, ByVal Rng As Long)

Dim Rslt() As Long: ReDim Rslt(HowMany - 1, 0)

Dim I As Long

For I = LBound(Rslt) To UBound(Rslt)

Rslt(I, 0) = Int(Rnd() * Rng)

Next I

Where.Resize(HowMany, 1).Value = Rslt

End Sub

Sub Test()

Cells.Delete

Const HowMany As Long = 1000000, SearchEleCount As Long = 10000, _

OrigSorted As Boolean = False, UseBinary As Boolean = False

Debug.Print "HowMany", "SearchCount", "UseBinary", "OrigSorted"

Debug.Print HowMany, SearchEleCount, UseBinary, IIf(UseBinary, "N/A", OrigSorted)

'MsgBox HowMany & "," & SearchEleCount & "," & UseBinary

generateData Cells(1, 2), HowMany, IIf(UseBinary, True, OrigSorted)

Dim I As Integer

Debug.Print "Search Set", "Iteration", "Time"

For I = 1 To 3

Dim Targets() As Long

Columns(3).Resize(, 2).Delete

generateTargets Cells(1, 3), SearchEleCount, HowMany

Dim J As Integer, AccumTime As Single

For J = 1 To 3

Dim StartTime As Single

StartTime = Timer

Cells(1, 4).Resize(SearchEleCount, 1).FormulaR1C1 = _

IIf(UseBinary, "=MATCH(RC[-1]," _

& Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

& ",1)", _

"=MATCH(RC[-1]," _

& Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

& ",0)")

Debug.Print I, J, Timer - StartTime

Next J

Next I

Debug.Print ">>> Done"

End Sub

The code to simulate the exact match via the sort and 2 formulas is below.

Sub sortAscending(ColRng As Range)

With ActiveSheet

.Sort.SortFields.Clear

DataOption:=xlSortNormal

With .Sort

.SetRange ColRng

.MatchCase = False

.Orientation = xlTopToBottom

.SortMethod = xlPinYin

.Apply

End With

End With

End Sub

Sub simulateExactSearch()

Const HowMany As Long = 100000, SearchCount As Long = 10000, _

OrigSorted As Boolean = False, UseBinary As Boolean = False

Debug.Print "Exact Match", "HowMany", "SearchCount"

Debug.Print , HowMany, SearchCount

Cells.Delete

generateData Cells(1, 2), HowMany, False, True

Dim SaveArr: SaveArr = Cells(1, 2).Resize(HowMany).Value

Debug.Print "Search Set", "Iteration", "Cumulative Time"

Dim I As Integer

For I = 1 To 3

Columns(3).Resize(, 3).Delete

generateTargets Cells(1, 3), SearchCount, HowMany * 2

Dim J As Integer

For J = 1 To 3

Dim StartTime As Single: StartTime = Timer

sortAscending Cells(1, 2).Resize(HowMany, 1)

Cells(1, 4).Resize(SearchCount, 1).FormulaR1C1 = _

"=MATCH(RC[-1]," _

& Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

& ",1)"

Cells(1, 5).Resize(SearchCount, 1).FormulaR1C1 = _

"=IF(INDEX(" _

& Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

& ",RC[-1])=RC[-2],RC[-1],NA())"

Debug.Print I, J, Timer - StartTime

Cells(1, 2).Resize(HowMany).Value = SaveArr

Next J

Next I

Debug.Print ">>> Done"

End Sub