Using the MATCH Function for a Linear Search and a Binary Search

The syntax of the Excel MATCH function is MATCH(lookup_value, lookup_array, [match_type]).  The 3rd argument of the MATCH function, match_type, is a number that can be 1, 0, or -1.  When it is 1 or -1 the range being searched (lookup_array) should be sorted and Excel carries out a binary search.  For a value of 1 the data should be sorted in ascending order and the returned result is the index into lookup_array corresponding to the largest value that is less than or equal to the lookup_value.  For a value of -1 the lookup_array should be in descending order and the returned index refers to the smallest value that is greater than or equal to the lookup_value.  When the 3rd argument is 0, the data do not need to be sorted and Excel executes a linear search, returning the index of an exact match.  In all cases, if no valid index satisfies the criterion MATCH returns the error #N/A.

Of interest are the performance aspects of the linear and the binary search.  Is the performance of a binary search sufficiently superior to a linear search that even when we want an exact match, it justifies sorting a large data set and using an additional formula to simulate an exact match?  Essentially something along the lines of the following: Suppose a 100,000 element unsorted lookup_array is in B with the lookup values in C starting with row 1.  Then, after sorting lookup_array in ascending order, enter in D1 =MATCH(C1,$B$1:$B$100000,1) and in E1 =IF(INDEX($B$1:$B$100000,D1)=C1,D1,NA()).  Cell E1 will now have the same result as if we had used =MATCH(C1,$B$1:$B$100000,0).

Testing the MATCH function

Expectation and Result Summary

The Exact Match ‘Workaround’

The Result Details

The Code

Testing the MATCH Function

To get started, I tested the MATCH function in three configurations:

1)       a binary search, i.e., match_type=1,

2)       a linear search, i.e., match_type=0, with the lookup_array in sorted order, and

3)       a linear search with the lookup_array containing unsorted data.

For each of those configurations, I tested 3 sets of lookup values, where each set had 10,000 values generated randomly.  The reason for using three data sets was to work around the possibility that a particular data set was somehow biased.  For each of these sets, I timed how long it took to enter the MATCH formula in all the relevant cells in one step and have Excel calculate the results.  I repeated this last step 3 times to identify any wrinkles in the process of entering the formula.

Expectation and Result Summary

What did I expect?  Well, the performance of the binary search is of the order of log (N).  So, given that the smallest data set tested had 100,000 elements, I expected it to perform the best.  Also, it should scale very well as the number of data elements increased.  The performance of the linear search is of order N.  So, it should take twice as long if the data set being searched doubled in size.  I also expected that the performance of the linear search to be independent of whether the lookup_array was sorted or not.

As expected the binary search outperformed the linear search.  When searching 100,000 elements the binary search was about 80 times faster and when searching 1 million records it was about 1,650 times faster!  Also, the binary search took less than 10% longer searching a million records vs. searching 100,000 records (0.0486 seconds vs. 0.0534 seconds).  What was unexpected was that the linear search was faster while searching a sorted lookup_array vs. an unsorted array.  I don’t know why.  One possibility is that the random numbers were biased towards the lower values.  Of course, it was to eliminate such a bias that I generated 3 sets of lookup numbers.  So, I cannot account for the performance difference of the linear search when searching a sorted array vs. an unsorted array.

 

Time (seconds)

Lookup
Range

Binary

Linear
(Sorted)

Linear
(Unsorted)

      100,000

0.0486

3.67

4.44

      500,000

0.0512

41.58

44.59

   1,000,000

0.0534

86.00

92.36

 

The Exact Match ‘Workaround’

Given the performance of the binary search algorithm, it was almost a given that sorting an unsorted array and using 2 formulas to find an exact match would be faster than the linear search.  Implementing the algorithm resulted in speeds 10 to 50 times faster than for a linear search, the former for 100,000 records and the latter for 1,000,000 records.

 

Simulated

Exact Match

Lookup

Range

Time

(seconds)

      100,000

0.38

      500,000

0.92

   1,000,000

1.68

 

 

The Result Details

As mentioned above, I tested 3 configurations: Binary Search, Linear Search with the lookup_array sorted, and Linear Search with the lookup_array unsorted.  These algorithms were applied to 3 ranges of lookup_array with 100,000, 500,000, and 1,000,000 million records respectively.  For each, I generated 3 lookup_value data sets, each 10,000 elements large.

In the results below, HowMany indicates the size of the lookup_array, SearchCount the number of elements being looked up, UseBinary indicates if I was using a binary search and OrigSorted indicates whether the lookup_array was sorted or not.  For a binary search, the array had to be sorted and, consequently, the state was predetermined.

The Search Set column indicates the 3 data sets used for the lookup values (10,000 elements each).  The Iteration column is the 3 iterations of entering the MATCH function for the 10,000 elements.  The Time column is the time it took to get the formulas entered.  Excel was in Automatic Calculation mode.  The hardware configuration was a Core 2 Duo processor with 4GB of memory.  The software was Windows 7 64bit and Excel 2010 RC 64bit.

HowMany       SearchCount   UseBinary     OrigSorted

 100000        10000        True          N/A

Search Set    Iteration      Time

 1             1             6.640625E-02

 1             2             0.0546875

 1             3             0.046875

 2             1             4.296875E-02

 2             2             4.296875E-02

 2             3             0.046875

 3             1             0.046875

 3             2             4.296875E-02

 3             3             0.046875

>>> Done

 

 

HowMany       SearchCount   UseBinary     OrigSorted

 100000        10000        False         True

Search Set    Iteration      Time

 1             1             3.632813

 1             2             3.605469

 1             3             3.601563

 2             1             3.609375

 2             2             4.035156

 2             3             3.757813

 3             1             3.582031

 3             2             3.597656

 3             3             3.59375

>>> Done

 

HowMany       SearchCount   UseBinary     OrigSorted

 100000        10000        False         False

Search Set    Iteration      Time

 1             1             4.359375

 1             2             4.347656

 1             3             4.339844

 2             1             4.34375

 2             2             4.410156

 2             3             4.472656

 3             1             4.757813

 3             2             4.570313

 3             3             4.339844

>>> Done

 

HowMany       SearchCount   UseBinary     OrigSorted

 500000        10000        True          N/A

Search Set    Iteration      Time

 1             1             5.859375E-02

 1             2             5.078125E-02

 1             3             0.046875

 2             1             5.078125E-02

 2             2             5.078125E-02

 2             3             5.078125E-02

 3             1             5.078125E-02

 3             2             0.0546875

 3             3             0.046875

>>> Done

 

HowMany       SearchCount   UseBinary     OrigSorted

 500000        10000        False         True

Search Set    Iteration      Time

 1             1             41.57813

 1             2             41.96484

 1             3             42.10156

 2             1             41.58984

 2             2             41.92969

 2             3             41.67578

 3             1             41.17969

 3             2             41.14844

 3             3             41.01953

>>> Done

 

HowMany       SearchCount   UseBinary     OrigSorted

 500000        10000        False         False

Search Set    Iteration      Time

 1             1             44.25391

 1             2             43.82422

 1             3             44.24609

 2             1             44.89844

 2             2             44.41406

 2             3             44.94922

 3             1             44.875

 3             2             44.40625

 3             3             45.42578

>>> Done

 

HowMany       SearchCount   UseBinary     OrigSorted

 1000000       10000        True          N/A

Search Set    Iteration      Time

 1             1             0.0546875

 1             2             0.0546875

 1             3             5.078125E-02

 2             1             0.0546875

 2             2             5.078125E-02

 2             3             5.078125E-02

 3             1             5.859375E-02

 3             2             0.0546875

 3             3             5.078125E-02

>>> Done

 

HowMany       SearchCount   UseBinary     OrigSorted

 1000000       10000        False         True

Search Set    Iteration      Time

 1             1             86.00781

 1             2             85.61328

 1             3             85.59375

 2             1             85.5625

 2             2             85.76563

 2             3             85.49219

 3             1             86.49219

 3             2             86.57422

 3             3             86.88672

>>> Done

HowMany       SearchCount   UseBinary     OrigSorted

 1000000       10000        False         False

Search Set    Iteration      Time

 1             1             92.28516

 1             2             92.25391

 1             3             92.60547

 2             1             91.91016

 2             2             92.23828

 2             3             92.67578

 3             1             92.57031

 3             2             92.53906

 3             3             92.14063

>>> Done

 

 

Exact Match   HowMany       SearchCount

               1000000       10000

Search Set    Iteration      Time

 1             1             1.601563

 1             2             1.722656

 1             3             1.734375

 2             1             1.597656

 2             2             1.726563

 2             3             1.71875

 3             1             1.609375

 3             2             1.726563

 3             3             1.714844

>>> Done

Exact Match   HowMany       SearchCount

               500000        10000

Search Set    Iteration      Time

 1             1             0.859375

 1             2             0.953125

 1             3             0.953125

 2             1             0.8359375

 2             2             0.9570313

 2             3             0.9648438

 3             1             0.8398438

 3             2             0.9648438

 3             3             0.9609375

>>> Done

 

Exact Match   HowMany       SearchCount

               100000        10000

Search Set    Iteration      Time

 1             1             0.3085938

 1             2             0.4140625

 1             3             0.4101563

 2             1             0.3203125

 2             2             0.40625

 2             3             0.4257813

 3             1             0.3046875

 3             2             0.4101563

 3             3             0.3984375

>>> Done

 

The Code

The code is modularized.  If someone wants to use it for more tests, it should be fairly easy to reuse components or change it.

Option Explicit

Option Base 0

 

Sub generateData(ByRef Where As Range, _

        ByVal HowMany As Long, ByVal Sorted As Boolean, _

        Optional ByVal SimData As Boolean = False)

    If SimData Then Sorted = False

    Dim Arr: ReDim Arr(HowMany - 1, 0)

    Dim I As Long

    For I = LBound(Arr, 1) To UBound(Arr, 1)

        Arr(I, 0) = IIf(SimData, I * 2, I)

        Next I

    Dim Dest As Range: Set Dest = Where.Resize(HowMany, 1)

    Dest.Value = Arr

    If Not Sorted Then

        Dest.Offset(0, 1).FormulaR1C1 = "=RAND()"

        With Where.Parent

        .Sort.SortFields.Clear

        .Sort.SortFields.Add Key:=Dest.Offset(0, 1), SortOn:=xlSortOnValues, _

            Order:=xlAscending, DataOption:=xlSortNormal

        With .Sort

        .SetRange Dest.Resize(, 2)

        .Header = xlGuess

        .MatchCase = False

        .Orientation = xlTopToBottom

        .SortMethod = xlPinYin

        .Apply

            End With

            End With

        Dest.Offset(0, 1).Delete Shift:=xlToLeft

        End If

    End Sub

Sub generateTargets(ByRef Where As Range, _

        ByVal HowMany As Long, ByVal Rng As Long)

    Dim Rslt() As Long: ReDim Rslt(HowMany - 1, 0)

    Dim I As Long

    For I = LBound(Rslt) To UBound(Rslt)

        Rslt(I, 0) = Int(Rnd() * Rng)

        Next I

    Where.Resize(HowMany, 1).Value = Rslt

    End Sub

Sub Test()

    Cells.Delete

    Const HowMany As Long = 1000000, SearchEleCount As Long = 10000, _

        OrigSorted As Boolean = False, UseBinary As Boolean = False

    Debug.Print "HowMany", "SearchCount", "UseBinary", "OrigSorted"

    Debug.Print HowMany, SearchEleCount, UseBinary, IIf(UseBinary, "N/A", OrigSorted)

    'MsgBox HowMany & "," & SearchEleCount & "," & UseBinary

    generateData Cells(1, 2), HowMany, IIf(UseBinary, True, OrigSorted)

    Dim I As Integer

    Debug.Print "Search Set", "Iteration", "Time"

    For I = 1 To 3

        Dim Targets() As Long

        Columns(3).Resize(, 2).Delete

        generateTargets Cells(1, 3), SearchEleCount, HowMany

        Dim J As Integer, AccumTime As Single

        For J = 1 To 3

            Dim StartTime As Single

            StartTime = Timer

            Cells(1, 4).Resize(SearchEleCount, 1).FormulaR1C1 = _

                IIf(UseBinary, "=MATCH(RC[-1]," _

                    & Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

                        & ",1)", _

                    "=MATCH(RC[-1]," _

                        & Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

                        & ",0)")

            Debug.Print I, J, Timer - StartTime

            Next J

        Next I

    Debug.Print ">>> Done"

    End Sub

 

The code to simulate the exact match via the sort and 2 formulas is below.

Sub sortAscending(ColRng As Range)

    With ActiveSheet

    .Sort.SortFields.Clear

    .Sort.SortFields.Add Key:=ColRng, SortOn:=xlSortOnValues, Order:=xlAscending, _

        DataOption:=xlSortNormal

    With .Sort

    .SetRange ColRng

    .Header = xlGuess

    .MatchCase = False

    .Orientation = xlTopToBottom

    .SortMethod = xlPinYin

    .Apply

        End With

        End With

    End Sub

Sub simulateExactSearch()

    Const HowMany As Long = 100000, SearchCount As Long = 10000, _

        OrigSorted As Boolean = False, UseBinary As Boolean = False

    Debug.Print "Exact Match", "HowMany", "SearchCount"

    Debug.Print , HowMany, SearchCount

    Cells.Delete

    generateData Cells(1, 2), HowMany, False, True

    Dim SaveArr: SaveArr = Cells(1, 2).Resize(HowMany).Value

    Debug.Print "Search Set", "Iteration", "Cumulative Time"

    Dim I As Integer

    For I = 1 To 3

        Columns(3).Resize(, 3).Delete

        generateTargets Cells(1, 3), SearchCount, HowMany * 2

        Dim J As Integer

        For J = 1 To 3

            Dim StartTime As Single: StartTime = Timer

            sortAscending Cells(1, 2).Resize(HowMany, 1)

            Cells(1, 4).Resize(SearchCount, 1).FormulaR1C1 = _

                "=MATCH(RC[-1]," _

                    & Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

                    & ",1)"

            Cells(1, 5).Resize(SearchCount, 1).FormulaR1C1 = _

                "=IF(INDEX(" _

                    & Cells(1, 2).Resize(HowMany, 1).Address(True, True, xlR1C1) _

                    & ",RC[-1])=RC[-2],RC[-1],NA())"

            Debug.Print I, J, Timer - StartTime

            Cells(1, 2).Resize(HowMany).Value = SaveArr

            Next J

        Next I

    Debug.Print ">>> Done"

    End Sub