You are on the Home/Publications & Training/Case Studies/Array COUNTIF page
Google
Web This Site

COUNTIF for a large unsorted array and many searches

Download the add-in There are two add-ins in the zip file, the XLAM for 2007 or later and the XLA for 2003 or earlier.  The VBA project is *not* protected.  To install and load the add-in see Common Installation Instructions.

Ever wanted to count how often each of many numbers (say, 100,000 numbers) occurred in a range of many numbers (say, 100,000)?

Suppose the range to search is A1:A100000 and we want a count of how often each cell in B1:B100000 occurs.  The default approach might be to enter in cell C1 the formula =COUNTIF($A:$A,B1) and copy it down to C2:C100000.  This would give how often each element in B1:B10000 occurs in column A.  And, it will take an awfully long time to calculate – it took about 215 seconds on a multi-threaded 4 processor i7 CPU with Excel using all 8 processors full out.

 

By contrast the ArrCountIf array function below takes about 2 seconds and that too because of a limitation in the Excel VBA interface.  I borrowed the work of others, though both the sort function and the search function are easily written from scratch or found on other sites on the web.

There are a lot of building blocks that were needed to make this work.  In addition to the quick sort and the binary search routines, I also wrote a Transpose routine that both mimics Excel’s TRANSPOSE function and removes the 65,536 row limitation of the native function.  Finally, I added a few array support routines, one to calculate the number of dimensions of an array, and another the length of a given dimension of an array.

A major limitation that arose in developing the code below was that a User Defined Function (UDF) cannot return an array to Excel if the array has more than 65,536 rows.  So, the use of ArrCountIf is limited to ranges of that size.  To use it with 100,000 rows, it has to be entered twice: first, as an array formula in 65,536 cells, and then as another array formula in the remaining 34464 cells.

To use the ArrCountIf function with the model described above, select D1:D65536 and array enter the formula =ArrCountif($A$1:$A$100000,B1:B65536).  To array-enter a formula select the range that will contain the formula, enter the formula, and complete it with the CTRL+SHIFT+ENTER combination rather than just the ENTER or TAB key.  If done correctly, Excel will show the formula bracketed in curly brackets { and }.

Then, select D65537:D100000 and array enter the formula =ArrCountif($A$1:$A$100000,B65537:B100000).

The Main UDF

In a standard code module enter the code below.

Option Explicit

Option Base 0

 

    Function SearchUpDown(Arr, ByVal Targ, ByVal StartIdx As Long, _

            ByVal DirUp As Boolean)

        'After a successful binary search, we have the location of 1 _

         match.  However, there may be other matches in adjacent array _

         locations.  We find them through a linear search

        Dim J As Long, Done As Boolean, ThisCount As Long

        J = StartIdx

        Do

            Done = J = IIf(DirUp, UBound(Arr), LBound(Arr))

            If Not Done Then

                J = J + IIf(DirUp, 1, -1)

                If Arr(J) = Targ Then ThisCount = ThisCount + 1 _

                Else Done = True

                End If

            Loop Until Done

        SearchUpDown = ThisCount

        End Function

Function ArrCOUNTIF(Rng1, Rng2)

    'For large searches, ArrCOUNTIF is faster than COUNTIF because it first _

     sorts the array being searched.  Consequently, if we want to perform _

     the COUNTIF for many values searching a large range, ArrCOUNTIF is _

     worth the effort. On the flip side, the overhead of the sort and the _

     use of VBA will not be justified for small ranges.

    

    'Returns the result of COUNTIF(Rng2.Cell,Rng1) for each cell in Rng2 _

     For some strange reason Rng1 and Rng2 can be ranges as large as the _

     1,048,576 rows.  However, when returning the result, we cannot return _

     more than 65,536 elements. :(

   

    Dim Arr1, Arr2, Rslt() As Long

    'Dim StartTime As Single: StartTime = Timer

    Arr1 = Transpose(Rng1.Value)

    Arr2 = Transpose(Rng2.Value)

    ReDim Rslt(ArrLen(Arr2) - 1)

    QuickSort Arr1

    Dim I As Long

    For I = LBound(Arr2) To UBound(Arr2)

        Dim ThisRslt As Long

        ThisRslt = BinarySearch(Arr1, Arr2(I))

        If ThisRslt >= LBound(Arr1) Then

            Rslt(I - LBound(Arr2) + LBound(Rslt)) = _

                1 + SearchUpDown(Arr1, Arr2(I), ThisRslt, DirUp:=False) _

                + SearchUpDown(Arr1, Arr2(I), ThisRslt, DirUp:=True)

            End If

        Next I

    ArrCOUNTIF = Transpose(Rslt)

    'Debug.Print ArrLen(Arr1); ArrLen(Arr2); Timer - StartTime

    End Function

 

The Quick Sort support module

In a standard module, enter the code below.  It performs the quick sort needed to sort the range being searched.

Option Explicit

 

Private Sub noCheckQuickSort(ByRef SortArray, L, R)

    'Originally posted to an XL NG by Jim Rech

    Dim I, J, x, y

    I = L

    J = R

    x = SortArray((L + R) / 2)

 

    While (I <= J)

        While (SortArray(I) < x And I < R)

            I = I + 1

        Wend

        While (x < SortArray(J) And J > L)

            J = J - 1

        Wend

        If (I <= J) Then

            y = SortArray(I)

            SortArray(I) = SortArray(J)

            SortArray(J) = y

            I = I + 1

            J = J - 1

        End If

    Wend

    If (L < J) Then Call noCheckQuickSort(SortArray, L, J)

    If (I < R) Then Call noCheckQuickSort(SortArray, I, R)

    End Sub

 

Public Sub QuickSort(ByRef Arr, Optional L, Optional R)

    If InStr(1, TypeName(Arr), "(", vbTextCompare) < 1 Then Exit Sub

    If IsMissing(L) Then L = LBound(Arr)

    If IsMissing(R) Then R = UBound(Arr)

    If Not IsNumeric(L) Then Exit Sub

    If Not IsNumeric(R) Then Exit Sub

    If Int(L) <> L Then Exit Sub

    If Int(R) <> R Then Exit Sub

    If L >= R Then Exit Sub

    If L < LBound(Arr) Then Exit Sub

    If R > UBound(Arr) Then Exit Sub

    noCheckQuickSort Arr, L, R

    End Sub

 

The Transpose support module

In a standard code module enter the code below.  It mimics Excel’s TRANSPOSE function in VBA overcoming the limitation that the built-in function is limited to 65,536 rows.

Option Explicit

 

    Function OneDto2D(Arr)

        Dim I As Long, Rslt()

        ReDim Rslt(ArrLen(Arr) - 1, 0)

        For I = LBound(Arr) To UBound(Arr)

            Rslt(I - LBound(Arr) + LBound(Rslt), 0) = Arr(I)

            Next I

        OneDto2D = Rslt

        End Function

    Function TwoDto1D(Arr)

        Dim I As Long, Rslt()

        ReDim Rslt(ArrLen(Arr) - 1)

        For I = LBound(Arr) To UBound(Arr)

            Rslt(I - LBound(Arr) + LBound(Rslt)) = Arr(I, LBound(Arr, 2))

            Next I

        TwoDto1D = Rslt

        End Function

    Function TraditionalTranspose(Arr)

        Dim I As Long, J As Long, Rslt()

        ReDim Rslt(ArrLen(Arr, 2) - 1, ArrLen(Arr, 1) - 1)

        For I = LBound(Arr) To UBound(Arr)

            For J = LBound(Arr, 2) To UBound(Arr, 2)

                Rslt(J - LBound(Arr, 2) + LBound(Rslt, 1), _

                    I - LBound(Arr, 1) + LBound(Rslt, 2)) = Arr(I, J)

                Next J

            Next I

        TraditionalTranspose = Rslt

        End Function

Function Transpose(Arr)

    'This mimics Excel's the effect of _

     Application.WorksheetFunction.Transpose except that it works with _

     more than 65,536 rows.  The Excel Transposefunction does something _

     strange in that it transforms a 2D N-rows-1-column matrix into a _

     1D array of N elements.  In reverse, it transforms a 1D array of N _

     elements into a 2D N-rows-1-column matrix.

    Select Case NbrDim(Arr)

    Case 1: Transpose = OneDto2D(Arr)

    Case 2:

        If ArrLen(Arr, 2) = 1 Then Transpose = TwoDto1D(Arr) _

        Else Transpose = TraditionalTranspose(Arr)

    Case Is > 2:

        'The function is not defined for matrices with >2 dimensions

    Case Else:

        'The function is not defined for non-matrices

        End Select

    End Function

The Binary Search and Array Support module

In a standard code module enter the code below.  It implements the binary search function and includes functions for working with arrays.

Option Explicit

 

Function ArrLen(Arr, Optional ByVal aDim As Integer = 1)

    ArrLen = UBound(Arr, aDim) - LBound(Arr, aDim) + 1

    End Function

Function NbrDim(Arr)

    NbrDim = -1

    On Error GoTo XIT

    Dim I: I = 1

    Do While True

        Dim aLen: aLen = UBound(Arr, I)

        I = I + 1

        Loop

XIT:

    NbrDim = I - 1

    End Function

Function BinarySearch(Arr As Variant, Targ As Variant, _

        Optional ByVal inLow, Optional ByVal inHigh) As Long

    'Binary search code is easily found through a web search. _

     This implementation is based on http://www.devx.com/vb2themax/Tip/18913

    

    Dim Low As Long, High As Long

    If IsMissing(inLow) Then Low = LBound(Arr) Else Low = inLow

    If IsMissing(inHigh) Then High = UBound(Arr) Else High = inHigh

   

    BinarySearch = LBound(Arr) - 1

   

    If Low > High Or Low < LBound(Arr) Or High > UBound(Arr) Then _

        Exit Function

 

    Dim DescOrder As Boolean: DescOrder = (Arr(Low) > Arr(High))

        'A good faith guess

 

    Do

        Dim Mid As Long

        Mid = (Low + High) \ 2

        If Arr(Mid) = Targ Then

            BinarySearch = Mid

            Exit Do

        ElseIf ((Arr(Mid) < Targ) Xor DescOrder) Then

            Low = Mid + 1

        Else

            High = Mid - 1

            End If

        Loop Until Low > High

    End Function