You are on the Home/Publications & Training/Case Studies/VBA & web services page
Google
Web This Site

Using VBA to access the ’Net

Over the past few years there has been an increase in people who want to programmatically access information available over the Internet.  There are a few different ways of doing so with results available in a few different formats.  In this chapter I discuss some of the methods and formats.  The methods include the use of InternetExplorer and XMLHttp and the formats include the “human friendly” HTML as well as XML and JSON.  As part of the discussion of XML, I present a XSL stylesheet to convert a webservice XML output into a SpreadsheetML form that can then be opened directly by Excel.  By the end of the chapter, the reader will know how to develop fully functional code to support concurrent asynchronous VBA access to multiple web sites.

 Separating the messenger from the message. PAGEREF _Toc173749354 \h 1

Processing the human friendly message. PAGEREF _Toc173749355 \h 2

The machine-friendly message. PAGEREF _Toc173749356 \h 4

InternetExplorer. PAGEREF _Toc173749357 \h 6

InternetExplorer in Action. PAGEREF _Toc173749358 \h 7

The InternetExplorer with Events. PAGEREF _Toc173749359 \h 8

XMLHttp. PAGEREF _Toc173749360 \h 8

Key XMLHttp properties and methods. PAGEREF _Toc173749361 \h 9

Using XMLHttp to retrieve data in a synchronous manner. PAGEREF _Toc173749362 \h 10

Using XMLHttp to retrieve data asynchronously. PAGEREF _Toc173749363 \h 11

Properly encapsulating XMLHttp requests. PAGEREF _Toc173749364 \h 12

Using XMLHttp to POST data to a website. PAGEREF _Toc173749365 \h 16

Using XMLHttp to retrieve data in XML and JSON formats. PAGEREF _Toc173749366 \h 18

Submitting a form.. PAGEREF _Toc173749367 \h 20

Summary. PAGEREF _Toc173749368 \h 23

 

 

Recent Comments

From Bruce on Dec 2, 2011:

As always - great article and detail. There are so many restFul services out there, that I thought I would start up a rest/Excel library with the objective of populating a sheet in one line of code. Here''s how http://excelramblings.blogspot.com/2011/11/rest-json-excel-in-one-line-of-code.html

 

I haven't had a chance to check out Bruce's link but I've included it for reference.

From Bruce on October 19, 2011:

Here are some example excel downloads with json excel connversion capabilities.  

 http://ramblings.mcpher.com/Home/excelquirks/json

 

From J.M. on April 30, 2011:

Excellent page and tutorial
step by step - well explained - with good examples.
This is with much the best tutorial I have found on accessing Internet with VBA.

By the way... Google seems to have changed the way it returns its results...

I have found that responses come in a DIV with class=vsc so I changed your loop in googleanalyzer to:


For Each OneResultDIV In HTMLDoc.getElementsByTagName("DIV")
    If OneResultDIV.className = "vsc" Then
        Dim AnAnchor As HTMLAnchorElement
        Set AnAnchor = OneResultDIV.getElementsByTagName("a").Item(0)
        Debug.Print AnAnchor.href & ", " _
            & AnAnchor.innerText & ", " & OneResultDIV.innerText
        End If
    Next OneResultDIV

Hope this helps make better this already excellent page.

 

 

 

Separating the messenger from the message

It is important to remember that the method by which one retrieves information (the messenger, so to say) is not always tied to the message (the information that is delivered).  An analogy would be information stored in a database and the same presented for human consumption either in a form or a report.  The former is typically a relational database, something that is not really human friendly.  The latter is easy for a person to read but no one would store the entire form in the database.  The usual approach is to programmatically merge data retrieved from the database into the form template and display the result.

 

Processing the human friendly message

In the Internet world what one sees when using a web browser is a human friendly result.  For example, a search of Google for "tushar mehta" yields:

 

An expanded view of the first two results looks like:

The human brain easily distinguishes between different results by noting the whitespace between the individual results.  The underlying code, HTML in the case of a webpage, caters to this human friendly result.  If we were to look at it, it would quickly become obvious that formatting is closely intertwined with content, which makes parsing difficult.  Even if we did take the time and effort to do so and we did indeed isolate the content, we would always remain vulnerable to the web page designers changing the layout.

While the HTML document may look like a stream of almost incomprehensible text, it does have a structure to it.  We can programmatically explore the structure through the DOM, i.e., the Document Object Model.  We will see how to leverage it to the extent possible.  One attribute that can be used to identify each individual element in the DOM is the ID attribute but it is optional and not all developers use it.

For example, if we were to look at the Google result corresponding to the above search – and the result is definitely not human friendly since Google removes all whitespace – we would, through painstaking research, discover that the actual results are wrapped up in DIV containers:

<div id=res>    <!--a-->
         <div>
                 <div class=g> <!--Individual result here --></div>
                 ...for however many results Google returns on 1 page...
         </div>
         <div>
                 <!-- more divs for the result navigation at page bottom -->
         </div>
</div>
<!-- code for Google Pack ad and rest of page footer -->
 

There’s a whole bunch of stuff before the <div id=res> tag that we can safely ignore.  From the above we can deduce that if we look for a DIV with an ID of RES, then the first DIV element in it would contain the results, each in its own DIV element.

Further analysis of the HTML would reveal that each individual result contains one link (<a href ="…") for the first line of the result and a table for the descriptive text on the 2nd line and the remaining links:

<div class=g>

       <!--m-->

       <link rel="prefetch" href="http://www.tushar-mehta.com/">

       <h2 class=r>

              <a href="http://www.tushar-mehta.com/" target=nw class=l

onmousedown="return clk(0,'','','res','1','')">Home -- TM Consulting

</a>

       </h2>

       <table border=0 cellpadding=0 cellspacing=0>

              <tr>

                     <td class="j">

                           <font size=-1><b>Tushar</b>-<b>Mehta</b>.com provides Operations

Strategy and Web Strategy consulting.<br>

                           <span class=a>www.<b>tushar</b>-<b>mehta</b>.com/ - 44k - </span>

                           <nobr>

                           <a class=fl href= <!--link details deleted -->>Cached</a>

                            -

                           <a class=fl href=<!--link details deleted-->>Similar pages</a>

                           <span class=bl> - <a <!--link details deleted-->>Note this</a>

</span>

                           </nobr>

                           </font><!--n-->

                     </td>

              </tr>

       </table>

</div>

From the above, we can figure out that the first <a> tag is the main link and the only cell in the table contains the descriptive text that Google shows with each result together with the some other links.  The following subroutine implements our discussion.  Given an object of HTMLDocument (the HTML document), it uses the getElementById method to get a reference to the DIV element (type HTMLDivElement ) with the ID of Res.  Then, it uses the getElementsByTagName method of the DIV object to get a collection (type HTMLElementsCollection) of all embedded DIV elements and the Item property of the Collection object to reference the first nested DIV element.  Next, it loops through all the DIV elements in that element.  For each of those DIV elements, it gets a reference to the first A element (type HTMLAnchorElement).  Finally, it prints the href and innerText properties of the anchor element.

Sub GoogleAnalyzer(HTMLDoc As HTMLDocument)

    'Google returns each result as an individual DIV element _

     inside the first DIV element inside a DIV element with an _

     ID of Res. _

     Each result's DIV element has several hyperlinks _

     (<a href=...> tags) the first one containing the actual _

     hyperlink to the website

    Dim ResultDIV As HTMLDivElement, _

        AllResultsDIV As HTMLDivElement, _

        OneResultDIV As HTMLDivElement

    Set ResultDIV = HTMLDoc.getElementById("Res")

    Set AllResultsDIV = ResultDIV.getElementsByTagName("DIV").Item(0)

    For Each OneResultDIV In AllResultsDIV.getElementsByTagName("DIV")

        Dim AnAnchor As HTMLAnchorElement

        Set AnAnchor = OneResultDIV.getElementsByTagName("a").Item(0)

        Debug.Print AnAnchor.href & ", " & AnAnchor.innerText

        Next OneResultDIV

    End Sub

 

Of course, we still don’t know that those links contain and it would require some more processing if we wanted to identify a specific link.

By contrast, if we queried Yahoo! for a stock quote, it provides an ID attribute for each field, which makes it very easy to extract information from the DOM.  For example, price information for Microsoft (symbol MSFT) returns, buried amidst other HTML tags,

<small>At <span id="yfs_t10_msft">10:27AM ET</span>: </small>
<big><b><span id="yfs_l10_msft">29.69</span></b></big>
 

Essentially, the time and the quote both are in a SPAN element with a unique ID attribute.  As the code sample below demonstrates, this makes it a lot easier to extract information from the DOM.

Sub getYahooStockQuote(HTMLDoc As HTMLDocument, StockSymbol As String)

    Debug.Print HTMLDoc.getElementById("yfs_l10_" & StockSymbol).innerText

    End Sub

 

The machine-friendly message

 An alternative to the above messages would be to get the result in a computer-friendly format.  Typically, this would not contain formatting information and restrict itself to content.  In the context of the web, the most common such format is XML with JSON (Javascript Simplified Object Notation) often used as an easier-to-deal-with alternative.

In a subsequent section, we will look at how we would process information in XML format.  For the time being, we will just see the result of using a Yahoo! web service to query for “tushar mehta” first in XML and then in JSON.

Programmatically, one can query a Yahoo! web service with the url

"http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=tushar+mehta&results=2"

with the corresponding result:

<?xml version="1.0" encoding="UTF-8"?>

<ResultSet xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance

xmlns="urn:yahoo:srch"

xsi:schemaLocation="...long url deleted..." type="web"

totalResultsAvailable="353000"

totalResultsReturned="2"

firstResultPosition="1"

moreSearch=”...url deleted...">

<Result>

<Title>Home -- TM Consulting</Title>

<Summary>Tushar-Mehta.com provides Operations Strategy and Web Strategy consulting ... You are on the Home page. Welcome to Tushar Mehta Consulting. Multi-disciplinary. Skills ...</Summary>

<Url>http://www.tushar-mehta.com/</Url>

<ClickUrl>...long url deleted...</ClickUrl>

<DisplayUrl>www.tushar-mehta.com/</DisplayUrl>

<ModificationDate>1179212400</ModificationDate>

<MimeType>text/html</MimeType>

<Cache>

<Url>...cache url deleted...</Url>

<Size>44466</Size>

</Cache>

</Result>

 

<Result>...2nd result deleted...

</Result>

</ResultSet>

 

While the result may not be pretty to look at, it is a lot easier to programmatically analyze the above XML than it is to deal with the HTML DOM.

Yahoo! will also return JSON, something that we will explore only briefly.  While JSON may not be very useful in VB, it is elegantly easy to use in JavaScript.

{"ResultSet":

{"type":"web",

"totalResultsAvailable":353000,

"totalResultsReturned":2,

"firstResultPosition":1,

"moreSearch":"\/WebSearchService\/V1\/webSearch?query=tushar+mehta&amp;appid=YahooDemo&amp;region=us",

"Result":[

{"Title":"Home -- TM Consulting",

"Summary":"Tushar-Mehta.com provides Operations Strategy and Web Strategy consulting ... You are on the Home page. Welcome to Tushar Mehta Consulting. Multi-disciplinary. Skills ...",

"Url":"http:\/\/www.tushar-mehta.com\/",

"ClickUrl": ...ClickUrl deleted...",

"DisplayUrl":"www.tushar-mehta.com\/",

"ModificationDate":1179212400,

"MimeType":"text\/html",

"Cache":

{"Url": ...cache url deleted...,

"Size":"44466"}

},

{"Title":"tushar-mehta.com - Excel",

...details of 2nd result deleted

}

]

}

}

Essentially, the JavaScript Eval function converts the above into an associative array (something akin to a VB Collection or a Windows Scripting Dictionary).  An added benefit is that in JavaScript an associative array is also an object with each index becoming a property!

So far, we have seen both human friendly and machine friendly messages.  We also know that it is possible to programmatically extract information from each – albeit with varying degrees of difficulty.  Next, we look at the messengers.

InternetExplorer

Most people exploring the automation of web services probably start with the ActiveX control InternetExplorer (or its close cousin the WebBrowser, something that can be embedded in an userform).  Automating it is very similar to automating any Office product like PowerPoint, Word, or Excel itself.  The object has properties and methods with which one controls the object.  The code below instantiates InternetExplorer and navigates to a URL.  It isn’t quite a general purpose routine yet but we will address that later.  After navigating to the desired URL, the code ensures that the target page is loaded before proceeding to the analysis step.  Then, it uses a custom analyzer routine if the web page was a Google page or a Yahoo! Finance page.  Finally, it carries out necessary housekeeping.

Sub getResultsWithIE(ByVal URL As String)

        'URL contains the complete string to pass to IE, _

         e.g., http://www.google.com/search?q=tushar+mehta

    Dim IEApp As InternetExplorer   'Need a reference to Microsoft Internet Controls

    Dim HTMLDoc As HTMLDocument   'Need a reference to Microsoft HTML Object Library

   

    Set IEApp = New InternetExplorer

    IEApp.Visible = True    'Some things don't work unless it's visible

   

    IEApp.Navigate URL

   

    'Since the web page loads in parallel to this program, _

     i.e., asynchronously, we need to wait for it to complete loading

    Do

        DoEvents

        Loop Until IEApp.readyState = READYSTATE_COMPLETE

   

    'The IE application contains the HTML webpage as an object _

     in it.  The next statement creates a reference to it

    Set HTMLDoc = IEApp.Document

   

    'Use a custom analyzer module if we have one

    If InStr(1, URL, "google.com", vbTextCompare) > 0 Then

        GoogleAnalyzer HTMLDoc

    ElseIf InStr(1, URL, "finance.yahoo", vbTextCompare) > 0 Then

        getYahooStockQuote HTMLDoc, Mid(URL, InStr(1, URL, "=") + 1)

        End If

    IEApp.Quit

    Set IEApp = Nothing

    End Sub

And it would be used like so:

Sub testIE()

    getResultsWithIE ("http://www.google.com/search?q=" & Escape("tushar mehta"))

    End Sub

 

Notice the use of the Escape function?  When we submit data over the Internet, several characters cannot be sent as-is but must be encoded.  This is to avoid communication errors.  Languages, such as JavaScript, which are designed for web work, support Escape as a native function.  VB, on the other hand, does not and we must “roll our own.”

Function Escape(ByVal URL As String) As String

        'URLs cannot contain most special characters. _

         VBScript and JavaScript have built-in Escape functions. _

         In VB we have to write our own

    Dim I As Integer, BadChars As String

    BadChars = "<>%=&!@#$^()+{[}]|\;:'"",/?"

    For I = 1 To Len(BadChars)

        URL = Replace(URL, Mid(BadChars, I, 1), "%" & Hex(Asc(Mid(BadChars, I, 1))))

        Next I

    URL = Replace(URL, " ", "+")

    Escape = URL

    End Function

 

For another example of using InternetExplorer see Dick Kusleika’s
http://www.dicks-blog.com/archives/2004/04/30/translating-text/

InternetExplorer in Action

In the above section, the code that instantiated IE waited for it to fully load a page (with the Do…Loop statement).  This is because IE runs asynchronously with our VB code.  There can be instances when we don’t want to wait for the page to complete loading.  In fact, we can leverage both its asynchronous nature as well as the fact that IE is a standalone application designed to display webpages to a human.  Consider a scenario where we want to respond to user action by displaying information from different web pages.  For example, in an Excel worksheet with a column of stock symbols, as the user clicks in a different cell we want to show information about that stock.  The code to show the information in an instance of InternetExplorer is below.  It checks if it still has access to the asynchronously running IE.  If not, it instantiates a new copy.  In either case it uses InternetExplorer’s Navigate method to direct the browser to a specific web page.

Option Explicit

 

Dim IEApp As InternetExplorer   'Need a reference to Microsoft Internet Controls

Sub AsyncIE(ByVal URL As String)

        'URL contains the complete string to pass to IE, _

         e.g., http://www.google.com/search?q=tushar+mehta

    On Error Resume Next

    IEApp.Navigate URL

    If Err.Number <> 0 Then

        Set IEApp = Nothing

        Set IEApp = New InternetExplorer

        IEApp.Visible = True    'Some things don't work unless it's visible

        IEApp.Navigate URL

        End If

    If IEApp Is Nothing Then Exit Sub

    End Sub

 

In the worksheet, column B has stock symbols.  The worksheet SelectionChange event procedure below calls the AsyncIE procedure if it can identify a legitimate stock symbol.

Option Explicit

 

Private Sub Worksheet_SelectionChange(ByVal Target As Range)

    Dim StockCell As Range

    Set StockCell = Application.Intersect(Target.EntireRow, Target.Parent.Columns(2))

    If StockCell Is Nothing Then Exit Sub

    If StockCell.Cells.Count > 1 Then Exit Sub

    If IsEmpty(StockCell.Value) Then Exit Sub

    AsyncIE _

            "http://finance.yahoo.com/q?s=" & StockCell.Value

    End Sub

 

Obviously, one is not restricted to Yahoo! Finance and stock prices.  One could use the above to look up clients through an Intranet application.

The InternetExplorer with Events

In the first section on InternetExplorer the VB code waited until IE completely loaded the web page.  It was only after that that it proceeded to the analysis phase.  While that technique is OK for demonstration purposes it suffers from two potential problems.  First, while the VB code is waiting so must the user.  And, it could be a long wait.  The alternative is to leverage the events of the InternetExplorer object.  Just like Excel and other Office programs, IE supports several different events.  We will look at only one, the DocumentComplete event.  First in a class module named clsIE, we declare IE as a ‘WithEvents’ object, use the class Initialize event to instantiate IE, if necessary, add a Navigate method, and a DocumentComplete event procedure that will analyze the web page.

Option Explicit

 

Dim WithEvents IE As InternetExplorer

 

Private Sub Class_Initialize()

    On Error Resume Next

    IE.Visible = True

    If Err.Number <> 0 Then

        Set IE = New InternetExplorer

        IE.Visible = True

        End If

    End Sub

Public Sub Navigate(ByVal URL As String)

    IE.Navigate URL

    End Sub

Private Sub IE_DocumentComplete(ByVal pDisp As Object, URL As Variant)

    MsgBox TypeName(pDisp)

    Dim HTMLDoc As HTMLDocument

    Set HTMLDoc = pDisp.Document

    If InStr(1, URL, "google.com", vbTextCompare) > 0 Then

        GoogleAnalyzer HTMLDoc

    ElseIf InStr(1, URL, "finance.yahoo", vbTextCompare) > 0 Then

        getYahooStockQuote HTMLDoc, Mid(URL, InStr(1, URL, "=") + 1)

        End If

    End Sub

With the event procedure available to us, the code that responds to the user no longer has to wait for IE to complete loading the page.  It is now drastically simplified to:

Option Explicit

 

Dim oIE As clsIE

 

Sub demoIEWithEvents()

    Set oIE = New clsIE

    oIE.Navigate "http://www.google.com/search?q=" & Escape("tushar mehta")

    End Sub

Another advantage of this approach is that the client code (the demoIEWithEvents procedure above) does not have any IE specific code in it.  All the IE related code is encapsulated in the clsIE class module.

One can also trap events raised by objects inside the browser.  For more see http://msdn2.microsoft.com/en-US/library/aa752045.aspx.

XMLHttp

Use of InternetExplorer, as in the previous section, has four drawbacks.

First, it requires instantiating a separate application that was designed to display a web page to the user.

Second, and as a consequence, once the webpage loads we had to parse the DOM to extract the information of interest.  We would have a hard time interacting with a webservice that provides information in a computer-friendly form (XML, for example).

Third, the application becomes visible to the user however briefly.

Fourth, all the information we sent to the web server was in the URL itself.  Whatever parameters we wanted to pass (in the case of the Google example the search string and in the case of the Yahoo! example the stock symbol) were suffixed to the URL itself.  This is the same as using the GET method to process a client form.  The alternative is to use the POST method.  In this case, the parameters are in the message body.  So, what happens if we have a server that expects an incoming message to use the POST method?  It is possible to automate InternetExplorer in this case (see  REF _Ref173725876 \h Submitting a form) but the processing could get somewhat convoluted. 

We can address our concerns by using the XMLHttp object.  This ActiveX object was initially introduced by Microsoft and is one of the core components of AJAX.  The web browsers Firefox, Opera, and Safari also support this capability.  Of course, in this document we are interested in how we can use VB to interact with it.

In the next section we will look at key properties and methods of the XMLHttp object.  Then, we will look at four different scenarios of using the object.  First, we will duplicate the InternetExplorer example from above.  Second, we will process the server response asynchronously.  Third, we will use the POST method to retrieve data from a server.  Fourth, and finally, we will retrieve data in two different machine-friendly forms, XML and JSON.

Key XMLHttp properties and methods

The first step in using the XMLHttp object is to use the Open method to establish key parameters.

XMLHttp.open(strMethod, strUrl, varAsync, strUser, strPassword)

The open method opens a connection to the server but it doesn’t actually send any information across.

strMethod specifies the GET or POST method

strUrl is the URL of interest,

varAsync indicates whether the response will be process asynchronously (varAsync=TRUE) or synchronously,

and strUser and strPassword are for authentication.

 

Next, in the case of an asynchronous request, we specify the event handler.  This requires some special processing in VBA and is explained in a subsequent section.

XMLHttp.onreadystatechange = funcMyHandler

If the XMLHttp response is to be processed asynchronously (see the open method above), the onreadystatechange property specifies the event handler for the readystatechange event.  In VB this requires some special handling that we will discuss in a later section

 

Next, we send the request to the server with the Send method.

XMLHttp.send(varBody);

The send method actually sends the message to the server.  varBody is an optional message text.  It contains the parameters for the POST method.

 

In the case where we are processing the response asynchronously, we need to monitor the readyState of the request.

XMLHttp.readyState

The read-only readyState represents the state of the request.  When the request is complete, readyState will have a value of 4.

 

When the process is complete, use the status property to check how it was completed.

XMLHttp.status

The read-only status represents the HTTP status code returned by the request.  A value of 200 indicates it was processed successfully.  For other codes see the link below.

 

If the returned response is a HTML document or a JSON string, use the responseText property to retrieve it.

XMLHttp.responseText

The read-only responseText property contains the text of the response as a string.

 

On the other hand, if the returned response is a XML document, use the responseXML property to retrieve the XML object.

XMLHttp.responseXML

The read-only responseXML property returns an object containing the parsed XML document.  See a later section on how to process the contents.

 

With the introduction to the key properties and methods complete, we can now move on to the first of our XMLHttp requests.  For additional documentation on the XMLHttp object visit the Microsoft link at http://msdn2.microsoft.com/en-us/library/ms759148.aspx

Using XMLHttp to retrieve data in a synchronous manner

This, the first use of XMLHttp, essentially duplicates the InternetExplorer capability we saw above.  The code to use the ActiveX object is below.  Notice how much simpler it is compared to the InternetExplorer version.  The Open method’s 3rd argument is False to indicate the request should be processed synchronously.  This will make our code wait at the Send until the request completes processing.  Once that happens, the webpage’s HTML is in the responseText string property.  To use the HTMLDocument object, we simply create one and assign its body’s innerHTML property to responseText.  Now, we can reuse the analyzer modules from earlier.

Sub XMLHttpSynchronous(ByVal URL As String)

    Dim XMLHttpRequest As XMLHTTP

   

    Set XMLHttpRequest = New MSXML2.XMLHTTP

 

 

    XMLHttpRequest.Open "GET", URL, False

    XMLHttpRequest.send

   

    Dim HTMLDoc As New HTMLDocument

    HTMLDoc.body.innerHTML = XMLHttpRequest.responseText

    

    'Use a custom analyzer module if we have one

    If InStr(1, URL, "google.com", vbTextCompare) > 0 Then

        GoogleAnalyzer HTMLDoc

    ElseIf InStr(1, URL, "finance.yahoo", vbTextCompare) > 0 Then

        getYahooStockQuote HTMLDoc, Mid(URL, InStr(1, URL, "=") + 1)

        End If

 

    End Sub

 

Using XMLHttp to retrieve data asynchronously

There are many instances – almost all, some might say – when it does not make sense to wait for the web service to complete its response.  Rather, the code may move on to some other task in response to another user action and we may want to process the web service response when it actually completes.

In this case, our request to the server will not cause our code to wait at the Send method but simply continue on.  Obviously, we cannot process the result “in line” as in the last section.  Instead, we create a subroutine that is called as a callback function by our request.  This callback function should track the state of the request and process the result only when the request is completed successfully.

The XMLHttp object was designed to work with scripting languages and it is very easy to use it in that context.  For example, in JavaScript, given a function named funcMyHandler, one would simply use

XMLHttp.onreadystatechange = funcMyHandler

 

Of course, VBA doesn’t support the above construct in which we are doing the equivalent of the AddressOf operator, which is what we would use to pass the address of a callback function to an API.  But, we cannot use AddressOf in this context since the following will generate a compile-time error

XMLHttp.onreadystatechange = AddressOf funcMyHandler

 

So, what does one do if the AddressOf operator doesn’t work and the scripting technique doesn’t either?  It turns out that the onReadyStateChange property will accept any object and at run time call its default method.  So, we create a custom class, add a method to it and designate that method as the class’s default method.  But, wait.  We can’t assign a default anything to a class in VBA!  So, what next?  It turns out that while we cannot designate a default for a class through the Visual Basic Editor, there’s a workaround that we can use.

Start with the code to create the asynchronous call.  The XMLHttpMonitor is the class instance that we will use to implement our workaround.  There are only two changes in the interaction with the XMLHttp object: the specification of the onreadystatechange property and the change from False to True in the value of the third argument to the Open method.

Dim XMLHttpReq As MSXML2.XMLHttp

Dim XMLHttpMon As clsXMLHttpMonitor

 

Sub XMLHttpAsync(ByVal URL As String)

   

    Set XMLHttpReq = New MSXML2.XMLHttp

   

    Dim XMLHttpMonitor As clsXMLHttpMonitor

    Set XMLHttpMonitor = New clsXMLHttpMonitor

    XMLHttpMonitor.Initialize XMLHttpReq

   

    XMLHttpReq.OnReadyStateChange = XMLHttpMonitor

   

    XMLHttpReq.Open "GET", URL, True

    XMLHttpReq.send

   

    End Sub

 

The clsXMLHttpMonitor class contains:

Option Explicit

 

Dim XMLHttpReq As MSXML2.XMLHttp

 

Public Sub Initialize(ByVal uXMLHttpRequest As MSXML2.XMLHttp)

   Set XMLHttpReq = uXMLHttpRequest

End Sub

 

Sub ReadyStateChangeHandler()

    'Debug.Print XMLHttpReq.readyState

    If XMLHttpReq.readyState = 4 Then

        If XMLHttpReq.Status = 200 Then

            'Process the response here

        Else

            Debug.Print XMLHttpReq.Status & ", " & XMLHttpReq.responseText

            End If

        End If

    End Sub

As noted above, we still have to designate ReadyStateChangeHandler as the default method for this class.  To make that happen, do the following:

1)      Export and remove this class module.

2)      Open the exported file in a text editor like Notepad.  After the ‘Sub ReadyStateChangeHandler()’ line add a new line: ‘Attribute Value.VB_UserMemId = 0’

3)      Save the file and import it back into the VBA project.

For more details see Chip Pearson’s http://www.cpearson.com/excel/DefaultProperty.htm

Finally, to use the above, one would use

Sub testAsync()

    XMLHttpAsync "http://www.google.com/search?q=tushar+mehta"

    End Sub

For another example see Juan Pablo Gonzalez’s post at http://www.dailydoseofexcel.com/archives/2006/10/09/async-xmlhttp-calls/

For Microsoft’s take on the different ways to asynchronously process data with the XMLHttp see http://msdn2.microsoft.com/en-us/library/ms757030.aspx

Properly encapsulating XMLHttp requests

The method described in the previous section works fine when one tests it.  However, in a “production” environment it is very likely to fail.  The reason is that the caller (the consumer of the class) may initiate some task that reuses the XMLHttpMonitor before it is done processing an earlier request.  The first diagram below illustrates when the above code works safely.  Some user action triggers the code that initiates the asynchronous XMLHttp request. Since the completed asynchronous response is processed before the next user action, the code works as intended.

In the next sequence of events, shown below, a second user action triggers the asynchronous request before the first has finished processing, which means that the global variables XMLHttpReq and XMLHttpMon are still in use.  If we reuse them, we would lose the ability to process the response to the first request!

The other issue we should address is that the code that initiates the XMLHttp request must declare and support two globals: XMLHttpReq and XMLHttpMon.  In addition, the code and the objects are in the consumer domain, as it were, and we are dependent on individual developers to get correctly use the XMLHttpReq and XMLHttpMon objects.

It’s always a good idea to encapsulate such code and in this case it is relatively easy to do.  But before we do that there is one more item to consider.  Once we have all the code in a class module, we must provide some way for the consumer of the class to “tell” us how to process the response.  Obviously, we don’t want to change the code in the class module each time someone wants to call a new routine to process the response.  So, we implement a new “ResponseProcessor” variable and use the Application.Run method to call it when the request is completed.  A sample subroutine that acts as a response processor is below.  Note that it takes three parameters, the XMLHttp object, the original URL, and the original message.

Sub GoogleResponseProcessor(XMLHttpReq As XMLHttp, _

        ByVal URL As String, ByVal sMsg As String)

    Dim HTMLDoc As HTMLDocument

    Set HTMLDoc = New HTMLDocument

    HTMLDoc.body.innerHTML = XMLHttpReq.responseText

    GoogleAnalyzer HTMLDoc

    End Sub

 

To implement all of the above issues, the new clsXMLHttpMon will (1) have a read-only flag to indicate when it is available for a new task, (2) take care of all XMLHttpReq processing, and (3) call a user provided “ResponseProcessor” subroutine.  Note that the onreadystatechange property is now set to Me, i.e., the instantiation of the class.

While we are at it, we might as well take care of one more item.  Since we are encapsulating all the XMLHttp related code in a single class, we might as well take over responsibility for synchronous requests.

Option Explicit

 

Dim XMLHttpReq As MSXML2.XMLHttp, _

    bIAmAvailable As Boolean, _

    ResponseProcessor As String, _

    sURL As String, sMsg As String

 

Property Get IAmAvailable() As Boolean

    IAmAvailable = bIAmAvailable

    End Property

 

Sub ReadyStateChangeHandler()

    'Debug.Print XMLHttpReq.readyState

    If XMLHttpReq.readyState = 4 Then

        If XMLHttpReq.Status = 200 Then

            'Process the response here

            'hrefToHyperLink XMLHttpReq.responseText, ActiveCell

            Application.Run ResponseProcessor, XMLHttpReq, sURL, sMsg

            bIAmAvailable = True

        Else

            Debug.Print XMLHttpReq.Status & ", " & XMLHttpReq.responseText

            End If

        End If

    End Sub

 

Public Sub XMLHttpCall(ByVal ReqMethod As String, _

        ByVal URL As String, ByVal uResponseProcessor As String, _

        Optional ByVal AsyncCall As Boolean = True, _

        Optional ByVal uMsg As String = "")

    Set XMLHttpReq = New MSXML2.XMLHTTP

    If AsyncCall Then

        sURL = URL: sMsg = uMsg

        ResponseProcessor = uResponseProcessor

        XMLHttpReq.OnReadyStateChange = Me

        End If

    With XMLHttpReq

    .Open ReqMethod, URL, AsyncCall

    .send uMsg

    If Not AsyncCall Then Application.Run uResponseProcessor, XMLHttpReq, URL, uMsg

        End With

    End Sub

 

There’s one more task to take care of.  We need code to properly instantiate a new request taking into account that existing requests may still be pending.  Rather than make this the responsibility of every consumer of our class, we will create a ‘Manager’ class.  The task of this new class module, named clsXMLHttpManager, is the following: It either finds the first available XMLHttpMon object or creates a new one. Next, it passes along to this object the request it received.  So, once it finds (or creates) a new monitor, it acts as just a pass through routine.

Option Explicit

Option Base 0

 

Dim XMLHttpMon() As clsXMLHttpMonitor

 

Private Function findAvailMon() As clsXMLHttpMonitor

    Dim I As Integer, Done As Boolean

    I = LBound(XMLHttpMon)

    Do

        If XMLHttpMon(I) Is Nothing Then

            Done = True

        ElseIf XMLHttpMon(I).IAmAvailable Then

            Done = True

        Else

            I = I + 1

            Done = I > UBound(XMLHttpMon)

            End If

        Loop Until Done

    If I > UBound(XMLHttpMon) Then _

        ReDim Preserve XMLHttpMon(UBound(XMLHttpMon) + 1)

    Set XMLHttpMon(I) = New clsXMLHttpMonitor

    Set findAvailMon = XMLHttpMon(I)

    End Function

Public Sub XMLHttpCall(ByVal ReqMethod As String, _

        ByVal URL As String, ByVal uResponseProcessor As String, _

        Optional ByVal AsyncCall As Boolean = True, _

        Optional ByVal uMsg As String)

    Dim XMLHttpMon As clsXMLHttpMonitor

    Set XMLHttpMon = findAvailMon()

    XMLHttpMon.XMLHttpCall ReqMethod, URL, uResponseProcessor, AsyncCall, uMsg

    End Sub

 

Private Sub Class_Initialize()

    ReDim XMLHttpMon(0)

    End Sub

 

Now, the consumer of these classes need not worry about any of the details of calling XMLHttp.  One could initiate multiple calls to different websites without worrying about any timing issues.  Below are two calls,

1)      A Google search for “tushar mehta” with the response processed by the GoogleResponseProcessor subroutine.

2)      A Yahoo! Finance search for the stock price of IBM with the response processed by the YahooStockProcessor subroutine.

By the time we are done processing with the next two sections, we will test with four calls initiated in quick succession.  Below are the response processors to analyze the contents of the Google and Yahoo Stock responses.

Option Explicit

 

Dim XMLHttpManager As New clsXMLHttpManager

 

Sub testAsyncIntegrated()

    XMLHttpManager.XMLHttpCall "GET", _

        "http://www.google.com/search?q=tushar mehta", "GoogleResponseProcessor"

    XMLHttpManager.XMLHttpCall "GET", "http://finance.yahoo.com/q?s=ibm", _

        "YahooStockProcessor"

    End Sub

 

Sub GoogleResponseProcessor(XMLHttpReq As XMLHttp, _

        ByVal URL As String, ByVal sMsg As String)

    Dim HTMLDoc As HTMLDocument

    Set HTMLDoc = New HTMLDocument

    HTMLDoc.body.innerHTML = XMLHttpReq.responseText

    GoogleAnalyzer HTMLDoc

    End Sub

 

Sub YahooStockProcessor(XMLHttpReq As XMLHttp, _

        ByVal URL As String, ByVal sMsg As String)

    Dim HTMLDoc As HTMLDocument

    Set HTMLDoc = New HTMLDocument

    HTMLDoc.body.innerHTML = XMLHttpReq.responseText

    getYahooStockQuote HTMLDoc, Mid(URL, InStr(1, URL, "=") + 1)

    End Sub

 

 

Using XMLHttp to POST data to a website

There are instances when one needs to use the POST method to initiate a XMLHttp request.  The important thing to remember is that when using POST the arguments passed to the web server must be in the message and not in the URL itself.  For example, the babelfish.altavista.com language translation website expects requests that are POSTs.  The HTML source for that webpage looks like:

<form action="http://babelfish.altavista.com/tr" method="POST...other stuff...>
 <input type=hidden name=doit value="done">
 <input type=hidden name=intl value="1">
      <b class=m><font color=#000000>Translate a block of text</font></b>
<br>
 ...other stuff...
 <input type=hidden name=tt value="urltext" >
 <textarea rows="6" wrap=virtual cols="42" style="width:400" name="trtext""></textarea><br>
<nobr><select name="lp" style="font-size:0.8em;" tabindex="1" class="button">
 <option value="">Select from and to languages</option>
...some translation options...
 <option value="en_es">English to Spanish</option>
...other translation options...
</select>
</nobr>
  <input type="Submit" value="Translate" name="btnTrTxt">&nbsp;
</form>
 

From the above, we can figure out that the argument list we need is

Doit=done&intl=1&tt=urltext&trtext={text we want to translate}&lp={code for to and from languages}

Actually, it turns out that all we need are the tt, trtext, and lp arguments.

The URL that does the translation is http://babelfish.altavista.com/tr (see the form’s action property).  If we carry out the translation and look at the source of the returned page (see below), we would find that is very difficult to uniquely identify the result.  The best we can do is note that it is in the first DIV element in the first DIV element.  Given how getElementsByTagName returns results it will also be the 2nd element returned.

<!-- Content body -->
<div align=center>
<table width=800 cellpadding=0 cellspacing=0 border=0>
 <tr>
  <td valign=top><br class=lb><!-- qbox_translate -->
<style><!--
.wt {color:#FFF;}
--></style>
   <table ...stuff deleted...>
    <tr>
     <td width=10>&nbsp;&nbsp;&nbsp;</td>
     <td class=s valign=top><br class=lb>
       <table width=100%>
         <tr>
           <td class=m...stuff deleted...</td>
           <td align=right...stuff deleted...</td>
         </tr>
       </table>
       
           <table width=400 cellpadding=0 cellspacing=0 border=0>
            <tr><form action="http://www.altavista.com/web/results" method=get>
             <td valign=top><b class=m><font color=#0000000>
                 En espa&ntilde;ol:</font></b></td>
            </tr>    <tr>
             <td bgcolor=white class=s><div style=padding:10px;>hola</div></td>
 

To find what we are translating is a little easier (other than the fact that it is also something we passed along to the webserver in the first place).  It’s in an element with the name of trtext.  So, we can retrieve the textarea element through the getElementsByName method.

<textarea rows="6" wrap=virtual cols="42" style="width:400" name="trtext"">hello</textarea>
 

Before we can actually use the POST capability, it turns out we need to set the value of a request header.  That in turn requires that we use a more modern ActiveX object than the generic XMLHttp object we’ve been using all along.  Making these changes also demonstrates both the power and value of encapsulation.  We will change one and one routine only.  In the clsXMLHttpMonitor class module, change the XMLHttpAsync routine to

Public Sub XMLHttpCall(ByVal ReqMethod As String, _

        ByVal URL As String, ByVal uResponseProcessor As String, _

        Optional ByVal AsyncCall As Boolean = True, _

        Optional ByVal uMsg As String = "")

    Set XMLHttpReq = New MSXML2.XMLHTTP40

    If AsyncCall Then

        sURL = URL: sMsg = uMsg

        ResponseProcessor = uResponseProcessor

        XMLHttpReq.OnReadyStateChange = Me

        End If

    With XMLHttpReq

    .Open ReqMethod, URL, AsyncCall

    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"

    .send uMsg

    If Not AsyncCall Then Application.Run uResponseProcessor, XMLHttpReq, URL, uMsg

        End With

    End Sub

Remember the two calls we made in quick succession in the previous section?  We are now ready to add a third call, this one to translate Hello there from English to Spanish with a corresponding response processor for babelfish.altavista.com.

Sub testAsyncIntegrated()

    XMLHttpManager.XMLHttpCall "GET", _

        "http://www.google.com/search?q=tushar mehta", "GoogleResponseProcessor"

    XMLHttpManager.XMLHttpCall "GET", "http://finance.yahoo.com/q?s=ibm", _

        "YahooStockProcessor"

    XMLHttpManager.XMLHttpCall "POST", "http://babelfish.altavista.com/tr", _

        "BabelfishResponseProcessor", True, _

        "tt=urltext&trtext=hello+there&lp=en_es"

    End Sub

 

Sub BabelfishResponseProcessor(XMLHttpReq As XMLHttp, _

        ByVal URL As String, ByVal sMsg As String)

    Dim HTMLDoc As HTMLDocument

    Set HTMLDoc = New HTMLDocument

    HTMLDoc.body.innerHTML = XMLHttpReq.responseText

    Debug.Print HTMLDoc.getElementsByName("trtext").Item(0).innerText _

        & " translates as " _

        & HTMLDoc.getElementsByTagName("div").Item(1).innerText

    End Sub

 

For a discussion about the differences between GET and POST see http://www.cs.tut.fi/~jkorpela/forms/methods.html

Using XMLHttp to retrieve data in XML and JSON formats

So far we have seen how to retrieve data meant for human consumption and extract the information of interest to us by parsing the HTMLDocument object model.  In this section, we see how to interact with a web service that provides the response in a computer friendly form, i.e., either as a XML document or a JSON string.

As far as the client (i.e., the caller) is concerned all one has to do is hook up with a web service that provides the data in the desired format.  The Yahoo! web service is an ideal candidate since it provides results in both XML and JSON.

When the web service returns a XML document, it can be accessed either as an object with the responseXML property or as a string with the responseText property.  There are obvious advantages to retrieving the data as an object.  A XML document is structured like a tree and one can easily write a recursive routine to traverse the tree.  One can also use a XSL stylesheet to transform the XML document into some other form such as a HTML document or even a SpreadsheetML, the XML dialect developed by Microsoft to create Excel workbooks.

A XSL stylesheet is very much like a programming language that defines how each node in a XML document should be treated.  For example, XSL has for-each, select, and if statements.  The XSL to transform the Yahoo! search result into a HTML document and into the SpreadsheetML form are below.  Each of the two XSL stylesheets is a little more complicated than the introductory XSLs one finds on the web because of the presence of a default namespace in the returned XML document.  For more on XSL stylesheets search Google or see http://articles.techrepublic.com.com/5100-22-1044797.html and for more on dealing with default namespaces see http://www.topxml.com/people/bosley/defaultns.asp. To convert the returned XML document into a HTML document showing the result title, summary, and URL in tabular form, one would use:

<?xml version="1.0" encoding="ISO-8859-1"?>

 

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:lc="urn:yahoo:srch">

 

<xsl:template match="/lc:ResultSet">

  <html>

  <body>

    <h2>Yahoo Results</h2>

    <table border="1">

    <tr bgcolor="#9acd32">

      <th align="left">Title</th>

      <th align="left">Summary</th>

      <th align="left">URL</th>

    </tr>

    <xsl:for-each select="lc:Result">

    <tr>

      <td><xsl:value-of select="lc:Title"/></td>

      <td><xsl:value-of select="lc:Summary"/></td>

      <td><xsl:value-of select="lc:Url"/></td>

    </tr>

    </xsl:for-each>

    </table>

  </body>

  </html>

</xsl:template>

 

</xsl:stylesheet>

 

On the other hand, if we wanted to convert the XML object into a SpreadsheetML file, we would use the exact same VB code but a different stylesheet:

<?xml version="1.0" encoding="ISO-8859-1"?>

 

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns:lc="urn:yahoo:srch">

 

<xsl:template match="/lc:ResultSet">

 

<?mso-application progid="Excel.Sheet"?>

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"

 xmlns:o="urn:schemas-microsoft-com:office:office"

 xmlns:x="urn:schemas-microsoft-com:office:excel"

 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"

 xmlns:html="http://www.w3.org/TR/REC-html40">

 <Worksheet ss:Name="Sheet1">

  <Table >

   <Row>

    <Cell><Data ss:Type="String">Title</Data></Cell>

    <Cell><Data ss:Type="String">Summary</Data></Cell>

    <Cell><Data ss:Type="String">URL</Data></Cell>

   </Row>

    <xsl:for-each select="lc:Result">

    <Row>

      <Cell><Data ss:Type="String"><xsl:value-of select="lc:Title"/></Data></Cell>

      <Cell><Data ss:Type="String"><xsl:value-of select="lc:Summary"/></Data></Cell>

      <Cell><Data ss:Type="String"><xsl:value-of select="lc:Url"/></Data></Cell>

    </Row>

    </xsl:for-each>

  </Table>

 </Worksheet>

</Workbook>

</xsl:template>

 

</xsl:stylesheet>

 

That’s right!  The above XSL stylesheet converts the XML Yahoo! search results into an Excel workbook!

In the last section, we processed three XMLHttp requests in quick succession without worrying about timing issues.  Now, we add a fourth, this one to the Yahoo! search web service with the response returned in XML and processed by the XMLResponseProcessor subroutine.

Sub testAsyncIntegrated()

    XMLHttpManager.XMLHttpCall "GET", _

        "http://www.google.com/search?q=tushar mehta", "GoogleResponseProcessor"

    XMLHttpManager.XMLHttpCall "GET", "http://finance.yahoo.com/q?s=ibm", _

        "YahooStockProcessor"

    XMLHttpManager.XMLHttpCall _

        "GET", _

        "http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo" _

            & "&query=tushar+mehta&results=20", _

        "XMLResponseProcessor", True

    XMLHttpManager.XMLHttpCall "POST", "http://babelfish.altavista.com/tr", _

        "BabelfishResponseProcessor", True, _

        "tt=urltext&trtext=hello+there&lp=en_es"

    End Sub

 

In the XMLResponseProcessor (used to process XML responses from Yahoo!) below, we transform the returned XML object into a SpreadsheetML string (using the XSL stylesheet above), write it to a file in the temporary directory, and open it in Excel.

Sub XMLResponseProcessor(XMLHttpReq As XMLHttp, URL As String, sMsg As String)

 

    Dim XMLDoc As New DOMDocument40, XSLDoc As New DOMDocument40

    XMLDoc.async = False: XSLDoc.async = False

    XSLDoc.Load ThisWorkbook.Path & Application.PathSeparator & "yahoo_result_xl.xsl"

    Set XMLDoc = XMLHttpReq.responseXML

   

    Dim OutMsg As String

    OutMsg = XMLDoc.transformNode(XSLDoc)

   

    Dim I As Long, FName As String

    I = FreeFile()

    FName = Environ("temp")

    If FName = "" Then FName = "C:"

    FName = FName & Application.PathSeparator _

        & Format(Now(), "yymmddhhmmss") & ".xml"

    Open FName For Output As #I

    Print #I, OutMsg

    Close #I

   

    Application.Workbooks.Open FName

   

    End Sub

 

 

Submitting a form

The XMLHttp object is a very powerful way of submitting data to a web service or a web site.  A key requirement, of course, is knowing what information one needs to send.  In some cases – the Google, Yahoo!, and AltaVista examples above – we figured that out by looking at the underlying HTML code.  But what if HTML code was extremely complicated and very difficult to figure out?  In that case, we can automate the interaction to a limited extent if we fall back on the InternetExplorer object.  Since we can treat the HTML document as an object, we can refer to the form in it as well as the fields in the form through the object’s properties and methods.  We can actually fill in the form and submit it!  Let’s look at the Google web search form.  It looks like:

The corresponding HTML code for the form portion is:

<form id="sfrm" name="f" method="get" action="/search">
         {various fields deleted}
         <input type="text" name="q" id="q" size="55" maxlength="2048" value=""><br>
         <input type="submit" value="Google Search" name="btnG" id="btnG">
         {various fields deleted}
</form>
 

We can now navigate InternetExplorer to the Google home page, use the above information to fill out the form (the text field is named q) and submit the form named f by calling its submit method.

Sub GoogleFormSubmit(SearchText As String)

 

    Dim IE As InternetExplorer, URL As String, HTMLDoc As HTMLDocument

   

    Set IE = New InternetExplorer

   

    URL = "http://www.google.com"

    IE.Navigate URL

    IE.Visible = True

      

    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE

        DoEvents

        Loop

  

    Set HTMLDoc = IE.Document

 

    HTMLDoc.getElementById("q").Value = SearchText

   

    HTMLDoc.getElementById("f").submit

  

    'Wait for the response

    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE

        DoEvents

        Loop

    GoogleAnalyzer HTMLDoc

    IE.Quit

    End Sub

Sub testFormSubmit()

    GoogleFormSubmit "tushar mehta"

    End Sub

 

 

For another example of this method see http://www.mrexcel.com/board2/viewtopic.php?t=171188&highlight=web+form+submit

Now that we know how the process works, we will generalize it to handle any form.  As it turns out there are some forms that submit themselves as soon the value of a field changes.  One such is the Yahoo! sports look up by baseball players by team.  To start visit http://sports.yahoo.com/mlb, select the Players link, then select any team from the AL and NL team lists (I selected the Los Angeles Angels), and on the subsequent page select any player (I selected pitcher Dustin Moseley).  That brings up the page http://sports.yahoo.com/mlb/players/7507.  Of course, depending on the team and player you select, the page URL may be different.  On this page, on the right side is a ‘Roster’ label and a drop down list that reads ‘Select Player.’  If one selects a player, a new page for the selected player shows up immediately.  If we look at the underlying HTML, we find

<form name="select_player">
Roster:
<select name="player" class="yspgens" onchange="go()">
<option value="">Select Player</option>
<option value="/mlb/players/5289;_ylt=AlssIOVEMIXxlfiTUyL4wV.FCLcF">Anderson, Garret</option>
<!-- list of other players -->
<option value="/mlb/players/7742;_ylt=AhBvxZChvMMOfmuVg_90L3SFCLcF">Willits, Reggie</option>
</select>
</form>
 

Note that the form has no submit button.  As soon as we select a player, the change event occurs and that results in the go() function being called.  The go function loads the new player’s information.  So, unlike the Google example, we don’t have a form’s submit method to call.  The first attempt at implementing this lookup programmatically would be to simply change the value of the control and check if the change event fires.  We already know enough so that we can easily change the value of the drop-down control.  Once we establish a reference to the document in the InternetExplorer window through the HTMLDoc variable, we can use

HTMLDoc.getElementById("player").Value = {new value}

 

If we use the above, we will discover nothing happens.  Changing the value programmatically doesn’t trigger the change event.  It turns out we have to do so ourselves with the FireEvent method.  The statement below invokes the FireEvent method and provides it with the name of the event to fire.

HTMLDoc.getElementById("player").FireEvent "change"

 

So, if we want to write a generic subroutine that loads any form on any page and then sends it to the server, we need to (a) set an arbitrary number of fields to the desired values, and (b) either use the form’s submit method or fire an appropriate event for a particular control.

The subroutine below navigates to a specified URL, uses the values in the ParamArray to load the indicated fields, and sends the form to the server.  It expects the field values to be in pairs with the first element in the pair indicating the field ID and the 2nd the value.  If the ParamArray contains an even number of elements then the last pair contains the ID of a field and the name of the event to be fired.  If, on the other hand, the ParamArray contains an odd number of elements then the last element is the name of the form that we want to submit.

Option Explicit

 

Dim IE As InternetExplorer, HTMLDoc As HTMLDocument

Sub genericFormSubmit(ByVal URL As String, ParamArray Params() As Variant)

    'This routine navigates InternetExplorer to the specified URL. _

     Then, it fills out the fields on that page using the information in _

     Params.  Info in Params should be in pairs, the first element _

     specifying the field name and the next the value. _

     The data are submitted to the website in one of two ways. _

     If the last piece of information in Params is a paired set, _

     the first element of that pair is the name of an element and the 2nd _

     element is the name of that element's event that is to be fired. _

     If the last piece of information is a single element (i.e., not _

     paired), it is the name of the form that is to be submitted.

   

    On Error Resume Next

    IE.Navigate URL

    If Err.Number <> 0 Then

        Set IE = Nothing

        Set IE = New InternetExplorer

        IE.Visible = True

        IE.Navigate URL

        End If

      

    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE

        DoEvents

        Loop

  

    Set HTMLDoc = IE.Document

    Dim I As Integer

    On Error GoTo SkipParams

    If UBound(Params) < LBound(Params) Then GoTo SkipParams

    For I = LBound(Params) To UBound(Params) - 2 Step 2

        HTMLDoc.getElementById(Params(I)).Value = Params(I + 1)

        Next I

    If UBound(Params) Mod 2 = 0 Then 'Params contains an odd number _

                                      The last is the name of the _

                                      form to submit

        HTMLDoc.getElementById(Params(UBound(Params))).submit

    Else    'Even number of elements in Params.  The last pair _

             is the name of an element and its event to fire

        HTMLDoc.getElementById(Params(UBound(Params) - 1)) _

            .FireEvent Params(UBound(Params))

        End If

SkipParams:

    'Don't wait for the response

    End Sub

 

And, we would use it as below.  The first call to genericFormSubmit uses the event trigger mechanism to send the form.  The second uses the form’s submit method.

Sub testGenericFormSubmit()

    genericFormSubmit "http://sports.yahoo.com/mlb/players/5562", _

        "Player", "/mlb/players/5862;_ylt=ArnSMzAjw92RL7FfNWGghpOFCLcF", _

        "Player", "onchange"

    genericFormSubmit "http://www.google.com", _

        "q", "tushar mehta", _

        "f"

    End Sub

 

Summary

In this chapter we looked at different ways to access information over the Internet using both the InternetExplorer and XMLHttp ActiveX objects.  We also saw how to use both in synchronous and asynchronous mode.  In terms of the data format we looked at HTML, XML, and JSON data.  For HTML based pages, we looked into how to understand the contents so that we could programmatically interact with the HTML page.  When receiving XML data we looked at how to use a XSL stylesheet to convert the XML into either a HTML based webpage or a SpreadsheetML document that could be opened in Excel itself.  We also looked at how to mimic the submission of a form in a web page as a method to deal with very complex pages that we could not easily decipher.