Infinite opportunities?  Endless problems?  Limited resources?  Talk to us!
Operations
Consulting
Software
Consulting
Publishing and
Training
Applied
Solutions
Excel add-ins
and tutorials
Charts Excel and VBA
Case Studies
VBA
tutorials
PowerPoint
Add-Ins
Miscellenia
You are on the Home/Publications & Training/Case Studies/VBA & web services page

Using VBA to access the ’Net

Over the past few years there has been an increase in people who want to programmatically access information available over the Internet.  There are a few different ways of doing so with results available in a few different formats.  In this chapter I discuss some of the methods and formats.  The methods include the use of InternetExplorer and XMLHttp and the formats include the “human friendly” HTML as well as XML and JSON.  As part of the discussion of XML, I present a XSL stylesheet to convert a webservice XML output into a SpreadsheetML form that can then be opened directly by Excel.  By the end of the chapter, the reader will know how to develop fully functional code to support concurrent asynchronous VBA access to multiple web sites.

Separating the messenger from the message. 1

Processing the human friendly message. 2

The machine-friendly message. 4

InternetExplorer. 6

InternetExplorer in Action. 7

The InternetExplorer with Events. 8

XMLHttp. 8

Key XMLHttp properties and methods. 9

Using XMLHttp to retrieve data in a synchronous manner. 10

Using XMLHttp to retrieve data asynchronously. 11

Properly encapsulating XMLHttp requests. 12

Using XMLHttp to POST data to a website. 16

Using XMLHttp to retrieve data in XML and JSON formats. 18

Submitting a form.. 20

Summary. 23

Separating the messenger from the message

It is important to remember that the method by which one retrieves information (the messenger, so to say) is not always tied to the message (the information that is delivered).  An analogy would be information stored in a database and the same presented for human consumption either in a form or a report.  The former is typically a relational database, something that is not really human friendly.  The latter is easy for a person to read but no one would store the entire form in the database.  The usual approach is to programmatically merge data retrieved from the database into the form template and display the result.

About this site

What's new

Google
Web
This Site
 

 

Processing the human friendly message

In the Internet world what one sees when using a web browser is a human friendly result.  For example, a search of Google for "tushar mehta" yields:

 

An expanded view of the first two results looks like:

The human brain easily distinguishes between different results by noting the whitespace between the individual results.  The underlying code, HTML in the case of a webpage, caters to this human friendly result.  If we were to look at it, it would quickly become obvious that formatting is closely intertwined with content, which makes parsing difficult.  Even if we did take the time and effort to do so and we did indeed isolate the content, we would always remain vulnerable to the web page designers changing the layout.

While the HTML document may look like a stream of almost incomprehensible text, it does have a structure to it.  We can programmatically explore the structure through the DOM, i.e., the Document Object Model.  We will see how to leverage it to the extent possible.  One attribute that can be used to identify each individual element in the DOM is the ID attribute but it is optional and not all developers use it.

For example, if we were to look at the Google result corresponding to the above search – and the result is definitely not human friendly since Google removes all whitespace – we would, through painstaking research, discover that the actual results are wrapped up in DIV containers:

<div id=res>    <!--a-->
         <div>
                 <div class=g> <!--Individual result here --></div>
                 ...for however many results Google returns on 1 page...
         </div>
         <div>
                 <!-- more divs for the result navigation at page bottom -->
         </div>
</div>
<!-- code for Google Pack ad and rest of page footer -->
 

There’s a whole bunch of stuff before the <div id=res> tag that we can safely ignore.  From the above we can deduce that if we look for a DIV with an ID of RES, then the first DIV element in it would contain the results, each in its own DIV element.

Further analysis of the HTML would reveal that each individual result contains one link (<a href ="…") for the first line of the result and a table for the descriptive text on the 2nd line and the remaining links:

<div class=g>

       <!--m-->

       <link rel="prefetch" href="http://www.tushar-mehta.com/">

       <h2 class=r>

              <a href="http://www.tushar-mehta.com/" target=nw class=l

onmousedown="return clk(0,'','','res','1','')">Home -- TM Consulting

</a>

       </h2>

       <table border=0 cellpadding=0 cellspacing=0>

              <tr>

                     <td class="j">

                           <font size=-1><b>Tushar</b>-<b>Mehta</b>.com provides Operations

Strategy and Web Strategy consulting.<br>

                           <span class=a>www.<b>tushar</b>-<b>mehta</b>.com/ - 44k - </span>

                           <nobr>

                           <a class=fl href= <!--link details deleted -->>Cached</a>

                            -

                           <a class=fl href=<!--link details deleted-->>Similar pages</a>

                           <span class=bl> - <a <!--link details deleted-->>Note this</a>

</span>

                           </nobr>

                           </font><!--n-->

                     </td>

              </tr>

       </table>

</div>

From the above, we can figure out that the first <a> tag is the main link and the only cell in the table contains the descriptive text that Google shows with each result together with the some other links.  The following subroutine implements our discussion.  Given an object of HTMLDocument (the HTML document), it uses the getElementById method to get a reference to the DIV element (type HTMLDivElement ) with the ID of Res.  Then, it uses the getElementsByTagName method of the DIV object to get a collection (type HTMLElementsCollection) of all embedded DIV elements and the Item property of the Collection object to reference the first nested DIV element.  Next, it loops through all the DIV elements in that element.  For each of those DIV elements, it gets a reference to the first A element (type HTMLAnchorElement).  Finally, it prints the href and innerText properties of the anchor element.

Sub GoogleAnalyzer(HTMLDoc As HTMLDocument)

    'Google returns each result as an individual DIV element _

     inside the first DIV element inside a DIV element with an _

     ID of Res. _

     Each result's DIV element has several hyperlinks _

     (<a href=...> tags) the first one containing the actual _

     hyperlink to the website

    Dim ResultDIV As HTMLDivElement, _

        AllResultsDIV As HTMLDivElement, _

        OneResultDIV As HTMLDivElement

    Set ResultDIV = HTMLDoc.getElementById("Res")

    Set AllResultsDIV = ResultDIV.getElementsByTagName("DIV").Item(0)

    For Each OneResultDIV In AllResultsDIV.getElementsByTagName("DIV")

        Dim AnAnchor As HTMLAnchorElement

        Set AnAnchor = OneResultDIV.getElementsByTagName("a").Item(0)

        Debug.Print AnAnchor.href & ", " & AnAnchor.innerText

        Next OneResultDIV

    End Sub

 

Of course, we still don’t know that those links contain and it would require some more processing if we wanted to identify a specific link.

By contrast, if we queried Yahoo! for a stock quote, it provides an ID attribute for each field, which makes it very easy to extract information from the DOM.  For example, price information for Microsoft (symbol MSFT) returns, buried amidst other HTML tags,

<small>At <span id="yfs_t10_msft">10:27AM ET</span>: </small>
<big><b><span id="yfs_l10_msft">29.69</span></b></big>
 

Essentially, the time and the quote both are in a SPAN element with a unique ID attribute.  As the code sample below demonstrates, this makes it a lot easier to extract information from the DOM.

Sub getYahooStockQuote(HTMLDoc As HTMLDocument, StockSymbol As String)

    Debug.Print HTMLDoc.getElementById("yfs_l10_" & StockSymbol).innerText

    End Sub

 

The machine-friendly message

 An alternative to the above messages would be to get the result in a computer-friendly format.  Typically, this would not contain formatting information and restrict itself to content.  In the context of the web, the most common such format is XML with JSON (Javascript Simplified Object Notation) often used as an easier-to-deal-with alternative.

In a subsequent section, we will look at how we would process information in XML format.  For the time being, we will just see the result of using a Yahoo! web service to query for “tushar mehta” first in XML and then in JSON.

Programmatically, one can query a Yahoo! web service with the url

"http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=tushar+mehta&results=2"

with the corresponding result:

<?xml version="1.0" encoding="UTF-8"?>

<ResultSet xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance

xmlns="urn:yahoo:srch"

xsi:schemaLocation="...long url deleted..." type="web"

totalResultsAvailable="353000"

totalResultsReturned="2"

firstResultPosition="1"

moreSearch=”...url deleted...">

<Result>

<Title>Home -- TM Consulting</Title>

<Summary>Tushar-Mehta.com provides Operations Strategy and Web Strategy consulting ... You are on the Home page. Welcome to Tushar Mehta Consulting. Multi-disciplinary. Skills ...</Summary>

<Url>http://www.tushar-mehta.com/</Url>

<ClickUrl>...long url deleted...</ClickUrl>

<DisplayUrl>www.tushar-mehta.com/</DisplayUrl>

<ModificationDate>1179212400</ModificationDate>

<MimeType>text/html</MimeType>

<Cache>

<Url>...cache url deleted...</Url>

<Size>44466</Size>

</Cache>

</Result>

 

<Result>...2nd result deleted...

</Result>

</ResultSet>

 

While the result may not be pretty to look at, it is a lot easier to programmatically analyze the above XML than it is to deal with the HTML DOM.

Yahoo! will also return JSON, something that we will explore only briefly.  While JSON may not be very useful in VB, it is elegantly easy to use in JavaScript.

{"ResultSet":

{"type":"web",

"totalResultsAvailable":353000,

"totalResultsReturned":2,

"firstResultPosition":1,

"moreSearch":"\/WebSearchService\/V1\/webSearch?query=tushar+mehta&amp;appid=YahooDemo&amp;region=us",

"Result":[

{"Title":"Home -- TM Consulting",

"Summary":"Tushar-Mehta.com provides Operations Strategy and Web Strategy consulting ... You are on the Home page. Welcome to Tushar Mehta Consulting. Multi-disciplinary. Skills ...",

"Url":"http:\/\/www.tushar-mehta.com\/",

"ClickUrl": ...ClickUrl deleted...",

"DisplayUrl":"www.tushar-mehta.com\/",

"ModificationDate":1179212400,

"MimeType":"text\/html",

"Cache":

{"Url": ...cache url deleted...,

"Size":"44466"}

},

{"Title":"tushar-mehta.com - Excel",

...details of 2nd result deleted

}

]

}

}

Essentially, the JavaScript Eval function converts the above into an associative array (something akin to a VB Collection or a Windows Scripting Dictionary).  An added benefit is that in JavaScript an associative array is also an object with each index becoming a property!

So far, we have seen both human friendly and machine friendly messages.  We also know that it is possible to programmatically extract information from each – albeit with varying degrees of difficulty.  Next, we look at the messengers.

InternetExplorer

Most people exploring the automation of web services probably start with the ActiveX control InternetExplorer (or its close cousin the WebBrowser, something that can be embedded in an userform).  Automating it is very similar to automating any Office product like PowerPoint, Word, or Excel itself.  The object has properties and methods with which one controls the object.  The code below instantiates InternetExplorer and navigates to a URL.  It isn’t quite a general purpose routine yet but we will address that later.  After navigating to the desired URL, the code ensures that the target page is loaded before proceeding to the analysis step.  Then, it uses a custom analyzer routine if the web page was a Google page or a Yahoo! Finance page.  Finally, it carries out necessary housekeeping.

Sub getResultsWithIE(ByVal URL As String)

        'URL contains the complete string to pass to IE, _

         e.g., http://www.google.com/search?q=tushar+mehta

    Dim IEApp As InternetExplorer   'Need a reference to Microsoft Internet Controls

    Dim HTMLDoc As HTMLDocument   'Need a reference to Microsoft HTML Object Library

   

    Set IEApp = New InternetExplorer

    IEApp.Visible = True    'Some things don't work unless it's visible

   

    IEApp.Navigate URL

   

    'Since the web page loads in parallel to this program, _

     i.e., asynchronously, we need to wait for it to complete loading

    Do

        DoEvents

        Loop Until IEApp.readyState = READYSTATE_COMPLETE

   

    'The IE application contains the HTML webpage as an object _

     in it.  The next statement creates a reference to it

    Set HTMLDoc = IEApp.Document

   

    'Use a custom analyzer module if we have one

    If InStr(1, URL, "google.com", vbTextCompare) > 0 Then

        GoogleAnalyzer HTMLDoc

    ElseIf InStr(1, URL, "finance.yahoo", vbTextCompare) > 0 Then

        getYahooStockQuote HTMLDoc, Mid(URL, InStr(1, URL, "=") + 1)

        End If

    IEApp.Quit

    Set IEApp = Nothing

    End Sub

And it would be used like so:

Sub testIE()

    getResultsWithIE ("http://www.google.com/search?q=" & Escape("tushar mehta"))

    End Sub

 

Notice the use of the Escape function?  When we submit data over the Internet, several characters cannot be sent as-is but must be encoded.  This is to avoid communication errors.  Languages, such as JavaScript, which are designed for web work, support Escape as a native function.  VB, on the other hand, does not and we must “roll our own.”

Function Escape(ByVal URL As String) As String

        'URLs cannot contain most special characters. _

         VBScript and JavaScript have built-in Escape functions. _

         In VB we have to write our own

    Dim I As Integer, BadChars As String

    BadChars = "<>%=&!@#$^()+{[}]|\;:'"",/?"

    For I = 1 To Len(BadChars)

        URL = Replace(URL, Mid(BadChars, I, 1), "%" & Hex(Asc(Mid(BadChars, I, 1))))

        Next I

    URL = Replace(URL, " ", "+")

    Escape = URL

    End Function

 

For another example of using InternetExplorer see Dick Kusleika’s
http://www.dicks-blog.com/archives/2004/04/30/translating-text/

InternetExplorer in Action

In the above section, the code that instantiated IE waited for it to fully load a page (with the Do…Loop statement).  This is because IE runs asynchronously with our VB code.  There can be instances when we don’t want to wait for the page to complete loading.  In fact, we can leverage both its asynchronous nature as well as the fact that IE is a standalone application designed to display webpages to a human.  Consider a scenario where we want to respond to user action by displaying information from different web pages.  For example, in an Excel worksheet with a column of stock symbols, as the user clicks in a different cell we want to show information about that stock.  The code to show the information in an instance of InternetExplorer is below.  It checks if it still has access to the asynchronously running IE.  If not, it instantiates a new copy.  In either case it uses InternetExplorer’s Navigate method to direct the browser to a specific web page.

Option Explicit

 

Dim IEApp As InternetExplorer   'Need a reference to Microsoft Internet Controls

Sub AsyncIE(ByVal URL As String)

        'URL contains the complete string to pass to IE, _

         e.g., http://www.google.com/search?q=tushar+mehta

    On Error Resume Next

    IEApp.Navigate URL

    If Err.Number <> 0 Then

        Set IEApp = Nothing

        Set IEApp = New InternetExplorer

        IEApp.Visible = True    'Some things don't work unless it's visible

        IEApp.Navigate URL

        End If

    If IEApp Is Nothing Then Exit Sub

    End Sub

 

In the worksheet, column B has stock symbols.  The worksheet SelectionChange event procedure below calls the AsyncIE procedure if it can identify a legitimate stock symbol.

Option Explicit

 

Private Sub Worksheet_SelectionChange(ByVal Target As Range)

    Dim StockCell As Range

    Set StockCell = Application.Intersect(Target.EntireRow, Target.Parent.Columns(2))

    If StockCell Is Nothing Then Exit Sub

    If StockCell.Cells.Count > 1 Then Exit Sub

    If IsEmpty(StockCell.Value) Then Exit Sub

    AsyncIE _

            "http://finance.yahoo.com/q?s=" & StockCell.Value

    End Sub

 

Obviously, one is not restricted to Yahoo! Finance and stock prices.  One could use the above to look up clients through an Intranet application.

The InternetExplorer with Events

In the first section on InternetExplorer the VB code waited until IE completely loaded the web page.  It was only after that that it proceeded to the analysis phase.  While that technique is OK for demonstration purposes it suffers from two potential problems.  First, while the VB code is waiting so must the user.  And, it could be a long wait.  The alternative is to leverage the events of the InternetExplorer object.  Just like Excel and other Office programs, IE supports several different events.  We will look at only one, the DocumentComplete event.  First in a class module named clsIE, we declare IE as a ‘WithEvents’ object, use the class Initialize event to instantiate IE, if necessary, add a Navigate method, and a DocumentComplete event procedure that will analyze the web page.

Option Explicit

 

Dim WithEvents IE As InternetExplorer

 

Private Sub Class_Initialize()

    On Error Resume Next

    IE.Visible = True

    If Err.Number <> 0 Then

        Set IE = New InternetExplorer

        IE.Visible = True

        End If

    End Sub

Public Sub Navigate(ByVal URL As String)

    IE.Navigate URL

    End Sub

Private Sub IE_DocumentComplete(ByVal pDisp As Object, URL As Variant)

    MsgBox TypeName(pDisp)

    Dim HTMLDoc As HTMLDocument

    Set HTMLDoc = pDisp.Document

    If InStr(1, URL, "google.com", vbTextCompare) > 0 Then

        GoogleAnalyzer HTMLDoc

    ElseIf InStr(1, URL, "finance.yahoo", vbTextCompare) > 0 Then

        getYahooStockQuote HTMLDoc, Mid(URL, InStr(1, URL, "=") + 1)

        End If

    End Sub

With the event procedure available to us, the code that responds to the user no longer has to wait for IE to complete loading the page.  It is now drastically simplified to:

Option Explicit

 

Dim oIE As clsIE

 

Sub demoIEWithEvents()

    Set oIE = New clsIE

    oIE.Navigate "http://www.google.com/search?q=" & Escape("tushar mehta")

    End Sub

Another advantage of this approach is that the client code (the demoIEWithEvents procedure above) does not have any IE specific code in it.  All the IE related code is encapsulated in the clsIE class module.

One can also trap events raised by objects inside the browser.  For more see http://msdn2.microsoft.com/en-US/library/aa752045.aspx.

XMLHttp

Use of InternetExplorer, as in the previous section, has four drawbacks.

First, it requires instantiating a separate application that was designed to display a web page to the user.

Second, and as a consequence, once the webpage loads we had to parse the DOM to extract the information of interest.  We would have a hard time interacting with a webservice that provides information in a computer-friendly form (XML, for example).

Third, the application becomes visible to the user however briefly.

Fourth, all the information we sent to the web server was in the URL itself.  Whatever parameters we wanted to pass (in the case of the Google example the search string and in the case of the Yahoo! example the stock symbol) were suffixed to the URL itself.  This is the same as using the GET method to process a client form.  The alternative is to use the POST method.  In this case, the parameters are in the message body.  So, what happens if we have a server that expects an incoming message to use the POST method?  It is possible to automate InternetExplorer in this case (see Submitting a form) but the processing could get somewhat convoluted. 

We can address our concerns by using the XMLHttp object.  This ActiveX object was initially introduced by Microsoft and is one of the core components of AJAX.  The web browsers Firefox, Opera, and Safari also support this capability.  Of course, in this document we are interested in how we can use VB to interact with it.

In the next section we will look at key properties and methods of the XMLHttp object.  Then, we will look at four different scenarios of using the object.  First, we will duplicate the InternetExplorer example from above.  Second, we will process the server response asynchronously.  Third, we will use the POST method to retrieve data from a server.  Fourth, and finally, we will retrieve data in two different machine-friendly forms, XML and JSON.

Key XMLHttp properties and methods

The first step in using the XMLHttp object is to use the Open method to establish key parameters.

XMLHttp.open(strMethod, strUrl, varAsync, strUser, strPassword)

The open method opens a connection to the server but it doesn’t actually send any information across.

strMethod specifies the GET or POST method

strUrl is the URL of interest,

varAsync indicates whether the response will be process asynchronously (varAsync=TRUE) or synchronously,

and strUser and strPassword are for authentication.

 

Next, in the case of an asynchronous request, we specify the event handler.  This requires some special processing in VBA and is explained in a subsequent section.

XMLHttp.onreadystatechange = funcMyHandler

If the XMLHttp response is to be processed asynchronously (see the open method above), the onreadystatechange property specifies the event handler for the readystatechange event.  In VB this requires some special handling that we will discuss in a later section

 

Next, we send the request to the server with the Send method.

XMLHttp.send(varBody);

The send method actually sends the message to the server.  varBody is an optional message text.  It contains the parameters for the POST method.

 

In the case where we are processing the response asynchronously, we need to monitor the readyState of the request.

XMLHttp.readyState

The read-only readyState represents the state of the request.  When the request is complete, readyState will have a value of 4.

 

When the process is complete, use the status property to check how it was completed.

XMLHttp.status

The read-only status represents the HTTP status code returned by the request.  A value of 200 indicates it was processed successfully.  For other codes see the link below.

 

If the returned response is a HTML document or a JSON string, use the responseText property to retrieve it.

XMLHttp.responseText

The read-only responseText property contains the text of the response as a string.

 

On the other hand, if the returned response is a XML document, use the responseXML property to retrieve the XML object.

XMLHttp.responseXML

The read-only responseXML property returns an object containing the parsed XML document.  See a later section on how to process the contents.

 

With the introduction to the key properties and methods complete, we can now move on to the first of our XMLHttp requests.  For additional documentation on the XMLHttp object visit the Microsoft link at http://msdn2.microsoft.com/en-us/library/ms759148.aspx

Using XMLHttp to retrieve data in a synchronous manner

This, the first use of XMLHttp, essentially duplicates the InternetExplorer capability we saw above.  The code to use the ActiveX object is below.  Notice how much simpler it is compared to the InternetExplorer version.  The Open method’s 3rd argument is False to indicate the request should be processed synchronously.  This will make our code wait at the Send until the request completes processing.  Once that happens, the webpage’s HTML is in the responseText string property.  To use the HTMLDocument object, we simply create one and assign its body’s innerHT