A Beginner's Guide to the XML DOM
Brian Randell
You are a Visual Basic® developer, and you receive some data in the form of an XML document. You now want to get the information out of the XML document and integrate it into your Visual Basic solutions. You could, of course, write your own code to parse the contents of the XML file, which after all is just a text file. However, this isn’t very productive and negates one of the strengths of XML: that it is a structured way to represent data.
A better approach to retrieving information from XML files is to use an XML parser. An XML parser is, quite simply, software that reads an XML file and makes the data in it available. As a Visual Basic developer, you want to use a parser that supports the XML Document Object Model (DOM). The DOM defines a standard set of commands that parsers should expose so you can access HTML and XML document content from your programs. An XML parser that supports the DOM takes the data in an XML document and exposes it via a set of objects that you can program against. In this article, you will learn how to access and manipulate XML documents via the XML DOM implementation exposed by the Microsoft® XML Parser (MSXML.DLL).
Before you read any further, you should look at a raw XML file to get an idea of how a parser can make your life easier. The following text shows the content of the file CDS.XML, which contains compact disc items. Each item contains information such as the artist, title, and tracks.
The second line of the previous document references an external Document Type Definition (DTD) file. A DTD defines the layout and expected content for a particular type of XML document. An XML parser can use a DTD to determine whether a document is valid. DTDs are just one way to help a parser validate your documents. Another increasingly popular method is XML Schemas. You define schemas using XML itself, in contrast to DTDs, which use their own "interesting" syntax. The following text displays the contents of CDS.DTD, which is used by CDS.XML:
This article won’t get into any depth on DTDs and XML Schemas. The XML Schema
What exactly is a DOM?
A DOM for XML is an object model that exposes the contents of an XML document. The W3C's Document Object Model (DOM) Level 1 Specification currently defines the properties and methods that a DOM should expose. Microsoft's implementation of the DOM fully supports the W3C standard and has additional features that make it easier for you to work with XML files from your programs.
How do I use the XML DOM?
You use the XML DOM by creating an instance of an XML parser. To make this possible, Microsoft exposes the XML DOM via a set of standard COM interfaces in MSXML.DLL. MSXML.DLL contains the type library and implementation code you need to work with XML documents. If you're working with a scripting client, such as VBScript executing in Internet Explorer, you use the DOM by calling the CreateObject function to create an instance of the parser object.
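A minimal sketch of late-bound creation follows. It assumes the ProgID "Microsoft.XMLDOM", which MSXML registers for its DOM document object; the variable name xDoc is only illustrative. The lines run as written in VBScript, and they also work inside a Visual Basic procedure.

```vb
' Late-bound creation: no type library reference is required.
' xDoc is an illustrative name, not one the parser requires.
Dim xDoc
Set xDoc = CreateObject("Microsoft.XMLDOM")

' ... load and examine a document here ...

' Release the parser when you are done with it.
Set xDoc = Nothing
```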
If you are using VBScript from an Active Server Page, you use Server.CreateObject.
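The Active Server Pages variant differs only in the object that creates the parser. This sketch assumes it runs inside a server-side VBScript block of an ASP page, where the intrinsic Server object is available; the ProgID and variable name are the same assumptions as before.

```vb
' Server-side VBScript inside an ASP page.
' Server is the intrinsic ASP object; it does not exist outside ASP.
Dim xDoc
Set xDoc = Server.CreateObject("Microsoft.XMLDOM")

' ... load and examine a document here ...

Set xDoc = Nothing
```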
If you're working with Visual Basic you can access the DOM by setting a reference to the MSXML type library, provided in MSXML.DLL. To use MSXML from within Visual Basic 6.0, open the Project References dialog box and select Microsoft XML, version 2.0 from the list of available COM objects. If you do not find this item, you'll need to obtain the MSXML library. You can then create an instance of the parser object.
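With the reference in place, early-bound creation might look like the following sketch. It assumes the version 2.0 type library appears in Visual Basic under the library name MSXML; the procedure name CreateParser is hypothetical.

```vb
' Requires a project reference to "Microsoft XML, version 2.0".
Private Sub CreateParser()
    ' Early binding gives you IntelliSense and compile-time checking.
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument

    ' Show the class name of the object that was created.
    Debug.Print TypeName(xDoc)

    Set xDoc = Nothing
End Sub
```

Early binding is usually the better choice in Visual Basic because the compiler can check property and method names for you.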
Where can you find MSXML.DLL? You can obtain the MSXML library in one of two ways. You can install Internet Explorer 5.0, in which the MSXML parser is an integral component. Alternatively, you can download a redistributable version of the Microsoft XML parser. Once you reference the type library in your Visual Basic project, you can invoke the parser, load a document, and work with its contents.
What am I working with?
If you open the MSXML library and examine its object model using the Visual Basic 6.0 Object Browser, you see that the object model is quite rich. This article demonstrates how you can access an XML document using the DOMDocument class and the IXMLDOMNode interface.
How do I load a document?
To load an XML document, you must first create an instance of the DOMDocument class.
Once you obtain a valid reference, open a file using the Load method. The MSXML parser can load XML documents from a local disk, over the network using UNC references, or via a URL. To load a document from disk, call the Load method with the path to the file:
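A sketch of loading from disk, assuming a project reference to the MSXML type library. The file path and the procedure name LoadFromDisk are hypothetical. The sketch also sets the async property to False so that the return value of Load reflects the finished load.

```vb
Private Sub LoadFromDisk()
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument

    ' Load synchronously so Load returns only when parsing is done.
    xDoc.async = False

    ' Hypothetical path: point this at your own copy of the file.
    If xDoc.Load("C:\My Documents\cds.xml") Then
        ' The document loaded successfully.
        Debug.Print "Root element: " & xDoc.documentElement.nodeName
    Else
        ' The document failed to load.
        Debug.Print "The document failed to load."
    End If
End Sub
```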
Once you are finished with the document, you need to release your object reference to it. The MSXML parser does not expose an explicit Close method. The best you can do is explicitly set the reference to Nothing.
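As a short illustration of the cleanup step, the sketch below parses a small in-memory document with loadXML and then drops the reference. The XML string and the procedure name are made up for the example.

```vb
Private Sub UseAndRelease()
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False

    ' loadXML parses a string instead of a file (sample data only).
    If xDoc.loadXML("<cd><title>Example</title></cd>") Then
        Debug.Print xDoc.documentElement.nodeName
    End If

    ' There is no Close method; releasing the last reference
    ' lets COM free the document.
    Set xDoc = Nothing
End Sub
```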
When you ask the parser to load a file, it does so asynchronously by default. You can change this behavior by manipulating the document's Boolean async property. It is important that you examine a document's readyState property to ensure a document is ready before you start to examine its contents. The readyState property can return one of five possible values as listed below:
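The sketch below shows one way to wait for an asynchronous load by polling readyState. The state names and their numeric values (0 through 4) follow the commonly documented MSXML meanings and should be treated as an assumption here; the file path and both routine names are hypothetical.

```vb
' Translate a readyState value into a readable name.
Private Function ReadyStateName(ByVal State As Long) As String
    Select Case State
        Case 0: ReadyStateName = "Uninitialized"
        Case 1: ReadyStateName = "Loading"
        Case 2: ReadyStateName = "Loaded"
        Case 3: ReadyStateName = "Interactive"
        Case 4: ReadyStateName = "Completed"
        Case Else: ReadyStateName = "Unknown"
    End Select
End Function

Private Sub LoadAndWait()
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument

    xDoc.async = True                      ' the default, shown for clarity
    xDoc.Load "C:\My Documents\cds.xml"    ' hypothetical path

    ' Yield to other work until the parser reports completion.
    Do While xDoc.readyState <> 4
        DoEvents
    Loop

    ' Completed means the load has ended, not that it succeeded.
    Debug.Print ReadyStateName(xDoc.readyState)
    Set xDoc = Nothing
End Sub
```

Polling keeps the example short. In a real application you would more likely load synchronously or respond to the parser's events.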
The MSXML parser exposes events that you can use to track the status of the load process when loading large documents. These events are also useful when loading a document asynchronously from a URL over the Internet. To open a file from a URL, you specify the location of the file using a fully qualified URL. You must include the http:// prefix in the file location. Here is an example of loading a file from a URL:
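A sketch of both approaches follows: a synchronous load from a URL, and an asynchronous load that relies on the onreadystatechange event. The URL is a placeholder rather than a real location, and the routine and variable names are hypothetical. The code assumes it lives in a form or class module, because Visual Basic does not allow WithEvents in a standard module.

```vb
' Module-level declaration so the document can raise events.
Private WithEvents xAsyncDoc As MSXML.DOMDocument

' Synchronous load: Load does not return until the document is ready.
Public Sub LoadFromUrl()
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False

    ' Placeholder URL; note the required http:// prefix.
    If xDoc.Load("http://www.example.com/cds.xml") Then
        Debug.Print "The document loaded successfully."
    Else
        Debug.Print "The document failed to load."
    End If
    Set xDoc = Nothing
End Sub

' Asynchronous load: Load returns at once and the event reports progress.
Public Sub BeginLoadFromUrl()
    Set xAsyncDoc = New MSXML.DOMDocument
    xAsyncDoc.async = True
    xAsyncDoc.Load "http://www.example.com/cds.xml"
End Sub

Private Sub xAsyncDoc_onreadystatechange()
    ' 4 is the completed state.
    If xAsyncDoc.readyState = 4 Then
        Debug.Print "The load has finished."
    End If
End Sub
```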
By setting the document's async property to False, the parser will not return control to your code until the document is completely loaded and ready for manipulation. If you leave it set to True, you will need to either examine the readyState property before accessing the document or use the DOMDocument's events to have your code notified when the document is ready.
Dealing with Failure
Your document can fail to load for any number of reasons. A common cause might be that the document name passed to the Load method is invalid. Another cause might be that the XML document itself is invalid. By default, the MSXML parser will validate your document against a DTD or schema if either has been specified in the document. You can tell the parser not to validate the document by setting the validateOnParse property of the DOMDocument object reference before you invoke the Load method.
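A sketch of turning validation off before loading. The file path and the procedure name are hypothetical. Note that the parser still checks that the document is well formed; only validation against the DTD or schema is skipped.

```vb
Private Sub LoadWithoutValidation()
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False

    ' Must be set before Load is called to have any effect.
    xDoc.validateOnParse = False

    ' Hypothetical path: point this at your own copy of the file.
    If xDoc.Load("C:\My Documents\cds.xml") Then
        Debug.Print "Loaded without DTD or schema validation."
    Else
        Debug.Print "The document failed to load."
    End If
    Set xDoc = Nothing
End Sub
```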
Be forewarned that turning off the parser's validation feature is not a good idea in production applications. An incorrect document can cause your program to fail for any number of reasons. At a minimum, it could provide invalid data to your users.
Regardless of the failure type, you can ask the parser for information about the failure by accessing the parseError object. Set a reference to the IXMLDOMParseError interface of the document itself in order to work with the properties of the parseError object. The IXMLDOMParseError interface exposes seven properties that you can use to investigate the cause of the error. The following example displays a message box with all the error information available from the parseError object.
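A sketch of reporting a load failure, using all seven properties of the parseError object. To keep it self-contained, it parses a deliberately malformed string with loadXML instead of loading a file; the sample XML, the message wording, and the procedure name are made up for the example.

```vb
Private Sub ShowParseError()
    Dim xDoc As MSXML.DOMDocument
    Dim xPE As MSXML.IXMLDOMParseError
    Dim strErrText As String

    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False

    ' Deliberately malformed: the <title> element is never closed.
    If Not xDoc.loadXML("<cd><title>Example</cd>") Then
        ' Obtain the parseError object from the document.
        Set xPE = xDoc.parseError
        With xPE
            strErrText = "Your XML document failed to load " & _
                "due to the following error." & vbCrLf & _
                "Error #: " & .errorCode & vbCrLf & _
                "Reason: " & .reason & vbCrLf & _
                "Line #: " & .Line & vbCrLf & _
                "Line Position: " & .linepos & vbCrLf & _
                "Position In File: " & .filepos & vbCrLf & _
                "Source Text: " & .srcText & vbCrLf & _
                "Document URL: " & .url
        End With
        MsgBox strErrText, vbExclamation
    End If

    Set xPE = Nothing
    Set xDoc = Nothing
End Sub
```

Because the document comes from a string, the url property is empty here. It is filled in when the document is loaded from a file or a URL.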
You can display the information exposed by the parseError object to the user, log it to an error file, or use it to try to correct the error yourself.
Retrieving Information From an XML Document
Once you have a document loaded, the next step is for you to retrieve information from it. While the document object is important, you will find yourself using the IXMLDOMNode interface most of the time. You use the IXMLDOMNode interface to read and write individual nodes. Before you do anything, you need to understand that there are currently 13 node types supported by the MSXML parser. The following table lists a few of the most common node types you will encounter.
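To see node types in practice, the sketch below prints the numeric type, the type string, and the name of each child of a small sample document. The XML string and the procedure name are made up for the example. The comments list a few DOMNodeType values from the MSXML type library.

```vb
' A few members of the DOMNodeType enumeration:
'   NODE_ELEMENT = 1, NODE_ATTRIBUTE = 2, NODE_TEXT = 3,
'   NODE_COMMENT = 8, NODE_DOCUMENT = 9
Private Sub ListNodeTypes()
    Dim xDoc As MSXML.DOMDocument
    Dim xNode As MSXML.IXMLDOMNode

    Set xDoc = New MSXML.DOMDocument
    xDoc.async = False

    ' Sample data only: one comment and one element under the root.
    If xDoc.loadXML("<cd><!-- sample --><title>Example</title></cd>") Then
        For Each xNode In xDoc.documentElement.childNodes
            ' Print the numeric type, its string form, and the node name.
            Debug.Print xNode.nodeType; " "; xNode.nodeTypeString; " "; xNode.nodeName
        Next xNode
    End If

    Set xDoc = Nothing
End Sub
```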
You access the node type via two properties exposed by the IXMLDOMNode interface. The nodeType property exposes an enumeration of DOMNodeType items (some of which are listed in the previous table). In addition, you can use the nodeTypeString property to retrieve a textual string for the node type.
Once you have a reference to a document, you can start walking the node hierarchy. From your document reference, you can access the childNodes property, which gives you a top-down entry point to all of the nodes in your document. The childNodes property exposes the IXMLDOMNodeList interface, which supports the Visual Basic For Each construct. Thus, you can enumerate all of the individual nodes of the childNodes property. In addition, the node list exposes a length property, which returns the number of child nodes that exist. Not only does the document object expose a childNodes property, but all individual nodes do as well. This, in conjunction with IXMLDOMNode's hasChildNodes property, makes it easy for you to walk the node hierarchy examining elements, attributes, and values.
One thing to be aware of is the parent-child relationship between a document element and the element's value. For example, in the CDs XML document, the element <title> exposes a song title. To retrieve the actual value of the <title> element, you need to look for nodes of the type NODE_TEXT. Once you've found a node with some interesting data, you can examine attributes and even reach up and access its parent node via the parentNode property.
How do I Walk a Document?
You walk an XML document by traversing the set of nodes exposed by the document object. Because XML documents are hierarchical in nature, it is relatively easy to write a recursive routine to walk the entire document. The following routine, LoadDocument, opens an XML document. LoadDocument then calls another routine, DisplayNode, which actually walks the document.
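A sketch of the two routines, written from the description in this article rather than copied from an original listing. It assumes a project reference to the MSXML type library; the file path is hypothetical.

```vb
Public Sub LoadDocument()
    Dim xDoc As MSXML.DOMDocument
    Set xDoc = New MSXML.DOMDocument

    xDoc.async = False            ' wait for the load to finish
    xDoc.validateOnParse = False  ' skip DTD or schema validation

    ' Hypothetical path: point this at your own copy of the file.
    If xDoc.Load("C:\My Documents\cds.xml") Then
        ' Start the walk at the top of the document.
        DisplayNode xDoc.childNodes, 0
    Else
        Debug.Print "Load failed: " & xDoc.parseError.reason
    End If

    Set xDoc = Nothing
End Sub

Public Sub DisplayNode(ByVal Nodes As MSXML.IXMLDOMNodeList, _
                       ByVal Indent As Integer)
    Dim xNode As MSXML.IXMLDOMNode

    ' Each level of the hierarchy is indented two more spaces.
    Indent = Indent + 2

    For Each xNode In Nodes
        If xNode.nodeType = NODE_TEXT Then
            ' The text node holds the value; its parent holds the name.
            Debug.Print Space$(Indent) & xNode.parentNode.nodeName & _
                ": " & xNode.nodeValue
        End If

        ' Recurse into any children of the current node.
        If xNode.hasChildNodes Then
            DisplayNode xNode.childNodes, Indent
        End If
    Next xNode
End Sub
```

To try it, call LoadDocument from the Immediate window and read the indented name and value pairs it prints.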
LoadDocument passes a reference to the currently open XML document's childNodes property as a parameter, along with an integer value specifying where to start the indent level. The code uses the Indent parameter to format the display of the document structure in the Visual Basic Immediate window. The routine DisplayNode walks the document looking specifically for nodes of the type NODE_TEXT. Once the code finds a node of the type NODE_TEXT, it retrieves the text of the node using the nodeValue property. In addition, the parentNode property of the current node is used to get a back-reference to a node of the type NODE_ELEMENT. Nodes of the type NODE_ELEMENT expose a nodeName property. The contents of nodeName and nodeValue are displayed. If a node has children, determined by checking the hasChildNodes property, then DisplayNode calls itself recursively until it reaches the end of the document. The DisplayNode routine writes the information to Visual Basic's Immediate window using Debug.Print.
DisplayNode uses the hasChildNodes property to determine if it should call itself again. You could also use the length property of the node's childNodes collection and check for a value greater than 0.
Now What?
This article is just a teaser. You are now ready to dig deeper and expand your knowledge of XML and the MSXML parser. You can do many interesting things, such as update the values of individual nodes, search within a document, build your own documents, and more. Visit the MSDN Online XML Developer Center
© 1999 Microsoft Corporation. All rights reserved. Terms of Use.