ORIGINAL DRAFT

Few developers have implemented SAX content handlers without having to manage some notion of the current tag across SAX events. Context can be handled in various ways, ranging from a currentTag instance variable to a stack that pushes and pops the current tag as startElement and endElement events are processed. Having access to the current tag is especially important when deciding what to do with text content. Unfortunately, ad hoc approaches can get pretty messy as the software evolves. A better solution involves reusable code that abstracts out the need to handle tag context.

This month, we’re going to invent a simple event-driven mechanism that abstracts basic SAX events through an XMLEventDispatcher. The dispatcher will send events to an XMLEventHandler which defines the following interface:

import org.xml.sax.Attributes;

public interface XMLEventHandler
{
  public void handleDocStart();
  // Receives the tag's attributes along with the
  // context, matching the call made in Listing 2.
  public void handleTagStart(
    XMLEventContext context,
    Attributes attrs);
  public void handleText(
    XMLEventContext context,
    StringBuffer text);
  public void handleTagEnd(
    XMLEventContext context);
  public void handleDocEnd();
}

This interface closely mimics the structure of a SAX ContentHandler. The primary differences are the use of an XMLEventContext object to maintain tag path contexts and the use of a StringBuffer in the handleText method. For performance reasons, the StringBuffer is reused on each call, so you’ll want to handle text events by calling the toString method on the StringBuffer to get a String object of your own.
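To see why the toString copy matters, here is a minimal sketch of a collector that receives a shared, reused StringBuffer. The TextCollector and BufferReuseDemo names are hypothetical and not part of the project; the context argument is left out so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector, not part of the article's code. Its handleText
// mirrors the XMLEventHandler method without the context argument.
class TextCollector {
    private final List<String> texts = new ArrayList<String>();

    void handleText(StringBuffer text) {
        // Copy the contents now; the caller will overwrite the buffer later.
        texts.add(text.toString());
    }

    List<String> getTexts() {
        return texts;
    }
}

class BufferReuseDemo {
    public static void main(String[] args) {
        StringBuffer shared = new StringBuffer();
        TextCollector collector = new TextCollector();

        // Simulate two text events delivered through the same buffer.
        shared.setLength(0);
        shared.append("first");
        collector.handleText(shared);

        shared.setLength(0);
        shared.append("second");
        collector.handleText(shared);

        System.out.println(collector.getTexts()); // prints [first, second]
    }
}
```

Had the collector stored the StringBuffer reference instead of a String, both entries would point at the same object and show only the most recent text.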

There are six classes in this project, including a test class and an example XMLEventWriter that implements the XMLEventHandler interface. Figure 1 shows the relationship between the classes.

Figure 1: The XMLEventHandler abstraction relies on an XMLEventDispatcher to translate SAX ContentHandler events to context-sensitive XMLEventHandler events.

The XMLEventDispatcher class extends the SAX DefaultHandler class and delegates its calls to the specified XMLEventHandler. The XMLEventContext object uses a Java Stack instance to manage a list of XMLEventElement objects that represent tag information. Since the Stack implementation extends the Java Vector class, we also have direct access to individual elements on demand. The XMLEventElement object stores URI, local name and qualified name values for each tag on the stack.

Since XMLEventElement is little more than a container for values, akin to structures in most computer languages, we can safely ignore it. We’ve already defined the XMLEventHandler interface, so our focus will be on the XMLEventDispatcher class, which handles event delegation, and the XMLEventContext implementation, which handles the tag path context. Naturally, all the code from this project is available online at xmlmag.com.
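For readers who want to compile the listings without the download, a value class along the following lines should be enough. This is only a sketch: the three-argument constructor matches the call in Listing 1, but the field and getter names are assumptions, and the downloadable class may differ (it is presumably public, for one).

```java
// Hypothetical reconstruction of the value container; only the
// constructor signature is taken from its use in Listing 1.
class XMLEventElement {
    private final String uri;
    private final String localName;
    private final String qualifiedName;

    XMLEventElement(String uri, String localName, String qualifiedName) {
        this.uri = uri;
        this.localName = localName;
        this.qualifiedName = qualifiedName;
    }

    // Getter names are assumptions made for this sketch.
    String getURI() {
        return uri;
    }

    String getLocalName() {
        return localName;
    }

    String getQualifiedName() {
        return qualifiedName;
    }

    @Override
    public String toString() {
        return qualifiedName;
    }
}
```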

Listing 1 shows the code for the XMLEventContext class. I mentioned earlier that we use the Java Stack class to manage the tag path. The first method, getElementCount, is a facade for the stack’s size method. We want to be able to push tags onto the stack and pop them during processing. Tags are stored as XMLEventElement objects, which contain a URI, a local name and a qualified name for each tag, consistent with the arguments provided by the SAX startElement and endElement methods. To simplify things, our push method takes these values as separate strings and creates an XMLEventElement instance to add to the stack.

For convenience, we provide a getTag and a getLastTag method. The first retrieves an XMLEventElement object at the given index position. The second retrieves the last element in the path, or null if the stack is empty. These are both common operations. Using this approach, you can build and decompose paths made up of XMLEventElement objects. The context is important in most applications and is provided transparently to anyone implementing the XMLEventHandler interface.
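As an illustration of building a path by index, the sketch below walks a stack from the root to the current tag. To keep it runnable on its own, it uses a plain java.util.Stack of qualified names in place of XMLEventContext; the TagPathDemo class and its buildPath helper are hypothetical names. With the real context, the loop would run up to getElementCount and call getTag(i) instead.

```java
import java.util.Stack;

// Hypothetical helper; a Stack<String> stands in for XMLEventContext.
class TagPathDemo {
    // Index 0 is the first tag pushed (the root), so walking upward
    // yields the path in document order.
    static String buildPath(Stack<String> tags) {
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < tags.size(); i++) {
            path.append('/').append(tags.get(i));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Stack<String> tags = new Stack<String>();
        tags.push("book");
        tags.push("chapter");
        tags.push("title");
        System.out.println(buildPath(tags)); // prints /book/chapter/title

        tags.pop();
        System.out.println(buildPath(tags)); // prints /book/chapter
    }
}
```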

Listing 2 shows the XMLEventDispatcher implementation. Most of the work here is in the startElement, endElement and characters methods, which are part of the SAX ContentHandler interface. The startDocument and endDocument methods delegate their processing directly to the XMLEventHandler’s handleDocStart and handleDocEnd methods.

The startElement method uses the URI, local name and qualified name arguments to push a new XMLEventElement onto the XMLEventContext stack. The current XMLEventContext is maintained in an instance variable called context, which is first used in the call to handleTagStart. We pass the Attributes reference directly to the handleTagStart method for processing.

Both the endElement and characters methods are very simple. The endElement method calls the XMLEventHandler’s handleTagEnd method and then pops the current tag off the tag path. We intentionally pop the context only after sending the handleTagEnd event, because the context is still relevant until that method returns.

The characters method resets the StringBuffer length to zero and uses the append method to add the relevant character array segment from the arguments provided by SAX. The buffer, along with the current context, is then sent to the XMLEventHandler’s handleText method. You’ll want to avoid keeping references to the StringBuffer in your code, since it is heavily reused and may contain different information the next time you look.
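The ordering described above can be watched in action with a small standalone handler. The following is a sketch, not the project’s dispatcher: PathTracingHandler and its trace helper are hypothetical, a stack of qualified names stands in for XMLEventContext, and events are recorded as strings rather than forwarded to an XMLEventHandler. It uses the JDK’s default SAX parser, which is not namespace-aware unless configured, so the qualified name is the one that is populated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tracing handler that mirrors the dispatcher's three methods.
class PathTracingHandler extends DefaultHandler {
    private final Stack<String> tags = new Stack<String>();
    private final StringBuffer buffer = new StringBuffer();
    final List<String> events = new ArrayList<String>();

    private String path() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tags.size(); i++) {
            sb.append('/').append(tags.get(i));
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName,
            String qualifiedName, Attributes attrs) {
        tags.push(qualifiedName);        // push first, then notify
        events.add("start " + path());
    }

    @Override
    public void endElement(String uri, String localName,
            String qualifiedName) {
        events.add("end " + path());     // notify first, then pop
        tags.pop();
    }

    @Override
    public void characters(char[] chars, int offset, int length) {
        buffer.setLength(0);             // reuse the same buffer each time
        buffer.append(chars, offset, length);
        events.add("text " + path() + " = " + buffer.toString());
    }

    static List<String> trace(String xml) throws Exception {
        PathTracingHandler handler = new PathTracingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<a><b>hi</b></a>")) {
            System.out.println(event);
        }
    }
}
```

Because the pop happens after the end event is recorded, the end event for the inner tag still reports /a/b rather than /a. Note that SAX permits a parser to deliver one run of text in several characters calls, so a handler that needs the complete text of an element should accumulate it until the end event.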

When you download the code, you’ll find a simple XMLEventWriter class that implements the XMLEventHandler interface. This class merely writes the output as an XML document that looks remarkably like the input document. It exists primarily as a test to make sure nothing is lost along the way, but it also shows how you might use an XMLEventHandler to reference context information without having to write your own stack or alternate solution.

In complex development environments, determining where you are (relative to XML tags) as SAX events are being processed is critically important. Being able to separate context management from the code you are writing is a big advantage in both maintainability and code reuse. This strategy is simple and reusable. In fact, the use of a stack to manage context is so common in SAX programming that having a set of classes that do the work for you can save you considerable time and effort. I hope you can benefit from these ideas in your own applications.

Listing 1

import java.util.*;

public class XMLEventContext
{
  protected Stack tagStack = new Stack();

  public int getElementCount()
  {
    return tagStack.size();
  }
  
  public void push(String uri,
    String localName, String qualifiedName)
  {
    tagStack.push(new XMLEventElement(
      uri, localName, qualifiedName));
  }
  
  public XMLEventElement pop()
  {
    return (XMLEventElement)tagStack.pop();
  }
  
  public XMLEventElement getTag(int index)
  {
    return (XMLEventElement)tagStack.get(index);
  }
  
  public XMLEventElement getLastTag()
  {
    if (tagStack.size() == 0) return null;
    return (XMLEventElement)tagStack.peek();
  }
}

Listing 2

import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class XMLEventDispatcher
  extends DefaultHandler
{
  protected StringBuffer stringBuffer = new StringBuffer();
  protected XMLEventContext context = new XMLEventContext();
  protected XMLEventHandler handler;
  
  public XMLEventDispatcher(XMLEventHandler handler)
  {
    this.handler = handler;
  }
  
  public void startDocument()
  {
    handler.handleDocStart();
  }
  
  public void endDocument()
  {
    handler.handleDocEnd();
  }
  
  public void startElement(String uri,
    String localName, String qualifiedName,
    Attributes attrs)
  {
    context.push(uri, localName, qualifiedName);
    handler.handleTagStart(context, attrs);
  }
  
  public void endElement(String uri,
    String localName, String qualifiedName)
  {
    handler.handleTagEnd(context);
    context.pop();
  }
  
  public void characters(char[] chars, int offset, int length)
  {
    stringBuffer.setLength(0);
    stringBuffer.append(chars, offset, length);
    handler.handleText(context, stringBuffer);
  }
}