ORIGINAL DRAFT

XML is quickly becoming a primary standard for messaging systems. One of the great advantages of message-based architectures is the loose coupling and scalable nature of these systems. Messages can be queued, processed in parallel and routed to different services. This month, we’re going to look at routing XML messages. Much like packet routing, we’ll take a look at the envelope of a message and move a message to a given target based on some simple rules for matching tags and attributes.

While other approaches are possible, we’ll assume that the first tag in a message, the root tag, will be used to decide where the document should be routed. The same approach can be expanded to apply more complex rules using a similar technique. I think you’ll find this article a suitable starting point if this turns out to be important to you.

To maximize flexibility, we’ll develop a set of classes that allow you to associate tag specifications with a routing target. We’ll use a RoutingTable class to hold these associations and a RoutingSerializer to read from an XMLReader and push the output to a RoutingTarget. RoutingTarget is actually an interface so we’ll implement a ConsoleTarget, FileTarget and a SocketTarget class to show how easy it is to define where your messages can be routed. In practice, you may want to wrap this code with a simple proxy server that uses socket connections to move messages to suitable message processors.

Before we take a closer look at the code, let’s consider a couple of example scenarios. The first scenario is a large-scale backroom operation that needs to move messages to different processors based on the type of each message. The type of message can be determined by the root tag, and a routing proxy server can accept socket connections and send messages to target processors through a streaming socket channel. Because of the stream-based nature of our implementation, the proxy router never needs to build a document in memory and can fluidly push the output to the right target with minimal processing overhead.

Figure 1: Message routing with a proxy router. Documents or messages are routed based on the main tag in the XML document or message.

Figure 1 shows three types of message documents and three types of processors. When the XML Router recognizes the first tag, it immediately activates a suitable target writer (in this case a socket stream) and sends the document to an appropriate processor. This is a powerful mechanism for sorting documents, archiving them to different locations, storing or processing them in various ways, modifying them, broadcasting them to multiple destinations, or repurposing them for different media such as fax, HTML or WML.

Another scenario takes us to the other end of the spectrum, on a much smaller scale. XML routing streams can be used to decide in which directory documents should be stored, which viewer or editor needs to be used in a visual application, what templates should be applied for a presentation layer, and so on. In other words, the principles can be applied to aspects of a large-scale backroom operation as easily as to elements of a standalone desktop application.

Figure 2: XMLRouter classes. The RoutingSerializer uses a RoutingTable that maps RoutingKey objects to RoutingTarget implementations, three of which are provided for demonstration purposes.

With that in mind, let’s take a closer look at the code. We won’t be able to cover each class in detail, so let’s start with an overview and then dive into the central classes. Figure 2 shows the class relationships in this project. The XMLSerializer is a simple mechanism for writing out an XML document in text form. It implements the SAX ContentHandler interface.

We use a RoutingTarget interface to set up a Writer that will receive a routed XML document. I’ve provided a ConsoleTarget, which directs output to the console, a SocketTarget class that shows how the interface can be used to send routed documents to a socket stream, as well as a FileTarget class that uses a FileWriter to target a file.

The most important class is the RoutingSerializer, which sends its output to a given RoutingTarget, depending on the first tag it reads. More elaborate routing is certainly possible but would require read-aheads, buffering and more complex logic, so I tried to keep it simple to illustrate the technique without clouding the logic. If your needs are more complex, you can still use this as a starting point. We’ll take a closer look at the RoutingSerializer class in a moment.

The RoutingTable and associated RoutingKey and RoutingTag classes are designed to associate a tag description with a RoutingTarget. The RoutingKey is complete, storing all the parameters you’ll find in a given SAX startElement method call, including the namespace URI, the local name, the tag name and any attributes. A RoutingTag is more specific and includes only the tag name. In effect, the RoutingSerializer uses the RoutingKey to match values with the RoutingTag in a HashMap. When a match is found, the RoutingTarget is returned and the serializer sends subsequent output to the specified Writer.
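
Since the table classes are not listed in this article, here is a minimal sketch of how such a lookup might work. Only getDefaultTarget and getTarget appear in the listings; setDefaultTarget, addTarget, the field names and the use of generics are assumptions made for illustration, and the real classes may differ.

```java
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.Attributes;

// Same shape as the RoutingTarget interface shown in the article.
interface RoutingTarget {
  Writer getWriter();
}

// Hypothetical: identifies a route by tag name only.
final class RoutingTag {
  private final String tag;
  RoutingTag(String tag) { this.tag = tag; }
  @Override public boolean equals(Object o) {
    return o instanceof RoutingTag && ((RoutingTag) o).tag.equals(tag);
  }
  @Override public int hashCode() { return tag.hashCode(); }
}

// Hypothetical: carries everything passed to SAX startElement.
final class RoutingKey {
  final String uri;
  final String local;
  final String tag;
  final Attributes attributes;
  RoutingKey(String uri, String local, String tag, Attributes attributes) {
    this.uri = uri;
    this.local = local;
    this.tag = tag;
    this.attributes = attributes;
  }
}

// Hypothetical: maps tags to targets and falls back to a default target.
class RoutingTable {
  private final Map<RoutingTag, RoutingTarget> routes = new HashMap<>();
  private RoutingTarget defaultTarget;

  void setDefaultTarget(RoutingTarget target) { defaultTarget = target; }
  RoutingTarget getDefaultTarget() { return defaultTarget; }
  void addTarget(RoutingTag tag, RoutingTarget target) { routes.put(tag, target); }

  // Reduce the full key to its tag name, then look that up in the map.
  RoutingTarget getTarget(RoutingKey key) {
    RoutingTarget target = routes.get(new RoutingTag(key.tag));
    return target != null ? target : defaultTarget;
  }
}
```

Because RoutingTag defines equals and hashCode on the tag name, a fresh RoutingTag built from the incoming key finds the entry that was registered earlier. Matching on namespace or attributes would mean widening what the map key compares.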

The RoutingTarget interface is very simple and looks like this:

public interface RoutingTarget
{
  public Writer getWriter();
}

The ConsoleTarget class wraps an OutputStreamWriter around System.out while the SocketTarget class creates a socket connection and wraps an OutputStreamWriter around the socket’s OutputStream. In both cases, the getWriter method returns the OutputStreamWriter. The RoutingTable class is also straightforward, providing a mapping between a RoutingKey and a RoutingTarget. It provides a few type-specific accessors and supports the definition of a default target, in case no mapping is available at lookup time.
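
To give a feel for how little a target needs, here is a rough sketch of a console target and a file target. The class names follow the article, but the bodies are my assumptions rather than the project’s code. In particular, opening the file lazily and wrapping IOException in an unchecked exception are illustrative choices.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Same shape as the RoutingTarget interface shown above.
interface RoutingTarget {
  Writer getWriter();
}

// Sketch: routes output to the console.
class ConsoleTarget implements RoutingTarget {
  private final Writer writer = new OutputStreamWriter(System.out);

  public Writer getWriter() {
    return writer;
  }
}

// Sketch: routes output to a file, opened on first use so that a
// route that never matches does not leave an empty file behind.
class FileTarget implements RoutingTarget {
  private final String filename;
  private Writer writer;

  FileTarget(String filename) {
    this.filename = filename;
  }

  public Writer getWriter() {
    if (writer == null) {
      try {
        writer = new FileWriter(filename);
      } catch (IOException e) {
        // getWriter declares no checked exceptions, so wrap the failure.
        throw new UncheckedIOException(e);
      }
    }
    return writer;
  }
}
```

A SocketTarget would follow the same pattern, wrapping an OutputStreamWriter around a socket’s output stream.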

Since the RoutingKey and RoutingTag classes are simple enough, we’ll focus our attention on the RoutingSerializer and the XMLRouter classes, the latter demonstrating how you can use the RoutingSerializer.

The XMLSerializer class in Listing 2 is based on a class that was originally part of the Xerces distribution. Unfortunately, somewhere along the way this class became private, thus blocking this approach. By the time I found this problem I was pretty close to a publication deadline, so I wrote a simple version that doesn’t necessarily preserve all the details of a complete XML document, though it is certainly sufficient for typical documents.

If you take a look at the source code, you’ll see that we’re extending the standard DefaultHandler and overriding the startDocument and endDocument methods to write the XML declaration and flush the output, respectively. We implement the startElement and endElement methods to write tags and any attributes to the writer, and the characters method to write out the document’s content. Tags are indented based on the current depth so that the output remains readable.
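
One detail the simplified serializer leaves out is escaping: attribute values and text are written exactly as the parser reports them, so a value containing an ampersand or angle bracket would produce output that is no longer well-formed. If your documents can contain such characters, a small helper along these lines could be applied before printing. XmlEscaper is a hypothetical name and is not part of the article’s code.

```java
// Hypothetical helper, not part of the listings in this article.
final class XmlEscaper {
  private XmlEscaper() {}

  // Replaces the characters that are unsafe in element text
  // or in double-quoted attribute values.
  static String escape(String text) {
    StringBuilder out = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      switch (c) {
        case '&': out.append("&amp;"); break;
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        case '"': out.append("&quot;"); break;
        default: out.append(c);
      }
    }
    return out.toString();
  }
}
```

In the serializer, this would mean printing XmlEscaper.escape(attributes.getValue(i)) rather than the raw value, and doing the same for the string built in the characters method.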

Let’s take a closer look at the RoutingSerializer class, which extends XMLSerializer, in Listing 3. The constructor expects a RoutingTable argument, from which it gets the default RoutingTarget, which in turn determines the Writer to send output to. This writer is set using the setWriter method. You’ll notice I left a few print statements in the code to show where the routing is going. You can turn these off by setting the debug instance variable to false, or remove them altogether if you like.

Since the XMLSerializer handles writing the XML header tag when startDocument is called, flushes the writer when endDocument is called, and writes the tag (and any attribute names and values in the case of a start tag) when the startElement and endElement methods are called, we only need to handle one method explicitly to intercept the routing information we need to switch targets.

The startElement method checks the depth variable, which is defined in the parent class, for the first tag, where depth equals zero. If this is the first tag, we create an instance of RoutingKey with all the tag and attribute information passed in as arguments to the startElement method. We use this key to look up the RoutingTarget in the RoutingTable and apply the Writer by calling the setWriter method.
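
To see the depth check in isolation, here is a condensed, self-contained sketch of the same idea. It is not the RoutingSerializer itself: RootTagRouter is a hypothetical class in which a plain Map from root tag name to Writer stands in for the RoutingTable and RoutingTarget classes, and attributes and text content are dropped to keep it short.

```java
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.Writer;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, condensed stand-in for RoutingSerializer.
class RootTagRouter extends DefaultHandler {
  private final Map<String, Writer> routes;
  private final Writer fallback;
  private PrintWriter out;
  private int depth;

  RootTagRouter(Map<String, Writer> routes, Writer fallback) {
    this.routes = routes;
    this.fallback = fallback;
  }

  @Override
  public void startDocument() {
    depth = 0;
  }

  @Override
  public void startElement(String uri, String local, String tag, Attributes attrs) {
    if (depth == 0) {
      // Root element: choose the destination before anything is written.
      out = new PrintWriter(routes.getOrDefault(tag, fallback));
    }
    depth++;
    out.print("<" + tag + ">");  // attributes omitted in this sketch
  }

  @Override
  public void endElement(String uri, String local, String tag) {
    out.print("</" + tag + ">");
    depth--;
  }

  @Override
  public void endDocument() {
    out.flush();
  }

  // Convenience for the sketch: push an XML string through a router.
  static void route(String xml, RootTagRouter router) throws Exception {
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), router);
  }
}
```

Because the destination is chosen when the root tag arrives and nothing has been written before that point, the handler never has to buffer the document. That is the property that keeps the proxy router lightweight.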

To use the RoutingSerializer, take a look at the main method in XMLRouter. We first get an instance of a SAX parser and set up a RoutingTable with a default ConsoleTarget and a target associated with a RoutingTag instance with the tag name “message”. We can then create a RoutingSerializer with the specific routing table. Finally, we can get the XMLReader and set the content handler to the serializer, calling the parse method with a suitable filename. The sample message.xml file looks like this:

<?xml version="1.0"?>
<message>
  <firsttag attr="value">
    <secondtag attr="value">
    </secondtag>
  </firsttag>
</message>

The attributes and associated values are there only for testing purposes. The first tag (message) is the one we match on in our demonstration. While this test is not elaborate enough to truly demonstrate document routing, it’s easy enough to picture what can be done. You can build your own XML message documents and add entries to the RoutingTable to send them to different RoutingTarget instances.
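
The reader wiring described for the XMLRouter main method can be sketched with standard JAXP and SAX calls. This is not the XMLRouter class itself: ReaderWiringSketch and RootTagRecorder are hypothetical names, and a trivial handler that records the root tag stands in where the RoutingSerializer would normally be installed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ReaderWiringSketch {
  // Stand-in content handler: remembers the first (root) tag it sees.
  static class RootTagRecorder extends DefaultHandler {
    String rootTag;

    @Override
    public void startElement(String uri, String local, String tag, Attributes attrs) {
      if (rootTag == null) rootTag = tag;
    }
  }

  static String rootTagOf(InputSource source) throws Exception {
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    XMLReader reader = parser.getXMLReader();
    RootTagRecorder handler = new RootTagRecorder();
    reader.setContentHandler(handler);  // a RoutingSerializer would go here
    reader.parse(source);
    return handler.rootTag;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<?xml version=\"1.0\"?>"
        + "<message><firsttag attr=\"value\"/></message>";
    // prints: message
    System.out.println(rootTagOf(new InputSource(new StringReader(xml))));
  }
}
```

To parse a file instead of a string, pass a file name or URI to the InputSource constructor in place of the StringReader.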

XML provides a mechanism for data interchange that’s incomparable at the moment. By applying simple techniques for deciding where your XML documents or messages are routed, you can build powerful solutions that range from large-scale operations to standalone desktop applications or anything in between. This article demonstrated a simple and practical approach for XML document routing. I hope the technique serves you well.

Listing 1

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class RoutingSerializer extends XMLSerializer
{
  protected RoutingTable table;
  protected RoutingTarget target;
  protected boolean debug = true;

  public RoutingSerializer(RoutingTable table)
  {
    this.table = table;
    target = table.getDefaultTarget();
    Writer writer = target.getWriter();
    setWriter(writer);
    if (debug) System.out.println("Default:\n");
  }
  
  public void startElement(
    String uri, String local, String tag,
    Attributes attributes)
      throws SAXException
  {
    if (depth == 0)
    {
      RoutingKey key = new RoutingKey(
        uri, local, tag, attributes);
      target = table.getTarget(key);
      Writer writer = target.getWriter();
      setWriter(writer);
      if (debug) System.out.println(
        "Routed to " + '"' + tag + '"' + ":\n");
    }
    super.startElement(uri, local, tag, attributes);
  }
}

Listing 2

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class XMLSerializer extends DefaultHandler
{
  protected int depth;
  protected PrintWriter writer;
  
  public void setWriter(Writer writer)
  {
    this.writer = new PrintWriter(writer);
  }

  public void startDocument()
    throws SAXException
  {
    depth = 0;
    writer.println("<?xml version=\"1.0\"?>");
  }
  
  public void endDocument()
    throws SAXException
  {
    writer.flush();
  }
  
  public void startElement(
    String uri, String localName, String tag,
    Attributes attributes)
      throws SAXException
  {
    depth++;
    for (int i = 0; i < depth; i++)
    {
      writer.print(' ');
    }
    writer.print("<" + tag);
    int count = attributes.getLength();
    for (int i = 0; i < count; i++)
    {
      writer.print(" ");
      writer.print(attributes.getQName(i));
      writer.print('=');
      writer.print('"');
      writer.print(attributes.getValue(i));
      writer.print('"');
    }
    writer.print(">");
    writer.println();
  }
  
  public void endElement(
    String uri, String localName, String tag)
      throws SAXException
  {
    for (int i = 0; i < depth; i++)
    {
      writer.print(' ');
    }
    writer.print("</" + tag + ">");
    writer.println();
    depth--;
  }

  public void characters(char[] text, int offset, int length)
  {
    String string = new String(text, offset, length);
    writer.print(string.trim());
  }
}

Listing 3

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class RoutingSerializer extends XMLSerializer
{
  protected RoutingTable table;
  protected boolean debug = true;

  public RoutingSerializer(RoutingTable table)
  {
    this.table = table;
    RoutingTarget target = table.getDefaultTarget();
    setWriter(target.getWriter());
    if (debug) System.out.println("Default:\n");
  }
  
  public void startElement(
    String uri, String local, String tag,
    Attributes attributes)
      throws SAXException
  {
    if (depth == 0)
    {
      RoutingKey key = new RoutingKey(
        uri, local, tag, attributes);
      RoutingTarget target = table.getTarget(key);
      setWriter(target.getWriter());
      if (debug) System.out.println(
        "Routed to " + '"' + tag + '"' + ":\n");
    }
    super.startElement(uri, local, tag, attributes);
  }
}