ORIGINAL DRAFT

To ensure the integrity of XML documents, various validation strategies have been developed. Validation specifications list permissible combinations of elements and attribute values. Several solutions exist, including DTDs (Document Type Definitions), XML Schema, RELAX and TREX, to name a few. While each of these is designed primarily for XML validation, there’s no reason the same approach couldn’t be applied to other validation requirements in your applications.

Most development projects could benefit from easily edited data type libraries. Ideally, we should be able to define these using a standard schema definition language. The XML Schema specification, for example, supports 19 primitive datatypes, 25 derived datatypes and 12 constraining facets that let you refine each data type definition. Numbers can be bounded, strings can be constrained to specific lengths, regular expressions can be applied, and so on. XML Schema is expressive and powerful enough to specify data type validation for all but the most unusual applications.
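
To make the idea of facets concrete, here is a minimal sketch that is not part of the article’s listings. The type names Percent, ShortCode and ZipCode are hypothetical, and the check uses the JDK’s built-in javax.xml.validation API rather than the Multi-Schema XML Validator, purely so that it runs without extra JAR files. Each simple type gets a matching element declaration, and a value is tested by wrapping it in a one-element document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class FacetSketch {
  // Hypothetical simple types, each refining a built-in type with facets.
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + " <xs:simpleType name='Percent'>"
    + "  <xs:restriction base='xs:int'>"
    + "   <xs:minInclusive value='0'/><xs:maxInclusive value='100'/>"
    + "  </xs:restriction>"
    + " </xs:simpleType>"
    + " <xs:simpleType name='ShortCode'>"
    + "  <xs:restriction base='xs:string'>"
    + "   <xs:minLength value='2'/><xs:maxLength value='4'/>"
    + "  </xs:restriction>"
    + " </xs:simpleType>"
    + " <xs:simpleType name='ZipCode'>"
    + "  <xs:restriction base='xs:string'>"
    + "   <xs:pattern value='[0-9]{5}'/>"
    + "  </xs:restriction>"
    + " </xs:simpleType>"
    // One wrapper element per type, named after the type it tests.
    + " <xs:element name='Percent' type='Percent'/>"
    + " <xs:element name='ShortCode' type='ShortCode'/>"
    + " <xs:element name='ZipCode' type='ZipCode'/>"
    + "</xs:schema>";

  static final Schema SCHEMA = compile();

  static Schema compile() {
    try {
      return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)));
    } catch (SAXException e) {
      throw new IllegalStateException("schema did not compile", e);
    }
  }

  // Escape the characters that would otherwise break the wrapper document.
  static String escape(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  // Returns true if the text satisfies the named simple type.
  static boolean isValid(String typeName, String text) {
    String xml = "<" + typeName + ">" + escape(text) + "</" + typeName + ">";
    try {
      SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false; // the validator rejected the value
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With these definitions, a call such as FacetSketch.isValid("Percent", "150") is rejected by the maxInclusive facet, while "50" is accepted.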

This article shows how you can apply a Java library from Sun called the Multi-Schema XML Validator, written by Kohsuke Kawaguchi, to this kind of general-purpose validation. The Multi-Schema XML Validator is geared primarily toward XML validation and accomplishes its goals remarkably well, enabling validators of different types to be applied to any XML document using the same infrastructure. While the infrastructure is capable of parsing multiple types of schemas, our focus will be on XML Schema, and specifically on simpleType definitions.

When validating fields in a GUI application, constraining input values in a Web application, or even validating file content, there’s little need for the XML Schema complex type, which, while more sophisticated, is geared toward nested structures. We’ll stick to simple data types in this article, but even simple types in this context can be surprisingly powerful. It’s conceivable that complex types could be applied to relationships between fields, but we’ll leave that for intrepid readers to explore. You’ll find that simple types are more than enough for most applications.

Figure 1: Schema validation applied to a GUI. In this example, you can edit the XML Schema and save it, then apply a validator to the text field by using the combo box and pressing the Validate button to test it.

Given that you can use XML Schema-defined validators in any form-based application, it would be nice to have a way to interactively develop schemas and test them on the fly. To do this, I’ve written a simple GUI application that lets you edit the XML Schema, save it and apply the changes to a trio of test components. The first component, a combo box, displays all of the known simple type validators for selection. The second and third components are a text field and a button you can press to test the currently selected validator.

Figure 1 shows what happens when you apply the OneOf validator to a text field with a ‘ten’ value. You can see the XML Schema simpleType defined to the left of the message dialog box. The OneOf type is derived from the string base type, and a valid value must be one of its enumerated restrictions. I won’t try to explain the XML Schema specification in this article; there’s plenty of information around that you can refer to. Suffice it to say that you can apply any simple type, and each simple type can be defined by deriving from a base XML Schema type and applying valid constraints.
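
Since the schema in Figure 1 is not reproduced in the text, the following is only a guess at what an enumerated type like OneOf might look like. The enumeration values one, two and three are assumptions, and the check again relies on the JDK’s javax.xml.validation API as a stand-in, not on the Multi-Schema XML Validator. The check method returns null for an accepted value, or the validator’s message for a rejected one, roughly what the dialog box reports.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OneOfSketch {
  // Assumed definition: a string restricted to three enumerated values.
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + " <xs:simpleType name='OneOf'>"
    + "  <xs:restriction base='xs:string'>"
    + "   <xs:enumeration value='one'/>"
    + "   <xs:enumeration value='two'/>"
    + "   <xs:enumeration value='three'/>"
    + "  </xs:restriction>"
    + " </xs:simpleType>"
    + " <xs:element name='value' type='OneOf'/>"
    + "</xs:schema>";

  // Returns null if the text is valid, otherwise the validator's message.
  static String check(String text) {
    String safe = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    String xml = "<value>" + safe + "</value>";
    Schema schema;
    try {
      schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)));
    } catch (SAXException e) {
      throw new IllegalStateException("schema did not compile", e);
    }
    try {
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return null;
    } catch (SAXException e) {
      return String.valueOf(e.getMessage());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Here OneOfSketch.check("one") returns null, while OneOfSketch.check("ten") returns a message explaining that the value is not among the enumerated ones. The exact wording of that message depends on the validator implementation.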

The SchemaValidatorTest application requires you to press the Validate button to apply the test, and it reports the results in a dialog box. In typical GUI applications, validators are applied at editing time. To be more specific, Swing defines an InputVerifier class that can be attached to a JTextField. The InputVerifier is called when the field is about to lose focus, and the focus stays in the field if the test method, called verify, returns false.
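
For readers who have not used InputVerifier before, here is a minimal sketch of the mechanism on its own. The DigitsVerifier class and its digits-only rule are hypothetical placeholders for a real data type check. The rule is kept in a separate static method so it can be exercised without building a GUI.

```java
import javax.swing.InputVerifier;
import javax.swing.JComponent;
import javax.swing.JTextField;

class DigitsVerifier extends InputVerifier {
  // Placeholder rule: accept one or more decimal digits.
  static boolean accepts(String text) {
    return text != null && text.matches("[0-9]+");
  }

  // Swing calls verify when the component is about to give up focus.
  @Override
  public boolean verify(JComponent input) {
    // Safe only because this verifier is attached to text fields.
    JTextField field = (JTextField) input;
    return accepts(field.getText());
  }

  // Creates a text field that keeps the focus until its content passes.
  static JTextField newDigitsField() {
    JTextField field = new JTextField(10);
    field.setInputVerifier(new DigitsVerifier());
    return field;
  }
}
```

Adding the field returned by newDigitsField to a form is all that is needed. While verify returns false, the user cannot move the focus to another component.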

I first applied this approach to the test application, but because the text field loses focus when the combo box is used, the message could often lag behind. Effectively, the lost focus from the text field caused the currently selected validator to be applied instead of the one that was being selected. Since my intent was to provide an interactive environment for developing XML Schema-defined validator definitions, I’ve commented out the code that attaches the InputVerifier implementation to the text field. In a real-world application you would probably use the InputVerifier approach, so we’ll look at how it works after we cover the SchemaValidator class.

Figure 2: The SchemaValidator class is the key to using XML Schema to validate any simple text data. The FieldValidator provides an infrastructure for applying this approach to standard Swing applications.

Figure 2 shows the classes we’ll be developing. The keystone for this article is the SchemaValidator class. The SchemaValidatorGUI and SchemaValidatorTest classes provide the application framework for interactively editing and testing data type definitions. The FieldValidator class extends the Swing InputVerifier and provides SchemaValidator-specific behavior. Because the exact behavior on a validation failure is application-specific, the FieldValidator provides a method called onInvalidData, which can be overridden in subclasses. The default behavior is to do nothing. The ExampleFieldValidator prints the XML Schema data type name, the text field content and the text message from a DatatypeException, explaining the cause of the validation error.

Let’s take a look at the SchemaValidator class in Listing 1. The constructor expects a filename argument. We use the GrammarLoader from the Multi-Schema XML Validator package to load the XML Schema file. The GrammarLoader returns a Grammar object and is smart enough to recognize the type of schema involved. We assume an XML Schema file and, while different grammars are supported by the library, the rest of this code will fail if you try to use non-XML Schema grammars in this context. Since we are expecting an XMLSchemaGrammar, the next line casts the Grammar from the GrammarLoader.

The XMLSchemaGrammar lets us access namespace-specific schemas, but the one we are interested in is the default (unnamed) namespace, which is returned as an XMLSchemaSchema object. The XMLSchemaSchema object stores an instance reference to the simple data types, called simpleTypes, which uses an inner class called SimpleTypeContainer. From this, we can get an array of ReferenceExp objects, which represent each of the definition expressions in the XML document. These are actually SimpleTypeExp objects, so we can cast them to get more specific information in the loop that follows. For each SimpleTypeExp instance, we can retrieve the XSDatatype validator, which is really what we’re after. We store these in a HashMap using the expression name as the key.

These XSDatatype instances expose a pair of interesting methods called isValid and checkValid. These methods take two arguments - a String value and a ValidationContext. The context is useful in compound documents, like XML, and provides information about other elements and attributes that may be critical for complex type validation. In our case, however, we can pass a null value for the ValidationContext and the value will be tested against the specified data type definition. This is what we’re really interested in doing.

With each XSDatatype object stored in a HashMap, we can look up a data type by name and apply either the isValid or checkValid test quite easily. Both methods are exposed by the SchemaValidator using two arguments, first the name of the data type and then the value to be tested. Both methods throw an IllegalArgumentException if the named data type is not in the HashMap, which implies it was never defined. The isValid method returns a boolean value, but the checkValid method throws a DatatypeException. The only other method implemented in the SchemaValidator class is getNameList, which returns a list of named data types.

Listing 2 shows the code for FieldValidator. The constructor expects a SchemaValidator instance, along with the data type name to be applied to the field. There are both setValidator and setName methods so that you can change them at runtime without having to apply a new FieldValidator to a text field. The verify method does the real work, implementing the InputVerifier’s abstract method behavior. If the name value is null, we assume the test is valid and return true in the verify method. Otherwise, we follow the InputVerifier pattern by casting the JComponent to a JTextField. This is safe as long as the FieldValidator is attached only to JTextField components; Swing lets you set an InputVerifier on any JComponent, so attaching it to another component type would cause a ClassCastException.

Once we have a reference to the JTextField, we can retrieve the text to be tested. We could return the boolean result of a call to SchemaValidator’s isValid method here, but we’re interested in providing more detailed user feedback, so we call checkValid instead in a try/catch block. If the value is valid, no exception is thrown, so we return true. If the value is not valid, we rely on an onInvalidData method to take action and return a false value. The default onInvalidData implementation does nothing.

Listing 3 shows the code for the ExampleFieldValidator, which is application-specific. The constructor expects a SchemaValidator object, but uses a null value for the data type name. In our application, the SchemaValidatorGUI, the data type name is set via the setName method in the FieldValidator class when the user changes the selection in the JComboBox. Your application would probably set this value in the constructor.

The ExampleFieldValidator’s onInvalidData method retrieves the current text from the JTextField component and builds a string using the data type name, field value and DatatypeException message for display. In this implementation, it merely prints the output to the console but a dialog box could be presented to the user instead, explaining the problem and recommending a course of action.
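
As a rough illustration of the dialog box alternative, the helper below builds the same three pieces of information into a message and shows it with JOptionPane. The InvalidDataDialog class, its method names and the message wording are all hypothetical; only JOptionPane.showMessageDialog is standard Swing.

```java
import java.awt.Component;
import javax.swing.JOptionPane;

class InvalidDataDialog {
  // Builds the text shown to the user; kept separate so it can be tested.
  static String buildMessage(String typeName, String value, String reason) {
    return "The value \"" + value + "\" is not a valid " + typeName + ".\n" + reason;
  }

  // Shows a modal warning dialog centered on the offending component.
  static void show(Component parent, String typeName, String value, String reason) {
    JOptionPane.showMessageDialog(
        parent,
        buildMessage(typeName, value, reason),
        "Invalid input",
        JOptionPane.WARNING_MESSAGE);
  }
}
```

An onInvalidData override could then call InvalidDataDialog.show(component, name, field.getText(), msg) instead of printing to the console. Because a modal dialog opened during a focus change can itself shift the focus, it is worth testing this behavior in your own application.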

The SchemaValidatorGUI class is not listed, but you can find it online and study it to see how it works. In principle, it’s fairly simple: it sets up a JComboBox, a JTextField for entering test values, a JTextArea for viewing and editing the XML Schema definition, and a couple of buttons, one to validate the content of the JTextField and the other to save the XML document text and reload it using the SchemaValidator. When a schema document is reloaded, the JComboBox is repopulated with the data type names retrieved using the getNameList method.

To run the GUI application, you’ll need to call the SchemaValidatorTest class. The JAR files from the Multi-Schema XML Validator package must be included in your class path. If you are using Java 1.4, you do not need to include the Xerces package. The JAR files to include are isorelax.jar, msv.jar, xsdlib.jar and relaxngDatatype.jar.

The approach detailed in this article is extremely powerful and easily adapted to other validation infrastructures. We’ve provided an example implementation using the Swing InputVerifier, but the same approach can be applied to the Struts Web development infrastructure or other custom solutions. Being able to define data types in a standard XML format can enable you to adapt rapidly to changing requirements without having to rewrite large portions of code. What’s more, you can keep large libraries of data type validation definitions on hand and reuse them in multiple projects. Have fun.
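
To hint at what a non-Swing adaptation might look like, here is a small sketch that checks a map of submitted form parameters against named data types. The FormCheckSketch class is hypothetical and tied to no Web framework. It takes the type check as a BiPredicate of (type name, text), which stands in for a method such as SchemaValidator’s isValid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

class FormCheckSketch {
  // Maps each form field name to the data type name it must satisfy.
  private final Map<String, String> fieldTypes;
  // Stand-in for the real check: (type name, text) -> is the text valid?
  private final BiPredicate<String, String> typeCheck;

  FormCheckSketch(Map<String, String> fieldTypes,
                  BiPredicate<String, String> typeCheck) {
    this.fieldTypes = fieldTypes;
    this.typeCheck = typeCheck;
  }

  // Returns the names of fields that are missing or fail their type check.
  List<String> invalidFields(Map<String, String> params) {
    List<String> failed = new ArrayList<>();
    for (Map.Entry<String, String> entry : fieldTypes.entrySet()) {
      String text = params.get(entry.getKey());
      if (text == null || !typeCheck.test(entry.getValue(), text)) {
        failed.add(entry.getKey());
      }
    }
    return failed;
  }
}
```

On a Java version with method references, the second constructor argument could be written as schemaValidator::isValid, since that method takes the type name and the text and returns a boolean.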

Listing 1

import java.io.*;
import java.util.*;

import com.sun.msv.grammar.*;
import com.sun.msv.reader.util.*;
import com.sun.msv.datatype.xsd.*;
import com.sun.msv.grammar.xmlschema.*;
import org.relaxng.datatype.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class SchemaValidator
{
  protected XMLSchemaSchema.SimpleTypeContainer simpleTypes;
  
  protected Map dataTypes = new HashMap();
  
  public SchemaValidator(String filename)
    throws SAXException, IOException,
      ParserConfigurationException
  {
    // GrammarLoader detects the schema language; we assume XML Schema.
    Grammar grammar = GrammarLoader.loadSchema(filename);
    XMLSchemaGrammar schemaGrammar = (XMLSchemaGrammar)grammar;
    // The empty string selects the schema with no target namespace.
    XMLSchemaSchema schema = schemaGrammar.getByNamespace("");
    XMLSchemaSchema.SimpleTypeContainer types = schema.simpleTypes;
    ReferenceExp[] list = types.getAll();
    for (int i = 0; i < list.length; i++)
    {
      // Each entry is a named simpleType; keep its datatype validator.
      SimpleTypeExp expr = (SimpleTypeExp)list[i];
      XSDatatype validator = expr.getType();
      dataTypes.put(expr.name, validator);
    }
  }
  
  public String[] getNameList()
  {
    Set set = dataTypes.keySet();
    String[] names = new String[set.size()];
    Iterator iterator = set.iterator();
    for (int i = 0; iterator.hasNext(); i++)
    {
      names[i] = (String)iterator.next();
    }
    return names;
  }
  
  public boolean isValid(String name, String text)
  {
    XSDatatype validator = (XSDatatype)dataTypes.get(name);
    if (validator == null) throw new IllegalArgumentException(
      "validator '" + name + "' not found");
    return validator.isValid(text, null);
  }
    
  public void checkValid(String name, String text)
    throws DatatypeException
  {
    XSDatatype validator = (XSDatatype)dataTypes.get(name);
    if (validator == null) throw new IllegalArgumentException(
      "validator '" + name + "' not found");
    validator.checkValid(text, null);
  }
}

Listing 2

import javax.swing.*;
import org.relaxng.datatype.*;

public class FieldValidator
  extends InputVerifier
{
  protected SchemaValidator validator;
  protected String name;
  
  public FieldValidator(SchemaValidator validator, String name)
  {
    this.validator = validator;
    this.name = name;
  }
  
  public void setValidator(SchemaValidator validator)
  {
    this.validator = validator;
  }
  
  public void setName(String name)
  {
    this.name = name;
  }
  
  public boolean verify(JComponent component)
  {
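    // No data type selected: treat any input as valid.
    if (name == null) return true;
    // Assumes this verifier is attached only to JTextField components.
    JTextField field = (JTextField)component;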
    String text = field.getText();
    try
    {
      validator.checkValid(name, text);
      return true;
    }
    catch (DatatypeException e)
    {
      onInvalidData(component, e.getMessage());
      return false;
    }
  }
  
  // Hook for subclasses to report the failure; the default does nothing.
  public void onInvalidData(JComponent component, String msg) {}
}

Listing 3

import javax.swing.*;

public class ExampleFieldValidator
  extends FieldValidator
{
  protected static final char NL = '\n';
  
  public ExampleFieldValidator(
    SchemaValidator validator)
  {
    super(validator, null);
  }
  
  public void onInvalidData(JComponent component, String msg)
  {
    JTextField field = (JTextField)component;
    StringBuffer buffer = new StringBuffer();
    buffer.append("Datatype: " + name + NL);
    buffer.append("Value: " + field.getText() + NL);
    buffer.append("Message: " + msg + NL);
    System.out.println(buffer.toString());
  }
}