ORIGINAL DRAFT

One of the great things about XML is its ability to identify the encoding scheme as part of its document header information. As such, XML is particularly well suited to handling multiple-language problems, such as internationalization and localization. Java provides a powerful localization mechanism called resource bundles, which must either be coded into Java classes or handled by the default Java Properties object and its associated file format. XML seems better suited for this than property files, however, so in this Java Break we’ll build a localization infrastructure for Java that’s based on XML.

One of the things the standard Java ResourceBundle class does well is to handle the management of property file names using a hierarchical naming convention, based on Locale objects. A Locale is built up from three elements: Language, Country and Variant. In practice the Language is important and the Country is sometimes important, but the Variant is rarely used. The Language and Country are each represented by a two-letter code (lowercase for the Language, uppercase for the Country). File names can easily be built by using an underscore delimiter and a base filename. For example, a US-English bundle might have the name “BaseName_en_US.xml”, while a more general English bundle would have the name “BaseName_en.xml”. This is the same basic naming scheme used for property file names in standard Java resource bundles.

Localization is really about externalizing language or Locale-specific resources and loading them using keys. Java uses String keys to get at resources, while other languages may use integer values. The keys, however, are completely arbitrary and conventions are entirely up to developers. In a large project, it’s important to develop and adopt a global scheme for naming resources as early as possible, so that name collisions don’t occur during development. Fortunately, this can be handled in part by using resource bundles that are specific to given components or modules, each having a different base name.

What’s important in resolving localization keys is that the more specific entries are found first, hence the hierarchical nature of resource bundles. A US-English entry is more specific than a general English entry with the same name. Fortunately, XML is hierarchical as well, so you can imagine a scheme in which keys are resolved from the deepest node first. It’s still important that bundles can be distributed across files, but it should also be possible to keep multiple languages or Locales in the same file. We’ll design an infrastructure where both approaches, or a mix, are possible.

Let’s take a look at the high-level interface we want to be able to use. We’ll define an XMLResourceBundle by its base file name, so the constructor requires a single String argument. The loadFile method will load resources from one or more XML bundles based on a given Locale (provided as the only argument). We want this call to handle all the conflict resolution for us so that we end up with the key mappings that are appropriate for the provided Locale. After this, we’ll be free to call the getText method with a key to get a String resource, or the getFile method to get file names.

Figure 1: XMLResourceBundle makes use of support classes to load files, parse them and to resolve cached resource key entries.

Figure 1 shows the classes we’ll need for this project. The LocaleUtil class is a collection of static methods that help construct file names from Locale objects. It provides a utility method that returns an ordered list of filenames from least to most specific. The ResourceCache is a HashMap that requires a ResourceKey for each entry. A ResourceKey keeps information about both the resource name and the data type. By parsing the XML resource files in the right order, we can overwrite ResourceCache entries with more specific entries as they are loaded, reducing most of the complexity we might otherwise have to deal with.
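
LocaleUtil isn’t listed in this article, so here is a minimal sketch of how its file name method might work. The class name LocaleFileNamesSketch and the rule of stopping at the first empty Locale element are assumptions made for illustration; only the getFileNameList name and its three arguments are taken from Listing 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the article's LocaleUtil class.
class LocaleFileNamesSketch
{
  // Returns candidate file names ordered from least to most specific.
  static String[] getFileNameList(
    Locale locale, String baseName, String extension)
  {
    List<String> names = new ArrayList<String>();
    StringBuilder name = new StringBuilder(baseName);
    // The base file always comes first.
    names.add(name + extension);
    String[] parts = {
      locale.getLanguage(), locale.getCountry(), locale.getVariant()
    };
    for (String part : parts)
    {
      // Assumption: stop at the first empty element, because each
      // deeper level only makes sense when the one above it is set.
      if (part.isEmpty())
      {
        break;
      }
      name.append('_').append(part);
      names.add(name + extension);
    }
    return names.toArray(new String[0]);
  }

  public static void main(String[] args)
  {
    Locale brazil = new Locale("pt", "BR");
    for (String fileName :
      getFileNameList(brazil, "ResourceBundle", ".xml"))
    {
      System.out.println(fileName);
    }
  }
}
```

For the Portuguese (Brazil) Locale this prints ResourceBundle.xml, ResourceBundle_pt.xml and ResourceBundle_pt_BR.xml, in that order, which is the order the files need to be parsed in.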

We won’t have enough room here to explore all the classes, so we’ll focus on the XMLBundleParser class, which handles parsing the XML source documents, and the XMLResourceBundle class, which is the main class in our localization infrastructure. The ResourceKey, ResourceCache and LocaleUtil classes are all fairly short and simple, and you can find them online if you’d like to take a closer look.
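
For readers who don’t have the online sources handy, a hypothetical minimal version of ResourceKey and ResourceCache could look like the sketch below. The field names, the equals/hashCode implementation and the null return for a missing entry are assumptions; the getResource(key, type) call shape is inferred from how Listing 2 uses it.

```java
import java.util.HashMap;
import java.util.Objects;

// Hypothetical version: a key made of a resource name plus a data type.
class ResourceKey
{
  private final String name;
  private final String type;

  ResourceKey(String name, String type)
  {
    this.name = name;
    this.type = type;
  }

  // Two keys match only when both the name and the type match.
  @Override
  public boolean equals(Object other)
  {
    if (!(other instanceof ResourceKey))
    {
      return false;
    }
    ResourceKey that = (ResourceKey) other;
    return Objects.equals(name, that.name)
      && Objects.equals(type, that.type);
  }

  @Override
  public int hashCode()
  {
    return Objects.hash(name, type);
  }
}

// Hypothetical version: a HashMap with a typed lookup helper.
class ResourceCache extends HashMap<ResourceKey, String>
{
  // Returns the cached value, or null when no entry exists.
  String getResource(String name, String type)
  {
    return get(new ResourceKey(name, type));
  }
}

class ResourceCacheDemo
{
  public static void main(String[] args)
  {
    ResourceCache cache = new ResourceCache();
    cache.put(new ResourceKey("file", "String"), "File");
    cache.put(new ResourceKey("file", "File"), "ExampleFileName.txt");
    // A later, more specific entry replaces the earlier String entry.
    cache.put(new ResourceKey("file", "String"), "Arquivo");
    System.out.println(cache.getResource("file", "String"));
    System.out.println(cache.getResource("file", "File"));
  }
}
```

Because the type is part of the key, the String entry and the File entry for “file” live side by side, while the second String entry simply overwrites the first. That is all the conflict resolution we need, provided the files are parsed from general to specific.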

Let’s take a glance at an example resource file:

<?xml version="1.0" encoding="UTF-8"?>
<RESOURCE>
  <RES key="file" type="String">File</RES>
  <RES key="open" type="String">Open</RES>
  <RES key="close" type="String">Close</RES>
  <RES key="new" type="String">New</RES>
  <RES key="edit" type="String">Edit</RES>
  <RES key="delete" type="String">Delete</RES>
  <RES key="cut" type="String">Cut</RES>
  <RES key="copy" type="String">Copy</RES>
  <RES key="paste" type="String">Paste</RES>
  <RES key="file" type="File">ExampleFileName.txt</RES>
</RESOURCE>

This file is a base file and stores its entries in RES tags, directly under the main RESOURCE tag. The convention we’ll use involves a key name and data type attribute for each entry. The content is normally text but might represent other resource types, such as a File type (as in the last entry, in which case we’ll return a Java File object using the text that defines the path/filename). This example provides useful default values for common text keys.

Here’s another example with a deeper structure:

<?xml version="1.0" encoding="UTF-8"?>
<RESOURCE>
  <LANGUAGE name="pt">
    <COUNTRY name="BR">
      <RES key="language" type="String">Portuguese (Brazil)</RES>
      <RES key="file" type="String">Arquivo</RES>
      <RES key="delete" type="String">Excluir</RES>
      <RES key="cut" type="String">Recortar</RES>
    </COUNTRY>
  </LANGUAGE>
</RESOURCE>

This example shows entries for the “Portuguese (Brazil)” locale. A given file can contain RES entries under the root RESOURCE tag, under a LANGUAGE tag, and optionally under a COUNTRY tag or a VARIANT tag. Deeper entries are more specific. The example above captures entries under the COUNTRY tag and is named “ResourceBundle_pt_BR.xml”. The first example is simply called “ResourceBundle.xml”. Here’s a DTD that describes the XMLResourceBundle format.

<!ELEMENT RESOURCE (RES*,LANGUAGE*)>
<!ELEMENT LANGUAGE (RES*,COUNTRY*)>
<!ATTLIST LANGUAGE name CDATA #REQUIRED>
<!ELEMENT COUNTRY (RES*,VARIANT*)>
<!ATTLIST COUNTRY name CDATA #REQUIRED>
<!ELEMENT VARIANT (RES*)>
<!ATTLIST VARIANT name CDATA #REQUIRED>
<!ELEMENT RES (#PCDATA)>
<!ATTLIST RES key CDATA #REQUIRED type CDATA #REQUIRED>
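
If you want to check a bundle against this format before shipping it, a validating SAX parser can do the work. The sketch below is only one way to do it, with a hypothetical class name (BundleDtdCheckSketch) and the DTD embedded as an internal subset so the example is self-contained. The DTD embedded here marks LANGUAGE, COUNTRY and VARIANT as repeatable (*), which is what a single file holding several locales needs; treat that as an assumption of the sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the article's class set.
class BundleDtdCheckSketch
{
  // Internal DTD subset prepended to the document under test.
  static final String DOCTYPE =
    "<!DOCTYPE RESOURCE [\n"
    + "<!ELEMENT RESOURCE (RES*,LANGUAGE*)>\n"
    + "<!ELEMENT LANGUAGE (RES*,COUNTRY*)>\n"
    + "<!ATTLIST LANGUAGE name CDATA #REQUIRED>\n"
    + "<!ELEMENT COUNTRY (RES*,VARIANT*)>\n"
    + "<!ATTLIST COUNTRY name CDATA #REQUIRED>\n"
    + "<!ELEMENT VARIANT (RES*)>\n"
    + "<!ATTLIST VARIANT name CDATA #REQUIRED>\n"
    + "<!ELEMENT RES (#PCDATA)>\n"
    + "<!ATTLIST RES key CDATA #REQUIRED type CDATA #REQUIRED>\n"
    + "]>\n";

  // Returns true when the body (root element only, no XML
  // declaration) parses and validates against the DTD above.
  static boolean isValid(String body)
  {
    try
    {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setValidating(true);
      factory.newSAXParser().parse(
        new InputSource(new StringReader(DOCTYPE + body)),
        new DefaultHandler()
        {
          // Validity errors are only reported, not thrown, by
          // default, so rethrow them to stop the parse.
          @Override
          public void error(SAXParseException e) throws SAXException
          {
            throw e;
          }
        });
      return true;
    }
    catch (Exception e)
    {
      return false;
    }
  }

  public static void main(String[] args)
  {
    String good = "<RESOURCE>"
      + "<RES key=\"cut\" type=\"String\">Cut</RES>"
      + "</RESOURCE>";
    // The type attribute is required, so this one should fail.
    String bad = "<RESOURCE><RES key=\"cut\">Cut</RES></RESOURCE>";
    System.out.println("good: " + isValid(good));
    System.out.println("bad: " + isValid(bad));
  }
}
```

The content model also enforces ordering: RES entries must come before any nested LANGUAGE, COUNTRY or VARIANT element at the same level, which is what lets deeper entries overwrite shallower ones during a single pass.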

As I mentioned earlier, I’ve avoided the need to put things either in one or many files. You can put everything in a single file if you like and the XMLResourceBundle class will handle things transparently. The standard scheme assumes you’ll probably want to distribute resources across suitably named files, which follow the same convention as the standard Java resource bundles (see JavaDocs for details). In either case, resources are loaded in the correct order and cached in the ResourceCache for subsequent access.
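
To make the single-file case concrete, here is a cut-down sketch that parses an in-memory bundle holding base, Portuguese, Portuguese (Brazil) and French entries. It is not the XMLBundleParser from Listing 1: the class name SingleFileBundleSketch is hypothetical, it ignores the type attribute and the VARIANT level to stay short, and the sample values for the general Portuguese and French entries are invented for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: keeps entries whose context matches the target.
class SingleFileBundleSketch extends DefaultHandler
{
  // Sample bundle; RES entries always precede deeper levels.
  static final String SAMPLE =
    "<RESOURCE>"
    + "<RES key=\"cut\" type=\"String\">Cut</RES>"
    + "<RES key=\"copy\" type=\"String\">Copy</RES>"
    + "<LANGUAGE name=\"pt\">"
    + "<RES key=\"cut\" type=\"String\">Cortar</RES>"
    + "<COUNTRY name=\"BR\">"
    + "<RES key=\"cut\" type=\"String\">Recortar</RES>"
    + "</COUNTRY>"
    + "</LANGUAGE>"
    + "<LANGUAGE name=\"fr\">"
    + "<RES key=\"cut\" type=\"String\">Couper</RES>"
    + "</LANGUAGE>"
    + "</RESOURCE>";

  private final Map<String, String> entries =
    new HashMap<String, String>();
  private final StringBuilder text = new StringBuilder();
  private final Locale target;
  private String language, country, key;

  SingleFileBundleSketch(Locale target)
  {
    this.target = target;
  }

  // True when every enclosing LANGUAGE/COUNTRY matches the target.
  private boolean inContext()
  {
    return (language == null || language.equals(target.getLanguage()))
      && (country == null || country.equals(target.getCountry()));
  }

  @Override
  public void startElement(
    String uri, String lName, String qName, Attributes attrs)
  {
    if (qName.equals("LANGUAGE")) language = attrs.getValue("name");
    if (qName.equals("COUNTRY")) country = attrs.getValue("name");
    if (qName.equals("RES"))
    {
      key = attrs.getValue("key");
      text.setLength(0);
    }
  }

  @Override
  public void characters(char[] chars, int offset, int length)
  {
    text.append(chars, offset, length);
  }

  @Override
  public void endElement(String uri, String lName, String qName)
  {
    if (qName.equals("LANGUAGE")) language = null;
    if (qName.equals("COUNTRY")) country = null;
    if (qName.equals("RES") && inContext())
    {
      // Later (deeper) entries overwrite earlier ones.
      entries.put(key, text.toString().trim());
    }
  }

  // Parses the XML text and returns the entries for the target Locale.
  static Map<String, String> load(String xml, Locale target)
  {
    try
    {
      SingleFileBundleSketch handler =
        new SingleFileBundleSketch(target);
      SAXParserFactory.newInstance().newSAXParser().parse(
        new InputSource(new StringReader(xml)), handler);
      return handler.entries;
    }
    catch (Exception e)
    {
      throw new IllegalStateException("Could not parse bundle", e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println(load(SAMPLE, new Locale("pt", "BR")).get("cut"));
    System.out.println(load(SAMPLE, new Locale("pt")).get("cut"));
    System.out.println(load(SAMPLE, Locale.ENGLISH).get("cut"));
  }
}
```

With this sample, the Portuguese (Brazil) Locale resolves “cut” to Recortar, plain Portuguese resolves it to Cortar, and English falls back to the base entry, Cut. The French block is skipped in all three cases because its LANGUAGE name never matches the target.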

Let’s take a closer look at the classes we’ll need to accomplish this. The XMLBundleParser (see Listing 1) is an extension of the SAX DefaultHandler class. The parse method does the high-level work and sets up various instance variables before calling the SAXParserFactory to create a SAXParser. We get the XMLReader from the SAXParser and set the content handler to this instance before calling the reader’s parse method.

The rest of the work is primarily done by the startElement, endElement and characters methods. We watch for the LANGUAGE, COUNTRY, VARIANT and RES tags and capture the content in a given context. We use a StringBuffer to accumulate text content and flush it into an appropriate key when the endElement method is reached, if we are in a suitable context. The inContext method helps us check to see if the current LANGUAGE, COUNTRY and VARIANT name attributes match the locale settings we are looking for. Otherwise the text is discarded.

Listing 2 shows the code for the XMLResourceBundle class, which provides the primary interface for handling localization with XML resource bundles. The loadFile method, as mentioned earlier, creates a new ResourceCache instance and then calls the XMLBundleParser instance created by the constructor. The parser call is actually done by a utility method called parseFile, which uses the file name and locale provided by the caller. The loadFile method repeats this call for each possible file it needs to load. The list of files is constructed by the static method getFileNameList in the LocaleUtil class.

By the time the loadFile method in XMLResourceBundle returns, all the relevant files have been checked first for existence and then loaded into the ResourceCache instance we created. As you’ll recall, they are loaded in order from general to specific, so that more specific values will always overwrite less specific entries. Because the ResourceCache is a HashMap that uses ResourceKey instances for reference, key names may be duplicated, so long as the type is different for each entry. Otherwise, existing entries are intentionally overwritten.

You’ll notice that the XMLBundleParser instance is reused and that a new ResourceCache is created by each call to the loadFile method. This means that changing locales can be done by simply calling the loadFile method again with a different Locale. If you use multiple base file names, however, you’ll want to use multiple instances of XMLResourceBundle.

The rest of the methods are very simple, providing access to resources by key name. I’ve implemented a standard getText method, which is sufficient in most cases, but I’ve also provided a getFile method to demonstrate alternate data types. This approach can just as well be applied to colors, bitmap information or anything else you might see fit. You can implement your own by extending XMLResourceBundle.

Let’s take a quick look at a simple piece of code which demonstrates how you might use XMLResourceBundle. I’ve provided a set of XML resource bundle files that map common keys like cut, copy, paste, new, edit, delete, etc. to different locales. These are very common names in graphical applications and among the first to be localized. The examples I’ve provided cover only a few languages, enough to demonstrate the technique but not enough to use in development. The languages include English, French, Italian, Spanish, and Portuguese. I’ve provided a more specific Portuguese (Brazil) file as well to show a more localized example.

Listing 3 shows the code for XMLResourceBundleTest, which lets you print out a few of the key references for each of the bundles we have. When you run this, you’ll notice a couple of things. First, the File type entry is always retrieved from the base file because it never occurs in any of the other bundles. Secondly, the last file is for the Brazil (COUNTRY) version of Portuguese. I picked this example because the text is actually different for some of the entries. Finally, you’ll notice that the Locale entries in the test class are based on Java constants where possible, but had to be explicitly created for the locales that have no predefined constant.

This approach to localization in Java is both powerful and easy to apply. The process of localization is non-trivial on larger projects but using XML may enable more flexible tools to be applied. Authoring tools can simplify the process, or a database may be employed to store records that are exported to XML during the build process. In either case, this approach can yield considerable savings in development time and effort, thanks to the use of XML resource bundles, which provide all the benefits of the standard Java localization model, along with all the benefits of XML content management.

Listing 1

import java.io.*;
import java.util.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class XMLBundleParser
  extends DefaultHandler
{
  protected ResourceCache cache = new ResourceCache();
  protected String parserLanguage, parserCountry, parserVariant;
  protected String targetLanguage, targetCountry, targetVariant;
  protected StringBuffer buffer = new StringBuffer();
  protected ResourceKey key;
  
  public void parse(ResourceCache cache,
    String filename, Locale targetLocale)
      throws Exception
  {
    this.cache = cache;
    targetLanguage = targetLocale.getLanguage();
    targetCountry = targetLocale.getCountry();
    targetVariant = targetLocale.getVariant();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser parser = factory.newSAXParser();
    XMLReader reader = parser.getXMLReader();
    reader.setContentHandler(this);
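    // XMLReader.parse(String) expects a system identifier (a URI),
    // so convert the file name instead of passing it directly.
    reader.parse(new File(filename).toURI().toString());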
  }
  
  protected boolean inContext()
  {
    if (parserLanguage == null ||
        parserLanguage.equals(targetLanguage))
    {
      if (
        parserCountry == null ||
        parserCountry.equals(targetCountry))
      {
        if (parserVariant == null ||
            parserVariant.equals(targetVariant))
        {
          return true;
        }
      }
    }
    return false;
  }
  
  public void characters(char[] chars, int offset, int length) 
  {
    buffer.append(chars, offset, length);
  }

  public void startElement(
    String uri, String lName, String qName,
    Attributes attrs)
  {
    if (qName.equals("LANGUAGE"))
    {
      parserLanguage = attrs.getValue("name");
    }
    if (qName.equals("COUNTRY"))
    {
      parserCountry = attrs.getValue("name");
    }
    if (qName.equals("VARIANT"))
    {
      parserVariant = attrs.getValue("name");
    }
    if (qName.equals("RES"))
    {
      key = new ResourceKey(
        attrs.getValue("key"),
        attrs.getValue("type"));
    }
  }
  
  public void endElement(
    String uri, String lName, String qName)
  {
    String content = buffer.toString().trim();
    buffer.setLength(0);
    if (qName.equals("LANGUAGE"))
    {
      parserLanguage = null;
    }
    if (qName.equals("COUNTRY"))
    {
      parserCountry = null;
    }
    if (qName.equals("VARIANT"))
    {
      parserVariant = null;
    }
    if (qName.equals("RES") && inContext())
    {
      cache.put(key, content);
    }
  }
}

Listing 2

import java.io.*;
import java.util.*;

public class XMLResourceBundle
{
  protected XMLBundleParser parser;
  protected ResourceCache cache;
  protected String filename;
  
  public XMLResourceBundle(String filename)
  {
    parser = new XMLBundleParser();
    this.filename = filename;
  }
  
  public void loadFile(Locale locale)
    throws Exception
  {
    cache = new ResourceCache();
    String[] fileList = LocaleUtil.getFileNameList(
      locale, filename, ".xml");
    for (int i = 0; i < fileList.length; i++)
    {
      parseFile(cache, fileList[i], locale);
    }
  }
  
  protected void parseFile(ResourceCache cache,
    String filename, Locale locale)
      throws Exception
  {
    File file = new File(filename);
    if (file.exists())
    {
      parser.parse(cache, filename, locale);
    }
  }
  
  public String getText(String key)
  {
    return cache.getResource(key, "String");
  }
  
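  public File getFile(String key)
  {
    String file = cache.getResource(key, "File");
    // Guard against a missing entry (assuming getResource returns
    // null in that case); new File(null) would otherwise throw.
    return file != null ? new File(file) : null;
  }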
}

Listing 3

import java.util.*;

public class XMLResourceBundleTest
{
  protected static final Locale
    SPANISH = new Locale("es");
  protected static final Locale
    PORTUGUESE = new Locale("pt");
  protected static final Locale
    PORTUGUESE_BRAZIL = new Locale("pt", "BR");
  
  protected static Locale[] LOCALES =
  {
    Locale.ENGLISH, Locale.FRENCH, Locale.ITALIAN,
    SPANISH, PORTUGUESE, PORTUGUESE_BRAZIL
  };
  
  protected static String toString(Locale locale)
  {
    String language = locale.getDisplayLanguage();
    String country = locale.getDisplayCountry();
    return language + (!country.equals("") ? " (" + country + ")" : "");
  }
  
  public static void main(String[] args)
    throws Exception
  {
    String baseFile = "ResourceBundle";
    
    System.out.println();
    System.out.println("Base file name: " + '"' + baseFile + '"');
    System.out.println();
    
    for (int i = 0; i < LOCALES.length; i++)
    {
      Locale locale = LOCALES[i];
      System.out.println("Locale: " + toString(locale));
      
      XMLResourceBundle bundle =
        new XMLResourceBundle(baseFile);
      bundle.loadFile(locale);
    
      System.out.println("cut=" +
        '"' + bundle.getText("cut") + '"');
      System.out.println("copy=" +
        '"' + bundle.getText("copy") + '"');
      System.out.println("paste=" +
        '"' + bundle.getText("paste") + '"');
      System.out.println("file=" +
        '"' + bundle.getFile("file") + '"');
      System.out.println();
    }
  }
}