ORIGINAL DRAFT

It’s common when operating on a group of files to use a definition that captures the file selection criteria. In applications like backup, mirroring or synchronization software, for example, the notion of a file set is used to define paths and files to include or exclude in a set. Specific files or directories can be specified, or expressions can be applied to distinguish between files using more complex criteria.

This article centers on using XML to store and exchange file set definitions. A file set is a collection of file specifications, which may include or exclude groups of files. A file set can be applied to a file system, resolving to an explicit set of files. To maximize flexibility, our design will support a callback mechanism you can use to process the individual files a set resolves to, as you see fit.

Let’s take a look at our basic design. We want to be able to specify files in various ways. A file has a set of externally visible characteristics, such as its path, file name, last modified date and size, as well as a few boolean attributes like readability and writability. These attributes are explicit.

There are implicit characteristics we want to consider as well. A file name can be matched using wildcard expressions, for example. We may want to recurse through subdirectories in a path definition. And we may want to exclude files from a set as much as define those that should be included, enabling us to express concepts like ‘all html files in a directory hierarchy, except those that are marked as hidden’.

We’ll develop a set of classes that clearly separate responsibility for processing file sets. Our file specification interface looks like this:

public interface FileSpec
  extends FilenameFilter
{
  public String getPath();
  public void setPath(String path);
  public void setFile(Match match);
  public void setInclude(boolean include);
  public void setRecurse(boolean recurse);
  public void setDateRange(ComparableRange range);
  public void setSizeRange(ComparableRange range);
  public void setIsHidden(boolean isHidden);
  public void setCanWrite(boolean canWrite);
  public void setCanRead(boolean canRead);
  public boolean isParent(String path);
  public boolean isRecurse();
  public boolean isInclude();	
  public boolean match(File file);
}

Using this interface, we can specify a path, file name (using a matching algorithm), whether we are including or excluding a file, and whether we recurse into subdirectories. We support date and size ranges, as well as attributes named canRead, canWrite, isHidden. Only the file name, path, include and recurse specifications are required. The others are optional.

We define a set as a group of file specifications using an interface as well:

public interface FileSet
  extends FilenameFilter
{
  public void addSpec(FileSpec file);
  public void removeSpec(FileSpec file);
  public int getSize();
  public FileSpec getSpecAt(int index);
  public boolean match(File file);
}

You’ll notice that we extend the FilenameFilter interface, which provides a file-matching interface used in the Java File class. The rest of the methods allow you to add, remove, count or access individual file specifications. I’ve exposed a common match method that takes a File argument in case you want to test for set inclusion in different contexts. The FilenameFilter interface uses an accept method with two arguments, one for the path and another for the file name.
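
To make the two-argument accept method concrete, here is a minimal sketch that is separate from the project classes. The HtmlFilterDemo class, its HTML_ONLY filter and the scratch directory are hypothetical names used only for illustration. The sketch shows how File.list hands each candidate to the filter as a directory plus a file name.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

class HtmlFilterDemo
{
  // Hypothetical filter: accepts any name ending in ".html"
  static final FilenameFilter HTML_ONLY = new FilenameFilter()
  {
    public boolean accept(File dir, String name)
    {
      return name.endsWith(".html");
    }
  };

  // Creates empty files in a scratch directory, lists the names the
  // filter accepts (sorted), then removes the scratch files again
  static String[] listMatching(String[] names)
  {
    try
    {
      File dir = Files.createTempDirectory("fileset-demo").toFile();
      for (String name : names) new File(dir, name).createNewFile();
      String[] found = dir.list(HTML_ONLY);
      Arrays.sort(found);
      for (String name : names) new File(dir, name).delete();
      dir.delete();
      return found;
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args)
  {
    String[] found = listMatching(
      new String[] { "index.html", "notes.txt", "about.html" });
    System.out.println(Arrays.toString(found));
  }
}
```

Because FileSpec and FileSet both extend FilenameFilter, an instance of either can be passed to File.list or File.listFiles in the same way.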

Figure 1 shows the classes we’ll develop in this project.

Figure 1: File Set management classes.

You’ll notice that we provide both a Basic and an XML implementation of the FileSet and FileSpec interfaces. The reason for this is that XML represents a data format and there is no reason XML should be directly associated with the behavioral implementation. As such, the BasicFileSet and BasicFileSpec classes actually implement the interfaces, while the XMLFileSet and XMLFileSpec classes subclass them to add the ability to read from and write to an XML representation.

The FileProcessor class resolves a file set, sending each matching File object to a callback interface (FileProcessor.Callback in Listing 3). The Match interface provides support for multiple matching algorithms, of which we provide only one, the Wildcard inner class. We won’t have the space to cover the matching classes in this article, but you can find additional matching algorithms in a JavaPro article I wrote entitled "Finding a Perfect Match", published in July 2000.

We won’t have enough room to explain each of the classes in detail, but you can find them all online at www.xmlmag.com. We’ll provide a high-level explanation for some of the classes and take a closer look at the ones that play key roles in this project.

The ComparableRange class is used to determine if an object falls within a given range. This is useful for both date and size definitions, in our case, but the solution is generic enough to be usable in other contexts. We rely on the Comparable interface, which the Collections API also uses for ordering, to compare a value against the from and to boundaries of the range.
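
The ComparableRange source is not reproduced in this article, so the following is only a sketch of how such a range test can be built on Comparable. The RangeSketch name, the use of generics and the choice of inclusive bounds are all assumptions made for illustration. Only the constructor shape and the inRange, getFrom and getTo names mirror the calls you see in the listings.

```java
import java.util.Date;

// Sketch only: a range test built on Comparable.
// Inclusive bounds are an assumption, not taken from the original class.
class RangeSketch<T extends Comparable<T>>
{
  private final T from;
  private final T to;

  RangeSketch(T from, T to)
  {
    this.from = from;
    this.to = to;
  }

  T getFrom() { return from; }
  T getTo() { return to; }

  // True when from <= value <= to according to compareTo
  boolean inRange(T value)
  {
    return from.compareTo(value) <= 0 && value.compareTo(to) <= 0;
  }

  public static void main(String[] args)
  {
    RangeSketch<Long> size = new RangeSketch<Long>(1024L, 4096L);
    System.out.println(size.inRange(2048L)); // true
    System.out.println(size.inRange(8192L)); // false

    RangeSketch<Date> date =
      new RangeSketch<Date>(new Date(0L), new Date(1000L));
    System.out.println(date.inRange(new Date(500L))); // true
  }
}
```

The same class serves both the date and size cases because Date and Long each implement Comparable.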

The BasicFileSet class is pretty straightforward as well. It manages the content of a List using a Java ArrayList instance. FileSpec instances can be added and removed from the list and we can get the size of the list or a specific element by its index value.

The match method uses a utility method by the same name that walks the list of file specifications which are either include or exclude specs. We first check for a matching inclusion spec and return false if we find none. Then we check for exclusion before determining whether the file we’re testing is a match.
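
The include-then-exclude order described above can be sketched as follows. This is not the BasicFileSet source. The Rule class is a hypothetical stand-in for a file specification that matches on a name suffix, and sampleRules is just a helper for the example.

```java
import java.util.ArrayList;
import java.util.List;

class IncludeExcludeSketch
{
  // Hypothetical stand-in for a file specification
  static class Rule
  {
    final String suffix;
    final boolean include;

    Rule(String suffix, boolean include)
    {
      this.suffix = suffix;
      this.include = include;
    }

    boolean matches(String name)
    {
      return name.endsWith(suffix);
    }
  }

  static boolean match(List<Rule> rules, String name)
  {
    // Step 1: at least one include rule must match
    boolean included = false;
    for (Rule rule : rules)
    {
      if (rule.include && rule.matches(name)) included = true;
    }
    if (!included) return false;

    // Step 2: any matching exclude rule vetoes the file
    for (Rule rule : rules)
    {
      if (!rule.include && rule.matches(name)) return false;
    }
    return true;
  }

  // All .html files, except those whose names end in -draft.html
  static List<Rule> sampleRules()
  {
    List<Rule> rules = new ArrayList<Rule>();
    rules.add(new Rule(".html", true));
    rules.add(new Rule("-draft.html", false));
    return rules;
  }

  public static void main(String[] args)
  {
    List<Rule> rules = sampleRules();
    System.out.println(match(rules, "index.html"));       // true
    System.out.println(match(rules, "index-draft.html")); // false
    System.out.println(match(rules, "notes.txt"));        // false
  }
}
```

With this ordering an exclusion always wins over an inclusion, regardless of the order in which the specifications were added to the set.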

BasicFileSet also implements the FilenameFilter accept method, which uses the match method for files but always returns true for directories. We do this because we want traversable paths to be included. As you’ll see later, the FileProcessor class preprocesses directory names in the FileSet to make sure paths are traversed only once.

The XMLFileSet class extends BasicFileSet. Since the fundamental FileSet behavior is implemented in BasicFileSet, all we really need to do is to implement the ability to read and write XML representations of a FileSet. We do this by implementing toXML and fromXML methods.

These methods are both simple list traversals which walk the FileSpec entries to write or create new instances from an XML Element when reading. The same approach is taken in the XMLFileSpec class, which we’ll take a closer look at momentarily. Additional save and load methods are provided to save or load an XML file in a suitable format.

First, let’s take a quick look at the BasicFileSpec implementation in Listing 1. It provides a constructor that expects a path, a file name, and include and recurse flags. The path should be in a format specific to the file system you are going to apply the set to or, better yet, in a platform-neutral, forward slash-delimited format.

The file name can contain standard wildcard characters. Our wildcard-matching algorithm supports the star character to specify any number of characters and the question mark to specify a single character. All other characters are literals that may be matched against.
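
Since the Match.Wildcard class is not covered here, the following is a minimal sketch of one common way to implement star and question mark matching, using a single backtracking point for the most recent star. The WildcardSketch name is hypothetical, and this is not necessarily the algorithm the project class uses. Matching is case-sensitive in this sketch, which is also an assumption.

```java
class WildcardSketch
{
  static boolean matches(String pattern, String text)
  {
    int p = 0, t = 0;
    int starP = -1; // position of the most recent '*' in pattern
    int starT = -1; // text position where that '*' began matching

    while (t < text.length())
    {
      if (p < pattern.length() && pattern.charAt(p) == '*')
      {
        // Remember the star and first try matching zero characters
        starP = p++;
        starT = t;
      }
      else if (p < pattern.length()
        && (pattern.charAt(p) == '?'
          || pattern.charAt(p) == text.charAt(t)))
      {
        // Single character matches: advance both
        p++;
        t++;
      }
      else if (starP >= 0)
      {
        // Mismatch: let the last star absorb one more character
        p = starP + 1;
        t = ++starT;
      }
      else return false;
    }

    // Only trailing stars may remain in the pattern
    while (p < pattern.length() && pattern.charAt(p) == '*') p++;
    return p == pattern.length();
  }

  public static void main(String[] args)
  {
    System.out.println(matches("*.html", "index.html"));     // true
    System.out.println(matches("file?.txt", "file1.txt"));   // true
    System.out.println(matches("file?.txt", "file10.txt"));  // false
  }
}
```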

The include value is true if you want to specify that a matching file should be included. If this value is false, any matching file will be explicitly excluded. The recurse flag determines whether matching files in subdirectories should be considered a match. If this value is false, only the explicitly specified path is considered.

We provide a number of accessors, as specified in the FileSpec interface, as well as match methods and the FilenameFilter interface accept method. Because we want to support optional specifications, such as the date range or boolean state values, like canRead and canWrite, we stay away from primitive types internally. If an instance variable is set to null, we can assume the attribute can be ignored in our match test.

The match method does most of the work in this class, testing for each attribute in a set of conditional statements. Rather than testing for truth, we test for falsity and return a boolean false for any non-matching, relevant attribute. This allows us to test each attribute in turn, returning as soon as we hit a mismatch. If we make it to the end of the method, we definitely have a match and we can return true.
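
Stripped of the file-specific details, the pattern looks like this. The sketch below is a reduced illustration rather than the Listing 1 code. The OptionalAttributeSketch class and its parameter names are hypothetical. A null criterion means "not specified" and is skipped, and the first relevant mismatch ends the test.

```java
class OptionalAttributeSketch
{
  // wantReadable and wantHidden are optional criteria (null = ignore);
  // readable and hidden are the actual state of the file being tested
  static boolean matches(Boolean wantReadable, Boolean wantHidden,
    boolean readable, boolean hidden)
  {
    if (wantReadable != null
      && wantReadable.booleanValue() != readable)
      return false;
    if (wantHidden != null
      && wantHidden.booleanValue() != hidden)
      return false;
    // No relevant criterion failed, so we have a match
    return true;
  }

  public static void main(String[] args)
  {
    // Nothing specified: everything matches
    System.out.println(matches(null, null, false, true));         // true
    // Must be readable, but the file is not
    System.out.println(matches(Boolean.TRUE, null, false, false)); // false
  }
}
```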

The only other methods of note are the two normalizePath methods. These methods replace the platform-specific separator character with a forward slash to avoid platform-specific problems. By doing this, we can be sure that relative paths will function correctly on any platform, regardless of how they were stored. Paths which include drive letters in Windows will, of course, not be portable.
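
Here is a small sketch of the same normalization, written so that the result does not depend on the platform you run it on. The PathNormalizeSketch class is hypothetical, and passing the separator as an argument is an assumption made for the example. Listing 1 uses File.separatorChar instead.

```java
class PathNormalizeSketch
{
  // Replaces the given separator with '/' and ensures a trailing '/'
  static String normalize(String path, char separator)
  {
    String text = path.replace(separator, '/');
    if (!text.endsWith("/")) text += '/';
    return text;
  }

  public static void main(String[] args)
  {
    // A Windows-style relative path
    System.out.println(normalize("docs\\site", '\\')); // docs/site/
    // Already normalized paths are left alone
    System.out.println(normalize("docs/site/", '/'));  // docs/site/

    // The trailing slash keeps prefix tests honest: "docs/sitemap/"
    // is not inside "docs/site/", even though the raw names overlap
    String parent = normalize("docs/site", '/');
    System.out.println("docs/sitemap/".startsWith(parent)); // false
  }
}
```

The trailing slash matters because the path tests in this project rely on startsWith comparisons between normalized paths.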

XMLFileSpec extends BasicFileSpec and adds the ability to read and write a specification as an XML Element. We use the JDOM interface to manage elements more easily in Java. You can see this code in Listing 2. You’ll notice the constructor just passes the same arguments to the parent class, but we implement two methods, toXML and fromXML, to support XML translation. These methods are used by the XMLFileSet class to manage individual file specifications.

There are a number of supporting methods in this class, each of which translates a ComparableRange either to or from XML. This is the way we handle date and size ranges if they are present. These are stored as child elements within the FILE element that represents the specification. In each case, we are managing a translation to or from an Element object.
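
To give you a feel for the resulting document, here is a sketch of what a stored file set might look like and how it can be read back. The FILE and SIZE element names and the attribute names follow Listing 2. The FILESET root element name is an assumption, since the XMLFileSet source is not shown here, and the sketch uses the DOM parser that ships with the JDK rather than JDOM so that it runs without extra libraries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class FileSetXmlSketch
{
  // Hypothetical document: include html files of up to 64K under
  // docs/, but exclude any that are hidden
  static final String SAMPLE =
    "<FILESET>"
    + "<FILE path=\"docs/\" file=\"*.html\""
    + " include=\"true\" recurse=\"true\">"
    + "<SIZE from=\"0\" to=\"65536\"/>"
    + "</FILE>"
    + "<FILE path=\"docs/\" file=\"*.html\""
    + " include=\"false\" recurse=\"true\" hidden=\"true\"/>"
    + "</FILESET>";

  // Returns one line of text per FILE element
  static String describe(String xml)
  {
    try
    {
      Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(
          xml.getBytes(StandardCharsets.UTF_8)));
      NodeList files =
        doc.getDocumentElement().getElementsByTagName("FILE");
      StringBuilder out = new StringBuilder();
      for (int i = 0; i < files.getLength(); i++)
      {
        Element file = (Element) files.item(i);
        boolean include =
          Boolean.parseBoolean(file.getAttribute("include"));
        out.append(include ? "include " : "exclude ");
        out.append(file.getAttribute("path"));
        out.append(file.getAttribute("file"));

        // Optional size range stored as a child element
        NodeList sizes = file.getElementsByTagName("SIZE");
        if (sizes.getLength() > 0)
        {
          Element size = (Element) sizes.item(0);
          long from = Long.parseLong(size.getAttribute("from"));
          long to = Long.parseLong(size.getAttribute("to"));
          out.append(" size=").append(from).append("..").append(to);
        }

        // Optional boolean attribute
        if (file.hasAttribute("hidden"))
          out.append(" hidden=").append(file.getAttribute("hidden"));
        out.append('\n');
      }
      return out.toString();
    }
    catch (Exception e)
    {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args)
  {
    System.out.print(describe(SAMPLE));
  }
}
```

Optional values are simply absent from the document when they are not set, which mirrors the null handling in the file specification class.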

The last piece of code we’ll take a look at is the FileProcessor class in Listing 3. To make it easy to process more than one set, we provide a setFileSet method, which defines the set we will operate on.

The setCallback method specifies a callback instance to apply while processing matching files. You’ll notice a couple of static nested types. One is the Callback interface and the other is the DefaultCallback class, which is used by default. The default implementation merely prints out matching file names, but you can create your own that operates in more creative ways.
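
As an illustration of a custom callback, the sketch below collects file names rather than printing them. To keep it self-contained, it declares its own Callback interface with the same shape as the one in Listing 3. The CallbackSketch and CollectingCallback names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class CallbackSketch
{
  // Same shape as FileProcessor.Callback in Listing 3
  interface Callback
  {
    void process(File file);
  }

  // Hypothetical callback that remembers each file name it is given
  static class CollectingCallback implements Callback
  {
    final List<String> names = new ArrayList<String>();

    public void process(File file)
    {
      names.add(file.getName());
    }
  }

  public static void main(String[] args)
  {
    CollectingCallback callback = new CollectingCallback();
    // A processor would make these calls for each matching file
    callback.process(new File("docs/index.html"));
    callback.process(new File("docs/about.html"));
    System.out.println(callback.names); // [index.html, about.html]
  }
}
```

With the real classes, you would implement FileProcessor.Callback in the same way and hand an instance to setCallback before calling traverse.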

Three methods help ensure that we only traverse directories once. The getRootPathList method returns a list of root directories. A directory is considered a root if it is the shortest parent path in a group of file definitions. The checkContainsParent method adds a path to the list if it is not the child of any other path already in the list. The checkIfBetterParent method handles the opposite case: if the path we are checking is the parent of a path already in the list, the parent takes the child’s place.
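
The idea behind the root list can be sketched in a single method that works on plain strings. This is an illustration of the reduction, not the Listing 3 code. The RootPathSketch class is hypothetical, and it assumes every path has already been normalized to end in a forward slash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RootPathSketch
{
  // Reduces a list of normalized paths to the shortest parents
  static List<String> roots(List<String> paths)
  {
    List<String> roots = new ArrayList<String>();
    for (String path : paths)
    {
      // Skip the path if a parent (or the path itself) is listed
      boolean covered = false;
      for (String root : roots)
      {
        if (path.startsWith(root))
        {
          covered = true;
          break;
        }
      }
      if (covered) continue;

      // Drop any listed children of this path, walking backwards
      // so removals do not shift entries we have yet to visit
      for (int i = roots.size() - 1; i >= 0; i--)
      {
        if (roots.get(i).startsWith(path)) roots.remove(i);
      }
      roots.add(path);
    }
    return roots;
  }

  static List<String> sample()
  {
    return roots(Arrays.asList(
      "docs/api/", "docs/guide/", "docs/", "src/"));
  }

  public static void main(String[] args)
  {
    System.out.println(sample()); // [docs/, src/]
  }
}
```

Both docs/api/ and docs/guide/ collapse into docs/, so a recursive walk from docs/ visits each of them exactly once.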

The traverse method actually walks the directory structure for each root path in the file set. A secondary traverse method, which expects a File argument, actually recurses through the directory tree. The call to listFiles uses the set’s accept method, which implements the FilenameFilter interface, to list only the relevant files and any directory that was found. Files are processed by calling the process method in the Callback interface and subdirectories are processed by recursively calling the traverse method.
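
The recursion can be tried out on its own with the following sketch. It is not the FileProcessor code. The TraverseSketch class, its FILTER and the scratch directory layout are hypothetical. The filter accepts every directory so the walk can descend, and otherwise accepts only names ending in .html.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TraverseSketch
{
  // Directories always pass; files pass only if they end in ".html"
  static final FilenameFilter FILTER = new FilenameFilter()
  {
    public boolean accept(File dir, String name)
    {
      return new File(dir, name).isDirectory()
        || name.endsWith(".html");
    }
  };

  // Adds the name of each accepted file under dir to the found list
  static void traverse(File dir, List<String> found)
  {
    File[] children = dir.listFiles(FILTER);
    if (children == null) return; // not a readable directory
    for (File child : children)
    {
      if (child.isFile()) found.add(child.getName());
      else traverse(child, found);
    }
  }

  // Builds a small scratch tree, walks it, and cleans up afterwards
  static List<String> demo()
  {
    try
    {
      File root = Files.createTempDirectory("traverse-demo").toFile();
      File sub = new File(root, "sub");
      sub.mkdir();
      File[] files = {
        new File(root, "index.html"),
        new File(root, "notes.txt"),
        new File(sub, "page.html")
      };
      for (File file : files) file.createNewFile();

      List<String> found = new ArrayList<String>();
      traverse(root, found);
      Collections.sort(found);

      for (File file : files) file.delete();
      sub.delete();
      root.delete();
      return found;
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println(demo()); // [index.html, page.html]
  }
}
```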

When you download the code, you’ll find a FileProcessorTest class that shows how you can use the FileProcessor class. Because I can’t really predict what your file system might look like, you’ll have to change the folder variable assignment to point to your JRE directory. When you run the test, a new set will be created and saved to an XML file, and then loaded and processed. By saving and reloading we can be certain the XML input and output code is fully functional.

As you can see, XML is a useful tool for storing file set definitions. By approaching file sets this way, you can define complex file groups for processing and apply them at appropriate times. You may choose to replicate files, translate them or remove them on a periodic basis. By writing your own Callback implementation, you can do pretty much anything you like with matching files.

Listing 1

import java.io.*;
import java.util.*;

public class BasicFileSpec
  implements FileSpec
{
  protected Match match;
  protected String path, file;
  protected boolean recurse, include;
  protected ComparableRange date, size;
  protected Boolean canRead, canWrite, isHidden;

  public BasicFileSpec(String path, String file,
    boolean include, boolean recurse)
  {
    this.file = file;
    setFile(new Match.Wildcard(file));
    setPath(path);
    setInclude(include);
    setRecurse(recurse);
  }
  
  public void setFile(Match match)
  {
    this.match = match;
  }
  
  public void setDateRange(ComparableRange range)
  {
    date = range;
  }
  
  public void setSizeRange(ComparableRange range)
  {
    size = range;
  }
  
  public void setCanRead(boolean canRead)
  {
    // Stored as an object so that null can mean "not specified"
    this.canRead = Boolean.valueOf(canRead);
  }
  
  public void setCanWrite(boolean canWrite)
  {
    this.canWrite = Boolean.valueOf(canWrite);
  }
  
  public void setIsHidden(boolean isHidden)
  {
    this.isHidden = Boolean.valueOf(isHidden);
  }
  
  public boolean accept(File path, String file)
  {
    return match(new File(path, file));
  }
  
  public boolean match(File file)
  {
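    // Path Match: compare the directory that contains the file,
    // since the path in a spec names a directory, not a file
    String parent = file.getParent();
    String textPath = normalizePath(parent == null ? "." : parent);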
    if (recurse)
    {
      if (!textPath.startsWith(path))
        return false;
    }
    else
    {
      if (!textPath.equals(path))
        return false;
    }
    
    // File Name Match
    if (match != null)
    {
      if (!match.match(file.getName()))
        return false;
    }
    
    // Date & Size Ranges
    if (date != null)
    {
      Date mod = new Date(file.lastModified());
      if (!date.inRange(mod)) return false;
    }
    if (size != null)
    {
      Long len = Long.valueOf(file.length());
      if (!size.inRange(len)) return false;
    }
    
    // Attributes
    if (canRead != null)
    {
      if (canRead.booleanValue() != file.canRead())
        return false;
    }
    if (canWrite != null)
    {
      if (canWrite.booleanValue() != file.canWrite())
        return false;
    }
    if (isHidden != null)
    {
      if (isHidden.booleanValue() != file.isHidden())
        return false;
    }
    return true;
  }

  public boolean isParent(String spec)
  {
    return path.startsWith(spec);
  }
  
  public String getPath()
  {
    return path;
  }
  
  public boolean isRecurse()
  {
    return recurse;
  }
  
  public boolean isInclude()
  {
    return include;
  }
  
  public void setPath(String path)
  {
    this.path = normalizePath(path);
  }
  
  public void setRecurse(boolean recurse)
  {
    this.recurse = recurse;
  }
  
  public void setInclude(boolean include)
  {
    this.include = include;
  }
  
  public boolean getInclude()
  {
    return include;
  }

  protected String normalizePath(File filePath)
  {
    return normalizePath(filePath.getPath());
  }

  protected String normalizePath(String textPath)
  {
    textPath = textPath.replace(File.separatorChar, '/');
    if (!textPath.endsWith("/")) textPath += '/';
    return textPath;
  }
}

Listing 2

import org.jdom.*;
import java.util.*;

public class XMLFileSpec extends BasicFileSpec
{
  public XMLFileSpec(String path, String file,
    boolean include, boolean recurse)
  {
    super(path, file, include, recurse);
  }
  
  protected Element dateRange()
  {
    if (date == null) return null;
    Element element = new Element("DATE");
    long start = ((Date)date.getFrom()).getTime();
    long end = ((Date)date.getTo()).getTime();
    element.addAttribute("from", "" + start);
    element.addAttribute("to", "" + end);
    return element;
  }
  
  protected Element sizeRange()
  {
    if (size == null) return null;
    Element element = new Element("SIZE");
    long start = ((Long)size.getFrom()).longValue();
    long end = ((Long)size.getTo()).longValue();
    element.addAttribute("from", "" + start);
    element.addAttribute("to", "" + end);
    return element;
  }
  
  protected static ComparableRange dateRange(Element element)
  {
    String value = null;
    value = element.getAttributeValue("from");
    Date from = new Date(Long.parseLong(value));
    value = element.getAttributeValue("to");
    Date to = new Date(Long.parseLong(value));
    return new ComparableRange(from, to);
  }
  
  protected static ComparableRange sizeRange(Element element)
  {
    String value = null;
    value = element.getAttributeValue("from");
    // Long.valueOf parses the attribute text as a number
    Long from = Long.valueOf(value);
    value = element.getAttributeValue("to");
    Long to = Long.valueOf(value);
    return new ComparableRange(from, to);
  }
  
  public Element toXML()
  {
    Element element = new Element("FILE");
    element.addAttribute("path", path);
    element.addAttribute("file", file);
    element.addAttribute("include",	
      include ? "true" : "false");
    element.addAttribute("recurse", 
      recurse ? "true" : "false");

    // File Attributes
    if (canRead != null)
      element.addAttribute(
        "canread", canRead.toString());
    if (canWrite != null)
      element.addAttribute(
        "canwrite", canWrite.toString());
    if (isHidden != null)
      element.addAttribute(
        "hidden", isHidden.toString());
    
    // Date & Size Ranges (written only when a range is set)
    if (date != null)
      element.addContent(dateRange());
    if (size != null)
      element.addContent(sizeRange());
    
    return element;
  }
  
  public static XMLFileSpec fromXML(Element element)
  {
    String path = element.getAttributeValue("path");
    String file = element.getAttributeValue("file");
    boolean include = Boolean.valueOf(element.
      getAttributeValue("include")).booleanValue();
    boolean recurse = Boolean.valueOf(element.
      getAttributeValue("recurse")).booleanValue();
    XMLFileSpec spec = new XMLFileSpec(
      path, file, include, recurse);
    
    // File Attributes
    String value = null;
    value = element.getAttributeValue("canread");
    if (value != null) spec.setCanRead(
      Boolean.valueOf(value).booleanValue());
    value = element.getAttributeValue("canwrite");
    if (value != null) spec.setCanWrite(
      Boolean.valueOf(value).booleanValue());
    value = element.getAttributeValue("hidden");
    if (value != null) spec.setIsHidden(
      Boolean.valueOf(value).booleanValue());
    
    // Date & Size Ranges
    List children = element.getChildren();
    for (int i = 0; i < children.size(); i++)
    {
      Element child = (Element)children.get(i);
      if (child.getName().equals("DATE"))
      {
        spec.setDateRange(dateRange(child));
      }
      if (child.getName().equals("SIZE"))
      {
        spec.setSizeRange(sizeRange(child));
      }
    }
    return spec;
  }
}

Listing 3

import java.io.*;
import java.util.*;

public class FileProcessor
{
  protected FileSet set;
  protected Callback callback;

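  public void setFileSet(FileSet set)
  {
    this.set = set;
    // Keep a callback that was installed earlier; otherwise
    // fall back to the default implementation
    if (callback == null) setCallback(new DefaultCallback());
  }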
  
  public static interface Callback
  {
    public void process(File file);
  }

  public static class DefaultCallback
    implements Callback
  {
    public void process(File file)
    {
      System.out.println("File: " + file);
    }
  }

  public void setCallback(Callback callback)
  {
    this.callback = callback;
  }

  protected List getRootPathList()
  {
    int size = set.getSize();
    List list = new ArrayList();
    for (int i = 0; i < size; i++)
    {
      FileSpec spec = set.getSpecAt(i);
      // Drop collected children of this path first, then add the
      // path unless one of its parents is already in the list, so
      // that no root ends up in the list twice
      checkIfBetterParent(list, spec);
      checkContainsParent(list, spec);
    }
    return list;
  }

  // Adds the spec's path unless a parent path is already listed
  protected void checkContainsParent(
    List list, FileSpec spec)
  {
    int size = list.size();
    String name = spec.getPath();
    for (int i = 0; i < size; i++)
    {
      String item = (String)list.get(i);
      if (name.startsWith(item)) return;
    }
    list.add(name);
  }
  
  // Removes every listed path that the spec's path is a parent of
  protected void checkIfBetterParent(
    List list, FileSpec spec)
  {
    String name = spec.getPath();
    // Walk backwards so removals do not shift unvisited entries
    for (int i = list.size() - 1; i >= 0; i--)
    {
      String item = (String)list.get(i);
      if (item.startsWith(name)) list.remove(i);
    }
  }
  
  public void traverse()
  {
    List list = getRootPathList();
    for (int i = 0; i < list.size(); i++)
    {
      String path = (String)list.get(i);
      traverse(new File(path));
    }
  }
  
  protected void traverse(File path)
  {
    File[] children = path.listFiles(set);
    // listFiles returns null if path is not a readable directory
    if (children == null) return;
    for (int i = 0; i < children.length; i++)
    {
      File child = children[i];
      if (child.isFile())
      {
        callback.process(child);
      }
      else traverse(child);
    }
  }
}