With the communication of data in structured formats, most notably XML, there is a need to assemble elementary transformations (such as XSLT stylesheets) in order to achieve complex and rationalized transformation tasks. Several tools have recently arisen for this purpose. They have a lot in common despite some differences in objectives. The following tables introduce them through the salient features in which their expression of transformations diverges.
\ | file | flow |
task oriented | Ant | Transmorpher |
data oriented | Xweb, Lagoon | Cocoon |
\ | push | pull |
static | Transmorpher, Xweb | Ant |
dynamic | Lagoon | Cocoon |
In the following, we make the distinction between a sitemap, which describes what files/output can be obtained, and a stylesheet, which describes how to perform a particular transformation. Of course, both are often mixed, since what is described is often how to obtain a given file.
There exist other systems for this task, such as Xpipes (http://xpipe.sourceforge.net) or Ux
Lagoon (http://lagoon.sourceforge.net/docs/userguide.html, previously XotW [staldal2000a]) is aimed at generating web sites offline. The site is described by a sitemap listing all the files in the web site. Lagoon distinguishes between six types of components: format (from XML to bytes), transform (from XML to XML), source (a generator of XML), read (a generator of bytes), parse (from bytes to XML) and process (from bytes to bytes). Unlike many of the other tools (and like XWeb), Lagoon mixes byte and XML streams. Lagoon has Make-like facilities for recomputing files only when necessary, and emphasis is put on this evolved form of caching. Like Make, Lagoon mixes sitemap and stylesheet for all the files. The transformations are rules for providing a file. They are expressed in a functional way (each producer, except the split transform, can only provide one file). This prohibits the reuse of transformation flows that is possible with tasks.
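The Make-like recomputation rule mentioned above can be illustrated by a minimal sketch. It is not Lagoon's code: the class StalenessCheck and the method needsRebuild are hypothetical names introduced here, and the sketch assumes that file modification times are the only criterion, which is the classic Make behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not part of Lagoon: decides whether a target must be rebuilt. */
final class StalenessCheck {
    private StalenessCheck() {}

    /**
     * Returns true when the target is missing or older than at least one source,
     * which is the classic Make rule for triggering a rebuild.
     */
    static boolean needsRebuild(Path target, List<Path> sources) throws IOException {
        if (!Files.exists(target)) {
            return true; // nothing has been produced yet
        }
        long targetTime = Files.getLastModifiedTime(target).toMillis();
        for (Path source : sources) {
            if (Files.getLastModifiedTime(source).toMillis() > targetTime) {
                return true; // a source changed after the target was written
            }
        }
        return false; // the target is at least as recent as every source
    }
}
```

A sitemap processor built on such a rule would call needsRebuild for each declared file and skip the producers whose result is still up to date.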
Cocoon (http://xml.apache.org/cocoon, [mclaughlin2000a]) is another stylesheet composition system written in Java and integrated with Servlet servers. It computes web sites online. Advantages of Cocoon include document caching and explicit declaration of transformations ("sitemap"). Cocoon is based on a three-step site publication model (creation, content processing and rendering). This provides a clear methodology for developing sites but confines the system to a particular type of processing. The caching mechanism of Cocoon is tied to that methodology, since caching is enabled only at these steps.
Ant (http://jakarta.apache.org/ant/) is a substitute for the famous Make using XML and Java. It is thus a program configurator and updater. Its goal is not XML processing, but it shares features with the systems presented here: a simple processing model and an easily customizable philosophy. Ant is task and file oriented.
XWeb (http://meganesia.int.gu.edu.au/~pbecker/xweb/manual.html) aims at generating web sites offline. Its current approach is file-based, with input/output handled implicitly. It thus implements pipelines. Processes handle either XML or binary information.
A new processing model for XWeb is available (http://meganesia.int.gu.edu.au/~pbecker/xweb/processingModel.html).
Transmorpher (http://transmorpher.inrialpes.fr, [euzenat2001a]) is an environment for defining and processing complex transformation flows. It targets transformation engineering, not especially web site generation. It enables the description of complex data flows combining other flows with basic transformations.
Transmorpher allows one to:
The first exercise in order to compare languages consists in comparing the terms used for describing them. The first table considers the basic concepts manipulated by the processing models (at least those which are common to these systems, they have other concepts not considered here).
\ | ant | cocoon | lagoon | xweb (*) | xpipe | transmorpher |
stylesheet | makefile | sitemap | sitemap | website | stylesheet | |
I/O | file | - (implicit) | file | stream | - (implicit) | channel |
context | properties (?) | context | ||||
parameters | parameters | configuration | parameters | |||
base components | built-in tasks | producers | process | Xcomponent | built-in transformations | |
tasks | task | pipeline/action-set | xsl nesting/macro | Xpipe/Xrigs | process | |
selection | matching | selector | - | - | ||
iteration | - (implicit on directories) | - (implicit on directories) | - (implicit on directories) | - (implicit on directories) | - | iterator |
These systems are provided with various basic components depending on their destinations. The second table deals with the basic components that are provided for implementing the actual processing (not the organization of this processing). Ant is not really relevant for this comparison (it has different extensions).
\ | cocoon | lagoon | xweb | transmorpher |
generating output | serializer/generator | format/consumer | serializer | |
getting input | reader | source/read | generator | |
applying an external program | action | transform/parse/process | programCall | external |
applying a custom program | - | ruleset | ||
generating dynamic pages | JSP/XSP | LSP | - | - |
applying an XSLT stylesheet | transformer (xslt) | transform (xslt) | xsl | external (xslt) |
applying a query | transformer (sql) | - | query | |
aggregating results | aggregator | merge | ||
splitting results | matcher (?) | transform (split) | dispatch |
One can layer these systems in the following way (adapted from Peter Becker):
It appears that all these systems have comparable needs to plug in coded transformation and data manipulation procedures. One great benefit would come from sharing plug-in definitions, so that as soon as a plug-in is written for one system it is available for the others (all these systems require the same tools: XSLT engines, special-purpose formatters, various parsers and serializers, etc.).
This is not obvious to do in JAXP [armstrong2001a] which has been especially designed for XSLT transformations (XML stream, template interface, only one input and one output).
Sharing plug-in definitions can take advantage of the terminological comparison above, because it helps define the rock-bottom categories that require a special interface. For instance, Lagoon distinguishes transformations on the basis of the "encoding" of input/output (i.e. whether it is XML or just bytes), whereas Transmorpher distinguishes them on the number of inputs/outputs. As a consequence, Transmorpher distinguishes dispatch from transformation and Lagoon distinguishes format from transform.
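The two classification criteria can be contrasted in a small sketch. The Lagoon side follows the six component types listed earlier. The Transmorpher side is an assumption: the arities attached to generator, serializer, dispatch and merge are inferred from their names and from the tables above, not from Transmorpher's documentation. The class PluginCategories and its members are hypothetical and belong to neither system.

```java
/** Illustrative only: neither Lagoon nor Transmorpher exposes these types. */
final class PluginCategories {
    private PluginCategories() {}

    /** NONE stands for "no input" (a generator of data). */
    enum Encoding { XML, BYTES, NONE }

    /** Lagoon-style category: determined by the encoding of input and output. */
    static String byEncoding(Encoding in, Encoding out) {
        if (in == Encoding.XML && out == Encoding.BYTES) return "format";
        if (in == Encoding.XML && out == Encoding.XML) return "transform";
        if (in == Encoding.NONE && out == Encoding.XML) return "source";
        if (in == Encoding.NONE && out == Encoding.BYTES) return "read";
        if (in == Encoding.BYTES && out == Encoding.XML) return "parse";
        if (in == Encoding.BYTES && out == Encoding.BYTES) return "process";
        throw new IllegalArgumentException("no category for " + in + " -> " + out);
    }

    /** Transmorpher-style category (assumed arities): determined by the number of ports. */
    static String byArity(int inputs, int outputs) {
        if (inputs == 0 && outputs >= 1) return "generator";
        if (inputs >= 1 && outputs == 0) return "serializer";
        if (inputs == 1 && outputs == 1) return "transformation";
        if (inputs == 1 && outputs > 1) return "dispatch";
        if (inputs > 1 && outputs == 1) return "merge";
        throw new IllegalArgumentException("no category for " + inputs + " -> " + outputs);
    }
}
```

A shared plug-in interface would have to carry both pieces of information, since an XSLT engine is a "transform" under the first criterion and a "transformation" under the second, while a splitter is still a "transform" for Lagoon but a "dispatch" for Transmorpher.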
The expected interface for XWeb is made of a Registry with the method:
    public static void register(net.sourceforge.xweb.processors.ProcessorFactory factory)
a Factory with the methods (?):
    public static String getProcessName()
    public static String getProcessNamespace()
and Processors with the methods:
    public static net.sourceforge.processors.Processor getProcessor(??? configuration)
    public List getInputs()
    public List getOutputs()
    public void connectOutput(Processor other, Input input)
    protected void input(Input input, BinaryData data)
    protected void input(Input input, XMLData data)
    public void run()
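To make the registry/factory/processor division more concrete, here is a minimal sketch of how such a registration scheme could work. It is not XWeb's code: SketchProcessor, SketchProcessorFactory and SketchRegistry are hypothetical names, the factory methods are instance methods rather than static ones, the configuration is assumed to be a plain map, and the "{namespace}name" key is our own choice.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a processor; the XWeb Processor listed above has more methods. */
interface SketchProcessor {
    void run();
}

/** Hypothetical factory: identifies a process by namespace and name, and builds it. */
interface SketchProcessorFactory {
    String getProcessNamespace();
    String getProcessName();
    SketchProcessor getProcessor(Map<String, String> configuration);
}

/** Hypothetical registry keyed by "{namespace}name". */
final class SketchRegistry {
    private static final Map<String, SketchProcessorFactory> FACTORIES = new HashMap<>();

    private SketchRegistry() {}

    /** Makes a factory available under its namespace-qualified process name. */
    static void register(SketchProcessorFactory factory) {
        FACTORIES.put(key(factory.getProcessNamespace(), factory.getProcessName()), factory);
    }

    /** Looks up the factory for a qualified name and asks it for a configured processor. */
    static SketchProcessor create(String namespace, String name, Map<String, String> configuration) {
        SketchProcessorFactory factory = FACTORIES.get(key(namespace, name));
        if (factory == null) {
            throw new IllegalArgumentException("unknown process " + key(namespace, name));
        }
        return factory.getProcessor(configuration);
    }

    private static String key(String namespace, String name) {
        return "{" + namespace + "}" + name;
    }
}
```

With such a scheme, a plug-in only has to register its factory once; the system then resolves the qualified element name found in the site description to a processor instance.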
The (undocumented) interface for Transmorpher is made of a Factory which holds a correspondence table between Process types (xslt, broadcast, concat, etc.) and the actual Java class names. These correspondences will be given to Transmorpher in the near future through a defextern tag similar to Cocoon declarators. There also exists an undocumented property file which allows a default implementation to be chosen for a particular Process category. However, we prefer to be able to mix implementations within the same stylesheet, so we do not develop this possibility. The Factory provides:
    public void initFactory()
    public static final TProcess newProcess(String type, Object[] params)
    public final TProcess newGenerator(String[] pOut, String type, Parameters pParam, StringParameters staticAttributes)
    public static final TProcess newSerializer(String[] pIn, String type, Parameters pParam, StringParameters staticAttributes)
    public static final TProcess newDispatcher(String[] pIn, String[] pOut, String type, Parameters pParam)
    public static final TProcess newConnector(String[] pIn, String[] pOut, String type, Parameters pParam)
    public static final TProcess newExternal(String[] pIn, String[] pOut, String type, Parameters pParam)
    public static final TProcess newApplyQuery(String[] pIn, String[] pOut, String type, Parameters pParam, StringParameters staticAttributes)
Important features here are: I/O and parameters are given at creation time. All processes are typed (this helps the factory to know what parameters are needed). As a consequence, Transmorpher provides several types of Process interfaces. Their most important method is the constructor, which deals with all the parameters. The TProcess interface is very simple (not everything is useful):
Besides SAX plug-in types, it might be useful to consider:
In Cocoon, plug-ins are declared within the sitemap inside the components tag. They are typed as generators, transformers, serializers, readers, selectors, matchers and actions. They are identified by a name and by the name of the class which implements them. Additional parameters can be passed on to the component. For instance:
The interface for, e.g., a Serializer is the following:
    public interface XMLConsumer extends ContentHandler, LexicalHandler {}

    public interface SitemapOutputComponent extends Component {
        /**
         * Set the OutputStream where the requested resource should
         * be serialized.
         */
        void setOutputStream(OutputStream out) throws IOException;

        /**
         * Get the mime-type of the output of this Component.
         */
        String getMimeType();

        /**
         * Test if the component wants to set the content length
         */
        boolean shouldSetContentLength();
    }

    public interface Serializer extends XMLConsumer, SitemapOutputComponent {
        String ROLE = "org.apache.cocoon.serialization.Serializer";
    }
The interface is not yet documented.
It would be nice to be able to call some of these systems from the outside (e.g. XWeb calling Lagoon, or Cocoon calling Transmorpher, at the processing tool level). In fact, if a built-in interface can be normalized, these systems could interoperate through the "external interface" of each of them.
There exists one such interface in JAXP [armstrong2001a]: the transformer interface. Unfortunately, it only allows one input and one output to a transformer. This is too specific and calls for something else.
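The limitation is visible in the shape of the JAXP call itself. The following sketch uses only the standard javax.xml.transform API with an identity transformer; the class JaxpSingleChannel and its copy method are names introduced here for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustration of the JAXP call shape: exactly one Source and one Result. */
final class JaxpSingleChannel {
    private JaxpSingleChannel() {}

    /** Copies an XML document through an identity transformer and returns the output. */
    static String copy(String xml) throws TransformerException {
        // newTransformer() without a stylesheet yields an identity transformation.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // transform(Source, Result) has no provision for further inputs or outputs.
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A dispatcher (one input, several outputs) or a merger (several inputs, one output) cannot be expressed directly as a single call of this form, which is why a more general interface is called for.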
One of the advantages of sharing the call interface would be the ability to call a particular system as a plug-in, in addition to calling it as a processing tool (e.g., Cocoon could delegate only a few of its basic tasks to Lagoon). This requires these systems to be reentrant.
Generators:
Transformer:
Serializer: