Overview
Formatting objects (FO) are part of the XSL
specification, and are used to create printable output
from an XML document. Take XHTML and CSS,
which are used to present information in a web
browser. The document is usually one gigantic
page, which can be a huge pain in the ass if you
want hard copy output. FO is used to provide
this hard copy output, and theoretically you
could use it to lay out and print an entire
book. Since formatting objects are still an
immature technology, there aren't very many FO
processors available yet. The two most common and
complete are the Apache project's FOP, and XEP by
RenderX. These two processors render the FO
document directly to PDF, PS, PCL, and some
other optional formats. Other processors are
available, many of which convert the FO document
to a TeX document, which must then be processed by
TeX to get printable output.
The FO specification gives a stylesheet
author the ability to define a number of pages and
all of the properties of those pages: margins,
headers, footers, and so on. These pages can then
be arranged in a page sequence.
Say you wanted to print a report, duplexed
(printed on both sides of the paper), with a
two-inch left margin for odd pages and a two-inch
right margin for even pages (for hole punching and
putting into binders). FO gives you the power to
do this, by allowing you to define separate page
layouts for a cover page, an abstract page, a
table of contents page, and then the even and odd
pages. You can put these pages together using a
page sequence, so that you have one cover page,
one abstract page, one table of contents page, and
then alternating even and odd pages.
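Here is a rough sketch of what that layout could look like in FO. The master names, the letter page size, and the one-inch margins on the non-binding sides are my own assumptions, and I have left out the abstract and table of contents masters to keep it short (each would be one more single-page-master-reference). The attribute names follow the final XSL 1.0 Recommendation; older drafts and processor versions may spell some of them differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- One page master per kind of page. All names are illustrative. -->
    <fo:simple-page-master master-name="cover"
        page-width="8.5in" page-height="11in"
        margin-top="1in" margin-bottom="1in"
        margin-left="2in" margin-right="1in">
      <fo:region-body/>
    </fo:simple-page-master>

    <!-- Odd pages are right-hand pages: binding margin on the left. -->
    <fo:simple-page-master master-name="odd-page"
        page-width="8.5in" page-height="11in"
        margin-top="1in" margin-bottom="1in"
        margin-left="2in" margin-right="1in">
      <fo:region-body/>
    </fo:simple-page-master>

    <!-- Even pages are left-hand pages: binding margin on the right. -->
    <fo:simple-page-master master-name="even-page"
        page-width="8.5in" page-height="11in"
        margin-top="1in" margin-bottom="1in"
        margin-left="1in" margin-right="2in">
      <fo:region-body/>
    </fo:simple-page-master>

    <!-- The page sequence: one cover page, then alternating pages. -->
    <fo:page-sequence-master master-name="report">
      <fo:single-page-master-reference master-reference="cover"/>
      <fo:repeatable-page-master-alternatives>
        <fo:conditional-page-master-reference
            master-reference="odd-page" odd-or-even="odd"/>
        <fo:conditional-page-master-reference
            master-reference="even-page" odd-or-even="even"/>
      </fo:repeatable-page-master-alternatives>
    </fo:page-sequence-master>
  </fo:layout-master-set>

  <!-- Content flows into whichever page master the sequence selects. -->
  <fo:page-sequence master-reference="report">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Report title (lands on the cover page)</fo:block>
      <fo:block break-before="page">Body text starts on the next page.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

The processor picks the odd or even master from the page number, so the content itself never has to know which side of the sheet it lands on.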
Process
I'll compare the process of getting a PDF out
of an XML document with the process of getting
HTML out of the same document.
It is pretty straightforward to write an XSLT
stylesheet to transform an XML document into
HTML. In some cases, you only have to replace the
XML tags with proper HTML tags and add the header
material (<head>, <title>, etc.). Once
you have the XML document and a stylesheet to
transform it, you can run a processor on the pair
to create an HTML document. This can either be done on the
server, before a web browser looks at it, or if
you have a browser with a built-in XSLT processor,
such as IE 5.5 or Mozilla, you can simply include
a stylesheet processing instruction in the XML document
pointing to the stylesheet, and when the browser
loads the XML document, it will automatically load
and process the stylesheet as well. To
summarize:
- Generate the XML document (maybe from a
database).
- Use an XSLT stylesheet to transform the XML
into HTML.
- View the HTML in a browser.
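As a minimal sketch of the stylesheet in the second step, assume a made-up input document with a report root element holding a title and some para elements (these element names and the file name report-html.xsl are hypothetical, not from any real schema).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- report-html.xsl: hypothetical stylesheet for an input document such as

     <?xml-stylesheet type="text/xsl" href="report-html.xsl"?>
     <report>
       <title>Quarterly Sales</title>
       <para>Sales rose in the third quarter.</para>
     </report>

     The processing instruction on the first line of that input is what
     lets a browser with a built-in XSLT processor find the stylesheet. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Root template: add the header material and wrap the content. -->
  <xsl:template match="/report">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="para"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each para element maps directly onto an HTML p element. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

Most of the work really is swapping one set of tags for another; the only extra material is the head and title.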
Compare this to the process of generating a PDF
file (using FOP) from an XML document. For this,
you'll also have to create an XSLT stylesheet, but
rather than spitting out HTML, it will convert the
XML to another XML document where the tags are in
the FO namespace. This stylesheet will in general
be more complicated, but only because you have to
include a lot more header information that
describes the page formats and sequences.
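To give a feel for it, here is a sketch of such a stylesheet for a made-up input document whose report root element holds a title and some para elements. The element names, the master name, the page size, and the font and spacing values are all my own assumptions; the attribute names follow the final XSL 1.0 Recommendation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/report">
    <fo:root>
      <!-- The extra header information: page formats go here. -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-width="8.5in" page-height="11in"
            margin-top="1in" margin-bottom="1in"
            margin-left="1in" margin-right="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- The content, poured into pages built from the master above. -->
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="18pt" font-weight="bold" space-after="12pt">
            <xsl:value-of select="title"/>
          </fo:block>
          <xsl:apply-templates select="para"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Each para element becomes an FO block, much like an HTML p. -->
  <xsl:template match="para">
    <fo:block space-after="6pt"><xsl:apply-templates/></fo:block>
  </xsl:template>
</xsl:stylesheet>
```

The para template is nearly identical to what you would write for HTML; the bulk of the difference is the layout-master-set at the top.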
Once you have the stylesheet, you process your
XML document with it the same way you would for
the HTML example. Instead of getting HTML, you now
have an FO document, which must then be passed
through an FO processor like FOP to get printable
(PDF) output. It's just one extra step.
- Generate the XML document (maybe from a
database).
- Use an XSLT stylesheet to transform the XML
into an FO XML document.
- Run an FO processor (such as FOP) on the FO
document.
- View the PDF in a reader.
Uses
So far, formatting objects haven't really found
a lot of use in the real world. This is probably
mostly because the technologies surrounding FO
are so new, and because the formatting object
specification is still in flux. However, there
are some places where formatting objects will
work to great advantage.
In particular, I envision FO being used to
generate reports for business applications (something I am experimenting with
now). Anyone who has used Crystal Reports knows
what a bitch it can be; I think that formatting
objects will eventually become easier to use than
Crystal, but not before someone builds a good
WYSIWYG editor for designing page layouts and generating
the XSLT stylesheets used to convert XML into FO
documents. If there were a web based application
that did this, programmers could give their users
the XML over the web and let users create their
own reports, storing the resulting stylesheets on
the server and doing all the processing there as
well. Of course, there is a long way to go in this
arena before FO approaches the usability of
Crystal.
Conclusion
Since the only difference between HTML and PDF
output is the stylesheet and an extra processing
step, there is a lot of flexibility and
extensibility available to present the same data
in many different formats. This is, of course,
one of the big promises of XML and XSLT, and FO
adds another type of output, which makes this
presentation mutability more of a reality.
References
Harold, Elliotte Rusty. "XSL Formatting Objects" XML Bible. 3
Jun. 2001
<http://www.ibiblio.org/xml/books/bible2/chapters/ch18.html>.
The Extensible Stylesheet Language (XSL). 10 Jul. 2001. The
World Wide Web Consortium. 11
Aug. 2001. <http://www.w3c.org/Style/XSL/>.