July 13, 2004

Configuring DocBook and XsltProc on Windows

I am writing my MSc thesis at the moment, and am trying to do it in DocBook - the same way I'm trying to write the PN documentation (slowly!). Transforming DocBook into PDF takes an impressive toolchain. The pipeline is XML + XSL -> XSL-FO -> PDF, and the tools are an XML editor, the DocBook stylesheets, xsltproc and FOP (Java + Xalan + Saxon + Apache FOP). Just a couple of tools, you might think - but they took me ages to collect and configure into a working setup.
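To make the two stages of that pipeline concrete, here is a rough Python sketch that builds the two command lines and runs them in order. It is only an illustration under assumptions: the stylesheet location c:\xml\docbook-xsl is hypothetical, thesis.xml is a made-up file name, and it assumes xsltproc.exe and fop.bat can both be found on your PATH.

```python
import subprocess
from pathlib import PureWindowsPath

# Hypothetical location of the unpacked DocBook XSL stylesheets (an assumption);
# fo\docbook.xsl is the stylesheet that produces XSL-FO output.
STYLESHEET = r"c:\xml\docbook-xsl\fo\docbook.xsl"


def build_commands(source, stylesheet=STYLESHEET):
    """Return the two command lines for DocBook XML -> XSL-FO -> PDF."""
    src = PureWindowsPath(source)
    fo_file = str(src.with_suffix(".fo"))
    pdf_file = str(src.with_suffix(".pdf"))
    # Stage 1: xsltproc applies the stylesheet and writes the XSL-FO file.
    to_fo = ["xsltproc", "--output", fo_file, stylesheet, str(src)]
    # Stage 2: FOP renders the XSL-FO file to PDF.
    to_pdf = ["fop.bat", "-fo", fo_file, "-pdf", pdf_file]
    return [to_fo, to_pdf]


def run_pipeline(source):
    """Run both stages, stopping at the first one that fails."""
    for command in build_commands(source):
        subprocess.run(command, check=True)
```

Calling run_pipeline("thesis.xml") would, if everything is installed as assumed, leave thesis.fo and thesis.pdf next to the source file.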

I'm writing this explanation as a note to myself on how I did things. The various sections are in no particular order, so my apologies if it seems to ramble.

Getting the tools

You can get xsltproc for Windows from the website listed in the References section. You need to retrieve libxml2 (which contains libxml2.dll, xmlcatalog.exe and xmllint.exe), libxslt (libxslt.dll, libexslt.dll and xsltproc.exe), iconv and zlib. Download all the zips, and extract the .dll and .exe files. These are scattered around the bin and lib directories inside the zip files, so copy them all into one directory where xsltproc.exe can find the DLLs it needs.

To transform the DocBook XML into XSL-FO (for later conversion to PDF), you need the DocBook XSL stylesheets. Extract them and store them in a sensible location on your hard disk.

FOP is the tool from the Apache XML project that converts an XSL-FO file (an XML file full of layout instructions) into other formats such as PDF. FOP is a Java tool, so you'll need a Java runtime. I downloaded the binary version, and also needed to download JAI to get picture insertion working. Once JAI is installed, FOP picks it up automatically. The FOP site also mentions something called JIMI, but I couldn't get it working.

XML Catalogs

If you're not really into the world of XML (I feel like I know a good bit about it, and am barely scratching the surface compared to many others) then you may not know much about DTDs, schemas and catalogs. Simply put, DTDs and schemas define a contract for the content of XML files, and the tools listed above often use them to validate XML content. If you run these tools without a catalog, they will attempt to retrieve these contract files from the internet. This takes a long time and really slows down the conversion process.

It took me ages to work out how to get catalogs to work properly with xsltproc. There was no Windows documentation, so I pieced it together from e-mails and snippets found using Google.

Creating the Catalog

This shows how to create a simple catalog that points to a local copy of the DocBook DTD. First you need to download the DTDs; there are links in the References section below. I suggest placing them in a directory structure like:

xml
xml\docbook
xml\docbook\4.3
xml\docbook\4.3\dtd <-- DTDs for docbook 4.3 in here

The DTDs are referenced in the XML files you are working with by a public identifier, for example: -//OASIS//DTD DocBook XML V4.3//EN. The catalog mechanism works by mapping from this identifier to a file on your disk.

Sample

Here is a simple catalog file containing a mapping for this DTD:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.3//EN" uri="file:///c:/xml/docbook/4.3/dtd/docbookx.dtd"/>
</catalog>

Note that you can also point documents written against previous versions of the DTD at the newer version, by adding entries that map the old public IDs to the new files.
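As a rough illustration of what the catalog lookup amounts to, the following Python sketch parses a catalog like the sample above and resolves public IDs to local files. The extra DocBook 4.2 entry is my own assumption, added only to show an older ID mapped onto the newer DTD. This is not how xsltproc itself is implemented, and a real catalog resolver handles much more (system identifiers, delegation and so on). It only shows the basic mapping idea.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Same mapping as the sample catalog, plus one illustrative entry (an
# assumption) that sends documents declaring DocBook 4.2 to the 4.3 DTD.
CATALOG = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.3//EN" uri="file:///c:/xml/docbook/4.3/dtd/docbookx.dtd"/>
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN" uri="file:///c:/xml/docbook/4.3/dtd/docbookx.dtd"/>
</catalog>
"""


def load_public_map(catalog_text):
    """Collect every <public> entry into a publicId -> uri dictionary."""
    root = ET.fromstring(catalog_text)
    return {
        entry.get("publicId"): entry.get("uri")
        for entry in root.iter("{%s}public" % CATALOG_NS)
    }


def resolve_public(public_map, public_id):
    """Return the local URI for a public ID, or None if it is not mapped."""
    return public_map.get(public_id)


public_map = load_public_map(CATALOG)
print(resolve_public(public_map, "-//OASIS//DTD DocBook XML V4.2//EN"))
```

An ID that is missing from the catalog comes back as None here, which corresponds to the case where the tool would fall back to fetching the DTD over the network.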

Pointing at the catalog

Under Linux, xsltproc looks for a catalog in the default location of /etc/xml/catalog. No equivalent default is offered on Windows, so to point xsltproc at your catalog you must set the XML_CATALOG_FILES environment variable. It takes a space-separated list of catalog filenames.

From the command prompt:

set XML_CATALOG_FILES=c:\xml\catalog.xml

You can also set this through the System Properties control panel, which makes the setting permanent rather than limited to the current command prompt. Once it is set, xsltproc will load your catalog file and use it to resolve the DTDs.
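If you drive xsltproc from a script rather than the command prompt, the same variable can be set just for the child process. The Python sketch below shows one way to build the space-separated value from a list of catalogs. The helper name and the second catalog file are hypothetical, and rejecting paths that contain spaces is my own precaution, on the assumption that a space inside a path would be read as a separator.

```python
import os


def catalog_environment(catalog_files, base_env=None):
    """Return a copy of the environment with XML_CATALOG_FILES set."""
    env = dict(os.environ if base_env is None else base_env)
    for path in catalog_files:
        # Assumption: the list is split on spaces, so a path containing
        # one would be ambiguous. Refuse it rather than guess.
        if " " in path:
            raise ValueError("space in catalog path: %r" % path)
    env["XML_CATALOG_FILES"] = " ".join(catalog_files)
    return env


# Hypothetical second catalog, just to show the list form.
env = catalog_environment([r"c:\xml\catalog.xml", r"c:\xml\extra-catalog.xml"])
print(env["XML_CATALOG_FILES"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching xsltproc, which leaves your own environment untouched.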

Debugging

If you think this isn't working properly, you can view debug information relating to the use of the catalog by defining an environment variable like this:

set XML_DEBUG_CATALOG=1

You will now see lots more information about resolution when running xsltproc.

Conclusion

This post gives a bit of information about how to get the environment set up. I'll hopefully have time to write a bit about using all these tools as well in another post.

References

1. Windows ports of xsltproc and required libraries: http://www.zlatkovic.com/libxml.en.html
2. DocBook XML DTDs: http://www.docbook.org/xml/index.html
3. DocBook XSL Stylesheets: http://docbook.sourceforge.net/projects/xsl/
4. FOP: http://xml.apache.org/fop/
5. JAI: http://java.sun.com/products/java-media/jai/

Posted by Simon at July 13, 2004 09:34 PM