July 13, 2004
Configuring DocBook and XsltProc on Windows

I am writing my MSc thesis at the moment, and am trying to do it in DocBook, the same way I'm trying to write the PN documentation (slowly!). Transforming DocBook into PDF takes an impressive toolchain. The conversion runs XML & XSL -> XSL:FO -> PDF, and the tools involved are an XML editor, the DocBook stylesheets, xsltproc and FOP (Java + Xalan + Saxon + Apache FOP). Just a couple of tools, you might think, but they took me ages to collect and configure into a working setup.

I'm writing this explanation mainly as a note to myself on how I did things. The sections are in no particular order, so my apologies if it reads like a ramble.

Getting the tools

You can get xsltproc for Windows from the website listed in the References section. You need to download libxml2 (which contains libxml2.dll, xmlcatalog.exe and xmllint.exe), libxslt (libxslt.dll, libexslt.dll and xsltproc.exe), iconv and zlib. Download all the zips and extract the .dll and .exe files, which are scattered around the bin and lib directories inside the zip files. Copying them all into one directory is the simplest option, since Windows looks for an executable's DLLs in its own directory first.
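Because the files come out of several different zips, it's easy to miss one. Here is a small Python sketch (my own addition, not part of the tools) that checks a folder for the files named above. The iconv and zlib DLL names differ between releases, so it only matches those by an assumed `iconv`/`zlib` name prefix.

```python
from pathlib import Path

# Files named in the libxml2 and libxslt packages.
REQUIRED_FILES = [
    "libxml2.dll", "xmlcatalog.exe", "xmllint.exe",
    "libxslt.dll", "libexslt.dll", "xsltproc.exe",
]
# Assumption: the iconv and zlib DLLs start with these prefixes
# (exact names such as zlib.dll or zlib1.dll vary between releases).
REQUIRED_PREFIXES = ["iconv", "zlib"]


def missing_tools(tool_dir):
    """Return the file names (or prefix patterns) not found in tool_dir."""
    names = {p.name.lower() for p in Path(tool_dir).iterdir() if p.is_file()}
    missing = [f for f in REQUIRED_FILES if f not in names]
    for prefix in REQUIRED_PREFIXES:
        # Accept any DLL whose name starts with the expected prefix.
        if not any(n.startswith(prefix) and n.endswith(".dll") for n in names):
            missing.append(prefix + "*.dll")
    return missing
```

For example, `print(missing_tools(r"c:\xml\bin"))` (a hypothetical folder) prints an empty list once everything is in place.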

To transform the DocBook XML into XSL:FO XML (for later conversion to PDF), you need to get the DocBook XSL stylesheets. These need to be extracted to a sensible location on your hard disk.

FOP is the tool from the Apache XML project that converts an XSL:FO formatted XML file (a file full of layout instructions) into other formats such as PDF. FOP is a Java tool, so you'll need a Java runtime. I downloaded the binary version, and also needed to download JAI (Java Advanced Imaging) to get picture insertion working. Once JAI is installed, FOP picks it up automatically. The FOP site also mentions something called JIMI, but I couldn't get this working.
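For reference, here's a Python sketch of the whole XML -> XSL:FO -> PDF run. It assumes xsltproc.exe and fop.bat are on your PATH and that the DocBook XSL stylesheets were unpacked to c:/xml/docbook-xsl (a hypothetical location; adjust it to wherever you put them). fo/docbook.xsl is the stylesheet in the DocBook XSL distribution that produces XSL:FO. xsltproc's --nonet option stops it fetching anything over the network, so a missing catalog entry shows up as an error message instead of a slow download.

```python
import os
import subprocess

DOCBOOK_XSL = "c:/xml/docbook-xsl"  # hypothetical install location


def pipeline_commands(source, nonet=True):
    """Return the xsltproc and FOP command lines for source (e.g. thesis.xml)."""
    stem = os.path.splitext(source)[0]
    fo_file, pdf_file = stem + ".fo", stem + ".pdf"
    xslt = ["xsltproc"]
    if nonet:
        # Refuse network fetches so catalog problems are reported, not hidden.
        xslt.append("--nonet")
    xslt += ["--output", fo_file, DOCBOOK_XSL + "/fo/docbook.xsl", source]
    # FOP command-line form: -fo <input.fo> -pdf <output.pdf>
    fop = ["fop.bat", "-fo", fo_file, "-pdf", pdf_file]
    return [xslt, fop]


def run_pipeline(source, dry_run=True):
    """Print each command, and run it as well unless dry_run is set."""
    for cmd in pipeline_commands(source):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop if a step fails
```

`run_pipeline("thesis.xml")` only prints the two commands. Pass `dry_run=False` to actually run them.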

XML Catalogs

If you're not really into the world of XML (I feel I know a fair bit about it, yet I'm barely scratching the surface compared to many others), you may not know much about DTDs, schemas and catalogs. Simply put, DTDs and schemas define a contract for the content of XML files, and the tools listed above often use them to validate XML content. If you run these tools without a catalog, they will try to retrieve these contract files from the internet. This takes a long time and really slows down the conversion process.

It took me ages to work out how to get catalogs working properly with xsltproc. There was no Windows documentation, so I pieced it together from e-mails and snippets found using Google.

Creating the Catalog

This shows how to create a simple catalog that points to a local copy of the DocBook DTD. First you need to download the DTDs (linked in the References section below). I suggest placing them in a directory structure like:

xml
xml\docbook
xml\docbook\4.3
xml\docbook\4.3\dtd <-- DTDs for docbook 4.3 in here

The XML files you are working with refer to the DTD by a public identifier, for example -//OASIS//DTD DocBook XML V4.3//EN. The catalog mechanism works by mapping this identifier to a file on your disk.

Sample

Here is a simple catalog file containing a mapping for this DTD:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.3//EN" uri="file:///c:/xml/docbook/4.3/dtd/docbookx.dtd"/>
</catalog>
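To make the lookup concrete, here's a simplified Python sketch of what happens when a document asks for -//OASIS//DTD DocBook XML V4.3//EN. The catalog is read, and the public ID is looked up to get a local URI. It only understands <public> entries; libxml2's real implementation also handles <system>, <nextCatalog>, rewrite rules and more.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def load_public_map(catalog_text):
    """Map each publicId in an OASIS XML catalog to its uri."""
    root = ET.fromstring(catalog_text)
    return {e.get("publicId"): e.get("uri")
            for e in root.iter(CATALOG_NS + "public")}


def resolve(public_map, public_id):
    """Return the local URI for public_id, or None if the catalog has no entry
    (in which case the tool would fall back to fetching from the network)."""
    return public_map.get(public_id)
```

Fed the sample catalog, `resolve` maps the V4.3 public ID to file:///c:/xml/docbook/4.3/dtd/docbookx.dtd and returns None for any ID it doesn't list.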

Note that you can also map older versions of the DTD onto the newer one by mapping the old public IDs to the new files.
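Writing those extra entries by hand gets tedious, so here's a Python sketch that generates a catalog in the same format as the sample, with one <public> entry per ID, all pointing at the 4.3 DTD. The list of older versions (V4.2 and V4.1.2) is just an example, and I'm assuming the c:\xml layout suggested earlier.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
DTD_URI = "file:///c:/xml/docbook/4.3/dtd/docbookx.dtd"
DOCTYPE = ('<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML '
           'Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/'
           'release/1.0/catalog.dtd">\n')
# Example list: older DocBook versions redirected to the 4.3 DTD.
PUBLIC_IDS = [
    "-//OASIS//DTD DocBook XML V4.3//EN",
    "-//OASIS//DTD DocBook XML V4.2//EN",
    "-//OASIS//DTD DocBook XML V4.1.2//EN",
]


def build_catalog(public_ids, uri):
    """Return catalog XML text with one <public> entry per public ID."""
    ET.register_namespace("", NS)  # serialize as xmlns="..." with no prefix
    root = ET.Element("{%s}catalog" % NS)
    for pid in public_ids:
        ET.SubElement(root, "{%s}public" % NS, publicId=pid, uri=uri)
    return ('<?xml version="1.0"?>\n' + DOCTYPE
            + ET.tostring(root, encoding="unicode"))
```

Save the returned text as your catalog file (for example c:\xml\catalog.xml).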

Pointing at the catalog

Under Linux, xsltproc looks for a catalog in the default location /etc/xml/catalog. No alternative default is offered on Windows, so to point xsltproc at your catalog you must set the XML_CATALOG_FILES environment variable. This variable accepts a space-separated list of catalog file names.
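Since the list is space-separated, a catalog stored under a path containing a space (such as c:\Program Files) would presumably be split into two bogus entries. This Python sketch builds the variable's value from a list of paths and simply refuses such paths rather than guessing at an escaping scheme. Keeping catalogs somewhere space-free like c:\xml avoids the problem.

```python
def catalog_files_value(paths):
    """Join catalog paths into one space-separated XML_CATALOG_FILES value."""
    for p in paths:
        # A space inside a path would split it into two list entries.
        if any(ch.isspace() for ch in p):
            raise ValueError("path contains whitespace: %r" % p)
    return " ".join(paths)


def set_command(paths):
    """Return the cmd.exe line that sets the variable for this session."""
    return "set XML_CATALOG_FILES=" + catalog_files_value(paths)
```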

From the command prompt:

set XML_CATALOG_FILES=c:\xml\catalog.xml

You can also set this through the System Properties control panel application. Once it is set, xsltproc will load your catalog file and use it to resolve the DTDs.

Debugging

If you think this isn't working properly, you can view debug information relating to the use of the catalog by defining an environment variable like this:

set XML_DEBUG_CATALOG=1

You will now see lots more information about resolution when running xsltproc.
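If you'd rather not leave XML_DEBUG_CATALOG set for the whole command prompt session, here's a small Python sketch (my own convenience, not something xsltproc needs) that sets it only for a single xsltproc run. It assumes xsltproc is on the PATH. libxml2 prints the catalog trace on its error output, normally stderr.

```python
import os
import subprocess


def debug_env(catalog_files, base=None):
    """Copy base (default: os.environ) and add the two catalog variables."""
    env = dict(os.environ if base is None else base)
    env["XML_CATALOG_FILES"] = catalog_files
    env["XML_DEBUG_CATALOG"] = "1"
    return env


def run_with_catalog_debug(args, catalog_files):
    """Run xsltproc with args in a child process that has debugging enabled."""
    return subprocess.run(["xsltproc"] + list(args),
                          env=debug_env(catalog_files), check=False)
```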

Conclusion

This post gives a bit of information about how to get the environment set up. I'll hopefully have time to write about actually using all these tools in another post.

References

1. Windows ports of xsltproc and required libraries: http://www.zlatkovic.com/libxml.en.html
2. DocBook XML DTDs: http://www.docbook.org/xml/index.html
3. DocBook XSL Stylesheets: http://docbook.sourceforge.net/projects/xsl/
4. FOP: http://xml.apache.org/fop/
5. JAI: http://java.sun.com/products/java-media/jai/

Posted by Simon at 09:34 PM
July 11, 2003
Movable Type Rebuild

Apologies to anyone who subscribes to this site's RSS feed. My Movable Type installation managed to corrupt its database, and I've had to re-install.

I believe the corruption had something to do with posting from offline blog tools. See my next post about offline blog tools, and grin at the irony: it was the very post my installation broke on.

Posted by Simon at 02:49 PM
April 10, 2003
.NET Framework 1.1 is out

OK, so Microsoft have released version 1.1 of the .NET Framework. Some bits are really quite cool:

Side-by-side installation and operation with version 1.0: I can even choose which version an application uses with a config file. That's good.

However, I find myself staring at my screen in disbelief that they still seem incapable of making the framework work properly with XP visual styles. Surely it doesn't take a genius to get the tab control to work properly? Instead we are left trying to fix this with kludgy hacks (no offence to the author). This is a huge disappointment. I suppose at least the awful procedure for enabling themes has gone: you now just call Application.EnableVisualStyles(), which is better.

Links:
.NET Framework Redistributable v1.1.
.NET Framework SDK v1.1.

Posted by Simon at 06:51 PM
March 27, 2003
JZip and my associations

Why is it that authors of software such as JZip assume that when I first install/run their software I really want them to steal my .zip associations?

Did they ask before doing so? I don't think so - I'm sure I'd have noticed and answered no. This is exceptionally bad practice.

This is even more infuriating on Windows XP, which has built-in ZIP folder support, so I rarely use third-party zip tools.

To actually review the software in question: it looks reasonable and aims to be a free WinZip. Unfortunately, it seemed to take about 10 seconds to load on my 2.5GHz Pentium 4 (while consuming more than 60% processor time), and it doesn't seem to support the WinZip functions that are truly useful, i.e. un-tarring, MIME decoding and the rest.

Oh, and don't even get me started on XML Spy stealing ".txt". Grrrrr.

Posted by Simon at 10:02 AM
December 13, 2002
Minor Madness

I've been trying to find out why a tree control on a property page in pn2 sometimes disappears (specifically when no items are selected) after switching back and forth between pages. It appears that if the dialog (which is the page) has its transparent property set to true, the tree decides it doesn't need to paint when the dialog is re-shown, and list controls are affected too. This behaviour may only occur with XP themed drawing; I'm not entirely sure.

Anyone else seen this?

Posted by Simon at 10:04 PM