Next we dug into the whole DocBook-related process. There is a hierarchy of Ant builds: from the global buildall.xml, to the buildall.xml of each project, to the build of each sub-project, and finally a call to ${MUSE_HOME}/doc/tools/build.xml, where this task was invoked:
<java classname="org.apache.fop.apps.Fop" fork="true" failonerror="false" maxmemory="512m" resultproperty="errorCode" timeout="900000">
  <sysproperty key="MUSE_HOME" value="${MUSE_HOME}"/>
  <!--...-->
</java>
We then turned to the FOP documentation. We were not using the recommended approach (http://xmlgraphics.apache.org/fop/0.95/anttask.html), that is, the FOP Ant task. Instead, we were calling the command line in a forked process. We considered switching to the Ant task, even though we had a very old FOP (fop-0.20.5). However, the command line could do in one step something the Ant task could not, so we had to split it into two operations. That split is in fact the recommended way: first create the *.fo file through an XSLT transformation, then run the FOP converter on it. The command line ran the XSLT internally, without needing a temporary file.
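The first step of that two-step pipeline (XML plus XSLT to FO) can be sketched with the JDK's built-in JAXP API. This is a minimal illustration under stated assumptions, not our build. The toy stylesheet only stands in for the real DocBook FO stylesheets, and the class name XmlToFoStep is hypothetical. The resulting FO text is what would be written to the temporary *.fo file and handed to FOP in the second step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToFoStep {
    // Hypothetical toy stylesheet standing in for the DocBook FO stylesheets:
    // it wraps every <para> in an fo:block inside a minimal fo:root.
    static final String TOY_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/'>"
        + "<fo:root><fo:layout-master-set><fo:simple-page-master master-name='page'>"
        + "<fo:region-body/></fo:simple-page-master></fo:layout-master-set>"
        + "<fo:page-sequence master-reference='page'><fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select='//para'/>"
        + "</fo:flow></fo:page-sequence></fo:root>"
        + "</xsl:template>"
        + "<xsl:template match='para'><fo:block><xsl:value-of select='.'/></fo:block></xsl:template>"
        + "</xsl:stylesheet>";

    /** Step 1: apply the stylesheet to the source XML and return the FO document as text. */
    static String toFo(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter fo = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(fo));
        return fo.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String fo = toFo("<article><para>Hello</para></article>", TOY_XSL);
        System.out.println(fo);
        // Step 2 (not shown): save 'fo' to the temporary *.fo file and run FOP on it,
        // e.g. through the fofile attribute of the FOP Ant task.
    }
}
```

Keeping the FO as an explicit intermediate file costs a temporary file per document. In exchange, each step can be run and debugged on its own.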
We considered switching to the Ant task because we were thinking of moving to FOP 1.0 in the future, which actually accepts the DocBook XML as a parameter rather than the FO. We also expected a speed improvement, because in the future we could process multiple documents through a fileset instead of stepping into each document. With this in mind we tried the documented way of using FOP in Ant and… we ran into a PermGen problem. Even after increasing PermGen to 256 MB, we still got OutOfMemoryError: PermGen space. We then concluded that this was due to the recommended way of declaring the task with a taskdef…
<taskdef classname="org.apache.fop.tools.anttasks.Fop">
<classpath>
<fileset dir="${fop.home}/lib">
<include/>
</fileset>
<fileset dir="${fop.home}/build">
<include/>
<include />
</fileset>
</classpath>
</taskdef>
We had many more classpath elements than the few in the example above. This taskdef lived in the build script for the manuals, and that script finished with each manual, yet the definition still persisted in Ant. This is… an Ant bug:
“Sub-builds (antcall, subant) load a task each time the task is defined, but do not release it when the sub-build project completes. [...]”
[https://issues.apache.org/bugzilla/show_bug.cgi?id=49021]
Following the workarounds mentioned there, we defined the FOP task at the top level. At the lowest level we tested whether it was already defined, and defined it only if it was not. To keep it simple we used the recommended antlib approach. We ended up with this in ${MUSE_HOME}/doc/tools/build.xml:
<condition property="alreadyDefined" value="true" else="false">
  <typefound name="antlib:org.apache.fop.tools.anttasks.Fop:fop"/>
</condition>
<echo message="alreadyDefined:${alreadyDefined}"/>
<!-- <if>/<then> written in ant-contrib syntax (assumed); define the task only if it is not yet known -->
<if>
  <equals arg1="${alreadyDefined}" arg2="false"/>
  <then>
    <echo message="redefining"/>
    <taskdef uri="antlib:org.apache.fop.tools.anttasks.Fop" resource="antlib.xml" classpath="${MUSE_HOME}/doc/tools/fop/lib/"/>
  </then>
</if>
<!--...-->
<fop:fop format="application/pdf" userConfig="${MUSE_HOME}/doc/tools/fop/conf/userconfig.xml"
  basedir="${docBaseDir}/${docName}" fofile="${temp.fopFile}" outfile="${docBaseDir}/${docName}.pdf"/>
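The guard relies on Ant's own table of definitions. As a rough Java analogy, not Ant's or FOP's code, the sketch contrasts two cases. In the guarded case a definition is created once and reused. In the unguarded case it is created anew in every sub-build. Each new URLClassLoader would load its own copy of the FOP classes into PermGen, and, according to the Ant bug quoted earlier, those copies were never released. In this simplified model the replaced loaders could in fact be collected. The class DefineOnceRegistry and its methods are hypothetical, and an empty URL array stands in for the FOP jars.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

public class DefineOnceRegistry {
    // Hypothetical stand-in for Ant's table of task definitions: name -> loader for that task.
    private final Map<String, ClassLoader> definitions = new HashMap<>();
    private int loadersCreated = 0;

    private ClassLoader newLoader(URL[] classpath) {
        loadersCreated++;
        // Every new loader would load its own copy of each class on the classpath.
        return new URLClassLoader(classpath, DefineOnceRegistry.class.getClassLoader());
    }

    /** Guarded definition, like <typefound> followed by a conditional <taskdef>. */
    synchronized ClassLoader defineIfAbsent(String name, URL[] classpath) {
        return definitions.computeIfAbsent(name, n -> newLoader(classpath));
    }

    /** Unguarded definition, like a plain <taskdef> executed in every sub-build. */
    synchronized ClassLoader defineAlways(String name, URL[] classpath) {
        ClassLoader loader = newLoader(classpath);
        definitions.put(name, loader);
        return loader;
    }

    synchronized int loadersCreated() {
        return loadersCreated;
    }

    public static void main(String[] args) {
        URL[] fopJars = new URL[0]; // placeholder for the FOP jar URLs
        DefineOnceRegistry guarded = new DefineOnceRegistry();
        DefineOnceRegistry unguarded = new DefineOnceRegistry();
        for (int build = 0; build < 10; build++) { // ten sub-builds, one per manual
            guarded.defineIfAbsent("fop", fopJars);
            unguarded.defineAlways("fop", fopJars);
        }
        // prints "guarded: 1, unguarded: 10"
        System.out.println("guarded: " + guarded.loadersCreated()
                + ", unguarded: " + unguarded.loadersCreated());
    }
}
```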
“basedir: Base directory to resolve relative references (e.g., graphics files) within the FO document. Required: No, for single FO File entry, default is to use the location of that FO file.”
[http://xmlgraphics.apache.org/fop/0.95/anttask.html]
Hence we set basedir and were happy that it worked. Then came another run: we had forgotten that a document was open, so the whole process had to be reverted and run again… it finished in about 60 minutes. We forgot to mention that we left ANT_OPTS on the build machine set to use more memory (-Xmx896M -XX:MaxPermSize=128M), which should make the whole build process take a little less time.
We were happy that the images now appeared in the documents, but we decided to compare several newly rendered PDFs with the old ones. Many of them had the same size, but others differed. Muse Testing.pdf (which contains many images) was 1 MB smaller than the one in CVS.
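This kind of comparison is easy to automate. The sketch assumes the old and new PDFs sit in two directories with matching file names. That layout is hypothetical, as is the 10 KB tolerance in main. The sketch lists the PDFs whose sizes differ by more than the tolerance. A size difference only flags a document for manual inspection; it does not tell what changed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PdfSizeDiff {
    /**
     * Lists PDFs present in both directories whose sizes differ by more than
     * toleranceBytes, formatted as "name (delta bytes)" with delta = new size - old size.
     */
    static List<String> differingPdfs(Path oldDir, Path newDir, long toleranceBytes)
            throws IOException {
        List<String> result = new ArrayList<>();
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(newDir, "*.pdf")) {
            for (Path newPdf : pdfs) {
                Path oldPdf = oldDir.resolve(newPdf.getFileName().toString());
                if (!Files.isRegularFile(oldPdf)) {
                    continue; // no old counterpart to compare with
                }
                long delta = Files.size(newPdf) - Files.size(oldPdf);
                if (Math.abs(delta) > toleranceBytes) {
                    result.add(newPdf.getFileName() + " (" + delta + " bytes)");
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PdfSizeDiff <oldPdfDir> <newPdfDir>");
            return;
        }
        // 10 KB tolerance: an arbitrary threshold for "same size"
        for (String line : differingPdfs(Paths.get(args[0]), Paths.get(args[1]), 10_000)) {
            System.out.println(line);
        }
    }
}
```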
Looking into Muse Testing.pdf, we found images from Muse Designer Console.pdf and from ICE MARC to XML Converter.pdf. We suspected some cache, since there was now a single FOP instance for the entire run… and indeed that was it. It is actually a feature of FOP:
“FOP caches images between runs. There is one cache per FopFactory instance. The URI is used as a key to identify images which means that when a particular URI appears again, the image is taken from the cache. If you have a servlet that generates a different image each time it is called with the same URI you need to use a constantly changing dummy parameter on the URI to avoid caching.”
[http://xmlgraphics.apache.org/fop/1.0/graphics.html#caching]
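One plausible way such a collision arises is a cache keyed by the image URI as written in the document, rather than by a URI resolved against each document's base directory. The sketch is a hypothetical model of a single cache shared by the whole run, like one FopFactory. It is not FOP's implementation. The class name and the base URIs file:/build/docA/ and file:/build/docB/ are invented for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class UriKeyedImageCache {
    // Hypothetical per-run image cache: key -> image content (a String stands in for the image).
    private final Map<String, String> cache = new HashMap<>();

    /** Returns the cached image for the key; the given content is used only on the first request. */
    String get(String key, String content) {
        return cache.computeIfAbsent(key, k -> content);
    }

    public static void main(String[] args) {
        URI docA = URI.create("file:/build/docA/"); // hypothetical document directories
        URI docB = URI.create("file:/build/docB/");
        String href = "images/screen1.png";        // same relative name in both documents

        // Keyed by the URI as written: docB silently receives docA's image.
        UriKeyedImageCache byHref = new UriKeyedImageCache();
        byHref.get(href, "image from docA");
        System.out.println(byHref.get(href, "image from docB")); // prints "image from docA"

        // Keyed by the resolved URI: the two images no longer collide.
        UriKeyedImageCache byResolved = new UriKeyedImageCache();
        byResolved.get(docA.resolve(href).toString(), "image from docA");
        System.out.println(byResolved.get(docB.resolve(href).toString(), "image from docB")); // prints "image from docB"
    }
}
```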
At this point we stopped pursuing this whole chain, as it was not worth it. We are keeping the work done for reference, so that we can switch to the recommended way if these issues are resolved in the future. The workarounds all have drawbacks. Adding a dummy parameter (which we had not done anyway), putting the document name into each image file name, or even including the directory (which is named after the document) would mean many modifications. Renaming a document would then also cause far too much trouble. So using different file names for images should never be considered.
Meanwhile, while following the documentation, we came across the timeout parameter of the forked java task, and we had always kept this workaround in mind. But we were curious why JAI (Java Advanced Imaging) blocks at that point in the stack trace shown earlier. It turns out to be a JAI bug, not resolved in the latest version either. The bug is described here:
“The shutdownHook TempFileCleanupThread throws a ConcurrentModificationException in fileIter.next() sometimes. This exception is ignored and because .next() doesn’t succeed the loop never ends. It is about this code in com/sun/media/jai/codec/FileCacheSeekableStream.java:”
/**
* Deletes all <code>File</code>s in the internal cache.
*/
public void run() {
if(tempFiles != null && tempFiles.size() > 0) {
Iterator fileIter = tempFiles.iterator();
while(fileIter.hasNext()) {
try {
File file = (File)fileIter.next();
file.delete();
} catch(Exception e) {
// Ignore
}
}
}
}
[http://java.net/jira/browse/JAI_CORE-121]
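The endless loop is easy to reproduce with a plain ArrayList; JAI's actual collection may differ. Once the list is modified after the iterator was created, every next() call throws ConcurrentModificationException before advancing. The empty catch block swallows it, and hasNext() keeps returning true. The sketch is a self-contained reproduction, not JAI code. It caps the iterations only so that it terminates. safeCleanup shows one possible fix, iterating over a snapshot, and is not necessarily what JAI would adopt.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TempFileCleanupDemo {
    /** Mimics the JAI loop; the maxIterations cap exists only so the demo terminates. */
    static int buggyCleanup(List<File> tempFiles, int maxIterations) {
        Iterator<File> fileIter = tempFiles.iterator();
        tempFiles.add(new File("added-during-cleanup.tmp")); // simulates another thread adding a file
        int iterations = 0;
        while (fileIter.hasNext() && iterations < maxIterations) {
            iterations++;
            try {
                File file = fileIter.next(); // throws ConcurrentModificationException, never advances
                file.delete();
            } catch (Exception e) {
                // Ignore (as in JAI), so hasNext() stays true forever
            }
        }
        return iterations;
    }

    /**
     * One possible fix: copy the list first, then delete from the copy, so later additions
     * cannot break the loop. The lock only helps if writers also synchronize on the list.
     */
    static int safeCleanup(List<File> tempFiles) {
        List<File> snapshot;
        synchronized (tempFiles) {
            snapshot = new ArrayList<>(tempFiles);
        }
        int deleted = 0;
        for (File file : snapshot) {
            if (file.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        List<File> pending = new ArrayList<>();
        pending.add(new File("never-created.tmp"));
        System.out.println("buggy loop iterations (capped): " + buggyCleanup(pending, 1_000_000));

        List<File> real = new ArrayList<>();
        real.add(File.createTempFile("jai-demo", ".tmp"));
        real.add(File.createTempFile("jai-demo", ".tmp"));
        System.out.println("deleted: " + safeCleanup(real));
    }
}
```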
According to the comments on the java.net JIRA, this has not been resolved so far. We also took a peek at the latest JAI code from trunk, and it is still not fixed. It is strange that something can block on termination. So we ended up with the timeout trick… nothing else came to mind, and we had already spent too much time on this.
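The timeout trick relies on killing the forked JVM from outside. The timeout attribute of Ant's java task is in milliseconds, so timeout="900000" kills a FOP run stuck in the JAI shutdown hook after 15 minutes. The sketch shows the same idea with the JDK's ProcessBuilder (Java 9 or later). It is an illustration, not what Ant does internally, and runWithTimeout is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ForkWithTimeout {
    /**
     * Runs a command in a separate process and kills it if it has not exited within
     * timeoutMillis. Returns the exit code, or -1 if the process had to be killed.
     */
    static int runWithTimeout(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // avoid blocking on a full output pipe
                .start();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // e.g. a JVM that never exits because a shutdown hook loops forever
            process.destroyForcibly().waitFor();
            return -1;
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        // 900000 ms, the same limit as the timeout attribute of the java task
        int code = runWithTimeout(List.of(javaBin, "-version"), 900_000);
        System.out.println("exit code: " + code);
    }
}
```

The trade-off is that a run which is merely slow, rather than stuck, would be killed as well, so the limit has to sit comfortably above the longest expected document build.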
We wrote all this in case someone else runs into something similar, or, failing that, as an example of how not to design things. Recently we have come across many Swing bugs, and even filed one with Oracle.
The Core team was fighting a similar bug with Sun's RMI that kept the process from stopping (in the context of Tomcat and Jackrabbit)… none of this is nice, especially when some bugs do not even have a workaround.