Forrest Sitemap Reference (v0.9)

Forrest's sitemap comprises multiple files matching $FORREST_HOME/main/webapp/*.xmap. The main one is sitemap.xmap, which delegates to the others, including sitemaps in the various plugins.

You can add pre-processing sitemaps to your project's src/documentation directory (or wherever ${project.sitemap-dir} points). Any request that a project sitemap does not handle passes through to the default Forrest sitemaps, so you can override selected behaviour without copying the core sitemaps. The capability is described in "Using project sitemaps".

Another way to experiment with the sitemap is to run 'forrest run' on a Forrest-using site. Changes made to the core *.xmap files then take effect immediately at http://localhost:8888/

Forrest's sitemap is divided both physically and logically. The most obvious is the physical separation. There are a number of separate *.xmap files, each defining pipelines for a functional area. Each *.xmap file has its purpose documented in comments at the top. Here is a brief overview of the files, in order of importance.

sitemap.xmap	Primary sitemap file, which delegates responsibility for serving certain URIs to the others (technically called sub-sitemaps). More about the structure of this file later.
forrest.xmap	Sitemap defining Source pipelines, which generate the body section of Forrest pages. All pipelines here deliver XML in Forrest's intermediate "document-v13" format, regardless of originating source or format.
menu.xmap	Pipelines defining the XML that becomes the menu.
linkmap.xmap	Defines a mapping from abstract ("site:index") to physical ("index.html") links for the current page. See Menus and Linking for a conceptual overview, and the Link rewriting section for technical details.
resources.xmap	Serves "resource" files (images, CSS, JavaScript).
raw.xmap	Serves files located in src/documentation/content/xdocs that are not to be modified by Forrest.
plugins.xmap	Provides access to the plugins descriptor files.
aggregate.xmap	Generates a single page (HTML or PDF) containing all the content for the site.
faq.xmap	Processes FAQ documents.
status.xmap	Generates changes and todo pages from a single status.xml in the project root.
issues.xmap	Generates a page of content from an RSS feed. Used in Forrest to generate a "current issues" list from JIRA.
revisions.xmap	Support for HOWTO documents that want "revisions". Revisions are XML snippets containing comments on the main XML file. The main pipeline here automatically appends a page's revisions to the bottom.
dtd.xmap	A Source pipeline that generates XML from a DTD, using Andy Clark's DTD Parser. Useful for documenting DTD-based XML schemas, such as Forrest's own DTDs.
profiler.xmap	Defines the "profiler" pipeline, allowing pipelines to be benchmarked.

Most *.xmap files (forrest, aggregate, faq, status, issues, revisions, dtd) define Source pipelines. Source pipelines define the content (body) XML for site pages. The input XML format can be any format (document-v13, Docbook, RSS, FAQ, Howto) and from any source (local or remote). The output format is always Forrest's intermediate "document-v13" format.

Source pipelines always have a ".xml" extension. Thus, index.xml gives you the XML source for the index page. Likewise, faq.xml gives you XML for the FAQ (transformed from FAQ syntax), and changes.xml returns XML from the status.xml file. Take any page, and replace its extension (.html or .pdf) with .xml and you'll have the Source XML.
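The extension swap described above can be sketched in a few lines of Python. The helper name source_uri is hypothetical and not part of Forrest; it only illustrates how the URI of a rendered page maps onto the URI of its Source pipeline.

```python
import posixpath

def source_uri(page_uri):
    """Map a rendered page URI (.html, .pdf) to its Source pipeline URI (.xml)."""
    stem, _extension = posixpath.splitext(page_uri)
    return stem + ".xml"

print(source_uri("index.html"))  # index.xml
print(source_uri("faq.pdf"))     # faq.xml
```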

This is quite powerful, because we now have an abstraction layer, or "virtual filesystem", on which the rest of Forrest's sitemap can build. Subsequent layers don't need to care whether the XML was obtained locally or remotely, or from what format. Wikis, RSS, FAQs and Docbook files are all processed identically from here on.

                   (subsequent Forrest pipelines)
                                 |
--------+------------------------^------------------------------------------
        |          STANDARD FORREST FORMAT (current document-v13)
        +-----^-------^--------^------------^------^-----^-----^------^-----
SOURCE        |       |        |            |      |     |     |      |
FORMATS    doc-v11  doc-v13  doc-v20 ... Docbook  FAQ  Howto  Wiki  RSS  ??
(*.xml)
                        (in forrest.xmap, faq.xmap, etc)

forrest.xmap

Most of the usual Source pipelines are defined in forrest.xmap, which is the default (fallback) handler for **.xml pages. forrest.xmap uses the SourceTypeAction to determine the type of XML it is processing, and converts it to document-v13 if necessary.

For instance, say we are rendering a Howto document called "howto-howto.xml". It contains this DOCTYPE declaration:

<!DOCTYPE howto PUBLIC "-//APACHE//DTD How-to V1.3//EN"
  "http://forrest.apache.org/dtd/howto-v13.dtd">

The SourceTypeAction sees this, and applies this transform to get it to document-v13:


          <map:when test="howto-v13">
            <map:transform src="{forrest:forrest.stylesheets}/howto-to-document.xsl" />
          </map:when>
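To make the detection step concrete, here is a minimal Python sketch of the idea: read the public identifier from the DOCTYPE and map it to a source type name, which the sitemap then uses to select a stylesheet. This is not the SourceTypeAction implementation. The function name and the lookup table are assumptions for illustration, and only the howto-v13 entry is taken from the example above.

```python
import re
from typing import Optional

# Hypothetical lookup table; only this entry comes from the example above.
PUBLIC_ID_TO_TYPE = {
    "-//APACHE//DTD How-to V1.3//EN": "howto-v13",
}

DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]+)"')

def detect_source_type(xml_text: str) -> Optional[str]:
    """Return the source type for a document, or None if it is not recognised."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    return PUBLIC_ID_TO_TYPE.get(match.group(1))

howto = '''<!DOCTYPE howto PUBLIC "-//APACHE//DTD How-to V1.3//EN"
  "http://forrest.apache.org/dtd/howto-v13.dtd">
<howto/>'''
print(detect_source_type(howto))  # howto-v13
```

A document without a recognised DOCTYPE yields None in this sketch; how Forrest itself treats such documents is not covered here.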

Other source pipelines

As mentioned above, all non-core Source pipelines are distributed in independent *.xmap files. There is a block of sitemap.xmap which simply delegates certain requests to these subsitemaps:


<!-- Body content -->
      <map:match pattern="**.xml">
        <map:match pattern="locationmap.xml">
          <map:generate src="{forrest:forrest.locationmap}" />
          <map:serialize type="xml"/>
        </map:match>
        <map:match pattern="plugins.xml">
          <map:mount uri-prefix="" src="plugins.xmap" check-reload="yes" />
        </map:match>
        <map:match pattern="pluginDocs/plugins_(.*)/index(|\.source).xml" type="regexp">
          <map:mount uri-prefix="" src="plugins.xmap" check-reload="yes" />
        </map:match>
        <map:match pattern="linkmap.*">
          <map:mount uri-prefix="" src="linkmap.xmap" check-reload="yes" />
        </map:match>
        <map:match pattern="**faq.xml">
          <map:mount uri-prefix="" src="faq.xmap" check-reload="yes" />
        </map:match>
        <map:match pattern="community/**index.xml">
          <map:mount uri-prefix="" src="forrest.xmap" check-reload="yes" />
        </map:match>
        ....

Late-binding pipelines

One point of interest here is that the sub-sitemap is often not specific about which URLs it handles, and relies on the caller (the section listed above) to only pass relevant requests to it. We term this "binding a URL" to a pipeline.

For instance, the main pipeline in faq.xmap matches **.xml, but only **faq.xml requests are sent to it.

This "late binding" is useful, because the whole URL space is managed in sitemap.xmap and not spread over lots of *.xmap files. For instance, say you wish all *.xml inside a "faq/" directory to be processed as FAQs. Just override sitemap.xmap and redefine the relevant source matcher:


        <map:match pattern="faq/*.xml">
          <map:mount uri-prefix="" src="faq.xmap" check-reload="yes" />
        </map:match>
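Cocoon's wildcard matcher treats * as "any characters except /" and ** as "any characters including /". The Python sketch below mimics that behaviour with regular expressions to show why a broad **.xml matcher inside faq.xmap is harmless when the caller only forwards **faq.xml requests to it. The helper names are hypothetical, the translation is a simplification of the real matcher, and the dispatch function models only two of the caller-side rules.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a Cocoon-style wildcard pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append("(.*)")        # ** may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("([^/]*)")     # * stops at the next slash
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")

def dispatch(uri):
    """Pick a sub-sitemap the way two of the caller-side matchers would."""
    if wildcard_to_regex("**faq.xml").match(uri):
        return "faq.xmap"
    return "forrest.xmap"   # default (fallback) handler for **.xml

print(dispatch("community/faq.xml"))  # faq.xmap
print(dispatch("index.xml"))          # forrest.xmap
```

Because the URL space is decided in dispatch, the matcher inside the sub-sitemap never needs to know which URLs it was bound to.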

To recap, we now have a *.xml pipeline defined for each page in the site, emitting standardized XML. These pipeline definitions are located in various *.xmap files, notably forrest.xmap.

We now wish to render the XML from these pipelines to output formats like HTML and PDF.

PDF output

Note: PDF is now generated via the org.apache.forrest.plugin.output.pdf plugin.

Easiest case first; PDFs don't require menus or headers, so we can simply transform our intermediate format into XSL:FO, and from there to PDF. This is done by the following matches in output.xmap from the pdf plugin ...


        <!-- Match requests for XSL:FO documents -->
        <map:match type="regexp" pattern="^(.*?)([^/]*).fo$">
            <map:select type="exists">
                <map:when test="{lm:project.{1}{2}.fo}">
                    <map:generate src="{lm:project.{1}{2}.fo}"/>
                </map:when>
                <map:otherwise>
                    <map:aggregate element="site">
                        <map:part src="cocoon://module.properties.properties"/>
                        <map:part src="cocoon://skinconf.xml"/>
                        <map:part src="cocoon://{1}{2}.xml"/>
                    </map:aggregate>
                    <map:transform type="xinclude"/>
                    <map:transform type="linkrewriter" src="cocoon://{1}linkmap-{2}.fo"/>
                    <map:transform src="{lm:pdf.transform.document.fo}">
                        <map:parameter name="imagesdir"  value="{properties:resources.images}/"/>
                        <map:parameter name="xmlbasedir" value="{properties:content.xdocs}{1}"/>
                        <map:parameter name="path"       value="{1}"/>
                    </map:transform>
                </map:otherwise>
            </map:select>
            <map:serialize type="xml"/>
        </map:match>

This section of the pipeline matches requests for XSL:FO documents by using a regular expression match to break the request into directory (.*?) and filename ([^/]*) parts. If the XSL:FO document exists in the project (the exists selector), it is used; otherwise, the XSL:FO is generated:

- The properties input module, skinconf and the source document are combined into an aggregate.
- XInclude elements are processed.
- Links are rewritten.
- The source as generated from the preceding steps is transformed by the stylesheet with the locationmap hint pdf.transform.document.fo and serialized as the final XSL:FO document.
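To see how the regular expression splits a request, the following Python sketch applies the same pattern to two sample URIs. It uses Python's re module rather than Cocoon's regexp matcher, so treat it as an illustration of the directory and filename split only; the sample URIs and the function name are made up.

```python
import re

# Same pattern as the matcher above: {1} is the directory, {2} the filename.
FO_REQUEST = re.compile(r"^(.*?)([^/]*).fo$")

def split_fo_request(uri):
    """Return (directory, filename) for an XSL:FO request, or None."""
    match = FO_REQUEST.match(uri)
    return match.groups() if match else None

print(split_fo_request("community/howto/index.fo"))  # ('community/howto/', 'index')
print(split_fo_request("index.fo"))                  # ('', 'index')

# The source document part of the aggregate is then addressed as:
directory, name = split_fo_request("community/howto/index.fo")
print("cocoon://" + directory + name + ".xml")  # cocoon://community/howto/index.xml
```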


        <!-- Match requests for PDF documents -->
        <map:match type="regexp" pattern="^(.*?)([^/]*).pdf$">
            <map:select type="exists">
                <map:when test="{lm:project.{1}{2}.pdf}">
                    <map:read src="{lm:project.{1}{2}.pdf}"/>
                </map:when>
                <map:when test="{lm:project.{1}{2}.fo}">
                    <map:generate src="{lm:project.{1}{2}.fo}"/>
                    <map:transform type="i18n">
                      <map:parameter name="locale" value="{../locale}"/>
                    </map:transform>
                    <map:serialize type="fo2pdf"/>
                </map:when>
                <map:otherwise>
                    <map:generate src="cocoon://{1}{2}.fo"/>
                    <map:transform type="i18n">
                      <map:parameter name="locale" value="{../locale}"/>
                    </map:transform>
                    <map:serialize type="fo2pdf"/>
                </map:otherwise>
            </map:select>
        </map:match>

This next section of the pipeline matches requests for PDF documents in a manner similar to the previous match for XSL:FO documents. If the PDF document exists in the project, it is passed directly to the client. If the XSL:FO document exists for the requested PDF, the XSL:FO is serialized by the fo2pdf serializer and passed to the client as PDF (after i18n is handled by the i18n transformer). When neither PDF nor XSL:FO exists, XSL:FO is generated by the match described above, i18n elements are processed for the current locale, and the result is serialized as PDF.
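The three-way choice made by the PDF matcher can be summarised in a short Python sketch. The project_files set is a hypothetical stand-in for the locationmap lookups ({lm:project...}) combined with the exists selector; the function only names the branch that would run and does not produce any output format.

```python
def pdf_strategy(directory, name, project_files):
    """Describe which branch of the PDF matcher would handle a request."""
    base = directory + name
    if base + ".pdf" in project_files:
        return "read existing PDF"
    if base + ".fo" in project_files:
        return "serialize existing XSL:FO with fo2pdf"
    return "generate XSL:FO, then serialize with fo2pdf"

files = {"docs/manual.pdf", "docs/report.fo"}   # made-up project contents
print(pdf_strategy("docs/", "manual", files))   # read existing PDF
print(pdf_strategy("docs/", "report", files))   # serialize existing XSL:FO with fo2pdf
print(pdf_strategy("docs/", "index", files))    # generate XSL:FO, then serialize with fo2pdf
```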

HTML output

Generating HTML pages is more complicated, because we have to merge the page body with a menu and tabs, and then add a header and footer. Here is the *.html matcher in sitemap.xmap ...

          
          1   <map:match pattern="*.html">
          2     <map:aggregate element="site">
          3       <map:part src="cocoon:/skinconf.xml"/>
          4       <map:part src="cocoon:/build-info"/>
          5       <map:part src="cocoon:/tab-{0}"/>
          6       <map:part src="cocoon:/menu-{0}"/>
          7       <map:part src="cocoon:/body-{0}"/>
          8     </map:aggregate>
          9     <map:call resource="skinit">
          10      <map:parameter name="type" value="transform.site.xhtml"/>
          11      <map:parameter name="path" value="{0}"/>
          12    </map:call>
          13  </map:match>

So index.html is formed by aggregating skinconf.xml, build-info, tab-index.html, menu-index.html and body-index.html, and then applying the site-to-xhtml.xsl stylesheet to the result.

The conversion from transform.site.xhtml to site-to-xhtml.xsl (line 10 above) is handled by the locationmap in this fragment from locationmap-transforms.xml:

          
          <match pattern="transform.*.*">
            <select>
              <location src="{properties:skins-dir}{forrest:forrest.skin}/xslt/html/{1}-to-{2}.xsl" />
              <location src="{forrest:forrest.context}/skins/{forrest:forrest.skin}/xslt/html/{1}-to-{2}.xsl"/>
              <location src="{forrest:forrest.context}/skins/common/xslt/html/{1}-to-{2}.xsl"/>
              <location src="{forrest:forrest.stylesheets}/{1}-to-{2}.xsl"/>
            </select>
          </match>
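As a rough model of this lookup, the Python sketch below turns a hint such as transform.site.xhtml into a file name of the form {1}-to-{2}.xsl and returns the first candidate directory that actually contains it. It assumes that the select element picks the first location that exists; the directory names and the function name are hypothetical, and the hint parsing is simpler than the locationmap's own wildcard matching.

```python
import os
import re
import tempfile

HINT = re.compile(r"^transform\.([^.]*)\.([^.]*)$")

def resolve_transform(hint, search_dirs):
    """Return the first existing '<from>-to-<to>.xsl' under search_dirs, else None."""
    match = HINT.match(hint)
    if match is None:
        return None
    filename = "{}-to-{}.xsl".format(*match.groups())
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None

with tempfile.TemporaryDirectory() as root:
    project_skin = os.path.join(root, "project-skin")   # hypothetical directories
    common_skin = os.path.join(root, "common-skin")
    os.makedirs(project_skin)
    os.makedirs(common_skin)
    open(os.path.join(common_skin, "site-to-xhtml.xsl"), "w").close()
    found = resolve_transform("transform.site.xhtml", [project_skin, common_skin])
    print(os.path.basename(os.path.dirname(found)))  # common-skin
```

Listing the project's own skin directory first is what lets a project override a stylesheet simply by providing a file with the same name.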

There is a nearly identical matcher for HTML files in subdirectories:

          
          <map:match pattern="**/*.html">
            <map:aggregate element="site">
              <map:part src="cocoon:/skinconf.xml"/>
              <map:part src="cocoon:/build-info"/>
              <map:part src="cocoon:/{1}/tab-{2}.html"/>
              <map:part src="cocoon:/{1}/menu-{2}.html"/>
              <map:part src="cocoon:/{1}/body-{2}.html"/>
            </map:aggregate>
            <map:call resource="skinit">
              <map:parameter name="type" value="transform.site.xhtml"/>
              <map:parameter name="path" value="{0}"/>
            </map:call>
          </map:match>

Page body

Here is the matcher which generates the page body:

          
          1   <map:match pattern="**body-*.html">
          2     <map:generate src="cocoon:/{1}{2}.xml"/>
          3     <map:transform type="idgen"/>
          4     <map:transform src="{lm:transform.xml.xml-xpointer-attributes}"/>
          5     <map:transform type="xinclude"/>
          6     <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
          7     <map:transform src="{lm:transform.html.broken-links}" />
          8     <map:call resource="skinit">
          9       <map:parameter name="type" value="transform.xdoc.html"/>
          10      <map:parameter name="path" value="{1}{2}.html"/>
          11      <map:parameter name="notoc" value="false"/>
          12    </map:call>
          13  </map:match>

In our matcher pattern, {1} will be the directory (if any) and {2} will be the filename.
First, we obtain XML content from a source pipeline.
We then apply a custom-written IdGeneratorTransformer, which ensures that every <section> has an "id" attribute by generating one from the <title> if none is supplied. For example, the idgen transformer will transform:
```
              <section>
              <title>How to boil eggs</title>
              ...
            
```
into:
```
              <section id="How+to+boil+eggs">
              <title>How to boil eggs</title>
              ...
            
```
Later, the document-to-html.xsl stylesheet will create an <a name> element for every section, allowing this section to be referred to as index.html#How+to+boil+eggs. document-to-html.xsl is looked up by the key transform.xdoc.html in the locationmap in line 9 above. See locationmap-transforms.xml for this match.
We then expand XInclude elements,
rewrite links,
and finally apply the stylesheet that generates a fragment of HTML (minus the outer elements like <html> and <body>) suitable for merging with the menu and tabs.
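The id generation step can be approximated in Python as follows. This is a sketch that reproduces only the example above (spaces become "+"); it is not the IdGeneratorTransformer itself, and how the real transformer handles other characters or duplicate titles is not covered here. The sample document is made up.

```python
import xml.etree.ElementTree as ET

def add_section_ids(root):
    """Give every <section> lacking an id one derived from its <title>."""
    for section in root.iter("section"):
        if section.get("id"):
            continue                      # keep ids supplied by the author
        title = section.findtext("title")
        if title:
            section.set("id", title.strip().replace(" ", "+"))
    return root

doc = ET.fromstring(
    "<document><body>"
    "<section><title>How to boil eggs</title></section>"
    '<section id="custom"><title>Other</title></section>'
    "</body></document>"
)
add_section_ids(doc)
print([s.get("id") for s in doc.iter("section")])  # ['How+to+boil+eggs', 'custom']
```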

Page menu

In the sitemap.xmap file, the matcher generating HTML for the menu is:

          
          <map:match pattern="**menu-*.html">
            <map:generate src="cocoon:/{1}book-{2}.html"/>
            <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
            <map:transform src="{lm:transform.html.broken-links}" />
            <map:call resource="skinit">
              <map:parameter name="type" value="transform.book.menu"/>
              <map:parameter name="path" value="{1}{2}.html"/>
            </map:call>
            <map:serialize type="xml"/>
          </map:match>

We get XML from a "book" pipeline, rewrite links, and apply the book-to-menu.xsl stylesheet to generate HTML.

How the menu XML is actually generated (the **book-*.html pipeline) is sufficiently complex to require a section of its own.

Page tabs

          
          <map:match pattern="**tab-*.html">
            <map:mount uri-prefix="" src="tabs.xmap" check-reload="yes" />
          </map:match>

And the match from tabs.xmap:

          
          1   <map:match pattern="**tab-*.html">
          2     <map:generate src="{lm:project.tabs.xml}"/>
          3     <map:transform type="xinclude"/>
          4     <map:select type="config">
          5       <map:parameter name="value" value="{properties:forrest.i18n}"/>
          6       <map:when test="true">
          7         <map:act type="locale">
          8           <map:transform src="{lm:transform.book.book-i18n}"/>
          9           <map:transform type="i18n">
          10            <map:parameter name="locale" value="{locale}"/>
          11          </map:transform>
          12        </map:act>
          13      </map:when>
          14    </map:select>
          15    <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
          16    <map:call resource="skinit">
          17      <map:parameter name="type" value="transform.tab.menu"/>
          18      <map:parameter name="path" value="{1}{2}.html"/>
          19    </map:call>
          20  </map:match>

All the smarts are in the tab-to-menu.xsl stylesheet (resolved by the locationmap in line 17), which needs to choose the correct tab based on the current path. Currently, a "longest matching path" algorithm is implemented. See the tab-to-menu.xsl stylesheet for details.
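A "longest matching path" selection can be sketched in Python as below. This illustrates the general technique rather than the logic of tab-to-menu.xsl; it assumes each tab declares a directory prefix, and the tab ids and directories are made up.

```python
def pick_tab(path, tabs):
    """Return the id of the tab whose directory is the longest prefix of path."""
    best_id, best_len = None, -1
    for tab_id, tab_dir in tabs.items():
        if path.startswith(tab_dir) and len(tab_dir) > best_len:
            best_id, best_len = tab_id, len(tab_dir)
    return best_id

# Hypothetical tabs: id -> directory prefix ("" matches every page).
tabs = {"home": "", "community": "community/", "howtos": "community/howto/"}
print(pick_tab("community/howto/index.html", tabs))  # howtos
print(pick_tab("community/index.html", tabs))        # community
print(pick_tab("index.html", tabs))                  # home
```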

Many resources are resolved by the locationmap. This allows us to provide many alternative locations for a file without cluttering up the sitemap with multiple processing paths. We use a strict naming convention to help make the sitemaps more readable. This is described in the Locationmap documentation.

The "book" pipeline is defined in sitemap.xmap as:

        
        <map:match pattern="**book-*.html">
          <map:mount uri-prefix="" src="menu.xmap" check-reload="yes" />
        </map:match>

This means that it is defined in the menu.xmap file. In there we find the real definition, which is quite complicated, because there are three supported menu systems (see menus and linking). We will not go through the sitemap itself (menu.xmap), but will instead describe the logical steps involved:

Take site.xml and expand hrefs so that they are all root-relative.
Depending on the forrest.menu-scheme property, we now apply one of the two algorithms for choosing a set of menu links (described in menu generation):
- For "@tab" menu generation, we first ensure each site.xml node has a tab attribute (inherited from a parent if necessary), and then pass through nodes whose tab attribute matches that of the "current" node.
  
  For example, say our current page's path is community/howto/index.html. In site.xml we look for the node with this "href" and discover its "tab" attribute value is "howtos". We then prune the site.xml-derived content to contain only nodes with tab="howtos".
  
  All this is done with XSLT, so the sitemap snippet does not reveal this complexity:
```
    <map:transform src="resources/stylesheets/site-to-site-normalizetabs.xsl" />
    <map:transform src="resources/stylesheets/site-to-site-selectnode.xsl">
      <map:parameter name="path" value="{1}{2}"/>
    </map:transform>
```
- For "directory" menu generation, we simply use an XPathTransformer to include only pages in the current page's directory, or below:
```
<map:transform type="xpath">
  <map:parameter name="include" value="//*[@href='{1}']" />
</map:transform>
```
  Here, the "{1}" is the directory part of the current page. So if our current page is community/howto/index.html then "{1}" will be community/howto/ and the transformer will include all nodes in that directory.
We now have a site.xml subset relevant to our current page.
The "href" nodes in this are then made relative to the current page.
The XML is then transformed into a legacy "book.xml" format, for compatibility with existing stylesheets, and this XML format is returned (hence the name of the matcher: **book-*.html).
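The "@tab" selection described in the steps above can be sketched in Python. This is an illustration of the logic only, not a port of the XSLT: it returns a flat list of hrefs instead of a pruned tree, and it assumes the hrefs have already been expanded to be root-relative. The sample site.xml content and the function names are made up.

```python
import xml.etree.ElementTree as ET

def inherit_tabs(node, inherited=None):
    """Copy the nearest ancestor's tab attribute onto nodes that lack one."""
    tab = node.get("tab", inherited)
    if tab is not None:
        node.set("tab", tab)
    for child in node:
        inherit_tabs(child, tab)

def hrefs_for_current_tab(site, current_href):
    """List hrefs of nodes sharing a tab with the node for current_href."""
    inherit_tabs(site)
    current = next(n for n in site.iter() if n.get("href") == current_href)
    return [n.get("href") for n in site.iter()
            if n.get("href") and n.get("tab") == current.get("tab")]

site = ET.fromstring(
    '<site tab="home">'
    '<index href="index.html"/>'
    '<community href="community/" tab="community">'
    '<howto href="community/howto/" tab="howtos">'
    '<start href="community/howto/index.html"/>'
    '</howto>'
    '<mail href="community/mail.html"/>'
    '</community>'
    '</site>'
)
print(hrefs_for_current_tab(site, "community/howto/index.html"))
# ['community/howto/', 'community/howto/index.html']
```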

In numerous places in sitemap.xmap you will see the "linkrewriter" transformer in action. For example:

<map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>

This statement is Cocoon's linking system in action. A full description is provided in Menus and Linking. Here we describe the implementation of linking.

Cocoon foundations: Input Modules

The implementation of site: linking is heavily based on Cocoon Input Modules, a little-known but quite powerful aspect of Cocoon. Input Modules are generic Components which simply allow you to look up a value with a key. The value is generally dynamically generated, or obtained by querying an underlying data source.

In particular, Cocoon contains an XMLFileModule, which lets one look up the value of an XML node, by interpreting the key as an XPath expression. Cocoon also has a SimpleMappingMetaModule, which allows the key to be rewritten before it is used to look up a value.

The idea for putting these together to rewrite "site:" links was described in this thread. The idea is to write a Cocoon Transformer that triggers on encountering <link href="scheme:address">, and interprets the scheme:address internal URI as inputmodule:key. The transformer then uses the named InputModule to look up the key value. The scheme:address URI is then rewritten with the found value. This transformer was implemented as LinkRewriterTransformer, currently distributed as a "block" in Cocoon 2.1 (see API docs).

Implementing "site:" rewriting

Using the above components, "site:" URI rewriting is accomplished as follows.

cocoon.xconf

First, we declare all the input modules we will be needing:


<!-- For the site: scheme -->
<component-instance
  class="org.apache.cocoon.components.modules.input.XMLFileModule"
  logger="core.modules.xml" name="linkmap"/>

<!-- Links to URIs within the site -->
<component-instance
  class="org.apache.cocoon.components.modules.input.SimpleMappingMetaModule"
  logger="core.modules.mapper" name="site"/>

<!-- Links to external URIs, as distinct from 'site' URIs -->
<component-instance
  class="org.apache.cocoon.components.modules.input.SimpleMappingMetaModule"
  logger="core.modules.mapper" name="ext"/>

linkmap will provide access to the contents of site.xml; for example, linkmap:/site/about/index/@href would return the value "index.html".
site provides a "mask" over linkmap such that site:index expands to linkmap:/site//index/@href
ext provides another "mask" over linkmap, such that ext:ant would expand to linkmap:/site/external-refs//ant/@href

However, at this point we have only declared the input modules. They will be configured in sitemap.xmap as described in the next section.
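The two "masks" amount to wrapping the key in a prefix and a suffix before handing it to linkmap. The Python sketch below shows that expansion and a simplified lookup against a small linkmap document. The sample XML, the external URL and the function names are assumptions for illustration, and the lookup uses ElementTree's limited path support rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Prefix and suffix per scheme, as in the descriptions above.
SCHEMES = {
    "site": ("/site//", "/@href"),
    "ext": ("/site/external-refs//", "/@href"),
}

def expand(link):
    """Rewrite 'scheme:key' into the linkmap expression it stands for."""
    scheme, key = link.split(":", 1)
    prefix, suffix = SCHEMES[scheme]
    return "linkmap:" + prefix + key + suffix

def lookup(linkmap_root, link):
    """Simplified evaluation: find the first descendant named key, return its href."""
    scheme, key = link.split(":", 1)
    scope = linkmap_root if scheme == "site" else linkmap_root.find("external-refs")
    node = scope.find(".//" + key)
    return node.get("href") if node is not None else None

linkmap = ET.fromstring(
    '<site><about><index href="index.html"/></about>'
    '<external-refs><ant href="http://ant.apache.org/"/></external-refs></site>'
)
print(expand("site:index"))           # linkmap:/site//index/@href
print(expand("ext:ant"))              # linkmap:/site/external-refs//ant/@href
print(lookup(linkmap, "site:index"))  # index.html
print(lookup(linkmap, "ext:ant"))     # http://ant.apache.org/
```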

sitemap.xmap

Now in the sitemap, we define the LinkRewriterTransformer, and insert it into any pipelines which deal with user-editable XML content:


....
<!-- Rewrites links, e.g. transforming
     href="site:index" to href="../index.html"
-->
<map:transformer name="linkrewriter"
  logger="sitemap.transformer.linkrewriter"
  src="org.apache.cocoon.transformation.LinkRewriterTransformer">
  <link-attrs>href src</link-attrs>
  <schemes>site ext</schemes>

  <input-module name="site">
    <input-module name="linkmap">
      <file src="{src}" reloadable="false" />
    </input-module>
    <prefix>/site//</prefix>
    <suffix>/@href</suffix>
  </input-module>
  <input-module name="ext">
    <input-module name="linkmap">
      <file src="{src}" reloadable="false" />
    </input-module>
    <prefix>/site/external-refs//</prefix>
    <suffix>/@href</suffix>
  </input-module>
</map:transformer>
....
....
<map:match pattern="**body-*.html">
  <map:generate src="cocoon:/{1}{2}.xml"/>
  <map:transform type="idgen"/>
  <map:transform type="xinclude"/>
  <map:transform type="linkrewriter" src="cocoon:/{1}linkmap-{2}.html"/>
  ...
</map:match>

As you can see, our three input modules are configured as part of the LinkRewriterTransformer's configuration.

Most deeply nested, we have:
```
<input-module name="linkmap">
  <file src="{src}" reloadable="false" />
</input-module>
```
The "{src}" text is expanded to the value of the "src" attribute in the "linkrewriter" instance, namely "cocoon:/{1}linkmap-{2}.html". Thus the linkmap module reads dynamically generated XML specific to the current request.
One level out, we configure the "site" and "ext" input modules, to map onto our dynamically configured "linkmap" module.
Then at the outermost level, we configure the "linkrewriter" transformer. First we tell it which attributes to consider rewriting:
```
<link-attrs>href src</link-attrs>
<schemes>site ext</schemes>
```
So, "href" and "src" attributes starting with "site:" or "ext:" are rewritten.

By nesting the "site" and "ext" input modules in the "linkrewriter" configuration, we tell "linkrewriter" to use these two input modules when rewriting links.

The end result is that, for example, the source XML for the community/body-index.html page has its links rewritten by an XMLFileModule reading XML from cocoon:/community/linkmap-index.html
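The effect of this configuration on a document can be imitated in Python: visit every element, look at the configured link attributes, and rewrite only values whose scheme is one of the configured schemes. This is a sketch of the behaviour described above, not the LinkRewriterTransformer code. The resolve callback stands in for the nested input modules, and the link table and sample document are made up; an unknown key raises KeyError here, which says nothing about what Cocoon does in that case.

```python
import xml.etree.ElementTree as ET

LINK_ATTRS = ("href", "src")   # from <link-attrs>
SCHEMES = ("site", "ext")      # from <schemes>

def rewrite_links(root, resolve):
    """Rewrite scheme:key values in link attributes using resolve(scheme, key)."""
    for element in root.iter():
        for attr in LINK_ATTRS:
            value = element.get(attr)
            if value is None or ":" not in value:
                continue
            scheme, key = value.split(":", 1)
            if scheme in SCHEMES:
                element.set(attr, resolve(scheme, key))
    return root

# Hypothetical, already-relativized linkmap for a page in community/.
links = {("site", "index"): "../index.html", ("ext", "ant"): "http://ant.apache.org/"}
doc = ET.fromstring(
    '<p><link href="site:index">Home</link>'
    '<link href="ext:ant">Ant</link>'
    '<link href="http://example.org/">Other</link></p>'
)
rewrite_links(doc, lambda scheme, key: links[(scheme, key)])
print([a.get("href") for a in doc.iter("link")])
# ['../index.html', 'http://ant.apache.org/', 'http://example.org/']
```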

Dynamically generating a linkmap

Why do we need this "linkmap" pipeline generating dynamic XML from site.xml, instead of just using site.xml directly? The reasons are described in the linkmap RT: we need to concatenate @hrefs and add dot-dots to the paths, depending on which directory the linkee is in. This is done with the following pipelines in linkmap.xmap ...


<!-- site.xml with @href's appended to be context-relative. -->
<map:match pattern="abs-linkmap">
  <map:generate src="content/xdocs/site.xml" />
  <map:transform src="resources/stylesheets/absolutize-linkmap.xsl" />
  <map:serialize type="xml" />
</map:match>

<!-- Linkmap for regular pages -->
<map:match pattern="**linkmap-*">
  <map:generate src="cocoon://abs-linkmap" />
  <map:transform src="resources/stylesheets/relativize-linkmap.xsl">
    <map:parameter name="path" value="{1}{2}" />
    <map:parameter name="site-root" value="{conf:project-url}" />
  </map:transform>
  <map:serialize type="xml" />
</map:match>

You can try these URIs out directly on a live Forrest to see what is going on (for example, Forrest's own abs-linkmap).
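The "dot-dots" step is the easiest part to picture: given a root-relative href and the path of the current page, compute the href relative to that page's directory. The Python sketch below does this with posixpath.relpath. It is an analogy for what relativize-linkmap.xsl achieves, not a translation of it; the function name and sample paths are made up, and relpath drops trailing slashes, so directory hrefs would need extra care.

```python
import posixpath

def relativize(target_href, page_path):
    """Express a root-relative href relative to the directory of page_path."""
    page_dir = posixpath.dirname(page_path) or "."
    return posixpath.relpath(target_href, start=page_dir)

print(relativize("index.html", "community/index.html"))          # ../index.html
print(relativize("community/faq.html", "community/index.html"))  # faq.html
print(relativize("community/faq.html", "index.html"))            # community/faq.html
```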
