5

Customizing DocBook

$Revision: 7799 $

$Date: 2008-03-03 12:32:19 -0500 (Mon, 03 Mar 2008) $


For the applications you have in mind, DocBook “out of the box” may not be exactly what you need. Perhaps you need additional inline elements or perhaps you want to remove elements that you never want your authors to use. By design, DocBook makes this sort of customization easy.

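This chapter explains how to make your own customization layer. You might do this in order to add new elements, remove elements, change the structure of existing elements, add or remove attributes, or narrow the range of values that an attribute allows.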

You can use customization layers to extend DocBook or subset it. Creating a schema that is a strict subset of DocBook means that all of your instances are still completely valid DocBook instances, which may be important to your tools and stylesheets, and to other people with whom you share documents. An extension adds new structures or changes the schema in a way that is not compatible with DocBook. Extensions can be very useful, but they can have a significant impact on your environment.

Customization layers can be as small as restricting an attribute value or as large as adding an entirely different hierarchy on top of the inline elements.

Should You Do This?

Changing a schema can have a wide-ranging impact on the tools and stylesheets that you use. It can have an impact on your authors and on your legacy documents. This is especially true if you make an extension. If you rely on your support staff to install and maintain your authoring and publishing tools, check with them before you invest a lot of time modifying the schema. There may be additional issues that are outside your immediate control. Proceed with caution.

That said, DocBook is designed to be easy to modify. This chapter assumes that you are comfortable with XML and RELAX NG grammar syntax, but the examples presented should be a good springboard to learning the syntax if it's not already familiar to you.

If You Change DocBook, It's Not DocBook Anymore!

Starting with DocBook V5.0, DocBook is identified by its namespace, http://docbook.org/ns/docbook. The particular version of DocBook to which an element conforms is identified by its version attribute. If the element does not specify a version, the version of the closest ancestor DocBook element that does specify a version is assumed. The version attribute is required on the root DocBook element.

If you make any changes to the DocBook schema, it is imperative that you provide an alternative version identifier that you use for the schema and the modules you changed. The license agreement under which DocBook is distributed gives you complete freedom to change, modify, reuse, and generally hack the schema in any way you want, except that you must not call your alterations “DocBook”.

The following format is recommended:

base_version-[subset|extension|variant] (name[-version])+

For example, version 1.0 of Acme Corporation's extension of DocBook V5.0 could be identified as “5.0-extension acme-1.0”.

A document that relied on version 3.2 of Example Corporation's subset of DocBook V5.0, MathML 2.0, and SVG 1.1 could be identified as “5.0-subset example-3.2 mathml-2.0 svg-1.1”.

If your schema is a proper subset, you can advertise this status by using the subset keyword in the version. If your schema contains any markup model extensions, you can advertise this status by using the extension keyword. If you'd rather not characterize your variant specifically as a subset or an extension, you can leave out this field entirely or, if you prefer, use the variant keyword.
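Because the version identifier is an ordinary attribute value, a customization layer can also require it. The following minimal sketch assumes that db.version.attribute is the pattern that defines version in docbook.rnc (Example 5.11, later in this chapter, refers to it by that name) and reuses the illustrative Acme identifier from above:

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   # Assumed pattern name. Wherever version appears, it must now name
   # this customization; the Acme identifier is illustrative only.
   db.version.attribute = attribute version { "5.0-extension acme-1.0" }
}

For a proper subset, you might instead allow a choice such as { "5.0" | "5.0-subset example-3.2" } so that documents labeled as plain DocBook V5.0 still validate.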

Public Identifiers

Although not directly supported by RELAX NG, in some cases it may still be valuable to identify a DocBook V5.0 customization layer with a public identifier. A public identifier for DocBook V5.0 is:

-//OASIS//DTD DocBook V5.0//EN

If you make any changes to the structure of the schema, it is imperative that you alter the public identifier that you use to identify it.

You should change both the owner identifier and the description. Formal public identifiers for the base DocBook modules would use the following syntax:

-//OASIS//text-class DocBook description Vversion//EN

Your own formal public identifiers should use the following syntax in order to record their DocBook derivation:

-//your-owner-ID//text-class DocBook Vversion-Based [Subset|Extension|Variant] your-descrip-and-version//lang

For example:

-//O'Reilly//DTD DocBook V5.0-Based Subset V1.1//EN

If your schema is a proper subset, you can advertise this status by using the Subset keyword in the description. If your schema contains any markup model extensions, you can advertise this status by using the Extension keyword. If you'd rather not characterize your variant specifically as a subset or an extension, you can leave out this field entirely, or, if you prefer, use the Variant keyword.

Customization Layers

A RELAX NG grammar is a collection of patterns. These patterns can be stored in a single file or in a collection of files that import each other. Patterns can augment each other in a variety of ways. A complete grammar is the logical union of the specified patterns.

For convenience, the DocBook grammar is distributed in a single file.

RELAX NG Syntax

There are two standard syntaxes for RELAX NG, an XML syntax and a “compact” text syntax. The two forms have the same expressive power; it is possible to transform between them with no loss of information.

Many users find the relative terseness of the compact syntax makes it a convenient form for reading and writing RELAX NG. That is the form we will use in the following examples.

DocBook Pattern Names

The names of the patterns used in a RELAX NG grammar are arbitrary; they have nothing to do with the names of the elements and attributes defined by the schema. The DocBook RELAX NG grammar follows a number of naming conventions to make it easier to navigate.

db.*.attlist

Defines the list of attributes associated with an element. For example, db.emphasis.attlist is the pattern that matches all of the attributes of the emphasis element.

db.*.attribute

Defines a single attribute. For example, db.conformance.attribute is the pattern that matches the conformance attribute on all of the elements where it occurs.

db.*.attributes

Defines a collection of attributes. For example, db.effectivity.attributes matches all of the effectivity attributes (arch, audience, etc.).

db.*.blocks

Defines a choice among a set of related block elements. For example, db.list.blocks is a pattern that matches any of the list elements.

db.*.contentmodel

Defines a fragment of content model shared by several elements.

db.*.enumeration

Defines an enumeration, usually one used in an attribute value. For example, db.revisionflag.enumeration is a pattern that matches the list of values that can be used as the value of a revisionflag attribute.

db.*.info

Defines the info element for a particular element. For example, db.example.info is the pattern that matches info on example.

Almost all of the info elements are the same, but they are described with distinct patterns so that customizers can change them selectively.

db.*.inlines

Defines a choice among a set of related inline elements. For example, db.link.inlines is a pattern that matches any of the linking-related elements.

db.*.role.attribute

Defines the role attribute for a particular element. For example, db.emphasis.role.attribute is the pattern that matches role on emphasis.

All of the role attributes are the same, but they are described with distinct patterns so that customizers can change them selectively.

db.*

Is the pattern that matches a particular DocBook element. For example, db.title is the pattern that matches title.

RELAX NG allows multiple patterns to match the same element, so sometimes these patterns come in flavors: for example, db.indexterm.singular, db.indexterm.startofrange, and db.indexterm.endofrange. Each of these patterns matches an indexterm with different attributes.

These are conventions, not hard and fast rules. There are patterns that don't follow these conventions.
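To see how the conventions help, the sketch below narrows one enumeration and requires a title in one element's info. The name db.sidebar.info is inferred from the db.*.info convention rather than checked against docbook.rnc; RELAX NG reports an error if a definition inside an include block has no counterpart in the included grammar, so a wrong guess fails loudly rather than silently.

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   # db.*.enumeration: drop "off" from the values allowed for revisionflag
   db.revisionflag.enumeration = "changed" | "added" | "deleted"

   # db.*.info: require a title in sidebar (pattern name assumed)
   db.sidebar.info = db._info.title.req
}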

The General Structure of Customization Layers

Although customization layers vary in complexity, most of them follow one of a few common structures.

In the most common case, you probably want to include all of DocBook, but you want to make some small changes. These customization layers tend to look like this:

namespace db = "http://docbook.org/ns/docbook"
# perhaps other namespace declarations

include "docbook.rnc"                            (1)

# new patterns and augmented patterns            (2)

1

Start by importing the base DocBook schema.

2

Then you can add new patterns or augment existing patterns.

If you want to completely replace a pattern (for example, to remove or completely change an element), the template is a little different.

namespace db = "http://docbook.org/ns/docbook"
# perhaps other namespace declarations

include "docbook.rnc" {
   # redefinitions of DocBook patterns           (1)
}

# new patterns and augmented patterns            (2)

1

You can redefine patterns in the body of an include statement. These definitions completely replace the same-named definitions in the included schema.

2

As before, patterns outside the include statement can augment existing patterns (even redefined ones).

There are other possibilities as well; these examples are illustrative, not exhaustive.
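Outside the include, the compact syntax offers two combining operators for augmenting a pattern: “|=” adds an alternative to an existing choice, and “&=” interleaves additional content, typically attributes, into an existing pattern. In this sketch the “moved” value, the acme namespace URI, and the acme:reviewer attribute are all hypothetical, and db.para.attlist is assumed to follow the db.*.attlist convention:

namespace db = "http://docbook.org/ns/docbook"
namespace acme = "http://example.com/ns/acme"

include "docbook.rnc"

# |= : revisionflag now also accepts a hypothetical "moved" value
db.revisionflag.enumeration |= "moved"

# &= : para now accepts an optional, hypothetical acme:reviewer attribute
db.para.attlist &= attribute acme:reviewer { text }?

Both changes are extensions in the sense described at the start of this chapter, so a schema like this one should use the extension keyword in its version identifier.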

Writing, Testing, and Using a Customization Layer

The procedure for creating, testing, and using a customization layer is always about the same. In this section, we'll go through the process in some detail. The rest of the sections in this chapter describe a range of useful customization layers.

Deciding What to Change

If you're considering writing a customization layer, there must be something that you want to change. Perhaps you want to add an element or attribute, remove one, or change some other aspect of the schema.

Adding an element, particularly an inline element, is one possibility. If you're writing about cryptography, you might want to add a “cleartext” element, for example.

Deciding How to Change a Customization Layer

Figuring out which patterns to change may be the hardest part of the process. An existing element that is similar to the one you want usually provides a good model.

Depending on the exact focus of your document, there are probably several candidates. In this case, all of the following look plausible: technical inlines, programming inlines, and domain inlines. Let's suppose you choose the domain inlines.

As shown in Example 5.1, “Adding cleartext with a Customization Layer”, your customization would import the DocBook schema, extend the domain inlines, and then provide a pattern that matches the new element.

Example 5.1. Adding cleartext with a Customization Layer

namespace db = "http://docbook.org/ns/docbook"
default namespace = "http://docbook.org/ns/docbook"

include "docbook.rnc"

db.domain.inlines |= db.cleartext                (1)

# Define a new cleartext element:                (2)

db.cleartext.role.attribute = attribute role { text }   (3)
db.cleartext.attlist =                           (4)
   db.cleartext.role.attribute?
 & db.common.attributes
 & db.common.linking.attributes

db.cleartext =                                   (5)
   element cleartext {
      db.cleartext.attlist,
      db._text
   }

1

The “|=” operator adds a new choice to a pattern. So this line makes “db.cleartext” a valid option anywhere that “db.domain.inlines” is allowed.

2

Next, we create patterns for the cleartext element. The convention in the DocBook schema is to create three patterns: one for the role attribute, one for the attribute list, and one for the element. We follow that convention here in case someone wants to customize our customization.

3

Defining a separate pattern for the role attribute makes it easy for customizers to change it on a per-element basis.

4

Defining a separate pattern for the attributes makes it easy for customizers to change them on a per-element basis.

5

The pattern for the element pulls it all together. The pattern “db._text” matches text plus a number of ubiquitous or nearly ubiquitous inlines. It's recommended unless you really, really want only text.


Using Your Customization Layer

Using a customization layer is as simple as referring to it instead of the base DocBook schema where your tools offer the option.

Testing Your Work

Schemas, by their nature, contain many complex, interrelated patterns. Whenever you make a change to the schema, it's always wise to use a validator to double-check your work.

Start by validating a document that's plain, vanilla DocBook, one that you know is valid according to the DocBook standard schema. This will help you identify errors that you've introduced to the schema itself. After you are confident that the schema is correct, begin testing with instances that you expect (and don't expect) to be valid against it.

Removing Elements

DocBook has a large number of elements. In some authoring environments, it may be useful or necessary to remove some of these elements.

Removing msgset

The msgset element is a favorite target. It has a complex internal structure designed for describing interrelated error messages, especially on systems that may produce messages from several different components. Many technical documents can do without it, and removing it leaves one less complexity to explain to your authors.

Example 5.2, “Removing msgset” shows a customization layer that removes the msgset element from DocBook:

Example 5.2. Removing msgset

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.msgset = notAllowed
}

The complexity of msgset is really in its msgentry children. DocBook V4.5 introduced a simple alternative, simplemsgentry. Example 5.3, “Removing msgentry” demonstrates how you could allow msgset but only support the simpler alternative.

Example 5.3. Removing msgentry

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.msgentry = notAllowed
}

Closer examination of the msgentry content model will reveal that it contains a number of descendants. It isn't necessary, but it wouldn't be wrong, to define their patterns as notAllowed as well.
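If you do want to remove them, the following sketch extends Example 5.3. The list of descendants reflects our reading of the msgentry content model, and the pattern names assume the db.* element convention; msgtext and msgexplan are kept because simplemsgentry uses them too.

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.msgentry = notAllowed

   # Elements assumed to occur only inside msgentry
   db.msg = notAllowed
   db.msgmain = notAllowed
   db.msgsub = notAllowed
   db.msgrel = notAllowed
   db.msginfo = notAllowed
   db.msglevel = notAllowed
   db.msgorig = notAllowed
   db.msgaud = notAllowed
}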

Removing Computer Inlines

DocBook contains a large number of computer inlines that define a domain-specific vocabulary. If you're working in another domain, many of them may be unnecessary.

They're defined in a set of patterns that ultimately roll up to the “db.domain.inlines” pattern. If you make that pattern “notAllowed”, you'll remove them all in one fell swoop.

Example 5.4. Removing Computer Inlines

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.domain.inlines = notAllowed
}


If you want to be more selective, you might consider making one or more of the set not allowed instead: “db.error.inlines”, errors and error messages; “db.gui.inlines”, GUI elements; “db.keyboard.inlines”, key and keyboard elements; “db.markup.inlines”, markup elements; “db.math.inlines”, mathematical expressions; “db.os.inlines”, operating system inlines; and “db.programming.inlines”, programming-related inlines.

It's likely that a customization layer that removed this many technical inlines would also remove some larger technical structures (msgset, funcsynopsis).
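A more selective layer might, for example, keep the programming and markup inlines but drop the GUI and keyboard inlines, which a document that never describes a user interface is unlikely to need. A minimal sketch using two of the patterns listed above:

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   # Only these two groups are removed; the other domain inlines remain
   db.gui.inlines = notAllowed
   db.keyboard.inlines = notAllowed
}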

Removing Synopsis Elements

Another possibility is removing the complex synopsis elements. The customization layer in Example 5.5, “Removing CmdSynopsis and FuncSynopsis” removes cmdsynopsis and funcsynopsis.

Example 5.5. Removing CmdSynopsis and FuncSynopsis

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.funcsynopsis = notAllowed
   db.cmdsynopsis = notAllowed
}
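The object-oriented synopsis elements can be removed in the same way. This sketch assumes that their patterns follow the db.* element convention:

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   # Pattern names assumed from the db.* convention
   db.classsynopsis = notAllowed
   db.methodsynopsis = notAllowed
   db.constructorsynopsis = notAllowed
   db.destructorsynopsis = notAllowed
   db.fieldsynopsis = notAllowed
}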

Removing Sectioning Elements

Perhaps you want to restrict your authors to only three levels of sectioning. To do that, you must remove the sect4 and sect5 elements, as shown in Example 5.6, “Removing sect4 and sect5 Elements”.

Example 5.6. Removing sect4 and sect5 Elements

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.sect4 = notAllowed

   # Strictly speaking, we don't need to remove sect5 because, having removed
   # sect4, there's no way to reach it. But it seems cleaner to do so.
   db.sect5 = notAllowed
}

This technique works if your authors are using numbered sections. You could require them to do so by removing section. But suppose instead you want to allow them to use recursive sections and still limit them to only three levels.

One way to do this would be to define new “section2” and “section3” patterns, as shown in Example 5.7, “Limiting recursive sections to three levels”.

Example 5.7. Limiting recursive sections to three levels

namespace db = "http://docbook.org/ns/docbook"
default namespace = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.section =
      element section {
         db.section.attlist,
         db.section.info,
         db.recursive.blocks.or.section2s,
         db.navigation.components*
      }
}

db.recursive.section2s = (db.section2+, db.simplesect*) | db.simplesect+

db.recursive.blocks.or.section2s =
  (db.all.blocks+, db.recursive.section2s?) | db.recursive.section2s

db.section2 =
   element section {
      db.section.attlist,
      db.section.info,
      db.recursive.blocks.or.section3s,
      db.navigation.components*
   }

db.recursive.section3s = (db.section3+, db.simplesect*) | db.simplesect+

db.recursive.blocks.or.section3s =
  (db.all.blocks+, db.recursive.section3s?) | db.recursive.section3s

db.section3 =
   element section {
      db.section.attlist,
      db.section.info,
      db.all.blocks+,
      db.navigation.components*
   }

Another solution, assuming your validation environment supports Schematron, is simply to add a new rule, as shown in Example 5.8, “Limiting recursive sections to three levels”.

Example 5.8. Limiting recursive sections to three levels

namespace db = "http://docbook.org/ns/docbook"
namespace s = "http://www.ascc.net/xml/schematron"
default namespace = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.section =
      [
         s:pattern [
            name = "Limit depth of sections"
            s:rule [
               context = "db:section"
               s:assert [
                  test = "count(ancestor::db:section) < 3"
                  "Sections can be no more than three levels deep"
               ]
            ]
         ]
      ]
      element section {
         db.section.attlist,
         db.section.info,
         db.recursive.blocks.or.sections,
         db.navigation.components*
      }
}

Removing Admonitions from Table Entries

Sometimes what you want to do is not as simple as entirely removing an element. Instead, you want to remove it only from some contexts. The way to accomplish this task is to redefine the patterns used to calculate the elements allowed in those contexts.

Standard DocBook allows any inline element or any block element to appear in a table cell. You might decide that it's unreasonable to allow admonitions (note, caution, warning, etc.) to appear in a table cell.

In order to remove them, you must change what is allowed in an entry, as shown in Example 5.9, “Removing Admonitions from Tables”.

Example 5.9. Removing Admonitions from Tables

namespace db = "http://docbook.org/ns/docbook"
default namespace = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.entry = element entry {
      db.entry.attlist,
      (db.all.inlines* | db.some.blocks*)
   }
}

db.some.blocks =
   db.somenopara.blocks
 | db.para.blocks
 | db.extension.blocks

db.somenopara.blocks =
   db.list.blocks
 | db.formal.blocks
 | db.informal.blocks
 | db.publishing.blocks
 | db.graphic.blocks
 | db.technical.blocks
 | db.verbatim.blocks
 | db.bridgehead
 | db.remark
 | db.revhistory
 | db.indexterm
 | db.synopsis.blocks

The extent to which any particular change is easy or hard depends in part on how many patterns need to be changed. The DocBook Technical Committee is generally open to the idea of adding more patterns if it improves the readability of customization layers. Feel free to ask, if you think some refactoring would make your job easier.

Removing Attributes

Just as there may be more elements than you need, there may be more attributes.

Removing an Attribute

Suppose your processing system doesn't support “continued” lists. You want to remove the continuation attribute from the orderedlist element. There are two ways you could accomplish this. One way would be to redefine “db.orderedlist.continuation.attribute” as empty, so that it matches nothing; the other would be to redefine the “db.orderedlist.attlist” pattern so that it does not include the continuation attribute. Either would accomplish the goal.

Example 5.10. Removing continuations from orderedlist

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.orderedlist.continuation.attribute = empty
}

Subsetting the Common Attributes

DocBook defines a whole set of “common attributes”; these attributes appear on every element. Depending on how you're processing your documents, removing some of them can both simplify the authoring task and improve processing speed.

Some obvious candidates are:

Effectivity attributes (arch, os, ...)

If you're not using all of the effectivity attributes in your documents, you can get rid of up to seven attributes in one fell swoop.

lang

If you're not producing multilingual documents, you can remove lang.

remap

The remap attribute is designed to hold the name of a semantically equivalent construct from a previous markup scheme (for example, a Microsoft Word style template name, if you're converting from Word). If you're authoring from scratch, or not preserving previous constructs with remap, you can get rid of it.

xreflabel

If your processing system isn't using xreflabel, it's a candidate as well.

The customization layer in Example 5.11, “Removing Common Attributes” reduces the common attributes to just xml:id, version, and xml:lang.

Example 5.11. Removing Common Attributes

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.common.base.attributes =
      db.version.attribute?
    & db.xml.lang.attribute?
}

The xml:id attribute doesn't appear in this pattern because it is added by two other patterns: one in which it is required and one in which it is optional.
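If the effectivity attributes are the only ones you want to drop, a more selective sketch is to empty the pattern that groups them. This assumes that db.common.base.attributes interleaves db.effectivity.attributes, as the naming conventions suggest; interleaving with empty then contributes nothing:

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   # Removes arch, os, and the other effectivity attributes from every
   # element that uses the common attributes
   db.effectivity.attributes = empty
}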

Adding Elements: Adding a sect6

Adding a new inline or block element is generally a straightforward matter of creating a pattern for the new element and adding it to the right pattern with “|=”. But if your new element is more intimately related to the existing structure of the document, it may require more surgery.

Example 5.12, “Adding a sect6 Element” extends DocBook by adding a sect6 element.

Example 5.12. Adding a sect6 Element

namespace db = "http://docbook.org/ns/docbook"
default namespace = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.sect5.sections = (db.sect6+, db.simplesect*) | db.simplesect+
}

db.sect6.sections = db.simplesect+

db.sect6.status.attribute = db.status.attribute
db.sect6.role.attribute = attribute role { text }
db.sect6.attlist =
   db.sect6.role.attribute?
 & db.common.attributes
 & db.common.linking.attributes
 & db.label.attribute?
 & db.sect6.status.attribute?

db.sect6.info = db._info.title.req

db.sect6 =
   element sect6 {
      db.sect6.attlist,
      db.sect6.info,
      ((db.all.blocks+, db.sect6.sections?)
       | db.sect6.sections),
      db.navigation.components*
   }

Here we've redefined the sections allowed in sect5 (db.sect5.sections) to include sect6, and provided a pattern for sect6 itself.
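For comparison, adding an ordinary block element needs no redefinition at all; it follows the same recipe as cleartext in Example 5.1. In this sketch the hazard element is hypothetical, and db.admonition.blocks is assumed to be the pattern that groups note, caution, warning, and the other admonitions:

namespace db = "http://docbook.org/ns/docbook"
default namespace = "http://docbook.org/ns/docbook"

include "docbook.rnc"

# Make the hypothetical hazard element available wherever admonitions are
db.admonition.blocks |= db.hazard

db.hazard.role.attribute = attribute role { text }
db.hazard.attlist =
   db.hazard.role.attribute?
 & db.common.attributes
 & db.common.linking.attributes

db.hazard =
   element hazard {
      db.hazard.attlist,
      db.all.blocks+
   }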

Other Modifications: Classifying a Role

The role attribute, found on almost all of the elements in DocBook, is a text attribute that can be used to subclass an element. In some applications, it may be useful to modify the definition of role so that authors must choose one of a specific set of possible values.

In Example 5.13, “Changing role on procedure”, role on the procedure element is constrained to the values required or optional.

Example 5.13. Changing role on procedure

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   db.procedure.role.attribute = attribute role { "required" | "optional" }
}
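When a fixed list is too rigid, the value can be constrained by a pattern instead. This sketch uses the XML Schema token datatype, whose prefix, xsd, is predeclared in the compact syntax; the lowercase, hyphen-separated convention it enforces on emphasis is only an example:

namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
   # Accepts values such as "first-use"; rejects "FirstUse" and "first use"
   db.emphasis.role.attribute =
      attribute role { xsd:token { pattern = "[a-z]+(-[a-z]+)*" } }
}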