
Shared Document Elements

Most of the elements described here can be used with any of the document types.

Paragraphs

The text of the document is broken into paragraphs. A blank line within the body of the document will start a new paragraph. Note that leaving multiple blank lines will create multiple empty paragraphs, which is undesirable, even though these will often be ignored by the typesetting system. Likewise, leaving a blank line before a section title will generate an empty paragraph.

The formatting of text within a paragraph (for example, justification) is controlled by the formatter. Normally it is expected that linefeeds and tabs will be treated as blanks and multiple blanks will be treated as a single blank. The default behaviour may be changed for some elements (see section Program Code).

Some characters are special to SGML and sometimes need to be escaped when they appear in the text. These are usually:

a less-than symbol followed by a letter or certain punctuation characters (but less-than followed by a space or digit is no problem).
an ampersand followed by a letter, which is treated as an entity reference (often a character reference, see the gf User's Manual).

There are several methods for safely generating these symbols, e.g., use `&lt;' for < and `&amp;' for &.
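When the text is produced by a program, the simplest safe approach is to escape every `<' and `&', not only those followed by a letter; the extra escapes are harmless. The following Python sketch shows the idea. The function name `escape_sgml' is hypothetical and is not part of gf.

```python
def escape_sgml(text: str) -> str:
    """Replace '&' and '<' with the &amp; and &lt; entity references.

    '&' is replaced first so that the '&' introduced by '&lt;'
    is not escaped a second time.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


print(escape_sgml("#include <stdio.h>"))  # #include &lt;stdio.h>
print(escape_sgml("if (a && b)"))         # if (a &amp;&amp; b)
```

The `>' character is left alone, since it is not special in ordinary text.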

Occasionally certain characters will be used for a special purpose within particular elements, e.g., the colon within the `x-header' element in the smemo document type. See the gf User's Manual for information on how to generate any of these symbols.

To include symbols which are not part of the normal character set, several of the character entity sets defined by the ISO can be used. The entity sets do not need to be declared in the DTD subset (as is the case for the "general" DTD). The gf User's Manual has tables showing the characters available (see also section Character Entity Support).

The DTD also defines a few extra symbols which were forgotten by the ISO: `&TeX;', `&LaTeX;' and `&smiley;'.

Some formatting styles apply extra space at particular points in the text, such as a space following a full-stop, question mark, exclamation mark or colon. If this pattern arises for some other reason (e.g., an abbreviation) then the non-ISO character entity `&wsp;' can be used to represent a normal inter-word space, as in:

This is a sentence. This is an abbrev.&wsp;within a G.H.&wsp;sentence.
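If a document is generated or pre-processed by a script, known abbreviations can be marked automatically. The Python sketch below is a hypothetical helper (not part of gf) that replaces the single space after each listed abbreviation with `&wsp;', reproducing the example above.

```python
import re


def mark_abbreviations(text: str, abbreviations: list[str]) -> str:
    """Replace the space after each listed abbreviation with &wsp;.

    Only a single space that is followed by more text is replaced, so an
    abbreviation at the very end of the text is left alone.
    """
    for abbrev in abbreviations:
        # (?<!\S) anchors the abbreviation at the start of a word;
        # re.escape stops its full stops from matching any character.
        pattern = r"(?<!\S)" + re.escape(abbrev) + r" (?=\S)"
        text = re.sub(pattern, lambda m, a=abbrev: a + "&wsp;", text)
    return text


print(mark_abbreviations(
    "This is an abbrev. within a G.H. sentence.", ["abbrev.", "G.H."]))
# This is an abbrev.&wsp;within a G.H.&wsp;sentence.
```

A helper like this cannot tell when an abbreviation genuinely ends a sentence, so its output still needs checking by eye.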

Non-breakable spaces can be represented using the ISO `&nbsp;' character entity.

Addresses

The `<address>' element can be used in various places, such as the `<author>' element in the spaper document type. An address contains a line of text, possibly including embedded highlighted phrases or short quotations. Subelements can also be tagged if needed, for example if the name of a country is to be typeset differently to the rest of an address. Valid subelements are: `<org>', `<street>', `<city>', `<region>', `<country>', `<postcode>' and `<postbox>'.

An address line could be coded as something like:

<pobox/P.O. Box 1234/, <city/Wellington/,;

Note that the punctuation has not been included in the sub-elements.

Short Quotations

Short quotations can appear almost anywhere: they appear between pairs of double quotes. Note that it is always necessary to balance the quotation marks correctly. Short quotations may contain highlighted phrases (see section Highlighted Phrases) and other short quotations. However, in this latter case it is necessary to use full SGML tags (`<Q>' and `</Q>') rather than quotation marks, since otherwise the start of the embedded quotation would be incorrectly identified as the end of the first.

Sections

Sections may be nested hierarchically (chapter, section, sub-section...). These are represented in the text by the tags `<h1> <h2> <h3> <h4> <h5>', so that five levels of nesting are possible. For example:

This is text at the top level.
<h1>Common Features
This is text in the first level section.
<h2>Sections
This is down here in the subsection.

which represents a subsection called "Sections" within the top-level section "Common Features". Each section tag is followed by the section title, which is terminated by a blank line so that multi-line titles are possible. The numbering of the sections is performed (or not performed) automatically by the formatter.
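As a rough illustration of automatic numbering, the Python sketch below applies one common hierarchical scheme to a sequence of heading levels. This is an assumption about a typical formatter, not gf's actual code; a given formatter may number differently or not at all.

```python
def number_headings(levels: list[int]) -> list[str]:
    """Return numbers such as '1', '1.1', '2' for a sequence of <hN> levels."""
    counters = [0] * 5                      # one counter per level h1..h5
    numbers = []
    for level in levels:
        if not 1 <= level <= 5:
            raise ValueError(f"<h{level}> is outside the five available levels")
        counters[level - 1] += 1
        for deeper in range(level, 5):      # a new section restarts its subsections
            counters[deeper] = 0
        numbers.append(".".join(str(c) for c in counters[:level]))
    return numbers


print(number_headings([1, 2, 2, 1, 2, 3]))
# ['1', '1.1', '1.2', '2', '2.1', '2.1.1']
```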

Highlighted Phrases

A highlighted phrase is a short piece of text which is intended to stand out from its surroundings in some way. Often in printed documents this is achieved by changing the typeface, for example to bold or italic.

Simple emphasis can be marked using the `<emph>' tag:

You are <emph>not</emph> required to understand this.

which may be formatted as:

You are not required to understand this.

The terminating tag may not be omitted, although SGML minimalisation techniques can be used (see the gf User's Manual), e.g.,

You are <emph/not/ required to understand this.

As in the "general" DTD, the hp (highlighted phrase) tags can also be used, although in general they should be avoided in favour of tags better reflecting the reason for the highlighting. Default typefaces for the hp tags are:

hp0: roman
hp1: italic
hp2: bold
hp3: bold italic

and an example of usage is:

<hp1/Hi I'm Lighted/

There are many reasons why particular pieces of text are highlighted. It is not possible in a general purpose DTD to provide specialised tags for every situation.

One solution to this is the "architectural form" concept for the creation of specific elements which conform to a given structure. New elements can be created for a single document and will be associated with default processing rules (in this case those of the `<emph>' element).

A new highlighted phrase can be created in the document type declaration subset at the beginning of the document as follows:

<!DOCTYPE spaper PUBLIC "-//Houston//DTD snafu5//EN"[
   <!-- The name of a fish.  -->
   <!ENTITY % u.phrases "fishes-name">
]>

After this declaration, it is possible to use the `<fishes-name>' element in the same way as `<emph>'. If several customized phrase elements are required, use the construction:

"fishes-name | moa-sighting | sign-of-intelligent-life"

It is important to either use meaningful element names or provide comments for the declarations: an element like `<R23J5>' may be ambiguous to someone else reading the file.
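For documents assembled by a script, the declaration can be generated together with its explanatory comment. The Python sketch below is a hypothetical helper (not part of gf); the same pattern would apply to the `u.notes', `u.code' and `u.listing' entities. Its output is meant to be pasted between the square brackets of the document type declaration.

```python
def declare_custom_elements(entity: str, names: list[str], comment: str) -> str:
    """Build a commented parameter-entity declaration for the DTD subset."""
    if not names:
        raise ValueError("at least one element name is required")
    if "--" in comment:
        # '--' would end the SGML comment early.
        raise ValueError("SGML comments may not contain '--'")
    alternatives = " | ".join(names)
    return f'<!-- {comment} -->\n<!ENTITY % {entity} "{alternatives}">'


print(declare_custom_elements(
    "u.phrases", ["fishes-name", "moa-sighting"], "Names of fish and moa sightings."))
# <!-- Names of fish and moa sightings. -->
# <!ENTITY % u.phrases "fishes-name | moa-sighting">
```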

Lists

There are four types of lists available:

<ol> Ordered List: each item in the list is assigned a number by the program.
<ul> Unordered List: items are marked with a bullet or other symbol.
<sl> Simple List: items are not marked.
<tl> Tagged List: a descriptive tag is given for each item.

Within an ordered, unordered or simple list individual items are marked with `<li>' (list item). For example:

<ol><li>This is an
<li>ordered list with
<li>three items.
</ol>

which may appear as:

1. This is an
2. ordered list with
3. three items.

Note that the end tag of the list (`</ol>' in this case) must not be omitted.

The tagged list is treated slightly differently. List items are marked with `<tli>' (tagged list item), and consist of a tag value followed by a colon and the list text. For example:

<tl><tli>xlsfonts: list the fonts available on the X-server
<tli>xfd: display the characters in a single font
<tli>xtetris: an unacceptable alternative to xlsfonts and xfd
</tl>

which may appear as:

xlsfonts: list the fonts available on the X-server
xfd: display the characters in a single font
xtetris: an unacceptable alternative to xlsfonts and xfd
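The structure of a tagged list item (tag, colon, text) can be sketched in Python as below. The helper is hypothetical, and it assumes that the first colon ends the tag, which this section does not state explicitly.

```python
def parse_tagged_items(items: list[str]) -> list[tuple[str, str]]:
    """Split each <tli> item at its first colon into (tag, text).

    Treating the first colon as the end of the tag is an assumption.
    """
    pairs = []
    for item in items:
        tag, sep, text = item.partition(":")
        if not sep:
            raise ValueError(f"tagged list item has no colon: {item!r}")
        pairs.append((tag.strip(), text.strip()))
    return pairs


print(parse_tagged_items(["xfd: display the characters in a single font"]))
# [('xfd', 'display the characters in a single font')]
```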

Notes

A number of predefined note elements are available, together with an architectural form for defining additional elements. Notes should be used to contain text which is not considered to be part of the main content of the document. They are placed in the document at the position of the text to which they refer, but how they are displayed is controlled by the formatter.

Notes contain one or more paragraphs, and paragraph sub-elements (with the exception of notes and figures) may be included.

Footnotes

Footnotes are generally typeset at the foot of the page, although in some styles they are moved to the end of the document. The text for a footnote is specified within the `<fn>' element:

<fn>I pleaded insanity and was back on the net(2) within a week.</fn>.

The terminating `</fn>' (or minimalised version) must not be omitted.

"Ordinary" notes

The `<note>' element is similar to a footnote, but does not carry the implication that it should be typeset at the foot of the page.

Annotation

The `<annotate>' element is intended for use by someone other than the original author of the document, for the addition of comments or whatever. An optional attribute `resp' can be used to indicate the person responsible for the annotation:

<annotate resp=ed/Complete disregard for the facts is displayed as usual./

Customized note elements

New note elements can be defined within the document, in a similar way to the declaration of new phrases (see section Highlighted Phrases).

A new note element is created in the document type declaration subset at the beginning of the document, e.g.,

<!DOCTYPE spaper PUBLIC "-//Houston//DTD snafu5//EN"[
   <!-- Something that occurred to me while writing the article.  -->
   <!ENTITY % u.notes "thought">
]>

After this declaration, it is possible to use the `<thought>' element in the same way as `<note>'. If several customized note elements are required, use the construction:

"thought | idea | notion"

It is important to either use meaningful element names or provide a comment on the contents of each new note element.

Figures

Encapsulated PostScript figures can be included in a document by using the `<fig>' element. However, the figure must first be declared in the DTD subset, which appears in square brackets following the DTD declaration at the beginning of the document. For example:

<!DOCTYPE spaper PUBLIC "-//Houston//DTD snafu5//EN"[
  <!ENTITY dogfig SYSTEM "dog.ps" NDATA EPSF>
]>

In this example "dog.ps" is the name of the external file and "dogfig" is the name by which it will be referred to within the SGML document.

A simple example of the use of the `<fig>' element is:

<fig><figbody file="dogfig"><figcap>A picture of a dog</fig>

The figure is composed of a `<figbody>' element, in this case the encapsulated PostScript file "dog.ps", and an optional `<figcap>' element containing the figure caption.

An alternative to including a PostScript figure is to leave blank space in the figure body. In this case the body of the figure would be specified as something like:

<figbody space="10cm">

The value of the `space' attribute must be a number (starting with a digit) followed by a unit. The unit should be `cm' (centimetres), `mm' (millimetres), `in' (inches) or `pt' (points). Other units may also work, but this depends on the typesetter.
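A value can be checked before the document is formatted. The Python sketch below accepts only the four units listed above; whether decimal values such as `2.5cm' are accepted by gf is an assumption, as is the function name `parse_space'.

```python
import re

# The four units named above. Allowing a decimal part is an assumption.
_SPACE = re.compile(r"(\d+(?:\.\d+)?)(cm|mm|in|pt)")


def parse_space(value: str) -> tuple[float, str]:
    """Split a <figbody> space value such as '10cm' into (10.0, 'cm')."""
    match = _SPACE.fullmatch(value)
    if match is None:
        raise ValueError(f"space must be a number followed by cm, mm, in or pt: {value!r}")
    return float(match.group(1)), match.group(2)


print(parse_space("10cm"))  # (10.0, 'cm')
```

This check is stricter than some typesetters: it rejects values such as `10 cm' (with a blank) and any unit not listed.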

The way in which the figure is constructed can be controlled further by specifying attributes of the `<fig>' and `<figbody>' elements. An attribute of the `<fig>' element controls how the figure will be positioned in the formatted document:

<fig inline>: Place the figure in the current position, without interruption of the current paragraph.
<fig here>: Interrupt the current paragraph and place the figure.
<fig top>: Only place the figure at the top of a page.
<fig bottom>: Only place the figure at the bottom of a page.
<fig page>: Place the figure on its own page.
<fig float>: Allow the typesetter to position the figure as it sees fit.

If the attribute is not specified then `float' is used.

The `<figbody>' element may take several attributes which describe how the encapsulated PostScript figure should be included. These are:

x-scale: horizontal scaling factor (default = 1)
y-scale: vertical scaling factor (default = 1)
rotation: angle of rotation in degrees (default = 0)
position: horizontal position, `left', `centre' or `right'. The default is `centre'.

For example,

<figbody file=dogfig x-scale=0.8 y-scale=0.5 rotation=90>

will rotate the figure by 90° and reduce the scale both horizontally and vertically. Note that fractional values of `x-scale' and `y-scale' must include the leading zero.

The current LaTeX formatter only supports rotation values of 0, 90, 180 and 270.
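The `<figbody>' attributes and their defaults can be modelled as a small record, which makes it easy to check a figure against the LaTeX formatter's rotation limit. The `FigBody' class below is a hypothetical Python sketch, not part of gf, and models only the `file' form of the figure body.

```python
from dataclasses import dataclass

LATEX_ROTATIONS = (0, 90, 180, 270)


@dataclass
class FigBody:
    """Hypothetical model of the <figbody> attributes and their defaults."""
    file: str
    x_scale: float = 1.0
    y_scale: float = 1.0
    rotation: int = 0
    position: str = "centre"

    def __post_init__(self) -> None:
        if self.position not in ("left", "centre", "right"):
            raise ValueError(f"unknown position: {self.position!r}")

    def latex_compatible(self) -> bool:
        """True if the current LaTeX formatter supports this rotation."""
        return self.rotation in LATEX_ROTATIONS

    def start_tag(self) -> str:
        """Render the start tag, omitting attributes left at their defaults.

        Python formats 0.8 as '0.8', so the required leading zero is kept.
        """
        attrs = [f"file={self.file}"]
        if self.x_scale != 1.0:
            attrs.append(f"x-scale={self.x_scale}")
        if self.y_scale != 1.0:
            attrs.append(f"y-scale={self.y_scale}")
        if self.rotation != 0:
            attrs.append(f"rotation={self.rotation}")
        if self.position != "centre":
            attrs.append(f"position={self.position}")
        return "<figbody " + " ".join(attrs) + ">"


fig = FigBody("dogfig", x_scale=0.8, y_scale=0.5, rotation=90)
print(fig.start_tag())         # <figbody file=dogfig x-scale=0.8 y-scale=0.5 rotation=90>
print(fig.latex_compatible())  # True
```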

Program Code

A set of elements allows the tagging of text representing program listings, commands to be typed and other such things. Code is typically typeset using a fixed-width typeface, and is divided into two classes:

"inline" code, embedded within text and typically typeset in a similar way to its surroundings, e.g., with the same justification and the same treatment of multiple blanks; only the font will change.
"displayed" code, where the flow of text is broken to present the code (typically a program fragment or list of commands). In this case line breaks and spaces within the code will usually be unchanged by the formatter.

The basic element for tagging inline `code' is `<code>':

To <code>code</code> inline <code/code/ use <code/code/.

Two predefined elements are available for tagging displayed code: `<listing>' and `<code-lines>', which are just two ways of saying the same thing. For example(3):

<listing>
 A(    W,_) &&!B[W]; if(!j){ z]=1; R; } else{ N(m); _; v(0); }
</listing>

Neither of the styles of code will give a true "verbatim" representation of the coded text. Sequences which are special to SGML will continue to be resolved: for example `&lt;' will give a "<" symbol. Care must be taken with text which happens to include such symbols, e.g.,

#include <stdio.h>

Such symbols can be escaped, as described in section Paragraphs. For large sections of program code, it may be useful to use a marked section as described in the gf User's Manual. An alternative is to import the code directly from an external file, as described in section External Entities.

The default elements described above may be too vague for some uses. The contents can be described more precisely through the use of architectural forms, as described in section Highlighted Phrases. To define new inline or displayed code elements, place something like the following in the document type declaration subset:

  <!-- The name of commands in the foo and bar shells.  -->
  <!ENTITY % u.code "foo-cmd | bar-cmd">
  <!-- Listings in various advanced programming languages.  -->
  <!ENTITY % u.listing "fortran | cobol | basic">

After these declarations, tags like `<foo-cmd>' and `<fortran>' can be used.

TeX Equations

The TeX equation element can be used to enter mathematical formulae into a document. While this is not the same as coding an equation directly in SGML, TeX is a widely used and sometimes highly regarded means of expression for mathematics.

The disadvantage of using TeX markup is that it would be difficult to typeset the equation using anything other than TeX. The current ASCII and RTF formatters do not even attempt the conversion, but simply set the TeX elements apart from the surrounding text.

The `texeqn' element is intended for papers requiring the odd formula, not necessarily for your thesis on Lie groups and quantum gravity!

A TeX equation can be typeset within the surrounding text, e.g.,

The distance (<texeqn>r = \sqrt{x^2 + y^2}</texeqn>) can be large and
positive.

which will appear as:

The distance ($r = \sqrt{x^2 + y^2}$) can be large and positive.

Displayed equations can be created using the `texeqn-display' element, e.g.,

<texeqn-display>\hbox{assuming } r \hbox{ positive}</texeqn-display>

which is typeset as:

$$\hbox{assuming } r \hbox{ positive}$$

Note that the TeX equation should be entered using the notation defined in the TeXbook; i.e., corresponding to the plain TeX format. When LaTeX is being used as the typesetter, it is likely that LaTeX macros will be typeset correctly. However, taking advantage of this feature is not recommended.

Cross References

Several elements can be used to refer to another part of the document, as in "see section 4 on page 20" or just "see section 4". The formatter will usually be able to supply the counts automatically. The elements which can be referenced are sections, figures, list items in ordered lists, and footnotes.

The first method for supplying references is to use the `<ref>' element. A label is added to the element being referenced, and the name of the label is entered in the reference itself. A label is supplied by adding an `id' attribute, e.g.,

<h1 id=dogs>

The cross reference is then created using something like:

see section <ref refid=dogs>

Names used for labels must start with a letter, and remaining characters may only be letters, digits or hyphens. Labels are not case sensitive.
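Because labels are not case sensitive, `dogs' and `Dogs' refer to the same label and would clash if both were used as IDs. The Python sketch below (hypothetical helpers, not part of gf) checks a label against the naming rule and reports such clashes.

```python
import re

# A letter, then any mix of letters, digits and hyphens.
_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_label(label: str) -> bool:
    """Check an id/refid value against the naming rule."""
    return _LABEL.fullmatch(label) is not None


def duplicate_labels(labels: list[str]) -> list[str]:
    """Return labels that clash with an earlier one, ignoring case."""
    seen: set[str] = set()
    clashes = []
    for label in labels:
        key = label.lower()   # labels are not case sensitive
        if key in seen:
            clashes.append(label)
        seen.add(key)
    return clashes


print(is_valid_label("dogs"), is_valid_label("2dogs"))  # True False
print(duplicate_labels(["dogs", "cats", "Dogs"]))        # ['Dogs']
```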

By default the formatter will supply the count of the item selected, e.g., the section number. Alternatively the page may be displayed by specifying the `page' attribute, e.g., `see section <ref refid=dogs> on page <ref refid=dogs page>'.

There are some situations in which the `<ref>' element does not work well, for example when formatting with Texinfo. An alternative set of references may be preferable. Each element which can take an ID attribute (`<hx>', `<fig>', `<fn>' and other notes, and `<li>') has a corresponding reference element (`<hdref>', `<figref>', `<fnref>' and `<liref>'). Instead of simply generating the section or page number, these elements produce complete references. For example, `<hdref>' may generate "Section 4.1", and `<hdref page>' may generate "Section 4.1 on page 20"(4).

The third way of making a reference is to supply the text explicitly, rather than relying on the formatter to produce it automatically. This can be done with any of the reference elements above. An example is: `see <hdref/section 4/'. This just marks the text as a cross reference: the formatter will not necessarily do anything special with it.
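The difference between `<ref>' and the complete reference elements can be sketched in Python. The `Target' record and both functions are hypothetical stand-ins for what a formatter might do, using the "Section 4.1 on page 20" example from this section.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical record a formatter might keep for one labelled section."""
    number: str   # e.g. "4.1"
    page: int     # e.g. 20


def ref(target: Target, page: bool = False) -> str:
    """<ref>: only the bare number, or the page if requested."""
    return str(target.page) if page else target.number


def hdref(target: Target, page: bool = False) -> str:
    """<hdref>: a complete reference, optionally including the page."""
    text = f"Section {target.number}"
    if page:
        text += f" on page {target.page}"
    return text


dogs = Target(number="4.1", page=20)
print(f"see section {ref(dogs)} on page {ref(dogs, page=True)}")
# see section 4.1 on page 20
print(hdref(dogs, page=True))  # Section 4.1 on page 20
```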

External Entities

As described in the gf User's Manual, it is possible to include text from an external file at one or more points within an SGML document by first declaring the file as an "external entity".

Several SGML notations are defined in the DTD to allow external files in non-SGML formats to be referenced. These are:

EPSF: Encapsulated PostScript, used only within the `<fig>' element.
ASCII or ISO646: for including external text files at any point in a document, for example within the `<code>' element.

Latin1 is also defined, but not currently supported by gf.

The Appendix

A document may optionally include an appendix, which consists of one or more top-level sections. The start of the appendix is marked with the `<appendix>' tag. E.g.,

<appendix>
<h1>This is the title of the first section in the appendix
This is the contents of the first section.

The Index

A simple index can be generated for an spaper document by placing the tag `<index>' at the end of the document. No text should be written after the index tag: it will be generated automatically by the formatter.

Items to be placed in the index can be marked throughout the document using the `<ix>' tag, for example, `<ix/$100,000/'. The text "$100,000" will be placed both in the body of the document and in the index. An entry can be placed in the index without generating text in the body by using the construction: `<ix print=""/bug reward/'. More generally, any text placed between the quotation marks will be put into the body.
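The effect of the `print' attribute can be summarised with a small Python sketch. The function `ix' below is a hypothetical model of the behaviour described above, not gf's implementation: every call records an index entry, and the return value is the text that appears in the body.

```python
from typing import Optional


def ix(entry: str, index: list[str], print_text: Optional[str] = None) -> str:
    """Record an index entry and return the text that goes into the body.

    With no print attribute the entry itself is printed; print="" prints
    nothing; any other value is printed instead of the entry.
    """
    index.append(entry)
    return entry if print_text is None else print_text


index: list[str] = []
body = "A reward of " + ix("$100,000", index) + ix("bug reward", index, print_text="") + " is offered."
print(body)                          # A reward of $100,000 is offered.
print(sorted(index, key=str.lower))  # ['$100,000', 'bug reward']
```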

It will often require special procedures to coax the formatter into actually generating the printed index. In the case of LaTeX, the commands are:

latex foo
makeindex foo
latex foo

The `makeindex' program is included in the LaTeX distribution.
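The three LaTeX steps can be wrapped in a short Python script so that the sequence stops as soon as one step fails. This is a sketch that assumes `latex' and `makeindex' are on the search path; the function names are hypothetical. Calling `build_with_index("foo")' runs the same commands as shown above.

```python
import subprocess


def index_commands(jobname: str) -> list[list[str]]:
    """The latex, makeindex, latex sequence for one document."""
    return [["latex", jobname], ["makeindex", jobname], ["latex", jobname]]


def build_with_index(jobname: str) -> None:
    """Run each step in turn, stopping at the first one that fails."""
    for command in index_commands(jobname):
        subprocess.run(command, check=True)
```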

In the case of Texinfo, the `texi2dvi' program (from the Texinfo distribution) will take care of the details automatically.

Character Entity Support

The Snafu DTD supports a number of ISO character entity sets, as well as a few additional symbols. See the declarations in the appendix and the tables in the gf User's Manual for more details.

Additional characters can also be used, after a two-step procedure:

Provide a declaration of the new symbol in the document type declaration subset, e.g.,
```
<!ENTITY BleECh SDATA "[BleECh]">
```
Provide either a portable description of the character entity (e.g., an encapsulated PostScript image) or system-specific configuration files for every system to be used (i.e., gf map files: see the gf User's Manual for details). System-specific configuration is necessary if you want to do anything useful with the document; the portable description is useful only for pure interchange or archival purposes.
