
SGML Facilities

This section describes a few constructions which may be useful in SGML documents. Note that not all SGML features are described here: consult the standard for a complete discussion.

Comments

SGML commenting facilities can be used to put text into a source document which will not be visible to a processing system. Any text within the construction:

<!--  text -->

will be removed when the document is parsed. Within a comment, pairs of double hyphens "--" must be balanced(1).

Likewise, the construction `<!>' is ignored and can be used for various purposes.
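
As a rough illustration of the balancing rule (the comment text is invented for the example), the following declarations are all well-formed. The second holds two separate comments within a single declaration:

```sgml
<!-- a single comment -->
<!-- first comment -- -- second comment -->
<!>
```

By contrast, a declaration such as `<!-- see section 2 -- draft -->' is likely to be rejected, or to swallow the text that follows it, because the word `draft' falls between two comments rather than inside one.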

Minimalisation Techniques

The full markup of a non-empty SGML element is of the form:

<ELEMENT>element content</ELEMENT>

It will often be possible to omit the terminating tag: this depends on how the element is defined in the DTD. Sometimes the initial tag can also be omitted.

When the terminating tag is required, there are shorter forms which can be used:

<ELEMENT/element content/
<ELEMENT>element content</>

The symbol `</>' closes the most recently opened element. Beware that it may cause confusion when the element contains hidden sub-elements, since it will be the sub-element which is closed.

Likewise, there is a construction `<>' which reopens the last opened element. This can be just as confusing as `</>'.

When a number of tags appear in sequence the closing delimiters can be omitted, e.g., `<li<it>' is equivalent to `<li><it>'. A similar case is `</ul<ul>'.
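
The short forms above can be compared side by side. This is only a sketch: it reuses the `li' and `it' element names from the text, and it assumes that the SGML declaration in use enables the SHORTTAG feature, on which these forms depend. All four lines are intended to be equivalent.

```sgml
<!-- Full markup -->
<li><it>important</it> text</li>
<!-- Null end tag: the second slash closes it -->
<li><it/important/ text</li>
<!-- Empty end tag: closes it, the most recently opened element -->
<li><it>important</> text</li>
<!-- Unclosed start tag: the > of li is left out -->
<li<it>important</it> text</li>
```

Note that the form with slashes cannot be used when the element content itself contains a slash.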

A more drastic minimalisation technique is the short reference, several of which will often be defined by a DTD. They allow a single character, or a short sequence of characters, to be used to represent an entity of some kind. Common examples are the use of " as an abbreviation for the beginning or end of a short quotation, and of a blank line to begin a new paragraph.

Short references can be context sensitive, by being activated only within a subset of the document. An example would be a vertical bar which separates horizontal cells in a table but retains its normal ASCII code in other parts of the document.
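
In a DTD, a context-sensitive short reference of this kind might be set up roughly as follows. The element names `table' and `cell', the map name `tabmap' and the entity name `cellsep' are all hypothetical, and the sketch assumes that the vertical bar is among the short reference delimiters of the concrete syntax in use.

```sgml
<!-- Entity whose replacement is the start tag <cell> -->
<!ENTITY   cellsep STARTTAG "cell">
<!-- Map in which a vertical bar stands for that entity -->
<!SHORTREF tabmap  "|" cellsep>
<!-- Activate the map only inside table elements -->
<!USEMAP   tabmap  table>
```

With declarations like these, and provided the end tag of `cell' may be omitted, a row could be typed as `|one|two|three' inside a table, while a vertical bar elsewhere in the document would remain an ordinary character.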

Entity Declarations

There are several ways in which declaring entities within a document can be useful. Entities must be declared in the document type declaration at the beginning of the document.

The declaration associates a name with an object of some kind. Under the default (reference) concrete syntax, the rules for an entity name are restrictive: it may contain no more than eight characters, it must start with a letter, and the remaining characters may only be letters, digits, hyphens and periods. Entity names are case sensitive. However, many SGML applications relax the limit on name length by using a modified concrete syntax.

An example of an external entity declaration is:

<!DOCTYPE spaper PUBLIC "-//Houston//DTD snafu5//EN"[
  <!ENTITY dogfig SYSTEM "dog.ps" NDATA EPSF>
]>

This declares an external file with notation EPSF, to be known within the SGML document as "dogfig".
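
The notation named in such a declaration must itself be declared somewhere. If the DTD does not already do so, a declaration could be added to the same subset, along the lines of this sketch. The system identifier "epsf" is only a placeholder and is not taken from the source.

```sgml
<!DOCTYPE spaper PUBLIC "-//Houston//DTD snafu5//EN" [
  <!-- Placeholder notation declaration; omit it if the DTD supplies one -->
  <!NOTATION EPSF   SYSTEM "epsf">
  <!ENTITY   dogfig SYSTEM "dog.ps" NDATA EPSF>
]>
```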

Replacement Text

The simplest use of an entity is to provide replacement text. This is useful for text which is repeated several times in the document or which is subject to frequent change. To declare such an entity, use a line like:

<!ENTITY response "<hp1>Are you sure that you want to 
do this &rarr; </hp1>">

The entity can then be referred to within the document with the entity reference: `&response;', which will generate:

Are you sure that you want to do this ->

External Entities

Text can also be brought into the document from external files. When the entity is declared, the content type can also be specified. By default, the entity contains normal marked-up SGML text and is declared using a line such as:

<!ENTITY junk SYSTEM "junk.sgml">

It is also possible to use the "notation" feature of SGML to incorporate external text which is in some non-SGML format. For example, if a notation ASCII is defined (probably by the DTD), then markup such as:

<!ENTITY realjunk SYSTEM "junk.c" NDATA ASCII>

could be used to include a program source file in the document.
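
The two kinds of external entity are typically used differently in the body of the document. The following sketch assumes a hypothetical `listing' element whose `file' attribute is declared in the DTD with type ENTITY; neither name comes from the source.

```sgml
<!-- SGML text entity: the parser inserts and parses junk.sgml here -->
&junk;

<!-- Data entity: named in an attribute and left to the application -->
<listing file="realjunk">
```

In the second case the parser does not read `junk.c' itself. It passes the entity's system identifier and notation to the application, which decides how to process the file.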

Marked Sections

Text within a marked section is treated in a special way by the SGML parser. The text is enclosed within the construction:

<![ KEYWORD [marked text]]>

where `KEYWORD' can be:

CDATA
The marked text is treated as "character data", which means that potential SGML element tags and entity references are not recognised. This is useful when including verbatim program code (or SGML markup) in a document: only the sequence `]]>' can cause problems, since it ends the marked section.
RCDATA
The marked text is treated as "replaceable character data". Element tags are ignored, but entity references are resolved as normal.
IGNORE
The marked text is treated as a comment.
INCLUDE
The marked text is treated as ordinary text.
TEMP
The marked text is treated as ordinary text (which will presumably be removed later).
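
A short sketch of two common uses follows. The first protects a fragment of C code from being parsed as markup. The second uses a parameter entity as the keyword, so that a passage can be switched on or off by changing one declaration. The entity name `draft' and the `p' element are hypothetical.

```sgml
<!-- In the document type declaration subset -->
<!ENTITY % draft "IGNORE">

<!-- In the document body: i<n and &y; are not treated as markup -->
<![ CDATA [ if (i<n) x = &y; ]]>

<!-- Skipped while draft is IGNORE; change it to INCLUDE to restore -->
<![ %draft; [ <p>Notes for reviewers only. ]]>
```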

