
Document Types

This section briefly describes each of the DTDs supported by gf.

general DTD

The "general" DTD was one of the earliest document types. It was originally published by the ISO in Annex E of the 1986 SGML standard as a demonstration of a general purpose DTD for books or articles. The date of publication (not surprisingly) appears to have preceded the development of reliable SGML parsers, since the document type declaration contains a minor error. gf includes a corrected version, with a superfluous "address" element removed from the "titlep" element.

gf can convert documents conforming to the DTD into LaTeX. Unfortunately I don't know of any freely available documentation for the DTD. A description is rumoured to have been published in ISO TR 9573, "Techniques for using SGML". Some idea of the intent of the DTD can be guessed from the HTML and Snafu documentation.

There are deficiencies (or bugs) in the conversion. The ones that I know of are:

Others probably exist.


HTML DTD

The HTML DTD is used in the World Wide Web project, which makes documents available across the Internet for interactive browsing.

gf can convert documents conforming to the HTML DTD into LaTeX, RTF and the plain output formats. Since HTML documents are often stored without an SGML declaration or DOCTYPE declaration, gf will add them if required. However, the <HTML> and </HTML> tags must still be supplied.
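The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not gf's own code: the function name `ensure_doctype` and the DOCTYPE string are hypothetical, and the sketch leaves out the SGML declaration, which gf also supplies.

```python
import re

# Hypothetical DOCTYPE line (an assumption): the declaration that gf
# actually prepends is not given in this document.
ASSUMED_DOCTYPE = '<!DOCTYPE HTML SYSTEM "html.dtd">'


def ensure_doctype(text, doctype=ASSUMED_DOCTYPE):
    """Return text with a DOCTYPE declaration prepended if it lacks one.

    Raises ValueError when the <HTML> or </HTML> tag is missing, since
    those tags must be supplied by the author.
    """
    if not re.search(r"<HTML[\s>]", text, re.IGNORECASE):
        raise ValueError("missing <HTML> start tag")
    if not re.search(r"</HTML\s*>", text, re.IGNORECASE):
        raise ValueError("missing </HTML> end tag")
    if re.match(r"\s*<!DOCTYPE\b", text, re.IGNORECASE):
        return text  # already declared; leave the document unchanged
    return doctype + "\n" + text
```

A document that already starts with a DOCTYPE declaration passes through unchanged, while one without the <HTML> tags is rejected rather than repaired.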

The HTML document type declaration included with gf is an IETF draft version dated 1993-07-01. It defines a modified concrete reference syntax which allows direct entry of 8-bit Latin-1 characters. The output from gf will be 7-bit LaTeX, ASCII or RTF, or 8-bit Latin-1.

Some modification to the DTD was required: the `amp' entity was changed from `"&amp;"' to `"&"' and the `lt' entity from `"&lt;"' to `"<"', since the recursion is probably an error rather than a feature. The type of the `NAME' attribute in the linkattributes entity was also changed from `NMTOKEN' to `CDATA', since it needs to match `HREF' (`NMTOKEN's would be capitalised). I have also ignored the requirement to honour multiple spaces in the source document: this seems outdated.
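The reason for the `NAME' change can be illustrated with a small Python model. It is a sketch under two assumptions that the text above does not spell out: that "capitalised" means the parser folds `NMTOKEN' values to upper case, and that an `HREF' refers to a `NAME' through a fragment such as `#intro'. The function names are hypothetical.

```python
def fold_nmtoken(value):
    """Model name-token folding (assumed): the value is upper-cased."""
    return value.upper()


def anchor_matches(href, name, name_type):
    """Check whether a fragment HREF such as '#intro' points at NAME.

    name_type is 'NMTOKEN' (value folded to upper case) or 'CDATA'
    (value kept as written). HREF is always treated as CDATA.
    """
    target = href[1:] if href.startswith("#") else href
    seen = fold_nmtoken(name) if name_type == "NMTOKEN" else name
    return target == seen
```

With `name_type="CDATA"` the pair `#intro' and `intro' matches; with `name_type="NMTOKEN"` the name becomes `INTRO' and the link no longer finds its target.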

Documentation on HTML can be obtained from the sources of WWW information, e.g., by ftp from `info.cern.ch' or by using WWW. The document used for this implementation (`draft-ietf-iiir-html-01.txt', by Tim Berners-Lee and Daniel Connolly) was obtained by ftp from `ds.internic.net'. This is somewhat out of date, since the HTML 2.0 specification is now being finalised.

Currently the creation of printed documents from HTML can be unreliable. The HTML DTD was designed to be used by network browsing software with relatively primitive parsing support. In addition there is the problem that it is possible for a document on the network to be readable via WWW, but not conform to the HTML DTD (and be unparsable by sgmls).

Snafu DTD

I created this DTD for my own use and experimentation. Three types of document are supported: letters, memos and simple technical papers. Several versions have been created: gf should be able to process all versions from 2 onwards. Documents conforming to the memo and letter variants can be converted to LaTeX, ASCII and RTF, and technical papers can also be converted to Texinfo.

See the paper "The Snafu SGML Document Types" for more information.
