HTML-PRETTY 1 "04 December 1997" "Version 1.00" [section 5 of 14]



FORMATTING CONVENTIONS

html-pretty groups HTML tags into style classes. Tags within a single style class receive similar formatting. The built-in lists of style-class members recognize a large collection of tags from multiple grammar levels and from extensions introduced by several browser vendors. All of the built-in lists can be modified by the user, as described under the -extend-style and -stylefile options above, and in the STYLE FILES section below.

The subsections which follow catalog the origins of the recognized tags, and then describe the available style classes in alphabetical order, listing their default tag members, and briefly sketching how the tags are formatted.

Tags that are not explicitly named in these subsections, or in style files that are read at run time, are treated as normal text, and have no effect on the indentation or line breaking, other than their contribution toward the line length limit.
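
As a rough illustration of this lookup, the following Python sketch maps upper-cased tag names to style classes and returns None for an unrecognized tag, which would then be treated as normal text. It is not html-pretty's code: the table BUILTIN_CLASSES, the function style_class_of, and the overrides argument (standing in for entries read from a style file) are hypothetical, and only a few tags are shown.

```python
from typing import Dict, Optional

# Hypothetical excerpt of a tag -> style-class table (not html-pretty's data).
BUILTIN_CLASSES: Dict[str, str] = {
    "B": "font",
    "BODY": "body",
    "BR": "line-break",
    "EM": "font",
    "LI": "list-item",
    "P": "paragraph",
    "PRE": "verbatim",
    "UL": "list",
}


def style_class_of(tag: str, overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the style class of TAG, or None if the tag is unrecognized.

    HTML tag names are case-insensitive, so lookups use upper case.
    Entries in OVERRIDES (e.g. from a style file) take precedence.
    """
    name = tag.upper()
    if overrides and name in overrides:
        return overrides[name]
    return BUILTIN_CLASSES.get(name)
```

For example, style_class_of("newtag") returns None, so NEWTAG would be formatted as plain text, while style_class_of("newtag", {"NEWTAG": "font"}) returns "font".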

Tags in the style classes doctype, line-break, link, math, plaintext, short, standalone, and standalone-nocheck make up the set of HTML tags with SGML content EMPTY, which means that their end tags are forbidden. html-pretty will issue warnings about such end tags, but will leave their deletion to a human.

Recognized HTML Tags

HTML 2.0 contains the following 49 tags: A, ADDRESS, B, BASE, BLOCKQUOTE, BODY, BR, CITE, CODE, DD, DIR, DL, DT, EM, FORM, H1, H2, H3, H4, H5, H6, HEAD, HR, HTML, I, IMG, INPUT, ISINDEX, KBD, LI, LINK, LISTING, MENU, META, NEXTID, OL, OPTION, P, PLAINTEXT, PRE, SAMP, SELECT, STRONG, TEXTAREA, TITLE, TT, UL, VAR, and XMP.

HTML 3.0 augments the 2.0 grammar with 53 additional tags: ABBREV, ABOVE, ACRONYM, ARRAY, ATOP, AU, BAR, BELOW, BIG, BOX, BQ, BT, CAPTION, CHOOSE, CREDIT, DDOT, DEL, DFN, DIV, DOT, FIG, HAT, INS, ITEM, LANG, LEFT, LH, MATH, NOTE, OF, OVER, OVERLAY, PERSON, PRE, Q, RIGHT, ROOT, ROW, S, SMALL, SQRT, STYLE, SUB, SUP, T, TAB, TABLE, TD, TH, TILDE, TR, U, and VEC.

These tags are identified by their occurrence in the html.dtd and html-3.dtd document type definition files in lines like these:

    <!ENTITY % font " TT | B | I ">
    <!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">
    <!ELEMENT (%font;|%phrase) - - (%text)+>
    <!ELEMENT XMP - -  %literal>

ENTITY declarations define text string substitutions, and ELEMENT declarations define the tags recognized by the grammar.
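
The following Python sketch shows one way to harvest tag names from declarations like those above. It records each parameter-entity definition, expands %name; references inside each ELEMENT name group, and splits the result on the group connectors. In SGML the trailing semicolon may be omitted when a delimiter follows, as in %phrase above. This is a simplified, hypothetical illustration, not html-pretty's method, and it handles only the declaration forms shown here, not the full SGML declaration syntax.

```python
import re
from typing import Dict, List

# <!ENTITY % name "replacement text">
ENTITY_RE = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
# <!ELEMENT name ...> or <!ELEMENT (name|name|...) ...>; only the name part is captured
ELEMENT_RE = re.compile(r'<!ELEMENT\s+(\([^)]*\)|[\w.-]+)')
# %name; or %name (the ";" may be omitted when a delimiter follows)
PE_REF_RE = re.compile(r'%([\w.-]+);?')


def element_names(dtd_text: str) -> List[str]:
    """Return the element names declared in DTD_TEXT, in order of first appearance."""
    entities: Dict[str, str] = dict(ENTITY_RE.findall(dtd_text))

    def expand(text: str) -> str:
        # Substitute parameter entities until nothing changes; the pass limit
        # guards against (invalid) self-referential definitions.
        for _ in range(16):
            new = PE_REF_RE.sub(lambda m: entities.get(m.group(1), m.group(0)), text)
            if new == text:
                break
            text = new
        return text

    names: List[str] = []
    for group in ELEMENT_RE.findall(dtd_text):
        # Split the expanded name group on parentheses, connectors | & , and spaces.
        for token in re.split(r'[()|&,\s]+', expand(group)):
            name = token.upper()
            if name and name not in names:
                names.append(name)
    return names


SAMPLE = '''
<!ENTITY % font " TT | B | I ">
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">
<!ELEMENT (%font;|%phrase) - - (%text)+>
<!ELEMENT XMP - -  %literal>
'''

print(element_names(SAMPLE))
# prints ['TT', 'B', 'I', 'EM', 'STRONG', 'CODE', 'SAMP', 'KBD', 'VAR', 'CITE', 'XMP']
```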

HTML 3.2 introduced these 21 new tags: APPLET, AREA, BASEFONT, BIG, CAPTION, CENTER, DFN, DIV, FILE, FONT, MAP, NUMBER, PARAM, SCRIPT, SMALL, STRIKE, STYLE, SUB, SUP, TABLE, and U.

Proposed HTML 4.0 introduces these 15 new tags: BDO, BUTTON, COL, COLGROUP, FIELDSET, FRAMESET, IFRAME, LABEL, NOFRAMES, NOSCRIPT, OBJECT, SPAN, TBODY, TFOOT, and THEAD.

HTML Tag Omission

The HTML grammar permits certain end tags to be omitted, when their implied position can be determined from the grammatical context. In HTML 3.0, this includes the following tags: DD, DT, ITEM, LH, LI, OF, OPTION, P, ROW, STYLE, and TR.

The HTML 3.2 grammar permits these end tags to be omitted: DD, DT, INPUT, LI, OPTION, and P.

The proposed HTML 4.0 grammars permit these end tags to be omitted: COLGROUP, DD, DT, LI, OPTION, P, PARAM, TFOOT, THEAD, and TR.

In all grammar versions, these tags never have end tags: AREA, BASE, BR, FRAME, HR, IMG, INPUT, ISINDEX, LINK, META, NEXTID, PARAM, and PLAINTEXT.

In version 3.0, BASEFONT takes an optional end tag, but in succeeding versions, it must not have an end tag.

In version 3.0, STYLE takes an optional end tag, but in succeeding versions, it must have an end tag.

Supporting the tag-omission feature requires the ability to parse a complete SGML grammar, which requires a great deal more code than html-pretty provides. Consequently, html-pretty does not support optional end tags; based on typical usage, they are expected to be always present, or always absent, according to the rules given below. Omitted end tags can be automatically supplied by an SGML tag normalizer, such as sgmlnorm(1), spam(1), or html-spam(1).

html-pretty will warn about end tags that should not be present, based on the tags' membership in those style classes that are known not to have end tags. However, it will not delete them from the output stream, because human judgment may be called for. See the HTML GRAMMAR CONSTRAINTS section below for further details.
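
The warning can be approximated by scanning for end tags of elements that must never have them. The following Python sketch uses the tags listed under HTML Tag Omission above as never having end tags. It reports the line number and tag of each offending end tag and leaves the input untouched. The names EMPTY_TAGS and forbidden_end_tags are hypothetical, and the sketch ignores complications such as end tags inside comments or verbatim environments.

```python
import re
from typing import List, Tuple

# Tags that never have end tags, as listed under "HTML Tag Omission".
EMPTY_TAGS = frozenset({
    "AREA", "BASE", "BR", "FRAME", "HR", "IMG", "INPUT", "ISINDEX",
    "LINK", "META", "NEXTID", "PARAM", "PLAINTEXT",
})

# An end tag such as </BR> or </ img >, in any letter case.
END_TAG_RE = re.compile(r"</\s*([A-Za-z][A-Za-z0-9.-]*)\s*>")


def forbidden_end_tags(html: str) -> List[Tuple[int, str]]:
    """Return (line number, tag) for each end tag of an EMPTY element.

    The input is only inspected, never modified: deletion is left to a human.
    """
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(html.splitlines(), start=1):
        for match in END_TAG_RE.finditer(line):
            tag = match.group(1).upper()
            if tag in EMPTY_TAGS:
                problems.append((lineno, tag))
    return problems
```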

Style Class: body

The following HTML tag name must occur only once, with a begin/end pair, often with substantial amounts of intervening text: BODY. The begin/end tags are prettyprinted on separate lines, with their enclosed text indented one level. However, the BODY environment must occur after the HEAD environment, and one level inside the HTML environment. html-pretty will supply this environment if needed, unless the -brief option has suppressed it.

Style Class: comment

Short HTML comments are output inline, like normal text. Long ones, and ones with embedded angle brackets, are prettyprinted on separate lines. Their internal form is preserved exactly, without any line wrapping, since they will often contain specially-formatted material. Any whitespace between the final "--" and the closing angle bracket will be eliminated, when possible.
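
The following Python sketch illustrates two of these rules. It deletes whitespace between the final "--" and the closing angle bracket, and it decides whether a comment may stay inline. The 40-character limit is an assumption made for illustration, since this manual does not state html-pretty's threshold. The function names are also hypothetical.

```python
import re

# Assumed threshold for a "short" comment; html-pretty's real limit is not given here.
SHORT_COMMENT_LIMIT = 40

# Whitespace between the final "--" and the closing ">" at the very end of the comment.
_TRAILING_SPACE_RE = re.compile(r"--\s+>\Z")


def normalize_comment(comment: str) -> str:
    """Delete whitespace between the final '--' and '>' of a '<!-- ... -->' comment."""
    return _TRAILING_SPACE_RE.sub("-->", comment)


def comment_goes_inline(comment: str) -> bool:
    """True if the comment is short and has no embedded angle brackets."""
    comment = normalize_comment(comment)
    inner = comment[2:-1]  # drop the leading "<!" and the trailing ">"
    return len(comment) <= SHORT_COMMENT_LIMIT and "<" not in inner and ">" not in inner
```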

Style Class: doctype

The following HTML tag name occurs only once, and should normally be the first non-comment tag in a file: !DOCTYPE. html-pretty will supply this tag if needed, unless the -brief option has suppressed it. Strictly, this is not a tag, but rather a markup declaration, but html-pretty treats it as a special tag, and outputs it verbatim while checking for proper embedded comment balance. For more details, see the COMMENTS IN HTML AND SGML section below.

Style Class: font

These HTML tag names occur in begin/end pairs, usually with smaller amounts of enclosed material. They appear inline in the running text, and do not alter indentation: ACRONYM, B, BIG, BLINK, BT, CODE, DFN, EM, I, KBD, Q, REV, S, SAMP, SMALL, STRIKE, STRONG, T, TT, U, and VAR.

Style Class: head

This HTML tag occurs in begin/end pairs, which are prettyprinted on separate lines with enclosed text indented one level: HEAD. However, this tag pair must occur only once in a file, and then only inside an HTML environment, and before the BODY environment. html-pretty will supply this environment if needed, unless the -brief option has suppressed it.

Style Class: html

This HTML tag occurs in begin/end pairs, which are prettyprinted on separate lines with enclosed text indented one level: HTML. However, this tag pair must occur only once in a file, and then only at the outermost level. html-pretty will supply this environment if needed, unless the -brief option has suppressed it.

Style Class: inline

Tags in this class are treated as ordinary text, with no additional spacing requirements, or checks for enclosing environments.

Neither the default built-in style nor any of the standard grammar-level-specific style files uses this class. It is provided to permit transparent handling of tags that may be added in future versions of the HTML grammars.

A member of this class is handled differently from a tag that is not defined in any class. An undefined tag may result in warnings if the -unknown-tag-warning option has been selected, is allowed only in the BODY environment, and may cause a paragraph to end. A tag in the inline style class may occur in either the HEAD or the BODY environment, never raises unknown-tag warnings, and does not end a paragraph.

Style Class: line-break

This HTML tag marks an explicit line break, and has no matching end tag; preceding space is deleted, and a newline follows: BR.
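
Here is a minimal Python sketch of this rule, assuming the document is held as a single string. Spaces and tabs on the same line before a BR tag are removed, and the tag is followed by exactly one newline. The function name is hypothetical. A real prettyprinter would more likely apply the rule while building output lines than with a regular expression.

```python
import re

# A BR tag (any case, any attributes) with the spaces and tabs around it
# and at most one newline after it.
_BR_RE = re.compile(r"[ \t]*(<BR\b[^>]*>)[ \t]*\n?", re.IGNORECASE)


def break_after_br(text: str) -> str:
    """Delete space before each <BR> and make exactly one newline follow it."""
    return _BR_RE.sub(r"\1\n", text)
```

For example, break_after_br("one  <BR> two") yields "one<BR>" followed by a newline and "two".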

Style Class: link

This HTML tag has no matching end tag; it appears alone on a separate line: LINK. There is normally at least one LINK tag, in the HEAD environment, and html-pretty will supply one automatically if none is present in the input stream.

Style Class: list

These HTML tag names occur in begin/end pairs, and delimit lists. They appear on separate lines, with their enclosed text indented two levels: DIR, DL, MENU, OL, and UL.

Style Class: list-header

This HTML tag marks the title of a list: LH. The begin/end tags are output on separate lines, indented one level from the enclosing list.

Style Class: list-item

These HTML tags mark the beginning of list items, and have matching end tags which are supplied if they are absent. They are output on separate lines, indented one level from the enclosing list: DD, DT, and LI.

Style Class: markup-declaration

The following SGML markup declarations are also treated like special tags, and output verbatim while checking for proper embedded comment balance: !ATTLIST, !ELEMENT, !ENTITY, !NOTATION, !SGML, !SHORTREF, and !USEMAP. However, html-pretty does not check where these `tags' are legal. They rarely occur in HTML files, and are found mainly in DTD files.

Style Class: math

These HTML tag names occur only inside a MATH environment, and appear inline, without end tags, and without affecting indentation: ATOP, CHOOSE, LEFT, OF, OVER, RIGHT, and TAB.

Style Class: math-pair

These HTML tag names occur only inside a MATH environment, with begin/end pairs, and appear inline, without affecting indentation: ABOVE, BAR, BELOW, BOX, DDOT, DOT, HAT, ROOT, ROW, SQRT, SUB, SUP, TILDE, and VEC.

Style Class: pair

The following HTML tag names occur in begin/end pairs (<TAG> and </TAG>), often with substantial amounts of intervening text: A, ABBREV, ABSTRACT, ADDED, ADDRESS, APPLET, ARG, AROW, ARRAY, AU, BDO, BLOCKQUOTE, BQ, BUTTON, CAPTION, CENTER, CITE, CMD, COLGROUP, CREDIT, DEL, DIV, DIV1, DIV2, DIV3, DIV4, DIV5, DIV6, FIELDSET, FIG, FN, FONT, FOOTNOTE, FORM, FRAMESET, HIDE, IFRAME, INS, LABEL, LANG, MAP, MARGIN, MATH, MESSAGE, NOFRAMES, NOSCRIPT, NOTE, OBJECT, OPTION, PERSON, QUOTE, REMOVED, SELECT, SPAN, STYLE, TABLE, TBODY, TD, TEXTAREA, TFOOT, TH, THEAD, and TR. They are prettyprinted on separate lines, with their enclosed text indented one level.

Style Class: paragraph

This HTML tag occurs in begin/end pairs, which are prettyprinted on separate lines with enclosed text indented one level: P. However, paragraphing is tracked: empty paragraphs are discarded, and when a tag that is known to be illegal inside a paragraph is encountered, any open paragraph is automatically closed. Thus, old-style HTML files with omitted </P> tags will usually get them added. Unlike the convention in most word processors and many typesetting systems, blank lines in the SGML and HTML input stream do not imply a paragraph break; only the <P> tag does.
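
This paragraph tracking can be sketched in Python over a pre-split token list, where each token is either a tag such as "<P>" or a run of text. The sketch closes an open paragraph when it meets another <P>, an explicit </P>, or a start tag from BLOCK_TAGS, and it drops paragraphs that contain only whitespace. BLOCK_TAGS is a hypothetical, incomplete list of tags that may not appear inside a paragraph, and the tokenizer is assumed. Neither is taken from html-pretty.

```python
from typing import List, Optional

# Hypothetical, incomplete set of tags that may not appear inside a paragraph.
BLOCK_TAGS = frozenset({
    "BLOCKQUOTE", "DL", "H1", "H2", "H3", "H4", "H5", "H6",
    "HR", "OL", "PRE", "TABLE", "UL",
})


def tag_name(token: str) -> str:
    """Upper-case element name of a '<TAG ...>' or '</TAG>' token."""
    return token.strip("</>").split()[0].upper()


def fix_paragraphs(tokens: List[str]) -> List[str]:
    """Close open paragraphs where needed and drop empty ones."""
    out: List[str] = []
    open_at: Optional[int] = None  # index of the open <P> in out, if any

    def close_paragraph() -> None:
        nonlocal open_at
        if open_at is None:
            return
        if all(not tok.strip() for tok in out[open_at + 1:]):
            del out[open_at:]  # paragraph holds only whitespace: discard it
        else:
            out.append("</P>")
        open_at = None

    for tok in tokens:
        if tok.startswith("<"):
            name = tag_name(tok)
            is_end = tok.startswith("</")
            if name == "P":
                if is_end and open_at is None:
                    out.append(tok)  # stray </P>: passed through unchanged in this sketch
                    continue
                close_paragraph()  # an explicit </P> or a new <P> ends the open paragraph
                if not is_end:
                    open_at = len(out)
                    out.append(tok)
                continue
            if name in BLOCK_TAGS and not is_end:
                close_paragraph()
        out.append(tok)
    close_paragraph()  # end of input also ends a paragraph
    return out
```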

Style Class: plaintext

The HTML tag PLAINTEXT marks the beginning of verbatim text that continues to end-of-file; it appears on a separate line. Although some HTML viewers will terminate the plaintext environment on reaching a matching end tag, </PLAINTEXT>, that practice is now considered erroneous.

html-pretty will warn about this aberrant environment, and recommend using <PRE> ... </PRE> instead.

Style Class: section

These HTML tags occur in begin/end pairs, which are prettyprinted on separate lines with enclosed text indented one level: H1, H2, H3, H4, H5, and H6. However, they must be logically ordered: H1 before H2 ... before H6, with no intermediate header levels omitted, and they must appear at the first level inside the BODY environment.
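
Here is a possible check for the ordering rule, in Python. Each heading may go at most one level deeper than the heading before it, and the first heading should be an H1. Returning to a shallower level is always allowed. The function name and message wording are hypothetical, and the sketch does not check that headings sit at the first level inside BODY.

```python
import re
from typing import List

# Start tags <H1> ... <H6> in any letter case; end tags, <HR>, and <HEAD> do not match.
HEADING_RE = re.compile(r"<H([1-6])\b", re.IGNORECASE)


def heading_level_problems(html: str) -> List[str]:
    """Report headings that skip a level, e.g. an H4 directly after an H2."""
    problems: List[str] = []
    previous = 0  # 0 means no heading has been seen yet
    for match in HEADING_RE.finditer(html):
        level = int(match.group(1))
        if level > previous + 1:
            if previous == 0:
                problems.append(f"H{level} appears before any H1")
            else:
                problems.append(f"H{level} follows H{previous}")
        previous = level
    return problems
```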

Style Class: short

This HTML tag has no matching end tag; it appears alone on a separate line: ITEM. However, tags in this class can be used only inside a BODY environment, and consequently, html-pretty will automatically end any open HEAD environment, and start a BODY environment, if needed.

Style Class: standalone

These HTML tags have no matching end tag; they appear alone on separate lines: CHANGED, HR, IMG, INPUT, RENDER, STYLES, and WBR. However, they may appear only inside the BODY environment, and outside a paragraph, and consequently, html-pretty will automatically end any open HEAD and P environments, and start a BODY environment, if needed.

Style Class: standalone-nocheck

These HTML tags have no matching end tag; they may appear in either the HEAD or the BODY environment, and they appear alone on separate lines: BASE, ISINDEX, META, and NEXTID.

As the class name implies, they are not checked against rules that might restrict their placement with respect to other environments.

Style Class: title

This HTML tag occurs in begin/end pairs, which are prettyprinted on separate lines with enclosed text indented one level: TITLE. However, this tag pair is restricted to occurring only in the HEAD environment, and should normally only be given once. html-pretty will supply this environment if needed, unless the -brief option has suppressed it, and will warn about multiple occurrences.

Style Class: verbatim

These HTML tags appear in begin/end pairs, delimit preformatted (verbatim) text, and may occur only in the BODY environment: LISTING, NOBR, PRE, and XMP. The begin and end tags are output on separate lines, with no indentation, and the enclosed material is copied exactly as it appeared in the input stream.

Style Class: verbatim-nocheck

These HTML tags appear in begin/end pairs, delimit preformatted (verbatim) text, and may occur in either the HEAD or the BODY environment: SCRIPT and STYLE. The begin and end tags are output on separate lines, with no indentation, and the enclosed material is copied exactly as it appeared in the input stream.

The SCRIPT and STYLE environments are not strictly verbatim environments, but since they contain material in one of several different scripting (Java, JavaScript, Tcl, VBScript, ...) or style-sheet (CSS, ...) languages, there is no reasonable way for html-pretty to reformat their contents, so they are included in this style class to prevent such reformatting.
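
One way to handle both verbatim classes is to split the input into segments before any reflowing. The contents of LISTING, NOBR, PRE, XMP, SCRIPT, and STYLE environments are then passed through unchanged. The following Python sketch does this with a case-insensitive regular expression. The function name is hypothetical. The sketch assumes these environments are not nested and that their contents never contain their own end tag; a </SCRIPT> inside a JavaScript string, for example, would end the segment early.

```python
import re
from typing import List, Tuple

# A whole verbatim environment, from its start tag to the matching end tag.
# The \b keeps e.g. <STYLES> (a standalone tag) from matching STYLE.
VERBATIM_RE = re.compile(
    r"<(LISTING|NOBR|PRE|XMP|SCRIPT|STYLE)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def split_verbatim(html: str) -> List[Tuple[str, str]]:
    """Split HTML into ("text", ...) segments that may be reformatted and
    ("verbatim", ...) segments that must be copied exactly."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in VERBATIM_RE.finditer(html):
        if match.start() > pos:
            segments.append(("text", html[pos:match.start()]))
        segments.append(("verbatim", match.group(0)))
        pos = match.end()
    if pos < len(html):
        segments.append(("text", html[pos:]))
    return segments
```

A formatter built this way would reflow only the "text" segments and emit the "verbatim" ones unchanged.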

