HTML-PRETTY 1 "04 December 1997" "Version 1.00" [section 6 of 14]



In order to allow the user to control the formatting of particular HTML tags, html-pretty supports a powerful style file mechanism. At startup, it processes a style file in the directory where the executable program was found, a second style file in the user's home directory, and a third style file in the current directory. None of these need exist. Next, during command-line argument processing, additional style files can be provided with the -stylefile option. These style files support system-specific, user-specific, directory-specific, and job- or file-specific prettyprinting control.

The default names of the first three style files are system dependent: .html-prettyrc (UNIX), htmlpty.ini (IBM PC DOS), and html-pretty.ini (DEC VMS and OpenVMS). The exact names are constructed from the name of the executable program. If you renamed the executable to sgmlpretty.exe, the names would be .sgmlprettyrc on UNIX, and sgmlpretty.ini on other systems.

More precisely, the program name passed to the prettyprinter as its zeroth argument is filtered by discarding any trailing version number and extension, then keeping the last consecutive string of characters that are letters, digits, hyphens, or underscores. If the zeroth argument is empty, implying that the program name is unknown, then the name htmlpty is used; it is short enough to be acceptable to any reasonable file system.
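
The filtering rule above can be sketched in Python. This is only an illustration, not the program's actual code: the function names style_base_name and style_file_name are hypothetical, and because the text does not define "version number", the sketch assumes it means a VMS-style ";N" suffix, so that a name such as html-pretty-3-2 is kept whole.

```python
import re


def style_base_name(argv0: str) -> str:
    """Derive the style-file base name from the zeroth argument (a sketch)."""
    # Assumption: "trailing version number" is a VMS-style ";N" suffix.
    name = re.sub(r";\d+$", "", argv0)
    # Drop a trailing extension such as ".exe".
    name = re.sub(r"\.[A-Za-z0-9]*$", "", name)
    # Keep the last run of letters, digits, hyphens, or underscores.
    runs = re.findall(r"[A-Za-z0-9_-]+", name)
    # An empty or unusable name falls back to the short default.
    return runs[-1] if runs else "htmlpty"


def style_file_name(argv0: str, unix: bool = True) -> str:
    """Build the default style file name for UNIX or for other systems."""
    base = style_base_name(argv0)
    return f".{base}rc" if unix else f"{base}.ini"
```

With these definitions, style_file_name("/usr/local/bin/html-pretty") gives .html-prettyrc, and an empty zeroth argument falls back to the base name htmlpty.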

Automatic association of the style file names with the program name makes it easy to support multiple versions. For example, html-pretty-2-0, html-pretty-3-0, html-pretty-3-2, and html-pretty-4-0 could select different HTML grammar versions. On UNIX and a few other operating systems, those programs could all be links to the same physical file, avoiding wasteful duplication of disk space.

In the current implementation of html-pretty, all of the standard HTML tags defined in HTML grammar levels up to the proposed 4.0, plus a few browser vendor extensions, are hard-coded into the program as members of over twenty formatting-style classes, so that no external style files need exist for the program to run. However, the -extend-style, -grammar-level, and -stylefile options allow the built-in rules to be completely replaced, if desired.

The -print-stylefile option displays the built-in style rules as a set of style class names, each followed by a colon and a list of HTML tags in that style class. Its output may be used without further modification as a style file, and looks something like this:

%% html-pretty version 1.00 date [29-Nov-1997]
%% User-modifiable style file generated on Wed Dec  3 08:34:39 1997

%% Uncomment the next line to clear all existing rules,
%% or leave it as a comment to preserve them:
% default :

body :                    BODY

doctype :                 !DOCTYPE

font :                    ACRONYM B BIG BLINK BT CODE \
                          DFN EM I KBD Q REV S SAMP \
                          SMALL STRIKE STRONG T TT U \

head :                    HEAD

html :                    HTML

inline :

line-break :              BR

link :                    LINK

list :                    DIR DL MENU OL UL

list-header :             LH

list-item :               DD DT LI

markup-declaration :      !ATTLIST !ELEMENT !ENTITY \
                          !NOTATION !SGML !SHORTREF \

math :                    ATOP CHOOSE LEFT OF OVER \
                          RIGHT TAB

math-pair :               ABOVE BAR BELOW BOX DDOT \
                          DOT HAT ROOT ROW SQRT SUB \
                          SUP TILDE VEC

pair :                    A ABBREV ABSTRACT ADDED \
                          ADDRESS APPLET ARG AROW \
                          ARRAY AU BDO BLOCKQUOTE BQ \
                          BUTTON CAPTION CENTER CITE \
                          CMD COLGROUP CREDIT DEL DIV \
                          DIV1 DIV2 DIV3 DIV4 DIV5 \
                          DIV6 FIELDSET FIG FN FONT \
                          FOOTNOTE FORM FRAMESET HIDE \
                          IFRAME INS LABEL LANG MAP \
                          MARGIN MATH MESSAGE \
                          NOFRAMES NOSCRIPT NOTE \
                          OBJECT OPTION PERSON QUOTE \
                          REMOVED SELECT SPAN TABLE \
                          TBODY TD TEXTAREA TFOOT TH \
                          THEAD TR

paragraph :               P

plaintext :               PLAINTEXT

public :                  "-//IETF//DTD HTML//EN"

section :                 H1 H2 H3 H4 H5 H6

short :                   ITEM

standalone :              CHANGED HR IMG INPUT RENDER \
                          STYLES WBR

standalone-nocheck :      AREA BASE BASEFONT COL \
                          FRAME ISINDEX META NEXTID \
                          OVERLAY PARAM

title :                   TITLE

verbatim :                LISTING NOBR PRE XMP

verbatim-nocheck :        SCRIPT STYLE

There will be additional data at the end of this output, in the form of the tag relationships discussed below, but we omit it here because of its length.

Notice that long lines can be continued on multiple physical lines, for improved readability, by terminating them with a backslash immediately before the newline. The output line width in this file is governed by the -width option.
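
As a rough illustration of the continuation rule, the following Python sketch joins physical lines into logical ones. The function name join_continued_lines is hypothetical, and the sketch assumes the backslash is simply dropped, relying on the whitespace already present around it to separate tags.

```python
def join_continued_lines(text: str) -> list[str]:
    """Merge physical lines ending in a backslash with the line that follows."""
    logical = []
    pending = ""
    for physical in text.splitlines():
        if physical.endswith("\\"):
            # Drop the backslash and wait for the next physical line.
            pending += physical[:-1]
        else:
            logical.append(pending + physical)
            pending = ""
    if pending:
        # A dangling continuation at end of input still yields a line.
        logical.append(pending)
    return logical
```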

As with the command-line -extend-style option, you can replace the colon with an equal sign, and the spaces between tags with commas, if you prefer.

In addition to the style classes listed above, the special style class default can be used to force an HTML tag to revert to the default rule for unknown tags: it is treated as ordinary text. If the -unknown-tag-warning option was specified, a warning will be issued for this now-unknown tag. You can use this to override earlier style file settings, and most of the built-in ones.

The special style class public is associated with the SGML DOCTYPE identifier string, rather than with a list of HTML tags. That string determines which grammar file is to be used for the document, and the value above corresponds to the most generic grammar level (2.0) recognized by almost all browsers, even though the style file includes tags from higher grammar levels. The public style class is the only class whose value is a quoted string.

If the tag list to the right of the colon (or equals sign) is empty, then all existing tags for that class are forgotten. If an empty tag list is used with the default style class, then all tags for all classes are forgotten. These two uses allow you to eliminate all built-in rules, or just certain style classes, so that your style file can start from a known rule base.
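
The clearing rules can be modeled with a small Python sketch. It is a simplified model under stated assumptions, not the program's internals: the function name apply_class_rule and the dictionary representation are hypothetical, the quoted-string public class is not handled, and a tag is assumed to belong to only one class at a time.

```python
def apply_class_rule(classes: dict[str, list[str]],
                     style_class: str, tags: list[str]) -> None:
    """Apply one 'style-class : TAG ...' line to a class table, in place."""
    if not tags:
        if style_class == "default":
            classes.clear()                  # empty default: forget everything
        else:
            classes.pop(style_class, None)   # empty list: forget this class
        return
    wanted = [t.upper() for t in tags]
    # Assumption: a tag moves out of any other class it was in before.
    for name, members in classes.items():
        if name != style_class:
            members[:] = [t for t in members if t not in wanted]
    if style_class != "default":
        # Accumulate into the class, discarding duplicates.
        bucket = classes.setdefault(style_class, [])
        for tag in wanted:
            if tag not in bucket:
                bucket.append(tag)
```

In this model, a non-empty tag list given to default removes those tags from every class, which corresponds to reverting them to the rule for unknown tags.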

Style class names are case sensitive, and all of the ones recognized by html-pretty must be spelled with lowercase letters.

The style file is expected to contain lines of the forms:

style-class : TAG1 TAG2 ...

TAG : TAG1 TAG2 ...

relationship[TAG] : TAG1 TAG2 ...

-option -option ...
-option value
-option value -option value  ...

In the second form, the style class of the first tag is looked up and assigned to the remaining tags. That first tag must be entirely UPPERCASE in order to distinguish it from a style class. The remaining tag names can be in either lettercase, or in mixed case; they will be converted to uppercase internally.
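
A minimal Python sketch of how the left-hand side might be classified, and how the second form could be resolved, follows. All names here (left_side_kind, class_of, expand_tag_rule) are hypothetical, and the behavior for a first tag that belongs to no class is an assumption, since the text does not specify it.

```python
def left_side_kind(name: str) -> str:
    """Classify the item to the left of the colon."""
    if "[" in name and name.endswith("]"):
        return "relationship"
    if name.isupper():
        return "tag"           # entirely uppercase: an HTML tag
    return "style-class"       # recognized class names are lowercase


def class_of(classes: dict[str, list[str]], tag: str):
    """Return the style class that currently holds tag, or None."""
    for style_class, members in classes.items():
        if tag in members:
            return style_class
    return None


def expand_tag_rule(classes: dict[str, list[str]],
                    model_tag: str, tags: list[str]):
    """Turn 'TAG : TAG1 TAG2 ...' into (style class, uppercase tags)."""
    if left_side_kind(model_tag) != "tag":
        raise ValueError("first tag must be entirely uppercase")
    # Assumption: an unknown first tag yields None for the class.
    return class_of(classes, model_tag), [t.upper() for t in tags]
```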

The third form specifies a relationship between a tag, and one or more other tags. The two currently-recognized relationships are ContainedIn and CannotContain; any others that are specified will simply be ignored, although they will take up memory space. For example, the lines

CannotContain[TITLE]      : LINK META SCRIPT STYLE
ContainedIn[LH]           : DL OL UL

mean that a TITLE environment cannot contain LINK, META, SCRIPT, or STYLE tags, and that an LH tag can only be contained in a DL, OL, or UL environment. Unlike style class definitions, relationships are not cumulative; only the last one of each relationship type encountered for a particular tag will be used.
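
The non-cumulative behavior of relationships can be illustrated with a short Python sketch. The names record_relationship and nesting_ok are hypothetical, and the check looks only at the immediately enclosing tag, which is a simplifying assumption; the real checker may consider outer environments as well.

```python
import re


def record_relationship(table: dict, left: str, tags: list[str]) -> None:
    """Store 'relationship[TAG] : TAG1 ...'; a later line replaces an earlier one."""
    match = re.fullmatch(r"(\w+)\[([^\]]+)\]", left)
    if match is None:
        raise ValueError(f"not a relationship: {left!r}")
    relation, tag = match.group(1), match.group(2).upper()
    # Not cumulative: assignment overwrites any previous list.
    # Unrecognized relationship names are stored but never consulted.
    table[(relation, tag)] = [t.upper() for t in tags]


def nesting_ok(table: dict, tag: str, parent: str) -> bool:
    """Check tag against its immediately enclosing parent (an assumption)."""
    allowed = table.get(("ContainedIn", tag))
    if allowed is not None and parent not in allowed:
        return False
    return tag not in table.get(("CannotContain", parent), [])
```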

Because of the complexity of SGML grammars, tag relationships cannot be reliably determined manually. Instead, relationship tables are derived from the grammar files by software developed especially for, and included with, the html-pretty distribution, and then inserted into the standard style files. Thus, it is not anticipated that users will provide tag relationship data in their personal style files, although they are free to do so.

The built-in relationship tables are derived from a merger of the major grammar levels, so that they match tags in the built-in style classes, which are also derived from that merger.

Tag relationships are only checked when the -check-tag-nesting option is specified, and since that is not a default option, omitting tag relationships will normally not have a visible effect on the output of html-pretty.

In the remaining forms, command-line options appear, and any of the documented options may be used this way. You can use this facility to collect preferred option settings in one place, such as in directory- or user-specific initialization files. Observe that a -stylefile option here can be used to chain temporarily to another file, and this can go on recursively to a depth which is limited only by the run-time stack depth and the maximum number of simultaneously open files.
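
The chaining behavior can be sketched in Python as a recursive expansion. This is an illustration only: the function name expand_stylefiles, the read_lines callback, and the max_depth guard are all hypothetical (the text says depth is limited only by the stack and by open files), and option values containing whitespace are not handled.

```python
from typing import Callable


def expand_stylefiles(name: str,
                      read_lines: Callable[[str], list[str]],
                      depth: int = 0, max_depth: int = 16) -> list[str]:
    """Return the lines of a style file with -stylefile options expanded in place."""
    if depth > max_depth:
        raise RecursionError(f"style files nested too deeply at {name!r}")
    out = []
    for line in read_lines(name):
        words = line.split()
        if not words or not words[0].startswith("-"):
            out.append(line)          # not an option line: keep unchanged
            continue
        rest = []
        i = 0
        while i < len(words):
            if words[i] == "-stylefile" and i + 1 < len(words):
                if rest:
                    out.append(" ".join(rest))   # options seen so far
                    rest = []
                # Chain temporarily to the named file, then resume here.
                out.extend(expand_stylefiles(words[i + 1], read_lines,
                                             depth + 1, max_depth))
                i += 2
            else:
                rest.append(words[i])
                i += 1
        if rest:
            out.append(" ".join(rest))
    return out
```

Passing the reader as a callback keeps the sketch independent of any particular file system layout; in practice it could be a function that opens the file and returns its lines.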

Blank lines, leading and trailing whitespace, and text from a percent (%) comment character to end of line, are ignored. To get a literal percent or quotation mark into a value, prefix it with a backslash; the backslash will be removed when the value is collected.
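
One way to picture the comment and escape rules is the following Python sketch. The names strip_comment and unescape are hypothetical, and the sketch assumes that an unescaped percent starts a comment even inside a quoted string, which the text implies but does not state outright.

```python
def strip_comment(line: str) -> str:
    """Remove a % comment and surrounding whitespace, keeping escaped characters."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in '%"':
            out.append(line[i:i + 2])   # keep the escape pair for unescape()
            i += 2
            continue
        if ch == "%":
            break                       # unescaped percent: rest is comment
        out.append(ch)
        i += 1
    return "".join(out).strip()


def unescape(value: str) -> str:
    """Remove the backslash from escaped percent and quotation marks."""
    return value.replace("\\%", "%").replace('\\"', '"')
```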

Whitespace separates items, and can be omitted around the colon. There is no significance to the order of items on a line, or lines in the file, except that later settings can override earlier ones. The same style class name may occur on multiple lines, and the tag lists will be accumulated, discarding duplicates.
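
The separator rules can be sketched as a small Python tokenizer for one logical line from which comments have already been removed. The function name parse_rule is hypothetical, the "options" marker is an invention of this sketch, and quoted option values containing whitespace are not handled.

```python
import re


def parse_rule(line: str):
    """Split one logical line into (left-hand side, list of items)."""
    line = line.strip()
    if line.startswith("-"):
        # A line of command-line options.
        return ("options", line.split())
    match = re.match(r"([^\s:=]+)\s*[:=]\s*(.*)$", line)
    if match is None:
        raise ValueError(f"unrecognized style line: {line!r}")
    left, right = match.group(1), match.group(2).strip()
    if right.startswith('"'):
        # A quoted string, as used by the public class, stays in one piece.
        return (left, [right.strip('"')])
    # Tags may be separated by whitespace or by commas.
    return (left, [item for item in re.split(r"[,\s]+", right) if item])
```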

The line length limit in style files is system-dependent, but guaranteed to be at least 2048 characters.

Here is an example of a small style file that adds the new tags introduced in the proposed HTML 4.0 grammar (even though they are actually already present in the built-in rules):

% HTML proposed 4.0 additions
font : Acronym s

pair : BdO Button col colgroup Fieldset FrameSet IFrame \
       laBel NoFrames NoScript Object span TBody TFoot thEAD

public : "-//W3C//DTD HTML 4.0 Transitional//EN"

The last style class attached to a tag is the one that is used. Thus, specifications in command-line -extend-style, -grammar-level, or -stylefile options override those in the current directory style file; those in turn override settings from the home directory style file, which override system-wide style file settings.
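
The override order amounts to "last assignment wins" across sources processed in sequence, which the following Python sketch illustrates. The function name resolve_tag_classes and the source labels are hypothetical; the layers are assumed to be supplied in processing order (system, home directory, current directory, command line).

```python
def resolve_tag_classes(layers):
    """Map each tag to the (style class, source) that assigned it last.

    layers is a list of (source name, [(style class, tag), ...]) pairs,
    given in the order in which the sources are processed.
    """
    final = {}
    for source, assignments in layers:
        for style_class, tag in assignments:
            # A later assignment simply replaces an earlier one.
            final[tag.upper()] = (style_class, source)
    return final
```

For example, if the system file puts SPAN in pair, the home directory file puts it in font, and a command-line option puts it in inline, the result records inline from the command line.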