HTML-PRETTY 1 "04 December 1997" "Version 1.00" [section 8 of 14]



The SGML (and thus HTML) comment syntax confuses many users, including browser programmers and book authors, who either get it wrong or fail to explain it properly.

SGML has a delimiter called markup declaration open, or mdo for short, which defaults to the two-character sequence <!, and is matched by a markup declaration close, or mdc, which defaults to the one-character sequence >. The mdo must be immediately followed either by a keyword, such as ATTLIST, DOCTYPE, ELEMENT, ENTITY, NOTATION, SGML, SHORTREF, or USEMAP, or else by a comment start, which is the two-character sequence --. In the latter case, there must later be a second -- pair to signal a comment end. Anything after that second pair is non-comment text to be parsed. However, additional comments can appear, provided that each is surrounded by -- pairs. Eventually, the markup declaration is ended by an mdc, although whitespace may follow the last comment. Thus,

<!--one-- --two-- --three-->
<!-- one -- -- two -- -- three -- >
<!-- one --
  -- two --
  -- three --
>

are all legal markup declarations, each of which consists of three comments. However, these are in error:

<! --one-- --two-- --three-- >
<!-- one -- two -- three -- >
<!--     one --
      -- two --
      - - three - -
>

The first has an illegal space after the mdo, the second has the word two outside a comment, and the third has the text - - three - - outside the comment.
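The rules above can be sketched as a small Python scanner. This is only an illustration, not html-pretty's own code: the name parse_declaration is hypothetical, and the sketch handles only declarations that consist entirely of comments (no keywords such as DOCTYPE).

```python
def parse_declaration(text):
    """Return the comments in a comment-only markup declaration.

    Raises ValueError when the text breaks the rules described above.
    """
    if not text.startswith("<!"):
        raise ValueError("missing markup declaration open '<!'")
    pos = 2
    if not text.startswith("--", pos):
        raise ValueError("mdo must be followed immediately by '--'")
    comments = []
    while True:
        if text.startswith("--", pos):
            # Comment start: the next '--' pair is the comment end.
            end = text.find("--", pos + 2)
            if end < 0:
                raise ValueError("comment is never closed")
            comments.append(text[pos + 2:end])
            pos = end + 2
        elif pos >= len(text):
            raise ValueError("missing markup declaration close '>'")
        elif text[pos].isspace():
            pos += 1  # whitespace is allowed between comments
        elif text[pos] == ">":
            return comments  # mdc ends the declaration
        else:
            raise ValueError("text outside a comment at offset %d" % pos)


print(parse_declaration("<!--one-- --two-- --three-->"))  # ['one', 'two', 'three']
```

Fed a declaration with a space after the mdo, or with bare text between comments, the function raises ValueError instead of returning a list.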

Given these rules, can SGML comments be nested? It might appear that they can, since

<!-- <!-- --> -->
<!-- <!-- --> <!-- --> -->

are both accepted by an SGML parser. However, the nesting is illusory, since the first has two comments, containing the text <! and >, and the second has three comments: <! and > <! and >. As soon as we put words between the hyphen pairs, such as

<!-- one <!-- two --> three -->
<!-- one <!-- two --> <!-- three --> four -->

the word two lies outside the comment in the first, and two and three are outside the comment in the second. Therefore, SGML and HTML comments cannot in general be nested.
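One way to see why the nesting is illusory is to pair up the hyphens mechanically and look at what is left over. The following Python sketch does this with a regular expression. The name dissect is hypothetical, and the sketch assumes that the whole string is a single declaration running from the leading <! to the final >.

```python
import re

# A comment is the shortest run of text between two '--' pairs.
COMMENT = re.compile(r"--(.*?)--", re.DOTALL)


def dissect(declaration):
    """Return (comments, stray_words) for a string of the form <! ... >."""
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("not a markup declaration")
    body = declaration[2:-1]
    comments = COMMENT.findall(body)
    # Whatever remains after removing the comments lies outside them.
    stray_words = COMMENT.sub("", body).split()
    return comments, stray_words


print(dissect("<!-- <!-- --> -->"))
print(dissect("<!-- one <!-- two --> three -->"))
```

For the first declaration the list of stray words is empty, so a parser accepts it. For the second, the word two is reported as stray text, which is what makes the apparent nesting an error.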

Finally, because of the special significance of -- as both a comment start and a comment end inside a markup declaration, you must be careful about using adjacent hyphens in the comment text. In particular, if you use a line of hyphens to set off one block of text from another, the number of hyphens must be a multiple of 4, such as this 60-hyphen example:

<!------------------------------------------------------------>

Having 57, 58, 59, 61, 62, or 63 hyphens won't do!
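The arithmetic behind the multiple-of-4 rule can be sketched in a few lines of Python. The function name hyphen_rule_ok is hypothetical, and the sketch assumes that the rule consists of nothing but hyphens between the <! and the >.

```python
def hyphen_rule_ok(count):
    """Return True if '<!' + count hyphens + '>' holds only whole comments."""
    pairs, leftover = divmod(count, 2)
    # Hyphens are consumed two at a time: the first pair opens a comment,
    # the next pair closes it, and so on. The declaration is well formed
    # only if no single hyphen is left over and every opening pair has a
    # matching closing pair, so the number of pairs must be even.
    return leftover == 0 and pairs % 2 == 0


for count in (57, 58, 59, 60, 61, 62, 63):
    print(count, hyphen_rule_ok(count))
```

An even count that is not a multiple of 4, such as 58 or 62, fails because the last pair opens a comment that is never closed.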

html-pretty does not distinguish markup declarations, beginning with <!, from tags, beginning with <, except for comments, which have to be parsed separately anyway to handle the pairs-of-pairs-of-hyphens balance requirement. Thus, it considers !DOCTYPE to be a tag name, when strictly speaking, the ! is part of the markup declaration open. However, since the closing character > can always be represented by an SGML entity, &gt;, html-pretty can safely assume that anything between angle brackets is a `tag'. That is why the style files have a rule

doctype :            !DOCTYPE

with the exclamation mark prefixed to the word DOCTYPE.
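The effect of this simplification can be illustrated with a short Python sketch. It is not taken from html-pretty, whose implementation may differ considerably. The pattern TAG_NAME and the function tag_name are hypothetical, and the set of characters allowed in a name is an assumption.

```python
import re

# Hypothetical pattern: "<", an optional "!" or "/", then a name.
TAG_NAME = re.compile(r"<([!/]?[A-Za-z][A-Za-z0-9.-]*)")


def tag_name(markup):
    """Return the name a simple tag-oriented scanner would see, or None."""
    if markup.startswith("<!--"):
        return None  # comments need their own parser
    match = TAG_NAME.match(markup)
    return match.group(1).upper() if match else None


print(tag_name('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">'))  # !DOCTYPE
print(tag_name("<title>"))  # TITLE
```

Because the exclamation mark is kept as part of the name, a scanner of this kind needs no separate code path for markup declarations other than comments.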