NAME

tgrind - typeset nice program listings using TeX

SYNOPSIS

tgrind [ -d <description file> ] [ -f ] [ -fn fontname ] [ -h header ] [ -l <language> ] [ -n ] [ -o basename ] [ -p ] [ - ] filename ...

Command-line arguments are processed in strict order, so that options interspersed between file names affect only those files which follow.
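The order-sensitive behavior described above can be illustrated with a small simulation. This is only a sketch in Python, not the actual csh(1) script: the function name plan_jobs and the dictionary layout are hypothetical, and the option list is taken from the SYNOPSIS.

```python
def plan_jobs(argv):
    """Pair each file name with the options in effect when it is reached."""
    takes_value = {"-d", "-fn", "-h", "-o"}   # value follows as a separate argument
    flags = {"-f", "-n", "-p"}                # options without a value
    settings = {"-l": "c"}                    # C is the documented default language
    jobs = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in takes_value or arg == "-l":
            settings[arg] = argv[i + 1]       # e.g. "-l c" or "-o /tmp/out"
            i += 2
        elif arg.startswith("-l"):
            settings["-l"] = arg[2:]          # attached form, e.g. "-lc"
            i += 1
        elif arg in flags:
            settings[arg] = True
            i += 1
        else:
            # A file name (or "-" for standard input): snapshot the settings
            # seen so far, so later options do not affect this file.
            jobs.append((arg, dict(settings)))
            i += 1
    return jobs


for name, opts in plan_jobs(["main.c", "-lp", "-n", "sort.p"]):
    print(name, opts)
```

Here main.c is processed with the default settings, while sort.p picks up the Pascal language and the -n flag, because those options appear before it on the command line.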


DESCRIPTION

tgrind formats program sources in a nice style using tex(1). Comments are placed in italics, keywords in bold face and strings in typewriter font. Source file line numbers appear in the right margin (every 10 lines). The start of a function is indicated by the function name in large type in the right margin.

tgrind is not a prettyprinter: all line breaks and horizontal spacing in the input source files are preserved. Prettyprinters require more sophistication, and language-specific knowledge, than is possessed by tgrind. Consequently, you may find it useful to apply a prettyprinter to your source code before giving it to tgrind. Some of the available ones are: bibclean(1) for BibTeX, cb(1) and indent(1) for C and C++, pretty(1) for Fortran, sf3pretty(1) for Fortran and Sftran3, indent-sexp for GNU Emacs Lisp, mft(1) for Metafont, and pindent(1) for Pascal.

In regular mode tgrind processes its input file(s) and passes them to tex(1) for formatting and output, and then sends the output to a DVI driver for conversion to PostScript. All output files are normally left in the /tmp directory with numeric base names; the -o option (see below) can change this behavior. tgrind will not overwrite existing files in that directory.

In format mode (i.e., when the -f flag is used), tgrind processes its input file(s) and writes the result to standard output. This output can be saved for later editing, inclusion in a larger document, etc.

The options are:

-d <description file>
Specify the language definitions file (default is /usr/local/lib/tex/inputs/vgrindefs). This option is useful for testing new language definitions.
-f
Force format mode. This simply skips the TeX and DVI driver steps, copies the output TeX file to stdout, and deletes the output TeX file.
-fn fontname
Define the font family to be used in the output. If this option is not specified, the default font family is Computer Modern. Otherwise, all fonts are virtual fonts that map to PostScript Type 1 outline fonts, making the PostScript output by a DVI driver more compact, and completely resolution-independent (i.e., no character bitmaps). A font family may be given as a long font name, or as a short three-letter font file prefix. The names currently recognized are:
  • AvantGarde (pag),
  • Bookman (pbk),
  • Charter (bch),
  • Courier (pcr),
  • Helvetica (phv),
  • HelveticaNarrow (phn),
  • NewCentury or NewCenturySchoolbook (pnc),
  • Palatino (ppl),
  • Times (ptm), and
  • Utopia (put).

All of these fonts, except Charter and Utopia, and sometimes HelveticaNarrow, are resident in standard PostScript laser printers. The exceptional fonts will be included in the output PostScript by the DVI driver program, if the driver's psfonts.map file correctly identifies them as non-resident fonts.

When this option is provided, the PostScript output will also use Courier for a typewriter font, and Symbol for certain special characters; both of these fonts are printer resident.

-l<language>
Specify the language to use. Unlike other tgrind options that take a value, this one allows the language value to be attached to the -l, or provided as a separate following argument. The <language> name often has two or three acceptable forms, one of which is the standard source file extension in UNIX; see the bar-separated names beginning each language entry in the vgrindefs file. Here are the <language> names currently recognized:
a68
Motorola 68xxx assembly language.
ada
Ada.
asm68
Another Motorola 68xxx assembly language.
awk
awk(1), gawk(1), and nawk(1).
bash
GNU Bourne-Again shell (bash(1)).
bibtex
BibTeX (bibtex(1)).
c
C (the default language).
caml
CAML.
c++
C++ and Objective C.
csh
C shell (csh(1)).
Elisp
GNU Emacs Lisp. Keywords are considered to be all of those low-level Lisp functions that are implemented in the Lisp interpreter itself (in the C programming language); higher-level Lisp functions written in Lisp are not keywords.
f
Fortran.
i
ISP.
I
Icon.
ksh
Korn shell (ksh(1)).
latex
LaTeX 2.09. Keywords are considered to be all of the control sequences named in the index of the first edition of Leslie Lamport, LaTeX User's Guide and Reference Manual, Addison-Wesley (1985), ISBN 0-201-15790-X. As with Emacs Lisp and other extensible languages, it seems reasonable to distinguish built-in `system' commands from `user' commands.
maple
Maple V. Keywords include the language keywords, operators, constants, and standard global variables.
maplex
Extended Maple V. The keywords also include all of the initially-loaded library functions.
matlab
Matlab.
m
MODEL.
m2
Modula-2.
Miranda
Miranda.
ml
MLisp and Emacs Lisp.
objc
Objective C.
p
Pascal.
prolog
Prolog.
ps
PostScript.
r
Ratfor.
russell
Russell.
sh
Bourne shell (sh(1)).
sf3
Sftran3.
src
Unknown source code (no keywords, comments, or strings are recognized).
tcsh
Extended C shell (tcsh(1)).
tex
TeX.
y
yacc.
-h header
Specify header text to go in the top left margin of every output page (default is none).
-n
Do not boldface keywords.
-o basename
Specify an alternate basename for the output files. The default is /tmp/$$ which expands to something like /tmp/34275, for process number 34275. This option allows you to direct the output files anywhere you like. However, tgrind will still refuse to run if they already exist, in order to avoid overwriting a possibly important file.
-p
Send the output PostScript to the default printer.
-
Take input from standard input.

BUGS

The marginal-function-name mechanism depends on the quality of the language description in vgrindefs. The distributed vgrindefs file fails to recognize many legal C function declarations.

Arbitrary formatting styles for programs mostly look bad. The use of spaces to align source code often fails miserably (because of the variable width output font). If you plan to tgrind your program, try to use tabs.

The -f flag means different things to tgrind and vgrind(1).


PORTABILITY ISSUES

The table-driven preprocessor program, tfontedpr, is written in old (K&R) style C, and requires only a small number of header files. It compiles and runs on numerous UNIX systems, and should be readily portable to other operating systems. Its memory requirements are modest: less than 100KB on typical UNIX systems.

tgrind is a UNIX csh(1) script that handles argument parsing, and invocation of the preprocessor, indexer, TeX, and a DVI driver. It should be possible to reimplement this script in other operating systems, if they have a reasonably powerful shell command language.

The indexing program, tgrindex.awk, is written in nawk(1), and can be readily handled by GNU gawk(1) as well. Commercial and freely-distributable implementations of these languages are available for several personal computer operating systems, and for DEC VMS.

Volunteers for ports of tgrind to other operating systems will be most welcome!


ADDING SUPPORT FOR NEW LANGUAGES

The language translations implemented by tgrind are entirely table driven, using language descriptions given in the vgrindefs file, which is modeled after the colon-delimited key=value format of UNIX printcap(4) and termcap(4) capability files. Adding support to tgrind for a new language requires only additions to this file. Most language entries average about 9 lines, of which 6 or 7 are usually just enumerations of the language keywords.

Keys are either Boolean flags, in which case they take no =value string (the flag is set true if the key is present, and false if it is absent), or else string variables whose values are specialized patterns, jokingly referred to as irregular expressions, vaguely similar to the regular expressions recognized by the UNIX ex(1) editor and lex(1) lexical-analyzer generator.

In tgrind patterns, the characters `$', `(', `)', `:', `?', `^', `|', and `\' are reserved characters: they must be quoted with a preceding \ if they are to be interpreted as normal characters. Otherwise, they have these meanings:

^
Beginning of line.
$
End of line.
:
Key-value capability pair delimiter.
\
Escape character. Two such characters, \\, represent a single backslash.
The extended patterns are:
\a
Matches any number of characters (like `.*' in lex(1)).
\d
Matches any number of whitespace delimiters (space, tab, newline, start of line).
\p
Matches any number of alphanumeric characters. In a procedure definition (the pb key), the string that matches this symbol is used as the procedure name.
|
Alternation.
( )
Grouping, used mostly for alternation and optionality.
?
Last item is optional (i.e. occurs zero times, or one time).
\e
Preceding any string means that the string will not match an input string if the input string is preceded by an escape character (\). This is typically used for languages (like C) that can include the string delimiter in a string by escaping it.

Unlike other implementations of regular expressions, these patterns match words and not characters. Hence something like (foo|bar)mumble? would match foo, bar, foomumble, or barmumble. In tgrind patterns, alternation binds very tightly, so grouping parentheses are likely to be necessary in expressions involving alternation.

Here are the capability keys that are currently used in the vgrindefs file, and in the source code file, tfontedpr.c:

ab
Alternate comment begin.
ae
Alternate comment end.
bb
Begin statement block.
be
End statement block.
cb
Comment begin.
ce
Comment end.
ic
Define extra characters that may appear as initial characters of procedure names (those that match \p) and keywords, beyond the hard-wired defaults of letters, digits, and underscore. This supports languages that place restrictions on the initial characters of identifiers. If this key is not provided, then initial characters are treated the same as non-initial characters. This key does not exist in vgrind(1) implementations.
id
Define extra characters that may appear in procedure names (those that match \p) and keywords, beyond the hard-wired defaults of letters, digits, and underscore. This supports languages, like Lisp and TeX, that have a more extensive character set for identifiers. This key does not exist in older vgrind(1) implementations; it may have been introduced first by Sun Microsystems in the Solaris 2.x operating system release.
kw
Language keywords (a space separated list, usually in alphabetical order for readability, though that is not a requirement).
lb
Literal string begin.
le
Literal string end.
nc
Define characters that may not appear as initial characters of procedure names (those that match \p) and keywords. This provides a way to remove initial identifier characters from the hard-wired defaults of letters, digits, and underscore. Its value is examined after any ic value. This key is not available in vgrind(1).
ni
Define characters that may not appear in procedure names (those that match \p) and keywords. This provides a way to remove identifier characters from the hard-wired defaults of letters, digits, and underscore. Its value is examined after any id value. This key is unique to tgrind; it is not available in vgrind(1).
oc
(Boolean) one case flag: letter case is not significant.
pb
Procedure (function, subroutine) begin.
sb
Character string begin.
se
Character string end.
tc
If this key appears, it must be last. Its value is the name of another vgrindefs entry that is looked up and appended to the end of the current entry, minus the initial entry names. That entry in turn may end with a tc key that refers to yet another entry, and so on, up to a limit of 32 (to catch nonterminating loops). If the same key appears more than once in the constructed entry, only the first value is used. Thus, tc can be used to prepare minor variations on a basic language definition.
tl
(Boolean) top lex flag: procedures may be defined only at top level, that is, nested procedures are not permitted.
The string value of id and kw is treated as an ordinary string, rather than a pattern: backslash has significance only at end-of-line, or before a colon.

Keys are always exactly two characters long, and the equals sign that separates them from their values must follow immediately, without intervening whitespace.
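The tc chaining rule described above (append the referenced entry, keep only the first value seen for each key, stop after 32 links) can be sketched as follows. This is an illustration in Python, not code from tfontedpr.c; the function name resolve, the entry names base and variant, and the in-memory layout are all hypothetical.

```python
def resolve(entries, name, limit=32):
    """Merge an entry with the entries reached through its tc keys.

    entries maps an entry name to a list of (key, value) pairs in file
    order; Boolean keys carry the value True.
    """
    merged = {}
    hops = 0
    while name is not None:
        next_name = None
        for key, value in entries[name]:
            if key == "tc":
                next_name = value          # follow this link after the entry
            elif key not in merged:
                merged[key] = value        # only the first value for a key is used
        if next_name is not None:
            hops += 1
            if hops > limit:               # guards against entries that refer to each other
                raise ValueError("tc chain exceeds %d entries" % limit)
        name = next_name
    return merged


# Hypothetical entries: "variant" changes only the comment-begin string.
entries = {
    "base": [("cb", "/*"), ("ce", "*/"), ("oc", True)],
    "variant": [("cb", "#"), ("tc", "base")],
}
print(resolve(entries, "variant"))
```

Because the first value wins, a variant entry overrides a key simply by defining it before its tc reference.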

If you need a single backslash in a string, represent it like this: :id=\:. vgrind(1), and older versions of tgrind, do not permit this, because their simplistic scan assumes that backslash-colon does not terminate the string. Alternatively, since backslash is significant only before colon and newline in id and ni strings, you could also write :id=\a:, since `a' is already in the identifier character set.

Let's dissect a typical entry to see how this works:

modula2|mod2|m2:\
        :pb=(^\d?(procedure|function|module)\d\p\d|\(|;|\:):\
        :bb=\d(begin|case|for|if|loop|record|repeat|while|with)\d:\
        :be=\dend|;:\
        :cb={:\
        :ce=}:\
        :ab=\(*:\
        :ae=*\):\
        :sb=":\
        :se=":\
        :oc:\
        :kw=and array begin by case const definition div \
        do else elsif end exit export for from if \
        implementation import in loop mod module not of \
        or pointer procedure qualified record repeat \
        return set then to type until var while with:

Each line after the first conventionally begins with a tab, although this is not required, and if the next character is a colon, a key name follows. Terminal backslashes indicate line continuation.

Multiple key=value pairs can be given on one line, as long as they are separated by colons, so at the loss of readability, we could compact seven lines of this entry into just one, like this:

        :cb={:ce=}:ab=\(*:ae=*\):sb=":se=":oc:\

The first line in our sample entry says that this language may be named modula2, mod2, or m2 in the tgrind -l option.

The pb line says that a procedure definition begins a line with optional whitespace, followed by one of the keywords procedure, function, or module, followed by optional whitespace, followed by an alphanumeric procedure name. That name in turn may be followed by whitespace, an open parenthesis, a semicolon, or a colon, thanks to the tight binding of alternation. It would have been clearer to include grouping parentheses, writing `(\d|\(|;|\:)'.

The bb line says that a statement block starts with optional whitespace, and one of the keywords begin ... with, and the be line says a statement block ends with optional whitespace, followed by either the keyword end, or a semicolon. The cb and ce lines say that comments are delimited by braces, and the ab and ae lines say that comments may also be delimited by (* *). The sb and se lines say that strings are delimited by quotation marks, and the oc flag says that letter case is not significant in names (this seems to be in error: Niklaus Wirth's Programming in Modula-2, Springer-Verlag (1983), ISBN 0-387-12206-0, says that upper and lower case letters are distinct). Finally, the kw lines enumerate all of the Modula-2 language keywords, from and to with.
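The layout of an entry (names separated by bars, colon-delimited keys, backslash continuation lines, and \: for a literal colon) can be made concrete with a short parser. This is a minimal sketch in Python under stated assumptions, not the parser used by tfontedpr: the function name parse_entry is hypothetical, and the sketch does not handle a doubled backslash immediately before a colon.

```python
import re


def parse_entry(text):
    """Split one vgrindefs entry into its names and a dict of capabilities."""
    # A backslash at end of line continues the entry: drop it, the newline,
    # and the indentation that begins the next line.
    joined = re.sub(r"\\\n[ \t]*", "", text.strip())
    # Split on colons that are not quoted by a preceding backslash.
    fields = re.split(r"(?<!\\):", joined)
    names = fields[0].split("|")
    caps = {}
    for field in fields[1:]:
        if not field.strip():
            continue                          # empty field between "::"
        key, eq, value = field.partition("=")
        caps[key] = value if eq else True     # no "=value" means a Boolean flag
    return names, caps


# Part of the Modula-2 entry shown above, abbreviated for the example.
sample = r"""modula2|mod2|m2:\
        :cb={:\
        :ce=}:\
        :ab=\(*:\
        :oc:\
        :kw=and array begin \
        by case:
"""
names, caps = parse_entry(sample)
print(names)
print(caps["cb"], caps["ce"], caps["oc"])
print(caps["kw"].split())
```

The keyword list comes back as a single string, which is then split on whitespace, matching the description of kw as a space-separated list.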


ADDING SUPPORT FOR NEW FONTS

Any -fn font name specified on the tgrind command line is written directly to the first line of the output TeX file in the form
    \def \FontName {NewCenturySchoolbook}
followed by a line
    \input tgrindmac
The interpretation of the font name is handled entirely in the TeX file, tgrindmac.tex. For this example, a line in that file says
    \ifstreq{\FontName}{NewCenturySchoolbook} \def \FontName {pnc} \fi
This replaces the definition of \FontName with pnc. A few lines later, we find
    \ifstreq{\FontName}{pnc} \setfonts pnc r ri b. \fi
When \FontName has the value pnc, \setfonts is executed with four arguments: the basename of the virtual font, and the suffixes to be added to it to name upright, italic, and bold fonts.

Thus, TeX will expect to find in its TEXFONTS search path the TeX font metric files pncr.tfm, pncri.tfm, and pncb.tfm, and the DVI driver will expect to find in the same search path the virtual font files pncr.vf, pncri.vf, and pncb.vf.

Besides these files, \setfonts will generate references to fonts pcrro and psyr for typewriter text and special symbols, so TeX will also need pcrro.tfm and psyr.tfm, and the DVI driver will need pcrro.vf and psyr.vf.
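As a quick check before adding a new font family, the file names implied by a \setfonts line can be listed mechanically. The following Python sketch only restates the naming rule described above; the function name expected_font_files is hypothetical.

```python
def expected_font_files(basename, suffixes):
    """List the .tfm files TeX needs and the .vf files the DVI driver needs."""
    # The family fonts, plus the two extra fonts that \setfonts references
    # for typewriter text (pcrro) and special symbols (psyr).
    fonts = [basename + suffix for suffix in suffixes] + ["pcrro", "psyr"]
    tfm = [font + ".tfm" for font in fonts]
    vf = [font + ".vf" for font in fonts]
    return tfm, vf


tfm, vf = expected_font_files("pnc", ["r", "ri", "b"])
print(tfm)  # ['pncr.tfm', 'pncri.tfm', 'pncb.tfm', 'pcrro.tfm', 'psyr.tfm']
print(vf)   # ['pncr.vf', 'pncri.vf', 'pncb.vf', 'pcrro.vf', 'psyr.vf']
```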

If all of the referenced fonts exist on the system, TeX and the DVI driver will handle the rest of the job automatically, and if you added two lines similar to the ones above to a private copy of tgrindmac.tex to define a new font family, you can stop reading this section now.

However, here's what goes on behind the scenes. The virtual font files contain references to the so-called `raw' TeX font metric files, prefixed by a letter `r', in this case, rpcrro.tfm and rpsyr.tfm. The correspondence between these raw font metric files and the actual long PostScript font names, such as Courier-Oblique and Symbol, is made in the psfonts.map file, with lines like these:

	rpcrro	Courier-Oblique
	rpsyr	Symbol
	...
	rptmro	Times-Roman ".167 SlantFont"
	...
	putb0	Utopia-Bold <putb0.pfb
	putbo0	Utopia-Bold ".167 SlantFont " <putb0.pfb
The first two simply identify the mapping between a file name and a font name. The third additionally specifies that the Times-Roman font is to be slanted to the right by one-sixth, to synthesize an oblique Times-Roman. The fourth tells the DVI driver that the font definition must be downloaded from the putb0.pfb Type 1 PostScript binary font file, and the fifth specifies both a slant and a source file to be downloaded.
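The three kinds of fields in these lines (names, a quoted transformation, and a file to download) can be pulled apart as in the following Python sketch. It covers only the forms shown in the sample lines; real DVI drivers may accept further syntax, and the function name parse_map_line and the dictionary keys are hypothetical.

```python
import re


def parse_map_line(line):
    """Split one psfonts.map line of the forms shown above into its parts."""
    # A token is either a double-quoted string or a run of non-blank characters.
    tokens = re.findall(r'"[^"]*"|\S+', line)
    entry = {"tfm": tokens[0], "psname": tokens[1],
             "special": None, "download": None}
    for token in tokens[2:]:
        if token.startswith('"'):
            entry["special"] = token.strip('"').strip()   # e.g. ".167 SlantFont"
        elif token.startswith("<"):
            entry["download"] = token[1:]                 # font file to include
    return entry


print(parse_map_line('rpsyr\tSymbol'))
print(parse_map_line('putbo0\tUtopia-Bold ".167 SlantFont " <putb0.pfb'))
```

A font with no download field is assumed to be printer resident, which is why Charter and Utopia need the extra file reference.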

FILES

/usr/local/lib/tex/inputs/doublecol.tex    double-column plain TeX macro package
/usr/local/lib/tex/inputs/psfonts.map      DVI driver PostScript font mapping file
/usr/local/lib/tex/inputs/tgrindex.awk     indexing program
/usr/local/lib/tex/inputs/tgrindex.tex     indexing macro package
/usr/local/lib/tex/inputs/tgrindmac.tex    tgrind macro package
/usr/local/bin/tfontedpr                   tgrind preprocessor program
/usr/local/lib/tex/inputs/vgrindefs        language descriptions

AUTHOR

Van Jacobson, Lawrence Berkeley Laboratory (based on vgrind(1) by Dave Presotto and William Joy of UC Berkeley).

Extensions for PostScript fonts, procedure indexing, space after the -l option, the -o option, the ic, id, nc, and ni keywords, language support for Ada, awk, bash, BibTeX, ANSI/ISO Standard C, C++, CAML, Elisp, Fortran, ksh, LaTeX, Maple, Matlab, MLisp, Miranda, Objective C, PostScript, Russell, Sftran3, and tcsh, plus major revisions of documentation and source distribution, by

Nelson H. F. Beebe
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
USA
E-mail: <beebe@math.utah.edu>.

AVAILABILITY

tgrind is freely distributable. You should be able to find the most recent version in the Comprehensive TeX Archive Network (CTAN) collections; for a list of CTAN hosts, do
finger ctan@pip.shsu.edu

SEE ALSO

awk(1), bash(1), bibclean(1), bibtex(1), cb(1), csh(1), dpsexec(1), ex(1), gawk(1), gs(1), indent(1), ksh(1), lex(1), matlab(1), maple(1), mft(1), nawk(1), pageview(1), pindent(1), postscript(1), pretty(1), printcap(4), sf3pretty(1), sh(1), tcsh(1), termcap(4), tex(1), vgrind(1), vgrindefs(5), xmaple(1), xsf3(1).