BIBCLEAN 1 "09 May 1998" "Version 2.11.4" [section 7 of 12]



LEXICAL ANALYSIS

When -no-prettyprint is specified, bibclean acts as a lexical analyzer instead of a prettyprinter, producing output in lines of the form

<token-number><tab><token-name><tab>"<token-value>"

Each output line contains a single complete token, identified by a small integer number for use by a computer program, a token type name for human readers, and a string value in quotes.

Special characters in the token value string are represented with ANSI/ISO Standard C escape sequences, so all characters other than NUL are representable, and multi-line values can be represented in a single line.
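The following minimal Python sketch shows how a program might consume this format. It splits one output line into its three fields and decodes the C escape sequences in the value. It assumes the fields are separated by single tab characters and that the escapes are limited to the standard C forms: single-character escapes, octal \ooo, and hexadecimal \xhh. The names Token, c_unescape, and parse_token_line are hypothetical and are not part of bibclean.

```python
import re
from typing import NamedTuple

# Standard C single-character escapes; assumed to cover what bibclean emits.
_SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
    "t": "\t", "v": "\v", "\\": "\\", "'": "'", '"': '"', "?": "?",
}

# A backslash followed by a hex escape, an octal escape of 1-3 digits,
# or any other single character.
_ESCAPE_RE = re.compile(r"\\(x[0-9A-Fa-f]+|[0-7]{1,3}|.)", re.DOTALL)


class Token(NamedTuple):
    number: int
    name: str
    value: str


def c_unescape(text: str) -> str:
    """Decode ANSI/ISO C escape sequences in text."""
    def replace(match: re.Match) -> str:
        esc = match.group(1)
        if esc.startswith("x") and len(esc) > 1:
            return chr(int(esc[1:], 16))
        if esc[0] in "01234567":
            return chr(int(esc, 8))
        # Unrecognized escapes fall back to the escaped character itself.
        return _SIMPLE_ESCAPES.get(esc, esc)
    return _ESCAPE_RE.sub(replace, text)


def parse_token_line(line: str) -> Token:
    """Split a <number><tab><name><tab>"<value>" line into a Token."""
    number, name, quoted = line.rstrip("\r\n").split("\t", 2)
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError(f"token value is not quoted: {quoted!r}")
    # int() tolerates surrounding spaces in case the number is padded.
    return Token(int(number), name.strip(), c_unescape(quoted[1:-1]))
```

For a line consisting of 10, a tab, KEY, a tab, and "Knuth:TB84" (a made-up key), parse_token_line returns Token(number=10, name='KEY', value='Knuth:TB84'). A multi-line value arrives with \n escapes and comes back with real newlines.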

Here are the token numbers and token type names that can appear in the output when -no-prettyprint is specified:

 0   UNKNOWN
 1   ABBREV
 2   AT
 3   COMMA
 4   COMMENT
 5   ENTRY
 6   EQUALS
 7   FIELD
 8   INCLUDE
 9   INLINE
10   KEY
11   LBRACE
12   LITERAL
13   NEWLINE
14   PREAMBLE
15   RBRACE
16   SHARP
17   SPACE
18   STRING
19   VALUE
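A consumer may find it convenient to mirror this table in its own code. The Python sketch below does so with an IntEnum, assuming the numbering is exactly as listed for this version of bibclean. TokenType and check_token are illustrative names, not part of bibclean. Checking each token number against its name catches a mismatch between the consuming program and the bibclean version that produced the stream.

```python
from enum import IntEnum


class TokenType(IntEnum):
    """Token numbers and names from the -no-prettyprint token table."""
    UNKNOWN = 0
    ABBREV = 1
    AT = 2
    COMMA = 3
    COMMENT = 4
    ENTRY = 5
    EQUALS = 6
    FIELD = 7
    INCLUDE = 8
    INLINE = 9
    KEY = 10
    LBRACE = 11
    LITERAL = 12
    NEWLINE = 13
    PREAMBLE = 14
    RBRACE = 15
    SHARP = 16
    SPACE = 17
    STRING = 18
    VALUE = 19


def check_token(number: int, name: str) -> TokenType:
    """Return the TokenType for number, raising ValueError if name disagrees."""
    token_type = TokenType(number)  # raises ValueError for unlisted numbers
    if token_type.name != name:
        raise ValueError(f"token {number} should be {token_type.name}, not {name}")
    return token_type
```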

Programs that parse such output should also be prepared for lines beginning with the warning prefix, %%, or the error prefix, ??, and for ANSI/ISO Standard C line number directives of the form

# line 273 "texbook1.bib"
which record the line number and file name of the current input file.
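A program reading the stream therefore has to sort each line into one of these kinds before treating it as a token. The minimal Python sketch below does that. The function name classify_line and its returned labels are hypothetical. The directive pattern assumes the # line N "file" layout shown above (also accepting #line without the space) and a file name that contains no embedded double quotes.

```python
import re
from typing import Tuple, Union

# Matches C line directives such as: # line 273 "texbook1.bib"
# (assumes the file name contains no escaped double quotes).
_LINE_DIRECTIVE = re.compile(r'^#\s*line\s+(\d+)\s+"([^"]*)"\s*$')


def classify_line(line: str) -> Tuple[str, Union[str, Tuple[int, str]]]:
    """Label one output line as 'warning', 'error', 'line', or 'token'."""
    text = line.rstrip("\r\n")
    if text.startswith("%%"):
        return ("warning", text)
    if text.startswith("??"):
        return ("error", text)
    match = _LINE_DIRECTIVE.match(text)
    if match:
        # Return (line number, file name) so the caller can track position.
        return ("line", (int(match.group(1)), match.group(2)))
    # Anything else is assumed to be a token line.
    return ("token", text)
```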

If the -max-width nnn command-line option is also specified, output lines longer than nnn characters will be wrapped at a backslash-newline pair; software that processes the lexical token stream should therefore be prepared to rejoin such wrapped lines into single lines.
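Rejoining wrapped lines can be done as the first stage of a reader. The Python generator below is a sketch that assumes every physical line ending in a trailing backslash is a continuation. That assumption is safe for token lines and line directives, which end with a double quote when complete. unwrap_lines is a hypothetical name.

```python
from typing import Iterable, Iterator


def unwrap_lines(lines: Iterable[str]) -> Iterator[str]:
    """Join physical lines that end in a backslash into single logical lines."""
    pending = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.endswith("\\"):
            # Drop the continuation backslash and keep accumulating.
            pending += line[:-1]
            continue
        yield pending + line
        pending = ""
    if pending:
        # Input ended in the middle of a wrapped line; emit what we have.
        yield pending
```

Because it accepts any iterable of strings, the generator can be applied directly to an open file object or to sys.stdin before any further token parsing.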

As an example of the use of -no-prettyprint, the UNIX command pipeline

bibclean -no-prettyprint mylib.bib | \
    awk '$2 == "KEY" {print $3}' | \
    sed -e 's/"//g' | \
    sort
will extract a sorted list of all citation keys in the file mylib.bib.

A certain amount of processing will have been done on the tokens. In particular, delimiters equivalent to braces (such as the parentheses that BibTeX also accepts around an entry body) will have been replaced by braces, and braced strings will have become quoted strings.

The LITERAL token type is used for arbitrary text that bibclean does not examine further, such as the contents of a @Preamble{...} or a @Comment{...}.

The UNKNOWN token type should never appear in the output stream. It is used internally to initialize token type variables.

