What's in a header?

The BibTeX system for support of bibliographic data bases was developed by Oren Patashnik at Stanford University, based on earlier work by Brian Reid at Carnegie Mellon University on the Scribe document formatting system [Unilogic:SDP84]. BibTeX is described in Leslie Lamport's book [Lamport:LDP85] on LaTeX. It is based on the notion that bibliographic items can be divided into distinct classes: articles, books, reports, theses, and so on.

Each class of documents has certain features in common. For example, journal articles have authors, titles, volume numbers, often issue numbers, page numbers, and dates of publication. Theses and reports would have the name of an institution attached.

The number of classes of documents is not fixed; indeed, it may change with time, or between cultures and languages. Thus, a bibliographic system must be extensible. BibTeX provides this critical feature by an implementation in a programming language that knows how to parse the general structure of a bibliographic data base entry, without particular knowledge of the classes, or attributes of classes. That information is instead encoded in a style file, which is written in a much more compact form that is specialized for its job, and is presumably easier for users to change than BibTeX itself is.

The style file can specify which attributes are required to be present in a class (e.g. a Ph.D. thesis must have an institution), and which attributes are optional (a book may or may not have an International Standard Book Number, ISBN).

Some styles may not require all attributes in a particular class, so BibTeX simply ignores attributes not required by the current style, checking them only cursorily for proper syntax.
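The division into required, optional, and ignored attributes can be sketched in a few lines of Python. This is only an illustration of the idea: the `STYLE` table, the attribute names, and the function `check_entry` are assumptions made for this example, and they do not reproduce any real BibTeX style file.

```python
# Hypothetical style table: class name -> (required attributes, optional attributes).
# The names are invented for illustration and do not come from a real style file.
STYLE = {
    "article": ({"author", "title", "journal", "year"}, {"volume", "number", "pages"}),
    "phdthesis": ({"author", "title", "institution", "year"}, {"address", "month"}),
    "book": ({"author", "title", "publisher", "year"}, {"isbn", "edition"}),
}


def check_entry(entry_class, attributes, style=STYLE):
    """Return (missing, ignored) lists of attribute names for one entry.

    missing: attributes the style requires but the entry lacks.
    ignored: attributes the entry supplies but the style does not use.
    """
    required, optional = style[entry_class]
    present = set(attributes)
    missing = sorted(required - present)
    ignored = sorted(present - required - optional)
    return missing, ignored


thesis = {"author": "A. N. Other", "title": "A Thesis", "year": "1990", "keywords": "demo"}
missing, ignored = check_entry("phdthesis", thesis)
print("missing:", missing)
print("ignored:", ignored)
```

Because the table is plain data, adding a new class of document means adding one entry to `STYLE`, with no change to the checking code; that is the kind of extensibility the text attributes to style files.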

In addition, the style file can specify how individual bibliographic entries extracted by BibTeX from data base files are to be formatted. In a typesetting application, this flexibility is important, because there are a great many bibliography formatting styles, and each journal or publisher often has rather strict (and arbitrary) rules that authors must adhere to.

How does this relate to the question of file headers?

Clearly, the notion of classes and attributes applies to all computer files as well. The class is the file type, such as a Lisp file, a Pascal code file, or a national census data file. The attributes are things like author(s), author's address, date of last modification, file name, revision history, character set name, and so on.

In many operating systems, file naming conventions have been adopted by which the name encodes information about the class to which the file belongs. For example, if the file name ends in .c, it is assumed to contain code written in the C programming language. Unfortunately, few file systems are general enough to permit the creators of computer files to encode additional header information that might be more detailed.
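As a rough illustration of this naming convention, the Python sketch below maps a file name suffix to a file class. The table `SUFFIX_TO_CLASS` and the class names in it are assumptions chosen for the example, not an established registry.

```python
import os

# Hypothetical mapping from file name suffix to file class.
SUFFIX_TO_CLASS = {
    ".c": "C-file",
    ".f": "Fortran-file",
    ".lisp": "Lisp-file",
    ".pas": "Pascal-file",
    ".tex": "TeX-file",
}


def file_class(filename):
    """Guess the class of a file from the suffix of its name alone."""
    suffix = os.path.splitext(filename)[1].lower()
    return SUFFIX_TO_CLASS.get(suffix, "unknown")


print(file_class("hello.c"))
print(file_class("README"))
```

The suffix carries only the class. Attributes such as the author or the revision history have no place in the name, which is the limitation noted above.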

Since this additional information cannot be encoded in a standard way in the file system, it must be supplied in some way inside the files themselves. This is not universally possible, particularly with binary files.

However, textual data tends to be much more portable between computer systems, and all reasonable programming languages and text processing systems make some provision for comments, that is, explanatory material inserted into the file that is otherwise ignored by the program that processes the file.

Such comments are generally identified by a unique start symbol, followed by the comment text, and a unique end symbol.

The start symbol is usually a particular special character, or special short character sequence, not otherwise required in the language in which the file is encoded. Sometimes the start symbol must begin in a certain column of the line, such as Fortran's C or * in column 1, or is implicitly present at a certain column (assembly languages for older computers often decreed something like "a comment starts in column 32 of the input record").

The end symbol is frequently an end-of-line condition, which need not be an actual character. This convention is simple, but limits comments to single lines. If a comment end symbol other than end-of-line is chosen, the comment body may span multiple lines. Thus, the PL/I and C programming languages delimit comments by /* and */, and Pascal by (* and *), or by paired braces. Some programming languages even permit comments to be properly nested, so that one can comment out a block of code that itself contains comments.
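The start and end symbol conventions described above can be sketched as a small Python scanner. The function `extract_comments` and its parameters are hypothetical names for this illustration. The sketch deliberately ignores string literals and column rules, so it is not a substitute for a real language lexer.

```python
def extract_comments(text, start, end, nested=False):
    """Return the bodies of comments delimited by the start and end symbols.

    Pass end="\n" for comments that run to the end of the line; in that case
    the end of the text also closes an open comment. With nested=True, a start
    symbol inside a comment opens an inner comment that needs its own end symbol.
    """
    comments = []
    depth = 0          # number of comments currently open
    body_start = 0     # index where the outermost open comment body begins
    i = 0
    while i < len(text):
        if text.startswith(start, i) and (depth == 0 or nested):
            if depth == 0:
                body_start = i + len(start)
            depth += 1
            i += len(start)
        elif depth > 0 and text.startswith(end, i):
            depth -= 1
            if depth == 0:
                comments.append(text[body_start:i])
            i += len(end)
        else:
            i += 1
    if depth > 0 and end == "\n":
        # End of input acts as the end-of-line condition.
        comments.append(text[body_start:])
    return comments


print(extract_comments("a = 1; /* one */ b = 2; /* two */", "/*", "*/"))
print(extract_comments("x = 1 ! set x\ny = 2", "!", "\n"))
print(extract_comments("(* outer (* inner *) tail *)", "(*", "*)", nested=True))
```

With `nested=False`, the last call would stop at the first `*)` and return only ` outer (* inner `, which shows why commenting out code that already contains comments needs proper nesting.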

Ideally, a comment syntax should be simple, yet permit any processor-representable characters to appear in the comment text, so as not to hinder freedom of expression.

In any event, with most programming languages, we should be able to encode file header information as comments in such a way that expression is not restricted, yet both humans and suitable computer programs can recognize the presence of the file header.
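To make this concrete, the following Python sketch writes a set of attributes as comment lines and reads them back. The layout used here (the `BEGIN-HEADER` and `END-HEADER` marker lines and the `name = value` form) is invented purely for illustration and is an assumption of this example; it is not the header format defined elsewhere in this document.

```python
def make_header(attributes, comment_prefix):
    """Render attributes as comment lines between two hypothetical marker lines."""
    lines = [f"{comment_prefix} BEGIN-HEADER"]
    lines += [f"{comment_prefix} {name} = {value}" for name, value in attributes.items()]
    lines.append(f"{comment_prefix} END-HEADER")
    return "\n".join(lines) + "\n"


def read_header(text, comment_prefix):
    """Recover the attributes written by make_header from the text of a file."""
    attributes = {}
    inside = False
    for line in text.splitlines():
        if not line.startswith(comment_prefix):
            continue  # not a comment line, so it cannot be part of the header
        body = line[len(comment_prefix):].strip()
        if body == "BEGIN-HEADER":
            inside = True
        elif body == "END-HEADER":
            break
        elif inside and "=" in body:
            name, value = body.split("=", 1)
            attributes[name.strip()] = value.strip()
    return attributes


attrs = {"author": "A. N. Other", "filename": "demo.c", "date": "01 January 1990"}
source = make_header(attrs, "//") + "int main(void) { return 0; }\n"
print(source)
print(read_header(source, "//"))
```

Only the comment prefix changes from one language to another, so the same header text can be carried in any file whose language offers end-of-line comments, and both a human reader and a program can locate it by the marker lines.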

Nelson H. F. Beebe
11/29/1997