bibjoin [-check-missing] [-version] BibTeXfile(s) > outfile
bibjoin should be applied to a bibliography file only after entries have been suitably ordered so that candidates for joining appear consecutively. This can be done mostly automatically if standardized citation labels are first generated and the bibliography is then sorted by citation label, for example with bibsort(1).
Only a human reader can reliably decide when two bibliography entries are truly the same. bibjoin can help automate much of this work, but manual editing will almost certainly still be necessary. Two entries are joined only if all of these conditions are satisfied:
- identical citation labels;
- identical year;
- if a journal article entry, identical volume, and if both have page numbers, identical initial page numbers.
When two `equal' value strings are found for the same key, one of them is deleted. Otherwise, both key/value pairs are output. Manual editing will then be required to choose between them.
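The join conditions listed above can be sketched in Python. This is an illustration only, not bibjoin's actual code: the entry layout (a dict with `label`, `type`, and `fields`) and the names `first_page` and `may_join` are hypothetical, and the sketch assumes that fields are compared as exact strings and that the volume and page tests apply when both entries are articles.

```python
def first_page(pages):
    """Return the initial page number from a value such as '123--127'."""
    return pages.split("-", 1)[0].strip()


def may_join(a, b):
    """Return True if entries a and b meet the documented join conditions.

    Each entry is a hypothetical dict: {'label': ..., 'type': ..., 'fields': {...}}.
    """
    # Condition 1: identical citation labels.
    if a["label"] != b["label"]:
        return False
    fa, fb = a["fields"], b["fields"]
    # Condition 2: identical year (assumed to be an exact string comparison).
    if fa.get("year") != fb.get("year"):
        return False
    # Condition 3: for journal articles, identical volume and, if both
    # entries have page numbers, identical initial page numbers.
    if a["type"].lower() == "article" and b["type"].lower() == "article":
        if fa.get("volume") != fb.get("volume"):
            return False
        if "pages" in fa and "pages" in fb:
            if first_page(fa["pages"]) != first_page(fb["pages"]):
                return False
    return True
```

Passing this test only makes two entries candidates for joining; a human reader must still confirm that they describe the same work.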
Special handling is supplied for `pages' entries. If entries are found with identical initial page numbers, but one of them has question marks in place of the final page number, or has no final page number at all (for example, "123--127", "123--??", and "123"), then the ones with the question marks or no final page numbers will be dropped. This facilitates merging in data from library databases that do not record final page numbers.
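A minimal Python sketch of the `pages' rule follows. The function names `split_pages` and `merge_pages` are hypothetical. The sketch also assumes that incomplete values are kept when no complete value shares their initial page, a case the text above does not cover.

```python
def split_pages(pages):
    """Split 'first--last' into (first, last).

    last is None when the final page is missing or consists only of '?'.
    """
    first, _, last = pages.partition("-")
    last = last.strip("- ")
    if not last or set(last) == {"?"}:
        last = None
    return first.strip(), last


def merge_pages(values):
    """Drop incomplete values when a complete one has the same initial page."""
    complete_firsts = set()
    for value in values:
        first, last = split_pages(value)
        if last is not None:
            complete_firsts.add(first)
    kept = []
    for value in values:
        first, last = split_pages(value)
        if last is None and first in complete_firsts:
            continue  # a more complete value with this first page exists
        kept.append(value)
    return kept
```

With the three values from the example above, only "123--127" survives.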
Value strings are considered equal if they match after all non-alphanumeric characters are removed, and letter case is ignored. This choice helps to eliminate many match failures that arise from minor variations in punctuation, spacing, and capitalization. bibjoin has no way of determining which of the two strings should be preserved, so it uniformly discards the shorter one (which presumably has less `information'): this choice will frequently be wrong!
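The equality test and the keep-the-longer rule can be illustrated with a short Python sketch. The names `normalize`, `values_equal`, and `join_values` are hypothetical. Two details are assumptions rather than documented behavior: `str.isalnum` also accepts non-ASCII letters and digits, and on a length tie the first value is kept.

```python
def normalize(value):
    """Remove all non-alphanumeric characters and ignore letter case."""
    return "".join(ch for ch in value if ch.isalnum()).lower()


def values_equal(a, b):
    """Return True if two value strings match after normalization."""
    return normalize(a) == normalize(b)


def join_values(a, b):
    """Return the value strings to output for one key.

    Equal values collapse to the longer original string (the shorter one
    is discarded); unequal values are both kept for later manual editing.
    """
    if values_equal(a, b):
        return [a if len(a) >= len(b) else b]
    return [a, b]
```

For example, `join_values("Comm. ACM", "Comm ACM")` keeps only the longer "Comm. ACM", while two genuinely different titles are both returned. Keeping the longer string is a guess about which one carries more information, so the result still needs review.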
Syntax errors in the input stream will cause abrupt termination with a fatal error message and a non-zero exit code. The output will be incomplete, so you should always examine the output file before assuming that you can replace the input file with the output file.
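One way to guard against replacing the input with incomplete output is to check the exit code before doing anything else. The Python wrapper below is a sketch: `run_bibjoin` and its `command` parameter are hypothetical names, and it assumes only the invocation form shown in the synopsis (bibjoin BibTeXfile > outfile).

```python
import subprocess


def run_bibjoin(bibfile, outfile, command=("bibjoin",)):
    """Run `bibjoin bibfile > outfile`; return True only on exit code zero.

    The output file is left in place either way so it can be examined.
    """
    with open(outfile, "w") as out:
        completed = subprocess.run([*command, bibfile], stdout=out)
    return completed.returncode == 0
```

A caller would replace the original bibliography only when this returns True, and only after the output file has also been inspected by hand.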
Key/value pairs in output entries are sorted alphabetically by key name, so that duplicate keys arising from the join operation appear consecutively, simplifying the subsequent manual editing task.
After completion of manual corrections, it is recommended that the bibliography be processed by biborder(1) to standardize key/value order, and to check for any remaining duplicate keys or citation labels.
To avoid confusion with options, if a filename begins with a hyphen, it must be disguised by a leading absolute or relative directory path, e.g. /tmp/-foo.bib or ./-foo.bib.
OPTvolume = "??",
The OPT prefix ensures that the key is ignored by BibTeX, so that the question marks will not appear in an output .bbl file. The GNU Emacs bibtex-mode editing support has functions for removing the OPT prefixes, and so does bibclean(1).
The doubled question marks are distinguished from single ones that might legitimately appear in value strings, and also serve as a convenient regular-expression pattern for bibextract(1), allowing easy preparation of a printed listing of just those entries that have incomplete bibliographic data:
bibextract '' '[?][?]' BibTeXfiles | lpr
Nelson H. F. Beebe, Ph.D.
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
Tel: +1 801 581 5254
FAX: +1 801 581 4148
Email: <firstname.lastname@example.org>
WWW URL: http://www.math.utah.edu/~beebe