Department of Mathematics - University of Utah

BibTeX FAQ

Last updates: Tue Dec 15 10:24:43 2015 … Thu Mar 23 12:36:33 2017

Table of contents

  1. What is BibTeX?
  2. How do I use BibTeX?
  3. Where is BibTeX defined?
  4. What BibTeX archives are available?
  5. How do I create a BibTeX entry by hand?
  6. How do I add comments to a BibTeX file?
  7. How do I find a BibTeX entry on our computers?
  8. How do I find a BibTeX entry on the Internet?
  9. What are BibTeX document types?
  10. What are BibTeX string abbreviations?
  11. What is a BibTeX citation label?
  12. Which BibTeX field values are mandatory?
  13. How do I represent special characters and mathematics in BibTeX?
  14. How do I check a BibTeX entry or file?
  15. How do I parse a BibTeX entry or file?
  16. How do I prettyprint a BibTeX entry or file?
  17. How do I standardize field order in a BibTeX entry or file?
  18. How do I sort entries in a BibTeX file?
  19. How do I extract entries from a BibTeX file?
  20. How do I join entries from a BibTeX file?
  21. How do I get BibTeX entries into an SQL database?
  22. How do I search for BibTeX entries in an SQL database?
  23. How do I handle missing data in a BibTeX entry?
  24. What is a DOI, and why do I want it?
  25. What is an ISBN, and why do I want it?
  26. What are CODENs and ISSNs, and why do I want them?
  27. What are MRnumbers and ZMnumbers, and why do I want them?
  28. How do I get a bibliography in each book chapter?
  29. How do I index cited authors and editors?
  30. How do I convert library catalog data to BibTeX entries?
  31. What other bibliography tools can I use?
  32. Why should I use quotes instead of braces around BibTeX field values?
  33. How do I handle unusual personal names and titles in BibTeX field values?
  34. How do I know which Jim Smith is the author?
  35. How do I use braces in BibTeX fields?
  36. How do I handle lists of URLs?
  37. Can I contribute to our local BibTeX archives?

Questions and answers

  1.   What is BibTeX?

    BibTeX is an easy-to-use system for markup of bibliographic data. Each BibTeX entry supplies data that describe a particular published document, such as a book, conference paper, journal article, manual, dissertation or thesis, Web site, and so on.

    Within each BibTeX entry, there are named fields (author, title, year, ...) with assigned values, and any literate human with some familiarity with the English language can readily understand a BibTeX entry without having to consult program documentation.

    BibTeX entries are written in plain text, so they can be cut and pasted in window systems, e-mailed to correspondents, processed by almost any programming language that can handle textual data, produced by Web search engines, and so on. Plain ASCII, with just 95 printable characters, suffices to write any BibTeX entry in any of the many languages that use the Latin alphabet, possibly extended with accented and other special characters.

    In scholarly writing, publications tend to be cited multiple times, and often in multiple formats that differ from one another in abbreviations, field order, fonts, and punctuation. With multiple uses and output formats, it is sensible to create the data just once with suitable markup that identifies the various fields. A computer program can then handle the tedious work of finding the cited references, sorting them in a requested order, extracting the various fields (author, title, year, and so on), formatting them according to a particular style, and outputting them in a list whose entries can be automatically named or numbered. A document-formatting system, such as TeX and LaTeX, can then use that data to typeset a reference list in a document.

    Properly marked-up bibliographic data are reusable, shareable, and operating-system and computer-platform independent. It is therefore of high value to do a good job of the markup, and share the data with others. If the data have been carefully prepared, there is no reason for anyone else ever to have to do that job again for the same document.

    BibTeX has proved to be an outstanding system for preparation of bibliographic data, because it is well documented, highly portable, and importantly, free of charge. It runs on all modern computer platforms, from microcomputers to supercomputers, and even on mobile devices such as laptops, tablets, and watches.

    BibTeX is also exceedingly reliable: there have never been any show-stopping bugs in over 30 years of use on millions of computers, and there are only a handful of known design limitations that eventually must be addressed and properly handled. Most BibTeX users will never encounter such problems.

  2.   How do I use BibTeX?

    Although carefully written BibTeX entries can be used both with plain TeX and with LaTeX, most authors today use the latter, because of its higher-level markup, and its huge repertoire of support packages. Henceforth, we describe BibTeX use primarily with LaTeX.

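    The author of a LaTeX document needs to do only two or three things to cite BibTeX entries: cite each reference by its label in a citation macro such as \cite{...}, name the BibTeX files in a \bibliography{...} command, and select a reference-list style with a \bibliographystyle{...} command.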

    When (La)TeX processes your document, the various citation macros record in an auxiliary file with extension .aux each citation label in their arguments, and in addition, once the data have been processed by BibTeX, format the reference at the point of citation. On the first run, when the data are still unknown, they just output a question mark for each citation label.

    A subsequent run of BibTeX collects from the auxiliary file all of the cited labels, all of the names of the BibTeX files from the \bibliography{...} command, and the single BibTeX style name from the \bibliographystyle{...} command. It then collects all of the citation data, formats it in the requested style, and writes its results into another file with extension .bbl. (La)TeX tries to read that file when it processes the \bibliography{...} command, but on the first pass, it does not yet exist, so (La)TeX just issues a harmless warning message, and continues typesetting.

    At no time does BibTeX ever have to read, and process, any of your (La)TeX files: it just reads the files with extensions .aux and .bib, and outputs the .bbl file to be typeset, and a log file with extension .blg. If any warnings are produced, they are also written on the standard output (usually your terminal window); otherwise, BibTeX is silent.

    The separation of BibTeX from (La)TeX files is important: nothing that you do in the latter, no matter how complex they might be, can ever confuse BibTeX, because it never sees those files at all.

    It should now be clear from that description that multiple programs must be run to produce your final typeset document. In the simplest case, three steps are needed:

    % latex  myfile.ltx
    % bibtex myfile
    % latex  myfile.ltx
    

    However, it is possible for a BibTeX entry to itself contain citation commands, and those will produce more auxiliary file entries on the second (La)TeX run, requiring yet another pair of BibTeX and (La)TeX commands to reach consistency. In complex bibliographies, even more passes might be needed, such as when one entry cites a second, that cites a third, that cites a fourth, and so on. You can tell that it is time to stop running command pairs when there are no more warnings on standard output about missing entries.

    Because most authors run (La)TeX many times during the preparation of a document, the extra runs are of no importance: once the citations and their bibliographic data have stabilized, no further BibTeX runs are needed, and only a single (La)TeX run is needed to typeset your document with its latest changes.

    For complex documents where it is important to ensure consistency, you should record the commands to do so in a shell script that you can execute as just one command. Even better, put those commands in a Unix Makefile, and let the wonderful make utility figure out from your file dependency declarations and file timestamps what commands need to be run, and when it is time to stop. For more on that program, see a FAQ entry elsewhere in this document.

  3.   Where is BibTeX defined?

    BibTeX was first described as a literate program, bibtex.web, containing both documentation and code. It was supplemented by two documents, BibTeXing (btxdoc.pdf) and Designing BibTeX Styles (btxhak.pdf), that describe how to use it, and how to modify existing reference-list styles and create new ones. Those files can all be found online in the Comprehensive TeX Archive Network (CTAN) or in any of the annual TeX Live software releases.

    BibTeX was then briefly introduced in Appendix B of the first (1985) and second (1994) editions of the LaTeX User's Guide and Reference Manual, and in more detail in a full chapter of each of the first (1994) and second (2004) editions of The LaTeX Companion.

    Those informal descriptions lack the grammatical rigor that is needed for writing robust software for processing BibTeX files by other tools. That defect was remedied in 1993 by the presentation of a formal markup language grammar for BibTeX in Bibliography prettyprinting and syntax checking and the development of special highly portable programs for lexing, unlexing, parsing, prettyprinting, and syntax checking of BibTeX data. Since that time, hundreds of additional tools for dealing with BibTeX data have been created in our department, and the software corpus now amounts to hundreds of programs and hundreds of thousands of lines of code. No other bibliographic markup system enjoys such a richness of software tools, some of which are described in a 2004 paper A Bibliographer's Toolbox, and in a 2009 paper BibTeX meets relational databases that shows how to turn clean BibTeX files into Structured Query Language (SQL) databases that allow powerful searching and selection.

  4.   What BibTeX archives are available?

    At our site, there are two large bibliographic collections:

    As of Fall 2015, they contained almost 1,200,000 entries. Both collections are freely available, and the first paragraphs of their Web pages point to later sections on how to automatically mirror the collections.

    The first is primarily author-specific bibliographies in numerical analysis and quantum mechanics, with a few subject-specific bibliographies.

    The second, and much larger, archive contains more than 800 bibliographies of journals in computational mathematics, computational physics, computer science, databases, fonts and typography, mathematical physics, numerical analysis, probability and statistics, pure mathematics, and quantum physics. There are also dozens of bibliographies on specific subjects, such as computer arithmetic, cryptography, elementary and special functions, programming languages, and American, European, International, and Internet Standards related to computing.

    Several large publishers allow searching and downloading of bibliographic data in BibTeX (and other) forms. They include:

    Most scientific and professional societies have similar services, such as these:

    Some of those sources make citation download options easy to find, while others make you work harder. Some will only return one BibTeX entry at a time, while others can return search results with several BibTeX entries.

    In computer science, there are two large archives, each with millions of entries:

    There are two useful online services for generating BibTeX entries: Mref and doi2bib.

    Mref lets you paste in a formatted reference, or BibTeX entry, and get back a BibTeX entry from the MathSciNet database. Although the latter is a licensed database, Mref is free. It relies on the data in your input being in MathSciNet; if you enter data for, say, a paper in psychology, you will not get a BibTeX entry for it.

    doi2bib takes a Digital Object Identifier (DOI), and tries to return a BibTeX entry for that publication. DOIs are discussed elsewhere in this document.

    We have developed numerous software tools for mining many important publisher databases. If you need to do extensive searching of the literature, such as to prepare a large bibliography of a particular author, or on a specific subject, please consult our computing staff.

  5.   How do I create a BibTeX entry by hand?

    The Emacs text editor has lots of support for dealing with BibTeX files, especially because of the many additional libraries developed at Utah for that purpose. Templates for entries can be created by two keystrokes.

    If you use a different text editor, the best approach seems to be to keep a small file of templates for each of the 13 BibTeX document types, and then use copy-and-paste to get them into your editor.

    Once an empty template is in your editor, your job is then just to fill in the empty strings in the field assignments, and create a suitable citation label.
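
    As an illustration of such a template, here is a minimal sketch for the Article type. The citation label is a hypothetical placeholder in the author:year:abbreviation pattern used by the examples in this document. The author, title, journal, and year fields are the ones that the standard BibTeX styles require for an article; the others are optional and can be deleted when they do not apply.

    ```bibtex
    % Empty template for a journal article: fill in the empty strings,
    % and replace the placeholder citation label with a real one.
    % Comments are kept outside the entry, because BibTeX does not
    % accept them between field assignments.

    @Article{Author:YYYY:ABC,
      author =       "",
      title =        "",
      journal =      "",
      volume =       "",
      number =       "",
      pages =        "",
      year =         "",
    }
    ```

    Keeping the fields in the same order in every template makes entries easier to compare by eye, and the optional final comma lets you add or reorder assignments without having to repair punctuation.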

    This document's author also makes extensive use of keyboard function keys to run commonly needed commands in BibTeX work, and he has dozens of common text fragments attached to Emacs registers that greatly reduce the amount of typing needed to create new BibTeX entries, and maintain old ones. Most of the Emacs extensions for BibTeX and LaTeX are available at their online site.

  6.   How do I add comments to a BibTeX file?

    The original, and still only, implementation of BibTeX did not address the issue of comments, yet three decades of use of BibTeX markup shows that it is essential to be able to include documentation for collections of BibTeX entries. BibTeX's scanner looks for patterns of the form @Identifier{...}, and ignores everything else. Inside the entry, braces and quotation marks must be properly balanced, and field assignments must be separated by commas. The last assignment can optionally be followed by a comma before the right brace that terminates the BibTeX entry. Thus, almost anything outside a BibTeX entry is a `comment', but it cannot contain an at-sign! Such a character would trigger the scan for a valid BibTeX entry.

    The 1993 formal grammar for BibTeX remedies that deficiency, and defines a rigorous comment syntax that follows the TeX convention that anything from percent to end of line, and any following leading whitespace on the next line, is a comment. The Utah bibliography archives are all formatted similarly, and each begins with a large comment block that describes its contents: see, for example, the bibliography of the discoverer of the quantum, Max Planck.
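
    The comment conventions described above can be sketched with a small file. The entry is invented for illustration: the author, title, publisher, and citation label are all hypothetical. Each comment line begins with a percent sign, and none contains an at-sign, so the file should be acceptable both to tools that follow the 1993 grammar and to the original BibTeX scanner, which simply skips text outside entries.

    ```bibtex
    % Sample bibliography file with a leading comment block.
    % This block describes the contents of the file; it must not
    % contain an at-sign, because that character would start the
    % scan for a BibTeX entry.

    @Book{Doe:2001:HBT,
      author =       "Jane Doe",
      title =        "A Hypothetical Book Title",
      publisher =    "Example Press",
      address =      "Salt Lake City, UT, USA",
      year =         "2001",
    }

    % Comments may also appear between entries, but not inside them.
    ```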

  7.   How do I find a BibTeX entry on our computers?

    There are three main tools for searching the Utah bibliography archives on our local machines.

    The bibsearch utility lets you enter search strings, optionally with Boolean operators. It does full-text searches on entire BibTeX entries, each of which it considers a `document', and returns its results with the most likely candidates at the beginning. Its opening banner gives hints on how to search.

    The bibsql tool provides an interface to any of three functionally equivalent databases: MariaDB, MySQL, and PostgreSQL. Each has more than one server, so that we can ensure service even if a machine is temporarily down, and so that we can compare performance between databases on different CPUs and disk technologies. You can choose the database type on the command line (there are minor syntactic differences in search syntax between PostgreSQL and the other two), and optionally, the server hostname, though that is rarely necessary, unless a particular server is unreachable. Here are some examples:

    # Shortest forms:
    % bibsql -s ma
    % bibsql -s my
    % bibsql -s p
    
    # Longer forms (use any unique abbreviation of the server type):
    % bibsql -s mariadb
    % bibsql -s mysql
    % bibsql -s postgresql
    

    See another FAQ entry in this document for examples of SQL queries.

    In addition to the two BibTeX-specialized tools, local users can always run any of the standard Unix search tools in the grep family, which also includes agrep, egrep, fgrep, rgrep, sgrep, and seegrep. All that you need to know is where the BibTeX files are stored on our systems:

    % grep 'Max Planck'        /u/ftp/pub/tex/bib/*.bib
    % grep 'Werner Heisenberg' /u/ftp/pub/bibnet/*/*.bib /u/ftp/pub/bibnet/*/*/*.bib
    

    Because of the large size of the archives, you are likely to get a lot of output, so you may wish to redirect it to a file for later examination in a text editor, or send it to a screen pager utility:

    % grep 'Max Planck' /u/ftp/pub/tex/bib/*.bib > foo
    % grep 'Max Planck' /u/ftp/pub/tex/bib/*.bib | less
    

    Because the grep family operates on lines of text, you cannot find a multiword string if it happens to contain line breaks. That is precisely why the entry-oriented bibsearch and the field-oriented bibsql tools were developed!

    You can remove the line-breaking problem by prettyprinting the BibTeX input into a temporary file with field values on one line:

    % bibclean --quiet --max-width 0 *.bib > /tmp/tmp.bib
    % grep '...pattern...'                   /tmp/tmp.bib
    
  8.   How do I find a BibTeX entry on the Internet?

    Another FAQ entry in this document contains pointers to numerous publisher databases and other archives that can all supply search results in BibTeX form. The AMS Mref, CrossRef, and doi2bib resources discussed there are particularly useful. Some of those sources require licenses, and thus, might not be available to you off campus. In that case, Web searches for known fragments of the reference might find a BibTeX entry that someone else has created.

    If you know the journal of the article whose BibTeX entry you seek, check whether that journal is covered in the Utah archives, or use a Web search to find its publisher site: you may be able to retrieve a BibTeX entry that way.

  9.   What are BibTeX document types?

    Because centuries-old bibliographic tradition distinguishes between different kinds of documents when their data are presented in reference lists, BibTeX does too. For example, many styles set book titles in italics to emphasize them, but bizarrely, for journal articles, set the title in roman, and the less-important journal title in italics.

    BibTeX consequently identifies document types by the case-insensitive name that follows the at-sign: Article, Book, Booklet, InBook, InCollection, InProceedings, Manual, MastersThesis, Misc, PhDThesis, Proceedings, TechReport, and Unpublished.

    Because only a small page count distinguishes it from a book, the Booklet type is rarely used, as this SQL search in the Utah archives demonstrates:

    > select count(*), bibtype from bibtab
             group by bibtype
             order by bibtype;
    
    +----------+---------------+
    | count(*) | bibtype       |
    +----------+---------------+
    |  1033252 | article       |
    |    47687 | book          |
    |       65 | booklet       |
    |      115 | inbook        |
    |     5489 | incollection  |
    |    40982 | inproceedings |
    |     1910 | manual        |
    |     1417 | mastersthesis |
    |     5772 | misc          |
    |      441 | periodical    |
    |      987 | phdthesis     |
    |    17261 | proceedings   |
    |    10884 | techreport    |
    |     1827 | unpublished   |
    +----------+---------------+
    

    Similarly, the two thesis types are effectively identical. Neither defines the kind of degree: you do that yourself in the value assigned to the type field, as this sampling from the Utah archives shows:

    > select distinct bibtype, type from bibtab
             where bibtype like '%thesis%'
             order by bibtype, type;
    +---------------+---------------------------------------------------+
    | bibtype       | type                                              |
    +---------------+---------------------------------------------------+
    | mastersthesis | B.A. (Honours History)                            |
    | mastersthesis | Certificate of Postgraduate Study Dissertation    |
    | mastersthesis | Computer Science Thesis (M.S.)                    |
    | mastersthesis | Diplom Arbeit                                     |
    | mastersthesis | Hovedoppgave i datafag (Computer Science thesis)  |
    | mastersthesis | M.E.E. thesis                                     |
    | mastersthesis | Magisterarbeit                                    |
    | mastersthesis | Memoire de diplome d'ingenieur                    |
    | mastersthesis | Thesis (Doctor of Engineering)                    |
    | phdthesis     | Doktors der Naturwissenschaften (Dr. rer. nat.)   |
    | phdthesis     | Doktorsavhandlingar                               |
    | phdthesis     | Dr.-Ing.                                          |
    | phdthesis     | Ed.D. thesis                                      |
    | phdthesis     | Habilitationsschrift                              |
    | phdthesis     | Ph.D. Thesis in Mathematics                       |
    | phdthesis     | Thesis (D.Phil.)                                  |
    | phdthesis     | Thesis (Ph.D.) in geography                       |
    +---------------+---------------------------------------------------+
    

    Apart from degree and institution information, a thesis entry is much like a book, and should carry similar information. Here is an example:

    @PhdThesis{Pugh:2004:ALG,
      author =       "Glendon Ralpha Pugh",
      title =        "An Analysis of the {Lanczos} Gamma Approximation",
      type =         "{Ph.D.} thesis",
      school =       "Department of Mathematics, University of British
                     Columbia",
      address =      "Vancouver, BC, Canada",
      pages =        "viii + 154",
      year =         "2004",
      ISBN =         "0-612-99536-4",
      ISBN-13 =      "978-0-612-99536-9",
      LCCN =         "AW5 .B7 2005-995364",
      bibdate =      "Mon Nov 24 20:55:30 2008",
      bibsource =    "http://www.math.utah.edu/pub/bibnet/authors/l/lanczos-cornelius.bib",
      acknowledgement = ack-nhfb,
    }
    

    Notice that, through a regrettable historical design choice that has propagated to almost all BibTeX styles, the type field value is subject to downcasing, so the degree abbreviation needs protecting braces.

    The three InXXX types cause the most confusion, and many online sources of BibTeX data get at least two of them wrong.

    The InProceedings type is easy: it is used for a paper that appears in a conference proceedings.

    Use the InBook type when you wish to cite a particular chapter of a book where the chapters do not have identifiable separate authors. You should then supply chapter and pages field values in your entry. You can record the page information for the entire book in a bookpages field value, although only a few BibTeX styles recognize that value. All BibTeX styles silently ignore any field assignments that they are not programmed to handle, so no interstyle conflicts arise.

    Use the InCollection type to cite a portion of a book where the chapters have different authors: the covers of such books usually name the responsible individuals as editors. Those people should be given in an editor, rather than author, field assignment.

    Although the three InXXX types recognize editor, booktitle, publisher, and address fields, you are strongly urged not to use them. Instead, create a companion Book or Proceedings entry, and then use the crossref field in the publication's BibTeX entry to access that information. That way, you do not need to duplicate information when you have multiple chapters or papers in your bibliography from the same book or conference, and you can also cite those publications themselves when they each have their own BibTeX entry.
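
    The cross-referencing recommended above can be sketched as follows. All names, titles, and citation labels here are hypothetical. Two points are assumptions about standard BibTeX behavior rather than statements from this FAQ: the parent entry is placed after every entry that cross-references it, and it carries a booktitle field in addition to title so that the papers can inherit it.

    ```bibtex
    % Two conference papers that share one parent Proceedings entry.
    % Fields missing from a paper, such as booktitle and year, are
    % inherited from the entry named in its crossref field.

    @InProceedings{Doe:2001:FSP,
      author =       "Jane Doe",
      title =        "A First Sample Paper",
      crossref =     "Roe:2001:PHC",
      pages =        "1--10",
    }

    @InProceedings{Poe:2001:SSP,
      author =       "John Poe",
      title =        "A Second Sample Paper",
      crossref =     "Roe:2001:PHC",
      pages =        "11--20",
    }

    % The parent entry follows the entries that cross-reference it.

    @Proceedings{Roe:2001:PHC,
      editor =       "Richard Roe",
      title =        "Proceedings of a Hypothetical Conference",
      booktitle =    "Proceedings of a Hypothetical Conference",
      publisher =    "Example Press",
      address =      "Salt Lake City, UT, USA",
      year =         "2001",
    }
    ```

    With this arrangement, the editor, publisher, and address are recorded only once, and the proceedings volume itself can be cited by its own label.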

    The Misc (miscellaneous) type was provided in the original BibTeX styles as a catchall for documents that don't seem to fit any of the standard types. Examples include declarations, laws, manifestos, media productions (audio and video broadcasts and films), medieval manuscripts, musical scores, patents, treaties, Web sites, and so on. The only mandatory field in a Misc entry is the title field, but you do your readers a disservice if you fail to include sufficient extra information for them to locate the reference. Here are some examples:

    @Misc{Torkoly:1974:LK,
      author =       "T{\"o}rk{\"o}ly{ }Anna",
      title =        "{L{\'a}nczos Korn{\'e}l}",
      howpublished = "MTV Televideo Publishers, Budapest, Hungary",
      year =         "1974",
      bibdate =      "Tue Jun 14 15:51:14 2011",
      bibsource =    "http://www.math.utah.edu/pub/bibnet/authors/l/lanczos-cornelius.bib",
      note =         "Video film about Cornelius Lanczos. Possibly reissued
                     in 1993.",
      acknowledgement = ack-nhfb,
    }
    
    @Misc{Szilard:1926:CBL,
      author =       "Leo Szilard and Albert Einstein",
      title =        "[Correspondence between {Leo Szilard} and {Albert
                     Einstein}]",
      howpublished = "Personal and scientific correspondence between the
                     authors.",
      pages =        "132",
      year =         "1926--1945",
      bibdate =      "Thu May 21 06:36:09 2015",
      bibsource =    "http://www.math.utah.edu/pub/bibnet/authors/s/szilard-leo.bib;
                     http://www.math.utah.edu/pub/tex/bib/einstein.bib",
      URL =          "http://library.ucsd.edu/dc/object/bb24247539",
      acknowledgement = ack-nhfb,
      remark =       "The draft of the 1939 Einstein--Roosevelt letter that
                     Szilard prepared does not appear here, but there
                     is a pointer to a copy of that letter in the Harold
                     Urey papers. There are, however, copies of the final
                     version; the original is held with the FDR Papers in
                     Hyde Park, NY.",
    }
    
    @Misc{Einstein:1930:R,
      author =       "Albert Einstein and Leo Szilard",
      title =        "Refrigeration",
      howpublished = "US Patent 1,781,541.",
      pages =        "4",
      day =          "11",
      month =        nov,
      year =         "1930",
      bibdate =      "Tue Sep 13 15:09:21 2011",
      bibsource =    "http://www.math.utah.edu/pub/bibnet/authors/s/szilard-leo.bib;
                     http://www.math.utah.edu/pub/tex/bib/einstein.bib",
      note =         "Application filed December 16, 1927 (serial number
                     240,566) and in Germany, December 16, 1926. See
                     \cite{Dannen:1997:ESR,Dannen:1997:SRD} for accounts of
                     this invention.",
      URL =          "http://www.google.com/patents?id=t0BRAAAAEBAJ",
      acknowledgement = ack-nhfb,
    }
    
    @Misc{Einstein:1948:LUM,
      author =       "Albert Einstein",
      title =        "Letter on universal military training addressed to the
                     {Chairman} of the {Senate} committee",
      howpublished = "US Congress, Senate, Committee on Armed Services,
                     Hearings on Universal Military Training",
      pages =        "257--257",
      year =         "1948",
      bibdate =      "Sat Jul 17 12:42:40 2010",
      bibsource =    "http://www.math.utah.edu/pub/tex/bib/einstein.bib",
      note =         "Read 24 March 1948.",
      acknowledgement = ack-nhfb,
      Calaprice-number = "255",
    }
    
    @Misc{Obenhous:1986:EBC,
      author =       "Mark Obenhous and Chrisann Verges",
      title =        "{Einstein} on the beach --- the changing image of
                     opera",
      howpublished = "Obenhous Films, Inc. (New York)",
      day =          "31",
      month =        jan,
      year =         "1986",
      bibdate =      "Tue Jan 31 06:59:57 2012",
      bibsource =    "http://www.math.utah.edu/pub/tex/bib/einstein.bib;
                     z3950.loc.gov:7090/Voyager",
      acknowledgement = ack-nhfb,
    }
    
    @Misc{Ledger:2014:WCM,
      author =       "James Ledger",
      title =        "When {Chaplin} met {Einstein}: for mixed chamber
                     ensemble",
      howpublished = "Australian Music Centre, Grosvenor Place, NSW,
                     Australia",
      year =         "2014",
      bibdate =      "Fri Aug 21 10:12:37 2015",
      bibsource =    "http://www.math.utah.edu/pub/tex/bib/einstein.bib",
      note =         "21-page musical score.",
      acknowledgement = ack-nhfb,
      ISMN =         "9790720152752",
    }
    
    @Misc{vanderWaals:2010:EMK,
      author =       "J. D. {van der Waals, Jr.}",
      title =        "Eulogy: {Max Karl Ernst Ludwig Planck} (1858--1947)",
      howpublished = "Web site",
      day =          "24",
      month =        sep,
      year =         "2010",
      bibdate =      "Sat Aug 22 11:28:05 2015",
      bibsource =    "http://www.math.utah.edu/pub/bibnet/authors/p/planck-max.bib",
      URL =          "http://www.dwc.knaw.nl/DL/levensberichten/PE00002335.pdf",
      acknowledgement = ack-nhfb,
    }
    

    The last of the 13 standard document types is Unpublished, yet it is also perhaps the most problematic. The point of a literature reference is to allow the reader to investigate further, but if the cited work was never published, how is that possible? Such types should therefore be used sparingly, and journal editors may reject citations of them, unless you can supply additional information. Here is just one example, the first entry in Einstein's bibliography:

    @Unpublished{Einstein:1895:UAI,
      author =       "Albert Einstein",
      title =        "{{\"U}ber die Untersuchung des Aetherzustandes im
                     magnetischen Felde}. ({German}). [{On} the investigation
                     of the state of the ether in magnetic fields]",
      pages =        "5",
      year =         "1895",
      bibdate =      "Mon Sep 21 15:58:18 2009",
      bibsource =    "http://www.math.utah.edu/pub/tex/bib/einstein.bib",
      note =         "Dated between 05/01/1895 and 09/25/1895.",
      URL =          "http://www.alberteinstein.info/db/ViewDetails.do?DocumentID=34390",
      acknowledgement = ack-nhfb,
      language =     "German",
      remark =       "Mahaffey:2009:AAN (p. 51) and Gardner:1993:CMA (p. 92)
                     cite a five-page paper written by Einstein in high
                     school, perhaps about 1896, titled (in English
                     translation) ``The Investigation of the State of Ether
                     in Magnetic Fields'' or ``Concerning The Investigation
                     of Ether in Magnetic Fields''. The Einstein Archives
                     contain this record, which seems to correspond to those
                     references, demonstrating that Einstein knew about the
                     problem of the Ether, and thus, likely also the results
                     of the Michelson--Morley Experiment before his 1905
                     work on Special Relativity. However, the matter remains
                     controversial: see discussion and remarks
                     \cite{Shankland:1963:CAE,Shankland:1973:CAE,Ogawa:1979:JEE,Einstein:1982:HCT,Itagaki:1999:EKL,vanDongen:2009:RMM}.",
    }
    

    The title field exhibits a practice recommended by the American Mathematical Society: whenever a title is in a language other than that of the main corpus of entries, add a parenthesized language name, a translation into the main language, and a language field value. Although you might be familiar with that foreign language, we live in a world with thousands of human languages and widespread Internet access. It is likely that many of your readers will not be able to understand the title, or even to identify its language.

    For foreign-language titles, if you can track down translations of the document to your language, or to languages more widely known to your readers, you can point to the translation(s) in a BibTeX note field.

    The remark field value is ignored by all current BibTeX styles, so such data never appear in the output reference list. It is a good place to record extra information that might be useful to anyone who can see the BibTeX entry. Here, we provide a Web URL that some BibTeX styles recognize and include in the reference list.

    Even if you do not understand the title of a foreign language publication, you may be able to produce an acceptable translation into your own language with the help of the Google Translate or Microsoft Bing Translator services. To this author's knowledge, they are based primarily on machine learning from documents produced by competent human translators (those of the United Nations and the European Union are good examples). Thus, while they do make many mistakes, and their accuracy varies from language to language, you, as a human, can probably correct their output. Here is an example of an original title, two machine translations, and its corrected translation by someone fluent in both languages, as well as in the title's subject matter:

    % Original with TeX markup in plain ASCII (and also valid Unicode UTF-8!)
      title =        "Consid{\'e}rations th{\'e}oriques g{\'e}n{\'e}rales
                     sur la structure du noyau",
    
    % Title in HTML markup and displayed in Unicode UTF-8 encoding:
      title =        "Considérations théoriques générales
                     sur la structure du noyau",
    
    % From Google Translate:
      title =        "General theoretical considerations on the core structure",
    
    % From Bing Translator:
      title =        "Theoretical considerations on the structure of the nucleus",
    
    % Final human-corrected translation:
      title =        "General theoretical considerations of the structure of the
                     nucleus",
    

    For field values in other than extended Latin scripts, there are two choices. One is to code the value in the computer character set of that language, and the other is to supply a transliteration into Latin letters. The first is problematic, because there are hundreds of different character set encodings, and a bibliography might well require several different ones. Also, the document formatting system somehow needs to recognize the script, and have fonts to represent it: that is unlikely to be possible in general. The Unicode Standard for a unified encoding of all of the world's major written scripts should eventually provide a partial solution, but a reader unfamiliar with that script is still helpless. There are still serious font limitations in a lot of software, including (La)TeX and BibTeX, that make portable handling of non-ASCII text difficult. At present, transliteration seems the best approach, and that is the choice of some online databases. Here is an example from a BibTeX entry for a Russian-language publication:

      title =        "Optimal'naja fokusirovka teplovogo potoka neodnorodnoj
                     teploprovodjashhej sredoj (zadacha o ``termolinze'').
                     ({Russian}) [Optimum focusing of a heat flux by a
                     nonuniform heat-conducting medium (the ``heat lens''
                     problem)]",
      language =     "Russian",
      URL =          "http://journals.ioffe.ru/jtf/1988/01/page-67.html.ru",
    

    In the BibTeX entry from which that example came, the original article was found at the indicated URL, and then the useful Cyrillic-to-Latin transliteration service took the cut-and-pasted Russian title, and converted it to the Latin alphabet. The translation to English was supplied by one of the article's multilingual authors. For comparison, the Google Translate version of the original Cyrillic text is "The optimum focus thermally conductive heat flow inhomogeneous medium (the problem of ``thermal lens'')", and the Bing Translator returns "Optimum focussing heat flow non-uniform thermal environment (the ``termolinze'')".

  10.   What are BibTeX string abbreviations?

    In any large collection of data, it is likely that many fragments are repeated. Thus, most programming languages have variables that you can assign values to for later use, and some, like C and C++, have a macro processor that you can use to give convenient names to fixed values. The general rule known to experienced computer programmers for decades is that if a magic number or string is repeated in your text, it must have a name to ensure a consistent value everywhere.

    Three fields in BibTeX entries are good candidates for such repeated names, and the Utah archives make extensive use of them with input like this:

    @String{j-ANN-PHYS-1900         = "Annalen der Physik (1900)"}
    
    ...
    
    @Article{Einstein:1901:FCG,
    ...
      journal =      j-ANN-PHYS-1900,
    ...
    }
    
    @String{pub-YALE                = "Yale University Press"}
    @String{pub-YALE:adr            = "New Haven, CT, USA"}
    
    @Book{Vargish:1999:IMR,
    ...
      publisher =    pub-YALE,
      address =      pub-YALE:adr,
    }
    

    BibTeX defines short internal abbreviations for a few journals in computer science, but that was a bad idea, because there are tens of thousands of journals, and there is no good reason to single out a handful for special recognition. Instead, it is better to allow the BibTeX user to supply BibTeX @String{...} abbreviations for journal names, and publisher names and addresses.

    BibTeX also has internal three-letter abbreviations for the English names of months of the year: you should always use those abbreviations. That way, BibTeX styles for use in other languages can redefine them and display dates in the conventions of that language. Thus, input like

      day =       "12",
      month =     aug,
      year =      "1923"
    

    might be typeset as August 12, 1923, or 12 August 1923, or 12 Aug. 1923 in an English-language style, but as le 12 août 1923 or 1923.08.12 in a French style.

    There is no single standard list of abbreviations of journal names. The practice in the Utah archives is generally to find the journal's entry in the US Library of Congress catalog, where variant names and abbreviations are usually listed. The BibTeX string abbreviation name is then created from the catalog abbreviation by replacing consecutive nonalphanumerics by a single hyphen, uppercasing the text, and then prefixing it with j-. Thus, the Journal of Chemical Education is found to have an abbreviation J. Chem. Educ., and the BibTeX string abbreviation name becomes j-J-CHEM-EDUC.

    That algorithm for producing BibTeX string abbreviations is reversible: it is trivial for a human to recover an abbreviated journal name from the string name.
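    The naming rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the Utah archives: the function name journal_string_name is hypothetical, and dropping the hyphen left behind by a final period is an assumption that the rule does not spell out.

```python
import re


def journal_string_name(abbreviation):
    """Build a BibTeX @String name from a catalog journal abbreviation."""
    # Replace each run of nonalphanumeric characters by a single hyphen.
    name = re.sub(r"[^A-Za-z0-9]+", "-", abbreviation)
    # Assumption: hyphens left at either end (for example, from a final
    # period) are dropped.
    name = name.strip("-")
    # Uppercase the text and prefix it with "j-".
    return "j-" + name.upper()


print(journal_string_name("J. Chem. Educ."))   # j-J-CHEM-EDUC
```

    Going the other way, a human reader only has to drop the j- prefix and read the hyphens as word breaks.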

    In some fields, popular journals are widely known by nicknames, or initials of their names. In such cases, it may be better to base BibTeX string abbreviations on the well-known names. Here are some examples:

    @String{j-CCCUJ                 = "C/C++ Users Journal"}
    
    @String{j-IJQC                  = "International Journal of Quantum Chemistry"}
    
    @String{j-JACS                  = "Journal of the American Chemical Society"}
    
    @String{j-LOGIN                 = ";login: the USENIX Association newsletter"}
    
    @String{j-NAMS                  = "Notices of the American Mathematical
                                      Society"}
    
    @String{j-PHYSIS                = "Physis: Rivista Internazionale di Storia
                                      della Scienza"}
    
    @String{j-SIGNUM                = "ACM SIGNUM Newsletter"}
    
    @String{j-SIGPLAN               = "ACM SIG{\-}PLAN Notices"}
    
    @String{j-TOMS                  = "ACM Transactions on Mathematical Software"}
    
    @String{j-TOPLAS                = "ACM Transactions on Programming
                                      Languages and Systems"}
    

    Because abbreviated journal names are frequently ambiguous, or unrecognizable to people unfamiliar with that journal's subject area, the Utah archives generally use the full name of the journal. If a user of the BibTeX data later wishes to instead have abbreviated journal names in her reference list, she can simply add new definitions after the main ones, such as in these examples:

    @String{j-IJQC                  = "International Journal of Quantum Chemistry"}
    @String{j-FOUND-CHEM            = "Foundations of Chemistry"}
    @String{j-J-APPL-PHYS           = "Journal of Applied Physics"}
    
    @String{j-IJQC                  = "Int. J. Quant. Chem."}
    @String{j-FOUND-CHEM            = "Found. Chem."}
    @String{j-J-APPL-PHYS           = "J. Appl. Phys."}
    

    For similar reasons, the Utah archives avoid the widespread practice of abbreviating the publisher address to just a city: instead, the default string abbreviations expand to the city, region, and country. A user who prefers a shorter, but possibly more ambiguous, address, can always supply redefinitions:

    @String{pub-ADENINE-PRESS:adr   = "Guilderland, NY, USA"}
    @String{pub-ELSEVIER:adr        = "Amsterdam, The Netherlands"}
    @String{pub-INTERSCIENCE:adr    = "New York, NY, USA"}
    
    @String{pub-ADENINE-PRESS:adr   = "Guilderland, NY"}
    @String{pub-ELSEVIER:adr        = "Amsterdam"}
    @String{pub-INTERSCIENCE:adr    = "New York"}
    

    Please note that it is extremely poor practice to use concatenation of multiple string abbreviations in place of a simple text string: one sometimes encounters nonsense like this in entries found on the Web:

    @String{gen     = "General "}
    @String{theory  = "Theory "}
    @String{rel     = "Relativity "}
    @String{the     = "The "}
    ...
      title = the # gen # theory # " of " # rel,
    

    While BibTeX accepts that, the entry would not be found in a string search for General Theory of Relativity.

  11.   What is a BibTeX citation label?

    The handle that connects an inline citation in a (La)TeX document with a particular BibTeX entry in a file listed in the \bibliography{...} command is the citation label, which appears as the first word following the open brace of a BibTeX entry.

    The original documentation of BibTeX is vague about what characters may be used in a label. The 1993 grammar recommends that citation labels and field names begin with an English letter, optionally followed by letters, digits, colon, hyphen, and possibly plus and slash. In the two decades of BibTeX use since then, the plus and slash have not been needed; they are found in only 32 entries out of more than 1.16 million, and then only in old and stable bibliography files.
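    As a rough illustration of that recommendation, the following Python sketch tests whether a label begins with a letter and continues with only letters, digits, colon, hyphen, plus, or slash. The names LABEL_PATTERN and is_grammar_label are hypothetical, and restricting letters to ASCII is an assumption based on the phrase "English letter"; the sketch is not the grammar itself.

```python
import re

# Assumption: ASCII letters only; the optional plus and slash are accepted.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9:+/-]*")


def is_grammar_label(label):
    """Return True if the whole label fits the recommended character set."""
    return LABEL_PATTERN.fullmatch(label) is not None


print(is_grammar_label("Einstein:1901:FCG"))   # True
print(is_grammar_label("10.2307/1969463"))     # False: starts with a digit
```

    A label that passes this test is merely well formed; the test says nothing about whether the label is readable or unique.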

    Some BibTeX users construct labels from all author names and abbreviated years, such as in this example:

    @Article{BohrKramersSlater24,
      author =       "Niels Bohr and Henrik A. Kramers and John Clarke
                     Slater",
      title =        "{{\"U}ber die Quantentheorie der Strahlung}.
                     ({German}) [{On} the quantum theory of radiation]",
    ...
    }
    

    The AMS, EMS, and JSTOR databases return entries labeled with a database entry number:

    @article {MR0029101,
        AUTHOR = {von Neumann, John},
         TITLE = {On rings of operators. {R}eduction theory},
       JOURNAL = {Ann. of Math. (2)},
      FJOURNAL = {Annals of Mathematics. Second Series},
        VOLUME = {50},
          YEAR = {1949},
         PAGES = {401--485},
          ISSN = {0003-486X},
       MRCLASS = {46.3X},
      MRNUMBER = {0029101 (10,548a)},
    MRREVIEWER = {F. I. Mautner},
    }
    
    @Article{zbMATH03052105,
        Author = {John {von Neumann}},
        Title = {{On rings of operators. Reduction theory.}},
        FJournal = {{Annals of Mathematics. Second Series}},
        Journal = {{Ann. Math. (2)}},
        ISSN = {0003-486X; 1939-8980/e},
        Volume = {50},
        Pages = {401--485},
        Year = {1949},
        Publisher = {Princeton University, Mathematics Department, Princeton, NJ; Mathematical Sciences Publishers (MSP), Berkeley, CA},
        Language = {English},
        DOI = {10.2307/1969463},
        Zbl = {0034.06102}
    }
    
    @article{10.2307/1969463,
     ISSN = {0003486X},
     URL = {http://www.jstor.org/stable/1969463},
     author = {John Von Neumann},
     journal = {Annals of Mathematics},
     number = {2},
     pages = {401-485},
     publisher = {Annals of Mathematics},
     title = {On Rings of Operators. Reduction Theory},
     volume = {50},
     year = {1949}
    }
    

    Such labels are completely unsuited to routine use in citation commands, because they are long and often unintelligible to humans.

    A better approach that has been widely used in the Utah archives, and has also been adopted by some publishers, is to form a colon-separated triple of first author family name, four-digit publication year, and up to three uppercase letters corresponding to the first three words of the title, ignoring prepositions and articles (whatever their language), and discarding mathematics. If the family name contains spaces, eliminate them.

    In the rare cases where that does not produce a unique citation label, such as three related papers in the same year differentiated only by Parts I, II, and III in their titles, add lowercase letters a, b, c, …, to reach uniqueness.

    Thus, our two examples have labels in the Utah archives of Bohr:1924:QSG and vonNeumann:1949:ROR.

    That scheme has proved robust in decades of use. Labels can easily be constructed by a human, by the computer programs biblabel and citesub for standalone use, or by simple keystrokes in the Emacs text editor that invoke functions in the biblabel.el library file. Generally, the three methods agree on the label, and the document author can usually tell from the label alone whether the intended paper is being cited.
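    The labeling rule is simple enough to sketch in Python. The code below is not biblabel or citesub, only a minimal illustration: the function name citation_label and the short stop-word list are assumptions, TeX markup and mathematics are assumed to have been removed from the title already, and the a, b, c suffixes for duplicate labels are not handled.

```python
import re

# Assumption: a tiny illustrative list of prepositions and articles; a real
# list would cover many more words in many more languages.
STOP_WORDS = {
    "a", "an", "the", "of", "on", "in", "to", "for",
    "der", "die", "das", "von", "\u00fcber",   # last word: German "ueber" with u-umlaut
}


def citation_label(family_name, year, title):
    """Form a Family:YYYY:ABC label from the first three significant title words."""
    # Keep only runs of letters, so punctuation and digits are discarded.
    words = re.findall(r"[^\W\d_]+", title)
    significant = [word for word in words if word.lower() not in STOP_WORDS]
    initials = "".join(word[0].upper() for word in significant[:3])
    # Spaces in the family name are eliminated.
    return "{}:{:04d}:{}".format(family_name.replace(" ", ""), year, initials)


print(citation_label("von Neumann", 1949, "On rings of operators. Reduction theory"))
print(citation_label("Bohr", 1924, "\u00dcber die Quantentheorie der Strahlung. (German)"))
```

    With those inputs, the sketch reproduces the two labels quoted above, vonNeumann:1949:ROR and Bohr:1924:QSG.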

    If you fail to produce citation labels that are unique across the set of bibliography files needed for a particular (La)TeX document, BibTeX warns you about multiply defined labels. It is imperative that you investigate and take corrective steps; otherwise, you risk getting a reference-list entry for the wrong document.

  12.   Which BibTeX field values are mandatory?

    The BibTeX documentation of each of the 13 standard document types lists the fields regarded as required (their omission results in a warning, but BibTeX still produces usable output), and several others that are recognized by the standard BibTeX styles, and for which values should be supplied, if such data can be found.

    The Emacs text editor BibTeX support that generates easy-to-fill-in templates for entry types distinguishes between the required and optional fields by prefixing the latter with OPT. Thus, you can, with a couple of keystrokes, or a menu selection, get templates like these:

    @Article{,
      author =       "",
      title =        "",
      journal =      "",
      volume =       "",
      number =       "",
      pages =        "",
      month =        "",
      year =         "",
      DOI =          "",
      URL =          "",
      OPTnote =      "",
      OPTkeywords =  "",
      OPTremark =    "",
      OPTfjournal =  "",
      OPTjournal-URL = "",
      bibdate =      "Wed Dec 16 11:58:16 2015",
    }
    
    @Misc{,
      OPTauthor =    "",
      OPTtitle =     "",
      OPThowpublished = "",
      OPTyear =      "",
      OPTmonth =     "",
      OPTnote =      "",
      bibdate =      "Wed Dec 16 11:58:22 2015",
    }
    
    @TechReport{,
      author =       "",
      title =        "",
      institution =  "",
      year =         "",
      OPTtype =      "",
      OPTnumber =    "",
      OPTaddress =   "",
      OPTmonth =     "",
      OPTnote =      "",
      bibdate =      "Wed Dec 16 11:58:38 2015",
    }
    

    The tab key moves to the next field value, and there are Emacs commands to remove the OPT prefixes when you have supplied values for those fields. The BibTeX prettyprinter and syntax checker, bibclean, has an option that requests prefix removal, and another that completely removes empty optional field assignments. Thus, their presence is of little hindrance in preparation of bibliographic data, and saves you the trouble of trying to remember which fields are strongly recommended.
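    To make the effect of those two cleanup steps concrete, here is a small Python sketch that strips the OPT prefix from filled-in fields and drops empty optional ones. It is not bibclean and does not reproduce its behavior in detail: the function name tidy_opt_fields is hypothetical, the sample entry is made up, and the code assumes one quoted field assignment per line.

```python
import re

# Matches one-line assignments such as:   OPTnote =      "text",
OPT_LINE = re.compile(r'^(\s*)OPT([A-Za-z][\w-]*)(\s*=\s*)"(.*)"(,?)\s*$')


def tidy_opt_fields(entry_text):
    """Remove OPT prefixes from filled-in fields and drop empty OPT fields."""
    kept = []
    for line in entry_text.splitlines():
        match = OPT_LINE.match(line)
        if match is None:
            kept.append(line)                       # not an OPT field: keep as is
        elif match.group(4) != "":
            indent, name, equals, value, comma = match.groups()
            kept.append('{}{}{}"{}"{}'.format(indent, name, equals, value, comma))
        # Otherwise the optional field is empty, and the line is dropped.
    return "\n".join(kept)


sample = """@TechReport{Doe:2015:XYZ,
  author =       "Jane Doe",
  OPTtype =      "",
  OPTnumber =    "42",
}"""
print(tidy_opt_fields(sample))
```

    The sketch leaves column alignment slightly ragged after a prefix is removed; a prettyprinter pass would restore it.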

  13.   How do I represent special characters and mathematics in BibTeX?

    BibTeX is designed to be used by TeX and LaTeX, and such needs are therefore relegated to those markup systems. Although you might be successful in using non-ASCII characters in field value strings, there are serious portability issues. For example, some text editors, including the widely used Emacs editor, attempt to automatically identify the character-set encoding of a file, and when the file is saved, may convert the coding to what that editor believes is the `preferred' one. Such transformations can be disastrous and irreversible. TeX's accent mechanism is capable of handling the accent needs of most of the world's languages, including Vietnamese, which can stack accents two deep, and put them on the top, bottom, and sides of letters.

    Some Web sites display mathematics using Unicode characters, so if you cut-and-paste Web data into a BibTeX field value, you then have text that may approximate the original intent, but is highly likely to be wrong, because Unicode, like ordinary text, is one-dimensional, while mathematics displays are often two-dimensional, and may have operators, such as the square root, whose operands must be suitably delimited.

    The Elsevier ScienceDirect database sadly converts all mathematics in article titles to little bitmap pictures with unpredictable filenames, and the pictures are often too small to be recognizable on modern high-resolution screen displays. You may then have to view the original article on paper, or in a PDF viewer, to figure out what was meant.

  14.   How do I check a BibTeX entry or file?

    For each BibTeX file that you maintain, it is worthwhile to keep a companion LaTeX file that can typeset the whole bibliography. Such files differ little between bibliographies, so you can just copy an existing one, and perhaps change just the title. Here is an example of such a minimal LaTeX file:

    \documentclass{article}
    \begin{document}
        \nocite{*}
        \bibliographystyle{is-unsrt}
        \bibliography{\jobname}
    \end{document}
    

    You then just need to run LaTeX and BibTeX a couple of times, as shown elsewhere in this document, and then display the output DVI file, or if you instead use lualatex, or pdflatex, or xelatex, the output PDF file. Pay attention to the output .blg and .log files: they may contain errors or warnings about your bibliographic data that you need to correct.
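    Long .blg files are easier to review if the warnings are pulled out first. The Python sketch below collects lines that begin with Warning--, the prefix that BibTeX uses for its warnings, and attaches a following --line location line when there is one. The function name blg_warnings is hypothetical, and the sketch deliberately ignores error messages, which have a different layout.

```python
def blg_warnings(blg_text):
    """Collect BibTeX warning messages from the text of a .blg log."""
    warnings = []
    after_warning = False
    for line in blg_text.splitlines():
        if line.startswith("Warning--"):
            warnings.append(line[len("Warning--"):])
            after_warning = True
        elif after_warning and line.startswith("--line "):
            # A continuation line gives the location of the preceding warning.
            warnings[-1] += " (" + line[2:] + ")"
            after_warning = False
        else:
            after_warning = False
    return warnings


sample = (
    'Warning--I\'m ignoring Malcolm:1972:ARP\'s extra "title" field\n'
    "--line 11 of file myrefs.bib\n"
)
for message in blg_warnings(sample):
    print(message)
```

    In practice, you would pass the function the contents of the .blg file that BibTeX wrote for your document.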

    Every BibTeX file in the Utah archives goes through steps like those before a new version can be posted to the Web for others to use. However, that isn't sufficient, because neither BibTeX nor LaTeX cares much about the contents of the field values, and BibTeX is sloppier than it should be in parsing its input.

    The bibcheck tool applies numerous heuristic checks on field values, just like the chktex and lacheck programs do for LaTeX files, and antic, cppcheck, flawfinder, its4, lint, rats, splint, and uno do for C and C++ programs. bibcheck will likely catch many problems in your data that you may wish to repair.

    The Utah archives are also subjected to another extensive set of checks that are most conveniently made by checking patterns and values of BibTeX entries in an SQL database. Consult our local systems staff if you wish to learn more about that.

  15.   How do I parse a BibTeX entry or file?

    Because BibTeX entries come from many sources, and are prepared by humans, they are likely to contain errors. It is therefore convenient to have a way of making a quick test, faster than running LaTeX and BibTeX, that the input at least conforms to the rigorous 1993 grammar for BibTeX.

    Such a check is provided by the bibparse tool: a run with no output, and a Unix success return code, means that the input adheres to the grammar. Every one of the bibliography files in the Utah archives is first subjected to that check; if it fails, all further processing, including Web installation, is blocked until the errors are fixed.

  16.   How do I prettyprint a BibTeX entry or file?

    The 1993 rigorous grammar for BibTeX was accompanied by free software tools for lexing, unlexing, prettyprinting, and syntax checking BibTeX data. The tool for the last two jobs is called bibclean. More than 80 options can be used to control its behavior, but reasonable defaults are provided for all of them, so most users can get by with a simple command like this:

    % bibclean myrefs.bib > new-myrefs.bib
    

    For more complex bibliographies that require special formatting, the author of this document uses a private script that is essentially equivalent to this set of options:

    % bibclean -keep-string         \
               -keep-preamble       \
               -keep-parbreaks      \
               -remove-OPT-prefixes \
               myrefs.bib           > new-myrefs.bib
    
    

    For documentation on bibclean and the BibTeX grammar, see the original paper Bibliography prettyprinting and syntax checking. For briefer descriptions, try these commands:

    % bibclean --help
    
    % man bibclean
    
  17.   How do I standardize field order in a BibTeX entry or file?

    The bibclean tool discussed elsewhere in this document cleans up entries, aligning assignments and wrapping long lines for improved readability and file maintenance. However, it intentionally never changes the order of the field-name assignments within each entry, nor does it ever change the order of entries.

    BibTeX does not care about the order of field assignments within a single entry. However, unlike most programming languages, where the last of several assignments to the same name is the defining one, BibTeX keeps the first assignment to a field name and ignores any later ones. BibTeX makes only a single pass over each entry, in exactly the order it is written, and issues warnings for multiple assignments to the same field, like this:

    Warning--I'm ignoring Malcolm:1972:ARP's extra "title" field
    --line 11 of file myrefs.bib
    

    While field-assignment order does not matter to BibTeX, long experience has shown that it is highly desirable for fields to have a consistent order in all entries, because the human eye can then more quickly find data of interest. The tool that does that job is called biborder, and for most users, its command-line options need never be used. Here is an example:

    % biborder myrefs.bib > new-myrefs.bib
    

    Consult its compact help display, or its manual pages, for more details on what else it can do:

    % biborder --help
    
    % man biborder
    

    The default field order closely matches the order that is almost universal in reference lists: author, title, journal, volume, number, pages, day, month, year, and so on.
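    The idea of a standard field order can be sketched in Python as a stable sort against a preferred list. This is not how biborder is implemented: the list PREFERRED contains only the fields named above, and the function name order_fields and the treatment of unlisted fields (kept in their original relative order at the end) are assumptions made for illustration.

```python
# Assumption: an illustrative preferred order built from the fields named in
# the text; the real default order is longer.
PREFERRED = ["author", "title", "journal", "volume", "number",
             "pages", "day", "month", "year"]


def order_fields(fields):
    """Sort (name, value) pairs into the preferred order.

    Fields not in PREFERRED keep their relative order after the known ones,
    because Python's sort is stable.
    """
    rank = {name: position for position, name in enumerate(PREFERRED)}
    return sorted(fields, key=lambda pair: rank.get(pair[0].lower(), len(PREFERRED)))


entry = [("year", "1949"), ("title", "T"), ("DOI", "x"), ("author", "A"), ("URL", "u")]
print([name for name, value in order_fields(entry)])
```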

  18.   How do I sort entries in a BibTeX file?

    When a bibliography file contains only a handful of entries, the order of the entries may not matter much. Many BibTeX users probably just keep adding new entries to the end of an existing file.

    As the entry count grows, more discipline is desirable, and the bibsort tool provides a score of options, and at least 16 ways to sort the entries.

    In the Utah archives, three particular sort orders have been found desirable, and are used most frequently.

    The first way is appropriate for journal- or series-specific bibliographies, for which one of the options --byarticleno, --byday, --bypages, --byseriesvolume, or --byvolume sorts the entries into publication order; the choice between those options is delicate, and different journals and series may require different sort options.

    The second way is to omit the command-line options, which corresponds to a default order by citation label: that works nicely when labels are chosen systematically, as they are for the Utah archives. Such an order is useful in a bibliography of books from a single publisher, or by a single author, or for the volumes on your bookshelf: the BibTeX file order then matches the author order, and entries for multiple editions of a book by the same author are grouped together.

    The third way, supplied by the --byyear option, sorts entries by year, and within each year, by citation label. Subject-specific bibliographies seem best handled that way, because the oldest publications appear at the beginning, and the newest at the end, making it easy to find both historical and recent material.

    bibsort has an option --reverse to invert the sort order, if you want to have the newest entries at the start. The Utah archives never need it.
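    The third ordering, by year and then by citation label, amounts to sorting on a two-part key. The Python sketch below illustrates only that idea and is not bibsort: for brevity it takes the year from labels of the form Family:YYYY:XXX rather than from the year field, and the function name sort_by_year is hypothetical.

```python
def sort_by_year(labels):
    """Sort citation labels of the form Family:YYYY:XXX by year, then by label."""
    def key(label):
        year = int(label.split(":")[1])   # assumption: year is the second part
        return (year, label)
    return sorted(labels, key=key)


labels = ["Planck:1921:TT", "Einstein:1901:FCG", "Planck:1920:GPS", "Bohr:1924:QSG"]
for label in sort_by_year(labels):
    print(label)
```

    Passing reverse=True to sorted would put the newest entries first.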

  19.   How do I extract entries from a BibTeX file?

    A decent text editor will let you copy BibTeX entries from one file to another, but sometimes you want to pull out entries that share common characteristics, such as having the same document type, or the same author, or the same phrase in a title. The bibextract tool does just that: here are some examples from its manual pages:

    # Extract all entries mentioning chaos in any field:
    % bibextract "" "chaos" bibfile(s) >new-bibtex-file
    
    # Extract entries with names Brown or Smith occurring in either of the
    # author or editor fields:
    % bibextract "author|editor" "brown|smith" bibfile(s) >new-bibtex-file
    
    # Extract entries for titles containing the letter `z' anywhere after a
    # vowel; note that single quotes are necessary to provide the necessary
    # protection from shell expansion:
    % bibextract "title" '[aeiou].*z' bibfile(s) >new-bibtex-file
    
    # Extract all conference proceedings entries:
    % bibextract "" '@proceedings' bibfile(s) >new-bibtex-file
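
    The simplest case above, matching a pattern in any field, can be imitated in Python to show what such an extraction involves. The sketch is not bibextract and handles none of its field selection: the function names split_entries and extract are hypothetical, the sample entries are made up, every entry is assumed to begin with @ in column one, and case-insensitive matching is an assumption suggested by the lowercase patterns in the examples.

```python
import re


def split_entries(text):
    """Split BibTeX text into entries, each assumed to start with '@' in column one."""
    entries, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("@") and current:
            entries.append("".join(current))
            current = []
        if line.startswith("@") or current:
            current.append(line)
    if current:
        entries.append("".join(current))
    return entries


def extract(text, pattern):
    """Return the entries in which the regular expression matches anywhere."""
    regex = re.compile(pattern, re.IGNORECASE)   # assumption: case is ignored
    return [entry for entry in split_entries(text) if regex.search(entry)]


sample = (
    "@Article{Doe:2001:CSM,\n"
    '  title =        "Chaos in simple maps",\n'
    "}\n"
    "\n"
    "@Book{Roe:2002:LAN,\n"
    '  title =        "Linear algebra notes",\n'
    "}\n"
)
print("".join(extract(sample, "chaos")))
```

    Note that this simple splitter also treats @String and @Preamble blocks as entries, which a real tool must handle separately.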
    
  20.   How do I join entries from a BibTeX file?

    Many, and perhaps even most, sources of bibliographic data on the Web, and even in some publisher databases, are riddled with errors, and often lack data in particular fields. Journal issue numbers, and issue month and day, are commonly missing, even though they may be essential for avoiding a tedious manual search through many issues in print or online. One way to improve bibliographic data quality is to collect data from multiple sources, and then sort and merge entries that appear to belong to the same publication. The tool that does that job is called bibjoin, and its manual pages document how it determines whether adjacent entries can be safely merged. This document's author frequently uses it inside the Emacs text editor, applying it to just a specific highlighted region of the editor buffer.

  21.   How do I get BibTeX entries into an SQL database?

    Two tools support getting data into, and out of, Structured Query Language (SQL) databases: bibtosql and bibsql. They are described in detail in the 2009 paper BibTeX meets relational databases where you can find many examples of SQL searches, from simple string matching to complex queries that involve matching, grouping, and ordering.

    Running an SQL database with MariaDB, MySQL, or PostgreSQL is a challenging task that is documented in the paper, and in the manual pages for bibtosql. It requires administrator access to your computer so that you can create new database accounts, and implement suitable protective firewall rules to guard your server against attack. That is far more than most BibTeX users need, or are prepared to do. Fortunately, as long as your bibliographic archives remain modest in size (say, fewer than 100,000 entries), you don't need all of that complexity and power.

    Most users with small collections of clean BibTeX data can get by quite nicely with the much simpler, extremely portable, and public-domain, SQLite3 database. Furthermore, it needs only a single database file that requires no special privileges to create, and that is independent of operating system, CPU, and byte order: once it is created, you can use the same file everywhere, burn it on a CD-ROM or DVD, carry it around on a thumb drive or mobile device, and if you wish, put it on the Web for others to use.

    SQLite3 is available on all common desktop and server operating systems, and there is a good chance that you already have it on your own computer, because it was automatically installed with some other software. If it isn't there yet, you can almost certainly install it with your system's default package manager. Once you have it, creation of a single SQL database from all of your BibTeX files is a one-line command:

    % bibtosql *.bib | sqlite3 myrefs.db
    

    You can search your new database like this:

    % sqlite3 myrefs.db
    sqlite> ... your input here ...
    

    In SQLite3, you can do almost everything with searches that you can do in a powerful client/server database, except that regular-expression searches are not yet available in that system.

    This document's author has used SQLite3 to search BibTeX entries for the references of a new book to find missing or erroneous data. For example, a query like this

    sqlite> select filename, journal, label from bibtab
              where (bibtype = 'article')
                and ((DOI is NULL) and (URL is NULL))
              order by filename, journal, year, label;
    

    quickly displays a table of all of those entries for journal articles where we have not yet located Web addresses for the full text.
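    The same query can also be run from a program. The Python sketch below uses the standard sqlite3 module against a tiny in-memory stand-in for the bibtab table. Only the columns that the query mentions are created, the column types and the two rows are made-up sample data, and a real database produced by bibtosql has many more columns.

```python
import sqlite3

# Assumption: a minimal stand-in for bibtab with made-up sample rows.
connection = sqlite3.connect(":memory:")
connection.execute(
    "create table bibtab (filename text, label text, bibtype text,"
    " journal text, year text, DOI text, URL text)"
)
connection.executemany(
    "insert into bibtab values (?, ?, ?, ?, ?, ?, ?)",
    [
        ("myrefs.bib", "Doe:2001:CSM", "article", "Journal of Examples",
         "2001", None, None),
        ("myrefs.bib", "Roe:2002:LAN", "article", "Journal of Examples",
         "2002", "10.0000/example", None),
    ],
)

# The query from the text: articles with neither a DOI nor a URL.
query = """
    select filename, journal, label from bibtab
      where (bibtype = 'article')
        and ((DOI is NULL) and (URL is NULL))
      order by filename, journal, year, label
"""
rows = connection.execute(query).fetchall()
for row in rows:
    print(row)
connection.close()
```

    To query a real database file instead, replace ":memory:" with the path to the file, such as myrefs.db, and omit the create and insert steps.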

    SQLite3 automatically detects when its database file has been updated, so you can rerun bibtosql as often as you like without having to restart the database client.

    SQL queries are verbose, but that problem is trivially dealt with by keeping a personal file of recent queries. For each new search, you just pick a similar old select command, and edit it to produce a new command tailored for the new search. Over a period of several days' work on the book's bibliography, we accumulated more than 200 such commands, and many of them can be reused in future bibliography projects.

  22.   How do I search for BibTeX entries in an SQL database?

    We have used bibsql several times in this document, so if you have read it through already, instead of jumping directly to this section, you have seen easily understandable SQL queries. If not, you can find many examples in the 2009 paper BibTeX meets relational databases.

    Quite often in bibliographic data searches, you first want to identify desired entries by matching on a particular field, such as the author, title, and journal values, and displaying just those parts that are of interest. Once you have identified particular documents, you can then pull them out of the database with subsequent SQL commands, or if you have only a few BibTeX files, just use a text editor, or the bibextract tool, to find the entries that you want.

    For example, you might want to find publications by a particular author, say, Max Planck, published in 1920 or 1921:

    sqlite> select filename, label, substr(title, 1, 60) from bibtab
                where (   (author like '%Max Planck%')
                       or (author like '%M. Planck%')
                       or (editor like '%Max Planck%')
                       or (editor like '%M. Planck%') )
                and (year between '1920' and '1921')
                order by filename, year, label;
    +----------------+------------------+--------------------------------------------------------------+
    | filename       | label            | substr(title, 1, 60)                                         |
    +----------------+------------------+--------------------------------------------------------------+
    | einstein.bib   | Planck:1920:EBEa | Die Entstehung und bisherige Entwicklung der Quantentheorie. |
    | einstein.bib   | Planck:1920:EBEb | Entstehung und bisherige Entwicklung der Quantentheorie: Nob |
    | einstein.bib   | Planck:1920:GPS  | The Genesis and Present State of Development of the Quantum  |
    | planck-max.bib | Planck:1920:EBEa | Die Entstehung und bisherige Entwicklung der Quantentheorie. |
    | planck-max.bib | Planck:1920:EBEb | Entstehung und bisherige Entwicklung der Quantentheorie: Nob |
    | planck-max.bib | Planck:1920:GPS  | The Genesis and Present State of Development of the Quantum  |
    | planck-max.bib | Planck:1920:WLV  | Das Wesen des Lichts: Vortrag, gehalten in der Hauptversamml |
    | planck-max.bib | Planck:1921:AEC  | Absolute Entropie und chemische Konstante. (German) [Absolut |
    | planck-max.bib | Planck:1921:EAM  | Einfuhrung in die allgemeine Mechanik: zum Gebrauch bei Vort |
    | planck-max.bib | Planck:1921:HPQ  | Henri Poincare und die Quantentheorie. (German) [Henri Poinc |
    | planck-max.bib | Planck:1921:PEE  | Das Prinzip der Erhaltung der Energie. (German) [The princip |
    | planck-max.bib | Planck:1921:TT   | Treatise on Thermodynamics                                   |
    | planck-max.bib | Planck:1921:VTG  | Vorlesungen uber Thermodynamik. (German) [Lectures on thermo |
    | planck-max.bib | Planck:1921:VTW  | Vorlesungen uber die Theorie der Warmestrahlung. (German) [L |
    | planck-max.bib | vonLaue:1921:AMV | Antrittsrede M. v. Laues und Erwiderung von M. Planck. (Germ |
    +----------------+------------------+--------------------------------------------------------------+
    

    Next, you examine that list and decide that you want the entry with label Planck:1920:GPS. You now know the filename in which it is found, so you could easily use a text editor to collect the full entry. However, you can quickly extract it from the database like this:

    sqlite> select entry from bibtab where (label = 'Planck:1920:GPS');
    
    @Misc{Planck:1920:GPS,
      author =       "Max Planck",
      title =        "The Genesis and Present State of Development of the
                     Quantum Theory",
      howpublished = "Nobel Lecture, June 2, 1920",
      day =          "2",
      month =        jun,
      year =         "1920",
      bibdate =      "Sat Aug 22 13:23:45 2015",
      bibsource =    "http://www.math.utah.edu/pub/bibnet/authors/p/planck-max.bib;
                     http://www.math.utah.edu/pub/tex/bib/einstein.bib",
      note =         "The Nobel Prize in Physics for 1918 was awarded to Max
                     Karl Ernst Ludwig Planck ``in recognition of the
                     services he rendered to the advancement of Physics by
                     his discovery of energy quanta''.",
      URL =          "http://www.nobelprize.org/nobel_prizes/physics/laureates/1918/planck-lecture.html",
      acknowledgement = ack-nhfb,
      author-dates = "1858--1947",
    }
    

    SQLite3 does not put additional delimiters around the data when only a single database field is requested, so you only need a simple cut-and-paste operation to put that entry in some other BibTeX file.
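
    The same two-step search (list candidate labels, then pull out one full entry) can also be scripted. The following is only a minimal sketch using Python's standard sqlite3 module: the in-memory table is a hypothetical, much-reduced stand-in for bibtab that holds just the columns used in the queries above, the two sample rows carry abbreviated entry text, and the function names find_labels and fetch_entry are invented for illustration.

```python
import sqlite3

# Hypothetical, much-reduced stand-in for the bibtab table: only the
# columns used in the queries above are created here.
SCHEMA = """
create table bibtab (
    filename text, label text, author text, editor text,
    title text, year text, entry text
)
"""

# Two sample rows with abbreviated entry text (not the full entries).
SAMPLE_ROWS = [
    ("planck-max.bib", "Planck:1920:GPS", "Max Planck", None,
     "The Genesis and Present State of Development of the Quantum Theory",
     "1920",
     '@Misc{Planck:1920:GPS,\n  author = "Max Planck",\n  year = "1920",\n}'),
    ("einstein.bib", "Einstein:1935:EDE", "Albert Einstein", None,
     "Elementary derivation of the equivalence of mass and energy",
     "1935",
     '@Article{Einstein:1935:EDE,\n  author = "Albert Einstein",\n}'),
]


def find_labels(db, name, first_year, last_year):
    """Step 1: list (filename, label, short title) for matching entries."""
    pattern = "%" + name + "%"
    return db.execute(
        "select filename, label, substr(title, 1, 60) from bibtab"
        " where (author like ? or editor like ?)"
        " and (year between ? and ?)"
        " order by filename, year, label",
        (pattern, pattern, first_year, last_year),
    ).fetchall()


def fetch_entry(db, label):
    """Step 2: return the full entry text for one label, or None."""
    row = db.execute(
        "select entry from bibtab where label = ?", (label,)
    ).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
db.executemany("insert into bibtab values (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

hits = find_labels(db, "Max Planck", "1920", "1921")
for filename, label, short_title in hits:
    print(filename, label, short_title)
print(fetch_entry(db, hits[0][1]))
```

    Passing the search values as query parameters, rather than pasting them into the SQL text, avoids quoting problems with names that contain apostrophes.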

    For PostgreSQL, you can temporarily suppress any unwanted database field delimiters like this:

    % bibsql -s p
    > \pset format unaligned
    > select entry from bibtab where label = 'Planck:1920:GPS';
    > -- now restore default field-separated behavior
    > \pset format aligned
    

    For MariaDB and MySQL, delimiter suppression can only be requested at client startup time with two options passed through bibsql to the client:

    % bibsql -s m -o '-r -s'
    > select entry from bibtab where label = 'Planck:1920:GPS';
    
  23.   How do I handle missing data in a BibTeX entry?

    SQL databases differentiate empty strings and missing data by giving the latter the special value NULL. SQL queries can distinguish between the two cases with expression fragments like these:

        (fieldname is NULL)      -- NULL, not empty string
        (fieldname = '')         -- empty string, not NULL
    

    The convention in the Utah archives is that an empty string is used when no value is expected for that field. However, when a value is needed, but has not yet been found, then the value is set to two or more question marks, as in this example:

    @Article{Malcolm:1972:ARP,
      author =       "Michael A. Malcolm",
      title =        "Algorithms to Reveal Properties of Floating-Point
                     Arithmetic",
      journal =      j-CACM,
      volume =       "15",
      number =       "??",
      pages =        "949--951",
      month =        "????",
      year =         "1972",
      CODEN =        "????",
      DOI =          "????",
      ISSN =         "????",
      ISSN-L =       "????",
      note =         "",
      remark =       "",
    }
    

    The consecutive question marks in the BibTeX entry serve as a reminder to the file's maintainer that more information must be found to properly complete the entry.
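
    The three cases (missing field, deliberately empty field, and question-mark placeholder) can be told apart in a query. This is a minimal sketch with Python's standard sqlite3 module; the table demo, its columns, and the four labels are invented purely to exercise the cases, and are not part of the Utah archives or of bibsql.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical two-column table; labels and values are invented.
db.execute("create table demo (label text, number text)")
db.executemany(
    "insert into demo values (?, ?)",
    [
        ("A:2001:X", None),   # field absent from the entry: NULL
        ("B:2002:Y", ""),     # field present, no value expected
        ("C:2003:Z", "??"),   # value needed, but not yet found
        ("D:2004:W", "12"),   # ordinary value
    ],
)


def labels(where):
    """Return the labels selected by a fixed WHERE clause, sorted."""
    rows = db.execute(
        "select label from demo where " + where + " order by label"
    )
    return [row[0] for row in rows]


missing = labels("number is NULL")
empty = labels("number = ''")
# In SQL LIKE patterns only % and _ are wildcards, so ? matches itself.
unresolved = labels("number like '%??%'")

print(missing, empty, unresolved)
```

    Comparisons against NULL never succeed, which is why the row with the missing field is selected only by the is NULL test and not by either of the other two.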

    A few BibTeX styles have been extended to recognize those two cases: both are treated as an empty value for output to the .bbl file. Other styles simply output the question marks as a value, so they appear in the typeset reference list. You can easily check whether there are required, but missing, fields like this:

    % grep '??' *.bbl
    

    In some cases, author or editor data are missing from the publication source. Many ACM and IEEE conference proceedings, for example, credit no editors: instead, unrecognized society staff do the editorial work. In such cases, the Utah archives use field assignments like these:

      author =       "Anonymous",
      editor =       "{ACM}",
      editor =       "{IEEE}",
    
  24.   What is a DOI, and why do I want it?

    The Digital Object Identifier (DOI) system introduced in 2000 is a worldwide service that decouples electronic document addresses from their physical location at a particular Web site. That way, when a publisher or journal is sold, the DOI values remain unchanged, even though the URLs to which they map now reflect the Web sites of the new owner. For more information on DOIs, see online encyclopedias such as the Wikipedia DOI entry, and the site Frequently Asked Questions about the DOI System.

    Many publishers now assign DOIs to their journal articles, and also to their books. For example, Springer-Verlag books often have a URL formed from the book's electronic ISBN-13 value: http://link.springer.com/book/10.1007/978-3-642-23588-7 From that value, you can sometimes deduce a DOI value: http://dx.doi.org/10.1007/978-3-642-23588-7 .

    DOI values always begin 10.dddd/, where dddd is a numeric publisher code of four or more digits, so look for such patterns in URLs. What follows the prefix is up to the publisher. If you construct a candidate URL or DOI that way, test it in a Web browser to make sure that it can be resolved, before you put it in a BibTeX entry.
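
    Looking for such patterns is easy to automate. The sketch below is a heuristic, not a validator: the regular expression assumes that the DOI runs up to the next whitespace, it allows numeric codes longer than four digits because those also occur, and the function names find_doi and resolver_url are invented for illustration. A match only yields a candidate, which still has to be tested in a Web browser as described above.

```python
import re

# Heuristic pattern: "10.", a numeric code of four or more digits, "/",
# then a publisher-chosen suffix running to the next whitespace.
DOI_PATTERN = re.compile(r"\b10\.\d{4,}/\S+")


def find_doi(text):
    """Return the first DOI-like substring of text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing sentence punctuation is almost never part of a DOI.
    return match.group(0).rstrip(".,;")


def resolver_url(doi):
    """Build the Web address at which a candidate DOI can be tested."""
    return "http://dx.doi.org/" + doi


candidate = find_doi("http://link.springer.com/book/10.1007/978-3-642-23588-7")
print(candidate)
print(resolver_url(candidate))
```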

    Many publishers are now happy to have author-supplied DOI values in reference lists, because it greatly eases the job of checking that data, and of cross-referencing it. Some of them are even retrofitting DOI values into the electronic forms of already-printed reference lists.

    When publications have unique identifiers, substantial automation of the cross-referencing job is possible. The venerable Science Citation Index (SCI), and the later Social Science Citation Index (SSCI), CrossRef, and Web of Science organizations all contribute to the indexing and cross-referencing of academic publications.

    Thousands of papers have been written about journal metrics and article metrics, which are attempts to quantify the importance or popularity of journals and publications, and at many institutions, enter into job, performance, and promotion evaluations. The Utah archives include coverage of several journals in that area.

    The DOI agency strongly recommends the inclusion of DOI values in reference lists, preferably as the last item in each entry, without following punctuation. Here is an example of what they propose:

    Wang, C.-Y. and J. D. Achenbach 1993. A new method to obtain 3-D
    Green's functions for anisotropic solids. Wave Motion
    18(3):273–289. ISSN-L 0165-2125. doi:10.1016/0165-2125(93)90076-R
    

    For BibTeX use, the Utah archives encode the DOI as a Web address, like this:

      DOI =          "http://dx.doi.org/10.1016/0165-2125(93)90076-R",
    

    That way, humans exposed to the Internet quickly recognize it as a Web address, and some Web browsers do as well, even if it is not properly encoded as a hyperlink. The few BibTeX styles that have been extended to handle DOI data strip off the DOI agency prefix when they generate the data for the reference list. The raw LaTeX that produced our sample DOI-enriched reference list entry looks like this in the .bbl file:

    \bibitem[\protect\citeauthoryear{Wang and Achenbach}{Wang and
      Achenbach}{1993}]{Wang:1993:NMO}
    \bblauthor{Wang, C.-Y.} and \bblauthor{J.~D. Achenbach} \bblyear{1993}.
    \newblock \bbltitle{A new method to obtain {$3$-D} {Green}'s functions for
      anisotropic solids}.
    \newblock {\em \bbljournal{Wave Motion}\/} \bblvolume{18}\penalty0
      (\bblnumber{3}):\penalty0 \bblpages{273--289}.
    \showEXTRA{\showfjournal{\bblfjournal{Wave Motion}}
    \showCODEN{\bblCODEN{WAMOD9}}
    \showISSN{\bblISSN{0165-2125 (print), 1878-433X (electronic)}}
    \showISSNL{\bblISSNL{0165-2125}}
    \showMR {\bblMRreviewer{M. O. Bashele{\u\i}shvili}}
            {\bblMRnumber{1256481 (94m:73033)}}
            {\bblMRclass{73D15 (73B40 73D10)}}
    \showbibdate{\bblbibdate{Wed Nov 4 2015}}
    \ifshowDOI {\newblock {\showDOIshort \url{10.1016/0165-2125(93)90076-R}}}\fi
    }
    

    Few humans would ever care to produce such complex markup by hand, but a computer program can easily do so. Notice that most of the field values are wrapped in macros, and the user can redefine their default values to hide some of the data. For example, in the typeset output given earlier, the full ISSN is dropped in favor of just the ISSN-L value, and the three MathReviews values, and some others, are hidden as well.

  25.   What is an ISBN, and why do I want it?

    Modern book titles are often short, and as a result, ambiguous: consider how many entry-level college books have been called Algebra, Calculus, Economics, Evolutionary Biology, Geometry, Mechanics, or Organic Chemistry. The name collisions are a serious source of confusion for their users, including students, instructors, and librarians.

    To remedy that problem, and prepare for extensive computerization of publisher records, the International Standard Book Number (ISBN) organization was created in 1972, and most books published worldwide since then include an ISBN and associated scanner bar code on their back covers. Books published before 1972, but later reprinted, are often retrofitted with ISBN values. The Utah archives contain entries for books originally published as long ago as 1909 that have been retroactively assigned ISBNs.

    ISBNs are unique handles for each form of a book: if it appears in hardcover, paperback, and electronic editions, then it has three ISBNs.

    Because they are unique (with rare exceptions — 4 duplicate assignments out of about 80,000 in the Utah archives — caused by human error at publishers and the ISBN registration service), ISBNs can be used in most library catalogs for unambiguous lookups, and most booksellers need only the ISBN to order the book for you.

    Many journal publishers have recognized the importance of unique identifiers for documents, because it allows them to check, and, if necessary, correct author-submitted values. Such identifiers also allow publishers to cross-index their data: it is often useful to know who cites whom, because you can then go both forwards and backwards in the literature to follow a line of research.

    The importance of ISBNs for bibliographic databases was recognized at the beginning of the Utah bibliography projects, and more than 80% of the book entries have ISBN data, with BibTeX field assignments like these:

      ISBN =         "0-201-41979-3",
    
      ISBN =         "0-19-968028-0 (hardcover)",
    
      ISBN =         "0-387-97622-1 (New York), 3-540-97622-1 (Berlin)",
    
      ISBN =         "0-395-59472-1 (??invalid checksum??)",
    
      ISBN =         "0-201-32841-0 (set), 0-201-32842-9 (paperback),
                     0-201-31151-8 (disc 1), 0-201-31152-6 (disc 2),
                     0-201-31153-4 (disc 3), 0-201-31154-2 (disc 4),
                     0-201-31155-0 (disc 5), 0-201-31156-9 (disc 6)",
    
      ISBN =         "0-521-84931-4 (hardcover), 0-521-61410-4 (paperback),
                     0-511-41278-9 (e-book), 0-511-41370-X (e-book)",
    

    As the examples show, ISBNs consist of ten decimal digits separated by hyphens into four groups. The last digit may also be X or x: lettercase is not significant. Although some publishers used space separators instead of hyphens in early books, that practice is now strongly deprecated. Library catalogs and booksellers differ in their search practices: some require the hyphens, while others refuse to accept them.

    The first group is the country or language: English (0 and 1), French (2), German (3), Japanese (4), Russian (5), Chinese (7), and so on up to Ruandan (99977).

    The second group is the publisher number: small numbers correspond to big publishers, and vice versa.

    The third group is the book number within the publisher. When that sequence is exhausted, the publisher applies for a new publisher number. For example, the well-known computer-book publisher O'Reilly and Associates began with a 2-digit book number, and later grew to 3- and 4-digit book numbers. Cambridge University Press and Wiley have 5-digit book numbers, and Elsevier, Harper, McGraw-Hill, Oxford University Press, and Pearson each have 6-digit book numbers.

    The last group is a single digit that is a check digit computed from the first 9 digits. Occasionally, publishers manage to register an ISBN with an incorrect check digit, as shown in one of the examples earlier. That should not happen, because the number should be verified by both the publisher and the ISBN agency, but the Utah archives contain data for about 80 books (out of about 80,000) where the published check digit is incorrect.
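
    The check-digit computation is short enough to show in full. This sketch uses the standard ISBN-10 rule (weights 10 down to 2, modulo 11, with the value 10 written as X); the function names are invented here, and the code is not taken from bibclean or isbn.el.

```python
def isbn10_check_digit(first_nine):
    """Return the check character for the first nine ISBN-10 digits."""
    if len(first_nine) != 9 or not (first_nine.isascii() and first_nine.isdigit()):
        raise ValueError("expected exactly nine decimal digits")
    # Weights run from 10 down to 2; the check value makes the full
    # weighted sum a multiple of 11, and the value 10 is written as X.
    total = sum(weight * int(digit)
                for weight, digit in zip(range(10, 1, -1), first_nine))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn10_is_valid(isbn):
    """Check a 10-digit ISBN, ignoring hyphens, spaces, and lettercase."""
    compact = isbn.replace("-", "").replace(" ", "").upper()
    if len(compact) != 10:
        return False
    body, check = compact[:9], compact[9]
    return body.isascii() and body.isdigit() and isbn10_check_digit(body) == check


for isbn in ["0-201-41979-3", "0-511-41370-X", "0-395-59472-1"]:
    print(isbn, isbn10_is_valid(isbn))
```

    Applied to the earlier examples, the test accepts 0-201-41979-3 and 0-511-41370-X, and rejects 0-395-59472-1, the entry marked with an invalid checksum.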

    Hyphenation makes ISBNs more readable, and serves to identify the four groups. However, humans cannot reliably hyphenate an ISBN, because there are more than 1700 rules about how the digits are grouped. The Emacs isbn.el library file contains functions for hyphenating ISBNs and validating their check digits; much of its code is derived automatically from data supplied by the ISBN agency.

    Similarly, bibclean has compiled-in rules derived from the same ISBN agency data, but because new data appear from time to time, updates are needed. To avoid the need to rebuild and reinstall the software, bibclean also reads an optional startup file that contains the current ISBN hyphenation rules, once again automatically derived from the ISBN agency data.

    By about the year 2000, there was concern that some publishers would soon run out of assignable ISBNs. As a result, in 2007, the ISBN agency changed from a 10-digit number to a 13-digit number based on the European Article Number system. The new 13-digit values get a new prefix of 978-, and a different checksum algorithm is used to produce the final digit. Otherwise, the other three fields remain the same, and software can convert between the 10- and 13-digit forms. However, when a publisher has exhausted its 10-digit assignment, it moves to a new 979- prefix group for which there is no 10-digit form. As of late 2015, only one book in the Utah archives had such a prefix.
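
    The conversion from the 10-digit to the 13-digit form can be sketched in a few lines. The code below assumes an input that is already hyphenated into four groups, keeps that hyphenation, discards the old check digit without verifying it, and computes the new one with the standard ISBN-13 rule (alternating weights 1 and 3, modulo 10). The function names are invented for illustration; this is not the biborder implementation.

```python
def isbn13_check_digit(first_twelve):
    """Return the check digit for the first twelve ISBN-13 digits."""
    if len(first_twelve) != 12 or not (first_twelve.isascii() and first_twelve.isdigit()):
        raise ValueError("expected exactly twelve decimal digits")
    # Weights alternate 1, 3, 1, 3, ...; the check digit makes the full
    # weighted sum a multiple of 10.
    total = sum((3 if index % 2 else 1) * int(digit)
                for index, digit in enumerate(first_twelve))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10):
    """Convert a hyphenated ISBN-10 to ISBN-13, keeping its hyphenation."""
    groups = isbn10.split("-")
    if len(groups) != 4:
        raise ValueError("expected four hyphen-separated groups")
    # The old check digit (groups[3]) is dropped, not validated.
    body = "".join(groups[:3])
    check = isbn13_check_digit("978" + body)
    return "-".join(["978"] + groups[:3] + [check])


print(isbn10_to_isbn13("0-201-41979-3"))
```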

    Here are some examples of 13-digit ISBN values in BibTeX entries, all of them companions to the previous 10-digit examples:

      ISBN-13 =      "978-0-201-41979-5",
    
      ISBN-13 =      "978-0-19-968028-3 (hardcover)",
    
      ISBN-13 =      "978-0-387-97622-8 (New York), 978-3-540-97622-6
                     (Berlin)",
    
      ISBN-13 =      "978-0-395-59472-8 (??invalid checksum??)",
    
      ISBN-13 =      "978-0-201-32841-7 (set), 978-0-201-32842-4
                     (paperback), 978-0-201-31151-8 (disc 1),
                     978-0-201-31152-5 (disc 2), 978-0-201-31153-2 (disc 3),
                     978-0-201-31154-9 (disc 4), 978-0-201-31155-6 (disc 5),
                     978-0-201-31156-3 (disc 6)",
    
      ISBN-13 =      "978-0-521-84931-9 (hardcover), 978-0-521-61410-8
                     (paperback), 978-0-511-41278-3 (e-book),
                     978-0-511-41370-4 (e-book)",
    

    The biborder tool has an option to generate the 13-digit forms, and ISBN and ISBN-13 checksums are two of many things that bibclean validates. That validation has often discovered errors in book-review titles that include ISBNs: humans are unreliable sources of data!

    A few BibTeX styles recognize field names ISBN and ISBN-13, and include the ISBN values in the output reference list. Many publishers encourage that practice, and more work needs to be done to extend the remaining style files to handle such values.

  26.   What are CODENs and ISSNs, and why do I want them?

    Just as ISBNs serve as unique identifiers for books, periodicals too need unique codes.

    The CODEN system was introduced in 1953, originally with four alphanumeric characters, then expanded to five, and later, with the addition of a check digit, to six. The Chemical Abstracts Service Source Index (CASSI) is the registry and lookup service for CODEN values. Most journals in the physical sciences have such values, and some in mathematics do as well. Here are a few examples:

      CODEN =        "AALEE5",
    
      CODEN =        "ACMSCU",
    
      CODEN =        "ACSYEC",
    
      CODEN =        "VLDBFR",
    
      CODEN =        "WETEFA",
    
      CODEN =        "XRESEA",
    

    CASSI can look up publications by title, ISBN, ISSN, and CODEN, and for the journals that it covers, it can provide rather useful data. Here is an example of its output for one of the most prestigious journals in physics:

    Entry Type                      Changed Title Serial
    Title                           Physical Review
    Abbreviated Title               Phys. Rev.
    CODEN                           PHRVAO
    ISSN                            0031-899X
    Former Title(s)                 Physical Review [Section] A
                                    Physical Review [Section] B
    Language of Text                English
    Summaries In                    English
    History                         v1 July 1893-s2 v132 Dec. 15, 1963;
                                    s2 v141 n1 Jan. 1966-s2 v188 n5
                                    Dec. 25, 1969
    Successor Title Note            Divided into
    Successor Title(s)              Physical Review A: Atomic, Molecular,
                                    and Optical Physics
                                    Physical Review B: Solid State
                                    Physical Review C: Nuclear Physics
                                    Physical Review D: Particles and Fields
    Alternate Title(s)              Proceedings of the American Physical Society
    Abbreviated Alternate Title(s)  Proc. Am. Phys. Soc.
    

    The International Standard Serial Number (ISSN) system was proposed in 1971 and adopted in 1975. Most current large periodicals throughout the world now have ISSN values, and so do many defunct ones. As with ISBNs, an ISSN is unique to a publication form, so many journals have two, for print and electronic editions. BibTeX entries in the Utah archives contain field assignments like these:

      ISSN =         "0898-901X",
    
      ISSN =         "0891-6837",
    
      ISSN =         "0018-9235 (print), 1939-9340 (electronic)",
      ISSN-L =       "0018-9235",
    
      ISSN =         "1932-3243 (??invalid checksum??)",
    

    As with ISBNs, ISSNs contain only decimal digits, except that the final check digit may also be X or x (lettercase is not significant). Mistakes are sometimes made in ISSN assignments, as the last example shows.
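
    An ISSN check digit can be verified much like an ISBN-10 one. This sketch uses the standard ISSN rule (weights 8 down to 2 on the first seven digits, modulo 11, with the value 10 written as X); the function names are invented here and do not come from bibclean.

```python
def issn_check_digit(first_seven):
    """Return the check character for the first seven ISSN digits."""
    if len(first_seven) != 7 or not (first_seven.isascii() and first_seven.isdigit()):
        raise ValueError("expected exactly seven decimal digits")
    # Weights run from 8 down to 2; the check value makes the full
    # weighted sum a multiple of 11, and the value 10 is written as X.
    total = sum(weight * int(digit)
                for weight, digit in zip(range(8, 1, -1), first_seven))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def issn_is_valid(issn):
    """Check an ISSN such as 0018-9235, ignoring the hyphen and lettercase."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8:
        return False
    body, check = compact[:7], compact[7]
    return body.isascii() and body.isdigit() and issn_check_digit(body) == check


for issn in ["0018-9235", "0898-901X", "1932-3243"]:
    print(issn, issn_is_valid(issn))
```

    Applied to the examples above, the test accepts 0018-9235 and 0898-901X, and rejects 1932-3243, the value marked with an invalid checksum.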

    As multiple ISSNs for the same periodical became increasingly common, in 2007 the ISSN agency introduced a new number, called the linking ISSN, designated ISSN-L. In most cases, it is the same as the print ISSN, and is used to identify the periodical when you do not care about its publication form. In this author's view, the ISSN-L value may be preferred over the possibly longer ISSN value for inclusion in reference list entries. BibTeX styles should be extended to handle both, and give the user a LaTeX option of typesetting either, or both, of them.

    Both CODEN and ISSN values can be used for unambiguous lookups in many catalogs, and their inclusion in reference lists helps to remove confusion when journal names are abbreviated. For example, does J. Chem. Phys. mean Journal of Chemical Physics, or Journal of Chemistry and Physics, or Journal of Chemotherapy and Physiotherapy, or Journal de chimie et physique, or Journal für Chemie und Physik, or Journal für chemische Physik?

    The Utah archives make extensive use of ISSN and ISSN-L data, and the latter are supplied automatically by software that uses mapping tables kindly supplied to us by the ISSN agency.

    The local tool journal can be used to standardize journal names and supply CODEN and ISSN data. It can be used like this:

    % journal old.bib > new.bib
    

    It is a short shell wrapper around an awk program of about 20,000 lines that contains mappings between various names of about 3000 journals and their preferred full names and BibTeX names, as well as between the BibTeX names and CODEN and ISSN data. That program is available on the Web. There is a corresponding local program publisher for providing publisher name and address abbreviations. It too is available on the Web. Both receive frequent updates, so if you download them for use on another site, check back occasionally to get their latest versions.

  27.   What are MRnumbers and ZMnumbers, and why do I want them?

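    In the mathematics community, two society databases are of supreme importance: MathSciNet (Mathematical Reviews), from the American Mathematical Society, and Zentralblatt MATH (zbMATH).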

    Each of them can return BibTeX data for their search results, and each such entry includes a unique internal database number that identifies the publication in their databases, much like the DOI does with more generality. The Utah archives preserve such numbers, as shown in this entry:

    @Article{Einstein:1935:EDE,
      author =       "Albert Einstein",
      title =        "Elementary derivation of the equivalence of mass and
                     energy",
      journal =      j-BULL-AMS,
      volume =       "41",
      number =       "??",
      pages =        "223--230",
      year =         "1935",
      CODEN =        "BAMOAD",
      ISSN =         "0002-9904 (print), 1936-881X (electronic)",
      ISSN-L =       "0002-9904",
      MRnumber =     "61.0852.02",
      bibdate =      "Sat Oct 28 08:28:24 2006",
      bibsource =    "http://www.math.utah.edu/pub/tex/bib/einstein.bib",
      note =         "J. W. Gibbs Lecture to the American Association for
                     the Advancement of Science (AAAS), 28 December 1934.
                     Reprinted in \cite{Einstein:2000:EDE}.",
      URL =          "http://www.ams.org/bull/2000-37-01/S0273-0979-99-00805-8/S0273-0979-99-00805-8.pdf",
      ZMnumber =     "Zbl 0011.28108",
      acknowledgement = ack-nhfb,
      Calaprice-number = "203",
      fjournal =     "Bulletin of the American Mathematical Society",
      journal-URL =  "http://www.ams.org/journals/bull/all_issues.html",
      Whittaker-number = "166",
    }
    

    The MRnumber and ZMnumber values can take you directly to more data about the publication in those two databases, including possibly reviews and lists of related and referencing publications. You might also find new data, such as DOIs, that have become available since you last updated the BibTeX entry.

    A MathSciNet search result sometimes includes Math Reviews classification codes and reviewer names, as in this fragment:

      MRclass =      "68P20 (Information storage and retrieval); 65F15
                     (Eigenvalues, eigenvectors (numerical linear algebra));
                     65F20 (Overdetermined systems, pseudoinverses
                     (numerical linear algebra))",
    
      MRnumber =     "2124404 (2006a:68026)",
    
      MRreviewer =   "Jean-Marie Chesneaux",
    

    More commonly, the parenthesized descriptions of the classification codes are omitted:

      MRclass =      "65F35 (15A60)",
    
      MRclass =      "65H17 (76D99 76M25)",
    
      MRclass =      "46A30 (47L05 54E50 57N17)",
    
      MRclass =      "46-03 (01A60 46C05 47-03 47A05 47A10)",
    
      MRclass =      "01-08, 65-03, 65F05, 65F35, 65G50, 65M12, 68-03, 65-03
                     (01A60 65F35 68-03)",
    

    The parenthesized codes are considered secondary ones, and the bigger the publication, the longer the list of codes is likely to be.
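
    Separating primary from secondary codes is a simple text-processing job for the short form of the field. The sketch below assumes that every code is five characters (two digits, a letter or hyphen, two digits) and that parentheses enclose only secondary codes, so it does not handle the long form with parenthesized descriptions shown earlier. The function name split_mrclass is invented for illustration.

```python
import re

# A classification code is taken to be five characters: two digits, a
# letter or hyphen, and two more digits (for example 65F35 or 46-03).
CODE = re.compile(r"\b\d{2}[A-Z-]\d{2}\b")


def split_mrclass(value):
    """Return (primary, secondary) code lists from a short-form MRclass."""
    # Codes inside parentheses are secondary; everything else is primary.
    inside = " ".join(re.findall(r"\(([^()]*)\)", value))
    outside = re.sub(r"\([^()]*\)", " ", value)
    return CODE.findall(outside), CODE.findall(inside)


primary, secondary = split_mrclass("46A30 (47L05 54E50 57N17)")
print(primary, secondary)
```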

    The MRclass values act like a precise form of keywords, except that you have to look up the meaning of each five-character code at a Web site where you can also find a downloadable PDF file that documents all of the classifications. Those codes can be used in database searches to find related articles, even when their titles might not suggest any relationship.

    Only a few BibTeX styles recognize the fields described in this section, but their inclusion in reference lists should be encouraged, because they provide additional targets for precision searching of the mathematics literature.

    The two mathematics databases often enhance article titles with bracketed inline cross-references like these:

      title =        "Discussion: {``Tukey's paper after 40 years''
                     [\booktitle{Technometrics} {\bf 48} (2006), no. 3,
                     319--325; MR2248365] by C. Mallows}",
    
      title =        "Erratum: {``Characterization of the Gaussian
                     distribution by lacunae of a sequence of cumulants''
                     [Teor. Veroyatnost. i Primenen. {\bf 33} (1988), no. 4,
                     687--693; MR 90a:60027]}",
    
      title =        "The unreasonable effectiveness of mathematics in the
                     natural sciences {[Comm. Pure Appl. Math. {\bf 13}
                     (1960), 1--14; Zbl 102, 7]}",
    

    We generally preserve such useful additional data, but if the cross-referenced entry is in the same bibliography file, we might instead connect the two entries with a See … note field in each. That way, if the main paper is cited, the reference list will automatically also get entries with comments, corrigenda, discussions, errata, rebuttals, and replies related to that paper. Regrettably, most databases outside the mathematics community do not yet have such relational links, which might even lead you to a later retraction or withdrawal of the paper.

  28.   How do I get a bibliography in each book chapter?

    Most publications that have reference lists need only one, conventionally placed near the end, just before any index or afterword. However, edited books that contain chapters authored by different people, and complex documents such as collected works, handbooks, and encyclopedias, might more usefully have a separate bibliography for each chapter. Current TeX distributions all contain support for that practice in the form of the LaTeX chapterbib style file. It is documented in the file chapterbib.pdf, found, for example, in TeX Live installations in the path texmf-dist/doc/latex/cite/.

  29.   How do I index cited authors and editors?

    Imagine yourself skimming a lengthy bibliography at the back of a fat book, encountering a reference to a publication of interest to you, and then wondering why the author cited that item, and what she had to say about it. With most printed books, the only way to answer that question is to read through the whole book. It would be much better if each reference list entry carried a link back to each of the places in which it was cited, and in addition, if every personal name in the bibliography were indexed. Then, having recalled a single author name, you could easily find every reference to publications by that author that are included in the reference list. In extreme cases, a reference list in a large book could have scores, or even a hundred or more, pages.

    Almost no published books have ever included such indexes, because they were impossible to produce reliably by hand, and because software to generate them by computer is not widely available.

    The author of this document created such software in 1996 for the Prentice–Hall book Case Studies in Mathematical Modeling—Ecology, Physiology, and Cell Biology, and it has been used since for a few more books authored or edited in his Department.

    His authidx package builds on both BibTeX and MakeIndex to do the job with only a few extra commands in the LaTeX file, and a few extra steps in the document's Makefile. Apart from that, no other effort on the part of the document author(s), or book production staff, is needed, and the resulting index is guaranteed to be complete, and reliable. Here is an outline of a book's top-level LaTeX file, with highlighted additions to get such an index:

    \documentclass[twoside]{book}
    
    \usepackage{authidx}
    \usepackage{makeidx}
    \usepackage{tgbonum}
    
    \author{Jane Doe}
    \title{My Collected Works}
    
    \makeauthoreditorindex
    \makeindex
    
    \begin{document}
    
    \pagenumbering{roman}
    
    \maketitle
    
    \include{copyright}
    \include{preface}
    \include{dedication}
    
    \tableofcontents
    \listoffigures
    \listoftables
    
    \pagenumbering{arabic}
    
    \include{chap-01}
    \include{chap-02}
    …
    \include{chap-15}
    \include{app-a}
    \include{app-b}
    \include{app-c}
    …
    
    \bibliographystyle{is-plain}
    \bibliography{\jobname}
    
    \begin{normalsize}
        \theauthoreditorhook{%
                              Primary authors are shown in \textsc{Small
                              Caps}, and secondary authors in roman.  Page
                              numbers of citations of primary authors are
                              in \textbf{bold}.
                              \bigskip
                              \label{authidx}%
                              \renewcommand{\indexname}{Author/editor index}%
                              \addcontentsline{toc}{chapter}{\indexname}%
                            }
        \renewcommand{\indexname}{Author/editor index}
        \printauthoreditorindex
    \end{normalsize}
    
    \printindex
    
    \end{document}
    

    The book's almost-generic Makefile contains macros and rules, with highlighted additions for creation of the author/editor index:

    AUTHIDX         = $(AWK) -f authidx.awk
    
    AUTHIDXFLAGS    = ROTATE=1
    
    AWK             = nawk
    
    BIBFILES        = book.bib
    
    BIBINPUTS       = .
    
    BIBTEX          = bibtex
    
    ## Update this value when the book basename changes
    BOOK            = jdcw
    
    MAKEINDEX       = makeindex
    
    MAKEINDEXFLAGS  = -c
    
    MV              = /bin/mv
    
    PDFLATEX        = pdflatex
    
    RM              = /bin/rm -f
    
    SHELL           = /bin/sh
    
    TEXFILES        = copyright.tex preface.tex dedication.tex \
                      chap-*.tex app-*.tex
    
    #=======================================================================
    # Targets:
    
    all:    $(BOOK).pdf
    
    author-editor-index:      $(BOOK).ina
    
    back-matter:
            -$(MAKE) bibliography
            -$(MAKE) author-editor-index
            -$(MAKE) index
    
    bibliography:   $(BOOK).bbl
    
    $(BOOK).bbl:       $(BOOK).aux
            $(BIBTEX) $(BOOK)
            $(AUTHIDX) $(AUTHIDXFLAGS) $(BOOK).aei $(BOOK).bbl >$(BOOK).bbl.new
            $(MV) $(BOOK).bbl $(BOOK).bbl.old
            $(MV) $(BOOK).bbl.new $(BOOK).bbl
    
    $(BOOK).ida:       $(BOOK).aei
            BIBINPUTS=$(BIBINPUTS) $(AUTHIDX) $(AUTHIDXFLAGS) $(BOOK).aei >$@
    
    $(BOOK).ina:       $(BOOK).ida
            $(MAKEINDEX) $(MAKEINDEXFLAGS) -s $(BOOK).ist -c -o $@ -t $(BOOK).alg $(BOOK).ida
    
    $(BOOK).ind:       $(BOOK).idx $(BOOK).ist
            $(MV) $(BOOK).idx $(BOOK).idx.old
            $(MAKEINDEX) $(MAKEINDEXFLAGS) -s $(BOOK).ist $(BOOK).idx
    
    ### NB: Because of extensive indexing, and cross-referencing in the
    ### bibliography entries, we need FOUR passes to reach consistency!
    ### You may need to alter the command count if you use this file
    ### for a different book.
    $(BOOK).pdf:       $(BOOK).ltx $(BIBFILES) $(TEXFILES)
            -$(MAKE) one-pass
            -$(MAKE) one-pass
            -$(MAKE) one-pass
            -$(MAKE) one-pass
    
    clean:
            -$(RM) *.ckd *.dw *.i *.o *.ser *~ \#* a.out core core.*
    
    distclean:      clean
            -$(RM) *.aei *.alg *.ina
            -$(RM) *.aux *.bbl *.bbl.new *.bbl.old *.dvi *.ida *.idx
            -$(RM) *.idx.old *.ind *.lof *.lot *.ps *.toc
            -$(RM) $(BOOK).pdf
            -$(RM) $(PROGRAM)
            -$(RM) *.aux *.bbl *.ind *.idx
    
    index:  $(BOOK).ind
    
    one-pass:
            -echo R | $(PDFLATEX) $(BOOK).ltx
            -$(MAKE) back-matter
    

    To compile the complete book, the author only needs to run a single short command:

    % make
    

    During writing and screen proofing, the intermediate steps needed for complete consistency are not important, so the author then just uses this command to perform a single step:

    % make one-pass
    

    Fewer than 30 lines are needed in the top-level LaTeX file and the Makefile to add support for an author/editor index, and except for the single assignment to the BOOK macro, none of them is specific to this book, so they can trivially be replicated verbatim in the files created for the author's future publications.

    For those unfamiliar with the Unix make utility, here is a brief explanation:

    In our sample Makefile, some targets are just convenient aliases that name tasks, rather than files.

    To test changes to a Makefile, use the dry-run option -n:

    % make -n one-pass
    echo R | pdflatex jdcw.ltx
    make back-matter
    make bibliography
    make author-editor-index
    make index
    

    That option shows what would be done, but does not execute the displayed commands.

    Preparation of the author/editor index involves running the authidx.awk program to augment the BibTeX-generated .bbl file with a list of page numbers in each entry that record where in the book that entry is cited, and using the makeindex program to convert raw index entries created in the LaTeX step to a sorted author/editor index file. The hook macro in the LaTeX file allows insertion of some explanatory material at the start of the author/editor index.

  30.   How do I convert library catalog data to BibTeX entries?

    Although we identified numerous databases elsewhere in this document that can produce search results in BibTeX form, few library catalogs can do so.

    Two particularly useful catalogs are those of the US Library of Congress (believed to be the world's largest library), and the KVK — Karlsruhe Virtual Catalog. The latter lets you select any of about 100 national catalogs, and then searches them in parallel. The author of this document has found it extremely useful for finding publications in languages other than English, conference proceedings, and recent books. Those kinds of documents are poorly covered by the Library of Congress.

    There is much more variation in library catalog interfaces than in publisher database Web interfaces, so it would be a daunting task to write software that could convert catalog-specific HTML pages to BibTeX.

    Fortunately, there is a better solution, via the Z39.50 binary catalog protocol that was introduced in the 1970s, and is used by many libraries around the world to exchange catalog data.

    This author's cattobib tool, first introduced in 2005, leverages several widely available, and reasonably portable, programs and libraries, including expect, tcl, and yaz-client, to provide an easy-to-use command-line tool to look up catalog data by ISBN (the default), CODEN, ISSN, author, editor, or title. Here is an example of its use for two searches:

    % cattobib 0-596-00595-4
    %%% -*-BibTeX-*-
    
    %% Searching [z3950.loc.gov:7090/Voyager] for [0596005954]: flags = [@attr 1=7]
    
    @Book{Robbins:2005:CSS,
      author =       "Arnold Robbins and Nelson H. F. Beebe",
      title =        "Classic shell scripting",
      publisher =    "O'Reilly",
      address =      "Sebastopol, CA",
      pages =        "xxii + 534",
      year =         "2005",
      ISBN =         "0-596-00595-4",
      ISBN-13 =      "978-0-596-00595-5",
      LCCN =         "QA76.76.O63 R563 2005",
      bibdate =      "Thu Dec 17 13:41:12 MST 2015",
      bibsource =    "z3950.loc.gov:7090/Voyager",
      URL =          "http://www.loc.gov/catdir/enhancements/fy0715/2005296240-d.html;
                     http://www.loc.gov/catdir/enhancements/fy0912/2005296240-b.html;
                     http://www.loc.gov/catdir/enhancements/fy1001/2005296240-t.html",
      acknowledgement = ack-nhfb,
      subject =      "Operating systems (Computers); Programming languages
                     (Electronic computers)",
      usmarc-016 =   "016 7 \$a 013209231 \$2 Uk",
    }
    
    % cattobib -s melvyl -ti 'Companion to the papers '
    %%% -*-BibTeX-*-
    
    %% Searching [melvyl.cdlib.org:210/CDL90] for [Companion to the papers]: flags = [@attr 1=4]
    
    @Book{Knuth:2011:CPD,
      author =       "Donald Ervin Knuth",
      title =        "Companion to the papers of Donald Knuth",
      volume =       "202",
      publisher =    "CSLI Publications",
      address =      "Stanford, CA, USA",
      pages =        "xiii + 441",
      year =         "2011",
      ISBN =         "1-57586-635-8 (hardcover), 1-57586-634-X (paperback)",
      ISBN-13 =      "978-1-57586-635-2 (hardcover), 978-1-57586-634-5
                     (paperback)",
      LCCN =         "QA76.9.A43 K5852 2011",
      bibdate =      "Thu Dec 17 13:46:22 MST 2015",
      bibsource =    "prodorbis.library.yale.edu:7090/voyager",
      series =       "CSLI lecture notes",
      acknowledgement = ack-nhfb,
      author-dates = "1938--",
      subject =      "Computer algorithms; Computer programming",
      usmarc-016 =   "016 7 \$a 016015654 \$2 Uk",
      usmarc-049 =   "049 \$a YUSS",
      usmarc-245-c = "Donald Knuth",
      usmarc-830 =   "830 0 \$a CSLI lecture notes ; \$v no. 202.",
    }
    

    cattobib currently understands three different catalog-data markup systems, and so far, they seem to suffice for the more than 90 major libraries that it knows about.

    cattobib attempts to convert as much catalog data as possible to BibTeX form, even if you might not be interested in all of it. The various usmarc-nnn fields correspond to the horrid numbered fields of the US MARC catalog markup system whose documentation you can find by following the link in this sentence. Dozens of those numbered fields are recognized and converted to BibTeX equivalents, and the remainder are output as shown. In most cases, they can simply be deleted, but sometimes, they contain additional useful information.

    A decade of extensive experience with cattobib solidly demonstrates that library catalog data are likely to be dirty, inconsistent, unreliable, and frequently wrong. There are only two reasonable solutions to those problems: either have the original publication at hand, and make any needed corrections to the returned BibTeX entries, or search for the same publication in several catalogs, and then merge the results, with the help of bibjoin. Majority vote can then make field values somewhat more reliable.
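
    The majority-vote idea can be illustrated with a few lines of shell and awk. This is only a minimal sketch, and it is not how bibjoin itself works: the function name majority_value is hypothetical, and it assumes that you have already collected the competing values of one field (say, year) from several catalogs, one value per line.

```shell
#! /bin/sh -
# majority_value (hypothetical helper): read candidate field values, one
# per line, on standard input, and print the value that occurs most often.
# Ties are broken in favor of the value that was seen first.
majority_value ()
{
    awk '
        {
            if (!($0 in count))
                order[++n] = $0         # remember first-seen order
            count[$0]++
        }
        END {
            max = 0
            for (k = 1; k <= n; ++k)
                if (count[order[k]] > max) {
                    max = count[order[k]]
                    best = order[k]
                }
            if (n > 0)
                print best
        }'
}
```

    Given the three values 2005, 2006, and 2005 on separate lines, the function prints 2005. A vote only makes a value more plausible, not correct, so the original publication remains the final authority.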

    cattobib has options to search in just the major national library catalogs that it knows about, or in all of its known catalogs. You can also easily customize it by adding your own list of catalogs, and groups of catalogs, in a personal startup file. For one-time use, you can supply alternate catalogs on the command line.

    This author has an as-yet-unreleased parallelized version of cattobib: it runs one lookup per catalog simultaneously, collecting output in catalog-specific temporary files. It then waits for the last of them to complete, and concatenates the temporary files onto the standard output. That works quite well in our environment on our many large multicore servers, but it is likely to be paralyzing, rather than parallelizing, on a small personal machine with insufficient resources.
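
    The control flow just described (start one job per catalog, collect output in per-catalog temporary files, wait, then concatenate) can be sketched as follows. This is not the unreleased tool: the function name parallel_lookup and its argument order are assumptions made for illustration, the lookup program is passed in as a placeholder, and the widely available but non-POSIX mktemp command is assumed to exist.

```shell
#! /bin/sh -
# parallel_lookup COMMAND KEY CATALOG...   (hypothetical helper)
# Run "COMMAND CATALOG KEY" once per catalog, all at the same time, with
# each job writing to its own temporary file.  Wait for every job to
# finish, then copy the files, in command-line order, to standard output.
parallel_lookup ()
{
    cmd=$1 key=$2
    shift 2
    tmpdir=`mktemp -d "${TMPDIR:-/tmp}/lookup.XXXXXX"` || return 1
    n=0
    for catalog in "$@"
    do
        n=`expr $n + 1`
        "$cmd" "$catalog" "$key" > "$tmpdir/$n.out" &
    done
    wait                        # block until the slowest lookup completes
    i=1
    while [ $i -le $n ]
    do
        cat "$tmpdir/$i.out"
        i=`expr $i + 1`
    done
    rm -rf "$tmpdir"
}
```

    Because every job is started at once, the number of simultaneous processes equals the number of catalogs, which is exactly why such a scheme can overwhelm a small machine.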

    There are three good resources for finding additional Z39.50 servers that can be used with cattobib:

    Once a reasonably clean BibTeX entry for a book has been produced, you can often find the first few pages of the book via ISBN searches at the Web sites of various large online booksellers, and then make any final tweaks from actual views of the book's cover and front matter. If your catalog searches produce a book DOI, you may be able to find its front matter at the original publisher's Web site, which should be the most reliable.

    URL data returned from catalogs are often not to the book's own Web site, but rather to other catalogs, booksellers, and databases; such URLs are unlikely to be of use in a professional reference list entry.

  31.   What other bibliography tools can I use?

    In late 2015, we added two more command-line tools for BibTeX data lookup: doi-to-bib, which retrieves data from the DOI agency, and scholar, which gets its results from the Google Scholar service. Here are examples of their use:

    # specify any number of DOIs, with or without http://dx.doi.org/
    % doi-to-bib 10.1145/2688497
    
    %% DOI: 10.1145/2688497
    
    @article{Haigh_2014,
            doi = {10.1145/2688497},
            url = {http://dx.doi.org/10.1145/2688497},
            year = 2014,
            month = {dec},
            publisher = {Association for Computing Machinery ({ACM})},
            volume = {58},
            number = {1},
            pages = {40--44},
            author = {Thomas Haigh},
            title = {The tears of Donald Knuth},
            journal = {Communications of the {ACM}}
    }
    
    % scholar --count=2 --bibtex 'The tears of Donald Knuth'
    @article{saidtears,
      title={The Tears of Donald Knuth},
      author={Said, What Knuth}
    }
    
    @article{haigh2014tears,
      title={The Tears of Donald Knuth},
      author={Haigh, Thomas},
      journal={Communications of the ACM},
      volume={58},
      number={1},
      pages={40--44},
      year={2014},
      publisher={ACM}
    }
    
    % scholar --count=2 'The tears of Donald Knuth'
             Title The Tears of Donald Knuth
               URL http://youryblog.wordpress.com/
         Citations 0
          Versions 2
    Citations list None
     Versions list http://scholar.google.com/scholar?cluster=7261672421647269026&hl=en&num=2&as_sdt=0,45
              Year None
      Citation URL http://scholar.google.com/scholar.bib?q=info:ojyJ75ekxmQJ:scholar.google.com/&output=citation&hl=en&ct=citation&cd=0
               PDF None
    
             Title The Tears of Donald Knuth
               URL http://dl.acm.org/citation.cfm?id=2688497
         Citations 0
          Versions 7
    Citations list None
     Versions list http://scholar.google.com/scholar?cluster=176440648728687867&hl=en&num=2&as_sdt=0,45
              Year 2014
      Citation URL http://scholar.google.com/scholar.bib?q=info:-7Rpz9XXcgIJ:scholar.google.com/&output=citation&hl=en&ct=citation&cd=1
               PDF None
    

    The doi-to-bib tool does effectively the same job as the doi2bib Web service, but has the great advantage over that service that its use can be automated, and that it can handle any number of DOIs on the command line, with or without the address prefix.

    Notice that scholar returns data in two forms, and the BibTeX form has less information than the field/value form provides. You may therefore need to do two lookups, and some editing, to accumulate all of the data that you want.

    The quality of data returned by those two tools is often rather poor: filled with errors, and incomplete. You really do need to collect and merge data from multiple sources before you can have some confidence that your BibTeX entries are reliable.

    There is another serious, and possibly unexpected, problem with the two tools: if you make `too many' requests with them, your Internet address gets blocked for hours or days by the remote site, on the probable grounds that you are a `hostile attacker'.

    One solution to that problem is to wrap those commands inside another script that only invokes them at long random intervals. A wrapper script something like this does the trick:

    #! /bin/ksh -
    for f in "$@"
    do
        echo ======================================== "$f"
        scholar "$f"
        seconds=`expr \( 600 \* $RANDOM \) / 32768`
        echo Sleeping for $seconds seconds ...
        sleep $seconds
    done
    

    That script introduces a random delay between 0 and 10 minutes per lookup: shorter delays lead to address blocks.
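
    The RANDOM variable is provided by ksh, bash, and zsh, but it is not part of the POSIX shell specification, so the expr computation fails in a minimal /bin/sh. A possible substitute, offered only as a sketch with the hypothetical function name random_delay, gets the random number from awk instead:

```shell
#! /bin/sh -
# random_delay MAX   (hypothetical helper)
# Print a pseudorandom integer in the range 0 .. MAX-1 without relying on
# the shell's RANDOM variable.  awk's srand() seeds from the time of day,
# so two calls made within the same second return the same value.
random_delay ()
{
    awk -v max="$1" 'BEGIN { srand(); print int(rand() * max) }'
}
```

    In the wrapper script, the assignment to seconds would then become seconds=`random_delay 600`. The same-second repetition is harmless here, because each call is followed by a sleep.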

    Despite the problems of dirty data from many Web sources, the tools briefly described in this document can make your bibliographic work easier, and if you are willing to spend the time to improve your BibTeX entries, and even better, share them freely with others, then you have made things a tiny bit better for humanity.

    Other tools that you may find useful in working with BibTeX data are chkdelim (for delimiter-balance checks), dw (doubled-word checks), bibdup, biblabel, biblex, bibsplit, bibunlex, citefind, citesub, citetags, and html-pretty. All of them are documented in Unix manual pages, and many of them recognize --help and --version options for brief reminders.

    If you are not already an Emacs editor user, you are well advised to take the time to learn it. Having an extensible powerful programmable editor under your fingertips, and that runs on all major desktop operating systems, means that you will likely never need to be proficient with any other text editor, and you will find that your work can be made much easier with the power of Emacs. That editor has many work-alikes without programmability, including jed, jove, qe, qemacs, xjed, and zile. They let your fingers work mostly like they do in Emacs, but they start up faster.

  32.   Why should I use quotes instead of braces around BibTeX field values?

    If you have read through this document, you may have observed that all of the sample data from the Utah archives delimit field values with quotation marks, while most of the publisher-provided BibTeX data use braces.

    The 1980-vintage Scribe document formatting system was the first to introduce high-level markup for document entry, and both LaTeX and BibTeX built on its ideas. Scribe permitted strings to be delimited by angles, braces, parentheses, and brackets, but BibTeX's author decided to accept for backward compatibility just Scribe's brace delimiters, which seemed to be what most people then used. The preferred delimiter for BibTeX field values, however, is ASCII quotation marks, and the bibclean tool intentionally produces output only with those delimiters. That way, later tools can rely on having only a single standard form to deal with.

    In 1993, when BibTeX's grammar was formalized, it appeared that the way bibliography tools would be developed would be to first parse BibTeX and Scribe data with biblex, then have the tool read the lexed output stream, and finally, send a possibly modified data stream further to bibunlex to recover a normal BibTeX file. Unix pipelines make that a trivial command:

    % biblex old.bib | mytool | bibunlex > new.bib
    

    Since then, it has generally been sufficient to use pattern matching on prettyprinted BibTeX data directly, using programs written in a suitable scripting language, for which the easy-to-learn awk language has been a resoundingly successful choice. Cleaning up, and standardizing, BibTeX data with bibclean before attempting to do anything else to it has been, in hindsight, exactly the right choice. bibclean is a complex piece of software, with more than 16,000 lines of code in late 2015, but most of the other tools are much smaller, because they only have to deal with one form of BibTeX data. Also, and importantly, most of those later tools use regular-expression pattern matching to recognize the data that they need to act on. Braces are already heavily used in (La)TeX markup, for grouping, for macro arguments, and for BibTeX downcase protection. Regular expressions cannot count, so they cannot match arbitrarily nested braces, and using braces for yet another purpose in BibTeX files means that it can be extremely difficult to write reliable match patterns. By contrast, ASCII quotation marks are used in (La)TeX only for accents, and in verbatim mode, so they are rarely encountered inside BibTeX field values, making match patterns much simpler.
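
    To show how little code such pattern matching needs once the data are in standard form, here is a minimal sketch. The function name field_values is hypothetical, and the sketch assumes prettyprinted input in which the wanted field and its quoted value fit on a single line; values that continue onto following lines are ignored.

```shell
#! /bin/sh -
# field_values NAME [FILE...]   (hypothetical helper)
# Print the value of each single-line NAME field in prettyprinted BibTeX
# input, where such a field looks like:    NAME =    "value",
field_values ()
{
    name=$1
    shift
    awk -v name="$name" '
        BEGIN { pattern = "^[ \t]*" name "[ \t]*=[ \t]*\"" }
        $0 ~ pattern && /",?[ \t]*$/ {
            value = $0
            sub(pattern, "", value)         # strip through opening quote
            sub(/",?[ \t]*$/, "", value)    # strip from closing quote
            print value
        }' "$@"
}
```

    Because the opening quote is anchored at the start of the line and the closing quote at its end, a TeX accent such as {\"o} inside the value causes no trouble, and the pattern for title does not match booktitle. The equivalent job for brace-delimited values would need a brace-counting loop instead of two patterns.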

  33.   How do I handle unusual personal names and titles in BibTeX field values?

    Like TeX, BibTeX minimizes markup to make the data entry job easier. One of the ways that it does so is the handling of personal names in author and editor field values. Names of people are separated by the word and in the values, and usually appear in their normal order:

      author =       "Jane Doe and Joe Blow and Jemma Clemmens",
    

    Complexity arises when the personal or family names have multiple parts:

      author =       "Ludwig van Beethoven",
    
      author =       "Manuela dos Santos",
    
      author =       "Peter van den Besselaar",
    
      author =       "Hans Christian Andersen",
    

    Once again, BibTeX applies a simple rule: a full name consists of two or three parts: a possibly multiword personal name, an optional lowercase von-like part, and a final single-word family name. Without bracing, a von-like part is treated as part of the family name. Thus, in a bibliography style that abbreviates personal names, those examples would correctly reduce to L. van Beethoven, M. dos Santos, P. van den Besselaar, and H. C. Andersen, or if name inversion is called for by the style, to van Beethoven, L., dos Santos, M., van den Besselaar, P., and Andersen, H. C..
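
    The rule can be imitated in a few lines of awk. This is a deliberately simplified sketch, not BibTeX's real name parser: the function name abbreviate_name is hypothetical, and it handles only unbraced names in normal order, without commas, hyphens, or TeX accent markup.

```shell
#! /bin/sh -
# abbreviate_name   (hypothetical helper)
# Read one unbraced name per line, in personal/von/family order, and print
# it with the personal-name words reduced to initials.  Everything from
# the first lowercase word onward, and always the last word, is kept whole.
abbreviate_name ()
{
    awk '
        {
            out = ""
            von = 0             # becomes 1 once a lowercase word is seen
            for (k = 1; k <= NF; ++k) {
                c = substr($k, 1, 1)
                if (c != toupper(c))
                    von = 1     # this word starts with a lowercase letter
                if (k < NF && !von)
                    word = c "."
                else
                    word = $k
                out = out (k > 1 ? " " : "") word
            }
            print out
        }'
}
```

    With that function, Hans Christian Andersen becomes H. C. Andersen, and Peter van den Besselaar becomes P. van den Besselaar. A capitalized von-like part, as in Manuela Dos Santos, would wrongly be reduced to an initial, which is the problem that protective bracing solves in real BibTeX data.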

    Unfortunately, the lowercased von-like parts are sometimes capitalized, so you need to provide suitable protective bracing, as in these examples:

      author =       "Manuela {Dos Santos}",
    
      author =       "Mark G. J. {Van Den Brand}",
    
      author =       "Francisco M. {De La Vega}",
    
      author =       "William Atherton {Du Puy}",
    
      author =       "Friedhelm {Meyer auf der Heide}",
    
      author =       "Padraig {{\'O} Cath{\'a}in}",
    

    The Dutch word Van (meaning of) is common in Dutch, Flemish, and English family names. However, it also appears as a personal name, and in other languages, notably, in Vietnamese, where the meaning is obviously different:

      author =       "Nguyen Van Loi",
    
      author =       "Van Bang Le",
    
      author =       "Van Jacobson",
    
      author =       "Van R. Kane",
    
      author =       "W. Van Snyder",
    

    In none of those cases is bracing required.

    There is sufficient complexity that one of the Emacs functions developed for the Utah archives work, show-braceable-authors, finds author or editor names that might need bracing. Human intelligence is needed to decide whether to accept or reject its suggestions.

    Human names are more complex, however, notably for Hungarian, Portuguese, Spanish, and several oriental languages, and also for the nobility of Europe.

    Hungarian and oriental languages place the family name first, but such names are often, but not always, inverted outside their country of origin, and the name words may be altered. If you wish to preserve the original order, code entries like these:

      author =       "L{\'a}nczos{ }Korn{\'e}l",
    
      author =       "Erd{\H{o}}s{ }P{\'a}l",
    
      author =       "{K'ung Fu-tzu}",
    
      author =       "{Mao Tse-Tung}",
    

    Alternatively, you could reorder and revise them like these:

      author =       "Cornelius L{\'a}nczos",
    
      author =       "Paul Erd{\H{o}}s",
    
      author =       "Fu-tzu K'ung",
    
      author =       "Confucius",
    
      author =       "Tse-Tung Mao",
    

    Despite its large population, China has only about 300 common family names, whereas United States Census 2000 data record more than 150,000 family names used in the USA. It is common practice with Chinese names for an individual to have two less-common personal names, as in Mao Tse-Tung, which requires just three Chinese characters. However, when such names are used outside the Orient, they may be altered to place the family name last, and the personal names differ in hyphenation, capitalization, and abbreviation, so that Tse-Tung, Tse Tung, Tse-tung, T.-T., and T.-t. might all be found in author names. The 1 + 2 and 2 + 1 forms of such Chinese names are a clue as to which is the family name.

    Some Chinese, however, have only a single personal name, so with a common family name, as in Wang Min, you might guess that Wang is the family name. However, such guesses are frequently wrong, and that is why we really do need additional markup to identify family names, because they are used in sorting of reference lists.

    The Spanish tradition is that many people carry the family names of both father and mother, in that order, but the mother's name can be abbreviated, or even dropped. Braces keep them together:

      author =       "Maria {Garc{\'\i}a Romero}",
    
      author =       "Maria {Garc{\'\i}a R.}",
    
      author =       "Maria Garc{\'\i}a",
    

    The same can be done for noble family names and compound family names, but braces are not needed if the name is hyphenated:

      author =       "James {Clerk Maxwell}",
    
      author =       "Robert {Graf von Westheim}",
    
      author =       "Magnus Gustav Mittag-Leffler",
    

    The lack of clear marking of compound names as family names in databases means that you may have to search for each word individually. For more on this problem, see the articles Author name disambiguation in scientific collaboration and mobility cases and Technical report: the trend of author compound names and its implications for authorship identity identification. Several others can be found in the Utah archives by a search like this:

    > select filename, label, substr(title,1,60) from bibtab
            where (title like '%name disambiguation%')
            order by filename, year, label;
    +------------------------+--------------------+--------------------------------------------------------------+
    | filename               | label              | substr(title,1,60)                                           |
    +------------------------+--------------------+--------------------------------------------------------------+
    | jaist.bib              | Ferreira:2014:STA  | Self-training author name disambiguation for information sca |
    | jaist.bib              | Liu:2014:AND       | Author name disambiguation for PubMed                        |
    | jaist.bib              | Liu:2015:FMB       | A fast method based on multiple clustering for name disambig |
    | jasist.bib             | Torvik:2005:PSM    | A probabilistic similarity metric for Medline records: a mod |
    | jasist.bib             | Cota:2010:UHB      | An unsupervised heuristic-based hierarchical method for name |
    | jasist.bib             | DAngelo:2011:HAA   | A heuristic approach to author name disambiguation in biblio |
    | jasist.bib             | Strotmann:2012:AND | Author name disambiguation: What difference does it make in  |
    | jdiq.bib               | Fan:2011:GBN       | On Graph-Based Name Disambiguation                           |
    | jinformetrics.bib      | Milojevic:2013:ASI | Accuracy of simple, initials-based methods for author name d |
    | lncs2012l.bib          | Wang:2012:ULT      | Using Lexical and Thematic Knowledge for Name Disambiguation |
    | scientometrics2010.bib | Tang:2010:BFN      | Bibliometric fingerprints: name disambiguation based on appr |
    | scientometrics2010.bib | Wang:2012:BTM      | A boosted-trees method for name disambiguation               |
    | scientometrics2010.bib | Wu:2013:AND        | Author name disambiguation in scientific collaboration and m |
    | scientometrics2010.bib | Huang:2014:IND     | Institution name disambiguation for research assessment      |
    | scientometrics2010.bib | Shin:2014:AND      | Author name disambiguation using a graph model with node spl |
    | scientometrics2010.bib | Zhu:2014:RHN       | Robust hybrid name disambiguation framework for large databa |
    | sigmod.bib             | Ferreira:2012:BSA  | A brief survey of automatic methods for author name disambig |
    | tcbb.bib               | Hsiao:2014:GND     | Gene name disambiguation using multi-scope species detection |
    | tkdd.bib               | Torvik:2009:AND    | Author name disambiguation in MEDLINE                        |
    +------------------------+--------------------+--------------------------------------------------------------+
    

    Titles that precede the personal name pose more of a problem, because abbreviated styles might incorrectly reduce them to an initial letter and a dot. It has been proposed to BibTeX's author that a good way to deal with that problem is to revise BibTeX to never abbreviate a braced word. In anticipation of such a change, the Utah archives therefore contain entries like these:

      author =       "{Baron} P. M. S. (Patrick Maynard Stuart) Blackett",
    
      author =       "John {Bourchier, 1st Earl of Bath}",
    
      author =       "{Earl} John Bourchier",
    
      author =       "Earl James Jones",
    
      author =       "James Earl Jones",
    
      author =       "{Lady Mary} Heath",
    
      author =       "{Lord} Kelvin",
    
      author =       "{Prince Louis} de Broglie",
    
      author =       "{Queen Juliana of The Netherlands}",
    
      author =       "{Sir Paul} McCartney",
    
      author =       "{Viscount} Herbert Louis Samuel and Albert Einstein",
    

    Notice the bracing: Lady, Prince, Queen, and Sir are attached to the personal name, and Baron, Lord and Viscount to the family name. The words of noble titles can also appear as ordinary names, as two examples for Earl show.

    In some countries and regions, notably, South India, many people have only a single name. BibTeX treats a one-word name as a family name, and no special handling is required. However, to prevent later confusion, it may be useful to add a remark to the entry about that name:

      author =       "Aurum",
      remark =       "Yes, the author has only one name.",
    

    Rarely, names have only a single letter, and should not later be `repaired' by adding a dot after that letter, and possibly inverting the name. Here are some examples:

      author =       "Dong-U Lee",
    
      author =       "Sen P",
    
      author =       "Suil O",
    
      author =       "U Jin Choi",
    
      author =       "Weinan E",
    

    For names with suffixes and academic degrees, the Utah archives reject the advice in early BibTeX documentation about reversing such names, and instead, bind the suffixes to the name with outer braces:

      author =       "August {Ziggelaar, S.J.}",
    
      author =       "Ronald A. {Sarno, S.J., A.B., M.A.}",
    
      author =       "Philip M. {Lewis, II}",
    
      author =       "Tucker {Carrington, Jr.}",
    
      author =       "Michael A. {Callander, Sr.}",
    

    In styles that reverse names in output reference lists, that means that one of those examples might appear as Carrington, Jr., T. instead of Carrington, T., Jr.. However, this author's view is that such a difference is a trivial one that few people would ever notice, and no one should raise concerns about it.

    Modern practice in English-language literature in the sciences is to omit professional titles and academic degrees from author names. However, European practices differ, and you might encounter fields like these in some journals, notably, those in the history of science, and particularly, the journal Annals of Science which used such embellishments from 1936 to 1974, when the practice was discontinued.

      author =       "M. B. {Donald, M.Sc., F.I.C., M.I.Chem.E.}",
    
      author =       "M. J. J. {Laboulle, B.A.} and H. {Levy, M.A., D.Sc.}",
    
      author =       "G. R. {de Beer, M.A., D.Sc., F.R.S., F.L.S., F.Z.S.}",
    
      author =       "R. C. {Oldfield, M.A.} and {Lady Kathleen} {Oldfield,
                     M.A.}",
    
      author =       "R. {Michel Dipl.-Math.} and {Prof. Dr.} J. Pfanzagl",
    
      author =       "{Prof. Dr.} W. Eberl and {Dr. techn.} R. {Hafner
                     Dipl.-Ing.}",
    
      author =       "{Prof. Dr. Dr.} Gundolf Keil",
    
      author =       "{Hofrat Prof. Dr. Dr. h. c.} W. Winkler",
    
      author =       "{Prof. Dr. Dr. Medizin und Sozialer} Wandel",
    
      author =       "{Sir} Philip J. {Hartog, K.B.E., C.I.E., Ll.D.}",
    

    When such decorations are rarely used in a bibliography of a particular journal, the Utah archives tend to eliminate them. However, in some journals, they are so common, and reflect long-standing historical practice in some fields, that they are preserved.

    Some databases display authors and/or titles in UPPERCASE LETTERS. Such deplorable practices must be repaired in creating BibTeX entries. When a name may be spelled in mixed case, uppercasing introduces an ambiguity that must be resolved by finding other instances of the name (the reference list in the paper is a good place to start, as is any author affiliation in a footnote or at the end of the paper): does MCKAY correspond to McKay or to Mckay? Both forms are in common use.

    In Scandinavia, son-of family names like Andersen, Andersson, Hansen, Jensen, Jansson, and Jonsson are extremely common. In Aarhus, Denmark, a city of 250,000, where the author of this document used to live, there were 11 pages of Jensens in the directory, and an entire column for Jens Jensen! Consequently, telephone directories often include occupations and addresses to reduce confusion. For the same reason, many Scandinavians are given a distinctive middle name to set them apart. Former Danish Prime Minister Anders Fogh Rasmussen is an example: unlike the case of Spanish names, Fogh is not part of the family name, even though colleagues might refer to him as Fogh Rasmussen (different from all of the other Rasmussens in the office), and close friends and family members would call him Anders. The correct BibTeX encoding of such author names does not need braces:

      author =       "Per Brinch Hansen",
    
      author =       "Svend Knak Jensen",       % called Knak by his friends
    
      author =       "Anders Fogh Rasmussen",
    

    The famous Hungarian polymath Neumann János adopted the German form, Johann von Neumann, when his father was awarded a noble title, then changed it to John von Neumann when he moved to the USA. His brother Mihály became Michael Neumann in the USA, and his other brother, Miklós, became Nicholas Vonneuman (with one final n). Human names can sometimes be a mess!

    Sometimes, a publication has many authors. Try, if possible, to include them all, instead of relying on the BibTeX convention of ending the list with the name others. In extreme cases, you might have to shorten the list, as in these examples:

      author =       "B. Abbott and 373 others",
    
      author =       "David R. Bentley and Shankar Balasubramanian and
                     Harold P. Swerdlow and Geoffrey P. Smith and John
                     Milton and Clive G. Brown and Kevin P. Hall and Dirk J.
                     Evers and Colin L. Barnes and Helen R. Bignell and 183
                     others",
    

    Their full BibTeX entries are augmented by a fullauthor field that lists all 374 and all 193 people, respectively. No BibTeX style knows about that field name, but at least the names are recorded, and can be found by searches.
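
    The shortening shown above is easy to automate. The following Python sketch assumes that the author names are already available as a list of strings in publication order; the function name shorten_authors and its keep parameter are hypothetical, and only the fullauthor field name comes from the Utah archives.

```python
def shorten_authors(names, keep):
    """Return (author, fullauthor) field values for a list of author names.

    names: author names in publication order, one string per author.
    keep:  number of leading names retained in the shortened author value.
    """
    # The complete list is always recorded, joined the way BibTeX expects.
    fullauthor = " and ".join(names)
    if len(names) <= keep:
        # Nothing to shorten: both values are the complete list.
        return fullauthor, fullauthor
    omitted = len(names) - keep
    author = " and ".join(names[:keep]) + f" and {omitted} others"
    return author, fullauthor
```

    With a list of 374 names that begins with B. Abbott and keep set to 1, the first returned value is "B. Abbott and 373 others", and the second holds all 374 names for the fullauthor field.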

    The point is that, while the bibliography style used to produce a reference-list entry might elide many author names under et al., the BibTeX data should strive to be complete. After all, someone later might search for one of the omitted authors, and a BibTeX entry found on the Web would be just what they need.

    Eventually, BibTeX must be extended to provide more flexible handling of author and editor names without contorted input order.

  34.   How do I know which Jim Smith is the author?

    Many personal names are shared by many people. Jim Smith is such a common English-language name that there is a whole society whose members all have that name. If you are attempting to track the work of a particular researcher with a common name, you may find it difficult to tell whether the publications that you find really do belong to that individual.

    The first thing that you should do is to search for a publication list or curriculum vitae (CV) at that person's Web site, if there is one.

    The American Mathematical Society MathSciNet database and European Mathematical Society zbMATH database both have made attempts at author-name disambiguation.

    At least two organizations, ORCID and ResearcherID, assign unique identifiers to authors. ResearchGate is a social media site for scientists that provides some tracking and notification of citations of members' publications. No one is registered with those services automatically: you have to do so manually. That, of course, means that long-deceased authors will not be found there.

    Other resources for author name disambiguation include Scopus and Web of Science.

  35.   How do I use braces in BibTeX fields?

    We discuss how braces are used for grouping and protection of words in author and editor fields elsewhere in this document.

    Most BibTeX field values do not require any additional bracing beyond that required for (La)TeX markup. However, at least three fields, booktitle, title, and type (of academic degree), are subject to downcasing in most BibTeX styles. Proper nouns and uppercase letters in mathematics must then be protected by surrounding braces. To avoid interfering with simple text searching, it is recommended that you keep the number of brace groups small by protecting the longest groups that need it, rather than individual letters, as in these examples:

      title =        "Quasi-{Monte Carlo} Numerical Integration on
                     {$\mathbb{R}^s$}: Digital Nets and Worst-Case Error",
    
      title =        "Generalized pulse-spectrum technique for {$2$-D} and
                     $2$-phase history matching",
    
      title =        "{$ A $}-stable parallel block methods for ordinary and
                     integro-differential equations",
    
      title =        "Peaks, plateaus, numerical instabilities in a
                     {Galerkin} minimal residual pair of methods for solving
                     {$ A x = b $}",
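
    To see why the braces matter, it can help to model what a downcasing style does to a title. The following Python sketch is a simplified approximation, not BibTeX's actual implementation: it lowercases letters outside braces, leaves the first character alone, and leaves alone the first character after a colon followed by whitespace. It ignores the special treatment of accent control sequences, and the function name downcase_title is hypothetical.

```python
def downcase_title(title: str) -> str:
    """Roughly model sentence-case downcasing of a BibTeX title.

    Letters at brace depth 0 are lowercased, except the first character
    of the title and the first character after a colon plus whitespace.
    Anything inside braces is copied unchanged.
    """
    out = []
    depth = 0
    at_start = True
    colon_state = 0  # 0: no colon, 1: just saw ':', 2: ':' then whitespace
    for ch in title:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        unprotected = depth == 0 and ch.isalpha()
        if unprotected and not at_start and colon_state != 2:
            out.append(ch.lower())
        else:
            out.append(ch)
        # Track whether the next character follows a colon and whitespace.
        if ch == ":":
            colon_state = 1
        elif ch.isspace() and colon_state in (1, 2):
            colon_state = 2
        else:
            colon_state = 0
        at_start = False
    return "".join(out)
```

    Under this model, the unprotected title "Quasi-Monte Carlo Methods" becomes "Quasi-monte carlo methods", whereas the braced group {Monte Carlo} in the first example above survives unchanged.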
    

    In German, and also in Danish before the 1948 orthographic reform, all nouns are capitalized. The best solution in those cases is to brace the entire title, like these:

      title =        "{Die geheimen Leben des Albert Einstein: eine
                     Biographie}. ({German}) [{The} private lives of {Albert
                     Einstein}: a biography]",
    
      title =        "{Einstein: sein Leben}. ({German}) [{Einstein}: His
                     Life]",
    
      title =        "{Studier over Metallernes Elektronteori}. ({Danish})
                     [{Studies} on the electron theory of metals]",
    
      title =        "{Atomernes Bygning og Stoffernes fysiske og kemiske
                     Egenskaber}. ({Danish}) [{Atomic} structure and
                     physical and chemical properties of matter]",
    

    In a modern Danish title, such bracing is not needed:

      title =        "H{\o}jdepunkter i dansk naturvidenskab. ({Danish})
                     [{Highlights} in {Danish} natural science]",
    
  36.   How do I handle lists of URLs?

    The URL field is one where it is sometimes desirable to have multiple values, and the convention adopted in the Utah archives, and some of the associated bibliographic software, is that such values are separated by semicolons. Bibliography styles that handle such fields need to identify individual members, because they require special markup in (La)TeX. Thus, once an element separator is chosen, it should never be changed.
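
    A bibliography tool that follows this convention has to split the field value before marking up each URL. A minimal Python sketch, assuming the field value has already been extracted as a string (the function name split_url_list is hypothetical, and the example.org addresses are placeholders):

```python
def split_url_list(value: str, separator: str = ";") -> list:
    """Split a multi-valued URL field into individual URLs.

    Surrounding whitespace (including line breaks from a wrapped field
    value) is dropped, and empty items are ignored.
    """
    return [part.strip() for part in value.split(separator) if part.strip()]


urls = split_url_list(
    "http://www.example.org/paper.pdf;\n"
    "                     http://www.example.org/errata.html"
)
```

    Here urls holds the two addresses as separate list elements. Because every consumer of the data depends on the same separator, the sketch makes it a parameter with a fixed default rather than guessing it from the data.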

    In most cases, semicolons as separators are not a problem, but one publisher, Wiley, has used them inside DOI and URL values. Under the rules of URL percent-encoding, a semicolon can be hidden by representing it in hexadecimal as %3B. Unfortunately, URL values need to be parsed by TeX in much the same way as verbatim environments and the \verb|…| and \verb*|…| macros, and when the URL appears in the argument of another macro, TeX treats the percent as a comment start during the argument scan. The only practical solution is to replace the offending URL entirely, which is easily done with the help of the TinyURL service. For example, one such BibTeX entry was altered like this:

      DOI =          "http://tinyurl.com/p4lnqjf",
    
      xxDOI =        "http://dx.doi.org/10.1002/1096-9128(200005)12:6<423::AID-CPE483>3.0.CO;2-L",
    

    Wiley, and most other publishers, have since recognized the problems of long URLs and special characters in them, and their current practice is to keep them reasonably short, such as in this recent Wiley example:

      DOI =          "http://dx.doi.org/10.1002/qua.25006",
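
    Values like the older Wiley one can be found mechanically before they cause trouble. The following Python sketch assumes that it is given a single DOI or URL value, not a semicolon-separated list; the function name url_conflicts is hypothetical.

```python
def url_conflicts(value: str) -> list:
    """Return the troublesome characters found in one URL or DOI value.

    A semicolon collides with the list separator convention, and a
    percent sign is read by TeX as a comment start.
    """
    return [ch for ch in (";", "%") if ch in value]


old = "http://dx.doi.org/10.1002/1096-9128(200005)12:6<423::AID-CPE483>3.0.CO;2-L"
new = "http://dx.doi.org/10.1002/qua.25006"
```

    For the two Wiley values above, url_conflicts(old) reports the semicolon, and url_conflicts(new) returns an empty list. A flagged value still has to be replaced by hand, for example with a shortened URL as described above.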
    
  37.   Can I contribute to our local BibTeX archives?

    Yes, with several possibly difficult-for-you conditions:

    1. Your BibTeX data must be freely distributable, either explicitly declared to be public domain, or be licensed with a suitable open-source license that allows others to freely use and redistribute the data.
    2. The data must be clean and conform to the prettyprinted, ordered, and consistently sorted conventions used in the archives, and each file must have a comment header similar in style to other files in the Utah archives.
    3. Entries should have, where available, document Web addresses in DOI and/or URL fields.
    4. Entries must include accurate bibdate fields, because those values are heavily used in automated searching of the archives for new data for author- and subject-specific bibliographies.
    5. The BibTeX data must have been spell checked, delimiter-balance checked, and doubled-word checked, and must pass the scrutiny of both bibcheck and the sanity checks that are routinely applied to the rest of the Utah archives.
    6. Each BibTeX file must be accompanied by a small LaTeX file that can be used to demonstrate that all entries can be typeset without errors or serious warnings from BibTeX or (La)TeX.
    7. If the bibliographic data are for a journal, you must be willing to keep them up to date as new issues appear. Author- and subject-specific bibliographies are permitted to be snapshots of the data available up to the time of their admission to the archives, and then frozen without further updates or other maintenance.
    8. Where feasible, English translations of foreign-language titles should be supplied, as shown in some of the samples in this document, and in tens of thousands of entries in the Utah archives.
    9. There must be a revision history of your BibTeX files in a system such as RCS (preferred), Bzr, CVS, Git, Mercurial, or Subversion. That history is essential for recovering from an editing disaster that destroys or damages part of the file.
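
    Two of the checks in condition 5 are simple enough to sketch. The Python functions below are rough illustrations only: they are not the algorithms used by bibcheck or by the other Utah tools, their names are hypothetical, and they are no substitute for running those tools.

```python
import re


def braces_balanced(text: str) -> bool:
    """Check that unescaped braces in text are properly nested."""
    depth = 0
    escaped = False
    for ch in text:
        if escaped:
            # The character after a backslash (as in \{ or \}) is skipped.
            escaped = False
            continue
        if ch == "\\":
            escaped = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # a close brace with no matching open brace
    return depth == 0


def doubled_words(text: str) -> list:
    """Return words that are immediately repeated, such as 'the the'."""
    return re.findall(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE)
```

    A doubled word is sometimes legitimate, so the second function only produces candidates for a human to inspect.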

    Decades of work have gone into making the Utah archives clean, reliable, and of high quality compared to most other sources of bibliographic data, and into the preparation of hundreds of software tools to assist in those objectives. Final acceptance of your contributed data therefore remains with the initial creators and maintainers of those archives, and primarily, with the current maintainer, who wrote this document.



Department of Mathematics
University of Utah
155 South 1400 East, JWB 233
Salt Lake City, Utah 84112-0090
Tel: 801 581 6851, Fax: 801 581 4148