From @vm.gmd.de:PEB@DM0MPI11.BITNET Fri Jul 10 10:28:01 1992 Flags: 000000000001 Return-Path: <@vm.gmd.de:PEB@DM0MPI11.BITNET> Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA19707; Fri, 10 Jul 92 10:27:57 MDT Received: from vm.gmd.de by MATH.AMS.COM (PMDF #2306 ) id <01GM7O1DP2LS9I4H8F@MATH.AMS.COM>; Fri, 10 Jul 1992 11:56:02 EST Received: from DM0MPI11 by vm.gmd.de (IBM VM SMTP V2R2) with BSMTP id 6858; Fri, 10 Jul 92 17:55:09 MET Received: from DM0MPI11 (PEB) by DM0MPI11 (Mailer R2.08) with BSMTP id 8969; Thu, 09 Jul 92 16:58:43 GMT Date: 09 Jul 1992 16:40:48 +0000 (GMT) From: Peter Breitenlohner Subject: announcing TeX--XeT.change To: Tex-implementors@MATH.AMS.COM, tex-info@shsu.edu, uktex@tex.ac.uk, ivritex%taunivm@vm.gmd.de Message-Id: <01GM7O1DWKO29I4H8F@MATH.AMS.COM> Organization: Max-Planck-Institut fuer Physik, Muenchen Content-Transfer-Encoding: 7BIT Recently I have finished tex--xet.change, a system-independent web change file for TeX 3.141 for mixed direction typesetting. There are two files: tex--xet.change and tex--xet.doc I would like to thank all those who have helped in beta testing. Here is an extract from tex--xet.doc: TeX--XeT contains the code necessary for mixed left-to-right and right-to-left typesetting. This code is inspired by but different from TeX-XeT as presented by Donald E. Knuth and Pierre MacKay in TUGboat 8, 14--25, 1987. In order to avoid confusion with TeX-XeT the present implementation of mixed direction typesetting is called TeX--XeT. 
It differs from TeX-XeT in several important aspects: (1) Right-to-left text is reversed explicitly by the ship-out routine and is written to a normal DVI file without any begin-reflect or end-reflect commands; (2) TeX--XeT produces exactly the same line breaks as TeX when applied to pure left-to-right text, in fact TeX--XeT passes the TRIP test with very few and well understood modifications; (3) therefore TeX--XeT is designed to be used instead of and not in addition to TeX and consequently the pool file name is not changed; (4) as an enhancement over TeX-XeT right-to-left text interrupted by a displayed equation is automatically resumed after that equation. In fact we have been using the VM/CMS version of TeX--XeT here in Munich instead of TeX for quite some time (not for mixed direction texts) with no problems and there are already (or will be in the near future) other TeX--XeT implementations. As far as I know the two files are available at present via anonymous ftp from ftp.uni-stuttgart.de (129.69.1.12) in the directory /soft/tex/tex-sources/tex--xet from Niord.SHSU.edu (192.92.115.8) in the directory [FILESERV.TEX--XET] From the SHSU fileserver To retrieve the package of 2 files, include the command: SENDME TEX--XET in the body of a mail message to FILESERV@SHSU.BITNET (FILESERV@SHSU.edu). 
>From a SPAN archive (if I only knew what that is) The files are stored in, and can be copied from, 39003::$1$dia1:[tex.tex_source.tex--xet] Peter Breitenlohner From ramsdell@linus.mitre.org Sun Aug 9 05:44:17 1992 Flags: 000000000001 Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA13040; Sun, 9 Aug 92 05:44:13 MDT Return-Path: ramsdell@linus.mitre.org Received: from linus.mitre.org by MATH.AMS.ORG (PMDF #2306 ) id <01GNDB63NCWWAH346E@MATH.AMS.ORG>; Sun, 9 Aug 1992 07:21:12 EST Received: from circe.mitre.org by linus.mitre.org (5.61/RCF-4S) id AA11371; Sun, 9 Aug 92 07:21:03 -0400 Received: by circe.mitre.org (5.61/RCF-4C) id AA03826; Sun, 9 Aug 92 07:21:02 -0400 Date: 09 Aug 1992 07:21:01 -0400 From: "John D. Ramsdell" Subject: LaTeX Version 2.09 <14 January 1992> bug in newtheorem. To: tex-implementors@MATH.AMS.ORG Cc: ramsdell@linus.mitre.org Message-Id: <9208091121.AA03826@circe.mitre.org> Content-Transfer-Encoding: 7BIT Posted-Date: Sun, 09 Aug 92 07:21:01 -0400 The following LaTeX source demonstrates what appears to be a LaTeX bug. John #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create the files: # thm.tex # thm.log # This archive created: Sun Aug 9 07:16:30 1992 export PATH; PATH=/bin:$PATH if test -f 'thm.tex' then echo shar: will not over-write existing file "'thm.tex'" else cat << \SHAR_EOF > 'thm.tex' \documentstyle{article} \newtheorem{thm}{Theorem} \newtheorem{dfn}[thm]{Definition} \newtheorem{lmm}[dfn]{Lemma} \begin{document} Test. 
\end{document} SHAR_EOF fi # end of overwriting check if test -f 'thm.log' then echo shar: will not over-write existing file "'thm.log'" else cat << \SHAR_EOF > 'thm.log' This is TeX, C Version 3.14 (format=lplain 92.8.4) 9 AUG 1992 07:16 **thm (thm.tex LaTeX Version 2.09 <14 January 1992> (/usr/local/lib/tex/inputs/article.sty Standard Document Style `article' <14 Jan 92>. (/usr/local/lib/tex/inputs/art10.sty) \c@part=\count79 \c@section=\count80 \c@subsection=\count81 \c@subsubsection=\count82 \c@paragraph=\count83 \c@subparagraph=\count84 \c@figure=\count85 \c@table=\count86 ) \c@thm=\count87 LaTeX error. See LaTeX manual for explanation. Type H for immediate help. ! No theorem environment `dfn' defined. \@latexerr ...for immediate help.}\errmessage {#1} \@ifundefined ...fx \csname #1\endcsname \relax #2 \else #3\fi l.4 \newtheorem{lmm}[dfn]{Lemma} ? x Here is how much of TeX's memory you used: 159 strings out of 4458 1563 string characters out of 63167 26510 words of memory out of 262141 2124 multiletter control sequences out of 9500 18996 words of font info for 72 fonts, out of 72000 for 255 14 hyphenation exceptions out of 607 12i,0n,15p,129b,17s stack positions out of 300i,40n,60p,3000b,4000s No pages of output. SHAR_EOF fi # end of overwriting check # End of shell archive exit 0 From CET1@phx.cam.ac.uk Sun Aug 9 17:08:12 1992 Flags: 000000000001 Return-Path: Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA17228; Sun, 9 Aug 92 17:08:07 MDT Received: from snow.csi.cam.ac.uk by MATH.AMS.ORG (PMDF #2306 ) id <01GNDYSN5IVKAH32XL@MATH.AMS.ORG>; Sun, 9 Aug 1992 18:37:33 EST Received: from phx.cam.ac.uk by ppsw1.cam.ac.uk with NIFTP (PP-6.0) as ppsw.cam.ac.uk id <17894-0@ppsw1.cam.ac.uk>; Sun, 9 Aug 1992 23:37:04 +0100 Date: 09 Aug 1992 23:36:57 -0300 (BST) From: Chris Thompson Subject: Re: [LaTeX Version 2.09 <14 January 1992> bug in newtheorem.] 
In-Reply-To: <9208091121.AA03826@circe.mitre.org> To: tex-implementors@MATH.AMS.ORG Cc: ramsdell@linus.mitre.org, Schoepf@sc.zib-berlin.de Message-Id: Content-Transfer-Encoding: 7BIT John Ramsdell writes > The following LaTeX source demonstrates what appears to be a LaTeX > bug. > \documentstyle{article} > \newtheorem{thm}{Theorem} > \newtheorem{dfn}[thm]{Definition} > \newtheorem{lmm}[dfn]{Lemma} > \begin{document} > Test. > \end{document} > LaTeX error. See LaTeX manual for explanation. > Type H for immediate help. > ! No theorem environment `dfn' defined. > \@latexerr ...for immediate help.}\errmessage {#1} > > \@ifundefined ...fx \csname #1\endcsname \relax #2 > \else #3\fi > l.4 \newtheorem{lmm}[dfn]{Lemma} This input has never worked, in the sense that prior to version <14 Jan 92> an attempt to use \begin{lmm} would generate ! You can't use `\relax' after \advance. \c@dfn as a result of trying to \refstepcounter the non-existent counter 'dfn'. The change (#205 in latex.bug) to diagnose such problems at definition time seems to me to be an improvement. It is true that the LaTeX reference manual, on page 174, doesn't say that the 'numbered_like' environment has to have been defined by the first form of \newtheorem rather than the second. 
Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk From schoepf@sc.ZIB-Berlin.DE Mon Aug 10 06:25:46 1992 Flags: 000000000001 Return-Path: Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA25605; Mon, 10 Aug 92 06:25:43 MDT Received: from sc.ZIB-Berlin.DE by MATH.AMS.ORG (PMDF #2306 ) id <01GNEQXE0HDCAH38GC@MATH.AMS.ORG>; Mon, 10 Aug 1992 08:17:54 EST Received: from dagobert.ZIB-Berlin.DE by sc.ZIB-Berlin.DE (4.0/SMI-4.0-sc/19.6.92) id AA12940; Mon, 10 Aug 92 14:00:29 +0200 Received: from quattro.ZIB-Berlin.DE by dagobert.ZIB-Berlin.DE (4.1/SMI-4.0/6.5.92 ) id AA27354; Mon, 10 Aug 92 14:00:26 +0200 Received: by quattro.ZIB-Berlin.DE (4.1/SMI-4.1) id AA18562; Mon, 10 Aug 92 14:00:02 +0200 Date: 10 Aug 1992 14:00:26 +0200 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Subject: Re: LaTeX Version 2.09 <14 January 1992> bug in newtheorem. In-Reply-To: <9208091121.AA03826@circe.mitre.org> To: "John D. Ramsdell" Cc: TeX-implementors@MATH.AMS.ORG Reply-To: Schoepf@sc.ZIB-Berlin.DE Message-Id: <9208101200.AA27354@sc.zib-berlin.dbp.de> Organization: Konrad-Zuse-Zentrum fuer Informationstechnik Berlin Content-Transfer-Encoding: 7BIT References: <9208091121.AA03826@circe.mitre.org> "John D. Ramsdell" writes: > The following LaTeX source demonstrates what appears to be a LaTeX > bug. > \documentstyle{article} > \newtheorem{thm}{Theorem} > \newtheorem{dfn}[thm]{Definition} > \newtheorem{lmm}[dfn]{Lemma} > \begin{document} > Test. > \end{document} You can easily get around this by writing \newtheorem{lmm}[thm]{Lemma} I will see that the error message is changed. (Actually, Frank has been pestering me for some time now that this error message should complain about a nonexistent counter rather than a nonexistent theorem environment.) I'm not sure whether it is worth the effort to make your input work; I can't see an easy way to get around it at the moment. 
Rainer Sch"opf LaTeX2.09 maintenance From ramsdell@linus.mitre.org Wed Aug 12 07:24:30 1992 Flags: 000000000001 Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA24165; Wed, 12 Aug 92 07:24:26 MDT Return-Path: ramsdell@linus.mitre.org Received: from linus.mitre.org by MATH.AMS.ORG (PMDF #2306 ) id <01GNHDBMUHKGAH3B7S@MATH.AMS.ORG>; Wed, 12 Aug 1992 05:06:07 EST Received: from celebes.mitre.org by linus.mitre.org (5.61/RCF-4S) id AA11616; Wed, 12 Aug 92 05:05:53 -0400 Received: by celebes.mitre.org (5.61/RCF-4C) id AA20929; Wed, 12 Aug 92 05:05:51 -0400 Date: 12 Aug 1992 05:05:50 -0400 From: "John D. Ramsdell" Subject: Please delete old versions of ImakeTeX. To: TeX-implementors@MATH.AMS.ORG Cc: ramsdell@linus.mitre.org Message-Id: <9208120905.AA20929@celebes.mitre.org> Content-Transfer-Encoding: 7BIT Posted-Date: Wed, 12 Aug 92 05:05:50 -0400 I have recently received two requests for help installing ImakeTeX. In both cases, the senders were attempting to install obsolete versions of ImakeTeX. If you provide ImakeTeX with file names other than imaketex202.tar.Z or imaketex202a.tar.Z, please delete them now. If you provide Labrea source with file names other than labrea9103.tar.Z or labrea9103a.tar.Z, please delete these also. I know old versions of ImakeTeX are available at Labrea.stanford.edu. New versions are available at Dartmouth and UCI. John Subject: Re: Installing TeX is painless In-Reply-To: Your message of "Fri, 29 May 92 13:22:00 -0400." <9205291622.AA13347@merlin.acadiau.ca> Date: Fri, 29 May 92 12:38:29 -0400 From: John D. Ramsdell I am happy to announce two new locations for ImakeTeX. ImakeTeX coos.dartmouth.edu Automated installation of UnixTex. pub/TeX/ImakeTeX/{labrea*,imake*} ImakeTeX ics.uci.edu Automated installation of UnixTex. karl/{labrea*,imake*} ImakeTeX automates much of the installation of UnixTeX. A stable version of ImakeTeX (2.02a) has been released for TeX 3.14 and MF 2.7. 
The aim of the new version is to eliminate the use of features that are not available in the important Unix implementations. The approach used has been to eliminate any feature that does not appear in Draft 10 of POSIX 1.002. labrea9103a.tar.Z The TeX distribution from Labrea.stanford.edu of March, 1991, augmented with the January 1992 release of LaTeX. imaketex202a.tar.Z (624695 bytes) The ImakeTeX distribution. This makes it relatively easy to install TeX on Suns, and easier to port to other machines. This version updates documentation that was provided in imaketex202.tar.Z. From CHAA006@VAX.RHBNC.AC.UK Wed Dec 16 10:21:17 1992 Flags: 000000000001 Return-Path: Received: from math.ams.org (MATH.AMS.COM) by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA14460; Wed, 16 Dec 92 10:21:13 MST Received: from sun2.nsfnet-relay.ac.uk by MATH.AMS.ORG (PMDF #2306 ) id <01GSDSGSDWYOEHYGQJ@MATH.AMS.ORG>; Wed, 16 Dec 1992 11:58:55 EST Date: 16 Dec 1992 16:22:14 -0300 (BST) From: CHAA006@VAX.RHBNC.AC.UK Subject: TeX V3.141, Section 963. Sender: "JANET CHAA006@UK.AC.RHBNC.VAX" To: tex-implementors Reply-To: Philip Taylor (RHBNC) Message-Id: <2F20E0EE_000761F8.009652E6B463E640$27_1@UK.AC.RHBNC.VAX> Content-Transfer-Encoding: 7BIT Via: uk.ac.rhbnc.vax; Wed, 16 Dec 1992 16:21:09 +0000 Actually-To: Originally-To: NSFNET%"tex-implementors@math.ams.com" Mailer: Janet_Mailshr V3.5 ( 13-OCT-1989 14:07:27 ) In Section 963, Brian Hamilton Kelly has ``while l <= k'', where my sole copy of Volume B, for TeX 2.0, has ``while l < k''; Brian's code blows up shortly beyond this point. Is there any reason to believe that Knuth has changed this code since TeX 2.0? Philip Taylor, RHBNC. 
From BNB@MATH.AMS.ORG Wed Dec 16 10:42:59 1992 Flags: 000000000001 Return-Path: Received: from VAX01.AMS.ORG (VAX01.AMS.COM) by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA15449; Wed, 16 Dec 92 10:42:56 MST Received: from MATH.AMS.ORG by MATH.AMS.ORG (PMDF #2306 ) id <01GSDTCSCGS0EGORQW@MATH.AMS.ORG>; Wed, 16 Dec 1992 12:24:35 EST Date: 16 Dec 1992 12:24:29 -0500 (EST) From: bbeeton Subject: Re: TeX V3.141, Section 963. In-Reply-To: <2F20E0EE_000761F8.009652E6B463E640$27_1@UK.AC.RHBNC.VAX> To: P.Taylor@Vax.Rhbnc.Ac.Uk Cc: tex-implementors@MATH.AMS.ORG Message-Id: <724526669.139547.BNB@MATH.AMS.ORG> Content-Transfer-Encoding: 7BIT Mail-System-Version: i've just checked all the errata files. section 963 is on page b400. there's only one change on that page, from 1988, in errata.four; it's in section 961 -- the module name. so i conclude that knuth has not made the change you asked about. -- bb From CET1@phx.cam.ac.uk Thu Dec 17 09:42:06 1992 Flags: 000000000001 Return-Path: Received: from math.ams.org (MATH.AMS.COM) by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA09829; Thu, 17 Dec 92 09:42:02 MST Received: from gray.csi.cam.ac.uk by MATH.AMS.ORG (PMDF #2306 ) id <01GSF50U72QOEUEBP6@MATH.AMS.ORG>; Thu, 17 Dec 1992 11:09:37 EST Received: from phx.cam.ac.uk by ppsw1.cam.ac.uk with NIFTP (PP-6.0) Cambridge as ppsw.cam.ac.uk id <15047-0@ppsw1.cam.ac.uk>; Thu, 17 Dec 1992 16:08:46 +0000 Date: 17 Dec 1992 16:08:38 +0000 (GMT) From: Chris Thompson Subject: Re: [TeX V3.141, Section 963.] In-Reply-To: <2F20E0EE_000761F8.009652E6B463E640$27_1@UK.AC.RHBNC.VAX> To: tex-implementors Cc: Philip Taylor (RHBNC) Message-Id: Content-Transfer-Encoding: 7BIT Phil Taylor writes: > In Section 963, Brian Hamilton Kelly has ``while l <= k'', where my > sole copy of Volume B, for TeX 2.0, has ``while l < k''; Brian's code > blows up shortly beyond this point. Is there any reason to believe that > Knuth has changed this code since TeX 2.0? 
This was part of change #360 (tex82.bug numbering), the addition of support for multiple hyphenation tables, incorporated into TeX 2.992 (the TeX 3.0 "beta release"). You should notice several other changes to section 963 from that instance of Volume B: in particular the initialisation of hc[0] to the language code, and the reversal of order of "incr(l)" and "c:=hc[l]", which explain the change of test. Phil has sent me an example of the "blowing up", but as it involves a \hyphenation command and section 963 is concerned solely with \patterns, I rather doubt whether they are related. I will take this matter offline from tex-implementors. Good to see that tex-implementors is useful for something---it has been very quiet recently! Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk From @vm.gmd.de:PEB@DMUMPIWH.BITNET Thu Dec 17 09:53:28 1992 Flags: 000000000001 Return-Path: <@vm.gmd.de:PEB@DMUMPIWH.BITNET> Received: from VAX01.AMS.ORG (VAX01.AMS.COM) by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA10150; Thu, 17 Dec 92 09:53:24 MST Received: from vm.gmd.de by MATH.AMS.ORG (PMDF #2306 ) id <01GSF6803UBKEUEC22@MATH.AMS.ORG>; Thu, 17 Dec 1992 11:43:23 EST Received: from DMUMPIWH by vm.gmd.de (IBM VM SMTP V2R2) with BSMTP id 3418; Thu, 17 Dec 92 17:41:56 MET Received: from DMUMPIWH (PEB) by DMUMPIWH (Mailer R2.08) with BSMTP id 2792; Thu, 17 Dec 92 17:42:03 GMT Date: 17 Dec 1992 17:36:04 +0000 (GMT) From: Peter Breitenlohner Subject: Re: [TeX V3.141, Section 963.] In-Reply-To: Message of 17 Dec 1992 16:32:49 -0300 (BST) from To: Tex-implementors@MATH.AMS.ORG Message-Id: <01GSF680AJG2EUEC22@MATH.AMS.ORG> Organization: Max-Planck-Institut fuer Physik, Muenchen Content-Transfer-Encoding: 7BIT Hi all there, I (and may be others as well) would greatly appreciate if Vax-specific (was it Vax?) 
implementation problems would be discussed on a Vax-specific list and not on the general tex-implementors list (as, e.g., on TEX-IBM for IBM mainframe specific TeX problems). This list should remain reserved to questions of general interest. Sincerely (and somewhat annoyed) Peter Breitenlohner From CHAA006@VAX.RHBNC.AC.UK Thu Dec 17 09:54:00 1992 Flags: 000000000001 Return-Path: Received: from math.ams.org (MATH.AMS.COM) by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA10158; Thu, 17 Dec 92 09:53:54 MST Received: from sun2.nsfnet-relay.ac.uk by MATH.AMS.ORG (PMDF #2306 ) id <01GSF5T7XKB4EUEBZF@MATH.AMS.ORG>; Thu, 17 Dec 1992 11:32:25 EST Date: 17 Dec 1992 16:32:49 -0300 (BST) From: CHAA006@VAX.RHBNC.AC.UK Subject: Re: [TeX V3.141, Section 963.] Sender: "JANET CHAA006@UK.AC.RHBNC.VAX" To: tex-implementors Reply-To: Philip Taylor (RHBNC) Message-Id: <2F2146DE_000788B8.009653B159F147E0$58_2@UK.AC.RHBNC.VAX> Content-Transfer-Encoding: 7BIT Via: uk.ac.rhbnc.vax; Thu, 17 Dec 1992 16:31:38 +0000 Actually-To: Originally-To: CBS%UK.AC.CAMBRIDGE.PHOENIX::CET1 Mailer: Janet_Mailshr V3.5 ( 13-OCT-1989 14:07:27 ) >>> Phil has sent me an example of the "blowing up", but as it involves >>> a \hyphenation command and section 963 is concerned solely with >>> \patterns, I rather doubt whether they are related. I will take this >>> matter offline from tex-implementors. Just to clarify, the `blowing up' which I sent Chris is a trivial example of a problem in the same general area which I located whilst trying to track down the real problem in BHK's TeX. The original problem occurred in the statement cited: WHILE L<=K DO BEGIN C:=HC[L];L:=L+1;P:=TRIEL[Q];FIRSTCHILD:=TRUE; ^^^^^^ L := 64 when the real instance fails, one beyond its upb. In the trivial example, I asked TeX: \hyphenation { %%% probably anything longer than 63/64 chars will suffice foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo-foo} which caused a sub-range error in BHK's TeX. 
As the DEC Pascal compiler is particularly good at picking up such run-time errors (I always compile TeX with /check=all /noopt), I wonder whether other implementations simply ignore this condition, or whether it occurs only in BHK's implementation? Philip Taylor, RHBNC. From CHAA006@VAX.RHBNC.AC.UK Thu Dec 17 10:35:35 1992 Flags: 000000000001 Return-Path: Received: from math.ams.org (MATH.AMS.COM) by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA11631; Thu, 17 Dec 92 10:35:30 MST Received: from sun2.nsfnet-relay.ac.uk by MATH.AMS.ORG (PMDF #2306 ) id <01GSF74Y4NKWEUEC4Q@MATH.AMS.ORG>; Thu, 17 Dec 1992 12:10:01 EST Date: 17 Dec 1992 17:09:02 -0300 (BST) From: CHAA006@VAX.RHBNC.AC.UK Subject: Vax-specific question? Sender: "JANET CHAA006@UK.AC.RHBNC.VAX" To: tex-implementors Reply-To: Philip Taylor (RHBNC) Message-Id: <2F211656_00077F60.009653B66906B940$1486_3@UK.AC.RHBNC.VAX> Content-Transfer-Encoding: 7BIT Via: uk.ac.rhbnc.vax; Thu, 17 Dec 1992 17:07:51 +0000 Actually-To: Originally-To: CBS%UK.AC.NSFNET-RELAY::DE.GMD.VM::BITNET.DMUMPIWH::PEB Mailer: Janet_Mailshr V3.5 ( 13-OCT-1989 14:07:27 ) Dear Peter --- >>> I (and may be others as well) would greatly appreciate if Vax-specific >>> (was it Vax?) implementation problems would be discussed on a Vax-specific >>> list and not on the general tex-implementors list (as, e.g., on TEX-IBM >>> for IBM mainframe specific TeX problems). This list should remain >>> reserved to questions of general interest. >>> Sincerely (and somewhat annoyed) Peter Breitenlohner I am very sorry that my question has displeased you. However, it was asked on this list for the very reason that I was not (and am not) convinced that it is VAX-specific at all. Indeed, as my follow-up postings have indicated, I am concerned that there may be a problem in TeX which implementations other than VAX/VMS DEC Pascal are simply failing to detect, if they are not compiled in full debugging mode with sub-range checking enabled. 
I hope you will accept my assurance that this question was asked in good faith. Philip Taylor, RHBNC. From CET1@phx.cam.ac.uk Thu Dec 17 16:40:43 1992 Flags: 000000000001 Return-Path: Received: from math.ams.org (MATH.AMS.COM) by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA21841; Thu, 17 Dec 92 16:40:39 MST Received: from gray.csi.cam.ac.uk by MATH.AMS.ORG (PMDF #2306 ) id <01GSFKBI11HCEUEBAW@MATH.AMS.ORG>; Thu, 17 Dec 1992 18:27:04 EST Received: from phx.cam.ac.uk by ppsw1.cam.ac.uk with NIFTP (PP-6.0) Cambridge as ppsw.cam.ac.uk id <25728-0@ppsw1.cam.ac.uk>; Thu, 17 Dec 1992 23:26:50 +0000 Date: 17 Dec 1992 23:26:37 +0000 (GMT) From: Chris Thompson Subject: (Not a) Vax-specific question In-Reply-To: <2F211656_00077F60.009653B66906B940$1486_3@UK.AC.RHBNC.VAX> To: tex-implementors Cc: Philip Taylor (RHBNC) Message-Id: Content-Transfer-Encoding: 7BIT In response to Peter Breitenlohner's outbreak of seasonal goodwill, Philip Taylor writes: > However, it was asked on this list for the very reason that I was not > (and am not) convinced that it is VAX-specific at all. And indeed, it isn't. Phil has discovered two bugs in TeX, both introduced in the aforementioned change #360 (TeX 2.992): 1. In |new_hyph_exceptions|, |n| is declared as |small_integer| but may now reach 64, as a result of an extra increment at the start of section 939. Provoked by very long strings in \hyphenation. 2. In |new_patterns|, |l| is declared as |small_integer| but may now reach |k+1|, which may be 64, as a result of the changes to section 963 described before. Provoked by very long strings in \patterns. (|small_integer| is |0..63|.) I turned on runtime checking in my MVS (Pascal/VS) implementation and got just the same effects as Phil was observing in his VMS version. 
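The second overflow can be modelled in a few lines. The following is a minimal Python sketch, not TeX's Pascal source: `SmallInteger` and `scan_pattern` are hypothetical names invented here, and only the loop shape quoted earlier in the thread (`while l <= k`, with the increment inside the loop body) is taken from the discussion. It shows why a range-checked build stops when `k` is 63 while an unchecked build simply lets the counter reach 64.

```python
# Illustrative model only: this is not TeX's Pascal source. SmallInteger and
# scan_pattern are hypothetical names invented for this sketch.

class SmallInteger:
    """A checked stand-in for the Pascal subrange type 0..63."""
    LOW, HIGH = 0, 63

    def __init__(self, value=0):
        self.value = self._check(value)

    def _check(self, value):
        # A range-checking compiler performs this test on every assignment.
        if not (self.LOW <= value <= self.HIGH):
            raise OverflowError(f"sub-range error: {value} not in 0..63")
        return value

    def incr(self):
        self.value = self._check(self.value + 1)


def scan_pattern(k, checked=True):
    """Mimic the loop shape `while l <= k do begin ... incr(l) ... end`.

    Returns the final value of l. With checked=True the counter behaves
    like a range-checked small_integer; with checked=False it is a plain
    int, as in a build compiled without sub-range checking.
    """
    if checked:
        l = SmallInteger(0)
        while l.value <= k:
            l.incr()  # raises when l would become 64
        return l.value
    l = 0
    while l <= k:
        l += 1
    return l
```

With `k = 62` both variants finish with `l` equal to 63. With `k = 63` the unchecked variant ends at 64, a value that is never used afterwards, whereas the checked variant raises on the final increment.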
> Indeed, as my follow-up postings have indicated, I am concerned that > there may be a problem in TeX which implementations other than VAX/VMS > DEC Pascal are simply failing to detect, if they are not compiled in > full debugging mode with sub-range checking enabled. This is right, but the failure to detect the problem will be harmless in almost all conceivable implementations. The out-of-range value is never used in (2), and in (1) only to address an array |hc| which is in fact large enough. If anyone were still using 6-bit bytes, and had a Pascal compiler that stored a |small_integer| in one, it might be another matter. Peter Breitenlohner writes > This list should remain reserved to questions of general interest. Is the question of the amount of runtime checking that is turned on in production versions of TeX of sufficiently general interest? It would seem that few such versions have sub-range checking turned on, at any rate, or these bugs would have been discovered before now. (Although apparently the TRIP test contains no cases of hyphenation exceptions or patterns longer than 63 characters.) I admit I had some difficulty making an MVS version of TeX with all Pascal/VS runtime checking options turned on. Some local mods turned out to be impure (surprise!), and I had to disable the checks in the |line_break| routine, which it is notoriously difficult to get Pascal/VS to compile in the first place. 
Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk From @MATH.AMS.COM:jmr@nada.kth.se Tue Dec 3 14:40:23 1991 Flags: 000000000001 Return-Path: <@MATH.AMS.COM:jmr@nada.kth.se> Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA20509; Tue, 3 Dec 91 14:40:12 MST Received: from nada.kth.se by MATH.AMS.COM via SMTP with TCP; Tue, 3 Dec 91 16:03:49-EDT Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA24532; Tue, 3 Dec 91 22:00:57 +0100 Date: Tue, 3 Dec 91 22:00:56 +0100 From: Jan Michael Rynning To: tex-implementors@math.ams.com Subject: Swedish hyphenation report Message-Id: % This is the report on Swedish hyphenation which I presented to the % Nordic TeX Users Group meeting in November. It may be of use to % others who are trying to generate hyphenated dictionaries and/or % hyphenation patterns for other languages. Feel free to pass it on. % I would appreciate comments, in particular from others with experience % in this field. \documentstyle[a4]{article} \newcommand\word[1]{{\em #1\/}} \author{Jan Michael Rynning\\ Royal Institute of Technology\thanks{ Postal address: Department of Numerical Analysis and Computing Science, Royal Institute of Technology, S-100~44 Stockholm, Sweden. Internet: jmr@nada.kth.se. BITNET: jmr@sekth. Telephone: +46~8~7906288.}\\ Stockholm} \date{November 1991} \title{Swedish Hyphenation for \TeX} \begin{document} \maketitle \section{Introduction} The \TeX~\cite{knuth:texbook} typesetting system takes two interesting approaches to hyphenation. Firstly, \TeX{} hyphenates using a language-independent pattern recognition algorithm, and a set of language-dependent patterns. Secondly, \TeX{} comes with an auxiliary program named PATGEN~\cite{liang:hyphenation}, which can scan a hyphenated dictionary and generate a set of hyphenation patterns for \TeX. 
Most other typesetting systems use patterns which someone has guessed, and have them hard-wired into the programs. In this paper I'll describe the work I've done so far on generating Swedish hyphenation patterns for \TeX{} and what remains to be done. I'll also try to share my experience from generating a hyphenated dictionary and give some good advice to anyone who is trying to do the same thing. \section{The Swedish language} \subsection{Spelling} Swedish spelling is mostly consistent (at least compared to English spelling), and resembles the pronunciation. Foreign words used in Swedish are usually given a Swedish spelling after a while, and the original spelling is given up within a couple of decades. Two of the words which are currently in the phase of switching over are \word{tape}$\rightarrow$\word{tejp} and \word{juice}$\rightarrow$\word{jos}. One exception to this is a number of words ending with \word{age}, like \word{bandage} and \word{garage}, which have been used for ages, and show no signs of changing their spelling. \subsection{Compound words} Swedish has lots of compound words. Theoretically there is no limit to the number of parts which can be used to make up a compound word, or to the length of a compound word. Hence there is no limit to the number of compound words. In practice you hardly ever see compound words with more than four parts. One peculiarity of Swedish compound words is that we often add and/or remove letters at the end of the parts when we form them (except for the last part, which never changes): \word{bord}~+ \word{duk}~$\rightarrow$ \word{bord\underline{s}duk} (\word{table-cloth}), \word{gat\underline{a}}~+ \word{namn}~$\rightarrow$ \word{gat\underline{u}namn} (\word{street name}), \word{skol\underline{a}}~+ \word{l\"arare}~$\rightarrow$ \word{skoll\"arare} (\word{schoolteacher}). 
One thing which is even more peculiar is that this sometimes depends on whether the compound word consists of two parts or more than two parts: \word{grund}~+ \word{skol\underline{a}}~+ \word{l\"arare}~$\rightarrow$ \word{grundskol\underline{e}l\"arare} (\word{comprehensive school teacher}). \subsection{Hyphenation} The preferable place to hyphenate is between the parts of a compound word. Simple words and parts of compound words which have more than one syllable may also be hyphenated. Swedish hyphenation is strictly syllabic. That makes it easy to hyphenate. The foreign words which have retained their original spelling are a small complication to hyphenation. If \word{tape} and \word{juice} were pronounced the way they are spelt, they would be hyphenated \word{ta-pe} and \word{ju-i-ce}. However, they are both one-syllabled words and should therefore not be hyphenated at all. Similarly, \word{bandage} and \word{garage} should not be hyphenated between the \word{a} and \word{ge}, as they would be if they were pronounced the way they are spelt. This increases the size of the patterns, but is really not a problem. It's two aspects of compound words which really complicate Swedish hyphenation. Firstly, many words start and/or end with several consonants (some words start with four consonants and some derived forms end with five). Secondly, the transformations which occur when we form compound words, in particular the removal of the final vowel and/or the adding of an \word{s}, sometimes make the boundary between the parts very hard to find by computer. There are even compound words which look the same, but are hyphenated differently depending on how they are formed: \word{bil}~+ \word{drulle}~$\rightarrow$ \word{bil-drulle} (\word{road hog})---\word{bild}~+ \word{rulle}~$\rightarrow$ \word{bild-rulle} (\word{film reel}). It's usually better to avoid hyphenating these words than to get them wrong. Fortunately there are very few of them. 
Finally, there are compound words made up from one part ending with a double consonant and the next part starting with the same consonant. One of those consonants is taken out when the parts are put together, but when the word is hyphenated, it is restored: \word{til\underline{l}}~+ \word{l\aa ta}~$\rightarrow$ \word{till\aa ta}~$\rightarrow$ \word{till-l\aa ta} (\word{allow}). \TeX's pattern-based hyphenation can't handle this peculiarity, so for the sake of pattern-generation we'll have to pretend that those words can't be hyphenated at such places. Fortunately there are very few of them. \section{What I have done so far} \subsection{Swedish letters} PATGEN only treats \word{a}--\word{z} as letters, not \word{\aa}, \word{\"a}, and \word{\"o}. Fixing that was easy. While I was at it, I also made PATGEN ask for the values to use for \word{lefthyphenmin} and \word{righthyphenmin}. Later I received a copy of Peter Breitenlohner's modified PATGEN, which reads the character set representation from a file, so you don't need to modify the program to use your national letters. It also allows you to set \word{lefthyphenmin} and \word{righthyphenmin}. \subsection{Inconsistencies} I set out with the hyphenated dictionary and PATGEN parameters which Stefan Kronborg of \AA bo Akademi used for generating Swedish hyphenation patterns. Those parameters were very different from the ones Frank Liang lists in his report~\cite{liang:hyphenation}. They also required \word{trie\_size} and \word{triec\_size} to be increased by more than an order of magnitude. That seemed wrong, so I tried to generate Swedish patterns using Frank Liang's parameters. Stefan Kronborg had told me that I would run into a lot of inconsistencies in the dictionary if I did that, and sure I did! So, I decided to fix the inconsistencies I had found. After I had done that, Frank Liang's parameters worked very well, and the size of the generated patterns shrunk a lot. I still use Frank Liang's parameters. 
They are probably not optimal for Swedish, but they work well enough. There is no point in trying to optimize the parameters as long as the dictionary is still changing rapidly, so I have postponed that until I have a ``stable'' dictionary. \subsection{Long compound words} The dictionary originally only had simple words and most of their derived forms. Hence, my next step was to start adding compound words and derived forms which were not in the dictionary. I had about 150,000 such words, and I realized that I couldn't do them all in one day. After some thinking, I decided to start with the longest words, the ones with 22 letters or more. My reasoning was that by starting with the longest, I would cover both the peculiarities of compound words in general and of those with more than two parts in particular. I also expected to obtain a more-or-less random selection of words that way. However, when I hyphenated them I quite often found two or three derived forms of the same word. Those forms usually only differed by a letter or two. It would probably have been a better choice to start with a random selection of words with, let's say, 20 letters or more. Hyphenating lots of words by hand is tedious. Letting the computer hyphenate the words, and then proof-reading and correcting the errors, can save a lot of time, provided that the computer does a good job. I tried to use the patterns I had for hyphenating the words, but in my first test that got 66\% of them wrong. Then I wrote a rather simple-minded program which tried to find the parts of the compound words in the dictionary, taking into account some of the most common transformations used when forming compound words and the missing derived forms. That program succeeded in finding the parts of 65\% of the words, and only got 6.6\% of them wrong. I decided to use the patterns for the remaining 35\% of the words. Now I was down to correcting 27\% of the words. 
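(As a rough check of the 27\% figure, under the assumption that the 66\% error rate of the patterns in the first test carries over to the remaining 35\% of the words, which the text does not state explicitly:
\[
0.65 \times 0.066 + 0.35 \times 0.66 \approx 0.043 + 0.231 \approx 0.27 .
\]
The first term is the share of all words that the dictionary program handled but got wrong, and the second is the share left to the patterns that they got wrong.)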
Most of those words only had one or two hyphens wrong, so I had to correct fewer than 10\% of the hyphens.

\subsection{One-syllabled words}

Many of the words which my program couldn't sort out were made up from common, simple words. I checked the hyphenated dictionary, and those words weren't there. After a while I realized that the hyphenated dictionary only contained hyphenatable words, no one-syllabled ones. As a consequence, some long one-syllabled words got hyphenated by the patterns! I added all the one-syllabled words I could find to the dictionary.

\subsection{Iterative improvement}

After that, my program managed to sort out 81\% of the words. I decided to take a turn with all the 15-letter words I had. Proof-reading words, most of which are correctly hyphenated, isn't very rewarding, and adding them to the dictionary isn't much of an improvement, so I soon gave that up in favour of another approach. I used the patterns I had generated to hyphenate all the words I had which were not in the hyphenated dictionary. I then searched the result for all ``syllables'' with only consonants or three or more vowels. I really should have looked for two or more vowels, but there were far too many such words at the time. Now, all the words I had to proof-read were incorrectly hyphenated (well, nearly all), and correcting them and adding them to the dictionary would be an improvement. After I had added them to the dictionary, I generated a new set of patterns and iterated. Fixing the ``syllables'' with only consonants removes hyphens, and fixing the ones with two or more vowels adds hyphens. By working from both ends simultaneously, I expect to converge in the middle more rapidly than if I work from one end at a time. Since I started this approach I have fixed more than 12,700 words. The number of consonants-only ``syllables'' I find in each iteration has decreased by an order of magnitude.
The three-vowel ``syllables'' soon became so rare that I started to look at two-vowel ``syllables'' with 10 letters or more. I'm now down to 8, and I've found one word with a real 8-letter syllable: \word{in-skr\"ankts} (the passive form of the perfect tense of \word{restrict}). \section{What remains to be done} \subsection{More compound words} If I continue to iterate, I will finally reach a point where no more words to be fixed show up. I would guess that I have done between half and two thirds of the work it takes to get there. I plan to continue to iterate as long as I see that a lot of everyday words get incorrectly hyphenated. \subsection{Derived forms} The genitive of nouns and passive/reflexive form of verbs are both missing from the hyphenated dictionary. Some such words get incorrectly hyphenated. Both forms are formed by adding an \word{s} to the end of the word. That \word{s} doesn't influence the hyphenation. I plan to extract all words I have, which are identical to a word in the dictionary followed by an \word{s}, hyphenate them identically, skim through them, and add them. \subsection{Restoring the \'e accent} Swedish has a small number of words of foreign origin spelt with an \word{\'e} (a small, but unlimited number, since they can be used to form compound words). These words are spelt without the accent in the dictionary. I plan to add the accent to them. I can see no way of doing this automatically, since some words and derived forms are identical, except for this accent: \word{ide} (\word{animal's winter quarters})---\word{id\'e} (\word{idea}), \word{armen} (\word{the arm})---\word{arm\'en} (\word{the army}). \subsection{Off-by-one hyphenation} Hyphenating words just after the first letter or just before the last is technically correct for some words, but a thing you normally want to avoid. In \TeX{} you can easily prevent this from happening by setting \word{lefthyphenmin} and \word{righthyphenmin} to values greater than one. 
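
As a minimal sketch of that setting (the values 2 and 3 are only an illustration, not values recommended in this paper, and \word{skolelev} is just a convenient test word), the two parameters can be assigned like any other integer parameter, and plain \TeX's \verb|\showhyphens| macro writes the hyphens it finds to the log file:
\begin{verbatim}
\lefthyphenmin=2   % at least two letters before the first hyphen
\righthyphenmin=3  % at least three letters after the last hyphen
\showhyphens{skolelev} % reports the permitted hyphens in the log
\end{verbatim}
These minimum values only guard the two ends of a word; they do nothing about unwanted hyphens near the joints inside a compound.
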
Hyphenating compound words one letter away from the beginning or end of one of their parts can be very confusing. For the word \word{skolelev}, the hyphenation \word{skol-elev} (\word{school pupil}) is desirable, but \word{skole-lev} (\word{schoolp upil}) is not. The only way to prevent this from happening is to generate patterns from a dictionary which has these hyphens taken out. I've had complaints about this, and I disapprove of these hyphenations myself, so I plan to fix it. \subsection{Weighting words} I've noticed that infrequently used words prevent some common words from being hyphenated. PATGEN allows you to weight words. I could weight the most commonly used words, to make sure they get hyphenated. I have some frequency information for Swedish words, but I haven't yet decided if I'm going to weight words or how I'm going to do it. \subsection{More inconsistencies} I expect to find and fix more inconsistencies in the dictionary as I go along. \section{Some good advice} \subsection{Be consistent} Make sure your dictionary is consistently hyphenated! That's my first and most important piece of advice. Inconsistencies in 1\% of the words can easily make the patterns 20\% larger. The inconsistencies also make the patterns miss more hyphens. \subsection{Include non-hyphenatable words} It may not be obvious that non-hyphenatable words should be included in the hyphenated dictionary, but if you leave them out, the generated patterns may erroneously hyphenate some of them. \subsection{Include all derived forms} It's also important to include all derived forms of the words. The computer doesn't know that they are derived forms. If you don't include them, they are just letter patterns which the computer has never seen before. \subsection{Use patterns for hyphenating new words} Using existing patterns to hyphenate new words, and correcting the errors, can save a lot of time compared to putting all the hyphens in by hand. 
If you start from scratch with a set of unhyphenated words and no patterns, one way to hyphenate your first batch of words is to use the patterns for another language with a similar hyphenation. I've tested the hyphenation patterns for various languages on the Swedish hyphenated dictionary, with varying success (see table~\ref{table:languagetest}). It's a bit of a surprise that the Finnish patterns produce a better result than the patterns for some of the languages which are similar to Swedish. The Finnish vocabulary has no similarities to Swedish, but the hyphenation is obviously similar. The most interesting result of this test is that the languages I tested seem to form two groups: one whose patterns find about 80\% of the hyphens and the other about 45\%. It would be interesting to see the outcome of this test for languages other than Swedish.

\begin{table}[ht]
\begin{center}
\begin{tabular}{lr@{}lr@{}l}
\hline
Patterns&\multicolumn{2}{l}{\% good}&\multicolumn{2}{l}{\% bad}\\
\hline
Danish&80.62&&17.68\\
German&82.88&&14.59\\
Dutch&79.82&&12.35\\
Portuguese&79.01&&15.08\\
Finnish&81.83&&10.26\\
US English&45.69&&16.63\\
French&45.20&&59.93\\
\hline
\end{tabular}
\end{center}
\caption{\label{table:languagetest}Patterns tested on the Swedish hyphenated dictionary.}
\end{table}

\subsection{Use iterative improvement}

Using the iterative improvement which I've described seems to lead to good patterns more quickly than any other method.

\section{Conclusion}

Two years ago someone told me that it would be impossible to use \TeX's hyphenation for Swedish, because of all the compound words we have. I asked the person to elaborate, and the answer I got was that it was impossible because the patterns would be very large and that would make the hyphenation terribly slow.
My comment to that was that it doesn't matter if it takes the computer a fraction of a second longer per page, as long as the hyphenation is good, because no human being can hyphenate faster than that anyway. I have forgotten who it was, but I would like to thank that person for encouraging me. I love to prove people wrong when they use the word ``impossible'' carelessly. We now have a dictionary with 100,000 hyphenated words. 14,000 of them are compound words. The patterns generated from that dictionary find 98.85\% of the hyphens and don't generate any which aren't there. I've tested the patterns on a random sample of 100 words which are not in the hyphenated dictionary. They found 269 hyphens out of 278 (96.76\%), missed 9 (3.24\%), and generated 3 incorrect hyphens (1.08\%). Since most common words are in the hyphenated dictionary, this should give an error rate well below 1\% for ordinary text. The Swedish patterns are still smaller than the US English patterns. I estimate that they will grow to about the same size or become slightly larger than the US English ones. \begin{thebibliography}{9} \bibitem{knuth:texbook} Donald~E. Knuth. \newblock {\em The \TeX book}. \newblock Addison Wesley, Reading, Massachusetts, 1986. \bibitem{liang:hyphenation} Franklin~M. Liang. \newblock {\em Word Hy-phen-a-tion by Com-put-er}. \newblock Stanford University, Stanford, California, 1983. \newblock Report No. STAN-CS-83-977. 
\end{thebibliography} \end{document} From @MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:mattes@azu.informatik.uni-stuttgart.de Mon Dec 23 05:38:13 1991 Flags: 000000000001 Return-Path: <@MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:mattes@azu.informatik.uni-stuttgart.de> Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA16589; Mon, 23 Dec 91 05:38:10 MST Received: from ifi.informatik.uni-stuttgart.de by MATH.AMS.COM via SMTP with TCP; Mon, 23 Dec 91 07:24:11-EDT Received: from azu.informatik.uni-stuttgart.de by ifi.informatik.uni-stuttgart.de with SMTP; Mon, 23 Dec 91 13:17:16 +0100 From: Eberhard Mattes Date: Mon, 23 Dec 91 13:23:32 +0100 Message-Id: <9112231223.AA18457@azu.informatik.uni-stuttgart.de> Received: by azu.informatik.uni-stuttgart.de; Mon, 23 Dec 91 13:23:32 +0100 To: tex-implementors@math.ams.com Subject: overflow when converting real -> integer (scaled) % This TeX input causes an overflow in the following line of |vlist_out|: % % rule_ht:=rule_ht+round(float(glue_set(this_box))*stretch(g)); % % The |round| Pascal function usually prints an error message and aborts % if the |real| value cannot be converted to an |integer| (as in this case). % % What should a decent TeX implementation do? % % Does TeX need a special |round| implementation which uses some $x$ or % $-x$ (depending on the sign of the argument) on overflow, where both $x$ % and $-x$ are |integer| values (|maxint|, for instance)? But that `solution' % would break other combinations of values. Is there any real solution? % % DEK writes in glue.web: % % surely we need not bother trying to accommodate such anomalous % combinations of values. % % but there are real-life texts, where this happens, see below. LaTeX uses % % \vskip 0pt plus .0001fil % % in \raggedbottom! (Maybe that can be changed.) 
%
% The behavior of TeX is machine/implementation-dependent for this file,
% maybe @^Overflow in arithmetic@> should be added to tex.web for that line
% and similar lines.
%

\shipout
\vbox to 100pt{%
  \vskip0pt plus 1fil
  \vskip0pt plus -1fil
  \vskip0pt plus 0.00001fil
}
\end

%
% Here's a short LaTeX/REVTEX input file which shows the problem (the
% problem is not due to the almost empty tabular environment!):
%

\documentstyle[revtex]{aps}
\begin{document}
\begin{table}
\begin{tabular}{rr}
\\
\end{tabular}
\end{table}
\pagebreak
\end{document}

%
% Yours,
% Eberhard Mattes (mattes@azu.informatik.uni-stuttgart.de)
%

From @MATH.AMS.COM,@blue.arbortext.com:jjs@arbortext.com Mon Dec 23 09:15:12 1991
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@blue.arbortext.com:jjs@arbortext.com>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server)
    id AA17052; Mon, 23 Dec 91 09:14:50 MST
Received: from blue.arbortext.com by MATH.AMS.COM via SMTP with TCP;
    Mon, 23 Dec 91 10:48:44-EDT
Received: by blue.arbortext.com (5.54/ati-3.1)
    id AA01630; Mon, 23 Dec 91 10:41:40 EST
Received: by ironwood.arbortext.com (4.1/ati-3.0)
    id AA22242; Mon, 23 Dec 91 10:35:26 EST
Date: Mon, 23 Dec 91 10:35:26 EST
From: jjs@arbortext.com
Message-Id: <9112231535.AA22242@ironwood.arbortext.com>
To: mattes@azu.informatik.uni-stuttgart.de, tex-implementors@math.ams.com
Subject: Re: overflow when converting real -> integer (scaled)
Cc: jjs@arbortext.com

Greetings,

We ran into this quite a while back as well at ArborText. I reported the problem to Knuth, whose response basically was: "Why would you ever want to do that? That's really not a bug in tex.web, it's an implementation restriction that you're more than welcome to fix up in your change files." Turns out we had quite a few legitimate reasons to want to do that as well...

The only "correct" way to solve this would be to use 8-byte long arithmetic for these computations. Short of that, we did the following.
Note that the problem shows up in both |hlist_out| and |vlist_out|. @x Module 625 @ @= begin g:=glue_ptr(p); rule_wd:=width(g); if g_sign<>normal then begin if g_sign=stretching then begin if stretch_order(g)=g_order then rule_wd:=rule_wd+round(float(glue_set(this_box))*stretch(g)); @^real multiplication@> end else begin if shrink_order(g)=g_order then rule_wd:=rule_wd-round(float(glue_set(this_box))*shrink(g)); end; end; if subtype(p)>=a_leaders then @; goto move_past; end @y4 @ For Pub\TeX, we add some logic to the step where the glue set is multiplied by the stretch/shrink. The product can be greater than |max_int| in certain situations. To prevent an integer overflow on the subsequent |round| operation, truncate large products to |max_allowed_glue|. The rationale for picking the value of |max_allowed_glue| was simply to use a very large number approaching |maxdimen|. There is the potential that this truncation may result in incorrect results, but our experience has shown that with glue values this large the truncation makes little difference and is much preferable to simply letting an integer overflow exception happen. 
@d max_allowed_glue==float_constant(100000000) @= begin g:=glue_ptr(p); rule_wd:=width(g); if g_sign<>normal then begin if g_sign=stretching then begin if stretch_order(g)=g_order then begin g_temp:=float(glue_set(this_box))*stretch(g); @^real multiplication@> if abs(g_temp)>max_allowed_glue then begin if g_temp > float_constant(0) then g_temp:=max_allowed_glue else g_temp:=-max_allowed_glue; end; rule_wd:=rule_wd+round(g_temp); end; end else begin if shrink_order(g)=g_order then begin g_temp:=float(glue_set(this_box))*shrink(g); @^real multiplication@> if abs(g_temp)>max_allowed_glue then begin if g_temp > float_constant(0) then g_temp:=max_allowed_glue else g_temp:=-max_allowed_glue; end; rule_wd:=rule_wd-round(g_temp); end; end; end; if subtype(p)>=a_leaders then @; goto move_past; end @z @x Module 629 @!g_order: glue_ord; {applicable order of infinity for glue} @!g_sign: normal..shrinking; {selects type of glue} @y4 @!g_order: glue_ord; {applicable order of infinity for glue} @!g_sign: normal..shrinking; {selects type of glue} @!g_temp: real; {used while stretching/shrinking glue} @z @x Module 634 @ @= begin g:=glue_ptr(p); rule_ht:=width(g); if g_sign<>normal then begin if g_sign=stretching then begin if stretch_order(g)=g_order then rule_ht:=rule_ht+round(float(glue_set(this_box))*stretch(g)); @^real multiplication@> end else begin if shrink_order(g)=g_order then rule_ht:=rule_ht-round(float(glue_set(this_box))*shrink(g)); end; end; if subtype(p)>=a_leaders then @; goto move_past; end @y4 @ See the comments in |hlist_out| for an explanation of the Pub\TeX\ related changes made to the way the floating point multiplication is done in this module. 
@= begin g:=glue_ptr(p); rule_ht:=width(g); if g_sign<>normal then begin if g_sign=stretching then begin if stretch_order(g)=g_order then begin g_temp:=float(glue_set(this_box))*stretch(g); @^real multiplication@> if abs(g_temp)>max_allowed_glue then begin if g_temp > float_constant(0) then g_temp:=max_allowed_glue else g_temp:=-max_allowed_glue; end; rule_ht:=rule_ht+round(g_temp); end; end else begin if shrink_order(g)=g_order then begin g_temp:=float(glue_set(this_box))*shrink(g); @^real multiplication@> if abs(g_temp)>max_allowed_glue then begin if g_temp > float_constant(0) then g_temp:=max_allowed_glue else g_temp:=-max_allowed_glue; end; rule_ht:=rule_ht-round(g_temp); end; end; end; if subtype(p)>=a_leaders then @; goto move_past; end @z Jim Sterken ArborText, Inc. ============================================================ > From @MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:mattes@azu.informatik.uni-stuttgart.de Mon Dec 23 07:35:18 1991 > Received: by blue.arbortext.com (5.54/ati-3.1) > id AA01110; Mon, 23 Dec 91 07:25:50 EST > Received: from ifi.informatik.uni-stuttgart.de by MATH.AMS.COM via SMTP > with TCP; Mon, 23 Dec 91 07:24:11-EDT > Received: from azu.informatik.uni-stuttgart.de by > ifi.informatik.uni-stuttgart.de with SMTP; Mon, > 23 Dec 91 13:17:16 +0100 > From: Eberhard Mattes > Date: Mon, 23 Dec 91 13:23:32 +0100 > Message-Id: <9112231223.AA18457@azu.informatik.uni-stuttgart.de> > Received: by azu.informatik.uni-stuttgart.de; Mon, > 23 Dec 91 13:23:32 +0100 > To: tex-implementors@math.ams.com > Subject: overflow when converting real -> integer (scaled) > Status: R > > % This TeX input causes an overflow in the following line of |vlist_out|: > % > % rule_ht:=rule_ht+round(float(glue_set(this_box))*stretch(g)); > % > % The |round| Pascal function usually prints an error message and aborts > % if the |real| value cannot be converted to an |integer| (as in this case). > % > % What should a decent TeX implementation do? 
> % > % Does TeX need a special |round| implementation which uses some $x$ or > % $-x$ (depending on the sign of the argument) on overflow, where both $x$ > % and $-x$ are |integer| values (|maxint|, for instance)? But that `solution' > % would break other combinations of values. Is there any real solution? > % > % DEK writes in glue.web: > % > % surely we need not bother trying to accommodate such anomalous > % combinations of values. > % > % but there are real-life texts, where this happens, see below. LaTeX uses > % > % \vskip 0pt plus .0001fil > % > % in \raggedbottom! (Maybe that can be changed.) > % > % The behavior of TeX is machine/implementation-dependent for this file, > % maybe @^Overflow in arithmetic@> should be added to tex.web for that line > % and similar lines. > % > > \shipout > \vbox to 100pt{% > \vskip0pt plus 1fil > \vskip0pt plus -1fil > \vskip0pt plus 0.00001fil > } > \end > > > % > % Here's a short LaTeX/REVTEX input file which shows the problem (the > % problem is not due to the almost empty tabular environment!): > % > > \documentstyle[revtex]{aps} > \begin{document} > \begin{table} > \begin{tabular}{rr} > \\ > \end{tabular} > \end{table} > \pagebreak > \end{document} > > % > % Yours, > % Eberhard Mattes (mattes@azu.informatik.uni-stuttgart.de) > % > From @MATH.AMS.COM,@sun2.nsfnet-relay.ac.uk:CA_ROWLEY@vax.acs.open.ac.uk Sun Dec 29 20:23:21 1991 Flags: 000000000001 Return-Path: <@MATH.AMS.COM,@sun2.nsfnet-relay.ac.uk:CA_ROWLEY@vax.acs.open.ac.uk> Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA00471; Sun, 29 Dec 91 20:23:15 MST Message-Id: <9112300323.AA00471@math.utah.edu> Received: from sun2.nsfnet-relay.ac.uk by MATH.AMS.COM via SMTP with TCP; Sun, 29 Dec 91 18:25:55-EDT Received: from vax.acs.open.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <9251-0@sun2.nsfnet-relay.ac.uk>; Sun, 29 Dec 1991 23:19:10 +0000 Date: Sun, 29 DEC 91 23:22:49 GMT From: CA_ROWLEY@vax.acs.open.ac.uk To: 
    tex-implementors <@nsfnet-relay.ac.uk:tex-implementors@math.ams.com>
Subject: RE: overflow when converting real -> integer (scaled)
Sender: JANET "CA_ROWLEY@UK.AC.OPEN.ACS.VAX"

A few observations on this problem.

1. Despite the use of:

   \vskip 0pt plus 0.0001fil

*pure* LaTeX will not produce this problem (unless the actual page-size and, hence, the size of some glue really gets too big). However, the example below shows that the run-time error can be produced by the use of \filbreak, a command that is not documented in LaTeX but is nevertheless available (and useful). It (\filbreak) has been in frequent use within our LaTeX-based system for about 5 years now, and has only produced this run-time error when running test documents: i.e., it is not, in practice, a problem. Here is the type of LaTeX+\filbreak document which does produce it.

\documentstyle{article}
\raggedbottom
\begin{document}
Small amount of text.
\filbreak
Small amount of text.
\par
\pagebreak
Next page
\end{document}

[I have not yet looked at the REVTEX styles to see why they produce it, as documented by Eberhard.]

2. Although the Arbortext idea will work for Eberhard's example:

> \shipout
> \vbox to 100pt{%
>  \vskip 0pt plus 1fil
>  \vskip 0pt plus -1fil
>  \vskip 0pt plus 0.00001fil
> }
> \end

it will produce completely incorrect layout for this, similar, example:

\shipout
\vbox to 100pt{%
  \vskip 0pt plus 0.00001fil
  \hbox{Text.}
  \vskip 0pt plus 1.00001fil
  \vskip 0pt plus -1fil}
\end

This is not intended to imply that their solution is not good enough: it will certainly be OK for all the code in use which I know of (the above example was constructed, in the true spirit of numerical analysis, explicitly to break their solution!). What it does illustrate is that certain pieces of TeX code lead to such inherently ill-conditioned calculations; thus nothing short of higher precision arithmetic will solve the problem in general.

3.
What is needed in the TeX code is something such as Arbortext's solution PLUS something which writes in the LOG file to record where the value was changed and the fact that the spacing may therefore be incorrect. This would be better than implementation-dependent code but could be open to the criticism that it does not allow an implementation to use higher precision arithmetic as its solution.

4. What is needed elsewhere is education of TeX programmers which:

   A. Describes why certain types of TeX code are dangerous in that they can lead to such calculations.

   B. Explains what the TeX program does about the problem and what effects this can have (e.g., how it can give incorrect layout).

   C. For known dangerous uses of such code, shows how to change the TeX coding so that it still does what is needed whilst avoiding the problems.

5. To implement 4C, we need to know what sort of code is in use which either has produced, or may one day produce, this error (I wonder how many times it has been seen and not reported?). For example, what are Arbortext's uses?

chris

From @MATH.AMS.COM,@ICNUCEVM.CNUCE.CNR.IT:WSULIVAN@IRLEARN.UCD.IE Fri Jan 3 06:00:01 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@ICNUCEVM.CNUCE.CNR.IT:WSULIVAN@IRLEARN.UCD.IE>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server)
    id AA25889; Fri, 3 Jan 92 05:59:56 MST
Message-Id: <9201031259.AA25889@math.utah.edu>
Received: from ICNUCEVM.CNUCE.CNR.IT by MATH.AMS.COM via SMTP with TCP;
    Fri, 3 Jan 92 07:50:49-EDT
Received: from IRLEARN.UCD.IE by ICNUCEVM.CNUCE.CNR.IT (IBM VM SMTP V2R1)
    with BSMTP id 5426; Fri, 03 Jan 92 12:00:26 MET
Received: from IRLEARN.UCD.IE (WSULIVAN) by IRLEARN.UCD.IE (Mailer R2.08)
    with BSMTP id 3773; Fri, 03 Jan 92 10:18:16 GMT
Date: Fri, 03 Jan 92 09:46:25 GMT
From: "Wayne G.
 Sullivan"
Subject: glue ratio overflow
To: tex-implementors@math.ams.com

Glue ratio overflow can never occur with nonnegative glue entries provided that individual terms and their sums do not exceed TeX's max dimen value (the `minus' in glue specifications refers to shrink, not negation). Negative glue values are `tricks', but there are important applications. When negative values are used, overflow is possible at any fill level, including `normal'. In practical cases overflow will not occur unless the total is nearly, but not exactly, zero, and relatively large glue entries occur. No range or accuracy of floating point arithmetic can overcome glue ratio overflow without changing the basics of TeX arithmetic.

A glue value of the form

   \vskip 10pt plus 0.0001 fil

will cause problems only when the additional trick of the form

   \vskip 0pt plus 1 fil \penalty -500 \vskip 0pt plus -1 fil

is used. Since TeX does have 3 levels of infinity for fil, is it wise to use 0.0001 fil? The basic LaTeX files never use filll; if a skip less than 1 fil is necessary, perhaps it would have been better to have used fill and filll instead of fil and fill in the basic files. To avoid glue ratio overflow one need only ensure that the use of teeny fil's does not occur with plus 1 fil and plus -1 fil, or better, make these tricks at a different fil level.

From BNB@MATH.AMS.COM Fri Jan 3 14:55:49 1992
Flags: 000000000001
Return-Path:
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server)
    id AA29406; Fri, 3 Jan 92 14:55:46 MST
Date: Fri 3 Jan 92 16:43:08-EST
From: bbeeton
Subject: flash!!! notice that tex update is about to happen
To: tex-implementors@MATH.AMS.COM
Message-Id: <694474988.0.BNB@MATH.AMS.COM>
In-Reply-To:

i've just received this message from don knuth's secretary. if anyone has any bug reports they haven't forwarded yet, please send them to me at once. before the weekend is over, i'll be sending out everything i haven't already forwarded.

    Barbara, Don dashed in and asked me to send you the following message:

    Updating TeX -- send everything you have.

this means texbook, plain.tex, metafontbook, etc., etc., in addition to all the .web files.

-- bb
-------

From @MATH.AMS.COM,@rs2.hrz.th-darmstadt.de:schrod@iti.informatik.th-darmstadt.de Fri Jan 10 10:07:03 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@rs2.hrz.th-darmstadt.de:schrod@iti.informatik.th-darmstadt.de>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server)
    id AA25728; Fri, 10 Jan 92 10:06:57 MST
Received: from rs2.hrz.th-darmstadt.de by MATH.AMS.COM via SMTP with TCP;
    Fri, 10 Jan 92 09:57:08-EDT
Received: from hp5.iti.informatik.th-darmstadt.de by rs2.hrz.th-darmstadt.de
    with SMTP id AA35086 (5.65c/IDA-1.4.4 for ); Fri, 10 Jan 1992 15:53:41 +0100
Received: by hp5.iti.informatik.th-darmstadt.de (15.11/Server-1.2/HRZ-THD)
    id AA21243; Fri, 10 Jan 92 15:53:58 mez
From: Joachim Schrod
Message-Id: <9201101453.AA21243@hp5.iti.informatik.th-darmstadt.de>
Subject: Re: FWD: MIME (Multimedia Mail draft standard)
To: DHOSEK@HMCVAX.CLAREMONT.EDU (Don Hosek)
Date: Fri, 10 Jan 92 15:53:57 MEZ
Cc: tex-implementors@math.ams.com
In-Reply-To: <01GF4RATLLWW9KM242@HMCVAX.CLAREMONT.EDU>; from "Don Hosek" at Jan 10, 92 12:37 am
X-Mailer: ELM [version 2.3 PL11]

You wrote:
>
> I have been asked to find someone willing to supply the necessary
> information to allow the appropriate TeX data types for exchange
> through the protocol described.

Sorry, but I don't understand what you (resp. the persons who asked you) want. Do you want a "text" subtype "TeX" to be defined, like "richtext"? (I bet that you will find nobody -- this would mean defining the abstract and the concrete syntax of TeX markup. Horrible for a language where the lexical structure may be defined by the user. And the definition of the dynamic binding rules is not simple either. Oh, but, if somebody does it, I would be interested to get the result...)
Btw, concerning the mail of barbara, I've put the rfc-draft on ftp.th-darmstadt.de, directory pub/incoming/schrod. (I will remove it in, let's say, two weeks.) -- Joachim =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Joachim Schrod Email: schrod@iti.informatik.th-darmstadt.de Computer Science Department Technical University of Darmstadt, Germany From @MATH.AMS.COM,@FRIGGA.CLAREMONT.EDU:DHOSEK@HMCVAX.CLAREMONT.EDU Fri Jan 10 10:07:41 1992 Flags: 000000000001 Return-Path: <@MATH.AMS.COM,@FRIGGA.CLAREMONT.EDU:DHOSEK@HMCVAX.CLAREMONT.EDU> Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AB25728; Fri, 10 Jan 92 10:07:03 MST Received: from FRIGGA.CLAREMONT.EDU by MATH.AMS.COM via SMTP with TCP; Fri, 10 Jan 92 03:40:33-EDT Received: from HMCVAX.CLAREMONT.EDU by HMCVAX.CLAREMONT.EDU (PMDF #11000) id <01GF4RATLLWW9KM242@HMCVAX.CLAREMONT.EDU>; Fri, 10 Jan 1992 00:37 PDT Date: Fri, 10 Jan 1992 00:37 PDT From: Don Hosek Subject: FWD: MIME (Multimedia Mail draft standard) To: tex-implementors@math.ams.com Message-Id: <01GF4RATLLWW9KM242@HMCVAX.CLAREMONT.EDU> X-Vms-To: tex_implementors I have been asked to find someone willing to supply the necessary information to allow the appropriate TeX data types for exchange through the protocol described. INTERNET DRAFT -- RFC-XXXX MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies Nathaniel Borenstein, Bellcore Ned Freed, Innosoft January, 1992 Status of this Document This draft document will be submitted to the RFC editor as a Proposed Standard protocol specification. Distribution of this document is unlimited. Please send comments to Nathaniel Borenstein or Ned Freed . Experimentation with the mechanisms described in this document is encouraged. 
It is anticipated that such experimentation will take place during the first half of 1992, after which this document will be revised and submitted as a Draft Standard.

Abstract

RFC 822 defines a message representation protocol which specifies considerable detail about message headers, but which leaves the message content, or message body, as flat ASCII text. This document redefines the format of message bodies to allow multi-part textual and non-textual message bodies to be represented and exchanged without loss of information. This is based on earlier work documented in RFC 934 and RFC 1049, but extends and revises that work. Because RFC 822 said so little about message bodies, this document is largely orthogonal to (rather than a revision of) RFC 822.

In particular, this document is designed to provide facilities to include multiple objects in a single message, to represent body text in character sets other than US-ASCII, to represent formatted multi-font text messages, to represent non-textual material such as images and audio fragments, and generally to facilitate later extensions defining new types of Internet mail for use by cooperating mail agents.

This document does NOT extend Internet mail header fields to permit anything other than US-ASCII text data. It is recognized that such extensions are necessary, and they are the subject of a companion document [RFC-HDRS].

INTERNET DRAFT Internet Message Body Format 2

A table of contents appears at the end of this document.

1 Introduction

Since its publication in 1982, RFC 822 [RFC-822] has defined the standard format of textual mail messages on the Internet. Its success has been such that the RFC 822 format has been adopted, wholly or partially, well beyond the confines of the Internet and the Internet SMTP transport defined by RFC 821 [RFC-821]. As the format has seen wider use, a number of limitations have proven increasingly restrictive for the user community.

RFC 822 was intended to specify a format for text messages. As such, non-text messages, such as multimedia messages that might include audio or images, are simply not mentioned. Even in the case of text, however, RFC 822 is inadequate for the needs of mail users whose languages require the use of character sets richer than US ASCII [US-ASCII]. For mail containing audio, video, Asian language text, or even text in most European languages, RFC 822 does not specify enough to provide interoperability.

One of the notable limitations of RFC 821/822 based mail systems is the fact that they limit the contents of electronic mail messages to relatively short lines of seven-bit ASCII. This forces users to convert any non-textual data that they may wish to send into seven-bit bytes representable as printable ASCII characters before invoking a local mail UA (User Agent, a program with which human users send and receive mail). Examples of such encodings currently used in the Internet include pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in RFC 1113, the Andrew Toolkit Representation [ATK], and many others.

The limitations of RFC 822 mail become even more apparent as gateways are designed to allow for the exchange of mail messages between RFC 822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for the inclusion of non-textual body parts within electronic mail messages. The current standards for the mapping of X.400 messages to RFC 822 messages specify that either X.400 non-textual body parts should be converted to (not encoded in) an ASCII format, or that they should be discarded, notifying the RFC 822 user that discarding has occurred. This is clearly undesirable, as information that a user may wish to receive is lost. Even though a user's UA may not have the capability of dealing with the non-textual body part, the user might have some mechanism external to the UA that can extract useful information from the body part.
Moreover, it does not allow for the fact that the message may eventually be gatewayed back into an X.400 message handling system (i.e., the X.400 message is "tunneled" through Internet mail), where the non-textual information would definitely become useful again.

This document describes several mechanisms that combine to solve most of these problems without introducing any serious incompatibilities with the existing world of RFC 822 mail. In particular, it describes:

1. A MIME-Version header field, which uses a version number to declare a message to be conformant with this specification and allows mail processing agents to distinguish between such messages and those generated by older or non-conformant software, which is presumed to lack such a field.

2. A Content-Type header field, generalized from RFC 1049 [RFC-1049], which can be used to specify the type and subtype of data in the body of a message and to fully specify the native representation (encoding) of such data.

   2.a. A "text" Content-Type value, which can be used to represent textual information in a number of character sets and formatted text description languages in a standardized manner.

   2.b. A "multipart" Content-Type value, which can be used to combine several body parts, possibly of differing types of data, into a single message.

   2.c. An "application" Content-Type value, which can be used to transmit application data or binary data, and hence, among other uses, to implement an email file transfer service.

   2.d. A "message" Content-Type value, for encapsulating a mail message.

   2.e. An "image" Content-Type value, for transmitting still image (picture) data.

   2.f. An "audio" Content-Type value, for transmitting audio or voice data.

   2.g. A "video" Content-Type value, for transmitting video, or moving image data, possibly with audio as part of the composite video data format.

3.
A Content-Transfer-Encoding header field, which can be used to specify an auxiliary encoding that was applied to the data in order to allow it to pass through mail transport mechanisms which may have data or character set limitations.

4. Two optional header fields that can be used to further describe the data in a message body or body part, the Content-ID and Content-Description header fields.

Finally, to specify and promote interoperability, Appendix A of this document provides a basic applicability statement for a subset of the above mechanisms that defines a minimal level of "conformance" with this document.

HISTORICAL NOTE: Several of the mechanisms described in this document may seem somewhat strange or even baroque at first reading. It is important to note that compatibility with existing standards AND robustness across existing practice were two of the highest priorities of the working group that developed this document. In particular, compatibility was always favored over elegance.

2 Notations, Conventions, and Generic BNF Grammar

This document is being published in two versions, one as plain ASCII text and one as PostScript. The latter is recommended, though the textual contents are identical. An Andrew-format copy of this document is also available from the first author (Borenstein).

Although the mechanisms specified in this document are all described in prose, most are also described formally in the modified BNF notation of RFC 822. Implementors will need to be familiar with this notation in order to understand this specification, and are referred to RFC 822 for a complete explanation of the modified BNF notation.

Some of the modified BNF in this document makes reference to syntactic entities that are defined, not in this document, but in RFC 822. Therefore RFC 822 is required for a complete grammar. Like RFC 822, this document has an appendix that is a collected grammar.
A complete formal grammar, then, is obtained by combining the collected grammar appendix of this document with that of RFC 822.

The term CRLF, in this document, refers to the sequence of the two ASCII characters CR (13) and LF (10) which, taken together, denote a line break in RFC 822 mail.

The term "character set", wherever it is used in this document, refers to a coded character set, in the sense of ISO character set standardization work, and should not be misinterpreted as meaning "a set of characters."

In this document, all numeric and octet values are given in decimal notation.

It should be noted that Content-Type values, subtypes, and parameter names as defined in this document are case-insensitive. However, parameter values are case-sensitive.

FORMATTING NOTE: This document has been carefully formatted for ease of reading. The PostScript version of this document, in particular, places notes like this one, which may be skipped by the reader, in a smaller, italicized font, and indents them as well. In the text version, only the indentation is preserved, so if you are reading the text version of this you might consider using the PostScript version instead. However, all such notes will be indented and preceded by "NOTE:" or some similar introduction, even in the text version.

The primary purpose of these non-essential notes is to convey information about the rationale of this document, or to place this document in the proper historical or evolutionary context. Such information may be skipped by those who are focused entirely on building a compliant implementation, but may be of use to those who wish to understand why this document is written as it is.

For ease of recognition, all BNF definitions have been placed in a fixed-width font in the PostScript version of this document.
3 The MIME-Version Header Field

Since RFC 822 was published in 1982, there has really been only one format standard for Internet messages, and there has been little perceived need to declare the format standard in use. This document is an independent document that complements RFC 822. Although the extensions in this document have been defined in such a way as to be compatible with RFC 822, there are still circumstances in which it might be desirable for a mail-processing agent to know whether a message was composed with the new standard in mind.

Therefore, this document defines a new header field, "MIME-Version", which is to be used to declare the version of the Internet message body format standard in use.

Messages composed in accordance with this document MUST include such a header field, with the following verbatim text:

     MIME-Version: 1.0

The presence of this header field is an assertion that the message has been composed in compliance with this document.

Since it is possible that a future document might extend the message format standard again, a formal BNF is given for the content of the MIME-Version field:

     MIME-Version := text

Thus, future format specifiers, which might replace or extend "1.0", are constrained by the definition of "text", which appears in RFC 822.

Note that the MIME-Version header field is required at the top level of a message. It is not required for each body part of a multipart message. It is required for the embedded headers of a body or body part of type "message" if and only if the embedded message is itself claimed to be MIME-compliant.

4 The Content-Type Header Field

The purpose of the Content-Type field is to describe the data contained in the message body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner.
HISTORICAL NOTE: The Content-Type header field was first defined in RFC 1049. RFC 1049 Content-types used a simpler and less powerful syntax, but one that is largely compatible with the mechanism given here. However, most of the specific values for the Content-Type field that were defined by RFC 1049 have been replaced, in this document, with type/subtype pairs. A few types that were incompletely defined in RFC 1049, and never used in any known implementation, are omitted here, but could be reintroduced in the new type/subtype scheme without major difficulty.

The Content-Type header field is used to specify the nature of the data in a message, by giving type and subtype identifiers, and by providing auxiliary information that may be required for certain types. After the type and subtype names, the remainder of the header field is simply a set of parameters, specified in an attribute/value notation. The set of meaningful parameters differs for the different types. Among the defined parameters is a "charset" parameter by which the character set used in the message body or body part may be declared. Comments are allowed in accordance with RFC 822 rules for structured header fields.

In general, the top-level Content-Type is used to declare the general type of data, while the subtype specifies a specific format for that type of data. Thus, a Content-Type of "image/xyz" is enough to tell a user agent that the data is an image, even if the user agent has no knowledge of the specific image format "xyz". Such information can be used, for example, to decide whether or not to show a user the raw data from an unrecognized subtype -- such an action might be reasonable for unrecognized subtypes of text, but not for unrecognized subtypes of image or audio. For this reason, registered subtypes of audio, image, text, and video should not contain embedded information that is really of a different type.
Such compound types should be represented using the "multipart" or "application" types.

Parameters are modifiers of the content-subtype, and do not fundamentally affect the requirements of the host system. Although most parameters make sense only with certain content-types, others are "global" in the sense that they might apply to any subtype. For example, the "boundary" parameter makes sense only for the "multipart" content-type, but the "charset" parameter might make sense with several content-types.

An initial set of seven Content-Types is defined by this document. This set of top-level names is intended to be substantially complete. It is expected that additions to the larger set of supported types can generally be accomplished by the creation of new subtypes of these initial types. In the future, more top-level types may be defined only by an extension to this standard. If another primary type is to be used for any reason, it should be given a name starting with "X-" to indicate its non-standard status and to avoid a potential conflict with a future official name.

In the Extended BNF notation of RFC 822, a Content-Type header field value is defined as follows:

     Content-Type := type "/" subtype *[";" parameter]

     type := "application" / "audio"
           / "image"       / "message"
           / "multipart"   / "text"
           / "video"       / x-token

     x-token := <The two characters "X-" followed, with no
                intervening white space, by any token>

     subtype := token

     parameter := attribute "=" value

     attribute := token

     value := token / quoted-string

     token := 1*<any CHAR except SPACE, CTLs, or tspecials>

     tspecials := "(" / ")" / "<" / ">" / "@"   ; Must be in
                / "," / ";" / ":" / "\" / <">   ; quoted-string,
                / "/" / "[" / "]" / "?" / "."   ; to use within
                / "="                           ; parameter values

Note that the definition of "tspecials" is the same as the RFC 822 definition of "specials" with the addition of the three characters "/", "?", and "=".

Note also that a subtype specification is MANDATORY. There are no default subtypes.

The type, subtype, and parameter names are not case sensitive.
For example, TEXT, Text, and TeXt are all equivalent. Parameter values are normally case sensitive, but certain parameters are interpreted to be case-insensitive, depending on the intended use. (For example, multipart boundaries are case-sensitive, but the "access-type" for message/external-body is not case-sensitive.)

Beyond this syntax, the only constraint on the definition of subtype names is the desire that their uses must not conflict. That is, it would be undesirable to have two different communities using "Content-Type: application/foobar" to mean two different things. The process of defining new content-subtypes, then, is not intended to be a mechanism for imposing restrictions, but simply a mechanism for publicizing the usages. There are, therefore, two acceptable mechanisms for defining new Content-Type subtypes:

1. Private values (starting with "X-") may be defined bilaterally between two cooperating agents without outside registration or standardization.

2. New "Standard" values must be documented, registered with, and approved by the Internet Assigned Numbers Authority (IANA) at ISI, by email to IANA@ISI.EDU. Where intended for public use, the formats they refer to should also be defined by a published specification, and possibly offered for standardization.

The seven standard initial predefined Content-Types are detailed in the bulk of this document. They are:

text -- textual information. The primary subtype, "plain", indicates plain (unformatted) text. No special software is required to get the full meaning of the text, aside from support for the indicated character set. Subtypes are to be used for enriched text in forms where application software may enhance the appearance of the text, but such software must not be required in order to get the general idea of the message. Possible subtypes thus include any readable word processor format.
A very simple and portable subtype, richtext, is defined in this document.

multipart -- data consisting of multiple parts of independent data types. Four initial subtypes are defined, including the default "mixed" subtype, "alternative" for representing the same data in multiple formats, "parallel" for parts intended to be viewed simultaneously, and "digest" for multipart messages in which each part is of type "message".

message -- an encapsulated message. A body of Content-Type message is itself a fully formatted RFC 822 conformant message which may contain its own different Content-Type header field. The "partial" subtype is defined for partial messages, to permit the fragmented transmission of message bodies that are thought to be too large to be passed through mail transport facilities. Another subtype, "external-body", is defined for specifying large message bodies by reference to an external data source.

application -- some other kind of data, typically either uninterpreted binary data or information to be processed by a mail-based application. The primary subtype, "octet-stream", is to be used in the case of uninterpreted binary data, in which case the simplest recommended action is to offer to write the information into a file for the user. Two additional subtypes, "ODA" and "PostScript", are defined for transporting ODA and PostScript documents in message bodies. Other expected uses for "application" include spreadsheets, data for mail-based scheduling systems, and languages for "active" (computational) email.

image -- image data. Image requires a display device (such as a graphical display, a printer, or a FAX machine) to view the information. Initial subtypes are defined for several widely-used image formats, including jpeg, gif, G3fax, pbm, ppm, pgm, and TIFF-B-NetFax. The latter is recommended by the IETF Network Fax Working Group.

audio -- audio data, with initial subtype "basic".
Such messages contain information which requires an audio output device (such as a speaker or a telephone) to "display" the contents.

video -- video data. Video requires the capability to display moving images, typically including specialized hardware and software. The initial subtype is "mpeg".

Default RFC 822 messages are typed by this protocol as plain text in the US-ASCII character set, which can be explicitly specified as "Content-type: text/plain; charset=us-ascii". If no Content-Type is specified, either by error or by an older user agent, this default is assumed. In the presence of a MIME-Version header field, a receiving User Agent can also assume that plain US-ASCII text was the sender's intent. In the absence of a MIME-Version specification, plain US-ASCII text should still be assumed, but the sender's intent might have been otherwise.

RATIONALE: In the absence of any Content-Type header field or MIME-Version header field, it is impossible to be certain that a message is actually text in the US-ASCII character set, since it might well be a message that, using the conventions that predate this document, includes non-textual data in a manner that cannot be automatically recognized (e.g., a uuencoded compressed UNIX tar file). Although there is no fully acceptable alternative to treating such untyped messages as "text/plain; charset=us-ascii", implementors should remain aware that if a message lacks both the MIME-Version and the Content-Type header fields, it may in practice contain almost anything.

It should be noted that the list of Content-Type values given here may be augmented in time, via the mechanisms described above, and that the set of subtypes is expected to grow substantially.

When a mail reader encounters mail with an unknown Content-type value, it should generally treat it as equivalent to "application/octet-stream", as described later in this document.
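As an illustration of the Content-Type syntax and the defaulting rules of this section, the following Python sketch parses a field value into type, subtype, and parameters. It is not part of the specification. The function names, the KNOWN_TYPES set, and the tolerance for white space around ";" and "=" are assumptions made for the example, and RFC 822 comments are not handled. The token pattern assumes the RFC 822-style definition of a token as one or more ASCII characters other than SPACE, controls, and tspecials.

```python
import re

# Assumed token definition: printable ASCII minus SPACE and the tspecials.
TOKEN = r"[!#$%&'*+\-^_`{|}~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\\r]|\\.)*"'

TYPE_RE = re.compile(rf"\s*({TOKEN})/({TOKEN})")
PARAM_RE = re.compile(rf"\s*;\s*({TOKEN})\s*=\s*({TOKEN}|{QUOTED})")

# Hypothetical set of top-level types this example reader understands.
KNOWN_TYPES = {"application", "audio", "image", "message",
               "multipart", "text", "video"}


def parse_content_type(value):
    """Split a Content-Type field value into (type, subtype, params)."""
    match = TYPE_RE.match(value)
    if not match:
        # A subtype is mandatory, so a bare "text" is rejected here.
        raise ValueError("expected type/subtype: %r" % value)
    # Type, subtype, and parameter names are case-insensitive.
    ctype, subtype = match.group(1).lower(), match.group(2).lower()
    params = {}
    pos = match.end()
    while True:
        pmatch = PARAM_RE.match(value, pos)
        if not pmatch:
            break
        name, raw = pmatch.group(1).lower(), pmatch.group(2)
        if raw.startswith('"'):
            # Strip the quotes and undo backslash quoting.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw  # parameter values keep their case
        pos = pmatch.end()
    if value[pos:].strip():
        raise ValueError("unparsable text after parameters: %r" % value[pos:])
    return ctype, subtype, params


def effective_content_type(value):
    """Apply the defaults described above to a possibly missing field."""
    if value is None:
        return ("text", "plain", {"charset": "us-ascii"})
    ctype, subtype, params = parse_content_type(value)
    if ctype not in KNOWN_TYPES:
        # Unknown values are treated like uninterpreted binary data.
        return ("application", "octet-stream", {})
    return ctype, subtype, params
```

For instance, effective_content_type(None) yields the text/plain default with charset us-ascii, while a value such as "foo/bar" falls back to application/octet-stream in this sketch.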
5 The Content-Transfer-Encoding Header Field

Many Content-Types which could usefully be transported via e-mail are represented, in their "natural" format, as 8-bit character or binary data. Such data cannot be transmitted over some transport protocols. For example, both RFC 821 and RFC 822 restrict mail messages to 7-bit US-ASCII data with 1000 character lines.

It is necessary, therefore, to define a standard mechanism for re-encoding such data into a 7-bit short-line format. This document specifies that such encodings will be indicated by a new "Content-Transfer-Encoding" header field. The Content-Transfer-Encoding field is used to indicate the type of transformation that has been used in order to represent the message body part in an acceptable manner for transport.

Unlike Content-Types, a proliferation of Content-Transfer-Encoding values is undesirable and unnecessary. However, establishing only a single Content-Transfer-Encoding mechanism does not seem possible. There is a tradeoff between the desire for a compact and efficient encoding of largely-binary data and the desire for a readable encoding of data that is mostly, but not entirely, 7-bit data. For this reason, at least two encoding mechanisms are necessary: a "readable" encoding and a "dense" encoding.

The Content-Transfer-Encoding field is designed to specify an invertible mapping between the "native" representation of a type of data and a representation that can be readily exchanged using 7 bit mail transport protocols, such as those defined by RFC 821 (SMTP). This field has not been defined by any previous standard. The field's value is a single token specifying the type of encoding, as enumerated below. Formally:

     Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                                  "8BIT"   / "7BIT" /
                                  "BINARY" / x-token

These values are not case sensitive. That is, Base64 and BASE64 and bAsE64 are all equivalent.
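A receiving agent might check a Content-Transfer-Encoding value against this grammar along the following lines. This is only a sketch in Python. The function name is hypothetical, and an x-token is assumed to be the two characters "X-" followed by a token made of ASCII characters other than SPACE, controls, and tspecials.

```python
import re

STANDARD_ENCODINGS = {"base64", "quoted-printable", "8bit", "7bit", "binary"}

# Assumed x-token shape: "X-" followed by a token.
X_TOKEN_RE = re.compile(r"x-[!#$%&'*+\-^_`{|}~0-9A-Za-z]+\Z", re.IGNORECASE)


def normalize_transfer_encoding(value):
    """Return the lower-cased encoding token, or raise ValueError."""
    token = value.strip().lower()  # the values are not case sensitive
    if token in STANDARD_ENCODINGS or X_TOKEN_RE.match(token):
        return token
    raise ValueError("not a valid Content-Transfer-Encoding: %r" % value)
```

Lower-casing before the comparison makes "Base64", "BASE64", and "bAsE64" indistinguishable, which is the behaviour the text asks for.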
An encoding type of 7BIT requires that the message is already in a seven-bit mail-ready representation. This is the default value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the Content-Transfer-Encoding header field is not present.

The difference between "8bit" (or any other conceivable bit-width token) and the "binary" token is that "binary" does not require adherence to any limits on line length or to the SMTP CR/LF semantics, while the bit-width tokens do require such adherence. If the message contains data in any bit-width other than 7-bit, the appropriate bit-width Content-Transfer-Encoding token must be used (e.g., "8bit" for unencoded 8 bit wide data). If the message contains binary data, the "binary" Content-Transfer-Encoding token must be used.

NOTE: The distinction between the Content-Transfer-Encoding values of "binary," "8bit," etc. may seem unimportant, in that all of them really mean "none" -- that is, there has been no encoding of the data for transport. However, clear labeling will be of enormous value to gateways between future mail transport systems with differing capabilities in transporting data that does not meet the restrictions of RFC 821 transport.

As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the "8bit" or "binary" Content-Transfer-Encoding is actually legal on the Internet. However, in the event that 8-bit or binary mail transport becomes a reality in Internet mail, or when this document is used in conjunction with any other 8-bit or binary-capable transport mechanism, 8-bit or binary messages should be labelled as such using this mechanism.
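The labelling rules above can be pictured with a small Python sketch that picks the token for unencoded data and applies the 7BIT default when the field is absent. The function names and the lower-cased header dictionary are hypothetical. The 998-octet limit assumes that the 1000-character line limit mentioned earlier counts the terminating CRLF, and the test for "SMTP CR/LF semantics" is simplified to "no bare CR or LF inside a line".

```python
def label_unencoded(body, max_line=998):
    """Choose "7bit", "8bit", or "binary" for data sent without encoding."""
    lines = body.split(b"\r\n")
    for line in lines:
        # Over-long lines or stray CR/LF octets break the line-oriented
        # assumptions, so only "binary" describes such data.
        if len(line) > max_line or b"\r" in line or b"\n" in line:
            return "binary"
    if any(octet > 127 for octet in body):
        return "8bit"
    return "7bit"


def transfer_encoding(headers):
    """Look up the field in a dict keyed by lower-cased field names."""
    # "7BIT" is assumed when the header field is not present.
    return headers.get("content-transfer-encoding", "7BIT").strip().lower()
```

A gateway could use such a classification to decide whether a body must be re-encoded before it is handed to a 7-bit transport.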
NOTE: The five values defined for the Content-Transfer-Encoding field imply nothing about the Content-Type other than the algorithm by which it was encoded or the transport system requirements if unencoded.

Implementors may, if necessary, define new Content-Transfer-Encoding values, but must use an x-token, which is a name prefixed by "X-" to indicate its non-standard status, e.g., "Content-Transfer-Encoding: x-my-new-encoding". However, unlike Content-Types and subtypes, the creation of new Content-Transfer-Encoding values is explicitly and strongly discouraged, as it seems likely to hinder interoperability with little potential benefit. Their use is allowed only as the result of an agreement between cooperating user agents.

If a Content-Transfer-Encoding header field appears as part of a message header, it applies to the entire message body. If a Content-Transfer-Encoding header field appears as part of a multipart message body part's headers, it applies only to the body part. If the body part is itself of type "multipart" or "message", the Content-Transfer-Encoding is not permitted to have any value other than a bit width (e.g., "7bit", "8bit", etc.) or "binary".

It should be noted that email is character-oriented, so that the mechanisms described here are mechanisms for encoding arbitrary byte streams, not bit streams. If a bit stream is to be encoded via one of these mechanisms, it must first be converted to an 8-bit byte stream using the network standard bit order ("big-endian"), in which the earlier bits in a stream become the higher-order bits in a byte. A bit stream not ending at an 8-bit boundary should be padded with zeroes. This document provides a mechanism for noting the addition of such padding in the case of the application Content-Type, which has a "padding" parameter.

The encoding mechanisms defined here explicitly encode all data in ASCII.
Thus, for example, if a message has header fields such as:

     Content-Type: text/plain; charset=ISO-8859-1
     Content-transfer-encoding: base64

This should be interpreted to mean that the message body (or body part) is a base64 ASCII encoding of data that was originally in ISO-8859-1, and will be in that character set again after decoding.

The following sections will define the two standard encoding mechanisms. The definition of new content-transfer-encodings is explicitly discouraged and should only occur when absolutely necessary. All content-transfer-encoding namespace except that beginning with "X-" is explicitly reserved to the IANA for future use. Private agreements about content-transfer-encodings are also explicitly discouraged.

Certain Content-Transfer-Encoding values may only be used on certain Content-Types. In particular, it is expressly forbidden to use any encodings other than "7bit", "8bit", or "binary" with any Content-Type that recursively includes other Content-Type fields, notably the "multipart" and "message" Content-Types. All encodings that are desired for bodies of type multipart or message must be done at the innermost level, by encoding the actual body part that needs to be encoded.

NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using content-transfer-encodings on data of type multipart or message may seem overly restrictive, it is necessary to prevent nested encodings, in which data are passed through an encoding algorithm multiple times, and must be decoded multiple times in order to be properly viewed. Nested encodings add considerable complexity to user agents: aside from the obvious efficiency problems with such multiple encodings, they can obscure the basic structure of a message. In particular, they can imply that several decoding operations are necessary simply to find out what types of objects a message contains.
Banning nested encodings may complicate the job of certain mail gateways, but this seems less of a problem than the effect of nested encodings on user agents.

NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER-ENCODING: It may seem that the Content-Transfer-Encoding could be inferred from the characteristics of the Content-Type that is to be encoded, or, at the very least, that certain Content-Transfer-Encodings could be mandated for use with specific Content-Types. There are several reasons why this is not the case. First, given the varying types of transports used for mail, some encodings may be appropriate for some Content-Type/transport combinations and not for others. (For example, in an 8-bit transport, no encoding would be required for text in European character sets, while such encodings are clearly required for 7-bit SMTP.) Second, certain Content-Types may require different types of transfer encoding under different circumstances. For example, many PostScript messages might consist entirely of short lines of 7-bit data and hence require little or no encoding. Other PostScript messages (especially those using Level 2 PostScript's binary encoding mechanism) may only be reasonably represented using a binary transport encoding. Finally, since Content-Type is intended to be an open-ended specification mechanism, strict specification of an association between Content-Types and encodings effectively couples the specification of an application protocol with a specific lower-level transport. This is not desirable since the developers of a Content-Type should not have to be aware of all the transports in use and what their limitations are.

5.1 Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the ASCII character set.
It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly ASCII text, the encoded form of the data remains largely recognisable by humans. A message which is entirely ASCII may also be encoded in Quoted-Printable to ensure the integrity of the data should the message pass through a character-translating and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined by the following rules:

Rule #1: (General 8-bit representation) Any octet, except those indicating a line break according to the local newline convention, may be represented by an "=" followed by a two digit hexadecimal representation of the octet's value. The digits of the hexadecimal alphabet, for this purpose, are "0123456789ABCDEF". Uppercase letters must be used when sending hexadecimal data, though a robust implementation may choose to recognize lowercase letters on receipt. Thus, for example, the value 12 (ASCII form feed) can be represented by "=0C", and the value 61 (ASCII EQUAL SIGN) can be represented by "=3D". Except when the following rules allow an alternative encoding, this rule is mandatory.

Rule #2: (Literal representation) Octets with decimal values of 33 through 60 inclusive, and 62 through 126, inclusive, MAY be represented as the ASCII characters which correspond to those octets (EXCLAMATION POINT through LESS THAN, and GREATER THAN through TILDE, respectively).

Rule #3: (White Space): Octets with values of 9 and 32 MAY be represented as ASCII TAB (HT) and SPACE characters, respectively, but MUST NOT be so represented at the end of an encoded line. Any TAB (HT) or SPACE characters on an encoded line MUST thus be followed on that line by a printable character. In particular, an "=" at the end of an encoded line, indicating a soft line break (see rule #5) may follow one or more TAB (HT) or SPACE characters.
It follows that octets with values 9 and 32 appearing at the end of an encoded line must be represented according to Rule #1. This rule is necessary because some MTAs (Message Transport Agents, programs which transport messages from one user to another, or perform a part of such transfers) are known to pad lines of text with SPACEs, and others are known to remove "white space" characters from the end of a line. Therefore, when decoding a Quoted-Printable message, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.

Rule #4 (Line Breaks): A line break, whatever its representation is following the local newline convention, must be represented by a (RFC 822) line break, which is a CRLF sequence, in the Quoted-Printable encoding. If isolated CRs and LFs, or LF CR and CR LF sequences are allowed to appear in binary data according to local conventions, they must be represented using the "=0D", "=0A", "=0A=0D" and "=0D=0A" notations respectively.

Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long. If longer lines are to be encoded with the Quoted-Printable encoding, 'soft' line breaks must be used. An equal sign as the last character on an encoded line indicates such a non-significant ('soft') line break in the encoded text. Thus if the "raw" form of the line is a single line that says:

     Now's the time for all folk to come to the aid of their country.

This can be represented, in the Quoted-Printable encoding, as

     Now's the time =
     for all folk to come=
      to the aid of their country.

This provides a mechanism with which long lines are encoded in such a way as to be restored by the user agent. The 76 character limit does not count the trailing CRLF, but counts all other characters, including any equal signs.
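The five rules can be combined into a short encoder. The following Python sketch is an illustration, not a reference implementation. The function name is hypothetical, and it assumes that the input has already been converted so that every line break in the data is a CRLF pair. Any remaining bare CR or LF is treated as data and quoted under Rule #1.

```python
def qp_encode(data, max_len=76):
    """Quoted-Printable encode bytes whose line breaks are CRLF."""
    encoded_lines = []
    for line in data.split(b"\r\n"):  # Rule 4: hard breaks stay CRLF
        units = []
        for pos, octet in enumerate(line):
            at_end = pos == len(line) - 1
            if 33 <= octet <= 60 or 62 <= octet <= 126:
                units.append(chr(octet))        # Rule 2: literal
            elif octet in (9, 32) and not at_end:
                units.append(chr(octet))        # Rule 3: inner TAB/SPACE
            else:
                units.append("=%02X" % octet)   # Rule 1: "=" + hex digits
        current = ""
        for unit in units:
            # Rule 5: keep one column free for the soft-break "=".
            if len(current) + len(unit) > max_len - 1:
                encoded_lines.append(current + "=")
                current = ""
            current += unit
        encoded_lines.append(current)
    return "\r\n".join(encoded_lines).encode("ascii")
```

With this sketch a trailing space on a line becomes "=20", an equal sign becomes "=3D", and no output line exceeds 76 characters. Python's standard quopri module offers a maintained encoder and decoder for practical use.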
Since the hyphen character ("-") is represented as itself in the Quoted-Printable encoding, care must be taken, when encapsulating a quoted-printable encoded message or body part in a multipart message, to ensure that the encapsulation boundary does not appear anywhere in the message. (A good strategy is to choose a boundary that includes a character sequence such as "=_" which can never appear in a quoted-printable body part. See the definition of multipart messages later in this document.)

NOTE: The quoted-printable encoding represents something of a compromise between readability and reliability in transport. Message bodies encoded with the quoted-printable encoding will work reliably over most mail gateways, but may not work perfectly over a few gateways, notably those involving translation into EBCDIC. (In theory, an EBCDIC gateway could decode a quoted-printable message and re-encode it using base64, but such gateways do not yet exist.) A higher level of confidence is offered by the base64 Content-Transfer-Encoding. A way to get reasonably reliable transport through EBCDIC gateways is to also quote the ASCII characters !"#$@[]^`{}|~ \ according to rule #1. See Appendix B for more information.

5.2 Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that is not humanly readable. The encoding and decoding algorithms are simple, but the encoded data are consistently only about 33 percent larger than the unencoded data. This encoding is based on the one used in Privacy Enhanced Mail applications, as defined in RFC 1113. The base64 encoding is adapted from RFC 1113, with one change: base64 eliminates the "*" mechanism for embedded clear text.

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)
NOTE: This subset has the important property that it is represented identically in all versions of ISO 646, including US ASCII, and all characters in the subset are also represented identically in all versions of EBCDIC. Other popular encodings, such as the encoding used by the UUENCODE utility and the base85 encoding specified as part of Level 2 PostScript, do not share these properties, and thus do not fulfill the portability requirements a binary transport encoding for mail must meet.

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single digit in the base64 alphabet. When encoding a bit stream via the base64 encoding, the bit stream should be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream will be the high-order bit in the first byte, and the eighth bit will be the low-order bit in the first byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 1, below, are selected so as to be universally representable, and the set excludes characters with particular significance to SMTP (e.g., ".", "CR", "LF") and to the encapsulation boundaries defined in this document (e.g., "-").
                    Table 1: The Base64 Alphabet

     Value Encoding   Value Encoding   Value Encoding   Value Encoding
         0 A             17 R             34 i             51 z
         1 B             18 S             35 j             52 0
         2 C             19 T             36 k             53 1
         3 D             20 U             37 l             54 2
         4 E             21 V             38 m             55 3
         5 F             22 W             39 n             56 4
         6 G             23 X             40 o             57 5
         7 H             24 Y             41 p             58 6
         8 I             25 Z             42 q             59 7
         9 J             26 a             43 r             60 8
        10 K             27 b             44 s             61 9
        11 L             28 c             45 t             62 +
        12 M             29 d             46 u             63 /
        13 N             30 e             47 v
        14 O             31 f             48 w          (pad) =
        15 P             32 g             49 x
        16 Q             33 h             50 y

The output stream (encoded bytes) must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

Special processing is performed if fewer than 24 bits are available at the end of a message or encapsulated part of a message. A full encoding quantum is always completed at the end of a message. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Output character positions which are not required to represent actual input data are set to the character "=". Since all base64 input is an integral number of octets, only the following cases can arise: (1) the final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding, (2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters, or (3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.
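The grouping and padding rules above can be followed step by step in a short Python sketch. It is an illustration only; the names ALPHABET, b64_encode and b64_decode are hypothetical, and an implementation would normally rely on an existing library routine. The decoder simply skips every character that is not in Table 1, including the "=" pad.

```python
# Table 1, indexed by 6-bit value.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def b64_encode(data: bytes, width: int = 76) -> str:
    """Encode octets as base64, in CRLF-separated lines of at most `width`."""
    chars = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Left-justify the 1 to 3 octets in a 24-bit quantum; missing
        # octets contribute zero bits on the right.
        quantum = int.from_bytes(group, "big") << (8 * (3 - len(group)))
        sextets = [(quantum >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        used = len(group) + 1          # 1 octet -> 2 chars, 2 -> 3, 3 -> 4
        chars.extend(ALPHABET[s] for s in sextets[:used])
        chars.extend("=" * (4 - used))  # pad the unused positions
    text = "".join(chars)
    return "\r\n".join(text[i:i + width] for i in range(0, len(text), width))


def b64_decode(text: str) -> bytes:
    """Decode base64, ignoring line breaks and characters outside Table 1."""
    values = [ALPHABET.index(c) for c in text if c in ALPHABET]
    out = bytearray()
    for i in range(0, len(values), 4):
        group = values[i:i + 4]
        quantum = 0
        for v in group:
            quantum = (quantum << 6) | v
        quantum <<= 6 * (4 - len(group))
        # 4 characters carry 3 octets, 3 carry 2, 2 carry 1.
        out += quantum.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)
```

For instance, the single octet "f" yields two characters and two pads, while the three octets "foo" yield four characters and no pad.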
Note: There is no need to worry about quoting apparent encapsulation boundaries within base64-encoded parts of multipart messages, because no hyphen characters are used in the base64 encoding.

6 Additional Optional Content- Header Fields

6.1 Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable to allow one message body part to make reference to another. Accordingly, message body parts may be labelled using the "Content-ID" header field, which is syntactically identical to the "Message-ID" header field:

     Content-ID := msg-id

Like the Message-ID values, Content-ID values should be generated to be as unique as possible.

6.2 Optional Content-Description Header Field

The ability to associate some descriptive information with a given body part is often desirable. For example, it may be useful to mark an "image" body part as "a picture of the Space Shuttle Endeavor." Such text may be placed in the Content-Description header field.

     Content-Description := *text

The description is presumed to be given in the US-ASCII character set, although the mechanism specified in [RFC-HDRS] may be used for non-US-ASCII Content-Description values.

7 The Predefined Content-Type Values

This document defines seven initial Content-Type values and an extension mechanism for private or experimental types. Further standard types must be defined by new published specifications. It is expected that most innovation in new types of mail will take place as subtypes of the seven types defined here. The most essential characteristics of the seven content-types are summarized in Appendix G.

7.1 The Text Content-Type

The text Content-Type is intended for sending material which is principally textual in form. It is the default Content-Type. A "charset" parameter may be used to indicate the character set of the body text. The primary subtype of text is "plain". This indicates plain (unformatted) text.
The default Content-Type for Internet mail is "text/plain; charset=us-ascii".

Beyond plain text, there are many formats for representing what might be known as "extended text" -- text with embedded formatting and presentation information. An interesting characteristic of many such representations is that they are to some extent readable even without the software that interprets them. It is useful, then, to distinguish them, at the highest level, from such unreadable data as images, audio, or text represented in an unreadable form. In the absence of appropriate interpretation software, it is reasonable to show subtypes of text to the user, while it is not reasonable to do so with most nontextual data.

Such formatted textual data should be represented using subtypes of text. Plausible subtypes of text are typically given by the common name of the representation format, e.g., "text/richtext".

7.1.1 The charset parameter

A critical parameter that may be specified in the Content-Type field for text data is the character set. This is specified with a "charset" parameter, as in:

     Content-type: text/plain; charset=us-ascii

Unlike some other parameter values, the values of the charset parameter are NOT case sensitive. The default character set, which should be assumed in the absence of a charset parameter, is US-ASCII.

An initial list of predefined character set names can be found at the end of this section. Additional character sets may be registered with IANA, although the standardization of their use requires the usual IAB review and approval. Note that if the specified character set includes 8-bit data, a Content-Transfer-Encoding header field and a corresponding encoding on the data are required in order to transmit the message via some mail transfer protocols, such as SMTP.

The default character set, US-ASCII, has been the subject of some confusion and ambiguity in the past.
Not only were there some ambiguities in the definition, there have been wide variations in practice. In order to eliminate such ambiguity and variations in the future, it is strongly recommended that new user agents explicitly specify a character set via the Content-Type header field. "US-ASCII" does not indicate an arbitrary seven-bit character code, but specifies that the message body uses character coding that uses the exact correspondence of codes to characters specified in ASCII. National use variations of ISO 646 [ISO-646] are NOT ASCII and their use in Internet mail is explicitly discouraged. The omission of the ISO 646 character set is deliberate in this regard. The character set name of "US-ASCII" explicitly refers to ANSI X3.4-1986 [US-ASCII] only. The character set name "ASCII" is reserved and must not be used for any purpose.

Note: RFC 821 explicitly specifies "ASCII", and references an earlier version of the American Standard rather than the international standard. Insofar as one of the purposes of specifying a Content-Type and character set is to permit the receiver to unambiguously determine how the sender intended the coded message to be interpreted, assuming anything other than "strict ASCII" as the default would risk unintentional and incompatible changes to the semantics of messages now being transmitted. This also implies that messages containing characters coded according to national variations on ISO 646, or using code-switching procedures (e.g., those of ISO 2022), as well as 8-bit or multiple octet character encodings MUST use an appropriate character set specification to be consistent with this specification.

The complete US-ASCII character set is listed in [US-ASCII]. Note that the control characters including DEL (0-31, 127) have no defined meaning apart from the combination CRLF (ASCII values 13 and 10) indicating a new line.
Two of the characters have de facto meanings in wide use: FF (12) often means "start subsequent text on the beginning of a new page"; and TAB or HT (9) often (though not always) means "move the cursor to the next available column after the current position where the column number is a multiple of 8 (counting the first column as column 0)." Apart from this, any use of the control characters or DEL in a message must be part of a private agreement between the sender and recipient. Such private agreements are discouraged and should be replaced by the other capabilities of this document.

NOTE: Beyond US-ASCII, an enormous proliferation of character sets is possible. It is the opinion of the IETF working group that a large number of character sets is NOT a good thing. We would prefer to specify a single character set that can be used universally for representing all of the world's languages in electronic mail. Unfortunately, existing practice in several communities seems to point to the continued use of multiple character sets in the near future. For this reason, we define names for a small number of character sets for which a strong constituent base exists. It is our hope that ISO-10646 or some other effort will eventually define a single world character set which can then be specified for use in Internet mail, but in advance of that definition we cannot specify the use of ISO-10646, Unicode, or any other character set whose definition is, as of this writing, incomplete.

The defined charset values are:

US-ASCII -- as defined in [US-ASCII].

ISO-8859-X -- where "X" is to be replaced, as necessary, for the parts of ISO-8859 [ISO-8859]. Note that the ISO-646 character sets have deliberately been omitted in favor of their 8859 replacements, which are the designated character sets for Internet mail.
As of the publication of this document, the legitimate values for "X" are the digits 1 through 9, though a value of "10" is expected to be defined in 1992.

ISO-2022-jp -- ISO-2022, as defined in [ISO-2022] specifies ways of designating and accessing character sets, rather than, itself, being a character set. Its use in mail will probably be strongly desired by communities who are already using it locally to handle multiple sets of characters and multi-byte characters. It appears necessary to explicitly specify the ISO-2022 methods that will be permitted in text mail so as to avoid the need for private agreements about, e.g., the specific character sets being used in messages. A specification corresponding to the existing practice of ISO-2022 use in Japan is included as Appendix F.

Note that the character set used, if anything other than US-ASCII, must always be explicitly specified in the Content-Type field.

The use of the string "ISO-10646" as a character set specification is hereby reserved for future use, once the ongoing efforts to define a standard universal character set are completed.

No other character set name should be used in Internet mail without the publication of a formal specification and its registration with IANA, or by private agreement, in which case the character set name should begin with "x-".

Parties wishing to use additional character sets and desiring to label them uniformly might wish to consult [RFC-CHAR], which names and defines a large number of additional character sets. Implementors who wish to use one of the character sets from that document should either publish a specification of its use in internet mail, or should prefix the character set name from [RFC-CHAR] with the characters "X-". Implementors are discouraged from defining new character sets for mail use unless absolutely necessary.
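A receiving agent has to extract the charset value, apply the US-ASCII default, and decide how to treat the name it finds. The following Python sketch shows one way to do this. The function names charset_of and classify_charset and the four category labels are hypothetical and are not defined by this document; the parameter parsing is deliberately simplistic and does not handle quoted strings that contain semicolons.

```python
import re
from typing import Optional

# ISO-8859-X is defined only for X = 1 through 9 at the time of writing.
_ISO_8859 = re.compile(r"iso-8859-[1-9]$")


def charset_of(content_type: Optional[str]) -> str:
    """Return the lower-cased charset of a Content-Type field value.

    A missing field or a missing charset parameter means US-ASCII.
    """
    if content_type:
        for param in content_type.split(";")[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "charset":
                return value.strip().strip('"').lower()
    return "us-ascii"


def classify_charset(name: str) -> str:
    """Classify a charset name; the comparison is not case sensitive."""
    n = name.strip().lower()
    if n in ("us-ascii", "iso-2022-jp") or _ISO_8859.match(n):
        return "defined"
    if n in ("ascii", "iso-10646"):
        return "reserved"        # named in the text but not to be used
    if n.startswith("x-"):
        return "private"         # private agreement between the parties
    return "unregistered"
```

A user agent might display "defined" character sets directly and fall back to some conservative treatment, such as offering to save the body part to a file, for the other categories.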
The "charset" parameter has been defined primarily for the purpose of textual data, and is described in this section for that reason. However, it is conceivable that non-textual data might also wish to specify a charset value for some purpose, in which case the same syntax and values should be used.

In general, mail-sending software should always use the "lowest common denominator" character set possible. For example, if a message contains only US-ASCII characters, it should be marked as being in the US-ASCII character set, not ISO-8859-1, which is a superset of US-ASCII. This will increase the chances that the recipient will be able to view the mail correctly.

7.1.2 The Text/richtext subtype

In order to promote the wider interoperability of simple formatted text, this document defines an extremely simple subtype of "text", the "richtext" subtype. This subtype was designed to meet the following criteria:

     1. The syntax must be extremely simple to parse, so that even teletype-oriented mail systems can easily strip away the formatting information and leave only the readable text.

     2. The syntax must be extensible to allow for new formatting commands that are deemed essential.

     3. The capabilities must be extremely limited, to ensure that it can represent no more than is likely to be representable by the user's primary word processor. While this limits what can be sent, it increases the likelihood that what is sent can be properly displayed.

     4. The syntax must be compatible with SGML, so that, with an appropriate DTD (Document Type Definition, the standard mechanism for defining a document type using SGML), a general SGML parser could be made to parse richtext. However, despite this compatibility, the syntax should be far simpler than full SGML, so that no SGML knowledge is required in order to implement it.

The syntax of "richtext" is very simple.
It is assumed, at the top-level, to be in the US-ASCII character set, unless of course a different charset parameter was specified in the Content-type field. All characters represent themselves, with the exception of the "<" character (ASCII 60), which is used to begin a formatting command. Formatting instructions consist of formatting commands surrounded by angle brackets ("<>", ASCII 60 and 62). Each formatting command may be no more than 40 characters in length, all in US-ASCII, restricted to the alphanumeric and hyphen ("-") characters. Formatting commands that begin with a forward slash or solidus ("/", ASCII 47) are negations, and such negations must always exist to balance the initial opening commands. Thus, if the formatting command "<bold>" appears at some point, there must later be a "</bold>" to balance it. There are only three exceptions to this "balancing" rule: First, the command "<lt>" is used to represent a literal "<" character. Second, the command "<nl>" is used to represent a required line break. (Otherwise, CRLFs in the data are treated as equivalent to a single SPACE character.) Finally, the command "<np>" is used to represent a page break. (NOTE: The 40 character limit on formatting commands does not include the "<", ">", or "/" characters that might be attached to such commands.)

Initially defined formatting commands, not all of which will be implemented by all richtext implementations, include:

     Bold -- causes the subsequent text to be in a bold font.
     Italic -- causes the subsequent text to be in an italic font.
     Fixed -- causes the subsequent text to be in a fixed width font.
     Smaller -- causes the subsequent text to be in a smaller font.
     Bigger -- causes the subsequent text to be in a bigger font.
     Underline -- causes the subsequent text to be underlined.
     Center -- causes the subsequent text to be centered.
     FlushLeft -- causes the subsequent text to be left justified.
     FlushRight -- causes the subsequent text to be right justified.
     Indent -- causes the subsequent text to be indented at the left margin.
     IndentRight -- causes the subsequent text to be indented at the right margin.
     Outdent -- causes the subsequent text to be outdented at the left margin.
     OutdentRight -- causes the subsequent text to be outdented at the right margin.
     SamePage -- causes the subsequent text to be grouped, if possible, on one page.
     Subscript -- causes the subsequent text to be interpreted as a subscript.
     Superscript -- causes the subsequent text to be interpreted as a superscript.
     Heading -- causes the subsequent text to be interpreted as a page heading.
     Footing -- causes the subsequent text to be interpreted as a page footing.
     ISO-8859-X (for any value of X that is legal as a "charset" parameter) -- causes the subsequent text to be interpreted as text in the appropriate character set.
     US-ASCII -- causes the subsequent text to be interpreted as text in the US-ASCII character set.
     Excerpt -- causes the subsequent text to be interpreted as a textual excerpt from another source. Typically this will be displayed using indentation and an alternate font, but such decisions are up to the viewer.
     Paragraph -- causes the subsequent text to be interpreted as a single paragraph, with appropriate paragraph breaks (typically blank space) before and after.
     Signature -- causes the subsequent text to be interpreted as a message "signature". Some systems may wish to display signatures in a smaller font or otherwise set them apart from the main text of the message.
     Comment -- causes the subsequent text to be interpreted as a comment, and hence not shown to the reader.
     No-op -- has no effect on the subsequent text.
     lt -- is replaced by a literal "<" character. No balancing is required.
     nl -- causes a line break. No balancing is required.
     np -- causes a page break. No balancing is required.
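The command syntax is small enough to be recognized with a single pattern. The following Python sketch splits a richtext string into text and commands and checks the balancing rule with a stack, which has the side effect of also rejecting pairs that overlap instead of nesting. The names COMMAND, UNBALANCED, tokenize and is_balanced are hypothetical, and folding command names to lower case is an assumption of this sketch.

```python
import re

# "<", an optional "/", 1 to 40 alphanumeric or hyphen characters, ">".
COMMAND = re.compile(r"<(/?)([A-Za-z0-9-]{1,40})>")

# Commands that need no balancing negation.
UNBALANCED = {"lt", "nl", "np"}


def tokenize(richtext):
    """Yield ("text", s), ("open", name) or ("close", name) tuples."""
    pos = 0
    for m in COMMAND.finditer(richtext):
        if m.start() > pos:
            yield ("text", richtext[pos:m.start()])
        kind = "close" if m.group(1) else "open"
        yield (kind, m.group(2).lower())
        pos = m.end()
    if pos < len(richtext):
        yield ("text", richtext[pos:])


def is_balanced(richtext):
    """True if every opening command is closed by its matching negation."""
    stack = []
    for kind, name in tokenize(richtext):
        if kind == "open" and name not in UNBALANCED:
            stack.append(name)
        elif kind == "close":
            if not stack or stack.pop() != name:
                return False
    return not stack
```

Anything that begins with "<" but does not match the command pattern is passed through as text by this sketch; a stricter implementation might report it instead.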
Each positive formatting command affects all subsequent text until the matching negative formatting command. Such pairs of formatting commands must be properly balanced and nested. Thus, a proper way to describe text in bold italics is:

     <bold><italic>the-text</italic></bold>

or, alternately,

     <italic><bold>the-text</bold></italic>

but, in particular, the following is illegal richtext:

     <bold><italic>the-text</bold></italic>

NOTE: The nesting requirement for formatting commands imposes a slightly higher burden upon the composers of richtext messages, but potentially simplifies richtext displayers by allowing them to be stack-based. The main goal of richtext is to be simple enough to make multifont, formatted email widely readable, so that those with the capability of sending it will be able to do so with confidence. Thus slightly increased complexity in the composing software was deemed a reasonable tradeoff for simplified reading software. Nonetheless, implementors of richtext readers are encouraged to follow the general Internet guidelines of being conservative in what you send and liberal in what you accept. Those implementations that can do so are encouraged to deal reasonably with improperly nested richtext.

Implementations must regard any unrecognized formatting command as equivalent to "No-op", thus facilitating future extensions to "richtext". Private extensions may be defined using formatting commands that begin with "X-", by analogy to Internet mail header field names.

It is worth noting that no special behavior is required for the TAB (HT) character. It is recommended, however, that, at least when fixed-width fonts are in use, the common semantics of the TAB (HT) character should be observed, namely that it moves to the next column position that is a multiple of 8. (In other words, if a TAB (HT) occurs in column n, where the leftmost column is column 0, then that TAB (HT) should be replaced by 8-(n mod 8) SPACE characters.)

Richtext also differentiates between "hard" and "soft" line breaks.
A line break (CRLF) in the richtext data stream is interpreted as a "soft" line break, one that is included only for purposes of mail transport, and is to be treated as white space by richtext interpreters. To include a "hard" line break (one that must be displayed as such), the "<nl>" or "<paragraph>" formatting constructs should be used. In general, a soft line break should be treated as white space, but when soft line breaks immediately follow a <nl> or a </paragraph> tag they should be ignored rather than treated as white space.

Putting all this together, the following "text/richtext" body fragment:

     Now is the time for all good men (and <lt>women>) to come to the aid of their
     beloved country. <comment> Stupid quote! </comment> -- the end

represents the following formatted text (which will, no doubt, look cryptic in the text-only version of this document):

     Now is the time for all good men (and <women>) to come to the aid of their
     beloved country. -- the end

Richtext conformance: A minimal richtext implementation is one that simply converts "<lt>" to "<", converts CRLFs to SPACE, converts <nl> to a newline according to local newline convention, removes everything between a <comment> command and the next balancing </comment> command, and removes all other formatting commands (all text enclosed in angle brackets).

NOTE ON THE RELATIONSHIP OF RICHTEXT TO SGML: Richtext is decidedly not SGML, and must not be used to transport arbitrary SGML documents. Those who wish to use SGML document types as a mail transport format must define a new text or application subtype, e.g., "text/sgml-dtd-whatever" or "application/sgml-dtd-whatever", depending on the perceived readability of the DTD in use. Richtext is designed to be compatible with SGML, and specifically so that it will be possible to define a richtext DTD if one is needed.
However, this does not imply that arbitrary SGML can be called richtext, nor that richtext implementors have any need to understand SGML; the description in this document is a complete definition of richtext, which is far simpler than complete SGML.

NOTE ON THE INTENDED USE OF RICHTEXT: It is recognized that implementors of future mail systems will want rich text functionality far beyond that currently defined for richtext. The intent of richtext is to provide a common format for expressing that functionality in a form in which much of it, at least, will be understood by interoperating software. Thus, in particular, software with a richer notion of formatted text than richtext can still use richtext as its basic representation, but can extend it with new formatting commands and by hiding information specific to that software system in richtext comments. As such systems evolve, it is expected that the definition of richtext will be further refined by future published specifications, but richtext as defined here provides a platform on which evolutionary refinements can be based.

Implementation note: In some environments, it might be impossible to combine certain richtext formatting commands, whereas in others they might be combined easily. For example, the combination of <bold> and <italic> might produce bold italics on systems that support such fonts, but there exist systems that can make text bold or italicized, but not both. In such cases, the most recently issued recognized formatting command should be preferred.

One of the major goals in the design of richtext was to make it so simple that even text-only mailers will implement richtext-to-plain-text translators, thus increasing the likelihood that multifont text will become "safe" to use very widely. To demonstrate this simplicity, an extremely simple 35-line C program that converts richtext input into plain text output is included in Appendix D.
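The minimal conformance level described in this section can also be sketched in a few lines of Python. This is an illustration only and is not the program of Appendix D; the names COMMAND and richtext_to_plain are hypothetical. The sketch assumes that command names may be compared in lower case, accepts a bare LF as well as CRLF as a soft line break, and treats nested comments by counting depth.

```python
import re

# "<", an optional "/", 1 to 40 alphanumeric or hyphen characters, ">".
COMMAND = re.compile(r"<(/?[A-Za-z0-9-]{1,40})>")


def richtext_to_plain(richtext: str, newline: str = "\n") -> str:
    """Minimal richtext-to-plain-text conversion."""
    # Soft line breaks are only white space.
    text = re.sub(r"\r?\n", " ", richtext)
    out = []
    comment_depth = 0
    pos = 0
    for m in COMMAND.finditer(text):
        if comment_depth == 0:
            out.append(text[pos:m.start()])   # text outside comments is kept
        pos = m.end()
        cmd = m.group(1).lower()
        if cmd == "comment":
            comment_depth += 1
        elif cmd == "/comment":
            comment_depth = max(0, comment_depth - 1)
        elif comment_depth == 0:
            if cmd == "lt":
                out.append("<")               # literal "<"
            elif cmd == "nl":
                out.append(newline)           # hard line break
            # every other command is simply dropped
    if comment_depth == 0:
        out.append(text[pos:])
    return "".join(out)
```

Because unrecognized commands are dropped without complaint, such a converter continues to work when new formatting commands are introduced.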
7.2 The Multipart Content-Type

In the case of multiple part messages, in which one or more different sets of data are combined in a single message, a "multipart" Content-Type field must appear in the RFC 822 message header. The message body must then contain one or more "body parts," each preceded by an encapsulation boundary, and the last one followed by a closing boundary. Each part starts with an encapsulation boundary, and then contains a body part consisting of a header area, a blank line, and a body area. Thus a body part is similar to an RFC 822 message in syntax, but different in meaning.

A body part is NOT to be interpreted as actually being an RFC 822 message. To begin with, NO header fields are actually required in body parts. A body part that starts with a blank line, therefore, is allowed and is a body part for which all default values are to be assumed. In such a case, the absence of a Content-Type header field implies that the encapsulation is plain US-ASCII text. The only header fields that have defined meaning for body parts are those the names of which begin with "Content-". All other header fields are generally to be ignored in body parts. Although they should generally be retained in mail processing, they may be discarded by gateways if necessary. Such other fields are permitted to appear in body parts only for ease of conversion between messages and body parts. "X-" fields may be created for experimental or private purposes, with the recognition that the information they contain may be lost at some gateways.

The distinction between an RFC 822 message and a body part is subtle, but important. A gateway between Internet and X.400 mail, for example, must be able to tell the difference between a body part that consists of an image and a body part that consists of an encapsulated message, the body of which is an image.
In order to represent the latter, the body part must have "Content-Type: message", and its body (after the blank line) must be the encapsulated message, with its own "Content-Type: image" header field. The use of similar syntax facilitates the conversion of messages to body parts, and vice versa, but the distinction between the two must be understood by implementors. (For the special case in which all parts actually are messages, a "digest" subtype is also defined.)

As stated previously, each body part is preceded by an encapsulation boundary. The encapsulation boundary MUST NOT appear inside any of the encapsulated parts. Thus, it is crucial that the composing agent be able to choose and specify the unique boundary that will separate the parts.

All present and future subtypes of the "multipart" type must use an identical syntax. Subtypes may differ in their semantics, and may impose additional restrictions on syntax, but must conform to the required syntax for the multipart type. This requirement ensures that all conformant user agents will at least be able to recognize and separate the parts of any multipart message, even of an unrecognized subtype.

As stated in the definition of the Content-Transfer-Encoding field, no encoding other than "7bit", "8bit", or "binary" is permitted for messages or parts of type "multipart". The multipart delimiters and header fields are always 7-bit ASCII in any case, and data within the body parts can be encoded on a part-by-part basis, with Content-Transfer-Encoding fields for each appropriate body part.

Mail gateways, relays, and other mail handling agents are commonly known to alter the top-level header of an RFC 822 message. In particular, they frequently add, remove, or reorder header fields. Such alterations are explicitly forbidden for the body part headers embedded in the bodies of messages of type "multipart."
7.2.1 Multipart: The common syntax

All subtypes of "multipart" share a common syntax, defined in this section. A simple example of a multipart message also appears in this section. An example of a more complex multipart message is given in Appendix C.

The Content-Type field for multipart messages requires one parameter, "boundary", which is used to specify the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter from the Content-Type header field.

NOTE: The hyphens are for rough compatibility with the earlier RFC 934 method of message encapsulation, and for ease of searching for the boundaries in some implementations. However, it should be noted that multipart messages are NOT completely compatible with RFC 934 encapsulations; in particular, they do not obey RFC 934 quoting conventions for embedded lines that begin with hyphens. This mechanism was chosen over the RFC 934 mechanism because the latter causes lines to grow with each level of quoting. The combination of this growth with the fact that SMTP implementations sometimes wrap long lines made the RFC 934 mechanism unsuitable for use in the event that deeply-nested multipart structuring is ever desired.

Thus, a typical multipart Content-Type header field might look like this:

     Content-Type: multipart/mixed;
          boundary=gc0p4Jq0M2Yt08jU534c0p

This indicates that the message consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line

     --gc0p4Jq0M2Yt08jU534c0p

Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that that initial CRLF is considered to be part of the encapsulation boundary rather than part of the preceding part.
The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).

NOTE: The CRLF preceding the encapsulation line is considered part of the boundary so that it is possible to have a part that does not end with a CRLF (newline). Body parts that must be considered to end with newlines, therefore, should have two CRLFs preceding the encapsulation line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary.

The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart message must itself begin with a CRLF before the first encapsulation line. This is indeed how such messages should be composed. A tolerant mail reading program, however, may interpret a body of type multipart that begins with an encapsulation line NOT initiated by a CRLF as also being an encapsulation boundary, but a compliant mail sending program must not generate such messages.

Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.

The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:

     --gc0p4Jq0M2Yt08jU534c0p--

There appears to be room for additional information prior to the first encapsulation boundary and following the final boundary. These areas should generally be left blank, and implementations should ignore anything that appears before the first boundary or after the last one.
NOTE: These "preamble" and "epilogue" areas are not used because of the lack of proper typing of these parts and the lack of clear semantics for handling these areas at gateways, particularly X.400 gateways.

NOTE: Because encapsulation boundaries must not appear in the body parts being encapsulated, a user agent must exercise care to choose a unique boundary. The boundary in the example above could have been the result of an algorithm designed to produce boundaries with a very low probability of already existing in the data to be encapsulated without having to prescan the data. Alternate algorithms might result in more 'readable' boundaries for a recipient with an old user agent, but would require more attention to the possibility that the boundary might appear in the encapsulated part. The simplest boundary possible is something like "---", with a closing boundary of "-----".

As a very simple example, the following multipart message has two parts, both of them plain text, one of them explicitly typed and one of them implicitly typed:

     From: Nathaniel Borenstein
     To: Ned Freed
     Subject: Sample message
     MIME-Version: 1.0
     Content-type: multipart/mixed; boundary="simple boundary"

     This is the preamble. It is to be ignored, though it is a handy place for mail composers to include an explanatory note to non-MIME compliant readers.
     --simple boundary

     This is implicitly typed plain ASCII text.
     It does NOT end with a linebreak.
     --simple boundary
     Content-type: text/plain; charset=us-ascii

     This is explicitly typed plain ASCII text.
     It DOES end with a linebreak.

     --simple boundary--
     This is the epilogue. It is also to be ignored.

The use of a Content-Type of multipart in a body part within another multipart message is explicitly allowed. In such cases, for obvious reasons, care must be taken to ensure that each nested multipart message uses a different boundary delimiter. See Appendix C for an example of nested multipart messages.
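The NOTE above on choosing a unique boundary can be illustrated with a short Python sketch. It combines the two approaches mentioned there: it draws a random boundary, which is very unlikely to occur in the data, and it also prescans the data as a safeguard. The function name choose_boundary, the default length of 24, and the restriction to letters and digits are assumptions made for this example only.

```python
import secrets
import string

# A conservative subset of the characters permitted in a boundary (assumption:
# letters and digits only, which avoids any need to quote the parameter value).
ALPHABET = string.ascii_letters + string.digits


def choose_boundary(data: str, length: int = 24) -> str:
    """Pick a random boundary that does not occur anywhere in `data`."""
    if not 1 <= length <= 70:
        raise ValueError("a boundary must be 1 to 70 characters long")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Rejecting any occurrence of the candidate is stricter than needed,
        # since only "--" followed by the boundary is forbidden in the parts.
        if candidate not in data:
            return candidate
```

For a nested multipart message, the same function could be called once per nesting level, passing all of the enclosed data, so that each level gets a different boundary.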
The use of the multipart Content-Type with only a single body part may be useful in certain contexts, and is explicitly permitted.

The only mandatory parameter for the multipart Content-Type is the boundary parameter, which consists of 1 to 70 characters from a set of characters known to be very robust through email gateways, and NOT ending with white space. (If a boundary appears to end with white space, the white space should be presumed to have been added by a gateway, and should be deleted.) It is formally specified by the following BNF:

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / ","
                      / "-" / "." / "/" / ":" / "=" / "?"

Overall, the body of a multipart message may be specified as follows:

     multipart-body := preamble 1*encapsulation close-delimiter epilogue

     encapsulation := delimiter CRLF part-encapsulation

     delimiter := CRLF "--" boundary    ; taken from Content-Type field.
                                        ; There should be no space
                                        ; between "--" and boundary.

     close-delimiter := delimiter "--"  ; Again, no space before "--"

     preamble := *text                  ; to be ignored upon receipt.

     epilogue := *text                  ; to be ignored upon receipt.

     part-encapsulation := <"message" as defined in RFC 822, with all header fields optional, and with the specified delimiter not occurring anywhere in the message body, either on a line by itself or as a substring anywhere. Note that the semantics of a part differ from the semantics of a message, as described in the text.>

NOTE: Conspicuously missing from the multipart type is a notion of structured, related body parts. In general, it seems premature to try to standardize interpart structure yet.
It is recommended that those wishing to provide a more structured or integrated multipart messaging facility should define a subtype of multipart that is syntactically identical, but that always expects the inclusion of a distinguished part that can be used to specify the structure and integration of the other parts, probably referring to them by their Content-ID field. If this approach is used, other implementations will not recognize the new subtype, but will treat it as the primary subtype (multipart/mixed) and will thus be able to show the user the parts that are recognized.

7.2.2 The Multipart/mixed (primary) subtype

The primary subtype for multipart, "mixed", is intended for use when the body parts are independent and intended to be displayed serially. Any multipart subtypes that an implementation does not recognize should be treated as being of subtype "mixed".

7.2.3 The Multipart/alternative subtype

The multipart/alternative type is syntactically identical to multipart/mixed, but the semantics are different. In particular, each of the parts is an "alternative" version of the same information. User agents should recognize that the contents of the various parts are interchangeable. The user agent should either choose the "best" type based on the user's environment and preferences, or offer the user the available alternatives. In general, choosing the best type means displaying only the LAST part that can be displayed. This may be used, for example, to send mail in a fancy text format in such a way that it can easily be displayed anywhere:

     From: Nathaniel Borenstein
     To: Ned Freed
     Subject: Formatted text mail
     Content-Type: multipart/alternative; boundary=boundary42

     --boundary42
     Content-Type: text/plain; charset=us-ascii

     ...plain text version of message goes here....
     --boundary42
     Content-Type: text/richtext

     .... richtext version of same message goes here ...
     --boundary42
     Content-Type: text/x-whatever

     ....
     fanciest formatted version of same message goes here ...
     --boundary42--

In this example, users whose mail system understood the "text/x-whatever" format would see only the fancy version, while other users would see only the richtext or plain text version, depending on the capabilities of their system.

In general, user agents that compose multipart/alternative messages should place the body parts in increasing order of preference, that is, with the preferred format last. For fancy text, the sending user agent should put the plainest format first and the richest format last. Receiving user agents should pick and display the last format they are capable of displaying. In the case where one of the alternatives is itself of type "multipart" and contains unrecognized sub-parts, the user agent may choose either to show that alternative, an earlier alternative, or both.

NOTE: From an implementor's perspective, it might seem more sensible to reverse this ordering, and have the plainest alternative last. However, placing the plainest alternative first is the friendliest possible option when multipart/alternative messages are viewed using a non-compliant mail reader. While this approach does impose some burden on compliant mail readers, interoperability with older mail readers was deemed to be more important in this case.

It may be the case that some user agents, if they can recognize more than one of the formats, will prefer to offer the user the choice of which format to view. This makes sense, for example, if mail includes both a nicely-formatted image version and an easily-edited text version. What is most critical, however, is that the user not automatically be shown multiple versions of the same data. Either the user should be shown the last recognized version or should explicitly be given the choice.

7.2.4 The Multipart/digest subtype

This document defines a "digest" subtype of the multipart Content-Type.
This type is syntactically identical to multipart/mixed, but the semantics are different. In particular, in a digest, the default Content-Type value for a body part is changed from "text/plain" to "message/rfc822". This is done to allow a more readable digest format that is largely compatible (except for the quoting convention) with RFC 934.

A digest in this format might, then, look something like this:

     From: Moderator-Address
     Subject: Internet Digest, volume 42
     Content-Type: multipart/digest; boundary="---- next message ----"

     ------ next message ----
     From: someone-else
     Subject: my opinion

     ...body goes here ...
     ------ next message ----
     From: someone-else-again
     Subject: my different opinion

     ... another body goes here...
     ------ next message ------

7.2.5 The Multipart/parallel subtype

This document defines a "parallel" subtype of the multipart Content-Type. This type is syntactically identical to multipart/mixed, but the semantics are different. In particular, in a parallel message, all of the parts are intended to be presented in parallel, i.e., simultaneously, on hardware and software that are capable of doing so. Composing agents should be aware that many mail readers will lack this capability and will show the parts serially in any event.

7.3 The Message Content-Type

It is frequently desirable, in sending mail, to encapsulate another mail message. For this common operation, a special Content-Type, "message", is defined. The primary subtype, message/rfc822, has no required parameters in the Content-Type field. Additional subtypes, "partial" and "external-body", do have required parameters. These subtypes are explained below.

NOTE: It has been suggested that subtypes of message might be defined for forwarded or rejected messages.
However, forwarded and rejected messages can be handled as multipart messages in which the first part contains any control or descriptive information, and a second part, of type message/rfc822, is the forwarded or rejected message. Composing rejection and forwarding messages in this manner will preserve the type information on the original message and allow it to be correctly presented to the recipient, and hence is strongly encouraged.

As stated in the definition of the Content-Transfer-Encoding field, no encoding other than "7bit", "8bit", or "binary" is permitted for messages or parts of type "message". The message header fields are always US-ASCII in any case, and data within the body part can still be encoded, in which case the Content-Transfer-Encoding header field in the encapsulated message will reflect this. Non-ASCII text in the headers of an encapsulated message can be specified using the mechanisms described in [RFC-HDRS].

Mail gateways, relays, and other mail handling agents are commonly known to alter the top-level header of an RFC 822 message. In particular, they frequently add, remove, or reorder header fields. Such alterations are explicitly forbidden for the encapsulated headers embedded in the bodies of messages of type "message."

7.3.1 The Message/rfc822 (primary) subtype

A Content-Type of "message/rfc822" indicates that the body or body part is an encapsulated message, with the syntax of an RFC 822 message.

7.3.2 The Message/Partial subtype

A subtype of message, "partial", is defined in order to allow large objects to be delivered as several separate pieces of mail and automatically reassembled by the receiving user agent. (The concept is similar to IP fragmentation/reassembly in the basic Internet Protocols.) This mechanism can be used when intermediate transport agents limit the size of individual messages that can be sent.
Content-Type "message/partial" thus indicates that the body or body part is a fragment of a larger message.

Three parameters must be specified in the Content-Type field of type message/partial: The first, "id", is a unique identifier, as close to a world-unique identifier as possible, to be used to match the parts together. (In general, the identifier is essentially a message-id; if placed in double quotes, it can be any message-id, in accordance with the BNF for "parameter" given earlier in this specification.) The second, "number", an integer, is the part number, which indicates where this part fits into the sequence of fragments. The third, "total", another integer, is the total number of parts. This third parameter is required on the final part, and is optional on the earlier parts. Note also that these parameters may be given in any order.

Thus, part 2 of a 3-part message may have either of the following header fields:

     Content-Type: Message/Partial; number=2; total=3;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

     Content-Type: Message/Partial;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com"; number=2

But part 3 MUST specify the total number of parts:

     Content-Type: Message/Partial; number=3; total=3;
          id="oc=jpbe0M2Yt4s@thumper.bellcore.com"

Note that part numbering begins with 1, not 0.

When the parts of a message broken up in this manner are put together, the result is a complete RFC 822 format message, which may have its own Content-Type header field, and thus may contain any other data type.

Message fragmentation and reassembly: The semantics of a reassembled partial message should be those of the "inner" message, rather than of a message containing the inner message. This makes it possible, for example, to send a large audio message as several partial messages, and still have it appear to the recipient as a simple audio message rather than as an encapsulated message containing an audio message. That is, the encapsulation of the message is considered to be "transparent".
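The role of the "number" and "total" parameters can be sketched as follows. This is a minimal illustration, not a prescribed algorithm: the name join_fragments and the tuple layout are hypothetical, the caller is assumed to have already grouped fragments by their "id" parameter, and the fragment bodies are assumed to concatenate without any separator.

```python
from typing import List, Optional, Tuple

# (number, total or None, body text); "total" is only required on the final part.
Fragment = Tuple[int, Optional[int], str]


def join_fragments(fragments: List[Fragment]) -> Optional[str]:
    """Concatenate fragment bodies in part order, or return None if incomplete."""
    totals = {total for _, total, _ in fragments if total is not None}
    if len(totals) != 1:
        return None  # total not yet known, or the fragments disagree about it
    total = totals.pop()
    bodies = {number: body for number, _, body in fragments}
    # Part numbering begins with 1, not 0.
    if set(bodies) != set(range(1, total + 1)):
        return None
    return "".join(bodies[n] for n in range(1, total + 1))
```

Because fragments may arrive in any order, the sketch keys the bodies by part number and only joins them once every number from 1 through the total is present.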
When generating and reassembling the parts of a message/partial message, the headers of the encapsulated message must be merged with the headers of the enclosing messages. In this process the following rules should be observed:

     (1) All of the headers from the initial enclosing message (part one), except those that start with "Content-", should be copied, in order, to the new message.

     (2) Only those headers in the enclosed message which start with "Content-" should be appended, in order, to the headers of the new message. Any headers in the enclosed message which do not start with "Content-" will be ignored.

     (3) All of the headers from the second and any subsequent messages will be ignored.

For example, if an audio message is broken into two parts, the first part might look something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail
     Message-ID: id1@host.com
     Content-type: message/partial; id="ABC@host.com"; number=1; total=2

     X-Weird-Header-1: Bar
     X-Weird-Header-2: Hello
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here...

and the second half might look something like this:

     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail
     Message-ID: id2@host.com
     Content-type: message/partial; id="ABC@host.com"; number=2; total=2

     ... second half of encoded audio data goes here...

Then, when the fragmented message is reassembled, the resulting message to be displayed to the user should look something like this:

     X-Weird-Header-1: Foo
     From: Bill@host.com
     To: joe@otherhost.com
     Subject: Audio mail
     Message-ID: id1@host.com
     Content-type: audio/basic
     Content-transfer-encoding: base64

     ... first half of encoded audio data goes here...
     ... second half of encoded audio data goes here...
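The three merging rules can be expressed compactly. The following Python sketch is illustrative only: the names merge_headers and is_content_header are hypothetical, headers are assumed to be already parsed into (name, value) pairs, and field names are compared without regard to case.

```python
from typing import List, Tuple

Header = Tuple[str, str]  # (field name, field value)


def is_content_header(name: str) -> bool:
    """True for fields whose name starts with "Content-", ignoring case."""
    return name.lower().startswith("content-")


def merge_headers(enclosing: List[Header], enclosed: List[Header]) -> List[Header]:
    """Merge the headers of part one with those of the enclosed message."""
    # Rule (1): copy everything from part one except the "Content-" fields.
    merged = [h for h in enclosing if not is_content_header(h[0])]
    # Rule (2): append only the "Content-" fields of the enclosed message.
    merged += [h for h in enclosed if is_content_header(h[0])]
    # Rule (3): headers of the second and later fragments are never consulted.
    return merged
```

Applied to the first fragment of the audio example above, this yields the header of the reassembled message: "X-Weird-Header-1: Foo" survives, while "X-Weird-Header-1: Bar" and "X-Weird-Header-2: Hello" from the enclosed message are dropped.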
7.3.3 The Message/External-body subtype

The external-body subtype indicates that the actual body or body part data are not included, but merely referenced. In this case, the parameters describe a mechanism for accessing the external data.

The only mandatory parameter is "access-type"; all of the other parameters may be mandatory or optional depending on the value of access-type.

     ACCESS-TYPE -- one or more case-insensitive words, comma-separated, indicating supported access mechanisms by which the file or data may be obtained. Values include, but are not limited to, "FTP", "ANON-FTP", "TFTP", "AFS", "LOCAL-FILE", and "MAIL-SERVER". (The value "ANON-FTP" is used to specify the FTP protocol with login "anonymous".) Future values should be registered with IANA in the same manner that Content-type and charset values are registered.

For the "mail-server", "ftp" and "anon-ftp" access-types, the additional mandatory parameters are name and site. For the "afs" and "local-file" access types, only the name parameter is mandatory.

     NAME -- The name of a file or other token that can be used to reference the external body data.

     SITE -- a domain specifier for a machine or set of machines that are known to have access to the data file. Asterisks may be used for wildcard matching to a part of a domain name, such as "*.bellcore.com", to indicate a set of machines on which the data should be directly visible, while a single asterisk may be used to indicate a file that is expected to be universally available, e.g., via a global file system.

     EXPIRATION -- The date (with the RFC 822 "date-time" syntax) after which the existence of the external data is not guaranteed.

     DIRECTORY -- A directory from which the data named by NAME should be retrieved. This is particularly useful for the FTP access-type.

     MODE -- A transfer mode for retrieving the information, with access-type FTP.
     PERMISSION -- A field that indicates whether or not it is expected that clients might also attempt to overwrite the data. By default, or if permission is "read", the assumption is that they are not, and that if the data is retrieved once, it is never needed again. If PERMISSION is "read-write", this assumption is invalid, and any local copy should be considered no more than a cache. No other values of permission are defined here.

With the emerging possibility of very wide-area file systems, it becomes very hard to know in advance the set of machines where a file will and will not be accessible directly from the file system. Therefore it may make sense to provide both a file name, to be tried directly, and the name of one or more sites from which the file is known to be accessible. An implementation can try to retrieve remote files using FTP or any other protocol, using anonymous file retrieval or prompting the user for the necessary name and password. If an external body is accessible via multiple mechanisms, the sender may include multiple parts of type message/external-body within a part of type multipart/alternative.

However, the external-body mechanism is not intended to be limited to file retrieval, as shown by the mail-server access-type. Beyond this, one can imagine, for example, using a video server for external references to video clips.

If a message is of type "message/external-body", then the body of the message will contain the header fields of the encapsulated message. The body itself is to be found in the external location. This means that if the body of the "message/external-body" message contains two consecutive CRLFs, everything after those CRLFs is NOT part of the message itself. For most message/external-body messages, this trailing area must simply be ignored. However, it is a convenient place for additional data that cannot be included in the content-type header field.
In particular, if the "access-type" value is "mail-server", then the trailing area should contain commands to be sent to the mail server at the address given by NAME@SITE, where NAME and SITE are the values of the NAME and SITE parameters, respectively.

The embedded message header fields which appear in the body of the message/external-body data can be used to declare the Content-type of the external body. Thus a complete message/external-body message, referring to a document in PostScript format, might look like this:

     From: Whomever
     Subject: whatever
     Content-Type: multipart/alternative; boundary=42

     --42
     Content-Type: message/external-body;
          name="BodyFormats.ps";
          site="thumper.bellcore.com";
          access-type = ANON-FTP;
          directory = "pub";
          mode = "image";
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript

     --42
     Content-Type: message/external-body;
          name="/u/nsb/writing/rfcs/RFC-XXXX.ps";
          site="thumper.bellcore.com";
          access-type = AFS;
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript

     --42
     Content-Type: message/external-body;
          name="listserv";
          site="bogus.bitnet";
          access-type = mail-server;
          expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

     Content-type: application/postscript

     SEND-FILE: /u/nsb/writing/rfcs/RFC-XXXX.ps

     --42--

Like the message/partial type, the message/external-body type is intended to be transparent, that is, to convey the data type in the external body rather than to convey a message with a body of that type. Thus the headers on the outer and inner parts should be merged using the same rules as for message/partial. In particular, this means that the Content-type header is overridden, but the From and Subject headers are preserved.

Note that since the external bodies are not transported as mail, they need not conform to the 7-bit and line length requirements, but might in fact be binary files.
Thus a Content-Transfer-Encoding is not generally necessary, though it is permitted.

7.4 The Application Content-Type

The "application" Content-Type is to be used for data which do not fit in any of the other categories, and particularly for data to be processed by mail-based uses of application programs. This is information which must be processed by an application before it is viewable or usable to a user. Expected uses for Content-Type application include mail-based file transfer, spreadsheets, data for mail-based scheduling systems, and languages for "active" (computational) email.

For example, a meeting scheduler might define a standard representation for information about proposed meeting dates. An intelligent user agent would use this information to conduct a dialog with the user, and might then send further mail based on that dialog. More generally, there have been several "active" messaging languages developed in which programs in a suitably specialized language are sent through the mail and automatically run in the recipient's environment.

Such applications may be defined as subtypes of the "application" Content-Type. This document defines three subtypes: octet-stream, ODA, and PostScript.

In general, the subtype of application will often be the name of the application for which the data are intended. This does not mean, however, that any application program name may be used freely as a subtype of application. Such usages must be registered with IANA, as described earlier in this document.

7.4.1 The Application/Octet-Stream subtype

The primary subtype of application, "octet-stream", may be used to indicate that the body or body part of a message is binary data. The set of possible parameters includes, but is not limited to:

     NAME -- a suggested name for the binary data if stored as a file.
     TYPE -- the general type or category of binary data.

     CONVERSIONS -- the set of operations that have been performed on the data before putting it in the mail (and before any Content-Transfer-Encoding that might have been applied). If multiple conversions have occurred, they must be separated by commas and specified in the order they were applied -- that is, the leftmost conversion must have occurred first, and conversions are undone from right to left.

     PADDING -- the number of bits of padding that were appended to the bitstream comprising the actual contents to produce the enclosed byte-oriented data. This is useful for enclosing a bitstream in a message when the total number of bits is not a multiple of the byte size.

The values for these attributes are left undefined at present, but may require specification in the future. An example of a common (though UNIX-specific) usage might be:

     Content-Type: application/octet-stream;
          name=foo.tar.Z; type=tar;
          conversions="encrypt,compress"

However, it should be noted that the use of such conversions is explicitly discouraged due to a lack of portability and standardization. The use of uuencode is particularly discouraged, in favor of the Content-Transfer-Encoding mechanism, which is both more standardized and more portable across mail boundaries.

The recommended action for an implementation that receives application/octet-stream mail is to simply offer to put the data in a file, with any Content-Transfer-Encoding undone, or perhaps to use it as input to a user-specified process.

To reduce the danger of transmitting rogue programs through the mail, it is strongly recommended that implementations NOT implement a path-search mechanism whereby an arbitrary program named in the Content-Type parameter (e.g., an "interpreter=" parameter) is found and executed using the mail body as input.
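The ordering rule for CONVERSIONS and the meaning of PADDING, described earlier in this section, can be sketched in Python. This is only an illustration: the function names undo_order and payload_bits are hypothetical, the conversion names are treated as opaque strings (their values are left undefined by this document), and a byte size of 8 bits is assumed.

```python
from typing import List


def undo_order(conversions: str) -> List[str]:
    """Return the conversions in the order they must be undone.

    The parameter lists conversions left to right in the order applied,
    so they are undone from right to left.
    """
    applied = [name.strip() for name in conversions.split(",") if name.strip()]
    return list(reversed(applied))


def payload_bits(byte_count: int, padding: int) -> int:
    """Number of meaningful bits when `padding` bits were appended.

    Assumes 8-bit bytes; the padding is taken to be less than one byte.
    """
    if not 0 <= padding < 8:
        raise ValueError("padding is assumed to be between 0 and 7 bits")
    return byte_count * 8 - padding
```

For the example header above, undo_order("encrypt,compress") lists "compress" first, since it was the last conversion applied.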
7.4.2 The Application/PostScript subtype

A Content-Type of "application/postscript" indicates a PostScript program. The language is defined in [POSTSCRIPT]. It is recommended that PostScript sent through email use the PostScript document structuring conventions, correctly, whenever possible.

The execution of general-purpose PostScript interpreters entails serious security risks, and implementors are discouraged from simply sending PostScript email bodies to "off-the-shelf" interpreters. While it is usually safe to send PostScript to a printer, where the potential for harm is greatly constrained, implementors should consider all of the following before they add interactive display of PostScript messages to their mail readers.

The remainder of this section outlines some, though probably not all, of the possible problems with sending PostScript through the mail.

Dangerous operations in the PostScript language include, but may not be limited to, the PostScript operators deletefile, renamefile, filenameforall, and file. File is only dangerous when applied to something other than standard input or output. Implementations may also define additional nonstandard file operators; these may also pose a threat to security. Filenameforall, the wildcard file search operator, may appear at first glance to be harmless. Note, however, that this operator has the potential to reveal information about what files the recipient has access to, and this information may itself be sensitive. Message senders should avoid the use of potentially dangerous file operators, since these operators are quite likely to be unavailable in secure PostScript implementations. Message-receiving and -displaying software should either completely disable all potentially dangerous file operators or take special care not to delegate any special authority to their operation.
These operators should be viewed as being done by an outside agency when interpreting PostScript documents. Such disabling and/or checking should be done completely outside of the reach of the PostScript language itself; care should be taken to ensure that no method exists for reenabling full-function versions of these operators.

The PostScript language provides facilities for exiting the normal interpreter, or server, loop. Changes made in this "outer" environment are customarily retained across documents, and may in some cases be retained semipermanently in nonvolatile memory. The operators associated with exiting the interpreter loop have the potential to interfere with subsequent document processing. As such, their unrestrained use constitutes a threat of service denial. PostScript operators that exit the interpreter loop include, but may not be limited to, the exitserver and startjob operators. Message-sending software should not generate PostScript that depends on exiting the interpreter loop to operate. The ability to exit will probably be unavailable in secure PostScript implementations. Message-receiving and -displaying software should, if possible, disable the ability to make retained changes to the PostScript environment, eliminating the startjob and exitserver commands. If these commands cannot be eliminated, at least set the password associated with them to a hard-to-guess value.

PostScript provides operators for setting system-wide and device-specific parameters. These parameter settings may be retained across jobs and may potentially pose a threat to the correct operation of the interpreter. The PostScript operators that set system and device parameters include, but may not be limited to, the setsystemparams and setdevparams operators. Message-sending software should not generate PostScript that depends on the setting of system or device parameters to operate correctly.
The ability to set these parameters will probably be unavailable in secure PostScript implementations. Message-receiving and -displaying software should, if possible, disable the ability to change system and device parameters. If these operators cannot be disabled, at least set the password associated with them to a hard-to-guess value.

Some PostScript implementations provide nonstandard facilities for the direct loading and execution of machine code. Such facilities are quite obviously open to substantial abuse. Message-sending software should not make use of such features. Besides being totally hardware-specific, they are also likely to be unavailable in secure implementations of PostScript. Message-receiving and -displaying software should not allow such operators to be used if they exist.

PostScript is an extensible language, and many, if not most, implementations of it provide a number of their own extensions. This document does not deal with such extensions explicitly since they constitute an unknown factor. Message-sending software should not make use of nonstandard extensions; they are likely to be missing from some implementations. Message-receiving and -displaying software should make sure that any nonstandard PostScript operators are secure and don't present any kind of threat.

It is possible to write PostScript that consumes huge amounts of various system resources. It is also possible to write PostScript programs that loop infinitely. Both types of programs have the potential to cause damage if sent to unsuspecting recipients. Message-sending software should avoid the construction and dissemination of such programs, which is antisocial. Message-receiving and -displaying software should provide appropriate mechanisms to abort processing of a document after a reasonable amount of time has elapsed. In addition, PostScript interpreters should be limited to the consumption of only a reasonable amount of any given system resource.
Finally, bugs may exist in some PostScript interpreters which could possibly be exploited to gain unauthorized access to a recipient's system. Beyond noting this possibility, there is no specific action to take to prevent this, apart from the timely correction of such bugs if any are found.

7.4.3 The Application/ODA subtype

The "ODA" subtype of application is used to mark message bodies or parts as being information encoded according to the Office Document Architecture [ODA] standards. For application/oda, the Content-Type line should also specify an attribute/value pair that indicates the document application profile (DAP), using the key word "profile". Thus an appropriate header field might look like this:

     Content-Type: application/oda; profile=Q112

Consult the ODA standard [ODA] for further information.

7.5 The Image Content-Type

A Content-Type of "image" indicates that the body or body part contains an image. The subtype names the specific image format. These names are case insensitive. A few subtypes are "G3Fax" for Group Three Fax, "jpeg" for the JPEG format, "gif" for GIF format, and "pbm", "pgm", and "ppm" for the "portable bitmap" formats for black and white, grey scale, or color images.

A special subtype is "tiff-b-netfax" which refers to the bi-level (e.g., fax) image file format proposed by the IETF Network Fax Working Group described in the Internet Draft [RFC-NETFAX]. The proposed format is basically TIFF-B with some restrictions and supports MMR, MR, and MH compression as well as uncompressed images. MMR compression is recommended where possible.

A Content-Type of "image/pbm" indicates portable bitmap (PBM) data encoded using the format described in the pbm(5) manual entry in the PBMPLUS sources, dated 91-09-21. Note that both the ASCII and RAWBITS formats are allowed (magic numbers P1 and P4). PBMPLUS is available via anonymous FTP at many sites.
Similarly, Content-Types of "image/pgm" and "image/ppm" refer to the pgm(5) and ppm(5) manual entries (magic numbers P2/P5 and P3/P6, respectively). For further details on these formats, contact jef@well.sf.ca.us.

A Content-Type of "image/g3fax" indicates a Group 3 (G3) Facsimile image. The encoding format is defined in the CCITT T.4 Recommendation.

NOTE: The T.4 Recommendation defines two major encoding schemes: one-dimensional and two-dimensional. Practical experience shows that most g3fax encoding/decoding software implements one, but not both, of these schemes. As such, use of this content type is strongly discouraged. Instead, it is recommended that the sender convert the image to another format. The PBMPLUS package contains a program to convert a one-dimensional g3fax image to a PBM image. The ISODE package [ISODE] contains a program to convert a one- or two-dimensional g3fax image to a PBM image.

This list of subtypes is neither exclusive nor exhaustive, and is expected to grow as more types are registered with IANA.

For maximum interoperability within the Internet, the Network Fax Working Group recommends the use of "IMAGE/TIFF-B-NetFax" rather than "G3Fax."

7.6 The Audio Content-Type

A Content-Type of "audio" indicates that the body or body part contains audio data. Although there is not yet a consensus on an "ideal" audio format for use with computers, there is a pressing need for a format capable of providing interoperable behavior.

The initial subtype of "basic" is specified to meet this requirement by providing an absolutely minimal lowest common denominator audio format. It is expected that richer formats for higher quality and/or lower bandwidth audio will be defined by a later document.

The content of the "audio/basic" subtype is audio encoded using 8-bit ISDN u-law [PCM]. When this subtype is present, a sample rate of 8000 Hz and a single channel are assumed.
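
As an illustration of what a receiver of "audio/basic" has to do, the following is a minimal sketch in C that expands 8-bit u-law bytes into 16-bit linear samples. The draft itself does not spell out the expansion; the arithmetic below follows the usual G.711 u-law formulation, and the function names (ulaw_to_linear, audio_basic_decode, audio_basic_seconds) are hypothetical, chosen for this example only. The body is assumed to have been base64-decoded already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ULAW_BIAS           0x84   /* bias added before u-law compression */
#define AUDIO_BASIC_RATE_HZ 8000   /* sample rate assumed by audio/basic  */

/* Expand one 8-bit u-law byte to a 16-bit linear PCM sample.
 * (Assumption: conventional G.711 expansion; u-law bytes are stored
 * bit-inverted, so the byte is complemented first.) */
int16_t ulaw_to_linear(uint8_t u)
{
    int t;

    u = (uint8_t)~u;
    t = ((u & 0x0F) << 3) + ULAW_BIAS;   /* mantissa plus bias      */
    t <<= (u & 0x70) >> 4;               /* scale by the segment    */
    return (int16_t)((u & 0x80) ? (ULAW_BIAS - t) : (t - ULAW_BIAS));
}

/* Decode n u-law bytes from in[] into out[]; returns the sample count. */
size_t audio_basic_decode(const uint8_t *in, size_t n, int16_t *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = ulaw_to_linear(in[i]);
    return n;
}

/* Playing time in seconds: one byte per sample, single channel. */
double audio_basic_seconds(size_t n_bytes)
{
    return (double)n_bytes / AUDIO_BASIC_RATE_HZ;
}
```

With these definitions, the byte 0xFF expands to silence (0), and a decoded body of 16000 bytes corresponds to two seconds of sound. A real player would additionally have to hand the samples to an audio device, which is outside the scope of this sketch.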

7.7 The Video Content-Type

A Content-Type of "video" indicates that the body or body part contains a time-varying-picture image, possibly with color and coordinated sound. The term "video" is used extremely generically, rather than with reference to any particular technology or format, and is not meant to preclude subtypes such as animated drawings encoded compactly. The subtype "mpeg" refers to video coded according to the MPEG standard [MPEG].

Note that although in general this document strongly discourages the mixing of multiple media in a single body part, it is recognized that many so-called "video" formats include a representation for synchronized audio, and this is explicitly permitted for subtypes of "video".

7.8 Experimental Content-Type Values

A Content-Type value beginning with the characters "X-" is a private value, to be used by consenting mail systems by mutual agreement. Any format without a rigorous and public definition must be named with an "X-" prefix, and publicly specified values shall never begin with "X-". (Older versions of the widely-used Andrew system use the "X-BE2" name, so new systems should probably choose a different name.)

In general, the use of "X-" top-level types is strongly discouraged. Implementors should invent subtypes of the existing types whenever possible. The invention of new types is intended to be restricted primarily to the development of new media types for email, such as digital odors or holography, and not for new data formats in general. In many cases, a subtype of application will be more appropriate than a new top-level type.

Summary

Using the MIME-Version, Content-Type, and Content-Transfer-Encoding header fields, it is possible to include, in a standardized way, arbitrary types of data objects with RFC 822 conformant mail messages.
No restrictions imposed by either RFC 821 or RFC 822 are violated, and care has been taken to avoid problems caused by additional restrictions imposed by the characteristics of some Internet mail transport mechanisms (see Appendix B). The "multipart" and "message" Content-Types allow mixing and hierarchical structuring of objects of different types in a single message. Further Content-Types provide a standardized mechanism for tagging messages or body parts as audio, image, or several other kinds of data. A distinguished parameter syntax allows further specification of data format details, particularly the specification of alternate character sets. Additional optional header fields provide mechanisms for certain extensions deemed desirable by many implementors. Finally, a number of useful Content-Types are defined for general use by consenting user agents, notably text/richtext, message/partial, and message/external-body.

Contacts

For more information, the authors of this document may be contacted via Internet mail:

     Nathaniel Borenstein
     Ned Freed

Acknowledgements

This document is the result of the collective effort of a large number of people, at several IETF meetings, on the IETF-SMTP and IETF-822 mailing lists, and elsewhere. Although any enumeration seems doomed to suffer from egregious omissions, the following are among the many contributors to this effort:

     Harald Tveit Alvestrand       Vincent Lau
     Randall Atkinson              Timo Lehtinen
     Philippe Brandon              John R. MacMillan
     Kevin Carosso                 Rick McGowan
     Cristian Constantinof         Leo Mclaughlin
     Mark Crispin                  Goli Montaser-Kohsari
     Dave Crocker                  Keith Moore
     Terry Crowley                 Tom Moore
     Walt Daniels                  Mark Needleman
     Frank Dawson                  John Noerenberg
     Hitoshi Doi                   Mats Ohrman
     Kevin Donnelly                Julian Onions
     Keith Edwards                 Michael Patton
     Chris Eich                    David J. Pepper
     Johnny Eriksson               Marshall T. Rose
     Craig Everhart                Jonathan Rosenberg
     Patrik Fältström              Jan Rynning
     Erik E. Fair                  Harri Salminen
     Roger Fajman                  Michael Sanderson
     Alain Fontaine                Masahiro Sekiguchi
     James M. Galvin               Mark Sherman
     Philip Gladstone              Keld Jørn Simonsen
     Thomas Gordon                 Bob Smart
     Phill Gross                   Einar Stefferud
     James Hamilton                Michael Stein
     Steve Hardcastle-Kille        Klaus Steinberger
     David Herron                  Peter Svanberg
     Bruce Howard                  James Thompson
     Bill Janssen                  Steve Uhler
     Olle Jaernefors               Stuart Vance
     Risto Kankkunen               Erik van der Poel
     Phil Karn                     Guido van Rossum
     Alan Katz                     Peter Vanderbilt
     Tim Kehres                    Greg Vaudreuil
     Neil Katin                    Ed Vielmetti
     Anders Klemets                Ryan Waldron
     John Klensin                  Sven-Ove Westberg
     Valdis Kletniek               Brian Wideen
     Jim Knowles                   John Wobus
     Stev Knowles                  Glenn Wright
     Bob Kummerfeld                Rayan Zachariassen
     Pekka Kytolaakso              David Zimmerman

The authors apologize for any omissions from this list, which are certainly unintentional.

Appendix A -- Minimal MIME-Conformance

The mechanisms described in this document are open-ended. It is definitely not expected that all implementations will implement all of the Content-Types described, nor that they will all share the same extensions. In order to promote interoperability, however, it is useful to define the concept of "MIME-conformance": a certain level of implementation that allows the useful interworking of messages with content that differs from US ASCII text. In this section, we specify the requirements for such conformance.

A mail user agent that is MIME-conformant MUST:

1.  Always generate a "MIME-Version: 1.0" header field.

2.  Recognize the Content-Transfer-Encoding header field, and decode all received data encoded with either the quoted-printable or base64 transformations. Encode any data sent that is not in seven-bit mail-ready representation using one of these transformations and include the appropriate Content-Transfer-Encoding header field, unless the underlying transport mechanism supports non-seven-bit data, as SMTP does not.

3.  Recognize and interpret the Content-Type header field, and avoid showing users raw data with a Content-Type field other than text. Be able to send at least text/plain messages, with the character set specified as a parameter if it is not US-ASCII.

4.  Explicitly handle the following Content-Type values, to at least the following extents:

    Text:
      -- Recognize and display "text" mail with the character set "US-ASCII."
      -- Recognize other character sets at least to the extent of being able to inform the user about what character set the message uses.
      -- Recognize the "ISO-8859-*" character sets to the extent of being able to display those characters that are common to ISO-8859-* and US-ASCII, namely all characters represented by octet values 0-127.
      -- For unrecognized subtypes, show or offer to show the user the "raw" version of the data. An ability at least to convert "text/richtext" to plain text, as shown in Appendix D, is encouraged, but not required for conformance.

    Message:
      -- Recognize and display at least the primary (822) encapsulation.

    Multipart:
      -- Recognize the primary (mixed) subtype. Display all relevant information on the message level and the body part header level and then display or offer to display each of the body parts individually.
      -- Recognize the "alternative" subtype, and avoid showing the user redundant parts of multipart/alternative mail.
      -- Treat any unrecognized subtypes as if they were "mixed".

    Application:
      -- Offer the ability to remove either of the two types of Content-Transfer-Encoding defined in this document and put the resulting information in a user file.

5.  Upon encountering any unrecognized Content-Type, an implementation must treat it as if it had a Content-Type of "application/octet-stream" with no parameter sub-arguments.
    How such data are handled is up to an implementation, but likely options for handling such unrecognized data include offering to write it into a file (decoded from its mail transport format) or offering to let the user name a program to which the decoded data should be passed as input. Unrecognized predefined types, which in a MIME-conformant mailer might still include audio, image, or video, should also be treated in this way.

A user agent that meets the above conditions is said to be MIME-conformant. The meaning of this phrase is that it is assumed to be "safe" to send virtually any kind of properly-marked data to users of such mail systems, because they will at least be able to treat the data as undifferentiated binary, and will not simply splash it onto the screen of unsuspecting users. There is another sense in which it is always "safe" to send data in a format that is MIME-conformant, which is that such data will not break or be broken by any known systems that are conformant with RFC 821 and RFC 822. User agents that are MIME-conformant have the additional guarantee that the user will not be shown data that were never intended to be viewed as text.

Appendix B -- General Guidelines For Sending Email Data

Internet email is not a perfect, homogeneous system. Mail may become corrupted at several stages in its travel to a final destination. Specifically, email sent throughout the Internet may travel across many networking technologies. Many networking and mail technologies do not support the full functionality possible in the SMTP transport environment. Mail traversing these systems is likely to be modified in such a way that it can be transported.

There exist many widely-deployed non-conformant MTA's in the Internet.
These MTA's, speaking the SMTP protocol, alter messages on the fly to take advantage of the internal data structure of the hosts they are implemented on, or are just plain broken.

The following guidelines may be useful to anyone devising a data format (Content-Type) that will survive the widest range of networking technologies and known broken MTA's unscathed. Note that anything encoded in the base64 encoding will satisfy these rules, but that some well-known mechanisms, notably the UNIX uuencode facility, will not. Note also that anything encoded in the Quoted-Printable encoding will survive most gateways intact, but possibly not gateways to systems that use the EBCDIC character set.

(1) Line delimiters other than CRLF may be used in the local representation of a message on some systems. The persistence of CRLF should not be relied on.

(2) Isolated CR and LF characters are not well tolerated in general; they may be lost or converted to delimiters on some systems, and hence should not be relied on.

(3) TAB (HT) characters may be misinterpreted or may be automatically converted to variable numbers of spaces. This is unavoidable in some environments, notably those not based on the ASCII character set. Such conversion is STRONGLY DISCOURAGED, but it may occur, and mail formats should not rely on the persistence of TAB (HT) characters.

(4) Lines longer than 76 characters may be wrapped or truncated in some environments. Line wrapping and line truncation are STRONGLY DISCOURAGED, but unavoidable in some cases. Applications which require long lines should somehow differentiate between soft and hard line breaks. (A simple way to do this is to use the quoted-printable encoding.)

(5) Trailing "white space" characters (SPACE, TAB (HT), etc.) on a line may be discarded by some transport agents, while other transport agents may pad lines with these characters so that all lines in a mail file are of equal length.
    The persistence of trailing white space, therefore, should not be relied on.

(6) Many mail domains use variations on the ASCII character set, or use character sets such as EBCDIC which contain most but not all of the US-ASCII characters. The correct translation of characters not in the "invariant" set cannot be depended on across character converting gateways. For example, this situation is a problem when sending uuencoded information across BITNET, an EBCDIC system. Similar problems can occur without crossing a gateway, since many Internet hosts use character sets other than ASCII internally. The definition of Printable Strings in X.400 adds further restrictions in certain special cases. In particular, the only characters that are known to be consistent across all gateways are the 73 characters that correspond to the upper and lower case letters A-Z and a-z, the 10 digits 0-9, and the following eleven special characters:

         "'"  (ASCII code 39)
         "("  (ASCII code 40)
         ")"  (ASCII code 41)
         "+"  (ASCII code 43)
         ","  (ASCII code 44)
         "-"  (ASCII code 45)
         "."  (ASCII code 46)
         "/"  (ASCII code 47)
         ":"  (ASCII code 58)
         "="  (ASCII code 61)
         "?"  (ASCII code 63)

    A maximally portable mail representation, such as the base64 encoding, will confine itself to relatively short lines of text in which the only meaningful characters are taken from this set of 73 characters.

Please note that the above list is NOT a list of recommended practices for MTA's. RFC 821 MTA's are prohibited from altering the character of white space or wrapping long lines. These BAD and illegal practices are known to occur on established networks, and implementations should be robust in dealing with the bad effects they can cause.

Appendix C -- A Complex Multipart Example

What follows is the outline of a complex multipart message.
This message has five parts to be displayed serially: two introductory plain text parts, an embedded multipart message, a richtext part, and a closing encapsulated text message in a non-ASCII character set. The embedded multipart message has two parts to be displayed in parallel, a picture and an audio fragment.

     MIME-Version: 1.0
     From: Nathaniel Borenstein
     Subject: A multipart example
     Content-Type: multipart/mixed; boundary=unique-boundary-1

     This is the preamble area of a multipart message.
     Mail readers that understand multipart format
     should ignore this preamble.
     If you are reading this text, you might want to
     consider changing to a mail reader that understands
     how to properly display multipart messages.
     --unique-boundary-1

     ...Some text appears here...
     [Note that the preceding blank line means
     no header fields were given and this is text,
     with charset US ASCII.  It could have been
     done with explicit typing as in the next part.]

     --unique-boundary-1
     Content-type: text/plain; charset=US-ASCII

     This could have been part of the previous part,
     but illustrates implicit versus explicit
     typing of body parts.

     --unique-boundary-1
     Content-Type: multipart/parallel; boundary=unique-boundary-2

     --unique-boundary-2
     Content-Type: audio/basic
     Content-Transfer-Encoding: base64

     ... base64-encoded 8000 Hz single-channel
         u-law-format audio data goes here....

     --unique-boundary-2
     Content-Type: image/tiff-b-netfax
     Content-Transfer-Encoding: Base64

     ... base64-encoded image data goes here....

     --unique-boundary-2--

     --unique-boundary-1
     Content-type: text/richtext

     This is richtext. Isn't it cool?

     --unique-boundary-1
     Content-Type: message/rfc822

     From: (name in US-ASCII)
     Subject: (subject in US-ASCII)
     Content-Type: Text/plain; charset=ISO-8859-1
     Content-Transfer-Encoding: Quoted-printable

     ... Closing text in ISO-8859-1 goes here ...

     --unique-boundary-1--

Appendix D -- A Richtext-to-Text Translator in C

One of the major goals in the design of the richtext subtype of the text Content-Type is to make formatted text so simple that even text-only mailers will implement richtext-to-plain-text translators, thus increasing the likelihood that multifont text will become "safe" to use very widely. To demonstrate this simplicity, what follows is an extremely simple C program that converts richtext input into plain text output:

     #include <stdio.h>
     #include <ctype.h>
     #include <string.h>

     #define TOKEN_MAX 50

     /* Read a token up to '>' into token[], lowercased and
        truncated to fit.  Returns the last character read,
        which is either '>' or EOF. */
     static int read_token(char token[TOKEN_MAX])
     {
         int c, i = 0;

         while ((c = getc(stdin)) != '>' && c != EOF) {
             if (i < TOKEN_MAX - 1)
                 token[i++] = isupper(c) ? tolower(c) : c;
         }
         token[i] = '\0';
         return c;
     }

     int main(void)
     {
         int c;
         char token[TOKEN_MAX];

         while ((c = getc(stdin)) != EOF) {
             if (c == '<') {
                 if (read_token(token) == EOF) break;
                 if (!strcmp(token, "lt")) {
                     putc('<', stdout);
                 } else if (!strcmp(token, "nl")) {
                     putc('\n', stdout);
                 } else if (!strcmp(token, "/paragraph")) {
                     fputs("\n\n", stdout);
                 } else if (!strcmp(token, "comment")) {
                     int commct = 1;   /* comments may nest */
                     while (commct > 0) {
                         while ((c = getc(stdin)) != '<'
                                && c != EOF) ;
                         if (c == EOF) break;
                         if (read_token(token) == EOF) break;
                         if (!strcmp(token, "/comment")) --commct;
                         if (!strcmp(token, "comment")) ++commct;
                     }
                 } /* Ignore all other tokens */
             } else if (c != '\n') {
                 putc(c, stdout);
             }
         }
         putc('\n', stdout); /* for good measure */
         return 0;
     }

It should be noted that one can do considerably better than this in displaying richtext data on a dumb terminal. In particular, one can replace font information such as "bold" with textual emphasis (like *this* or _T_H_I_S_). One can also properly handle the richtext formatting commands regarding indentation, justification, and others. However, the above program is all that is *necessary* in order to present richtext on a dumb terminal.
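
The remark about replacing font information with textual emphasis can be illustrated with a small variation on the translator. The following is only a sketch, not part of the draft's program: it works on an in-memory string rather than on stdin, the function names (emit, read_token, richtext_to_emphasis) are hypothetical, and rendering "bold" as an asterisk and "italic" as an underscore is an assumption made for this example. All other tokens are handled as in the program above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append one character if room remains (space is kept for '\0'). */
static void emit(char *out, size_t outsz, size_t *len, char c)
{
    if (*len + 1 < outsz)
        out[(*len)++] = c;
}

/* Copy the token that follows '<' into tok, lowercased and truncated
 * to fit.  Returns a pointer just past '>', or NULL if input ends. */
static const char *read_token(const char *p, char *tok, size_t toksz)
{
    size_t i = 0;

    while (*p != '\0' && *p != '>') {
        if (i + 1 < toksz)
            tok[i++] = (char)tolower((unsigned char)*p);
        p++;
    }
    tok[i] = '\0';
    return (*p == '>') ? p + 1 : NULL;
}

/* Translate richtext in `in` to plain text in `out` (at most outsz
 * bytes including the terminator).  Returns the output length. */
size_t richtext_to_emphasis(const char *in, char *out, size_t outsz)
{
    char tok[50];
    size_t len = 0;
    int comment_depth = 0;          /* comments may nest */
    const char *p = in;

    if (outsz == 0)
        return 0;
    while (*p != '\0') {
        if (*p != '<') {
            /* Literal text: dropped inside comments; raw newlines
             * are dropped as in the original translator. */
            if (comment_depth == 0 && *p != '\n')
                emit(out, outsz, &len, *p);
            p++;
            continue;
        }
        p = read_token(p + 1, tok, sizeof tok);
        if (p == NULL)
            break;                  /* unterminated token: stop */
        if (strcmp(tok, "comment") == 0) {
            comment_depth++;
        } else if (strcmp(tok, "/comment") == 0) {
            if (comment_depth > 0)
                comment_depth--;
        } else if (comment_depth > 0) {
            continue;               /* ignore tokens inside comments */
        } else if (strcmp(tok, "lt") == 0) {
            emit(out, outsz, &len, '<');
        } else if (strcmp(tok, "nl") == 0) {
            emit(out, outsz, &len, '\n');
        } else if (strcmp(tok, "/paragraph") == 0) {
            emit(out, outsz, &len, '\n');
            emit(out, outsz, &len, '\n');
        } else if (strcmp(tok, "bold") == 0 || strcmp(tok, "/bold") == 0) {
            emit(out, outsz, &len, '*');    /* assumed rendering */
        } else if (strcmp(tok, "italic") == 0 || strcmp(tok, "/italic") == 0) {
            emit(out, outsz, &len, '_');    /* assumed rendering */
        }                           /* all other tokens are ignored */
    }
    out[len] = '\0';
    return len;
}
```

A caller supplies the output buffer, for example a char buf[256], and passes sizeof buf as the third argument; output that does not fit is silently truncated. Working on strings instead of stdin is a design choice made here only so that the routine is easy to exercise in isolation.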

Appendix E -- Collected Grammar

This appendix contains the complete BNF grammar for all the syntax specified by this document.

By itself, however, this grammar is incomplete. It refers to several entities that are defined by RFC 822. Rather than reproduce those definitions here, and risk unintentional differences between the two, this document simply refers the reader to RFC 822 for the remaining definitions. Wherever a term is undefined, it refers to the RFC 822 definition.

     attribute := token

     MIME-Version := 1*token

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / ","
                      / "-" / "." / "/" / ":" / "=" / "?"

     close-delimiter := delimiter "--"

     Content-Description := *text

     Content-ID := msg-id

     Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" /
                                  "8BIT" / "7BIT" /
                                  "BINARY" / x-token

     Content-Type := type "/" subtype *[";" parameter]

     delimiter := CRLF "--" boundary   ; taken from Content-Type field.
                                       ; There should be no space
                                       ; between "--" and boundary.

     encapsulation := delimiter CRLF part-encapsulation

     epilogue := text                  ; to be ignored upon receipt.

     part-encapsulation := <"message" as defined in RFC 822,
               with all header fields optional, and with the
               specified delimiter not occurring anywhere in
               the message body, either on a line by itself
               or as a substring anywhere.>

     multipart-body := preamble 1*encapsulation
                       close-delimiter epilogue

     parameter := attribute "=" value

     preamble := text                  ; to be ignored upon receipt.

     subtype := token

     token := 1*<any CHAR except SPACE, CTLs, or tspecials>

     tspecials := "(" / ")" / "<" / ">" / "@"   ; Must be in
                / "," / ";" / ":" / "\" / <">   ; quoted-string,
                / "/" / "[" / "]" / "?" / "."   ; to use within
                / "="                           ; parameter values

     type := "application" / "audio" / "image" / "message"
             / "multipart" / "text" / "video" / x-token

     value := token / quoted-string

     x-token := <The two characters "X-" followed, with no
                 intervening white space, by any token>

Appendix F -- ISO-2022-jp

This appendix briefly describes the existing practice for the use of ISO-2022 in Japanese electronic mail. This description is for informational purposes only, and is not intended to guide an implementation of ISO-2022-jp mail senders or readers.

ISO-2022 is a scheme for switching between multiple character sets. Japanese usage of it uses ASCII as a base character set. Thus it is possible that a message labelled as text/iso-2022-jp is entirely in US-ASCII, and just showing the text as if it were ASCII is not an unreasonable strategy for an implementation that does not really understand ISO-2022-jp. A better, nearly-minimal strategy would be to at least inform the user of a character set shift when escape sequences are encountered.

In ISO-2022-jp, announcer sequences ESC 2/0 4/1, ESC 2/0 4/8 and ESC 2/0 4/10 are implicitly assumed. These announcer sequences must not appear as a part of the data. (They mean "Use G0 only and no locking-shifts are allowed," "Use 94-character sets only" and "Use 7-bits.")

Designation sequences ESC 2/8 4/2, ESC 2/8 4/10, ESC 2/4 4/0 and ESC 2/4 4/2 are allowed. No other designation sequences are allowed. Each escape sequence designates:

     ESC 2/8 4/2:   ASCII.
     ESC 2/8 4/10:  Left half of JIS X0201. (Japanese version of 646)
     ESC 2/4 4/0:   JIS C6226-1978; so-called "Old JIS Kanji."
     ESC 2/4 4/2:   JIS X0208-1983; so-called "New JIS Kanji."

No other escape sequences are allowed. At the beginning, ESC 2/8 4/2 or ESC 2/8 4/10 may be omitted.

As an exception to ISO 2022, the following rule is applied: the three bit combinations 0/10, 0/13 and 2/0 can appear only when a single-byte coded character set is designated to G0.
That is, these bit combinations must not appear when a multi-byte coded character set is designated to G0. (This rule forces each line to start and terminate in a "single byte designation state" and the space to be handled as if it were a member of the 94-character single-byte graphics sets.)

Appendix G -- Summary of the Seven Content-types

Content-type: text

Subtypes defined by this document: plain, richtext

Important Parameters: charset

Encoding notes: quoted-printable generally preferred if an encoding is needed and the character set is mostly an ASCII superset.

Security considerations: Rich text formats such as TeX and Troff often contain mechanisms for executing arbitrary commands or file system operations, and should not be used automatically unless these security problems have been addressed. Even plain text may contain control characters that can be used to exploit the capabilities of "intelligent" terminals and cause security violations. User interfaces designed to run on such terminals should be aware of and try to prevent such problems.

________________________________________________________________

Content-type: multipart

Subtypes defined by this document: mixed, alternative, digest, parallel.

Important Parameters: boundary

Encoding notes: No content-transfer-encoding is permitted.

________________________________________________________________

Content-type: message

Subtypes defined by this document: rfc822, partial, external-body

Important Parameters: id, number, total

Encoding notes: No content-transfer-encoding is permitted.

________________________________________________________________

Content-type: application

Subtypes defined by this document: Octet-stream, PostScript, ODA

Important Parameters: profile

Encoding notes: base64 generally preferred for octet-stream or other unreadable subtypes.

Security considerations: This type is intended for the transmission of data to be interpreted by locally-installed programs. If used, for example, to transmit executable binary programs or programs in general-purpose interpreted languages, such as LISP programs or shell scripts, severe security problems could result. In general, authors of mail-reading agents are cautioned against giving their systems the power to execute mail-based application data without carefully considering the security implications. While it is certainly possible to define safe application formats and even safe interpreters for unsafe formats, each interpreter should be evaluated separately for possible security problems.

________________________________________________________________

Content-type: image

Subtypes defined by this document: tiff-b-netfax, g3fax, jpeg, gif, pbm, pgm, ppm

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: audio

Subtypes defined by this document: basic

Important Parameters: none

Encoding notes: base64 generally preferred

________________________________________________________________

Content-type: video

Subtypes defined by this document: mpeg

Important Parameters: none

Encoding notes: base64 generally preferred

References

[US-ASCII] Coded Character Set--7-Bit American Standard Code for Information Interchange, ANSI X3.4-1986.

[ATK] Borenstein, Nathaniel S., Multimedia Applications Development with the Andrew Toolkit, Prentice-Hall, 1990.

[ISO-10646] Working document for ISO/IEC Draft International Standard 10646-1. ISO/IEC / JTC1 / SC2 / WG2, document N 745. 27 September 1991.

[ISO-2022] International Standard--Information Processing--ISO 7-bit and 8-bit coded character sets--Code extension techniques, ISO 2022:1986.
[ISO-8859] Information Processing -- 8-bit Single-Byte Coded Graphic Character Sets -- Part 1: Latin Alphabet No. 1, ISO 8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2, 1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988. Part 4: Latin alphabet No. 4, ISO 8859-4, 1988. Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987. Part 7: Latin/Greek alphabet, ISO 8859-7, 1987. Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin alphabet No. 5, ISO 8859-9, 1990. [ISO-646] International Standard--Information Processing-- ISO 7-bit coded character set for information interchange, ISO 646:1983. [ISODE] Rose, Marshall T., Julian P. Onions, and Colin J. Robbins, "The ISO Development Environment: User's Manual", Version 7.0, X-Tel Services Ltd., July, 1991. [MPEG] Video Coding Draft Standard ISO 11172 CD, ISO IEC/TJC1/SC2/WG11 (motion Picture Experts Group), May, 1991. [ODA] ISO 8613; Information Processing: Text and Office System; Office Document Architecture (ODA) and Interchange Format (ODIF), Part 1-8, 1989. [PCM] CCITT, Fascicle III.4 - Recommendation G.711, Geneva, 1972, "Pulse Code Modulation (PCM) of Voice Frequencies". [POSTSCRIPT] Adobe Systems, Inc., PostScript Language Reference Manual, Addison-Wesley, 1985. [UNICODE] The Unicode Standard. Worldwide Character Encoding. Version 1.0. The Unicode Consortium, 1991. [X400] Schicker, Pietro, "Message Handling Systems, X.400", Message Handling Systems and Distributed Applications, E. Stefferud, O-j. Jacobsen, and P. Schicker, eds., North- INTERNET DRAFT Internet Message Body Format 68 Holland, 1989, pp. 3-41. [RFC-1049] Sirbu, M.A. Content-Type header field for Internet messages. March, 1988, Network Information Center, RFC-1049. [RFC-1113] Linn, J. Privacy enhancement for Internet electronic mail: Part I - message encipherment and authentication procedures (Draft). August, 1989, Network Information Center, RFC-1113. [RFC-1154] Robinson, D.; Ullmann, R. 
Encoding header field for internet messages. April, 1990, Network Information Center, RFC-1154.

[RFC-821] Postel, J.B. Simple Mail Transfer Protocol. August, 1982, Network Information Center, RFC-821.

[RFC-822] Crocker, D. Standard for the format of ARPA Internet text messages. August, 1982, Network Information Center, RFC-822.

[RFC-934] Rose, M.T.; Stefferud, E.A. Proposed standard for message encapsulation. January, 1985, Network Information Center, RFC-934.

[RFC-HDRS] Moore, Keith, "Representation of Non-Ascii Text in Internet Message Headers", Internet Draft, draft-ietf-822ext-msghead-01.txt.

[RFC-CHAR] Simonsen, Keld, "Character Mnemonics and Character Sets", Internet Draft, draft-ietf-822ext-charsets-01.txt.

[RFC-NETFAX] Katz, A., Cohen, D., "File Format for the Transmission of Images in the Internet", Internet Draft, draft-ietf-netfax-netimage-00.txt.

Table of Contents

1      Introduction
2      Notations, Conventions, and Generic BNF Grammar
3      The MIME-Version Header Field
4      The Content-Type Header Field
5      The Content-Transfer-Encoding Header Field
5.1    Quoted-Printable Content-Transfer-Encoding
5.2    Base64 Content-Transfer-Encoding
6      Additional Optional Content- Header Fields
6.1    Optional Content-ID Header Field
6.2    Optional Content-Description Header Field
7      The Predefined Content-Type Values
7.1    The Text Content-Type
7.1.1  The charset parameter
7.1.2  The Text/richtext subtype
7.2    The Multipart Content-Type
7.2.1  Multipart: The common syntax
7.2.2  The Multipart/mixed (primary) subtype
7.2.3  The Multipart/alternative subtype
7.2.4  The Multipart/digest subtype
7.2.5  The Multipart/parallel subtype
7.3    The Message Content-Type
7.3.1  The Message/rfc822 (primary) subtype
7.3.2  The Message/Partial subtype
7.3.3  The Message/External-body subtype
7.4    The Application Content-Type
7.4.1  The Application/Octet-Stream subtype
7.4.2  The Application/PostScript subtype
7.4.3  The Application/ODA subtype
7.5    The Image Content-Type
7.6    The Audio Content-Type
7.7    The Video Content-Type
7.8    Experimental Content-Type Values
       Summary
       Contacts
       Acknowledgements
       Appendix A -- Minimal MIME-Conformance
       Appendix B -- General Guidelines For Sending Email Data
       Appendix C -- A Complex Multipart Example
       Appendix D -- A Richtext-to-Text Translator in C
       Appendix E -- Collected Grammar
       Appendix F -- ISO-2022-jp
       Appendix G -- Summary of the Seven Content-types
       References

From BNB@MATH.AMS.COM Fri Jan 10 10:07:47 1992
Flags: 000000000001
Return-Path:
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AB25728; Fri, 10 Jan 92 10:07:43 MST
Date: Fri 10 Jan 92 08:19:04-EST
From: bbeeton
Subject: Re: FWD: MIME (Multimedia Mail draft standard)
To: DHOSEK@HMCVAX.CLAREMONT.EDU
Cc: tex-implementors@MATH.AMS.COM
Message-Id: <695049544.0.BNB@MATH.AMS.COM>
In-Reply-To: <01GF4RATLLWW9KM242@HMCVAX.CLAREMONT.EDU>
Mail-System-Version:

don, please don't *ever* send a file of that size through tex-implementors! for one thing, it stands a very good chance of getting corrupted by some of the gateways it must travel through to some of the recipients, and for another, i'm not sure the mailer here will accept files that large for remailing outside, so damage may already have been done. finally, as i understand it, some of the nodes, particularly in europe, to which tex-implementors mail is sent have to pay by the number of bytes in their traffic.

while i agree that many addressees on this list will be interested in your request, that is by no means universal, and the sheer size of the file constitutes an abuse of the list.

if you have a question of this sort, please write a concise summary, with an offer to send a complete file to those who ask for it explicitly, and a location from which it can be retrieved by ftp.

i am very sorry to be a grouch about this, but you have already broken *my* mail file!
-- bb
-------

From @MATH.AMS.COM,@lilserv.citilille.fr:yannis@citil.citilille.fr Fri Jan 10 19:06:35 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@lilserv.citilille.fr:yannis@citil.citilille.fr>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA29630; Fri, 10 Jan 92 19:06:31 MST
Received: from lilserv.citilille.fr by MATH.AMS.COM via SMTP with TCP; Fri, 10 Jan 92 20:17:44-EDT
Received: from citil.citilille.fr by lilserv.citilille.fr Sat, 11 Jan 1992 02:12:25 +0100
Received: from FR*0*CITILILLE by citilille.fr via QTFS with X.400; Sa, 11 Jan 92 02:13:37 +0100
X400-Trace: FR*0*CITILILLE; arrival Sa, 11 Jan 92 02:13:36 +0100 action Relayed
Date: Sa, 11 Jan 92 02:13:36 +0100
Message-Id: <5C010B020A02028A-CITICDC*yannis@citil.citilille.fr>
P1-Message-Id: FR*0*CITILILLE; 5C010B020A02028A-CITICDC
Ua-Content-Id: 5C010B020A02028A
From: yannis@citil.citilille.fr
Subject: at last, virtual fonts for the Macintosh
To: tex-implementors@math.ams.com

***********************************************
* At last... virtual fonts for the Macintosh! *
***********************************************

January 10, 1992

Good news for Macintosh TeXers: Peter Breitenlohner's DVIcopy has been ported to the Macintosh OS, as a standalone application called MacDVIcopy. This application can be used with OzTeX or Textures (through the utility DVItool).

* What is DVIcopy?

DVIcopy allows ``devirtualization'' (replacement of virtual characters by their real components) of DVI files. For example, if you need accented characters and only have CM fonts, you can easily write a virtual font where characters are composed with accents in order to produce accented characters. OzTeX will run a file using this virtual font as if it were a real one. Before previewing, you will run MacDVIcopy on the DVI file produced by OzTeX. MacDVIcopy will create a new DVI file where composed characters will be replaced by their components.

* Why on earth should I do that?

Since this operation happens on the DVI level, TeX will hyphenate your text as if it were actually using accented characters. In this way non-English texts can be correctly hyphenated while using CM fonts---and without the necessity of TeX extensions like MLTeX. It will also save you a lot of space, since pk files tend to grow, and a lot of money, since some companies have the bad habit of selling different rearrangements of the same PostScript font separately...

* Do I need MPW to compile MacDVIcopy?

No. MacDVIcopy has been compiled using Think Pascal version 3. Sources and resources are included in the package.

* Which version is available?

The currently available version is alpha; please test it extensively and report bugs and missing features to Yannis Haralambous, bitnet: yannis@frcitl81. When the code reaches the final stage, a WEB changefile will be written.

* Where can I get it?

You can get the alpha version of MacDVIcopy by anonymous ftp at spi.ens.fr (IP 129.199.104.3) in the directory ``pub/mac/hqx''

   cd pub/mac/hqx
   get MacDVIcopy-alpha.sea.hqx
   quit

Once decoded from BinHex, the package is a self-extracting Compact Pro archive (just double-click on its icon). Documentation is provided in English, French and Greek.

Many thanks to Peter Breitenlohner for writing this beautiful program and placing it in the public domain, together with the necessary changefile for the PC (which was the starting point for the Mac implementation).
From @MATH.AMS.COM,@CBROWN.CLAREMONT.EDU:DHOSEK@HMCVAX.CLAREMONT.EDU Mon Jan 13 00:45:53 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@CBROWN.CLAREMONT.EDU:DHOSEK@HMCVAX.CLAREMONT.EDU>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA11020; Mon, 13 Jan 92 00:45:50 MST
Received: from CBROWN.CLAREMONT.EDU by MATH.AMS.COM via SMTP with TCP; Mon, 13 Jan 92 02:37:25-EDT
Received: from HMCVAX.CLAREMONT.EDU by HMCVAX.CLAREMONT.EDU (PMDF #11000) id <01GF8VYU4CKKAC2XH4@HMCVAX.CLAREMONT.EDU>; Sun, 12 Jan 1992 23:34 PST
Date: Sun, 12 Jan 1992 23:34 PST
From: Don Hosek
Subject: Re: FWD: MIME (Multimedia Mail draft standard)
To: schrod@iti.informatik.th-darmstadt.de, tex-implementors@math.ams.com
Message-Id: <01GF8VYU4CKKAC2XH4@HMCVAX.CLAREMONT.EDU>
X-Vms-To: IN%"schrod@iti.informatik.th-darmstadt.de"
X-Vms-Cc: TEX_IMPLEMENTORS

What is wanted for the definition of text-tex, etc., is a way to indicate external references which might not be indicated in the source (e.g., lplain.fmt for a LaTeX file, etc.). Yes, I'm being vague... I only talked briefly with Ned about what was necessary, since I have little time to deal with these things myself. Queries are probably best sent directly to Ned Freed at ned@ymir.claremont.edu

-dh

And apologies for not checking the size of the message before forwarding it.
-dh

From @MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:raichle@azu.informatik.uni-stuttgart.de Mon Jan 20 04:58:11 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:raichle@azu.informatik.uni-stuttgart.de>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA10986; Mon, 20 Jan 92 04:58:08 MST
Received: from ifi.informatik.uni-stuttgart.de by MATH.AMS.COM via SMTP with TCP; Mon, 20 Jan 92 06:48:50-EDT
Received: from azu.informatik.uni-stuttgart.de by ifi.informatik.uni-stuttgart.de with SMTP; Mon, 20 Jan 92 12:48:01 +0100
From: Bernd Raichle
Date: Mon, 20 Jan 92 12:48:20 +0100
Message-Id: <9201201148.AA27284@azu.informatik.uni-stuttgart.de>
Received: by azu.informatik.uni-stuttgart.de; Mon, 20 Jan 92 12:48:20 +0100
To: tex-implementors@math.ams.com
Subject: Question: Exists this bug in YOUR TeX implementation???

If your TeX implementation uses the routines |more_name| (\S516) and |scan_file_name| (\S526) unchanged or with only small changes, a user can specify filenames with any possible character (except Space), e.g.

   \font\test=t{es^t
or
   \input foo^^e4bar^^Ybaz.tex

Without additional changes to \S519 |pack_file_name| and \S523 |pack_buffered_name| (which convert all chars in the filename to the external charset, with no(!!) "expansion" of unprintable characters to ^^xy notation) and to the `open file' calls in \S27 (|a_open_in| et al.), the functions |reset| and |rewrite| are called with filenames containing special characters!!!

In the worst case, your system or the TeX program can crash. In other cases files with "funny" characters in their filenames are created or wrong files are accessed.

Example: If the next example is TeX'ed with a web-to-C implementation (with xchr[0] = 0), the file `ztest.tex' is read and read and read and....
------------------------------ CUT HERE
% File: ztest.tex
\catcode`\{=1\catcode`\}=2\catcode`\^=7
\catcode0=12 % set up to read TeX ^^@ char
\message{Before input:}
\input ztest^^@foo % this fails
\newlinechar=-1
\input ztest.tex^^@foo % <<====
\message{Never reached?}
\end
------------------------------ CUT HERE

And here is the log (TeX, C Version 3.14a is based on web2c-5.84b with the changes of DEK's alpha-version 3.141):

------------------------------ START of ztest.log
This is TeX, C Version 3.14a (INITEX) 19 JAN 1992 18:03
**ztest
(ztest.tex Before input:
! I can't find file `ztest foo'.
l.6 \input ztest^^@foo % this fails
Please type another input file name: null
(null.tex) (ztest.tex Before input:
! I can't find file `ztest^^@foo'.
l.6 \input ztest^^@foo % this fails
Please type another input file name:
! Emergency stop.
l.6 \input ztest^^@foo % this fails
End of file on the terminal!
No pages of output.
------------------------------ END of ztest.log

Fix: include checks in |a_open_in|, ... for disallowed characters in |name_of_file|, before |reset| or |rewrite| is called.

-Bernd Raichle (raichle@azu.informatik.uni-stuttgart.de)

From @MATH.AMS.COM,@sun2.nsfnet-relay.ac.uk:CET1@phoenix.cambridge.ac.uk Mon Jan 20 17:05:59 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@sun2.nsfnet-relay.ac.uk:CET1@phoenix.cambridge.ac.uk>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA15305; Mon, 20 Jan 92 17:05:56 MST
Received: from sun2.nsfnet-relay.ac.uk by MATH.AMS.COM via SMTP with TCP; Mon, 20 Jan 92 18:53:06-EDT
Received: from phoenix.cambridge.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <12010-2@sun2.nsfnet-relay.ac.uk>; Mon, 20 Jan 1992 14:53:16 +0000
Date: Mon, 20 Jan 92 14:58:17 GMT
From: Chris Thompson
To: tex-implementors@math.ams.com
Subject: Re: Question: Exists this bug in YOUR TeX implementation???
Message-Id:

Bernd Raichle writes
> Fix: include checks in |a_open_in|, ...
for disallowed characters in |name_of_file|, before |reset| or |rewrite| is called.

Consider as an alternative the addition of tests in |pack_file_name|, which is where one would do (say) uppercasing of file names for those filing systems that require it. This is what I do in my implementation for MVS: the output of |pack_file_name| is two strings replacing |name_of_file| ("ddname" and "PDS member name" --- this is MVS-specific of course) and a boolean indicating whether invalid characters were found (in which case the strings are to be ignored). |(a|b|w)_open_(in|out)| indicate failure immediately if the boolean is so set.

The same mechanism can provide a guard against empty file components: how well does your implementation cope with the following?

   \input .tex
   \input / % or whatever you use as an area delimiter
   \input \relax
   % and the same with \input replaced by \openout

If you use this method, remember to also set the boolean appropriately when reading the pool file (section 51) and the format file (section 523/524).

Chris Thompson
Cambridge University Computing Service
JANET: cet1@uk.ac.cam.phx
Internet: cet1@phx.cam.ac.uk

From @MATH.AMS.COM,@bulldog.CS.YALE.EDU:lrw!leichter@LRW.COM Tue Jan 21 05:41:09 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@bulldog.CS.YALE.EDU:lrw!leichter@LRW.COM>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA21537; Tue, 21 Jan 92 05:41:06 MST
Received: from bulldog.CS.YALE.EDU by MATH.AMS.COM via SMTP with TCP; Tue, 21 Jan 92 07:31:56-EDT
Received: from lrw.UUCP by bulldog.CS.YALE.EDU via UUCP; Tue, 21 Jan 1992 07:29:42 -0500
Message-Id: <199201211229.AA18518@bulldog.CS.YALE.EDU>
Received: by lrw.UUCP (DECUS UUCP w/Smail); Tue, 21 Jan 92 07:05:07 EDT
Date: Tue, 21 Jan 92 07:05:07 EDT
From: Jerry Leichter
To: TEX-IMPLEMENTORS@MATH.AMS.COM
Subject: Re: Question: Exists this bug in YOUR TeX implementation???
X-Vms-Mail-To: UUCP%"CET1@phoenix.cambridge.ac.uk"

I believe it is a mistake for applications such as TeX to try to correct for undesirable operating system features. It just leads to gratuitous incompatibilities.

Like it or not, the Unix world has decided that every character is valid in a file name, with the exception of "/" (which separates components) and ASCII 0 (which indicates the end of the name). Reasonable people don't put silly things like newlines into their file names, but that's up to them. If a single application tries to limit the files that it will allow access to, it will inevitably run into users who complain that they REALLY WANTED to access that file with a newline in the name - why won't TeX let them? You can't correct this kind of misdesign on a piece-at-a-time basis; you only make things worse.

Decent operating systems provide a way for applications to avoid knowing what constitutes a proper file name. For example, VMS provides file spec parsing routines. Properly written VMS applications consider file specifications to be opaque objects, passed through them from the user to the parsing routines. This way, the syntax of acceptable file specifications can be changed without changing applications - as has happened several times in the past.

The closest one has to this on Unix is to take the string and pass it to stat(). (Or one can simply try to open the file, I suppose.) One still has to deal with adding "area" specifications, e.g., when looking for inputs. Fortunately, one can usually view those as simple prefixes. (Note that on VMS the parsing routines will handle that for you - and if you try to do it yourself, as many early VMS implementations did, you will NOT get it "right" - i.e., the results will not be what VMS users expect based on similar operations in other VMS programs.)
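The "simply try to open the file" approach, with areas treated as plain prefixes, can be sketched as follows. This is only an illustration in Python, not code from any TeX implementation; the function name `open_with_areas` and the idea of passing the areas as a list are assumptions made for the example. The name is never inspected, and the operating system alone decides whether a candidate names a file.

```python
import os
import tempfile


def open_with_areas(name, areas):
    """Try each area prefix in turn and open the first candidate that works.

    The file name is treated as an opaque string: no character checks are
    made here. Returns (path, file), or (None, None) if nothing opens.
    """
    for area in areas:
        candidate = area + name  # an "area" is just a string prefix
        try:
            return candidate, open(candidate, "rb")
        except (OSError, ValueError):
            # OSError: the operating system rejected or could not find it.
            # ValueError: Python itself refuses a name with an embedded NUL.
            continue
    return None, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        area = d + os.sep
        with open(area + "story.tex", "w") as f:
            f.write("\\bye\n")
        path, fh = open_with_areas("story.tex", [area])
        print(path)
        fh.close()
```

Note that Python rejects a name containing an embedded NUL before the operating system ever sees it, so the sketch cannot reproduce the behaviour of a C implementation on that particular input.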
-- Jerry From @MATH.AMS.COM,@sun2.nsfnet-relay.ac.uk:CET1@phoenix.cambridge.ac.uk Thu Jan 23 22:06:08 1992 Flags: 000000000001 Return-Path: <@MATH.AMS.COM,@sun2.nsfnet-relay.ac.uk:CET1@phoenix.cambridge.ac.uk> Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA25196; Thu, 23 Jan 92 22:06:04 MST Received: from sun2.nsfnet-relay.ac.uk by MATH.AMS.COM via SMTP with TCP; Thu, 23 Jan 92 23:46:54-EDT Received: from sun2.nsfnet-relay.ac.uk by MATH.AMS.COM via SMTP with TCP; Thu, 23 Jan 92 23:48:16-EDT Received: from phoenix.cambridge.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <13832-0@sun2.nsfnet-relay.ac.uk>; Thu, 23 Jan 1992 16:20:20 +0000 Date: Thu, 23 Jan 92 16:25:05 GMT From: Chris Thompson To: tex-implementors@math.ams.com Subject: Re: Question: Exists this bug in YOUR TeX implementation??? Message-Id: Jerry Leichter writes > Like it or not, the Unix world has decided that every character is > valid in a file name, with the exception of "/" (which separates > components) and ASCII 0 (which indicates the end of the name). > Reasonable people don't put silly things like newlines into their > file names, but that's up to them. But the ASCII 0 is precisely the point of Bernd Raichle's ^^@ example, surely? As an embedded null in the path argument to open(2) will cause the rest of the string to be ignored, TeX has to know this to keep its knowledge of the filename in step with the operating system's. It would, of course, be preferable to pass the string-with-length to the system and have it do the syntax check, but if it can't, you must. Unix allows filenames with embedded spaces. The "natural" Unix convention would require TeX `file names' (arguments to \input, etc) to be terminated by null, not space, and it would be perfectly possible to modify |more_name| to do this. But no-one would seriously recommend this, just in order to support all possible Unix file names, would they? 
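The mismatch between the name TeX holds and the name the operating system acts on can be made concrete with a small sketch. It is written in Python purely for illustration; `name_seen_by_c_api` and `is_acceptable_file_name` are hypothetical helpers, not routines from tex.web or web2c, and the choice to reject only the NUL byte is an assumption that each implementor would adjust for their own system. The first function models what an interface taking a null-terminated path receives; the second is the kind of check that could be made before |reset| or |rewrite| is called.

```python
def name_seen_by_c_api(name: bytes) -> bytes:
    """Model a null-terminated string interface: everything from the
    first NUL byte onward is invisible to the callee."""
    return name.split(b"\0", 1)[0]


# Assumption for this sketch: only NUL is rejected, since it ends a C string.
# A real implementation would choose its own set of disallowed bytes.
DISALLOWED = frozenset({0})


def is_acceptable_file_name(name: bytes) -> bool:
    """Return True if the name is non-empty and has no disallowed bytes."""
    return len(name) > 0 and not any(b in DISALLOWED for b in name)


if __name__ == "__main__":
    requested = b"ztest.tex\0foo"  # as produced by \input ztest.tex^^@foo
    print(name_seen_by_c_api(requested))       # the name the system would act on
    print(is_acceptable_file_name(requested))  # whether the guard lets it through
    print(is_acceptable_file_name(b"ztest.tex"))
```

With such a guard in place, the open routines can report failure at once instead of opening a file other than the one TeX believes it asked for.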
(Of course, this does suggest a horrible Unix-specific hack: convert nulls to spaces in the file name to solve both problems at once!)

Chris Thompson
Cambridge University Computing Service
JANET: cet1@uk.ac.cam.phx
Internet: cet1@phx.cam.ac.uk

From @MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:raichle@azu.informatik.uni-stuttgart.de Fri Jan 24 05:24:52 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:raichle@azu.informatik.uni-stuttgart.de>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA00487; Fri, 24 Jan 92 05:24:48 MST
Received: from ifi.informatik.uni-stuttgart.de by MATH.AMS.COM via SMTP with TCP; Fri, 24 Jan 92 07:11:34-EDT
Received: from azu.informatik.uni-stuttgart.de by ifi.informatik.uni-stuttgart.de with SMTP; Fri, 24 Jan 92 13:10:32 +0100
From: Bernd Raichle
Date: Fri, 24 Jan 92 13:10:58 +0100
Message-Id: <9201241210.AA01736@azu.informatik.uni-stuttgart.de>
Received: by azu.informatik.uni-stuttgart.de; Fri, 24 Jan 92 13:10:58 +0100
To: tex-implementors@math.ams.com
In-Reply-To: Chris Thompson's message of Thu, 23 Jan 92 16:25:05 GMT
Subject: Re: Question: Exists this bug in YOUR TeX implementation???

Chris Thompson writes:
> But the ASCII 0 is precisely the point of Bernd Raichle's ^^@ example,
> surely? As an embedded null in the path argument to open(2) will cause
  ^^^^^^ Yes!
> the rest of the string to be ignored, TeX has to know this [...]
>...

My intention with this posting and its very exotic example is this: implementors should keep in mind that the prototype code in TeX.web can fail for their implementation. If an implementor doesn't include additional code to restrict the possible characters in a filename, document this!! My example "fails" on all(?)/most implementations and operating systems written in C.

Jerry Leichter wrote:
> Like it or not, the Unix world has decided that every character is valid in a
> file name, with the exception of "/" [...] and ASCII 0 [...].
There are Unix implementations with an "explicit" or "implicit" restriction that characters in filenames have codes in the range 0--127. "Implicit" means: if chars outside this range are used in a filename, they are converted (e.g. to code 127) or ignored _without_ an error!

Chris Thompson writes:
> Unix allows filenames with embedded spaces. The "natural" Unix
> convention would require TeX `file names' (arguments to \input, etc)
> to be terminated by null, not space, and it would be perfectly possible
> to modify |more_name| to do this. But no-one would seriously recommend
                                                   ^^^^^^^^^^^^^^^^^^^^^
> this, just in order to support all possible Unix file names, would they?

Perhaps it is time to make a minimal standard for |scan_file_name|?

Proposal (quick&dirty):
 1. A `TeX file name' ends at the first space character or the first non-expandable primitive, ... (implementation-independent)
 2. After the `file name' is scanned, decide whether the scanned character sequence is a complete and correct filename specification; if so, try to open it. (implementation-dependent)

-bernd
__________________________________________________________________________
Bernd Raichle, Student der Universit"at Stuttgart | "Le langage est source
home: Stettener Str. 73, D-W-7300 Esslingen, FRG  |  de malentendus"
email: raichle@azu.informatik.uni-stuttgart.de    |  (A. de Saint-Exupery)

From @MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:raichle@azu.informatik.uni-stuttgart.de Fri Jan 24 09:03:19 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@ifi.informatik.uni-stuttgart.de:raichle@azu.informatik.uni-stuttgart.de>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA01223; Fri, 24 Jan 92 09:03:15 MST
Received: from ifi.informatik.uni-stuttgart.de by MATH.AMS.COM via SMTP with TCP; Fri, 24 Jan 92 10:51:21-EDT
Received: from azu.informatik.uni-stuttgart.de by ifi.informatik.uni-stuttgart.de with SMTP; Fri, 24 Jan 92 16:49:22 +0100
From: Bernd Raichle
Date: Fri, 24 Jan 92 16:49:48 +0100
Message-Id: <9201241549.AA18032@azu.informatik.uni-stuttgart.de>
Received: by azu.informatik.uni-stuttgart.de; Fri, 24 Jan 92 16:49:48 +0100
To: tex-implementors@math.ams.com
In-Reply-To: "Wayne G. Sullivan"'s message of Fri, 24 Jan 92 13:35:29 GMT <9201241545.AA08768@ifi.informatik.uni-stuttgart.de>
Subject: Re: weird characters in file names

Wayne G. Sullivan writes:
> One useful change to TeX.WEB
> would be modifying the procedure `print_file_name' to use slow_print instead
                                    ^^^^^^^^^^^^^^^ (and at other places, where filenames are printed.)
> of print, so that offending file names could be seen in the log file.

This will be fixed in TeX 3.141.
-bernd

From @MATH.AMS.COM,@CUNYVM.CUNY.EDU:WSULIVAN@IRLEARN.UCD.IE Fri Jan 24 09:03:24 1992
Flags: 000000000001
Return-Path: <@MATH.AMS.COM,@CUNYVM.CUNY.EDU:WSULIVAN@IRLEARN.UCD.IE>
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AB01223; Fri, 24 Jan 92 09:03:20 MST
Message-Id: <9201241603.AB01223@math.utah.edu>
Received: from CUNYVM.CUNY.EDU by MATH.AMS.COM via SMTP with TCP; Fri, 24 Jan 92 10:34:59-EDT
Received: from IRLEARN.UCD.IE by CUNYVM.CUNY.EDU (IBM VM SMTP V2R2) with BSMTP id 7994; Fri, 24 Jan 92 08:57:51 EST
Received: from IRLEARN.UCD.IE (WSULIVAN) by IRLEARN.UCD.IE (Mailer R2.08) with BSMTP id 1681; Fri, 24 Jan 92 13:59:32 GMT
Date: Fri, 24 Jan 92 13:35:29 GMT
From: "Wayne G. Sullivan"
Subject: weird characters in file names
To: tex-implementors@math.ams.com

It seems to me that it is the job of the operating system to determine which sequences of characters constitute a valid file name. Why should TeX implementors have to overcome bugs in the operating system? Sadly, it is necessary. Some time ago our VM/CMS version of METAFONT wrote out file names to the disk directories in lower case, and these required intervention by systems people to erase. PascalVS should not have opened the files that way.

It would be a good idea if users could give examples of nasty behavior of their operating system regarding file creation, so that implementors could decide which of these merit special care. One useful change to TeX.WEB would be modifying the procedure `print_file_name' to use slow_print instead of print, so that offending file names could be seen in the log file.
Wayne Sullivan From BNB@MATH.AMS.COM Wed Feb 5 06:44:08 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA24674; Wed, 5 Feb 92 06:44:04 MST Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GG5JNMLHQCA23QTI@MATH.AMS.COM>; Wed, 5 Feb 1992 08:38 EST Date: Wed, 5 Feb 1992 08:38:16 -0500 (EST) From: Barbara Beeton Subject: [yannis@gat.citilille.fr (yannis): for the TeX-implementors lis] To: tex-implementors@MATH.AMS.COM Message-Id: <697297096.120000.BNB@MATH.AMS.COM> Mail-System-Version: i am forwarding this message for yannis haralambous. (note that he has a new address.) replies direct to him or the list, please. by the way, it's supposed to be possible for anyone to send messages directly to the list -- tex-implementors@math.ams.com . if anyone has problems with that, please let me know right away. -- bb --------------- Date: Wed, 5 Feb 92 12:56:16 +0100 From: yannis@gat.citilille.fr (yannis) Subject: for the TeX-implementors list Hi! I would like to include TeXXeT in a TeX3.14 implementation. Is the code of the original DEK-MacKay paper still compatible? or must I change it? 
Thanks Yannis From DHOSEK@HMCVAX.CLAREMONT.EDU Wed Feb 5 19:53:41 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA29370; Wed, 5 Feb 92 19:53:38 MST Received: from CBROWN.CLAREMONT.EDU by MATH.AMS.COM (PMDF #12735) id <01GG6BAJBGLSA23L85@MATH.AMS.COM>; Wed, 5 Feb 1992 21:49 EST Received: from HMCVAX.CLAREMONT.EDU by HMCVAX.CLAREMONT.EDU (PMDF #11000) id <01GG64X3XW7SB7BKRS@HMCVAX.CLAREMONT.EDU>; Wed, 5 Feb 1992 18:46 PST Date: Wed, 5 Feb 1992 18:46 PST From: Don Hosek Subject: Re: [yannis@gat.citilille.fr (yannis): for the TeX-implementors lis] To: BNB@MATH.AMS.COM, tex-implementors@MATH.AMS.COM Cc: yannis@gat.citilille.fr Message-Id: <01GG64X3XW7SB7BKRS@HMCVAX.CLAREMONT.EDU> X-Vms-To: IN%"BNB@MATH.AMS.COM" X-Vms-Cc: IN%"yannis@gat.citilille.fr",TEX_IMPLEMENTORS The TeX-XeT code originally published will work, but there's a straightforward enhancement which will make it much better: I've not actually coded anything but can outline it clearly enough so someone with the necessary knowledge of tex.web and (more importantly) time could implement it. In short, the facilities of ivd2dvi can be incorporated into the write-the-dvi file stage of TeX so that TeX-XeT writes ordinary DVI files. Given my lack of time to touch this since I originally outlined the code (in June 1990, no less!), it's unlikely that I'll ever get a chance to write this, but someone else is welcome to get my ideas should they care to implement them. 
-dh

From BNB@MATH.AMS.COM Sat Mar 14 13:51:52 1992
Flags: 000000000001
Return-Path:
Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA25424; Sat, 14 Mar 92 13:51:47 MST
Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GHN1DZ7780BESP6M@MATH.AMS.COM>; Sat, 14 Mar 1992 15:37 EST
Date: Sat, 14 Mar 1992 15:37:51 -0500 (EST)
From: Barbara Beeton
Subject: Messages from DEK, part 1 of 5
To: tex-implementors@MATH.AMS.COM
Message-Id: <700605471.690000.BNB@MATH.AMS.COM>
Mail-System-Version:

Date: 14 March 91
Message No: 036
To: TeX implementors and distributors
From: Barbara Beeton
Subject: Messages from DEK, part 1

The latest package of marked-up messages from DEK was postmarked 15 January 92. It contained 10 checks, including two for actual bugs. Unfortunately, other matters have conspired to delay my forwarding of all this material, and I apologize. (I don't suppose anyone out there would like to be a TUGboat production assistant? Or has lots of experience in moving from one office to another with less shelf space without losing track of anything? Didn't think so, somehow.)

As some of you may know, DEK was awarded an honorary doctorate by the Royal Institute of Technology in Stockholm last November. At that time, some bug reports and other questions were apparently given to him in person, and changes have been delayed as explained in this note which accompanied the package:

   Barbara: Many thanks for this compilation! I will not however be ready to update the sources until February because I mistakenly put some things recd in Sweden into the wrong box and they are currently being shipped via _sea_ _mail_. I have gone through this material but will hold off updating the master sources until I've gotten through some other things sent to me in Stockholm from Poland, some of which may be extremely important

I have just finished checking the /tex area at labrea. There are no new files in the errata directory, and as these are usually the first to be installed, I assume that the main update has not yet been made.

########################################################################

Contents:

[ part 1 ]
>>> responses to DEK Sep 26 package
>>> DEK's alpha test, TeX 3.14a
>>> TeXbook errata
>>> inconsistent ligature/kerning behavior
>>> inconsistent expansion of \string, etc.; \newlinechar
>>> \halign + box assignment within displays
>>> positive/negative xn_over_d, tex.web section 107
>>> DRF's "remaining comments"
>>> suggestion -- additional tracing facilities for \if*
>>> DEK's "local" change files, not at labrea

[ part 2 ]
>>> glue_ratio overflow
>>> \TeX logo; some problems with \loop
>>> cedilla accent macro in plain.tex
>>> missing \mathsurround in plain.tex
>>> \arrowvert code, plain.tex
>>> MF -- bugs or features? (also, some ACP errata)
>>> MFbook errata
>>> WEAVE -- Peter Breitenlohner
>>> WEAVE -- Brian {Hamilton Kelly}

[ sent separately ]
>>> Barry Smith, |history| variable in TeX.WEB
>>> Chris Rowley, two items: \fontdimen2 and marks

************************************************************************
>>> responses to DEK Sep 26 package

Date: Mon, 23 Dec 91 13:23:58 +0100
From: Eberhard Mattes
Subject: Backenfisch, tex.web

> [ dek: should i have used `backenfisch' instead? ]

Maybe deleting `compound' on p. 96 would be better since I have found no word with ck hyphenated k-k *because* it is a compound word. (Of course there are words containing tt hyphenated tt-t *because* they are compound words.)

   [ dek: point well taken ... but I wasn't serious and i _did_ delete `compound' in September as he requested ]

---

In tex.web, there's a @^system dependencies@> missing for the line

   [ dek: \S798 ]

  if n>max_quarterword then confusion("256 spans"); {this can happen, but won't}
                                       ^^^ This is system-dependent.
[ dek: $2.56 ] Yours, Eberhard Mattes (mattes@azu.informatik.uni-stuttgart.de) <<< end responses to DEK Sep 26 package ************************************************************************ >>> DEK's alpha test, TeX 3.14a Date: Tue 24 Sep 91 11:22:21-EST From: bbeeton To: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk, tex%uk.ac.cranfield.rmcs@nsfnet-relay.ac.uk, pzf5hz@ruipc1e.Bitnet Subject: message from dek -- tex alpha version 3.14a chris, brian, and frank, you are hereby deputized to follow dek's instructions and test out the new version. [ ... ] i won't forward this information to anyone else [for at least two weeks], and you probably should be discreet about it too. thanks. cheers. -- bb -------------------- Date: Fri, 20 Sep 91 12:50:54 -0700 Subject: note from Don [ copy of message from dek ] ------- Date: Thu, 26 SEP 91 22:31:01 BST From: TEX@rmcs.cranfield.ac.uk Reply-to: Brian {Hamilton Kelly} Subject: RE: message from dek -- tex alpha version 3.14a In message <685725741.0.BNB@MATH.AMS.COM> of Tue 24 Sep 91 11:22:21-EST, bbeeton wrote: > chris, brian, and frank, > you are hereby deputized to follow dek's instructions ... > ... (i will take care of the envelope of "comments and checks" ^ (Would I spell this cheques? :-) > as soon as i can after i get back to providence.) Well, I've collected the new files (I *love* the README!!), modified my change file to accommodate the new banner and a few of the |slow_print|s, and built 3.14a. It's great!! Fixes my problem with hyphenation vs. boundary ligatures perfectly. And what an elegant solution: I'd expected the fix to be complicated, but it just requires one simple global and the odd preservation of context. > Date: Fri, 20 Sep 91 12:50:54 -0700 > Subject: note from Don > > Meanwhile, there's some more "immediate" news. I've prepared an > alpha-test version of TeX, called version 3.14a. 
Before I release > it officially, I'd like CET1 and Brian Hamilton-Kelly and maybe > Frank Mittelbach [and anybody else you think best, but I don't really > want this version to proliferate] to try it. There are two changes, > one to cure the anomaly Frank noticed about \newlinechar acting > differently wrt \write and \message [and other similar glitches I > then noticed myself], the other to fix I hope the problem Brian was > having with Greek. I'm reluctant to let the version number converge to > \pi too quickly, so I would at least like to have Brian accumulate > some experience with his Greek tests before putting this out widely. > Of course anybody is free to use it at their own risk. Be warned, however, that so far I've only tested it against the input that caused me problems back in April. I'll try to do some more thorough testing in the near future. [ dek: Certainly hope you did by now -- I need more reassurance than a single test ] > The files tex.web and tex82.bug have been moved to labrea.stanford.edu > where they are accessible via anonymous ftp. After logging in, > connect to directory alpha (which is for alpha-test software). > I forgot to copy the trip test files; if they are desperately needed, > I could bring them in from home. I haven't TRIPped the new TeX for this reason. The interesting thing would be if TRIP were extended to check the new ligature/hyphenation code: I expect that, with DEK's customary thoroughness, he'll have done this, so look forward to applying the tests myself when 3.14a emerges as 3.141. [ dek: Actually my customary thoroughness is becoming less customary ] > So, next week you'll get my letter explaining all! Meanwhile, a > lookahead: I'll be in Sweden during November and December (thanks to > some good words from Roswitha, I'll getting an honorary degree > there!), and I plan to make the next official update to TeX sources > early in January. 
By then, this version 3.14a should be checked out, > and I can release it as version 3.141. Whilst you're talking to DEK, barbara, what response did he give to that d..d problem of Weave failing to wrap correctly when processing a TeX comment? I have a horrible suspicion in my mind that he's already said something to the effect that he edited it manually! Or am I imagining that horror? My fix works very well and doesn't have any bugs that I can see, so why doesn't he adopt it? Brian HK ------- Date: Sun, 29 Sep 91 00:56:57 BST From: Chris Thompson To: Barbara Beeton Cc: Brian {Hamilton Kelly} , Frank Mittelbach Subject: Re: [message from dek -- tex alpha version 3.14a] Barbara, Brian, Frank, I have fetched and compiled TeX 3.14a. Unfortunately, change 397 seems to have bugs. Try the following input: % TeX 3.14a test \pretolerance=-1 \ Respectively.\par \showboxbreadth=255 \showboxdepth=255 \showlists \bye and compare the log files for 3.14 and 3.14a. That for 3.14a has an extra kern in between the "-ly" and the ".": .\tenrm e .\tenrm l .\tenrm y .\kern-0.83334 \ This kern has been doubled by the .\kern-0.83334 / automatic hyphentaion algorithm. .\tenrm . .\penalty 10000 [ dek: well, this made me feel very sick, but I sure am thankful to Chris for finding it. $10.24 ] I discovered this while checking that 'webman.tex' generated the same DVI file (except for date stamp), which is part of my standard paranoia suite. There were two discrepancies of the above type in the very first paragraph... but the rest of the document produced no more! [ dek: I think probably the _old_ 3.0 would also produce double kerns in cases where kern was specified before right boundary character by font designer [e.g. to kern after f at end of word] While studying this I found another mistake in my code from September. But I think the basic idea was sound, only the execution was faulty. Still I guess I should try another alpha test with "3.14b"? ] Brian & Frank: can you confirm this effect? 
Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk ------- Date: Mon, 30 SEP 91 15:46:17 BST Reply-to: Brian {Hamilton Kelly} From: TEX@rmcs.cranfield.ac.uk Subject: Re: [message from dek -- tex alpha version 3.14a] In message of Sun, 29 Sep 91 00:56:57 BST, Chris Thompson wrote: > Barbara, Brian, Frank, > > I have fetched and compiled TeX 3.14a. Unfortunately, change 397 > seems to have bugs. [ ... ] > > and compare the log files for 3.14 and 3.14a. That for 3.14a has an > extra kern in between the "-ly" and the ".": > > .\tenrm e > .\tenrm l > .\tenrm y > .\kern-0.83334 \ This kern has been doubled by the > .\kern-0.83334 / automatic hyphentaion algorithm. > .\tenrm . > .\penalty 10000 > > [ ... ] > > Brian & Frank: can you confirm this effect? I have repeated Chris' experiment, and can confirm that 3.14a does indeed double the kern. Oh dear... The odd thing is that I compared the outputs resulting from running DVItype on the DVI outputs of my Greek test, and saw only the expected change from medial sigma to terminal sigma (with relevant horizontal positioning changes). But I may have missed other horizontal changes due to this bug: I've thrown the printout away again by now, unfortunately! Brian <<< end DEK's alpha test, TeX 3.14a ************************************************************************ >>> TeXbook errata Date: Tue, 19 Nov 91 16:05:02 GMT From: Chris Thompson Subject: TeXbook p.178 bug report Here is another TeXbook bugette report by Robert Hunt. I suppose it is right in theory, unless it comes under the "deliberate lie" clause. Anyway, perhaps you would like to have someone else vet it: I am getting to the stage with Robert's bug reports that I just wish he would shut up! He seems happy with the latest p.377 fix, by the way. 
----------------------------------------------------------------------- REH10 15 Nov 1991 16.49 Very minor mistake on p.178 of the TeXbook: it claims that using \phantom{...} in a maths formula always produces the same spacing (vertical and horizontal) as just {...}; but of course this isn't true as \phantom can't tell whether the current maths style is cramped or not, so \phantom{U^2} will produce different vertical spacing in cramped style compared with what {U^2} would produce. I don't think this is particularly worth worrying about, but it's always annoyed me that \mathchoice doesn't distinguish between (say) T and T'. ----------------------------------------------------------------------- [ dek: $2.56 I'm going to put the apology on page 360 because the truth would interfere with the exposition on p 178 ] Chris Thompson ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Date: Tue, 3 Dec 91 15:13:01 MST From: Nelson H. F. Beebe Subject: Possible bug in TeXbook, p.270 Last night, while studying the TeXbook to better understand TeX's rules for when spaces are significant, I found what I believe is a small error in the TeXbook. I checked errata files errata.one .. errata.six without finding this error already logged. The TeXbook, p.270, gives BNF rules for the grammar of dimensions. From this grammar, one concludes from the rules (pulled from texman.tex) \is[true]\alt \is \alt \is\alt \is\alt \is|.|$_{12}$\alt|,|$_{12}$ \alt \alt that "123 truept" is illegal input, since no space may precede the word true according to the grammar. Nevertheless, TeX the program happily accepts such space: This is TeX, C Version 3.14 **\relax *\dimen0 = 12345 true pt [ dek: see the \P on p 268 that begins `We shall use a special convention for keywords'. _true_ is a keyword ] *\showthe\dimen0 > 12345.0pt. <*> \showthe\dimen0 ? I believe that TeX the program is correct, and is consistent with its permitting optional space before non-true dimensions.
This suggests that the TeXbook should have the rule \is \alt changed to \is \alt I'm assuming that AB|CDE is to be parsed as (AB)|(CDE) (the conventional interpretation in BNF grammars), rather than A(B|C)DE; the latter would not make sense here anyway. ======================================================================== Nelson H.F. Beebe Center for Scientific Computing Department of Mathematics 220 South Physics Building University of Utah Salt Lake City, UT 84112 USA Tel: (801) 581-5254 FAX: (801) 581-4148 Internet: beebe@math.utah.edu ======================================================================== <<< end TeXbook errata ************************************************************************ >>> inconsistent ligature/kerning behavior Date: Wed, 25 Sep 91 13:13:55 +0200 From: Bernd Raichle Subject: Bug in TeX's ligature builder?! (No, it's documented.) In a letter to DEK in November 1989 (i.e., before TeX 3.0; appeared in TeXhax 90, Issue 5) Frank Mittelbach summarized some requirements of a new TeX version. For example, he mentioned manipulations of the "Ligatures and kerning tables" and the need to "Reconsider paragraphs after page-breaking". Other users had and have similar wishes and requirements. But nobody mentioned a bug, or better --- because this "bug" is documented in the TeXbook --- the inconsistent and (for the beginner) mysterious behaviour when TeX builds ligatures and inserts kerning. Try the following in plain TeX and look at the output. I have used the fl-ligature as an example. (TeX's behaviour for kerning between two characters is exactly the same.) 
------------------------------ CUT ------------------------------ \hsize=5cm % linebreak for more than three words \hyphenation{Au-flage}% for demonstration (correct german is Auf-la-ge) Auflage\par % (1) show `fl' ligature Auf{}lage Auf\relax lage\par % (2) simple way to avoid the ligature \pretolerance=10000 % no second pass Auf{}lage Auf{}lage Auf{}lage Auf{}lage Auf{}lage\par % (3) no ligatures \pretolerance=-1 % force second pass Auf{}lage % no(!) ligature, because there's no preceding glue Auf{}lage Auf{}lage Auf{}lage Auf{}lage\par % (4) ligatures Auf{}lage % (5) no(!) ligature, there's no preceding glue Auf\kern0pt lage % no lig, explicit kern Auf\/lage % no lig, explicit kern $\rm Auflage$ % example of ligature in math mode $\rm Auf{}lage$ % no lig, in math mode \hbox{Auf{}lage} % no lig, in packed hbox Auf{}lage\vrule width0pt\ % no lig, followed by a rule node Auf{}lage\kern0ptAuf{}lage % 1st ligature, 2nd no ligature Auf{}lage\par % ligature \bye ------------------------------ CUT ------------------------------ Example (1) shows the normal fl ligature, in example (2) I suppress the ligature with an empty group (see TeXbook, exercise 5.1 and answer) or a \relax (both are non-character tokens). In example (3) I use empty groups to suppress the ligatures. The same(!) input paragraph in example (4) contains ligatures. The only and very important difference between (3) and (4) is the value of \pretolerance. (Note that the first word in (4) has no fl-ligature!) Example (5) gives a quick overview of TeX's behaviour, when the same word (= the same input!) is used in a different context. You can think of other examples: hbox the word... and unhbox it. Try the same with the ''-ligature (i.e. compare '' with '{}'), etc. Very short explanation of this behaviour: when TeX is in hmode, ligatures and kernings are build out of the (current input) token list. Non-character tokens (like { or \relax) influence ligatures and kerning. 
When TeX is in math mode or after hyphenation is done in the second pass of the line_break algorithm, ligatures and kernings are build out of the node list representing the current paragraph (or math list). This node list doesn't contain representations for tokens like { or \relax anymore. Because not all parts of the node list are tried for hyphenation, not all of the supressed ligatures are reconstituted. TeX has a powerful linebreak algorithm, a powerful hyphenation capability and powerful ligature and kerning tables. But the connection between them shows IMHO design bugs, if you want to use them and expect the same output from the same input. [ dek: This is unfortunate but known and I decided long ago that the cure would be worse than the disease. ] Comments, etc. ...?? (... are expected!!) -bernd PS: I have ideas how to make ligatures/kerning more consistent. But the result, after the necessary changes (in a few places) are applied, is a "TeX" which fails the Trip-Test. __________________________________________________________________________ Bernd Raichle, Student der Universit"at Stuttgart | "Le langage est source privat: Stettener Str. 73, D-W-7300 Esslingen | de malentendus" email: raichle@azu.informatik.uni-stuttgart.de | (A. de Saint-Exupery) <<< end inconsistent ligature/kerning behavior ######################################################################## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Character code reference %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Upper case letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ % Lower case letters: abcdefghijklmnopqrstuvwxyz % Digits: 0123456789 % Square, curly, angle braces, parentheses: [] {} <> () % Backslash, slash, vertical bar: \ / | % Punctuation: . ? ! 
, : ; % Underscore, hyphen, equals sign: _ - = % Quotes--right left double: ' ` " %"at", "number" "dollar", "percent", "and": @ # $ % & % "hat", "star", "plus", "tilde": ^ * + ~ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [ end of message 036 ] From BNB@MATH.AMS.COM Sat Mar 14 13:53:42 1992 Flags: 000000000001 Return-Path: Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA25439; Sat, 14 Mar 92 13:53:35 MST Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GHN1GK7N7KBESP6M@MATH.AMS.COM>; Sat, 14 Mar 1992 15:40 EST Date: Sat, 14 Mar 1992 15:39:57 -0500 (EST) From: Barbara Beeton Subject: Messages from DEK, part 2 of 5 To: tex-implementors@MATH.AMS.COM Message-Id: <700605597.200000.BNB@MATH.AMS.COM> Mail-System-Version: Date: 14 March 91 Message No: 037 To: TeX implementors and distributors From: Barbara Beeton Subject: Messages from DEK, part 2 ************************************************************************ >>> inconsistent expansion of \string, etc.; \newlinechar [ dek: Here he has come across exactly what I think I fixed in the _other_ change made to 3.14a. I am contacting him directly about this. ] Date: Mon, 4 Nov 91 10:42:40 +0100 From: Bernd Raichle Subject: TeX.web 3.14: Bug Report #1 (+ Fix Proposal): \string, \special, ... Dear Ms. Beeton, I have found two bugs in TeX.Web 3.14: Bug #1: Incorrect "expansion" of unprintable chars for |selector=new_string| Bug #2: Incorrect handling of \newlinechar (I'll send a second mail with the report of bug #2.) The first bug looks like an inconsistent behaviour of \string (and \meaning, \fontname, \jobname, \special) in all TeX versions. Example: 1. \catcode`\^^Y=12 \string^^Y 2. \catcode`\^^Y=13 \string^^Y TeX 3.14 prints line 1 always as character "19 of the current font. Line 2 is printed either as char"19 or as three characters ^^Y, dependent on the TeX implementation! Other examples: 3. 
If ^^Y has one of the catcodes 1-4, 6-8, 11 or 12 |\string ^^Y| is printed like |\char"19|. 4. With \escapechar="19, the output of |\string\relax| is either like |^^Yrelax| or |\char"19 relax|. 5. The same is true for \catcode`\^^Y=11 \string\test^^Ytest 6. \newlinechar=`\^^Y \message{^^Y \string ^^Y} outputs two newlines only, if ^^Y is a printable character. (Remark: I have discovered the second bug with a variation of this example.) 7. The same is true for things like \special{^^Y}. 8. etc. for other characters (e.g. national characters)! An TeX implementor can (and should) change section 21 and 22 (character set translations) and section 49 (unprintable characters) to make TeX more user friendly. The predicate in section 49 specifies if a characters is printable or unprintable. Example: \ss (= char "19 in CM fonts) should be output as a single character instead of ^^Y. An implementor will change section 49 to exclude this character (i.e., to make this character printable). [With TeX 3 this is done for more and more characters on more and more machines with extended character set (especially if the new DC/EC fonts with national characters are available).] This implementation dependency is no problem (*), if the output is printed on a terminal or written in a file (log file or output stream). [(*) see the second bug report for some problems] But it can be a problem, if it is "printed" for further `internal' use (i.e., with |selector:=new_string|). Example: \def\a{\expandafter\b\string} \def\b#1{{\tt #1}} The output (in the dvi file) is implementation dependent, if \a is called with national characters, control sequence names with national characters, etc. The dvi file will also be different, if someone uses \special commands with national characters. E.g. \special{^^Y} should put one byte with value "19 into the dvi file, but for most TeX versions the three characters "^^Y" are written. 
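[ ed: The implementation dependency described above can be modelled in a few lines. This is a hedged Python sketch, not code from tex.web. The names char_repr and default_unprintable are hypothetical, and the rule they encode is an assumption based on my reading of the standard sections 48 and 49: characters outside the range " " to "~" are unprintable, and an unprintable character is written as ^^ followed by the code shifted by 64, or by two lowercase hex digits for codes of 128 and above. ]

```python
def default_unprintable(k):
    """Assumed standard predicate: anything outside ' '..'~' is unprintable."""
    return k < 0x20 or k > 0x7E


def char_repr(k, is_unprintable=default_unprintable):
    """Return the text that stands for character code k (0..255).

    An implementor who changes the predicate changes the result, which is
    exactly the implementation dependency the report complains about.
    """
    if not is_unprintable(k):
        return chr(k)                      # printed as itself, one character
    if k < 0o100:
        return "^^" + chr(k + 0o100)       # e.g. code "19 -> ^^Y
    if k < 0o200:
        return "^^" + chr(k - 0o100)       # e.g. code "7F -> ^^?
    return "^^" + format(k, "02x")         # e.g. code "82 -> ^^82


if __name__ == "__main__":
    # Standard predicate: three characters.
    print(repr(char_repr(0x19)))
    # Hypothetical site change that makes every character printable: one character.
    print(repr(char_repr(0x19, lambda k: False)))
```

With the default predicate, character "19 becomes the three characters ^^Y. Once an implementor declares it printable, the same call yields a single character, which is the discrepancy reported for \string and \special when the result is kept for internal use.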
Fix proposal: |print| and |slow_print| (\S 59,60) should print a single character without using |str_pool| if |selector = new_string|. (I.e., the `expansion' of unprintable characters to ^^X or ^^xx is inhibited.) procedure print(s : integer); label exit; var j : poolpointer; begin if s >= str_ptr then s:="???" else if s < 256 then if s < 0 then s:="???" + else if selector = new_string then + begin print_char(s); return + end; else if (s = new_line_char) then if selector < pseudo then begin print_ln; return end; j:=str_start[s]; while j < str_start[s+1] do begin print_char(so(str_pool[j])); incr(j); end; exit: end; ... and the same for |slow_print|. Additional changes: |issue_message| (used by \message and \errmessage) calls |print| and |print_err|. Because the string argument can contain unprintable character with the fix above, it has to be replaced by |slow_print|. Additionally the |print_err| call has to be replaced by a call to a new procedure |slow_print_err|, which calls |slow_print| instead of |print|. I don't know, if it's better to print strings like |job_name|, |log_name|, |font_name()|... with |slow_print| (This could be a third bug?). If this is true, the string |formatident| has to be printed with |slow_print|, because it incorporates |job_name|. All other strings `printed' with |selector:=new_string| are either made out of internal strings and integers or they are used only temporary (|special_out|, |ship_out|) or printed with |print_esc| (|font_id_text()|), so they do not need further changes. Possible problems: - Perhaps some macro writers assume the current behaviour of \string et.al. I.e., they assume that \string expands to a fixed set of "printable" characters only (e.g. for verbatim text like {\tt \string\foo}), but this set of printable characters changes if section 49 is changed. - The behaviour of \message and \errmessage will change: \newlinechar has an effect to these primitives (the TeXbook describes only the effect for \write). 
- I have made and tested the changes above... and this changed TeX passes the Trip-Test without problems. Perhaps it would be a good idea to add checks for \special and \string with arguments containing unprintable characters to `trip.tex'. Additionally there should be a short note, that the Trip-Test can produce a different output, if section 49 is changed. The output is different if one of the characters ^^@, ..., ^^82, ... is printable. Yours, Bernd Raichle __________________________________________________________________________ Bernd Raichle, Student der Universit"at Stuttgart | "Le langage est source home: Stettener Str. 73, D-W-7300 Esslingen, FRG | de malentendus" email: raichle@azu.informatik.uni-stuttgart.de | (A. de Saint-Exupery) ------- Date: Mon, 4 Nov 91 10:43:42 +0100 From: Bernd Raichle Subject: TeX.web 3.14: Bug Report #2 (+ Fix Proposal): \newlinechar Dear Ms. Beeton, here is the report of the second bug. I have found this bug when experimenting with examples (to test the behaviour of \message and \write) and possible fixes for the first bug. The second bug deals with \newlinechar and the inconsistent behaviour of \message and \write when it is set to some "exotic" values. Example: \catcode`\^^Y=11 \edef\test{\string\t^^Yt} % (1) \newlinechar=`\^ \message{^^Y ^ \string\^^Y ^ \test} \immediate\write0{^^Y ^ \string\^^Y ^ \test} % (2) \newlinechar=`\^^Y \escapechar=`\^^Y \message{^^Y \string ^^Y \relax} \immediate\write0{^^Y \string ^^Y \relax} % (3) \newlinechar=`\o The output of this example is implementation dependent! The output differs, if section 49 of TeX.web is changed to make the character ^^Y printable. 1. In part (1) of the example, the character ^^Y is printed as two newlines followed by a `Y' (if ^^Y is unprintable). The same aplies for all other ^^Y characters, even for ^^Y characters inside control sequences. 2. The same ^^Y character is printed as character |xchr["19]|, if section 49 is changed to make it printable. 3. 
The output of \write and \message in part (2) differs, if ^^Y is unprintable. \message produces three |^^Y|, \write prints three newlines. 4. There is no difference between \write and \message in part (1) and (2), if character ^^Y is made printable. (Remark: the TeXbook doesn't mention \newlinechar w.r.t. \message, so one could assume that changing \newlinechar will not change the output of \message (and \errmessage).) What behaviour should be correct for |print|, |slow_print|, ... w.r.t. \newlinechar, if something is printed to a file or the terminal? Here are my wishes: 1) if one of these functions is called with the current \newlinechar, it should output a newline. 2) if a tokenlist printed with \write, \message, \errmessage contains a \newlinechar, it should be printed as a newline. 3) _all_ other characters should never produce a newline character (even if there are unprintable characters and \newlinechar = `\^ !) 4) control sequence names containing a character equal to the current \newlinechar should not produce a newline (a user should use \string, if the newline character in the csname should produce a newline) 5) all fixed strings printed to the log file should never produce a newline. I think error messages in the log file should be readable for all values of \newlinechar! I have added part (3) in my example to show the current (mis-)behaviour. [ dek: Here we have a disagreement ... I don't believe it is necessary to support every possible \newlinechar in an optimum way; here the simpler rule without exceptions wins ] Fix proposal: The changes below needs the changes in |issue_message|, I have described in my first bug report (replace call of |print| in this function with |slow_print|, introduce new function |slow_print_err| and replace |print_err| in |issue_message| with a call of this funtion). Otherwise \message and \errmessage produce incorrect output for unprintable characters. 
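[ ed: Items 1 and 2 of the report above can be reproduced with a toy model. The Python sketch below is not TeX's code. It assumes only the ordering the report describes: an unprintable character is first expanded to its ^^ form, and each resulting character is then compared with \newlinechar. The names char_repr and emit are hypothetical, and the printable range " " to "~" is assumed to be the standard one. The model covers only this one output path, not the separate single-character check that makes \write behave differently in part (2). ]

```python
def default_unprintable(k):
    """Assumed standard predicate: anything outside ' '..'~' is unprintable."""
    return k < 0x20 or k > 0x7E


def char_repr(k, is_unprintable=default_unprintable):
    """Text standing for character code k: itself, or a ^^ form."""
    if not is_unprintable(k):
        return chr(k)
    if k < 0o100:
        return "^^" + chr(k + 0o100)
    if k < 0o200:
        return "^^" + chr(k - 0o100)
    return "^^" + format(k, "02x")


def emit(codes, newline_char, is_unprintable=default_unprintable):
    """Expand each code first, then turn every output character that equals
    newline_char into a line break (pass -1 for 'no newline character')."""
    out = []
    for k in codes:
        for ch in char_repr(k, is_unprintable):
            out.append("\n" if ord(ch) == newline_char else ch)
    return "".join(out)


if __name__ == "__main__":
    # \newlinechar=`\^ with ^^Y unprintable: two newlines, then Y (item 1).
    print(repr(emit([0x19], ord("^"))))
    # Same input with ^^Y declared printable: the character itself (item 2).
    print(repr(emit([0x19], ord("^"), lambda k: False)))
```

In this model the newlines come from the two carets of the expansion, not from the character the user typed, which is why wish 3 asks that no character other than the current \newlinechar ever produce a newline.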
I have made the following changes to |print_char|, |print| and |slow_print|: (* the old |print_char| procedure: *) procedure print_the_char ( s : ASCII_code ) (* special handling of current new_line_char deleted *) case selector of ... etc, etc. old code of |print_char|... end; procedure print_char ( s : ASCII_code ) label exit; begin if then if selector < pseudo then begin print_ln; return; end; print_the_char(s); exit: end; procedure print ( s : integer ); label exit; var j : poolpointer; begin if s >= str_ptr then s:="???" else if s < 256 then if s < 0 then s:="???" (* bug fix from bug report #1: *) + else if selector = new_string then + begin print_the_char(s); return + end; else if (s = new_line_char) then if selector < pseudo then begin print_ln; return end; j:=str_start[s]; while j < str_start[s+1] do (* to fulfill wish #4: *) ! begin print_the_char(so(str_pool[j])); incr(j); end; exit: end; procedure slow_print ( s : integer ); label exit; var j : poolpointer; begin if s >= str_ptr then s:="???" else if s < 256 then (* for wish #3: *) ! begin print(s); return end; j:=str_start[s]; while j < str_start[s+1] do begin print(so(str_pool[j])); incr(j); end; exit: end; I was very astonished that (after I had applied the changes) the resulted "TeX" passes the Trip-Test! Possible Problems(?): - \newlinechar handling is done in \write, \message or \errmessage. \write uses |token_show| to output the tokenlist and this is done by calls of |print_char| or |print| with single characters or by calls of |print_cs|. \message and \errmessage uses |slow_print| or |slow_err_print|, which handles newline characters at the `toplevel' of the string. - There should be additional changes to satisfy my wish #5 (I don't know, if it is enough to change the |print_char| calls in the procedure |show_token_list| to call a |print_char| procedure with \newlinechar handling. 
And the calls of |print_char| in all other places are calls to a |print_char| version without \newlinechar handling (e.g. the |print_the_char| above).) Yours, Bernd Raichle <<< end inconsistent expansion of \string, etc.; \newlinechar ************************************************************************ >>> \halign + box assignment within displays Date: Thu, 21 Nov 91 16:55:55 GMT From: Chris Thompson Cc: Robert Hunt Subject: TeX bug in math alignments (Robert Hunt) Barbara, I received the following bug report from Robert Hunt > Accepted: 12:02:31 21 Nov 91 > Submitted: 23:36:12 19 Nov 91 > IPMessageId: A4D2794E8E47F040 > From: Robert Hunt > To: cet1 > Subject: Bug (sigh) in TeX > > Here's a bug in TeX: or possibly in the TeXbook, though I'd much > prefer it to be considered the former! The input > > $$\halign{#\cr x\cr}\global\setbox1=\hbox{y}$$ > > generates a "Missing $$ inserted" error just before the "y" token. > (Pressing RETURN then generates an "I can't go on meeting you like > this" error, and TeX bombs out.) This appears to be because TeX > objects to having horizontal mode material to process after an > alignment in displayed maths mode. However, page 291 of the TeXbook > clearly states that optional <assignments> may follow the closing > brace of an \halign in display maths mode, and \setbox certainly > counts as an <assignment>! It is definitely a bug in TeX: here's a log file (duplicated on Unix TeX, by the way) --------------------start log file This is TeX, Phoenix Version 3.14:0.75 (preloaded format=plain 90.7.18) 21 NOV 1991 14:17 **sysin (&TEXR ! Missing $$ inserted. <to be read again> y l.1 $$\halign{#\cr x\cr}\global\setbox1=\hbox{y }$$ ? h Displays can use special alignments (like \eqalignno) only if nothing but the alignment itself is between $$'s. ? ! I can't go on meeting you like this. <to be read again> y l.1 $$\halign{#\cr x\cr}\global\setbox1=\hbox{y }$$ One of your faux pas seems to have wounded me deeply... [ dek: ^^^^^^^^^^^^^^^^^ how true ] in fact, I'm barely conscious.
Please fix it and try again. No pages of output. --------------------end log file The problem is that the state of the world when |do_assignments| in module 1206 returns can be quite different from when it is called, in respect of the mode nest and the save stack in particular. It occurred to me that one should be able to create a similar effect with the call in |make_accent| (module 1123), and indeed, one can: --------------------start log file This is TeX, Phoenix Version 3.14:0.75 (preloaded format=plain 90.7.18) 21 NOV 1991 15:05 **sysin (&TEXS ! This can't happen (vpack). } l.1 \accent"7F\global\setbox1=\vbox{\vskip1in} z I'm broken. Please show this to someone who can fix can fix No pages of output. --------------------end log file [ dek: $327.68 made out to Hunt ] Can you add this to Don's input queue, please? [ dek: As Chris undoubtedly knows, this is not at all easy to fix. I think it would be far too risky trying to allow arbitrary boxes to be assigned after \halign in display or between \accent and the accentee ... making _fin_align_ and _accent_ recursive would open up a huge can of worms and would make \afterassignment even murkier than it is now. Therefore I am modifying the TeXbook to disallow such things [which nobody has ever used because they never have worked] and modifying TeX to detect this prohibited activity ] Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk <<< end \halign + box assignment within displays ************************************************************************ >>> positive/negative xn_over_d, tex.web section 107 Date: Thu, 21 Nov 91 18:38:51 GMT From: CET1@phoenix.cambridge.ac.uk Cc: Wayne Sullivan Subject: Wayne Sullivan's new TeX bug Barbara, Here is the promised bug report from Wayne Sullivan. I don't think I have anything useful to add to the messages below. I can produce backtraces very like Wayne's.
The calls can come from |math_kern| as well as from |math_glue|. > Accepted: 16:47:07 15 Nov 91 > Submitted: 14:42:13 15 Nov 91 > From: "Wayne G. Sullivan" > To: Chris Thompson > Subject: possible bug in TeX In TeX WEB section 107, Knuth claims that for use of the procedure xn_over_d, n and d are always nonnegative. Whether the actual code makes use of this in section 107, I am not sure. However, some implementations utilize the assumed positivity in optimizing this section. In sections 716 and 717 xn_over_d is used in context with cur_mu. Though one would expect cur_mu to be nonnegative, I see no place where this is assured: I think one could simply change the corresponding font parameter to a negative value. Wayne Sullivan [ dek: $327.68 \S716-7 ] > Submitted: 17:18:39 15 Nov 91 > IPMessageId: A4CD1EF77B402240 > From: Chris Thompson > To: "Wayne G. Sullivan" > Subject: Re: [possible bug in TeX] I think you are right. |cur_mu| is set in section 703 and there is no reason why it has to be positive. Hence |f| in both sections 716 and 717 can be negative (as |x_over_n| gives a negative remainder when x and n have opposite signs). |xn_over_d| is *not* robust against negative n, quite apart from the promises in sections 107 and 99, because it causes u to be negative, and so (a) you can't assume u=d*(u div d)+(u mod d) in ANSI Pascal any longer, & (b) the test for (u div d) being too large would need to be supplemented by one for it being too small. I would like to construct a hard example of obvious bad results before passing this to Barbara & Don, though. Chris Thompson > Accepted: 14:57:26 18 Nov 91 > Submitted: 09:55:30 18 Nov 91 > From: "Wayne G. Sullivan" > To: Chris Thompson > Subject: xn_over_d I grabbed the pascal code for xn_over_d out of my change file to compare the result with an optimized version. On tangling I discovered that the intermediate values t,u,v are declared as nonnegative_integer. 
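[ ed: The point about |nonnegative_integer| can be illustrated with a small model. The Python sketch below is not the tex.web source. It follows the split-precision scheme the messages describe (a product formed in two halves around 2^15, with intermediates t, u, v), but the individual statements are my reconstruction and should be read as an assumption. RangeCheckError, trunc_div, trunc_mod and the range_check switch are hypothetical names. Division truncates toward zero, which is the "negation symmetry" case discussed in this exchange. ]

```python
HALF = 0o100000  # 2**15, the split point for the two-part product


class RangeCheckError(Exception):
    """Stands in for a Pascal run-time subrange violation."""


def trunc_div(a, b):
    """Integer division truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def trunc_mod(a, b):
    """Remainder matching trunc_div: a == b*trunc_div(a, b) + trunc_mod(a, b)."""
    return a - b * trunc_div(a, b)


def xn_over_d(x, n, d, range_check=True):
    """Return (x*n/d truncated, remainder); d is assumed positive.

    With range_check=True the intermediates behave like variables declared
    nonnegative, so a negative n is rejected as soon as one goes negative.
    """
    def store(value):
        if range_check and value < 0:
            raise RangeCheckError(value)
        return value

    positive = x >= 0
    x = abs(x)
    t = store(trunc_mod(x, HALF) * n)
    u = store(trunc_div(x, HALF) * n + trunc_div(t, HALF))
    v = store(trunc_mod(u, d) * HALF + trunc_mod(t, HALF))
    if trunc_div(u, d) >= HALF:
        raise OverflowError("arith_error")
    u = store(HALF * trunc_div(u, d) + trunc_div(v, d))
    if positive:
        return u, trunc_mod(v, d)
    return -u, -trunc_mod(v, d)


if __name__ == "__main__":
    print(xn_over_d(196608, 16384, 65536))                       # n >= 0: fine
    print(xn_over_d(196608, -16384, 65536, range_check=False))   # unchecked
    try:
        xn_over_d(196608, -16384, 65536)                         # checked
    except RangeCheckError as err:
        print("range check failed on", err)
```

In this model a negative n gives the arithmetically expected quotient when checking is off and division truncates, but trips the subrange check as soon as an intermediate goes negative.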
Hence a range checking compiler will choke on a negative cur_mu: Can DEK escape from that? > Accepted: 15:38:21 18 Nov 91 > Submitted: 09:32:41 18 Nov 91 > From: "Wayne G. Sullivan" > To: Chris Thompson > Subject: xn_over_d The use of xn_over_d with the possibly negative cur_mu only affects the lower precision term in a `split' precision calculation so that the problem of overflow arises only if the input values are beyond the allowed values, i.e., with valid input values no overflow arises. Thus the algorithm will give the correct answer even though cur_mu is negative. However, this does depend on the div procedure satisfying negation symmetry: (-n) div d =-(n div d) when n,d>0. At any rate the documentation needs to be corrected so that optimized versions of the code will also give the correct result. > Accepted: 17:40:28 18 Nov 91 > Submitted: 11:05:39 18 Nov 91 > From: "Wayne G. Sullivan" > To: Chris Thompson > Subject: xnod Here is an example of the xn_over_d problem. Sorry about the antique version of TeX currently on our VM/CMS: I use it only to illustrate the problem. The TeX file: \showboxdepth100 \showboxbreadth100 \scrollmode \showthe\fontdimen6 \tensy \fontdimen6 \tensy=-10pt \showthe\fontdimen6 \tensy $$a\;a$$ \showlists \end The LOG file (errors were reported on the terminal). Look for negative glue. This is TeX, VM/CMS Version 2.1 (preloaded format=PLAIN 87.5.19) 18 NOV 1991 11 **xnod (xnod.tex.* > 10.00002pt. l.4 \showthe\fontdimen6 \tensy > -10.0pt.
l.6 \showthe\fontdimen6 \tensy ### horizontal mode entered at line 7 spacefactor 1000 ### vertical mode entered at line 0 ### current page: \glue(\topskip) 10.0 \hbox(0.0+0.0)x469.75499, glue set 449.75488fil .\hbox(0.0+0.0)x20.0 .\penalty 10000 .\glue(\parfillskip) 0.0 plus 1.0fil .\glue(\rightskip) 0.0 \penalty 10000 \glue(\abovedisplayshortskip) 0.0 plus 3.0 \glue(\baselineskip) 7.69446 \hbox(4.30554+0.0)x7.79407, shifted 230.98047 .\teni a .\glue -2.77771 plus -2.77771 .\teni a \penalty 0 \glue(\belowdisplayshortskip) 7.0 plus 3.0 minus 4.0 total height 29.0 plus 6.0 minus 4.0 goal height 643.20255 prevdepth 0.0, prevgraf 4 lines ! OK. l.8 \showlists [1] Output written on xnod.dvi (1 page, 264 bytes). > Submitted: 17:48:09 18 Nov 91 > IPMessageId: A4D0EAFE6C983690 > From: Chris Thompson > To: "Wayne G. Sullivan" > Subject: Re: [xnod] Wayne, Thanks for the messages. There is no doubt that there is a bug: the declaration of t,u,v as |nonnegative_integer|, which I had missed, makes it definite. I have traced through |xn_over_d| using a debugger, and found it storing negative values, as expected. I'll put all this together and send it off to Barbara Beeton, if you are agreeable. Chris Thompson > Accepted: 12:02:21 21 Nov 91 > Submitted: 09:54:31 19 Nov 91 > From: "Wayne G. Sullivan" > To: Chris Thompson > Subject: xn_over_d Thanks in advance for passing the problem on to BNB/DEK. Enclosed below are screen messages which appear when xnod.tex is processed on our ancient VM/CMS version of TeX. Other things happen with the PC version I use, but that uses an assembly language equivalent of the old xn_over_d which will have to be upgraded to whatever Knuth decides to the procedure. It is unlikely that he will employ the obvious modification. [ dek: ^^^^^^^^^^^^^^^^^^^^ which is? ] Ready; T=0.05/0.11 09:49:43 tex xnod This is TeX, VM/CMS Version 2.1 (no format preloaded) ** (xnod.tex.* > 10.00002pt. l.4 \showthe\fontdimen6 \tensy > -10.0pt. 
l.6 \showthe\fontdimen6 \tensy AMPX031E Low bound checking error TRACE BACK OF CALLED ROUTINES ROUTINE STMT AT ADDRESS IN MODULE XNOVERD 6 024014 TEX MATHGLUE 4 03E1B8 TEX MLISTTOHLIST 73 041F54 TEX AFTERMATH 68 052EBE TEX MAINCONTROL 128 058098 TEX 110 05B86C TEX PASCAL/VS 05BDAE AMPX031E Low bound checking error TRACE BACK OF CALLED ROUTINES ROUTINE STMT AT ADDRESS IN MODULE XNOVERD 10 024074 TEX MATHGLUE 4 03E1B8 TEX MLISTTOHLIST 73 041F54 TEX AFTERMATH 68 052EBE TEX MAINCONTROL 128 058098 TEX 110 05B86C TEX PASCAL/VS 05BDAE AMPX031E Low bound checking error TRACE BACK OF CALLED ROUTINES ROUTINE STMT AT ADDRESS IN MODULE XNOVERD 6 024014 TEX MATHGLUE 7 03E236 TEX MLISTTOHLIST 73 041F54 TEX AFTERMATH 68 052EBE TEX MAINCONTROL 128 058098 TEX 110 05B86C TEX PASCAL/VS 05BDAE AMPX031E Low bound checking error TRACE BACK OF CALLED ROUTINES ROUTINE STMT AT ADDRESS IN MODULE XNOVERD 10 024074 TEX MATHGLUE 7 03E236 TEX MLISTTOHLIST 73 041F54 TEX AFTERMATH 68 052EBE TEX MAINCONTROL 128 058098 TEX 110 05B86C TEX PASCAL/VS 05BDAE ! OK (see the transcript file). l.8 \showlists [1] (see the transcript file for additional information) Output written on xnod.dvi (1 page, 264 bytes). Transcript written on xnod.texlog. ----- End of Included messages ----- Chris Thompson JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk <<< end positive/negative xn_over_d, tex.web section 107 ######################################################################## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Character code reference %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Upper case letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ % Lower case letters: abcdefghijklmnopqrstuvwxyz % Digits: 0123456789 % Square, curly, angle braces, parentheses: [] {} <> () % Backslash, slash, vertical bar: \ / | % Punctuation: . ? ! 
, : ; % Underscore, hyphen, equals sign: _ - = % Quotes--right left double: ' ` " %"at", "number" "dollar", "percent", "and": @ # $ % & % "hat", "star", "plus", "tilde": ^ * + ~ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [ end of message 037 ] From BNB@MATH.AMS.COM Sat Mar 14 14:26:53 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA25563; Sat, 14 Mar 92 14:26:44 MST Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GHN1L3T0Y8BESP6M@MATH.AMS.COM>; Sat, 14 Mar 1992 15:43 EST Date: Sat, 14 Mar 1992 15:43:36 -0500 (EST) From: Barbara Beeton Subject: Messages from DEK, part 4 of 5 To: tex-implementors@MATH.AMS.COM Message-Id: <700605816.970000.BNB@MATH.AMS.COM> Mail-System-Version: Date: 14 March 91 Message No: 039 To: TeX implementors and distributors From: Barbara Beeton Subject: Messages from DEK, part 4 ************************************************************************ >>> \TeX logo; some problems with \loop i accidentally misfiled this with another problem that had already been solved; it should have been sent to you months ago. ------- Date: Mon 24 Jun 91 14:39:46-EST From: Michael Downes Subject: \TeX % These %'s are so that you can move this file out of mail % and \TeX{} it as is to run the tests given below. % % What version of plain.tex were you using when you checked the % spacefactor after \TeX? The one running on the TeXserver is % presumably quite recent since the version of TeX is 3.1, from % the standard Unix distribution. The following test demonstrates % that the spacefactor after \TeX{} has to be manually adjusted % to get proper spacing when punctuation follows (if \nonfrenchspacing % is in effect). This can be corrected so easily by adding % \spacefactor\@m at the end of \TeX's definition that I % don't know why it hasn't been done already. Perhaps there's % some reason not to do so that Knuth can explain. 
[ dek: That would have been better but I think it's too late now to make such a change ] % The space after the first period differs from the space after % the second period, as demonstrated by the \showlists command. \tracingonline=1 \showboxbreadth=\maxdimen \TeX{}. Next. Third.\showlists % If Knuth changes plain.tex to fix \multispan perhaps he'll be % willing to consider another obvious improvement: % % Redefine \loop, as described in TUGboat vol 8 no 2 July 1987 % by Alois Kabelschacht. % % This allows an \else branch on the repeat condition to be used, % which seems rather important in light of the fact that TeX \ifnum % and \ifdim tests only allow <, =, and > comparisons, with no % "not" operator or <= or >= combinations. Knuth might be unwilling % to do this, however, since it would require corresponding changes % in the TeXbook in the description of plain.tex; and it might be [ dek: All the macros in Plain TeX are merely simple examples that can be improved ad infinitum ... I cannot keep doing it ] % only 99.8% backward compatible. [However, no existing TeX file % will fail to run under the following redefinition unless it % refers to the internal pieces of \loop (\body and \next), and % perhaps the most likely reason someone might have done so would % be to compensate for the lack of an \else possibility!!] \def\loop#1\repeat{\def\iterate{#1\expandafter\iterate\fi}% \iterate \let\iterate\relax} % The following example illustrates an \else condition in % the internal loop. It takes an even number and keeps % dividing it until the result is odd. Try a large % power of 2, e.g., 65536. 
\newcount\n \newlinechar=`\& \loop \message{&Enter a large even number (RETURN to stop): }% \read-1 to\answer \n=0\answer\relax [ dek: ^^^^^^ wow, a trick I didn't know ] \ifnum\n>0 % Braces to localize \iterate {\loop \divide \n by 2 \message{&N divided by 2 = \number\n}% \ifodd\n \else\repeat}% \repeat \bye ------- comment by bb: i thought the definition of \TeX had been changed, because the following definition accompanied DEK's article of October '90 on "The future of TeX and METAFONT": \def\TeX{T\hbox{\hskip-.1667em\lower.424ex\hbox{E}\hskip-.125em X}} i'm pretty sure if that had been my addition, i would have marked it as such, or would have made the change at a more general level. but i could very well be misremembering. ------- Date: Fri, 3 JAN 92 22:14:36 GMT Originally-from: TEX "Brian {Hamilton Kelly} " From: TEX@rmcs.cranfield.ac.uk Subject: RE: flash!!! notice that tex update is about to happen [ ... ] Brian PS I'd love to see some way of using \loop... within an \halign; at present it falls over because each & cancels a level of grouping, and so the definition of \body (or is it \repeat) keels over. But I guess this is too big a change. [ dek: Well \dosplits on page 397 does iterate within \valign (which is similar) ] ------- <<< end \TeX logo; some problems with \loop ************************************************************************ >>> cedilla accent macro in plain.tex Date: Mon, 23 Sep 91 11:56:15 +0200 From: Bernd Raichle Subject: TeX Bug Report -- plain.tex, cedilla accent macro Dear Ms. Beeton, I've found a small bug in `plain.tex' (and a missing index entry in the TeXbook). Perhaps someone has reported it before, but I have seen no fix for it... 1. There's a missing index entry for "ligatures" in the TeXbook. Page 306 contains the answer of exercise 5.1. This answer explains how to supress unwanted ligatures. [ dek: No ... p 19 (ex 5.1, about suppressing unwanted ligs) is indexed. see p 457 "an answer page is not indexed unless ..." 
] 2a. The \c macro in `plain.tex' produces a box with the wrong dimensions (height = depth = 0pt) for arguments with height != 1ex. Fix 1: exchange the \unhbox\z@ and \hidewidth\char24\hidewidth in the macro definition [ dek: good $2.56 \ooops ] \def\c#1{\setbox\z@\hbox{#1}\ifdim\ht\z@=1ex\accent24 #1% \else{\ooalign{\unhbox\z@\crcr\hidewidth\char24\hidewidth}}\fi} 2b. For characters with depth != 0pt, the "cedilla" character is not lowered. I don't know of any use for this, because there's only the c-cedille accent. But the idea used in the proposed fix can be used for the under-dot and under-bar accent (and \AA) too. The accent characters of these macros are misplaced if used within a slanted font. Fix 2: use the \accent primitive and a changed ex-value to shift the accent character to the correct place \def\c#1{{\dimen@ 1ex% % Compute new ex value: % The cedilla accent should not be shifted for uppercase characters, % => new-ex := height(#1). {\setbox\z@\hbox{#1}\dimen@\ht\z@ % but we have to shift it down, if the character has depth. % => new-ex := new-ex + depth(#1) \advance\dimen@\dp\z@ % Set the `ex' value (fontdimen change is always global), \fontdimen5\font\dimen@}% % ... set the accent char, reset `ex' to the old value % and do the rest. \accent24\fontdimen5\font\dimen@ #1}} Yours, Bernd Raichle (raichle@azu.informatik.uni-stuttgart.de) PS: Please can you add me to the `tex-implementors' list? Thanks in advance. ------- Date: Sun, 29 Sep 91 15:37:10 BST From: Chris Thompson Subject: Re: [[Bernd Raichle : TeX BugReport -- plain.tex, cedi]] Dear Barbara, > 1. There's a missing index entry for "ligatures" in the TeXbook. > Page 306 contains the answer of exercise 5.1. This answer explains > how to supress unwanted ligatures. Yes, I think it plausible that 306 should appear in the list. It doesn't in the most recent 'texbook.tex' I have access to. > 2a. 
The \c macro in `plain.tex' produces a box with the wrong dimensions > (height = depth = 0pt) for arguments with height != 1ex. > > Fix 1: exchange the \unhbox\z@ and \hidewidth\char24\hidewidth in > the macro definition > > \def\c#1{\setbox\z@\hbox{#1}\ifdim\ht\z@=1ex\accent24 #1% > \else{\ooalign{\unhbox\z@\crcr\hidewidth\char24\hidewidth}}\fi} Amazing that this hasn't come to light earlier: \hbox{\c C} has zero height and depth as stated! Bernd's fix would appear to be the right one: but maybe there is some subtle reason why the alignment is written that way round at the moment---it's such a counterintuitive thing to do! > 2b. For characters with depth != 0pt, the "cedilla" character is not > lowered. I don't know of any use for this, because there's only the > c-cedille accent. But the idea used in the proposed fix can be > used for the under-dot and under-bar accent (and \AA) too. The > accent characters of these macros are misplaced if used within a > slanted font. > > Fix 2: use the \accent primitive and a changed ex-value to shift > the accent character to the correct place > > \def\c#1{{\dimen@ 1ex% > % Compute new ex value: > % The cedilla accent should not be shifted for uppercase characters, > % => new-ex := height(#1). > {\setbox\z@\hbox{#1}\dimen@\ht\z@ > % but we have to shift it down, if the character has depth. > % => new-ex := new-ex + depth(#1) > \advance\dimen@\dp\z@ > % Set the `ex' value (fontdimen change is always global), > \fontdimen5\font\dimen@}% > % ... set the accent char, reset `ex' to the old value > % and do the rest. > \accent24\fontdimen5\font\dimen@ #1}} This is a much more ambitious change, and I have doubts about its having sufficient backwards compatibility, and its robustness. An interesting technique, though, which relies on the x-height of the current font being recorded before the |do_assignments| call in |make_accent|. I think you should show it to Don together with the rest. 
Chris Thompson <<< end cedilla accent macro in plain.tex ************************************************************************ >>> missing \mathsurround in plain.tex Date: Mon, 4 Nov 91 13:35:11 +0100 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Organization: Konrad-Zuse-Zentrum fuer Informationstechnik Berlin Subject: Glitches in plain.tex Barbara, while revamping the LaTeX files I found that some macros from plain.tex lack the \m@th (i.e. \mathsurround=0pt) assignment. The macros in question are: \overrightarrow \underrightarrow [ dek: (scratched out \underrightarrow) \overleftarrow ] \overbrace \underbrace \notin/\c@ncel \rightleftharpoons/\rlh@ [ dek: $2.56 x 3 A359, A360, A361 ] What do you think? I append a demo file. Rainer ------------------demo for plain tex-------------------- text$\overrightarrow{abcdef}$text text$a \notin b$text \mathsurround =10pt text$\overrightarrow{abcdef}$text text$a \notin b$text text$a \rightleftharpoons b$text \bye <<< end missing \mathsurround in plain.tex ************************************************************************ >>> \arrowvert code, plain.tex comment by bb: there are several symbols in cmex* that are meant as midsections of arrows, brackets, etc., but which are also given \cs names. these include \arrowvert \Arrowvert \bracevert it seems that as long as these are referenced indirectly, through \uparrow, \downarrow, etc., they are well behaved, but when they are referenced by the \cs name, the wrong symbol is produced. i'm not the only one who finds the \delimiter values assigned in plain somewhat mysterious. if these really are errors, there's a different claimant for each one: \arrowvert -- Phil Taylor, \Arrowvert -- me, \bracevert -- Chris Thompson. if the values are correct though, we'd really appreciate a good explanation of how this works. re the comments by michael barr regarding how much memory is required for rules vs. 
characters, i know that's been explained before, but perhaps it's time to do it again. i shall ask chris thompson or brian {hk}. ------- Date: Mon, 11 NOV 91 17:43:28 BST Reply-to: Philip Taylor (RHBNC) From: CHAA006@vax.rhbnc.ac.uk Subject: [Surely not another] bug in Plain.TeX ? For a book on French linguistics, I need up/down arrows which span rows in an \halign; I use \uparrow, \downarrow and \arrowvert. In both the MS/DOS (emTeX) and VAX/VMS versions of Plain.TeX, \arrowvert translates to \delimiter"33C000. Character "3C in family 3 (cmex10) is the centre part of a brace, not a vertical arrow ... [ dek: right, as stated on page 150. I admit these names are not the best, but you are supposed to regard them as _verts_ (with differing weights) derived from arrows/braces _not_ as arrows/braces ] ** Phil. P.S. Plain version 3.0 (can't find a date). ------- Date: Mon, 18 NOV 91 10:57:30 BST Reply-to: Philip Taylor (RHBNC) From: CHAA006@vax.rhbnc.ac.uk Subject: Re: [Surely not another] bug in Plain.TeX ? >>> however, now that i look at this, i'm wondering why you haven't just >>> used \uparrow, \downarrow, and \updownarrow, forcing them to whatever >>> size you want with phantoms. or have you? if so, and you're really >>> getting the wrong pieces, then there's a problem in the .tfm file too. 
I did, in the end; here's the current version: % \newbox \arrowbox \setbox \arrowbox = \hbox {\let|=\strut\smash{$\left\updownarrow\matrix{|\cr|\cr|\cr|\cr}\right.$}} \def \arrow {\copy \arrowbox} % \Begin {columns} \options = {\listindent = 0.25 \listindent} % \+&&Il&&&&effondr{\'e}&&&&&&&&une chaise.&\cr \+&&Pierre&&s'est&&assis&&sur&&&&&&un si{\`e}ge.&\cr \omit&\arrow&&&&\arrow&&&&&&&&\arrow&&\cr \+&&L'{\'e}tudiant&&&&install{\'e}&&&&&&&&un tabouret.&\cr \+&&&&&&&&&&&&&&un canap{\'e}.&\cr \+&&\multispan 4{\qquad \leftarrowfill\rlap {$\joinrel \rightarrow$}}&& \multispan 2{\llap {$\leftarrow$}$\joinrel$\rightarrowfill}&&&&& \multispan 2{\llap {$\leftarrow$}$\joinrel$\rightarrowfill}&\cr % \End {columns} [ ... ] ------- Date: Wed 20 Nov 91 07:02:41-EST From: bbeeton To: CET1@phx.cam.ac.uk by the way, phil taylor has found another plain bug -- the character referenced by \arrowvert is wrong; following his lead, i discovered that \Arrowvert also looks bad. (these are rarely used, but even so, it's surprising they haven't been found before; i think they've been this way since versions >1.) phil is concocting a routine to print out all the plain math characters for checking. -- bb ------- Date: Thu, 21 Nov 91 16:39:06 GMT From: Chris Thompson [ ... ] I met Phil Taylor at the UKTeX meeting in Oxford yesterday, and he told me about the \arrowvert..\Arrowvert problem. I was surprised that such a bug had gone unnoticed so long. Are you both sure about this? I get the appearance described on page 150 of the TeXbook in test programs. (All this complicated by the variable-character codes in cmex10, by which a character rarely means itself! "3C as varchar is actually repeating bits of "3F, and "3D as a varchar is repeating bits of "77.) 
There was a change in plain \fmtversion 2.92 (I think) in which the definitions of \arrowvert, \Arrowvert and \bracevert were changed from "33C, "33D & "33E to "33C000, "33D000 & "33E000: I never understood in what circumstances this could make a difference!

[ dek: Yes, see the bottom of page 156 ]

Chris Thompson
-------
Date: Fri, 22 NOV 91 11:02:27 BST
Reply-to: Philip Taylor (RHBNC)
From: CHAA006@vax.rhbnc.ac.uk
Subject: Arrowvert (forgot to cc you)

Dear Chris ---

>>> I was using tests like
>>> $$\left\arrowvert X\sp h \over J_y \right\Arrowvert$$
>>> and they seem to work. Have you got an example which fails?

Yes ! $ \arrowvert $ ! [I suppose this means I must be abusing \arrowvert,
[ dek: ^^ \bigg would produce a particular size I will pay Phil $2.56 ]
but I can't for the life of me see why ...]

** Phil.
-------
Date: Fri, 22 Nov 91 15:37:37 GMT
From: Chris Thompson
To: Philip Taylor (RHBNC)
Subject: RE: \arrowvert..\Arrowvert

Phil,

Ah, I see! Although $\left\arrowvert...$ works, $\arrowvert$ doesn't. The trouble is, I don't see how it can be made to...

When \arrowvert is used in the context of a math character, only the top 15 bits of the 27-bit number following \delimiter are used. Before the change in plain 2.92 (if that is in fact when it was) this would have produced \mathchar"0000 (clearly wrong); now it produces \mathchar"033C (unfortunately, still wrong). cmex10 character "3C as a math character is the middle piece of a large \lbrace as shown on p.432, but when used as a delimiter (after \left, or the macros \bigl, \Biggl, etc.) its varchar definition is used, and that says that it should be made up of a lot of copies of character "3F (no top, middle or bottom piece used). If the definition of \arrowvert was changed to "033F000, then the delimiter context would work, but the math character one wouldn't: instead it would be like \updownarrow.
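
editorial aside (not part of the original correspondence): the bit arithmetic in Chris Thompson's message can be checked with a short Python sketch. It assumes the usual layout of a 27-bit \delimiter code: a 3-bit class, then a 4-bit family and 8-bit character for the small variant, then a 4-bit family and 8-bit character for the large variant. The names split_delimiter and as_math_char are hypothetical, chosen for this illustration; they do not occur in tex.web or plain.tex.

```python
# Hypothetical helpers for illustration only; the names are not from tex.web.

def split_delimiter(code):
    """Split a 27-bit delimiter code into class, small variant, large variant."""
    if not 0 <= code < (1 << 27):
        raise ValueError("a delimiter code must fit in 27 bits")
    return {
        "class": code >> 24,                                  # top 3 bits
        "small": ((code >> 20) & 0xF, (code >> 12) & 0xFF),   # (family, char)
        "large": ((code >> 8) & 0xF, code & 0xFF),            # (family, char)
    }

def as_math_char(code):
    """Keep only the top 15 bits, as in a math character context."""
    return code >> 12

old = 0x33C      # "33C, the value before the change to plain
new = 0x33C000   # "33C000, the value after the change

print(hex(as_math_char(old)))
print(hex(as_math_char(new)))
print(split_delimiter(old))
print(split_delimiter(new))
```

Under these assumptions as_math_char yields 0 for the old value and "033C for the new one, which agrees with the \mathchar"0000 and \mathchar"033C quoted in the message. The split also shows what the change did: it moved family 3, character "3C from the large-variant field to the small-variant field.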
[ dek: Next version of plain has a good compromise: \lmoustache "437A340 \arrowvert "026A33C \rmoustache "437B341 \Arrowvert "026B33D \lgroup "462833A \bracevert "077C33E \rgroup "462933B ] Chris Thompson ------- Date: Thu, 21 Nov 91 11:31:24 EST X-ListName: TeX-Related Network Discussion List From: barr@triples.Math.McGill.CA Subject: Optimizing TeX? Let me begin by mentioning a misprint in the Book. On page 358, line -8 (17th printing, Jan. 1990), it says \mathchardef\mapstochar="322F \def\mapsto{\mapstochar\rightarrow} and the "322F should be (and is, in plain.tex) "3237. [ dek: $2.56 was paid to Karl Berry for this on 19 Sep 91. Labrea sources haven't changed yet but my texbook.tex had this correct as of 18 Sep 91 acc. to errata.tex ] The reason I noticed this was that a colleague had tried to make a \mapsfrom macro and discovered a curious fact. The \mapstochar is not just a vertical line, but has a blip in the middle on the right hand side that effectively ruins it for any other purpose. I told my colleague to use a rule and he did. I know from something that appeared in UKTeX a ffew weeks ago that rules are expensive in space. There is a reason for this. It requires 4 real numbers to specify a rule (the space of rules is a 4-manifold and fewer numbers will not do, no matter how you describe them). [ dek: not true; 3 numbers (height, width, depth) suffice in context ] Possibly 4 16 bit numbers would have given sufficient precision, but Knuth coded all dimensions as 32 bit numbers. So, it presumably takes 16 bytes to specify a rule and it is going to be much more wasteful of space to use a rule than it would have been to use the \mapstochar. Is this correct? More to the point, how can one find out about coding TeX to optimize for space? For time? Knuth drops a few clues in the discussion on pp. 373-374, but that is almost the only place this is done. 
For instance, can anyone tell me the relative cost of factoring macros into smaller ones, assuming you never reuse the pieces? Michael Barr ------- Date: Fri, 22 Nov 91 17:07:28 GMT From: Chris Thompson Subject: This and that Barbara, You write > there was another in that category reported just > today by michael barr -- the \mapstochar. if you read comp.text.tex, > look for his report, because it's asking a couple more basic questions, > namely how much memory space do certain techniques require, and what's > the most efficient way to do certain things. I follow comp.text.tex only intermittently, as its volume is really rather high. I took a look at Michael Barr's posting, and have posted a reply. He is certainly right about \mapstochar: the TeXbook said "322F in 1984, and the latest texbook.tex from labrea still says it; on the other hand plain.tex has always (correctly) had "3237 since 1.0 (I am fairly sure of this: despite the absence of a plain.tex.history file, I do do compares of old with new each time I upgrade and note down the salient changes). Chris Thompson <<< end \arrowvert code, plain.tex ************************************************************************ >>> MF -- bugs or features? (also, some ACP errata) Date: Tue, 22 Oct 91 12:04:23 +0100 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Organization: Konrad-Zuse-Zentrum fuer Informationstechnik Berlin Subject: Questionable features of Metafont Barbara, here is the report about two features of Metafont that I find questionable. I hesitate to call them bugs as it is said nowhere in the book that Metafont should not behave that way. Number 1: --------- The book says that addto V also P; is "essentially the same as" V := V + P; But this is only true if the evaluation of the picture expression P does *not* change the value of the picture variable V as a side effect. The second expression is correctly evaluated left to right. The first one, however, uses the value of V *after* P has been evaluated. 
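
editorial aside (not part of the original correspondence): the difference in evaluation order that Rainer describes can be imitated in a few lines of Python. This is only a sketch under loud assumptions: plain integers stand in for METAFONT pictures, a dictionary stands in for the variable table, and the function names are hypothetical. It does not reproduce how mf.web is actually organized.

```python
# Illustration only: integers stand in for pictures, a dict for the variables.

def assign_sum(state, expr):
    """Model of  V := V + P : read V first, then evaluate P."""
    old_v = state["V"]
    p = expr(state)
    state["V"] = old_v + p

def addto_also(state, expr):
    """Model of  addto V also P : evaluate P first, then use the current V."""
    p = expr(state)
    state["V"] = state["V"] + p

def p_with_side_effect(state):
    """A picture expression that overwrites V as a side effect."""
    state["V"] = 100
    return 1

s1 = {"V": 10}
assign_sum(s1, p_with_side_effect)

s2 = {"V": 10}
addto_also(s2, p_with_side_effect)

print(s1["V"], s2["V"])
```

In this model the first form ends with V equal to 11 (the old 10 plus 1), while the second ends with 101, because it picks up the value of V that the side effect left behind. With an expression that has no side effects the two forms agree.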
Looking into the program one sees that "addto" reads the token list for V, evaluates the expression P, and converts the token list for P into an edge structure afterwards. I assume that the same behaviour happens with "cull". [ dek: I suppose so -- just as you would expect in C when comparing x=x+f(x) to x+=f(x) when f has side effects ] Number 2: --------- Define the following macro: def xxx = begingroup save x; newinternal x; .... endgroup enddef; After a hundred expansions Metafont gives you an "capacity exceeded" error since every call to newinternal statically allocates a slot in an internal array, which is not freed again at the end of the group. However, in this case, it is very easy to free the slot again. [ dek: ^^^^^^^^^ STRONGLY DISAGREE that _any_ change can be described as ``very easy''! Bugs or features? At least, this behaviour should be documented! [ dek: Maybe in TUGboat. Certainly the ``capacity exceeded'' error tells you that you've exceeded the number of internals. ] [ ... ] Rainer <<< end MF -- bugs or features? (also, some ACP errata) ************************************************************************ >>> MFbook errata Date: Tue, 5 Nov 1991 05:48:03 +0100 From: Lutz Birkhahn Subject: Errors in METAFONTbook (C&T Vol. C) Barbara, just for the record, here are some minor bugs in the METAFONTbook (C & T vol. C) index: "openwindow", "or", and "string" each are METAFONT primitives, so they deserve to get a starred entry (preceded with an asterisk `*') in the index on pages 353, 354, and 356, respectively. This error shows up in my edition from 1986 (first printing, I think) as well as in the source code I have, stating "Fifth printing, revised, March 1990" (although the C & T Publication info table you posted recently said that the 4th printing of Volume C, Sept 1991, is the latest...). [ dek: 3 x $2.56 many thanks C354^2 C356 ] Bye, Lutz Lutz Birkhahn (Germany) email: lutz@bisun.nbg.sub.org (don't use another!) F"urther Str. 
6 +-------------------------------------------------- D-W-8501 Cadolzburg 2 | "It is an error to not have enough arguments" Voice: 09103 / 2886 | (Hype Programmer's Guide) <<< end MFbook errata [ dek: notes to myself only boolean was done as I had intended 353 numeric 356 string 357 transform 354 openwindow or pair path pen picture ] ######################################################################## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Character code reference %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Upper case letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ % Lower case letters: abcdefghijklmnopqrstuvwxyz % Digits: 0123456789 % Square, curly, angle braces, parentheses: [] {} <> () % Backslash, slash, vertical bar: \ / | % Punctuation: . ? ! , : ; % Underscore, hyphen, equals sign: _ - = % Quotes--right left double: ' ` " %"at", "number" "dollar", "percent", "and": @ # $ % & % "hat", "star", "plus", "tilde": ^ * + ~ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [ end of message 039 ] From BNB@MATH.AMS.COM Sat Mar 14 14:36:57 1992 Flags: 000000000001 Return-Path: Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA25596; Sat, 14 Mar 92 14:36:50 MST Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GHN1NBOLE8BESP6M@MATH.AMS.COM>; Sat, 14 Mar 1992 15:45 EST Date: Sat, 14 Mar 1992 15:45:24 -0500 (EST) From: Barbara Beeton Subject: Messages from DEK, part 5 of 5 To: tex-implementors@MATH.AMS.COM Message-Id: <700605924.580000.BNB@MATH.AMS.COM> Mail-System-Version: Date: 14 March 91 Message No: 040 To: TeX implementors and distributors From: Barbara Beeton Subject: Messages from DEK, part 5 ************************************************************************ >>> WEAVE -- Peter Breitenlohner Date: Sun, 10 Nov 91 17:11:15 GMT From: Peter Breitenlohner Organization: Max-Planck-Institut fuer Physik, Muenchen Subject: message for dek Barbara, can you please forward this message. Thanks Peter ---------------- To: Donald E. Knuth From: Peter Breitenlohner 1. 
Thanks for your comments on patgen. I mostly wanted to make sure that you don't object to the character translation scheme. (You had quite a few argumantss against my first ideas on how to do that.) 2. Yet another bug/misfeature in WEAVE. After a discussion with Bart Childs I looked at the way WEAVE marks modules as changed and did some experiments. Consider the following web and change file (maybe somewhat atypical, but no modules are removed or inserted): ================ cut here ==================================== @ One. 11111 @ Two-c. 22222 @ Three. 33333 @ Four-c. 44444 @ Five-c. 55555 @ Six. 66666 @ Seven. 77777 @ Eight-c. @ Nine-c. @ Ten-c. @ Eleven. @ Twelve-c?. @ Thirteen-c. @ Fourteen. @*Index. ================ cut here ==================================== @x - this changes Two @ Two-c. 22222 @y [ dek: not much of a change ] @ Two-c. 22222 @z @x - this changes Four and Five 44444 @ Five-c. 55555 @y @ Five-c. 55555 @z @x - this changes Eight, Nine, and Ten @ Eight-c. @ Nine-c. @ Ten-c. @y @ Eight-c. @ Nine-c. @ Ten-c. @z @x - this changes Thirteen (and maybe also Twelve) @ Thirteen-c. @y @ Thirteen-c. @z ================ cut here ==================================== One would think that modules Two, Four, Five, Eight, Nine, Ten, and the index are changed. One might argue whether module Twelve has been changed or not. Technically the blank preceding '@ Thirteen' is part of module Twelve. That blank could, however, also be a form feed with xord[form_feed]=" ", a situation not uncommon until recently in many web files. Thus one might prefer not to count blanks, tabs, form feeds, and blank lines as changes. Incidentally I don't quite see how a tab_mark can enter into the input buffer of WEAVE or TANGLE unless through modified xchr/xord arrays. 
Actually modules Four, Eight, and Nine are not marked as changed by [ dek: ^^^^ this is the only one I personally would care about, of course ] WEAVE 4.2 Unfortunately I see no simple fix for that, which would not also mark modules One and Seven as changed (I guess that would just undo the Aug 83 changes). The code below uses a completely revised changed-module-algorithm: [ dek: @ 1 ] 1. a module is marked as changed if it starts while changing is true [ dek: @ 2 ] 2. when a matching change is detected, the current module is marked as changed unless the first non blank line after the '@x' and after the '@y' both start with '@ ' or '@*' (possibly preceeded by blanks/tabs). [ dek: ^^^^^^^^^ ^^^^^^^^^^^ preceded whitespace ] ================ cut here ==================================== % All line numbers refer to WEAVE.WEB 4.2, as of September 5, 1990. @x [9] m.71 l.1274 - fix bug in changed_module reckoning @!changing: boolean; {if |true|, the current line is from |change_file|} @y @!changing: boolean; {if |true|, the current line is from |change_file|} @!change_pending: boolean; {if |true|, the current change is not yet recorded in |changed_module[module_count]|} @z %--------------------------------------- @x [9] m.79 l.1364 - fix bug in changed_module reckoning prepares to read the next line from |change_file|. @y prepares to read the next line from |change_file|. In addition the current module is marked as changed unless the first line after the \.{@@x} and after the \.{@@y} both start with either |'@@*'| or |'@@ '| (possibly preceded by blanks and\slash or blank lines). The \.{WEB} macro |if_start_of_module_then_change_pending| tests for this condition. 
@d if_start_of_module_then_change_pending== [ dek: ^^ (#) would be less tricky ] loc:=0; buffer[limit]:=" "; while (loc; if changing then begin @; if not changing then begin changed_module[module_count]:=true; goto restart; end; @y this is identical to the corresponding code in TANGLE begin restart: if changing then @; if not changing then begin @; if changing then goto restart; @z %--------------------------------------- @x [9] m.84 l.1453 - fix bug in changed_module reckoning if limit>1 then {check if the change has ended} @y if limit>1 then {check if the change has ended} [ dek: ^ Here I think it should be 0 ] begin if change_pending then begin if_start_of_module_then_change_pending:=false; if change_pending then begin changed_module[module_count]:=true; change_pending:=false; end; end; [ dek: And I added buffer[limit]:=" "; here @z %--------------------------------------- @x [9] m.84 l.1464 - fix bug in changed_module reckoning end; @y end; end; @z %--------------------------------------- [ dek: Suppose change-pending and limit = 1, then the change won't have ended. If the line comes in as `.' say and the _next_ line begins `@*' then we won't have marked the current module as changed. 
In your test the line comes in as `@' and you seem to be lucky because you increase module_count ] @x [11] m.110 l.1934 - fix bug in changed_module reckoning changed_module[module_count]:=false; {it will become |true| if any line changes} @y changed_module[module_count]:=changing; {it will become |true| if any line changes} @z %--------------------------------------- ================ cut here ==================================== Regards Peter <<< end WEAVE -- Peter Breitenlohner ************************************************************************ >>> WEAVE -- Brian {Hamilton Kelly} Date: Mon, 18 NOV 91 14:44:15 GMT Originally-from: TEX "Brian {Hamilton Kelly} " From: TEX@rmcs.cranfield.ac.uk Subject: RE: Bugs or mis-features in WEAVE barbara, I was more than somewhat disappointed by DEK's response to my message about WEAVE, which you (re-)relayed to him on 8 May 91 (your message-id <673745280.0.BNB@MATH.AMS.COM>). (I'll adopt below your usual artifice of interpolating DEK's pencilled notes.) In a message of 29-SEP-1989 13:07 BST, I wrote: > It looks as though fixing the long line bug has introduced one!!! > > I think I've engineered a fix: can you feed it through to DEK please? > > @x [Section 122 |flush_buffer|] > write_ln(tex_file); incr(out_line); > if b @y > write_ln(tex_file); incr(out_line); > per_cent:=false; > if b for k:=1 to j do > if (out_buf[k] = "%") and ((k=1) or (out_buf[k-1] <> "\")) > then per_cent:=true; > if per_cent then {Ensure continuation line starts in comment mode} > begin > out_buf[b]:="%"; > decr(b) > end; > if b @z [dek: Sorry, I do not have time to update WEAVE and TANGLE... (I prefer to shorten the long lines and get back to Vol. 4) but if Brian feels so strongly about that he *has* to see this in, he should send me *debugged* code!... The above looks flaky (has undeclared variable [anyway I'd probably use a goto!] and missing begin end at least.] OK, after more experience of actually writing in WEB ab initio (vs. 
just writing change files) I'm prepared to accept that the other `misfeature' reported in the same message isn't such a bad idea after all [viz. actually listing multiple references to the same WEB section when a section is invoked more than once in-line]. However, I do think that the above problem really is a bug, and one that's likely to cause all sorts of untold grief to someone who only ever uses WEAVE to pretty-print someone else's changes; especially if the author hasn't him/herself done this (cf. TeX V2.991 when DEK had index entry macros used without accounting for \_ requirements) I'll admit that I've never compiled the code above per se, because the [ dek: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AHA, just as I guessed. ] correction to the bug for the VMS implementation is subtly different, since that implementation uses a separate auxiliary buffer for outputs to the .tex file. But the same phenomenon (of Weave inserting an infelicitous line-wrap) was observed on two separate implementations of Weave for the IBM-PC: my colleague Niel Kempson fed through the above suggested fix to each of the authors, and it was included, compiled quite happily, and fixed the bug in each version without any further problem. [ dek: And this should have been stated so that I wouldn't have started with the wrong impression. ] Therefore, I cannot see why DEK says that there's an undeclared [ dek: because I don't have the foggiest idea what WEAVE does anymore ] variable: I've just gone through the above suggested correction with a fine-toothed comb, and every variable used therein is not only declared, but already in *use* in \S122 of weave.web. (OK, |per_cent| is a parameter, but it's a call-by-value one, and therefore can legitimately be re-used, after it's done it's initial job] as a local variable for [ dek: ^^^ its ] after all, it already has a most apposite name). 
Moreover, at the risk of some comment about grandmothers and sucking eggs, why should it require a begin...end pair?... [ dek: Well if he hadn't said ``I think i've engineered a fix'' I would have had more confidence. But this going back to maintain old programs always takes several days off my life... I have come to really hate doing it, please excuse me but it is the truth. Talk about stress! ] Forgetting about the comment, and rearranging... > if b for k:=1 to j do > if (out_buf[k] = "%") and ((k=1) or (out_buf[k-1] <> "\")) then > per_cent:=true; can be parsed into: > if b for k:=1 to j do > if then ; thence to: > if b for k:=1 to j do > ; and > if b for := to do > ; then > if b ; and finally > if then > ; So what need for a compound statment? If Weave et al made a habit of always surrounding the single statment controlled by a conditional or [ dek: Not sure what I meant but I guess I may have been thinking about efficiency -- no need to test per_cent if b>=out_ptr in that code ] iterative statement into a compound statement (as is done by many writers in C, to avoid potential problems when multiple statements are later inserted in place of the original single one), then I'd be prepared to agree that this too should become a compound statment. But there are many places in Weave, TeX and elsewhere where an if or for controls but a single unardorned statement. Of course, had this been Modula-2 (a [ dek: ^ unadorned ] language that I cannot commend too highly) there'd be no possibility of confusion about the statements controlled by any construct, although no BEGIN would ever be required, and I don't agree with Wirth's overloading of END. [ dek: Anyway I looked at it again and found a real bug in Brian's ``engineering'': Sometimes flush_buffer is called with b=out_ptr when you want to propagate a %. (You see why I can't satisfy everybody's whims without spending a lot of time that I haven't got.) 
The next time anybody asks for anything I just won't even try; I don't like to disappoint people, but a h*** of a lot more people will be disappointed if I don't finish Volume 4. [But after wasting this much time I _did_ put in a patch that should satisfy Brian. Hoping that he will be so happy he will help me with Vol. 4 by convincing all his friends never to pester me again.] Barbara please excuse me for getting so upset. Some day I will no doubt be happy again! ] > Again, could you please pass this to DEK? Best regards, Brian {Hamilton Kelly} +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + JANET: tex@uk.ac.cranfield.rmcs + + BITNET: tex%uk.ac.cranfield.rmcs@ac.uk + + INTERNET: tex%uk.ac.cranfield.rmcs@nsfnet-relay.ac.uk + + Smail: School of Electrical Engineering & Science, Royal Military + + College of Science, Shrivenham, SWINDON SN6 8LA, U.K. + + Phone: Swindon (0793) 785252 (UK), +44-793-785252 (International) + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ------- (reply, 19 Nov 91) brian, you would like me to send it around one more time? ------- Date: Wed, 20 NOV 91 10:41:14 GMT Originally-from: TEX "Brian {Hamilton Kelly} " From: TEX@rmcs.cranfield.ac.uk Subject: RE: Bugs or mis-features in WEAVE Dear barbara, In your message <690606220.0.BNB@MATH.AMS.COM> of Tue 19 Nov 91 22:03:40-EST, you wrote: > you would like me to send it around one more time? Yes please; Chris Thompson has since come back to me (I circulated my message for you to him before sending it for you to forward to DEK) and suggested that perhaps trailing comments (apart from the percent itself) should be eliminated from the woven TeX before even reaching the output routine, so that this problem wouldn't arise. 
I'm not so sure that I agree, although the problem HAS only raised its ugly head because of TeX comments in the Web that were only there to explain why the Web was written that way, not why the program was so written (or in limbo material providing a revision history), so perhaps nothing's lost by discarding them. >>> WEAVE -- Brian {Hamilton Kelly} [ dek: Please tell Peter B and Brian H~K that the new WEAVE.WEB is now available for alphatesting on labrea ~ftp/alpha directory I hope they can check it before February thanks ] ************************************************************************ >>> Barry Smith, |history| variable in TeX.WEB Date: Mon 6 Jan 92 21:36:47-EST From: bbeeton To: winkler@cs.stanford.edu Subject: another tex report for don this report just arrived. --------------- Date: Mon, 6 Jan 92 17:15 PST From: barry@reed.edu (Barry Smith) To: BNB@MATH.AMS.COM Subject: Re: flash!!! notice that tex update is about to happen hmmm, mail here was down over the weekend, so this may be too late, but... I noticed (while working on Lighting Textures) that in section 1335 (final_cleanup), the |history| variable is not updated when either of these error messages is produced. Only a problem when someone (something?) external to TeX is looking at history, of course. (might even qualify as a high-value bug?) Barry Smith, Blue Sky Research barry@reed.edu ------- [ dek: I don't think I'll make any change here, since this sort of ``warning'' is not the kind for which there is info in the transcript file ... so it would look funny if TeX then said ``(see the transcript file for info)''. Of course a system dependent error level can be signalled here if somebody wants. 
] <<< end Barry Smith, |history| variable in TeX.WEB ************************************************************************ >>> Chris Rowley, two items: \fontdimen2 and marks Date: Tue 7 Jan 92 11:02:12-EST From: bbeeton To: winkler@cs.stanford.edu Subject: another tex comment for don little comments keep dribbling in ... --------------- Date: Mon, 6 JAN 92 23:43:37 GMT From: CA_ROWLEY@vax.acs.open.ac.uk To: BNB <@nsfnet-relay.ac.uk:BNB@MATH.AMS.com> Subject: RE: flash!!! notice that tex update is about to happen b Just one thing on the TeXbook: I mentioned this to you but have not sent it to Don. In the table on page 447, it states that \sigma_2 is used in Rule 17. Now, \sigma_2 is (page 441) \fontdimen2 of a font in family 2, but Rule 17 uses \fontdimen2 of the font from which the character comes (page 445). This may seem a bit pedantic but it has misled more than one person in an area where a lot of misunderstandings are found (not caused by what is in the TeXbook)! [ dek: $2.56 Well it might be \sigma_2 or \xi_2 but I agree that the present chart is misleading. (TFM designers do need to make \sigma_2, \sigma_5, \xi_2, \xi_5 sensible for characters in those fonts used as Ord.) ] \chi ------- Date: Fri 10 Jan 92 22:41:54-EST From: bbeeton To: winkler@cs.stanford.edu Subject: message for don re tex update i just received this today. i'm pretty sure this will be the last, as i'm leaving sunday morning to go to los angeles for two weeks. 
--------------- Date: Fri, 10 JAN 92 18:53:27 GMT From: CA_ROWLEY@vax.acs.open.ac.uk To: bnb <@nsfnet-relay.ac.uk:bnb@math.ams.com> Subject: for Don, copy sent to CET1 Don and Barbara As Joachim Schrod discovered (in implementing change-bar macros), when the output routine is called with no intention to shipout anything, then not only is it necessary to ensure that the inserts are left as before (using \holdinginserts positive) but it is also sensible to leave the values of the three marks as they were, especially if they are null. Frank and I have been discussing this and we now think that what is needed for marks is a little more complex than for inserts, as follows. Even when \holdinginserts is positive, it is useful to have the three marks set as normal: this is so that the output routine can use the information in them but, in the case that \holdinginserts is positive, after the end of the output routine they should be restored to the values they had before the output routine was called. This needs to be done at the Pascal level because it is not possible, at the TeX level, to restore the pointer associated with a mark to [ dek: You can simulate the behavior without great difficulty; no need to `put back' a value into the real mark registers. Solutions to that are well known and the user can be disciplined in marks used. ] have the value "null"; and it is also difficult to detect, in a robust way, whether one of these pointers does have the value "null" or to distinguish this value from a pointer to an empty string. Therefore, at the Pascal level, the values of "top_mark", "bot_mark" and "first_mark" should be saved before these pointers are updated (in the procedure "fire_up"), and then the values of all three to be restored to these saved values when the user's output routine finishes. This will also require appropriate changes to the parts of the code which call "delete_token_ref" and "add_token_ref". 
[ dek: Somehow my message about TeX being frozen is not getting through. If you feel extremely strong about this, please (a) send me a carefully written detailed report about why this is an important problem -- say 5-10 pages long and easy for me to understand -- and include rigorous proofs that TeX as it stands cannot possibly deal with a certain important application -- and include rigorous justification for assertion that an incompatible change such as you propose will not affect any present users adversely (b) send me complete list of necessary changes to the Pascal code after having tested them thoroughly, including extensive documentation about the tests you have made. Vague paragraphs about `save a pointer and make appropriate changes' are inadmissible. (c) Wait until I have time to look at TeX again. ] Best wishes Chris Frank ------- <<< end Chris Rowley, two items: \fontdimen2 and marks ************************************************************************ dek: Barbara, while I've got all this whitespace I have room to tell you that I have made several changes to Computer Modern fonts: (1) Arrowheads made thicker and wider (2) Calligraphic F, H, I, and T revised. I have seen too many cases of `arrowhead burnoff' after weak laserprinter and Xeroxing, hence (1). I have begun to read lots of math papers that use the \cal letters and I've had very negative personal reaction to H and T, slightly to I, and very very slightly to F. Thus I have to try to change them so as to minimize the number of future times my mind is taken from math to fonts ... I realize it's a hassle changing fonts, so the upgrade will take awhile before it spreads around (although with Rokicki's DVIPS it's very simple, I just update the CM sources and delete the affected fonts, then DVIPS will remake them automatically as needed). The affected fonts are cmsy* (and cmbsy10) mainly, but the harpoons change in math italic and the arrows change a bit in cmtt. 
==> NO TFM FILES ARE AFFECTED <== so there is no change to TeX's line breaks. I will never change the TFMs again (unless I discover a major glitch). At the same time I'm installing a new cmbase with ideas by John Hobby; it makes digitization better (_much_ better at low res and some better even at 300dpi). The new CM sources are available for alphatesting on labrea ~ftp/alpha . [ enclosed: proofs of arrows and modified \cal letters as noted above ] ######################################################################## %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Character code reference %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % Upper case letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ % Lower case letters: abcdefghijklmnopqrstuvwxyz % Digits: 0123456789 % Square, curly, angle braces, parentheses: [] {} <> () % Backslash, slash, vertical bar: \ / | % Punctuation: . ? ! , : ; % Underscore, hyphen, equals sign: _ - = % Quotes--right left double: ' ` " %"at", "number" "dollar", "percent", "and": @ # $ % & % "hat", "star", "plus", "tilde": ^ * + ~ % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [ end of message 040 ] From BNB@MATH.AMS.COM Mon Mar 16 15:21:34 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA13107; Mon, 16 Mar 92 15:21:30 MST Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GHPV2683SMBESYNM@MATH.AMS.COM>; Mon, 16 Mar 1992 16:09 EST Date: Mon, 16 Mar 1992 16:08:53 -0500 (EST) From: bbeeton Subject: [Phyllis Winkler : note from Don Knuth] In-Reply-To: <9203162052.tex-implementors To: tex-implementors@MATH.AMS.COM Message-Id: <700780133.900000.BNB@MATH.AMS.COM> Mail-System-Version: i have just received this message, and am passing it on without having even tried to check to see whether it is true that the files at labrea have been updated. first chance i get, i will do a little more analysis and report. by the way, the relay trashed all copies of the recent message 36 to the u.k. 
i will be re-sending that, but not today (if i want independent transportation, i have to bail my car out of the mechanic's clutches before 5 p.m.).
-- bb
---------------
Date: Mon, 16 Mar 92 12:52:44 -0800
Subject: note from Don Knuth

Today being a special day numerically (3:16), I am at last releasing the new versions of TeX and METAFONT and Computer Modern. All the sources at labrea are now up to date, and there is a new file DIFFS.Mar92 (about 450K bytes) that tells everything that has changed.

One file of special interest to you is ~ftp/pub/tex/dist/errata/errata.seven. It's the list of all changes to Vols A-D made during the year; this should be published as a supplement to TUGboat as usual. (11 pages, plus more if you add excerpts from the .bug files).

The changes to Computer Modern should be widely announced. None of the tfm files is affected, but I recommend that all font files be regenerated! John Hobby's idea for better rounding is helpful at all but the highest resolutions; and at high resolution, all the symbol and math italic fonts need to be remade anyway because of important revisions to those characters. Namely, after reading lots of papers typeset with TeX, I decided to change the shapes of several of the calligraphic caps; also I made all the arrows heavier so that they won't disappear on xerox copies; also I completely changed the lowercase delta. (I have no idea where I got the model for the previous delta, but when I found myself unable to use the letter delta in a math paper because I knew I wouldn't like to see it in my own font I knew that I had to do something! And I actually am quite pleased with the new one, I think I'll use delta a lot in future...)

All the changes to Computer Modern appear at the end of ~ftp/pub/tex/dist/errata/cm85.bug. They should probably be put into hardcopy with the errata list for TUGgees.

Thanks again for everything.
As usual, but with ever more reason for optimism, I believe we have now seen all the important bugs in TeX and its friends, and at most one or two minor twiddles will be necessary in the future. Happy St Patrick's Day tomorrow! I go to Dublin for two weeks in April. From schoepf@sc.ZIB-Berlin.DE Tue Mar 17 13:35:39 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA21428; Tue, 17 Mar 92 13:35:37 MST Received: from serv02.ZIB-Berlin.DE by MATH.AMS.COM (PMDF #12735) id <01GHR819WSVKBJSA1Z@MATH.AMS.COM>; Tue, 17 Mar 1992 15:30 EST Received: from dagobert.ZIB-Berlin.DE by serv02.ZIB-Berlin.DE (4.0/SMI-4.0-serv02/7.11.91 ) id AA04026; Tue, 17 Mar 92 21:28:47 +0100 Received: from quattro.ZIB-Berlin.DE by dagobert.ZIB-Berlin.DE (4.1/SMI-4.0/31.1.91) id AA06250; Tue, 17 Mar 92 21:28:42 +0100 Received: by quattro.ZIB-Berlin.DE (4.1/SMI-4.1) id AA17750; Tue, 17 Mar 92 21:27:04 +0100 Date: Tue, 17 Mar 92 21:28:42 +0100 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Subject: [Phyllis Winkler : note from Don Knuth] In-Reply-To: bbeeton's message of Mon, 16 Mar 1992 16:08:53 -0500 (EST) <700780133.900000.BNB@MATH.AMS.COM> To: BNB@MATH.AMS.COM Cc: tex-implementors@MATH.AMS.COM, tex-archive@science.utah.edu Reply-To: Schoepf@sc.ZIB-Berlin.DE Message-Id: <9203172028.AA06250@sc.zib-berlin.dbp.de> Organization: Konrad-Zuse-Zentrum fuer Informationstechnik Berlin Today I have transported everything (at least I hope that) that's new >From labrea to rusinfo. Tonight the automatic update check will write the list of new files to the corresponding CHANGES file (in soft/CHANGES) so that everybody can easily see what was changed/updated. 
Rainer Schoepf Konrad-Zuse-Zentrum ,,Ich mag es nicht, wenn fuer Informationstechnik Berlin sich die Dinge so frueh Heilbronner Strasse 10 am Morgen schon so D-1000 Berlin 31 dynamisch entwickeln!'' Federal Republic of Germany or From CET1@phx.cam.ac.uk Sat Mar 21 19:33:45 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA04205; Sat, 21 Mar 92 19:33:43 MST Received: from snow.csi.cam.ac.uk by MATH.AMS.COM (PMDF #12735) id <01GHX5KNYEM8BJSDKV@MATH.AMS.COM>; Sat, 21 Mar 1992 21:25 EST Received: from phx.cam.ac.uk by snow.csi.cam.ac.uk via JANET (TEST) with NIFTP (PP) id <07614-0@snow.csi.cam.ac.uk>; Sun, 22 Mar 1992 02:24:18 +0000 Date: Sun, 22 Mar 92 02:23:46 GMT From: CET1@phx.cam.ac.uk Subject: A feature(?) of TeX 3.141 discovered by Wayne Sullivan To: tex-implementors@MATH.AMS.COM Cc: Wayne Sullivan , Brian {Hamilton Kelly} , Barbara Beeton Message-Id: Wayne Sullivan has discovered a situation in which TeX 3.141 gives different character spacing from TeX 3.14. I think it is probably a `feature', but maybe it is a `bug': read the following and judge for yourselves. The change was in fact already present in the alpha test versions 3.14a and 3.14b, but Brian and I failed to discover it. Here is Wayne's original report to me > I got the WEB file for 3.141 and adapted my MS-DOS CHG file: The > resulting EXE's give no problem with the TRIP test, but as a extra > check, I ran off my test version of TEXMAN. The DVI file produced by > my 3.141 that of an '89 version differs at only one point: the index > entry for `box memory,'. The new version has a kern between the `y' > and ',' as there should be for cmr8. However, Knuth has made the comma > active for the index so that the usual inner loop does not insert > kerns (there are several instances of `y,'). The new DVI file only > inserts one kern. 
Is there something in the new mechanism that could
> cause the tokens to be rescanned in this single case so that a kern is
> inserted? The full entry in the index is
>
> box memory, 300, 394.
>
> Of course it could be a bug in my new version, but it is hard to see
> how it could cause something like this to happen.

and later he provided the following stripped down test case

> Yet a simpler file displaying the problem. With 3.141 one `y,' has no
> kern (display,) while the other does (memory,). 3.14 has no kern for
> both cases.
>
> \let\comma=,%
> \let\period=.%
> \raggedright \tolerance=5000 \hbadness=5000 \parfillskip 0pt plus 3em
> \hsize=17.5pc
> \catcode`\,=\active \def,{\tenrm\comma}
> \catcode`\.=\active \def.{\tenrm\period\par\hangindent 2em }
> \parindent0pt\relax
> box display, 66, 75, 79, 158--159, 302, 455.
> box memory, 300, 394.
> \bye

The fact that the comma is active is significant only in that it generates a "\tenrm" between the "y" and the ",". This causes TeX (in |main_control|) to think they are in separate words and so the usual implicit kern doesn't get generated. There is no change here in TeX 3.141.

In TeX 3.14 and earlier, if automatic hyphenation was later applied to the horizontal list, no change was made here. In TeX 3.141, however, automatic hyphenation reassesses the relationship between the last letter of the word and subsequent punctuation, and reinserts the implicit kern. (TeX 3.141 has changes in this area to fix bugs associated with right boundary characters, discovered by Brian HK.)

The effect can be demonstrated by a simple test:

% TeX 3.141 (3.14a,3.14b) test following Wayne Sullivan.
\ Display\tenrm,\par
\ Display{}.\par
\pretolerance=-1 % now force a hyphenation pass
\ Display\tenrm,\par
\ Display{}.\par
\showboxbreadth=255 \showboxdepth=255 \showlists
\bye

In TeX 3.14 the kerns never reappear, in TeX 3.141 they reappear just in the last two cases.
Note: I am relying on Wayne and Brian in saying that the effect happens in TeX 3.141; at the moment I only have 3.14a and 3.14b available. But I don't believe there is any significant change between 3.14b and 3.141.

Of course, we are used to the fact that TeX's automatic hyphenation recalculates ligatures and implicit kerns within a word. We know that it is not sufficient to write `shelf{}ful', but that we must use `shelf\/ful' (say) instead. The change in TeX 3.141 extends this state of affairs slightly.

So: `feature' or `bug' ?

Chris Thompson
Cambridge University Computing Service
JANET: cet1@uk.ac.cam.phx
Internet: cet1@phx.cam.ac.uk

From CET1@phx.cam.ac.uk Wed Mar 25 15:25:02 1992
Flags: 000000000001
Return-Path:
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA05469; Wed, 25 Mar 92 15:25:00 MST
Received: from snow.csi.cam.ac.uk by MATH.AMS.COM (PMDF #12735) id <01GI2HPOB6N4BJSW2F@MATH.AMS.COM>; Wed, 25 Mar 1992 17:07 EST
Received: from phx.cam.ac.uk by snow.csi.cam.ac.uk via JANET (TEST) with NIFTP (PP) id <28155-0@snow.csi.cam.ac.uk>; Wed, 25 Mar 1992 22:05:25 +0000
Date: Wed, 25 Mar 92 22:05:13 GMT
From: Chris Thompson
Subject: WARNING: tex/lib/plain.tex at labrea is broken (\c)
To: tex-implementors@MATH.AMS.COM
Cc: reh10@phx.cam.ac.uk
Message-Id:

I just tried to put the new plain.tex (\fmtversion 3.14) into service here. Robert Hunt has rapidly discovered that the changed definition of \c is broken: it refers to "\unbox", an undefined control sequence, where "\unhbox" is almost certainly meant.

The missing "h" appears in all attempts to get the file from labrea.stanford.edu, so I am fairly sure that it is broken there.
Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk From schoepf@sc.ZIB-Berlin.DE Thu Mar 26 03:09:49 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA11935; Thu, 26 Mar 92 03:09:46 MST Received: from serv02.ZIB-Berlin.DE by MATH.AMS.COM (PMDF #12735) id <01GI360DI5JKBJSKNV@MATH.AMS.COM>; Thu, 26 Mar 1992 04:42 EST Received: from dagobert.ZIB-Berlin.DE by serv02.ZIB-Berlin.DE (4.0/SMI-4.0-serv02/7.11.91 ) id AA00567; Thu, 26 Mar 92 10:39:07 +0100 Received: from quattro.ZIB-Berlin.DE by dagobert.ZIB-Berlin.DE (4.1/SMI-4.0/31.1.91) id AA00261; Thu, 26 Mar 92 10:38:47 +0100 Received: by quattro.ZIB-Berlin.DE (4.1/SMI-4.1) id AA00158; Thu, 26 Mar 92 10:37:00 +0100 Date: Thu, 26 Mar 92 10:38:47 +0100 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Subject: WARNING: tex/lib/plain.tex at labrea is broken (\c) In-Reply-To: Chris Thompson's message of Wed, 25 Mar 92 22:05:13 GMT To: CET1@phx.cam.ac.uk Cc: tex-implementors@MATH.AMS.COM, reh10@phx.cam.ac.uk Reply-To: Schoepf@sc.ZIB-Berlin.DE Message-Id: <9203260938.AA00261@sc.zib-berlin.dbp.de> Organization: Konrad-Zuse-Zentrum fuer Informationstechnik Berlin From: Chris Thompson I just tried to put the new plain.tex (\fmtversion 3.14) into service here. Robert Hunt has rapidly discovered that the changed definition of \c is broken: it refers to "\unbox", an undefined control sequence, where "\unhbox" is almost certainly meant. The missing "h" appears in all attempts to get the file from labrea.stanford.edu, so I am fairly sure that it is broken there. Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk Yes, and unfortunately I propagated this to lplain.tex and splain.tex of March 18. I will put up a corrected version next week. Fortunately, \c c works, although \C c doesn 't (at least in cmrXX). 
Rainer

From jmr@nada.kth.se Thu Mar 26 07:54:20 1992
Flags: 000000000001
Return-Path:
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA14155; Thu, 26 Mar 92 07:54:18 MST
Received: from nada.kth.se (cyklop.nada.kth.se) by MATH.AMS.COM (PMDF #12735) id <01GI3G6GL5NKBJT03G@MATH.AMS.COM>; Thu, 26 Mar 1992 09:33 EST
Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA11000; Thu, 26 Mar 92 15:31:00 +0100
Date: Thu, 26 Mar 92 15:31:00 +0100
From: Jan Michael Rynning
Subject: gcc 2.0 found bibtex bug
To: tex-implementors@MATH.AMS.COM
Message-Id:

I just recompiled bibtex (bibtex.web translated to bibtex.c by the web2c package) with gcc 2.0 and received the message:

bibtex.c:5059: warning:`and' of mutually exclusive equal-tests is always zero

The web code which leads to this message:

@=
begin
while ((ex_buf_xptr < ex_buf_ptr) and
       (lex_class[ex_buf[ex_buf_ptr]] = white_space) and
       (lex_class[ex_buf[ex_buf_ptr]] = sep_char)) do
    incr(ex_buf_xptr);                  {this removes leading stuff}

I guess the second `and' should be changed to an `or':

@=
begin
while ((ex_buf_xptr < ex_buf_ptr) and
       ((lex_class[ex_buf[ex_buf_ptr]] = white_space) or
        (lex_class[ex_buf[ex_buf_ptr]] = sep_char))) do
    incr(ex_buf_xptr);                  {this removes leading stuff}

From BNB@MATH.AMS.COM Thu Mar 26 13:11:50 1992
Flags: 000000000001
Return-Path:
Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA25580; Thu, 26 Mar 92 13:11:47 MST
Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GI3OKGLIO6BIJVLA@MATH.AMS.COM>; Thu, 26 Mar 1992 13:34 EST
Date: Thu, 26 Mar 1992 13:33:52 -0500 (EST)
From: bbeeton
Subject: plain.tex at labrea has been fixed
To: tex-implementors@MATH.AMS.COM
Message-Id: <701634832.490000.BNB@MATH.AMS.COM>
Mail-System-Version:

i've just received this message from don knuth. please re-update any copies of plain.tex that you may have retrieved since the 3.141 update. whew!
-- bb
--------------------
Date: Thu, 26 Mar 92 10:12:30 -0800
Subject: all fixed

oops, very sorry about that. shows how much I use cedillas compared to the people in Koblenz...

OK, the files on labrea (~ftp/pub/tex/lib/plain.tex and ~ftp/pub/tex/DIFFS.Mar92) are now correct. And I understand how the error crept in. The correct plain.tex has fmtversion 3.141, the bad one has fmtversion 3.14.

[...]

Check on the way for Haubensak!

From yannis@gat.citilille.fr Wed Apr 8 17:36:35 1992
Flags: 000000000001
Return-Path:
Received: from VAX01.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA08337; Wed, 8 Apr 92 17:36:32 MDT
Received: from lilserv.citilille.fr by MATH.AMS.COM (PMDF #12735) id <01GIM6VD1PYOC516JU@MATH.AMS.COM>; Wed, 8 Apr 1992 19:32 EST
Received: from gat.citilille.fr by lilserv.citilille.fr Thu, 9 Apr 1992 01:27:52 +0200
Received: by gat.citilille.fr Thu, 9 Apr 92 01:31:59 +0200
Date: Thu, 9 Apr 92 01:31:59 +0200
From: yannis@gat.citilille.fr (yannis)
Subject: VULCANO
To: tex-implementors@MATH.AMS.COM
Message-Id: <01GIM6VD1PYOC516JU@MATH.AMS.COM>

************************************************************************
V U L C A N O
************************************************************************

It is with great pleasure that I announce the availability of a new TeX-related utility: VULCANO. This program solves once and for all the problems of PostScript font encodings by using virtual fonts. Out of an AFM file, a virtual font is created, containing all visible and composite PostScript characters and all kerning pairs of the original font. Furthermore this virtual font can have any encoding, including ligatures, smart ligatures, right or left boundaries and so forth. The encoding is defined in a very simple form in an external configuration file. The one for DC is already included in the distribution.
Before explaining what VULCANO does in more detail, here is where you can get it: by anonymous ftp at spi.ens.fr (IP 129.199.104.3) and the path is /incoming/vulcano for the moment, and probably /pub/tex/vulcano later on. The source is written in Pascal (Macintosh dialect) and executables for Mac are included. Implementations for PC and UNIX are expected very soon (any volunteers?).

************************************************************************

Now in more detail:

First of all VULCANO creates a PL file with all metrics of the PostScript font, but without ligatures, kernings, etc. This file has the same encoding as the PostScript font and serves as a kind of bridge between the virtual font and the "real" PostScript font (which is in the printer).

VULCANO also reads the configuration file which contains the encoding we want our font to have [DC of course, but also any other encoding], including ligatures. These ligatures can be the ordinary ones we are used to from old times (f+i=fi etc) or the new TeX3 smart ligatures, left- and right-boundary ones. The configuration file is written once and for all, so the average user doesn't have to care about it. In any case, even if one has to write a new one the syntax is trivial, so it shouldn't take more than 10 minutes to do it.

Then VULCANO reads the kerning pairs of the PostScript font. These pairs are re-encoded to our new encoding. [Note that you can assign more than one position to each PostScript character; VULCANO will create the necessary kerning pairs: for example if you make a font with 127 "T"s and 127 "A"s, and of course ask for T-A and A-T kernings, VULCANO will make 32258 kerning pairs for you... :-) ]

But the most interesting feature is the following: as we all know, PostScript fonts are bigger than they seem to be. A big part of the font is sometimes "hidden". That means that the character description is there, but there is no position in the font table for the character.
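[Editorial aside: the figure of 32258 in the kerning note above can be checked with a small sketch. This is only an illustration of the counting argument, not VULCANO's actual code (which is Pascal and not shown in the message); the function name `reencode_kerns`, the slot numbering and the kern amount of -80 are all hypothetical.]

```python
from itertools import product

def reencode_kerns(kerns, slots):
    """Expand name-based kerning pairs into slot-based pairs.

    kerns: dict mapping (left_name, right_name) -> kern amount
    slots: dict mapping glyph name -> list of code positions (hypothetical)
    """
    out = {}
    for (left, right), amount in kerns.items():
        # Every position of the left glyph pairs with every position
        # of the right glyph, so the count multiplies.
        for l, r in product(slots.get(left, []), slots.get(right, [])):
            out[(l, r)] = amount
    return out

# 127 copies each of "T" and "A"; the slot numbers are made up.
slots = {"T": list(range(0, 127)), "A": list(range(127, 254))}
kerns = {("T", "A"): -80, ("A", "T"): -80}  # made-up kern amounts
pairs = reencode_kerns(kerns, slots)
print(len(pairs))  # 2 * 127 * 127 = 32258
```

Each name-based pair expands to 127 x 127 = 16129 slot pairs, and there are two such pairs (T-A and A-T), which gives the 32258 quoted in the message.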
One can recognize these characters in the AFM file: they have a character position of -1. These characters become visible when the font is re-encoded. This is quite easy most of the time but when you re-encode your font and write a document, you can hardly send it to somebody else (unless you send him the font too).

On the other hand, most of these "hidden" characters are "composite". This means that they are made out of other "not-hidden" characters: in Times-Roman for example, \'a is defined as "character a" superimposed on "character acute". Does this remind you of something? You're right! It's the idea of virtual fonts. And that's what VULCANO is doing: when a "hidden" character is actually a composite character, made out of "non-hidden" components, then VULCANO creates the corresponding virtual character. In this way, the only characters we lose (because they are hidden) are thorn and eth (instead of *all* accented characters, as in the Adobe Standard Encoding).

The recipe is the following:

take a fresh original AFM file (be sure it is the original one, imitations usually have a different encoding than the font itself, and everything goes bananas...)
choose an encoding (you can choose any encoding, as long as it is DC --- no, just joking...the encoding can be arbitrary)
run VULCANO.

Suppose your AFM file is called foo.AFM. You'll get:

- a PL file "ps-foo.pl". This one should never be used by TeX. It serves as a bridge between the virtual font and the PostScript interpreter. It is also used by VPtoVF to create the VF file. So you must run PLtoTF to get ps-foo.tfm. Put it in a dry place, where VPtoVF can find it.
- a VPL file "vf-foo.vpl". That's the whole deal. It contains all information one can possibly get out of an AFM file. All PostScript character names are written as COMMENTs, so you can actually *read* this file (there is no mystery anymore, like in those old cmr.pl, remember?) Run VPtoVF on this file.
You'll get a tfm file "vf-foo.tfm", which you'll use in your TeX documents from now on, and a vf file "vf-foo.vf", which you have to put in the special virtual fonts folder. If your DVI driver doesn't recognize virtual fonts it's time to change it and get a serious one. - a LOG file "vf-foo.log". In this way you can use a PostScript font with *no* special precaution, just like any DC font. And since one can do virtual fonts over virtual fonts, all those beautiful new font standards arising these last months (Lithuanian/Latvian, Welsh) can be applied to PostScript fonts, without changing their encoding. But that's looking into the future. For the present we have at last a serious application of virtual fonts, and I hope that in the next few years we will all work with them. Slogans: use PostScript fonts as if they were DC fonts. Let TeX and DVI be independent of malignant encodings! All implementations for other devices are more than welcome! Three complete examples (AFM,VPL,PL,VF,TFM,LOG): Optima, Goudy, Souvenir-Light are already provided in the distribution [no PostScript fonts!! just AFMs and new stuff]. Later on I will add the necessary files for the usual PS printer resident fonts (Times, Courier, Helvetica, HelveticaNarrow, Bookman, NewCenturySchoolbook, Palatino, AvantGarde, ZapfDingbats [which will not be re-encoded] and ZapfChancery). Enjoy! Yannis Haralambous Internet: yannis@gat.citilille.fr Fax: (33) 20.91.05.64 From rokicki@CS.Stanford.EDU Wed Apr 8 18:03:29 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA08447; Wed, 8 Apr 92 18:03:27 MDT Received: from Xenon.Stanford.EDU by MATH.AMS.COM (PMDF #12735) id <01GIM7LR6V00C5130V@MATH.AMS.COM>; Wed, 8 Apr 1992 19:53 EST Received: by Xenon.Stanford.EDU (4.1/25-XENON-eef) id AA02357; Wed, 8 Apr 92 16:53:59 PDT Date: Wed, 8 Apr 92 16:53:59 PDT From: "Tomas G. 
Rokicki" Subject: Re: VULCANO To: yannis@gat.citilille.fr Cc: tex-implementors@MATH.AMS.COM Message-Id: <9204082353.AA02357@Xenon.Stanford.EDU> This is precisely the functionality that is currently in afm2tfm, but afm2tfm is in portable C and is `standard' for using PostScript fonts. Afm2tfm also allows more functionality, such as slanting (obliquing) and extending (stretching) the font---and it also provides facilities to access the unencoded characters that are not composite. I'd like to promote one or the other exclusively, since the last thing we need is Times-Roman being used in a bunch of different incompatible ways . . . but I'm the author of afm2tfm so I feel a bit biased. Not only that, but there is apparently no agreement on a standard encoding; many people are pushing `ExtendedTeX'; others are pushing `DC'; others are pushing `ISOLatin1'. It would be great to have a standard on this too . . . [By currently in afm2tfm, I mean it is currently in the beta of the newest dvips . . .] From bed_gdg@SHSU.edu Wed Apr 8 19:46:41 1992 Flags: 000000000001 Return-Path: Received: from Niord.SHSU.edu by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA09141; Wed, 8 Apr 92 19:43:26 MDT Received: by SHSU.edu (MX V3.1A) id 9880; Wed, 08 Apr 1992 17:30:23 CDT Date: Wed, 08 Apr 1992 17:30:18 CDT From: "George D. Greenwade" To: aras@eceris.ece.ncsu.edu, info-tex@SHSU.edu, tex-archive@math.utah.edu Cc: texhax@cs.washington.edu, uktex@tex.ac.uk Message-Id: <00958CEA.775FF120.9880@SHSU.edu> Subject: RE: BoxedEPSF for OzteX 1.4 on FILESERV/Niord Soon after I posted my announcement about the BOXEDEPS package on FILESERV/Niord, I received a message from Walter Carlip which I thought to be passed along. Thanks to Walter for the updated information! --George =========================================================================== >Included in this package is a new file (BOXEDEPS.OZTEX) for OzTeX 1.4. 
>OzTeX 1.4 interprets the Mac subdirectory delimiter ":" as a separator >between keywords in the \special command. This breaks the use of >subdirectories for included graphics files. FYI: This problem has been fixed in OzTeX 1.41 which also has a revised version of boxedeps. These are ftpable (is that a word?) from midway.uchicago.edu. --Walter _____________________________________________________________________________ Walter Carlip **** carlip@ace.cs.ohiou.edu **** (the "3" is invisible) _____________________________________________________________________________ From gtoal@castle.edinburgh.ac.uk Thu Apr 9 01:11:51 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA10882; Thu, 9 Apr 92 01:11:47 MDT Received: from sun3.nsfnet-relay.ac.uk by MATH.AMS.COM (PMDF #12735) id <01GIMMPAKYI8C50MI3@MATH.AMS.COM>; Thu, 9 Apr 1992 03:05 EST Date: Thu, 9 Apr 92 8:03:22 GMT From: gtoal@castle.edinburgh.ac.uk Subject: Cancel subscription please. To: tex-implementors@MATH.AMS.COM Message-Id: <9204090803.aa05414@castle.ed.ac.uk> Via: sun3.nsfnet-relay.ac.uk; Thu, 9 Apr 1992 07:58:24 +0100 'fraid I don't have time to keep up with TeX things nowadays; could you remove me from the list please? Graham PS I tried tex-implementors-request first, but there's no such address? From spqr@minster.york.ac.uk Thu Apr 9 04:15:36 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA11269; Thu, 9 Apr 92 04:15:32 MDT Received: from sun3.nsfnet-relay.ac.uk by MATH.AMS.COM (PMDF #12735) id <01GIMT5F2JV4C511RC@MATH.AMS.COM>; Thu, 9 Apr 1992 06:10 EST Date: 9 Apr 1992 10:08:57 GMT From: spqr@minster.york.ac.uk Subject: Re: VULCANO To: rokicki@CS.Stanford.edu Cc: tex-implementors@MATH.AMS.COM, yannis@gat.citilille.fr Message-Id: Via: sun3.nsfnet-relay.ac.uk; Thu, 9 Apr 1992 11:04:14 +0100 References: <9204082353.AA02357@Xenon.Stanford.EDU> "Tomas G. 
Rokicki" writes: > This is precisely the functionality that is currently in afm2tfm, but > afm2tfm is in portable C and is `standard' for using PostScript fonts. > Afm2tfm also allows more functionality, such as slanting (obliquing) > and extending (stretching) the font---and it also provides facilities > to access the unencoded characters that are not composite. > .... > Not only that, but there is apparently no agreement on a standard > encoding; many people are pushing `ExtendedTeX'; others are pushing > `DC'; others are pushing `ISOLatin1'. It would be great to have a > standard on this too . . . I must say I too found Yannis' message rather confusing, as if many many people in the Unix and DOS worlds hadn't been using afm2tfm for the same purpose for several years. With all due respect to the wonderful Yannis, I think we need a review from him of how his system compares with afm2tfm. The only lack in the current `official' afm2tfm that I have met is the inability to change the encoding, but the various patches that have been done to make this work (ie I did one to make the Lucida Maths fonts work) are being subsumed in the latest version. sebastian From jmr@nada.kth.se Thu Apr 9 10:24:19 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA14810; Thu, 9 Apr 92 10:24:13 MDT Received: from nada.kth.se (cyklop.nada.kth.se) by MATH.AMS.COM (PMDF #12735) id <01GIN5QVB0S0C5191I@MATH.AMS.COM>; Thu, 9 Apr 1992 12:11 EST Received: by nada.kth.se (5.61-bind 1.4+ida/nada-mx-1.0) id AA16566; Thu, 9 Apr 92 18:10:29 +0200 Date: Thu, 9 Apr 92 18:10:29 +0200 From: Jan Michael Rynning Subject: Re: VULCANO In-Reply-To: Your message of Wed, 8 Apr 92 16:53:59 PDT To: "Tomas G. 
Rokicki" Cc: yannis@gat.citilille.fr, tex-implementors@MATH.AMS.COM Message-Id: Tom Rokicki writes: > This is precisely the functionality that is currently in afm2tfm, but > afm2tfm is in portable C and is `standard' for using PostScript fonts. > Afm2tfm also allows more functionality, such as slanting (obliquing) > and extending (stretching) the font---and it also provides facilities > to access the unencoded characters that are not composite. > > [By currently in afm2tfm, I mean it is currently in the beta of > the newest dvips . . .] Where can we find this `beta of the newest dvips'? > I'd like to promote one or the other exclusively, since the last > thing we need is Times-Roman being used in a bunch of different > incompatible ways . . . but I'm the author of afm2tfm so I feel a > bit biased. I think the 256-character restriction will force us to use Times Roman and other fonts with more than one mapping. 256 characters are not enough to cater for the letters and other symbols of all languages which can be written with Times Roman. > Not only that, but there is apparently no agreement on a standard > encoding; many people are pushing `ExtendedTeX'; others are pushing > `DC'; others are pushing `ISOLatin1'. It would be great to have a > standard on this too . . . There seems to be some confusion here. The `Extended TeX' encoding is the one agreed upon in Cork in 1990. It contains all letters present in ISO Latin 1 (all major Western European languages), ISO Latin 2 (all major Eastern European languages) and a few extra letters needed for Turkish. It does not contain some of the things which you'll find in ISO Latin 1, like 66 unused positions, $^1$, $^2$, $\pm$, $\mu$, ... Norbert Schwarz has created a set of fonts, based on Computer Modern, using this encoding. His fonts have names starting with `dc' rather than `cm'. That's why some people refer to this encoding as `DC'. 
As I said, 256 is not enough, so we'll need an afm2tfm (or some other program) which is flexible enough to handle more than one encoding for virtual fonts. We'll also need that for entirely different things like math symbols, `expert' character sets and non-Latin scripts. Since Adobe doesn't map all the characters in the fonts which have AdobeStandardEncoding, we'll also need another encoding, at the PostScript level. Almost all of these fonts seem to contain 228 characters, including the unmapped ones. Thus, you can easily map all of them. I don't know why Adobe didn't do that in the first place. You may argue that 58 of the characters are composite, and don't need to be mapped. Well, that's not true for some fonts, like Adobe Garamond and Utopia. They have individually designed accented letters, like in traditional high-quality typography. So, I think we had better map all 228, and agree on a standard for how they should be mapped. This reencoding at the PostScript level should of course be done automatically by dvips (or some other program) and not be left to the user to hack up by hand for every font. From DHOSEK@HMCVAX.CLAREMONT.EDU Fri Apr 10 02:29:21 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA00485; Fri, 10 Apr 92 02:29:17 MDT Received: from FRIGGA.CLAREMONT.EDU by MATH.AMS.COM (PMDF #12735) id <01GINYJ19XWGC51CGC@MATH.AMS.COM>; Fri, 10 Apr 1992 01:55 EST Received: from HMCVAX.CLAREMONT.EDU by HMCVAX.CLAREMONT.EDU (PMDF #11000) id <01GINS7NFYYW9JD298@HMCVAX.CLAREMONT.EDU>; Thu, 9 Apr 1992 22:54 PDT Date: Thu, 9 Apr 1992 22:54 PDT From: Don Hosek Subject: Re: VULCANO/AFM2TFM To: tex-implementors@MATH.AMS.COM Message-Id: <01GINS7NFYYW9JD298@HMCVAX.CLAREMONT.EDU> X-Vms-To: TEX_IMPLEMENTORS Issues of PostScript encoding should probably be best discussed on driv-l (@ tamvm1.tamu.edu). 
To subscribe send a message with the line SUBS DRIV-L (your full name) to listerv@tamvm1.tamu.edu There is some value in having a standard AFM2whatever utility. The best language for the "standard" program would be WEB or CWEB (I would not recommend any of the more "exotic" webs just because they aren't widely available). Also, AFM2TFM could be far simpler if it were AFM2PL; much of the porting difficulties I've seen come from reliably writing binary files. Writing ASCII PL files, however, is trivial: witness the brevity of change files for CWEB. -dh From bed_gdg@SHSU.edu Fri Apr 10 03:15:23 1992 Flags: 000000000001 Return-Path: Received: from Niord.SHSU.edu by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA00641; Fri, 10 Apr 92 03:13:21 MDT Received: by SHSU.edu (MX V3.1A) id 12206; Thu, 09 Apr 1992 19:22:08 CDT Date: Thu, 09 Apr 1992 10:08:07 CDT From: "George D. Greenwade" To: fuggle@sci.kun.nl Cc: info-tex@SHSU.edu, carlisle@cs.man.ac.uk, tex-archive@math.utah.edu Message-Id: <00958D75.DC0CBC80.12206@SHSU.edu> Subject: Re: Update to LONGTABLE.STY on FILESERV/Niord On Thu, 9 Apr 92 12:31:36 +0200, Ronald Kappert posted privately to me: > could it be that the file STY.LONGTABLE is corrupt? On ftp-ing, I get > > [OK until this part] > > % \begin{macrocode} > \newskip\LTpre \LTprF!in{macro}{\?LTchunksize} [junk lines deleted -- GDG] > [Lot of lines containing nothing] > > I tried to ftp the file multiple times, through at least two different > channels, but the file seems corrupt. Could you check? Ooooops, when I verified the file, I used the prior version. Yes, this file is corrupt (around lines 471 to 548, give or take). David is being cc'ed on this. I am requesting (1) that he re-submit the file to me, and (2) that anyone who has retrieved the file please forgive me for not following our local policy on checking files prior to placing them in the archives (i.e., rename the old version prior to testing before doing any checks on the revised submission). 
I am immediately placing a notice in the file stating that it is not available (i.e., we will not be distributing any versions of this until a correct one gets here). As soon as I have a non-corrupt file in place (yes, I will check and double check every file from now on), I will re-announce it. Sorry for any and all bandwidth wasted in the announcement, as well as retrievals, as well as time you may have used. I will make every effort to send the corrected file to those I have FILESERV records on between the time of the announcement yesterday and the present time. Regards and apologies once again, George %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% George D. Greenwade, Ph.D. Bitnet: BED_GDG@SHSU Department of Economics and Business Analysis THEnet: SHSU::BED_GDG College of Business Administration Voice: (409) 294-1266 P. O. Box 2118 FAX: (409) 294-3612 Sam Houston State University Internet: bed_gdg@SHSU.edu Huntsville, TX 77341 bed_gdg%SHSU.decnet@relay.the.net %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% From DHOSEK@HMCVAX.CLAREMONT.EDU Fri Apr 10 10:05:01 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA03818; Fri, 10 Apr 92 10:04:59 MDT Received: from FRIGGA.CLAREMONT.EDU by MATH.AMS.COM (PMDF #12735) id <01GIOJJUUJA8COZHPM@MATH.AMS.COM>; Fri, 10 Apr 1992 11:57 EST Received: from HMCVAX.CLAREMONT.EDU by HMCVAX.CLAREMONT.EDU (PMDF #11000) id <01GIOD8JR9LC9JD2RD@HMCVAX.CLAREMONT.EDU>; Fri, 10 Apr 1992 08:56 PDT Date: Fri, 10 Apr 1992 08:56 PDT From: Don Hosek Subject: Re: VULCANO/AFM2TFM To: spqr@minster.york.ac.uk, tex-implementors@MATH.AMS.COM Message-Id: <01GIOD8JR9LC9JD2RD@HMCVAX.CLAREMONT.EDU> X-Vms-To: IN%"spqr@minster.york.ac.uk" X-Vms-Cc: TEX_IMPLEMENTORS From: spqr@minster.york.ac.uk -Don Hosek writes: - > Issues of PostScript encoding should probably be best discussed - > on driv-l (@ tamvm1.tamu.edu). 
To subscribe send a message with - > the line - > SUBS DRIV-L (your full name) - > to listerv@tamvm1.tamu.edu -aaaaargh another list i cant bear itt.... - > There is some value in having a standard AFM2whatever utility. -its ironic that the current state of afm2tfm is due to Don Knuth. If -he cant be bothered to code in WEB anymore. if, of course, you accept -CWEB, then you admit that people have C compilers. in which case -afm2tfm as it stands is pretty good as a basis. - > The best language for the "standard" program would be WEB or CWEB - > (I would not recommend any of the more "exotic" webs just because - > they aren't widely available). Also, AFM2TFM could be far simpler - > if it were AFM2PL; much of the porting difficulties I've seen - > come from reliably writing binary files. Writing ASCII PL files, -but when the whole world uses Unix etc etc etc yawn A few points: First, the whole world does _not_ use Unix. At last count over 50% of TeX users used TeX on a PC. VMS has a high number of TeX users. So does the Macintosh. Second, CWEB is superior to straight C if for no other reason than change files. If straight C were so wonderful, dvips for VMS would be the current version instead of 5.4 On the other hand, TeX for VMS is at the current version within days of receipt of the new WEB. It's very useful to separate the system-dependent portions of the code if for no other reason than the primary code maintainer need not be concerned with them. 
-dh From CET1@phx.cam.ac.uk Fri Apr 10 10:28:43 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA03997; Fri, 10 Apr 92 10:27:59 MDT Received: from snow.csi.cam.ac.uk by MATH.AMS.COM (PMDF #12735) id <01GIOK7TG3FKCOZHRB@MATH.AMS.COM>; Fri, 10 Apr 1992 12:15 EST Received: from phx.cam.ac.uk by ppsw1.cam.ac.uk with NIFTP (PP-6.0) as ppsw.cam.ac.uk id <24377-0@ppsw1.cam.ac.uk>; Fri, 10 Apr 1992 17:15:07 +0100 Date: Fri, 10 Apr 92 17:14:58 BST From: Chris Thompson Subject: Consequences of the change to cmbase.mf To: tex-implementors@MATH.AMS.COM Message-Id: Don Knuth advises in his descriptions of the 16Mar92 changes to the cm fonts that all bitmaps should be regenerated. I can confirm that every cm font at every mode and magnification I have tried has some pixels changed. What might easily be forgotten (certainly I nearly did) is that there are many other fonts that use cmbase.mf, and the bitmaps of these are also affected. The list includes: AMSFonts: all extracm fonts (of course!) all Cyrillic fonts all symbol fonts the euex* fonts in the Euler set LaTeX fonts: the la(b)sy* fonts, at least As for the cm fonts, none of the TFM files change. I have met only one problem so far: wncyi5 in mode=lowres;mag=1 gives a "strange path" error in character 127 with the new cmbase.mf, but not with the old one. (The bitmaps generated for this character, after ignoring the error, are actually unaltered---and it is a pretty undecipherable splodge at this resolution anyway.) 
Chris Thompson Cambridge University Computing Service JANET: cet1@uk.ac.cam.phx Internet: cet1@phx.cam.ac.uk From postmaster@MATH.AMS.COM Sun Apr 12 22:31:45 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA18795; Sun, 12 Apr 92 22:31:43 MDT Received: from MATH.AMS.COM by MATH.AMS.COM (PMDF #12735) id <01GIS2HDRAK2COZLNX@MATH.AMS.COM>; Mon, 13 Apr 1992 00:31 EST Date: Mon, 13 Apr 1992 00:31 EST From: PMDF Mail Server Subject: Undeliverable mail: temporarily unable to deliver To: postmaster@MATH.AMS.COM, beebe@math.utah.edu Message-Id: <01GIS2HDRAK2COZLNX@MATH.AMS.COM> Your message could not be delivered to: dan-tex@stealth.acf.nyu.edu Your message has been enqueued and undeliverable for 3 days. The mail system will continue to try to deliver your message for an additional 9 days. The beginning of your message follows: Received: from math.utah.edu (csc-sun.math.utah.edu) by MATH.AMS.COM (PMDF #12735) id <01GIN98K6374C516N0@MATH.AMS.COM>; Thu, 9 Apr 1992 13:51 EST Received: from solitude.math.utah.edu by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA15784; Thu, 9 Apr 92 11:50:29 MDT Date: Thu, 9 Apr 92 11:50:29 MDT From: "Nelson H. F. Beebe" Subject: PostScript font encodings To: tex-implementors@MATH.AMS.COM Cc: beebe@math.utah.edu Message-id: X-Us-Mail: "Center for Scientific Computing, South Physics, University of Utah, Salt Lake City, UT 84112" X-Telephone: (801) 581-5254 X-Fax: (801) 581-4148 Jan Michael Rynning observes that Adobe has numerous fonts with unassigned characters, leaving it up to implementors and users to define suitable mappings; this introduces a portability problem when the mappings aren't standardly defined. He also observed that 256 is a small number when it comes to dealing with even the European languages. What he did not observe is that some Adobe PostScript fonts have MORE than 256 characters. 
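[Editorial illustration, not part of the original message.] The distinction between assigned and unassigned characters discussed in this thread can be read directly off an AFM file: each character-metrics line starts with `C <code>`, and unencoded glyphs carry the code -1. The following is a minimal Python sketch of that split, assuming the usual `C ... ; WX ... ; N ... ; B ... ;` line layout. The function name and the sample metrics are hypothetical and made up for illustration; they are not taken from any real font, from VULCANO, or from afm2tfm.

```python
# Hypothetical sample data: the widths and bounding boxes are invented,
# only the line layout follows the AFM character-metrics convention.
SAMPLE_AFM = """\
StartCharMetrics 4
C 97 ; WX 444 ; N a ; B 37 -10 442 460 ;
C 194 ; WX 333 ; N acute ; B 97 507 317 678 ;
C -1 ; WX 444 ; N aacute ; B 37 -10 442 678 ;
C -1 ; WX 500 ; N thorn ; B 5 -217 470 683 ;
EndCharMetrics
"""


def split_afm_characters(afm_text):
    """Return (encoded, unencoded) lists of glyph names.

    A glyph counts as unencoded when its character code is negative
    (conventionally -1), i.e. it has no slot in the font's own encoding.
    """
    encoded, unencoded = [], []
    for line in afm_text.splitlines():
        line = line.strip()
        # Only decimal character-metrics lines; this skips composite
        # ("CC ...") lines and header keywords such as "CapHeight".
        if not line.startswith("C "):
            continue
        fields = {}
        for part in line.split(";"):
            tokens = part.split()
            if len(tokens) >= 2:
                fields[tokens[0]] = tokens[1]
        if "N" not in fields:
            continue  # no glyph name on this line, nothing to report
        code = int(fields["C"])
        (unencoded if code < 0 else encoded).append(fields["N"])
    return encoded, unencoded


if __name__ == "__main__":
    enc, unenc = split_afm_characters(SAMPLE_AFM)
    print("encoded:  ", enc)
    print("unencoded:", unenc)
    # A single 8-bit encoding can hold at most 256 glyphs.
    print("fits in one encoding:", len(enc) + len(unenc) <= 256)
```

Counting both lists for a real AFM file would show whether all glyphs fit into one 256-slot encoding or whether several mappings are needed, which is the point raised above.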
From hisato-h@ascii.co.jp Wed Apr 29 21:55:47 1992 Flags: 000000000001 Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA23677; Wed, 29 Apr 92 21:55:46 MDT Return-Path: hisato-h@ascii.co.jp Received: from ascwide.ascii.co.jp (133.152.32.11) by MATH.AMS.COM (PMDF #12735) id <01GJFS1KS6A8D7Q24G@MATH.AMS.COM>; Wed, 29 Apr 1992 23:51 EST Received: from ascgw.ascii.co.jp by ascwide.ascii.co.jp (5.65/2.7W-ascwide/2.1) with SMTP id AA27204; Thu, 30 Apr 92 13:22:38 +0900 Received: from pbguru by ascgw.ascii.co.jp (5.65/6.4J.6) id AA19971; Thu, 30 Apr 92 12:46:30 +0900 Received: by pbguru.ascii.co.jp (5.51/6.4J.6) id AA22116; Thu, 30 Apr 92 12:49:13 JST Date: Thu, 30 Apr 92 12:49:12 JST From: Hamano Hisato Subject: Cancel subscription please. To: tex-implementors@MATH.AMS.COM Message-Id: <9204300349.AA22116@pbguru.ascii.co.jp> Could you remove me from the list please? -- Hamano Hisato Editorial Automation Publishing Division hisato-h@ascii.co.jp ASCII corporation From schoepf@sc.ZIB-Berlin.DE Tue May 19 04:56:42 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA08617; Tue, 19 May 92 04:56:38 MDT Received: from serv02.ZIB-Berlin.DE by MATH.AMS.COM (PMDF #041B2) id <01GK6PEX4W0GE2XLU2@MATH.AMS.COM>; Tue, 19 May 1992 06:37:14 EST Received: from dagobert.ZIB-Berlin.DE by serv02.ZIB-Berlin.DE (4.0/SMI-4.0-serv02/7.11.91 ) id AA18759; Tue, 19 May 92 12:24:19 +0200 Received: from quattro.ZIB-Berlin.DE by dagobert.ZIB-Berlin.DE (4.1/SMI-4.0/31.1.91) id AA26474; Tue, 19 May 92 12:24:15 +0200 Date: 19 May 1992 12:24:14 +0200 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Subject: Future developments of TeX To: tex-implementors@MATH.AMS.COM Message-Id: <9205191024.AA26474@sc.zib-berlin.dbp.de> Content-Transfer-Encoding: 7BIT What is to come after TeX? 
-------------------------- For quite a while people have been discussing the question of what is to come after TeX---if there can be something at all. To collect ideas and to guide these ideas into one effort, the German-speaking TeX users' group DANTE e.V. decided to start a project for future developments of TeX. To avoid misunderstandings right from the beginning: This is not meant to be a private enterprise or even something to be led by the Germans or even Europeans. It is meant as a cooperation of all those who are interested in such a project. Everyone is invited to contribute ideas and demands and to work in this direction. It is planned, but not fixed, to have three stages: 1. stage: What should be changed? 2. stage: How can this be realised? 3. stage: Realisation. At the annual meeting of DANTE e.V. at Hamburg some weeks ago the project was discussed and officially started by Joachim Lammarsch, the president of DANTE e.V. He asked me to serve as the technical co-ordinator of the project, to which I agreed. The name of the project is NTS, for New Typesetting System. (You might find it funny to look at the last three words of the very first line in the TeXbook by Donald E. Knuth. Thanks to Kresten Krab Thorup for pointing this out to me.) As a first action, and to open the discussion, we set up a mailing list at Heidelberg. To subscribe, send a message to LISTSERV@VM.URZ.Uni-Heidelberg.De containing SUBSCRIBE NTS-L To sign off the list, send a message to LISTSERV@VM.URZ.Uni-Heidelberg.De containing UNSUBSCRIBE NTS-L Messages to go via the list have to be addressed to NTS-L@VM.URZ.Uni-Heidelberg.De In a short time, I will send around a somewhat longer message that contains some ideas, to serve as a starting point in the discussion. 
Rainer Sch\"opf Schoepf@sc.ZIB-Berlin.de From schoepf@sc.ZIB-Berlin.DE Sun May 24 12:21:20 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA28074; Sun, 24 May 92 12:21:16 MDT Received: from serv02.ZIB-Berlin.DE by MATH.AMS.COM (PMDF #041B2) id <01GKE503WP00E2Y0WB@MATH.AMS.COM>; Sun, 24 May 1992 14:10:21 EST Received: from dagobert.ZIB-Berlin.DE by serv02.ZIB-Berlin.DE (4.0/SMI-4.0-serv02/7.11.91 ) id AA06427; Sun, 24 May 92 20:05:43 +0200 Received: from quattro.ZIB-Berlin.DE by dagobert.ZIB-Berlin.DE (4.1/SMI-4.0/31.1.91) id AA04211; Sun, 24 May 92 20:05:39 +0200 Received: by quattro.ZIB-Berlin.DE (4.1/SMI-4.1) id AA18198; Sun, 24 May 92 20:05:11 +0200 Date: 24 May 1992 20:05:39 +0200 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Subject: \input vs. \openin To: TeX-Implementors@MATH.AMS.COM Cc: Mittelbach@mzdmza.zdv.uni-mainz.de, CA_ROWLEY@VAX.ACS.OPEN.AC.UK, J.L.Braams@research.ptt.nl Reply-To: Schoepf@sc.ZIB-Berlin.DE Message-Id: <9205241805.AA04211@sc.zib-berlin.dbp.de> Organization: Konrad-Zuse-Zentrum fuer Informationstechnik Berlin Content-Transfer-Encoding: 7BIT A while ago, this list saw a discussion of whether \openin should look for the file in question only in the current directory, or use some sort of search path, like \input does. The discussion ended after Barbara Beeton sent around the following [ comment by DEK: My current opinion is: If the operating system allows users to define a ``custom'' search path at run time, then both \input and \openin should be able to use it, although I would hope that people don't use \openin for `system' files but only for files they tend to control themselves. 
If the operating system is like WAITS (on which I developed TeX), where there's no decent way to provide a clue to TeX at runtime about a nonstandard search path, then I would provide access to the main system macro files (like plain.tex and webmac.tex) only for \input not \openin; I would use the same strategy to search user's personal files for both \input and \openin. I have found it _very_ useful with UNIX to put `..' on the standard search path. Then I can create a subdirectory called say _pages_ and cd to pages, on which I can run TeX/MF with some temporary changes to input files and I won't clobber any of the master files or the parent directory. My applications of this idea would fail if \openin didn't also look at .. directory when unable to find an . directory. ] I want to inform everyone that LaTeX3 will actually *rely* on the fact that \openin searches the same directories as \input does. LaTeX3 will automatically generate for every document style a compact, less human readable version, if it doesn't exist already. Since the only way to test for its existence is to use the sequence \openin \ifeof, we face two choices: a) If \openin looks into the system area, these compact versions are written only once, into the system area, b) otherwise they have to be (and will be) written in every user's directory. The second solution is clearly unacceptable. Therefore we urge all implementors of TeX systems to follow DEK's suggestion and have \openin look through the search path as well. Thank you. For the LaTeX3 project, Rainer M. 
Sch"opf From schoepf@sc.ZIB-Berlin.DE Tue May 26 09:37:38 1992 Flags: 000000000001 Return-Path: Received: from MATH.AMS.COM by math.utah.edu (4.1/SMI-4.1-utah-csc-server) id AA14811; Tue, 26 May 92 09:37:35 MDT Received: from serv02.ZIB-Berlin.DE by MATH.AMS.COM (PMDF #041B2) id <01GKGROAIBAOE2Y38E@MATH.AMS.COM>; Tue, 26 May 1992 11:30:09 EST Received: from dagobert.ZIB-Berlin.DE by serv02.ZIB-Berlin.DE (4.0/SMI-4.0-serv02/7.11.91 ) id AA19200; Tue, 26 May 92 17:18:53 +0200 Received: from quattro.ZIB-Berlin.DE by dagobert.ZIB-Berlin.DE (4.1/SMI-4.0/31.1.91) id AA06776; Tue, 26 May 92 17:18:50 +0200 Received: by quattro.ZIB-Berlin.DE (4.1/SMI-4.1) id AA20574; Tue, 26 May 92 17:18:17 +0200 Date: 26 May 1992 17:18:50 +0200 From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf) Subject: Re: \input vs. \openin To: TeX-implementors@MATH.AMS.COM Reply-To: Schoepf@sc.ZIB-Berlin.DE Message-Id: <9205261518.AA06776@sc.zib-berlin.dbp.de> Organization: Konrad-Zuse-Zentrum fuer Informationstechnik Berlin Content-Transfer-Encoding: 7BIT It seems that I wasn't clear enough in my last message. When I wrote: a) If \openin looks into the system area, these compact versions are written only once, into the system area, I meant that these files are written during installation of the system, not by every user. My main argument was that this is useless if \openin doesn't look in the system area. Rainer Sch\"opf
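[Editorial illustration, not part of the original message.] The argument above turns on whether the existence test (\openin followed by \ifeof) sees only the current directory or the whole search path. The Python sketch below models that difference under simple assumptions: a search path is an ordered list of directories, and a file "exists" for the test if any directory on the path contains it. The function names and the file name `article.cmp` are hypothetical and are not taken from TeX, LaTeX3, or any real implementation.

```python
import os
import tempfile


def find_on_path(name, search_path):
    """Return the first existing file called `name` on `search_path`, else None."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def needs_compact_version(name, search_path):
    """Model of the openin/ifeof test: True means 'not found, so generate it'."""
    return find_on_path(name, search_path) is None


def demo():
    """Compare a current-directory-only lookup with a full search path."""
    with tempfile.TemporaryDirectory() as system_area, \
         tempfile.TemporaryDirectory() as user_dir:
        # Pretend the compact style file was written once, at installation.
        with open(os.path.join(system_area, "article.cmp"), "w") as handle:
            handle.write("% compact style file (placeholder)\n")

        # Case 1: the lookup only sees the user's directory.
        current_dir_only = needs_compact_version("article.cmp", [user_dir])
        # Case 2: the lookup also walks the system area, as \input would.
        with_system_area = needs_compact_version(
            "article.cmp", [user_dir, system_area])
    return current_dir_only, with_system_area


if __name__ == "__main__":
    print(demo())
```

In the first case every user would regenerate the file in their own directory; in the second the installed copy is found and nothing is rewritten, which is the behaviour the message asks implementors to provide.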