# Charspace

Charspace lets you add side bearings (the blank spaces on either side of a character) to a bitmap font. This is necessary because scanned images typically do not include side bearing information, and therefore Imageto (see section Imageto) cannot determine it.

The input is a bitmap (GF or PK) font, together with one or more CMI files (see section CMI files), which specify character metric information. If a corresponding TFM file exists, it is read to get default values for the character dimensions (Charspace promptly overwrites the widths). The output is a TFM file and (typically) a revised GF file with the new width information.

The basic idea for Charspace came from Harry Smith, via Walter Tracy's book Letters of Credit. See `charspace/README' for the full citation.

## Charspace usage

Charspace makes no attempt to be intelligent about the side bearings it computes; it just follows the instructions in the CMI files.

The CMI files must be created by human hands, since the information they contain usually cannot be determined automatically. See the next section for the details on what CMI files contain.

We supply one CMI file, `common.cmi' (distributed in the `data' directory), which provides more-or-less typeface-independent definitions for most common characters. Charspace reads `common.cmi' before any of the CMI files you supply, so your definitions override those in `common.cmi'.

`common.cmi' can be used for all typefaces because its definitions are entirely symbolic; therefore, your CMI file must define actual values for the identifiers it uses. For example, `common.cmi' defines the right side bearing of `K' to be `uc-min-sb`; you yourself must define `uc-min-sb`.

You must also define side bearings for characters not in `common.cmi'. And you can redefine side bearings that are in `common.cmi', if you find its definitions unsuitable.

Once you have prepared a CMI file, you can run Charspace, e.g.:

```
charspace -verbose -encoding=enc-file fontname.dpi \
-output-file=out-fontname
```

where enc-file specifies the encoding, fontname the input font, dpi the resolution, and out-fontname the name of the output font.

With these options, Charspace will write files `out-fontname.tfm' and `out-fontname.dpigf'. You can then run TeX on `testfont.tex', telling TeX to use the font out-fontname. This produces a DVI file which you can print or preview as you usually do with TeX documents.

This will probably reveal problems in your CMI file, e.g., the spacing for some characters or character combinations will be poor. So you need to iterate.

However, if you are planning to eventually run your bitmap font through Limn (see section Limn) and BZRto (see section BZRto) to make an outline font, there's little point in excessively fine-tuning the spacing of the original bitmap font. The reason is that the generated outline font will inevitably rasterize differently than the original bitmaps, and the change in character shapes will almost certainly affect the spacing.

## CMI files

Character metric information (CMI) files are free-format text files which (primarily) describe the side bearings for characters in a font. Side bearings are the blank spaces to the left and right of a character which make printed type easier to read, as well as more pleasing visually.

In addition to side bearing definitions, CMI files can also contain kerns, which insert or remove space between particular letter pairs; and font dimensions, global information about the font which is stored in the TFM file (see section TFM fontdimens).

If your font is named `foo.300gf' (or `... pk'), it is customary to name the corresponding CMI file `foo.300cmi'. That is what Charspace looks for by default. If you name it something else, you must use the `-cmi-files' option to tell Charspace its name. It is reasonable to use the resolution as part of the CMI filename, since the values written in it are (for the most part) in pixels.

See section Common file syntax, for a precise description of syntax elements common to all data files processed by these programs, including comments.

In the following sections, we describe the individual commands, the tokens that comprise them, and the way Charspace processes them.

### CMI tokens

Tokens in a CMI file are one of the following.

1. A numeric constant consists of (in order) an optional sign, zero or more digits, an optional decimal point, and zero or more digits, but at least one digit must be present. For example, `+0', `-0', `0', `.0', and `-0.0' are all valid ways to write the number zero.
2. A string constant consists of all characters between two double-quote characters `"'. We made no provision for quoting `"', because our particular uses for string constants never need quote characters.
3. A comma is a self-terminating token. It serves merely to separate two expressions.
4. An identifier is any number of characters starting with a non-whitespace character (whitespace being defined by the C facility `isspace`) not listed above, and terminated by a whitespace character. In some contexts, an identifier is taken as a character name, that is, a name from the encoding file Charspace is using, either the default or one you specified with `-encoding' (see section Invoking Charspace). See section Encoding files, for the definition of encoding files. In all other cases, identifiers are internal to Charspace. The particular commands describe the semantics which apply to them. Some identifiers are reserved, i.e., they cannot be used in any context except as described in the following sections. Reserved words are always shown in typewriter type.

An expression in a CMI file is one of: a number, an identifier, or a number followed by an identifier. This last, as in `.75 foo', denotes multiplication.
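The token and expression rules above can be sketched in Python. This is only an illustration of the grammar as described here, not Charspace's actual lexer (which is written in C); the names `NUMBER`, `is_number`, and `parse_expression` are hypothetical.

```python
import re

# Optional sign, zero or more digits, optional decimal point, zero or more
# digits. The lookahead enforces "at least one digit must be present".
NUMBER = re.compile(r"[+-]?(?=\.?\d)\d*\.?\d*")


def is_number(token):
    """Return True if token is a numeric constant in the sense described above."""
    return NUMBER.fullmatch(token) is not None


def parse_expression(tokens):
    """Classify an expression: a number, an identifier, or number * identifier.

    Returns a (factor, identifier) pair; identifier is None for a bare number.
    """
    if len(tokens) == 1:
        if is_number(tokens[0]):
            return (float(tokens[0]), None)
        return (1.0, tokens[0])
    if len(tokens) == 2 and is_number(tokens[0]) and not is_number(tokens[1]):
        # A number followed by an identifier denotes multiplication.
        return (float(tokens[0]), tokens[1])
    raise ValueError(f"not a CMI expression: {tokens!r}")
```

With this sketch, `parse_expression([".75", "foo"])` yields the factor 0.75 and the identifier `foo`, while a lone `.` or `+` is rejected as a number because it contains no digit.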

### `char` command

The `char` command specifies both side bearings for a single character. It has the form:

```
char charname expr1 , expr2
```

where:

charname
is a character name from the font encoding. See section Invoking Charspace, for how to specify the encoding file.
expr1
expr2
specify the left and right side bearings, in pixels, respectively: the character widths in the output TFM and GF files are expr1 + expr2 + `width`(charname). If these expressions contain identifiers, the values of those identifiers are not resolved until after Charspace has read all the CMI files.

Giving the side bearings symbolically is useful when the character definition is intended to be used for more than one typeface. For example, `common.cmi' (see section Charspace usage) contains:

```
char K H-sb , uc-min-sb
char L H-sb , uc-min-sb
```

Then the CMI file you write for a particular font can define `H-sb` and `uc-min-sb`, and not have to redefine the side bearings for `K` and `L`.

### `char-width` command

The `char-width` command specifies the set width of a single character, together with its left side bearing as a percentage of the total remaining space. It has the form:

```
char-width charname width-expr , lsb-%-expr
```

where:

charname
is a character name from the font encoding. See section Invoking Charspace, for how to specify the encoding file.
width-expr
specifies the set width of the character in pixels. The set width is the sum of the bitmap width, left side bearing, and right side bearing.
lsb-%-expr
specifies the left side bearing as a percentage of width-expr minus the bitmap width of the character, written as a fraction (so `.5' means 50%). Expressing the lsb as a percentage means that you need not think about the width of the character image: if you want to center a character, for example, `.5' for lsb-%-expr will always work.

The `char-width` command is useful when you want a character to have a particular set width, since it's much simpler to specify that width and the left side bearing (and let the program compute the right side bearing) than to somehow estimate the bitmap width and then choose the side bearings to add up to the desired set width.

For example, in most fonts, the numerals all have the same width, to ease typesetting of columns of them in tables. Thus, `common.cmi' defines `eight` (the name for the numeral `8') as follows:

```
char-width eight numeral-width , eight-lsb-percent
```

Since the numeral width is traditionally one-half the em width of the font, `common.cmi' defines `numeral-width` as `enspace`, which in turn is defined to be half the `quad` fontdimen.

`eight-lsb-percent` is defined to be `.5', thus centering the `8'.

The other numerals are also defined to have width `numeral-width`, but the `lsb-percent`s vary according to the character shapes.
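The arithmetic behind `char-width` can be sketched as follows. This is a minimal illustration of the relationship described above (set width = left side bearing + bitmap width + right side bearing), not Charspace's own code; the function name and the pixel values are hypothetical.

```python
def char_width_bearings(set_width, bitmap_width, lsb_fraction):
    """Split the space left over after the bitmap into left and right bearings.

    lsb_fraction is the lsb-%-expr value, e.g. 0.5 to center the character.
    """
    remaining = set_width - bitmap_width
    lsb = lsb_fraction * remaining
    rsb = remaining - lsb
    return lsb, rsb


# Hypothetical numbers: a numeral width of 50 pixels and a bitmap 38 pixels wide.
lsb, rsb = char_width_bearings(50, 38, 0.5)
print(lsb, rsb)  # the two bearings are equal, so the character is centered
```

Whatever the bitmap width, the two bearings and the bitmap add up to the requested set width, which is why the command spares you from estimating the bitmap width yourself.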

### `define` command

The `define` command defines an identifier as a number. This is useful to give a symbolic name to a constant used in more than one character or fontdimen definition, for ease of change. It has the form:

```
define id expr
```

The identifier id is defined to be the expression expr. Any previous definition of id is replaced. The id can be used prior to the `define` command; Charspace doesn't try to resolve any definitions in the CMI files until after all files have been read.

### `kern` command

The `kern` command defines a space to insert or remove between two particular characters. The kerning information is written only to the TFM file. It has the form:

```
kern name1 name2 expr
```

where name1 and name2 are character names, as in the `char` command (see section `char` command), and expr is the amount of the kern in pixels.

For example:

```
kern F dot -7.5
```

would put an entry in the TFM file's kerning table such that when TeX typesets a `F' followed by a `.', it inserts an additional space equivalent to -7.5 pixels in the resolution of Charspace's input font, i.e., it moves the two characters closer together.
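To get a feel for what a kern given in pixels means on the page, the following sketch converts pixels to printer's points using TeX's definition of 72.27pt per inch. The 300 dpi resolution is an assumption for illustration (the text above does not state one), and the function is hypothetical, not part of Charspace.

```python
TEX_POINTS_PER_INCH = 72.27  # TeX's printer's point


def pixels_to_points(pixels, dpi):
    """Convert a distance in pixels at the given resolution to printer's points."""
    return pixels * TEX_POINTS_PER_INCH / dpi


# The kern from the example above, assuming a 300 dpi input font.
kern_pt = pixels_to_points(-7.5, 300)
print(round(kern_pt, 5))
```

At that assumed resolution the kern amounts to a little under two points of tightening.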

### `codingscheme` command

The `codingscheme` command defines the encoding scheme to be used for the output files. (See section Encoding files, for a full description of font encodings.) It has the form:

```
codingscheme string-constant
```

where string-constant is a coding scheme string; for example, `"GNU Latin text"'. This string is looked up in the data file `encoding.map' to find the name of the corresponding encoding file (see section Coding scheme map file).

### `fontdimen` command

The `fontdimen` command defines a font parameter to be put in the TFM file. It has the form:

```
fontdimen fontdimen-name expr
```

where fontdimen-name is any of the fontdimen names listed in the section below, and expr gives the new value of the fontdimen, in pixels.

For example, `common.cmi' (see section Charspace usage) makes the following definitions:

```
fontdimen quad designsize
```

This defines the fontdimen `quad`, which determines the width of the `em` dimension in TeX, to be the same as the design size of the font. (This is traditionally the case, although it is not a hard-and-fast rule.) Then it defines the fontdimen `space`, which is the normal interword space in TeX, to be one-third of the quad.

Because of the way that Charspace processes the CMI files (see section CMI processing), if you redefine the `quad` fontdimen in another CMI file, the value of `space` will change correspondingly.

The section below lists all the TFM fontdimen names Charspace recognizes, and their meaning to TeX.

#### TFM fontdimens

This section lists all the TFM fontdimens recognized by these programs: all those recognized by TeX, plus a few others we thought would prove useful when writing TeX macros.

A fontdimen is an arbitrary number, in all cases but one (`slant`, see below) measured in printer's points, which is associated with a particular font. Their values are stored in the TFM file for the font. We also refer, context permitting, to fontdimens as "font parameters", or simply "parameters".

Fontdimens affect many aspects of TeX's behavior: the interword spacing, accent placement, and math formula construction. The math fontdimens in particular are fairly obscure; if you don't have a firm grasp on how TeX constructs math formulas, the explanations below will probably be meaningless to you and, unless you're making a font for math typesetting, can be ignored.

The `common.cmi' file which Charspace reads sets reasonable defaults for the fontdimens relevant to normal text typesetting.

When TeX (or other programs) scale a font, its fontdimen values are scaled proportionally to the design size. For example, suppose the designsize of some font f is 10pt, and some fontdimen in f has the value 7.5pt. Then if the font is used scaled to 20pt, the fontdimen's value is scaled to 15pt.
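The proportional scaling described above is simple enough to state as a one-line function. This is a sketch of the arithmetic only; the function name is hypothetical, and it does not apply to `slant`, which (as noted below) is not scaled.

```python
def scale_fontdimen(value_pt, design_size_pt, used_size_pt):
    """Scale a fontdimen value in proportion to the size the font is used at."""
    return value_pt * used_size_pt / design_size_pt


# The example from the text: design size 10pt, fontdimen 7.5pt, font used at 20pt.
print(scale_fontdimen(7.5, 10.0, 20.0))
```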

You can get the table of fontdimen values in a particular TFM file by running the standard TeX utility program TFtoPL and inspecting its (human-readable text) output.

In our programs and in PLtoTF, fontdimens are typically shown by their names. But each also has a number, starting at 1. You can use either the number or the name on the command line (in the argument to the `-fontdimens' option). The numbers are given in parentheses after the name in the table below.

In a few cases (fontdimens 8 to 13), the same number fontdimen has two different names, and two different meanings. This does not cause problems in practice, because these fontdimens are used only in the TeX math symbol and math extension fonts, which TeX can distinguish via its "math families" (see The TeXbook for the details).

`slant (1)`
Unlike all other fontdimens, the `slant` parameter is not scaled with the font when it is loaded. It defines the "slant per pt" of the font; for example, a `slant` of 0.2 means a 1pt-high character stem would end 0.2pt to the right of where it began. This value is typical for slanted or italic fonts; for normal upright fonts, `slant` is zero, naturally. TeX uses this to position accents.
`space (2)`
The `space` parameter defines the normal interword space of the font. This is typically about one-third of the design size, but it varies according to the type design: a narrow, spiky typeface will have a small interword space relative to a wide, regular one. Exception: in math fonts, the interword space is zero.
`stretch (3)`
The `stretch` parameter defines the interword stretch of the font. This is typically about one-half of the `space` parameter. TeX is reluctant to increase interword spacing beyond the width `space` + `stretch`. In monospaced fonts, the stretch is typically zero.
`shrink (4)`
The `shrink` parameter defines the interword shrink of the font. This is typically about one-third of the `space` parameter. TeX does not decrease interword spacing beyond the width `space` - `shrink`. In monospaced fonts, the shrink is typically zero.
`xheight (5)`
The `xheight` parameter defines the x-height of the font, i.e., the main body size. The height of the lowercase `x' is often used for this, since neither the top nor the bottom of `x' is curved. There is no hard-and-fast rule in TeX that the x-height must equal the height of `x', however. This fontdimen defines the value of the `ex` dimension in TeX. TeX also uses this to position accents: it assumes the accents in the font are properly positioned over a character that is exactly 1ex high.
`quad (6)`
The `quad` fontdimen defines the value of the `em` dimension in TeX. This is often the same as the design size of the font, but as usual, that's not an absolute requirement. Typesetters often use `em`s and `ex`s instead of hardwiring dimensions in terms of (say) points; that way, experimenting with different fonts for a particular job does not require changing the dimensions.
`extraspace (7)`
The `extraspace` fontdimen defines the space TeX puts at the end of a sentence. (Technically, when the `\spacefactor` is 2000 or more.) This is typically about one-third of the normal interword space.
`num1 (8)`
(Sorry, we haven't written a description of the math fontdimens yet.)
`num2 (9)`
`num3 (10)`
`denom1 (11)`
`denom2 (12)`
`sup1 (13)`
`sup2 (14)`
`sup3 (15)`
`sub1 (16)`
`sub2 (17)`
`supdrop (18)`
`subdrop (19)`
`delim1 (20)`
`delim2 (21)`
`axisheight (22)`
`defaultrulethickness (8)`
`bigopspacing1 (9)`
`bigopspacing2 (10)`
`bigopspacing3 (11)`
`bigopspacing4 (12)`
`bigopspacing5 (13)`
`leadingheight (23)`
The `leadingheight` parameter defines the height component of the recommended leading for this font. Leading is the baseline-to-baseline distance when setting lines of type. TeX does not automatically use this fontdimen, and the standard TeX fonts do not define it, but you may wish to include it in new fonts for the benefit of future TeX macros. This fontdimen is a GNU extension.
`leadingdepth (24)`
The `leadingdepth` parameter defines the depth of the recommended leading for this font. See `leadingheight` directly above. This fontdimen is a GNU extension.
`fontsize (25)`
The `fontsize` parameter is the design size of the font. This is needed for TeX macros to find the font's design size. This fontdimen is a GNU extension.
`version (26)`
The `version` parameter identifies a particular version of the TFM file. Whenever the character dimensions, kerns, or ligature table for a font changes, it is good to increment the version number. It is also good to keep such changes to a minimum, since they can change the line breaks and page breaks in documents typeset with previous versions. This fontdimen is a GNU extension.

### CMI processing

Here are some further details on how Charspace processes the CMI files:

• Charspace uses a single namespace; i.e., each defined identifier, whether it be a character name, an internal identifier, a fontdimen name, or whatever, is stored in the same table. Furthermore, Charspace does not complain, or even warn, about redefinition of identifiers: as we build up CMI files to be shared among different fonts, we felt such redefinition would be common.
• Charspace does not insist that identifiers be defined before they are used. For example, the following sequence:
```
define foo bar
define bar 1.0
char A foo , bar
```
is valid, and defines both side bearings of `A' to be 1.0. (See the preceding sections for the definition of the various commands allowed in CMI files.)
• Charspace only tries to resolve the definitions of those identifiers which are actually used to produce the output files (i.e., those in a sidebearing definition, a kern value, or a fontdimen value). Thus, something like
```
define foo bar
```
will elicit no complaint, if `foo' is not needed to make the output files.
• Charspace reads the contents of all the CMI files before attempting to resolve any definitions. Thus, it is the last definition which counts. For example:
```
define bar 100
define foo 2 bar
define bar 1
char A foo , foo
```
defines both side bearings of `A' to be 2, not 200.
• Charspace predefines one identifier, `designsize`, to be the design size of the input font (in pixels). It can be redefined like any other identifier.
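The processing rules in the list above (one table, later definitions replacing earlier ones, and resolution deferred until everything has been read) can be sketched in Python. This is an illustration of the described behavior, not the implementation in `charspace/char.c'; the function names are hypothetical, only `define` is handled, and the check for circular definitions is an assumption added for the sketch, since the text does not say how Charspace treats them.

```python
import re

# Numeric constants: optional sign, digits with an optional decimal point.
NUMBER = re.compile(r"[+-]?(?=\.?\d)\d*\.?\d*")


def read_definitions(lines):
    """Collect `define` commands; a later definition silently replaces an earlier one."""
    table = {}
    for line in lines:
        words = line.split()
        if len(words) >= 3 and words[0] == "define":
            table[words[1]] = words[2:]  # store the expression unevaluated
    return table


def resolve(tokens, table, seen=()):
    """Evaluate an expression only when its value is actually needed."""
    if len(tokens) == 2:
        # A number followed by an identifier denotes multiplication.
        return float(tokens[0]) * resolve(tokens[1:], table, seen)
    if len(tokens) != 1:
        raise ValueError(f"not an expression: {tokens!r}")
    token = tokens[0]
    if NUMBER.fullmatch(token):
        return float(token)
    if token in seen:
        # Assumption for this sketch: treat circular definitions as an error.
        raise ValueError(f"circular definition of {token}")
    if token not in table:
        raise KeyError(f"undefined identifier {token}")
    return resolve(table[token], table, seen + (token,))


table = read_definitions(["define bar 100", "define foo 2 bar", "define bar 1"])
print(resolve(["foo"], table))  # uses the last definition of bar
```

Because `read_definitions` never evaluates anything, a definition that refers to an undefined identifier causes no error unless `resolve` is later asked for its value, mirroring the third item in the list.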

If you can read programs in the C language, you may find it instructive to examine the implementation of CMI file processing in the source files `charspace/char.c' and `charspace/cmi.y'. The source provides the full details of CMI processing.

## Invoking Charspace

This section describes the options that Charspace accepts. See section Command-line options, for general option syntax.

The root of the main input fontname is called font-name below.

`-cmi-files file1,file2,...'
read the CMI files `file1.dpicmi', `file2.dpicmi', etc., where dpi is the resolution of the main input font. Default is to read `font-name.dpicmi'. The `.dpicmi' is not appended to any of the files which already have a suffix. `common.cmi' is read before any of these files.
`-dpi unsigned'
The resolution, in pixels per inch. See section Common options.
`-encoding enc-file'
The encoding file to read for the mapping between character codes in the input font and character names. See section Encoding files. If enc-file has no suffix, `.enc' is appended. The default is to read the encoding file specified via the `codingscheme` command (see section `codingscheme` command). If a TFM file `font-name.tfm' exists, it is also read for default ligature, headerbyte, and fontdimen information. Definitions in the CMI files override those in such a TFM file.
`-fontdimens fd1:value1,fd2:value2,...'
See section TFM fontdimens.
`-help'
Print a usage message. See section Common options.
`-no-gf'
Don't output a revised GF file. This is primarily useful while debugging the TFM output, since without a bitmap font to match the TFM output, you can't actually print anything reliably.
`-output-file filename'
If filename does not have a suffix, write the output to `filename.tfm' and (if `-no-gf' was not specified) `filename.dpigf'. If this would overwrite an input file, prepend an `x' to the output name. If filename has a suffix, and `-no-gf' was not specified, Charspace complains and gives up, since it can't output two files with the same name. By default, use the name of the main input font for filename.
`-range char1-char2'
Only output characters with codes between char1 and char2, inclusive. (See section Common options, and section Specifying character codes.)
`-verbose'
Output progress reports.
`-version'
Print the version number.
`-xheight-char code'
Use the TFM height of code for the `xheight` fontdimen (see section TFM fontdimens); default is 120 (ASCII `x'). (It is reasonable to use 120 instead of whatever `x' is in the underlying character set because most font encoding schemes are based on ASCII regardless of the host computer's character set.)
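The suffix rules given above for `-cmi-files' and `-output-file' can be sketched as follows. This is a hedged illustration, not Charspace's code: the function names are hypothetical, the test for "has a suffix" is assumed to match Python's `os.path.splitext`, the case of a suffixed output name combined with `-no-gf' is assumed to write that single file, and the rule about prepending `x' to avoid overwriting an input file is left out.

```python
import os


def cmi_filenames(names, dpi):
    """Expand a comma-separated -cmi-files argument into file names."""
    result = []
    for name in names.split(","):
        _, suffix = os.path.splitext(name)
        # `.<dpi>cmi' is appended only to names without a suffix of their own.
        result.append(name if suffix else f"{name}.{dpi}cmi")
    return result


def output_filenames(filename, dpi, no_gf=False):
    """Return the output file names implied by -output-file and -no-gf."""
    _, suffix = os.path.splitext(filename)
    if not suffix:
        outputs = [f"{filename}.tfm"]
        if not no_gf:
            outputs.append(f"{filename}.{dpi}gf")
        return outputs
    if not no_gf:
        # Two outputs cannot share one explicitly suffixed name.
        raise ValueError("cannot write both TFM and GF output to " + filename)
    return [filename]  # assumption: the TFM output goes to the name as given


print(cmi_filenames("foo,bar.cmi", 300))
print(output_filenames("out-fontname", 300))
```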