The mkid program builds the ID database. To do this it must scan
each of the files included in the database. This takes some time, but
once the work is done the query programs run very rapidly.
The mkid program knows how to scan a variety of files. For
example, it knows how to skip over comments and strings in a C program,
only picking out the identifiers used in the code.
Identifiers are not the only thing included in the database. Numbers are also scanned and included in the database indexed by their binary value. Since the same number can be written many different ways (47, 0x2f, 057 in a C program for instance), this feature allows you to find hard coded uses of constants without regard to the radix used to specify them.
All the places in this document where identifiers are written about should really mention identifiers and numbers, but that gets fairly clumsy after a while, so you should always keep in mind that numbers are included in the database as well as identifiers.
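To see why indexing numbers by value is useful, the short sketch below prints the three spellings mentioned above as decimal values. It only illustrates the idea with the shell's own printf; it says nothing about how mkid parses numbers internally, and the function name same_value is made up for this example.

```shell
# same_value (hypothetical helper): print three spellings of one constant.
# POSIX printf treats numeric arguments as C integer constants, so the
# decimal, hexadecimal, and octal forms all reduce to the same value.
same_value() {
    printf '%d %d %d\n' 47 0x2f 057
}

same_value    # prints: 47 47 47
```

A plain text search for `47' would miss the other two spellings, which is the case the number index is meant to cover.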
The -S option is used to specify arguments to the various language scanners. See the section Scanner Arguments for details.
A - by itself means read arguments from stdin.
ID (in the current directory), but you may specify any name. The file names stored in the database will be stored relative to the directory containing the database, so if you move the database after creating it, you may have trouble finding files unless they remain in the same relative position.
The -u option updates an existing database by rescanning any files that have changed since the database was written. Unfortunately, you cannot incrementally add new files to a database.
Scanner arguments all start with
-S. Scanner arguments are used to tell
mkid which language scanner to use for which files, to pass language
specific options to the individual scanners, and to get some limited
online help about scanner options.
Mkid usually determines which language scanner to use on a file
by looking at the suffix of the file name. The suffix starts at the last
`.' in a file name and includes the `.' and all remaining
characters (for example the suffix of `fred.c' is `.c'). Not
all files have a suffix, and not all suffixes are bound to a specific
language by mkid. If
mkid cannot determine what language a file
is, it will use the language bound to the `.default' suffix. The
plain text scanner is normally bound to `.default', but the
-S option can be used to change any language bindings.
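The suffix rule described above can be sketched in a few lines of shell. This is an illustration only, not code taken from mkid: the helper name suffix_of is hypothetical, and it reports `.default' only for names with no `.' at all, whereas mkid also falls back to `.default' for suffixes that are not bound to any language.

```shell
# suffix_of NAME (hypothetical helper): print the suffix of NAME, that is,
# everything from the last `.' in the file name to the end, or `.default'
# when the file name contains no `.'.
suffix_of() {
    base=${1##*/}                 # drop any leading directory components
    case $base in
        *.*) printf '.%s\n' "${base##*.}" ;;
        *)   printf '%s\n' '.default' ;;
    esac
}

suffix_of fred.c          # prints: .c
suffix_of parse.tab.y     # prints: .y
suffix_of Makefile        # prints: .default
```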
There are several different forms for scanner options:
Mkid determines which language scanner to use on a file by examining the file name suffix. The `.' is part of the suffix and must be specified in this form of the -S option. For example, `-S.y=c' tells mkid to use the `c' language scanner for all files ending in the `.y' suffix.
Mkid has several built-in suffixes it already recognizes. Passing a `?' will cause it to print the language it will use to scan files with that suffix.
The -S option is used to pass arbitrary arguments to the language scanners.
-S<new language>/<builtin language>/<filter command>
If you run
mkid -S?=? you will find bindings for a number of
languages; unfortunately pascal, though mentioned in the list, is not
actually supported. The supported languages are documented below.
The C scanner is probably the most popular. It scans identifiers out of C programs, skipping over comments and strings in the process. The normal `.c' and `.h' suffixes are automatically recognized as C language, as well as the more obscure `.y' (yacc) and `.l' (lex) suffixes.
The -S options recognized by the C scanner are:
$ in identifiers, so you could say
-Sc-s$ to accept that dialect).
The plain text scanner is designed for scanning documents. This is
typically the scanner used when adding custom scanners, and several
custom scanners are built in to
mkid and defined in terms of filters
and the text scanner. A troff scanner runs
deroff over the file
then feeds the result to the text scanner. A compressed man page scanner
runs pcat piped into
col -b, and a TeX scanner runs
Assemblers come in several flavors, so there are several options to control scanning of assembly code:
There are two ways to add new scanners to
mkid. The first is to
modify the code in `getscan.c' and add a new `scan-*.c' file
with the code for your scanner. This is not too hard, but it requires
relinking and installing a new version of
mkid, which might be
inconvenient, and would lead to the proliferation of
The second technique uses the
-S option to specify a new language scanner. In this form
the first language is the name of the new language to be defined,
the second language is the name of an existing language scanner to
be invoked on the output of the filter command specified as the
third component of the -S option.
The filter is an arbitrary shell command. Somewhere in the filter string,
%s should occur. This
%s is replaced by the name of the
source file being scanned, the shell command is invoked, and whatever
comes out on stdout is scanned using the builtin scanner.
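The substitution step described above can be imitated in the shell to try out a filter before handing it to mkid. The sketch below is an assumption about the behavior, not mkid's actual code: run_filter is a hypothetical helper, it replaces only the first %s, and it inserts the file name without any quoting.

```shell
# run_filter FILTER FILE (hypothetical helper): replace the first %s in
# FILTER with FILE, run the result with sh -c, and pass its stdout through.
# FILE is inserted unquoted, so names containing spaces will not work here.
run_filter() {
    filter=$1
    file=$2
    case $filter in
        *%s*) ;;                                  # filter contains %s: ok
        *)    echo "filter has no %s" >&2; return 1 ;;
    esac
    # Text before the first %s, then the file name, then the text after it.
    cmd="${filter%%\%s*}${file}${filter#*\%s}"
    sh -c "$cmd"
}

# Usage: whatever the filter writes to stdout is what a scanner would see.
sample=$(mktemp)
printf 'hello world\n' > "$sample"
run_filter 'tr a-z A-Z < %s' "$sample"
rm -f "$sample"
```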
For example, no scanner is provided for texinfo files (like this one). If I wished to index the contents of this file, but avoid indexing the texinfo directives, I would need a filter that stripped out the texinfo directives, but left the remainder of the file intact. I could then use the plain text scanner on the remainder. A quick way to specify this might be:
'-Stexinfo/text/sed s,@[a-z]*,,g < %s'
This defines a new language scanner (texinfo) in terms of a
sed command to strip out texinfo directives (at signs followed
by letters). Once the directives are stripped, the remaining text is run
through the plain text scanner.
This is just an example; to do a better job I would actually need to
delete some lines (such as those beginning with
@end) as well
as deleting the
@ directives embedded in the text.
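A slightly fuller filter along these lines might look like the sketch below. It is only a starting point under stated assumptions: strip_texinfo is a hypothetical name, the line-deleting rule covers @end lines and nothing else, and commas are used as sed delimiters in the same way as the example above. How such a command would best be passed through -S is not shown here.

```shell
# strip_texinfo (hypothetical helper): read texinfo on stdin and write the
# text on stdout with @end lines deleted and embedded @ directives removed.
strip_texinfo() {
    sed -e '\,^@end,d' -e 's,@[a-z]*,,g'
}

# Usage: the braces around directive arguments are left in the output.
printf '%s\n' 'see @code{mkid} for details' '@end example' | strip_texinfo
# prints: see {mkid} for details
```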
The simplest example of
mkid is something like:
This will build an ID database indexing all the
identifiers and numbers in the `.c', `.h', and `.y' files
in the current directory. Because those suffixes are already known to
mkid as C language files, no other special arguments are required.
From a simple example, let's go to a more complex one. Suppose you want
to build a database indexing the contents of all the man pages.
Since mkid already knows how to deal with `.z' files, let's
assume your system is using the
compress program to store
compressed cattable versions of the man pages. The
compress program creates files with a
.Z suffix, so
mkid will have to be told how to scan `.Z' files. The
following code shows how to combine the
find command with the
special scanner arguments to
mkid to generate the required ID database:
cd /usr/catman
find . -name '*.Z' -print | mkid '-Sman/text/uncompress -c < %s' -S.Z=man -
This example first switches to the `/usr/catman' directory where
the compressed man pages are stored. The
find command then
finds all the `.Z' files under that directory and prints their
names. This list is piped into the
mkid program. The -
argument by itself (at the end of the line) tells
mkid to read
arguments (in this case the list of file names) from stdin. The first
-S argument defines a new language (man) in terms of the
uncompress utility and the existing text scanner. The second
-S argument tells
mkid to treat all `.Z' files as
language man. In practice, you might find the
arguments need to be even more complex, something like:
mkid '-Sman/text/uncompress -c < %s | col -b' -S.Z=man -
This will take the additional step of getting rid of any underlining and backspacing which might be present in the compressed man pages.