This appendix contains information mainly of interest to implementors and
maintainers of gawk. Everything in it applies specifically to
gawk, and not to other implementations.
See section Extensions in
gawk Not in POSIX
for a summary of the GNU extensions to the
awk language and program.
All of these features can be turned off by invoking
gawk with the
`--traditional' option, or with the `--posix' option.
If gawk is compiled for debugging with `-DDEBUG', then there
is one more option available on the command line.
This option is intended only for serious gawk developers,
and not for the casual user. It probably has not even been compiled into
your version of gawk, since it slows down execution.
If you should find that you wish to enhance
gawk in a significant
fashion, you are perfectly free to do so. That is the point of having
free software; the source code is available, and you are free to change
it as you wish (see section GNU GENERAL PUBLIC LICENSE).
This section discusses the ways you might wish to change gawk,
and any considerations you should bear in mind.
You are free to add any new features you like to gawk.
However, if you want your changes to be incorporated into the gawk
distribution, there are several steps that you need to take in order to
make it possible for me to include your changes.
Work from the latest version of gawk. If your version of gawk is very old, I may not be able to integrate your changes at all. See section Getting the gawk Distribution, for information on getting the latest version of gawk.
Read the GNU Coding Standards. (The GNU Coding Standards are available as part of the Autoconf distribution, from the FSF.)
Use the gawk coding style. The C code for gawk follows the instructions in the GNU Coding Standards, with minor exceptions. The code is formatted using the traditional "K&R" style, particularly as regards the placement of braces and the use of tabs. In brief, the coding rules for gawk are:
Put the return type of the function, even if it is int, on the line above the line with the name and arguments of the function.
Do not use the comma operator to produce multiple side effects, except in for loop initialization and increment parts, and in macro bodies.
Use comparisons against NULL and '\0' in the conditions of if, while and for statements, and in the cases of switch statements, instead of just the plain pointer or character value.
Use the TRUE, FALSE, and NULL symbolic constants, and the character constant '\0' where appropriate, instead of 1 and 0.
If I have to reformat your code to follow the coding style used in gawk, I may not bother.
Submit changes as context diffs or unified diffs, comparing the original gawk source tree with your version. (I find context diffs to be more readable, but unified diffs are more compact.) I recommend using the GNU version of diff. Send the output produced by diff to me when you submit your changes. See section Reporting Problems and Bugs, for the electronic mail information. Using this format makes it easy for me to apply your changes to the master version of the gawk source code (using patch). If I have to apply the changes manually, using a text editor, I may not do so, particularly if there are lots of changes.
Although this sounds like a lot of work, please remember that while you may write the new code, I have to maintain it and support it, and if it isn't possible for me to do that with a minimum of extra work, then I probably will not.
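The coding rules mentioned above can be illustrated with a short sketch. The function below is hypothetical and is not taken from the gawk sources; it only shows the return type on its own line, explicit comparisons against NULL and '\0', the TRUE and FALSE symbolic constants, and the comma operator confined to the initialization part of a for loop. It uses an ANSI prototype-style header so that it compiles with current compilers, which is an assumption made for this example and not something the rules above call for.

```c
#include <stddef.h>

#define TRUE	1
#define FALSE	0

/* count_fields --- count blank-separated fields in a string (hypothetical) */

int
count_fields(const char *s)
{
	int count, in_field;

	if (s == NULL)			/* compare against NULL, not `! s' */
		return 0;

	/* the comma operator is acceptable in for loop initialization */
	for (count = 0, in_field = FALSE; *s != '\0'; s++) {
		if (*s == ' ' || *s == '\t')
			in_field = FALSE;
		else if (in_field == FALSE) {
			in_field = TRUE;
			count++;
		}
	}
	return count;
}
```

Note that the indentation uses real tabs and the braces follow the "K&R" layout, with the opening brace of the function body on a line by itself.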
Porting gawk to a New Operating System
If you wish to port
gawk to a new operating system, there are
several steps to follow.
Bear in mind that your code must co-exist peacefully with the rest of gawk, and the other ports. Avoid gratuitous changes to the system-independent parts of the code. If at all possible, avoid sprinkling `#ifdef's just for your port throughout the code. If the changes needed for a particular system affect too much of the code, I probably will not accept them. In such a case, you will, of course, be able to distribute your changes on your own, as long as you comply with the GPL (see section GNU GENERAL PUBLIC LICENSE).
A number of the files that come with gawk are maintained by other people at the Free Software Foundation. Thus, you should not change them unless it is for a very good reason. That is, changes are not out of the question, but changes to these files will be scrutinized extra carefully. The files are `alloca.c', `getopt.h', `getopt.c', `getopt1.c', `regex.h', `regex.c', `dfa.h', `dfa.c', `install-sh', and `mkinstalldirs'.
Be willing to continue to maintain the port. Non-Unix operating systems are supported by volunteers who maintain the code needed to compile and run gawk on their systems. If no one volunteers to maintain a port, that port becomes unsupported, and it may be necessary to remove it from the distribution.
Update the documentation to describe the steps needed to install and compile gawk for your system.
Following these steps will make it much easier to integrate your changes
into gawk, and have them co-exist happily with the code for other
operating systems that is already there.
In the code that you supply, and that you maintain, feel free to use a coding style and brace layout that suits your taste.
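One way to follow the advice about not sprinkling `#ifdef's throughout the code is to hide each system difference behind a small function, so that the conditional appears in exactly one place. The following is a minimal sketch under stated assumptions: the function names and the HYPOTHETICAL_DOS_PORT macro are invented for illustration and do not describe how gawk actually organizes its ports.

```c
#include <stddef.h>

/*
 * Hypothetical port-layer interface.  The function name and the
 * HYPOTHETICAL_DOS_PORT macro are illustrative, not part of gawk.
 */
int port_is_absolute_path(const char *path);

/* ---- system-specific code, confined to one place ---- */

#ifdef HYPOTHETICAL_DOS_PORT

/* port_is_absolute_path --- accept `\', `/' or a drive letter prefix */

int
port_is_absolute_path(const char *path)
{
	if (path == NULL || path[0] == '\0')
		return 0;
	if (path[0] == '\\' || path[0] == '/')
		return 1;
	return path[1] == ':';
}

#else

/* port_is_absolute_path --- Unix-style absolute paths start with `/' */

int
port_is_absolute_path(const char *path)
{
	return path != NULL && path[0] == '/';
}

#endif

/* ---- system-independent code: no conditionals needed ---- */

/* needs_path_search --- should this file name be looked up on a search path? */

int
needs_path_search(const char *filename)
{
	if (filename == NULL || filename[0] == '\0')
		return 0;
	return ! port_is_absolute_path(filename);
}
```

In a real port, the system-specific half would normally live in a file of its own in the port's subdirectory, leaving the system-independent caller untouched.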
AWK is a language similar to PERL, only considerably more elegant.
Arnold Robbins

Hey!
Larry Wall
This section briefly lists extensions and possible improvements
that indicate the directions we are currently considering for gawk.
The file `FUTURES' in the gawk distribution lists these extensions as well.
This is a list of probable future changes that will be usable by the
awk language programmer.
It will at least be possible to make gawk print its warnings and error messages in languages other than English. It may be possible for awk programs to also use the multiple language facilities, separate from gawk itself.
The special files that provide process-related information (see section Special File Names in gawk) may be superseded by a PROCINFO array that would provide the same information, in an easier to access fashion.
Changes made in gawk to the array ENVIRON may be propagated to subprocesses run by gawk.
This is a list of probable improvements that will make gawk perform better.
The dfa pattern matcher from GNU grep has some problems. Either a new version or a fixed one will deal with some important regexp matching issues.
On systems that support the mmap system call, its use would provide much faster file input, and considerably simplified input buffer management.
The GNU version of malloc could potentially speed up gawk, since it relies heavily on the use of dynamic memory allocation.
The rx regular expression library could potentially speed up all regexp operations that require knowing the exact location of matches. This includes record termination, field and array splitting, and the sub, gsub, gensub and match functions.
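To see why mmap could simplify input buffer management, consider this minimal sketch, which assumes a POSIX system and a regular file. The function is hypothetical and is not gawk code. Because the whole file appears as one contiguous region of memory, the scanning loop needs no buffer refills and no special handling for records that straddle a buffer boundary.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * count_records_mmap --- count newline-terminated records in a file
 * by mapping it into memory (hypothetical helper, not gawk code).
 * Returns the record count, or -1 on error.
 */

long
count_records_mmap(const char *filename)
{
	int fd;
	struct stat sbuf;
	void *map;
	const char *p, *end;
	long count = 0;

	if ((fd = open(filename, O_RDONLY)) < 0)
		return -1;
	if (fstat(fd, &sbuf) < 0) {
		close(fd);
		return -1;
	}
	if (sbuf.st_size == 0) {	/* mmap rejects zero-length mappings */
		close(fd);
		return 0;
	}

	map = mmap(NULL, (size_t) sbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping stays valid after close */
	if (map == MAP_FAILED)
		return -1;

	/* the whole file is one buffer: scan it in a single pass */
	p = (const char *) map;
	end = p + sbuf.st_size;
	for (; p < end; p++) {
		if (*p == '\n')
			count++;
	}
	if (end[-1] != '\n')		/* final record lacks a terminator */
		count++;

	munmap(map, (size_t) sbuf.st_size);
	return count;
}
```

Pipes, terminals and other non-regular files cannot be mapped this way, so a program relying on this approach would still need a conventional read loop as a fallback.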
Here are some projects that would-be
gawk hackers might like to take
on. They vary in size from a few days to a few weeks of programming,
depending on which one you choose and how fast a programmer you are. Please
send any improvements you write to the maintainers at the GNU project.
See section Adding New Features,
for guidelines to follow when adding new features to gawk.
See section Reporting Problems and Bugs, for information on
contacting the maintainers.
gawk uses a Bison (YACC-like) parser to convert the script given it into a syntax tree; the syntax tree is then executed by a simple recursive evaluator. This method incurs a lot of overhead, since the recursive evaluator performs many procedure calls to do even the simplest things. It should be possible for gawk to convert the script's parse tree into a C program which the user would then compile, using the normal C compiler and a special gawk library to provide all the needed functions (regexps, fields, associative arrays, type coercion, and so on). An easier possibility might be for an intermediate phase of awk to convert the parse tree into a linear byte code form like the one used in GNU Emacs Lisp. The recursive evaluator would then be replaced by a straight line byte code interpreter that would be intermediate in speed between running a compiled program and doing what