%% /usr/local/src/mg/mg-1.3x/README.mg-1.3x, Mon May 26 14:22:22 1997
%% Edit by Nelson H. F. Beebe

========
PROBLEMS
========

Any problems encountered with installation of mg-1.3x should be
reported directly to

        Nelson H. F. Beebe
        Center for Scientific Computing
        Department of Mathematics
        University of Utah
        Salt Lake City, UT 84112
        USA
        Email: beebe@math.utah.edu (Internet)
        WWW URL: http://www.math.utah.edu/~beebe

Problems with mg-1.3 should be reported on the mg mailing list

        mg-users@mds.rmit.edu.au

since having a larger base of problem solvers reduces the burden on
the mg developers.

========================
HOW TO MAKE VERSION 1.3x
========================

Version 1.3x of mg can be created by the application of a set of
patches in this distribution to a directory tree containing the mg 1.3
distribution from the New Zealand Digital Library project at the
University of Waikato at

        http://www.cs.waikato.ac.nz/~nzdl/technology/mg-1.3.tar.gz

Fetch that tar file and install it in a clean directory like this:

        gunzip > ...

>> -taso
>>      Tell the linker that the executable file should be loaded in the lower
>>      31-bit addressable virtual address range.  The -T and -D flags to the ld
>>      command can also be used to ensure that the text and data segments
>>      addresses, respectively, are loaded into low memory.
>>
>>      The -taso flag, however, in addition to setting default addresses for
>>      text and data segments, also causes shared libraries linked outside the
>>      31-bit address space to be appropriately relocated by the loader.  If
>>      you specify -taso and also specify text and data segment addresses with
>>      -T and -D, those addresses override the -taso default addresses.  The
>>      -taso flag is useful for porting 32-bit programs to DEC OSF/1.
>> ...

Problem (4) is considerably more difficult, and is discussed further
below.

These observations suggest a possible brute-force attack to complete a
port of mg 1.3x to the DEC Alpha:

(1) Change all "long" variables to "mg_long", and all "u_long"
    variables to "mg_u_long", and then at compile time, or in the
    config.h file, define mg_long to be int for this system, and long
    for others.

    A suitable sed script to do this on (a COPY of) the mg 1.3x source
    tree is included as the file long-to-mg_long.sh in this
    distribution.  It can be run like this:

        cd wherever-you-put-the-COPY-of-mg-1.3x
        ./long-to-mg_long.sh

    These changes will be incorrect where there are calls to library
    functions that expect long arguments.  Fortunately, Standard C
    prototypes will produce correct promotion in all but the case of
    the *printf() functions, where %l format items must have a
    corresponding long argument.  These cases must be separately
    identified and corrected [there are 64 %l format items in the
    source code].

    [NB: This has NOT yet been done in the preliminary port of mg 1.3x
    to the DEC Alpha.]

(2) Build the program with pointers restricted to the lower 32 bits,
    so that they can be copied in and out of int variables without bit
    loss:

        env CC='cc -D_OSF_SOURCE -Dmg_long=int -ieee_with_inexact -taso' ./configure && make

    The -ieee_with_inexact switch is needed to get IEEE 754 behavior
    so that mg can correctly handle NaN and Infinity; both are
    possible in statistics computations from terms of the form 0/0 and
    x/0.  Without that switch, the DEC Alpha terminates process
    execution when those expressions are evaluated.

I tried this, and it DOES produce a working mgquery that can read
binary databases created on a 32-bit big-endian system, PROVIDED that
the fast-loading index has not been built.

However, if that file (*.text.dict.fast) exists, then mgquery core
dumps with this traceback:

        signal Segmentation fault at
           [Load_Fast_Comp_Dict:760 +0x18,0x1202085c]
                NTOHSI(cd->cfh[i]->hd.num_codes);
        (dbx) where
        >  0 Load_Fast_Comp_Dict(text_fast_comp_dict = 0x14024c40)
               ["text_get.c":760, 0x1202085c]
           1 LoadCompDict(text_comp_dict = (nil), text_aux_dict = (nil),
               text_fast_comp_dict = 0x14024c40)
               ["text_get.c":804, 0x12020d5c]
           2 InitQuerySystem(dir = 0x140268c0 = "/tmp/mg/mgdata/",
               name = 0x14025ea0 = "bibfiles/bibfiles", iqt = 0x11ffea20)
               ["backend.c":359, 0x12025324]
           3 query() ["mgquery.c":1127, 0x12011a80]
           4 main(argc = 2, argv = 0x11ffec08)
               ["mgquery.c":1412, 0x12012c94]

Further study of the code reveals the problem: in
mg_fast_comp_dict.c, in function save_fast_dict(), pointers are
replaced by their offsets.  In text_get.c, in function
Load_Fast_Comp_Dict(), those offsets are replaced by pointers.  As
long as the original pointer is non-NULL, this is correct.  However,
when the pointer is NULL, the computed offset, NULL - base, will be
negative, or equivalently, a large unsigned value.

The solution is to modify Load_Fast_Comp_Dict() to check the
expression ((mg_long)offset < 0) to detect cases when the
reconstructed pointer should be NULL, rather than a bogus value that
points below the base address.  This error has been fixed in the mg
1.3x distribution, since it potentially affects other architectures
too.

With this change, mg 1.3x now works correctly on the DEC Alpha, as
long as the fast loading compression dictionary is NOT used.

There are two other compiler options that may eventually lead to a
solution of the problem of making mg work properly on this
architecture:

>> ...
>> -xtaso
>>      Cause the compiler to respond to the #pragma pointer_size preprocessor
>>      directives which control pointer size allocations.  This flag allows
>>      you to specify 32-bit pointers when used in conjunction with the pragma
>>      pointer_size directive.  You must place pragmas where appropriate in
>>      your program to use 32-bit pointers.  Images built with this flag must
>>      be linked with the -taso flag in order to run correctly.  See the
>>      Programmer's Guide for information on #pragma pointer_size.
>>
>> -xtaso_short
>>      Force the compiler to allocate 32-bit pointers by default.  You can
>>      still use 64-bit pointers, but only by the use of pragmas.
>> ...

Experiments with small test programs show that it is not sufficient to
use -xtaso_short, because the short pointers thus created are not
handled correctly by C library routines, and segment violations
immediately provoke core dumps.

I made an experiment with insertion of

        #if defined(__alpha)
        #pragma pointer_size save
        #pragma pointer_size short
        #endif
        ...
        #if defined(__alpha)
        #pragma pointer_size restore
        #endif

around structures in src/text/backend.h to try to get the pointers
involved in the fast loading compression dictionary to be 32-bit
values instead of 64-bit values, and I also changed the definition of
WORDNO in src/text/mg_fast_comp_dict.c and src/text/text_get.c to

        #if defined(__alpha)
        #pragma pointer_size save
        #pragma pointer_size short
        static size_t MG_POINTER_SIZE = sizeof(u_char*);
        #pragma pointer_size restore
        #else
        #define MG_POINTER_SIZE sizeof(u_char*)
        #endif

        #define WORDNO(p, base) ((((char*)(p))-((char*)(base)))/MG_POINTER_SIZE)

This produced an executable that successfully read the fast loading
compression dictionary, but then later failed elsewhere with invalid
addresses.  This suggests that further pursuit of this approach might
eventually provide a solution.
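
The offset scheme discussed above (pointers saved as word offsets from
a base address, then restored on load, with a negative offset taken to
mean NULL) can be sketched roughly as follows.  This is only an
illustration, not the actual mg code: the helper names
ptr_to_wordno() and wordno_to_ptr() and the macro MG_WORD_SIZE are
hypothetical, and the sketch assumes that valid addresses are positive
when viewed as intptr_t, so that a saved NULL yields a negative
offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these are in the mg sources. */
#define MG_WORD_SIZE ((intptr_t) sizeof(unsigned char *))

/*
 * Convert a pointer into a signed word offset from base, in the spirit
 * of WORDNO().  The arithmetic is done on intptr_t values so that the
 * result stays signed.  ASSUMPTION: base is a positive address when
 * viewed as intptr_t, so a NULL pointer produces a negative offset.
 */
static intptr_t ptr_to_wordno(const void *p, const void *base)
{
    assert(base != NULL);
    return ((intptr_t) p - (intptr_t) base) / MG_WORD_SIZE;
}

/*
 * Rebuild a pointer from a word offset.  A negative offset can only
 * have come from NULL (or from a pointer below base), so it is mapped
 * back to NULL instead of to a bogus address below the base.
 */
static void *wordno_to_ptr(intptr_t wordno, void *base)
{
    assert(base != NULL);
    if (wordno < 0)
        return NULL;
    return (char *) base + wordno * MG_WORD_SIZE;
}
```

For example, given a table declared as "unsigned char *table[4]",
ptr_to_wordno(&table[3], table) yields 3, while
ptr_to_wordno(NULL, table) yields a negative value that
wordno_to_ptr() maps back to NULL.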

However, it is my belief that since most other RISC vendors are moving
toward 64-bit software architectures (DEC Alpha, SGI MIPS R4x00 and
later, and SUN UltraSPARC already have 64-bit hardware), this problem
will soon strike newer systems from those vendors too.  It will best
be addressed by a complete rewrite of the two files
src/text/mg_fast_comp_dict.c and src/text/text_get.c to avoid pointers
altogether in the fast loading compression dictionary file.

At the same time, thought needs to be given to the entire idea of
32-bit longs vs 64-bit longs, so that the `Managing Gigabytes'
software can actually work on more than two gigabytes (2^31 - 1 bytes)
of data, which is the current limit imposed by signed longs and
offsets in the binary files.
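
One way such a pointer-free layout might look is sketched below: each
reference is stored as a fixed-width 32-bit index into a table, with
an explicit sentinel for "no entry", so that the stored value does not
depend on the pointer size of the machine that wrote the file.  This
is a hedged illustration only, not a design taken from the mg
sources: the type demo_entry, the constant MG_NO_ENTRY, and the
functions entry_to_index() and index_to_entry() are all hypothetical
names, and byte-order conversion of the stored index is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table element; the real mg structures differ. */
typedef struct {
    uint32_t num_codes;
} demo_entry;

/* Hypothetical sentinel stored in place of a NULL reference. */
#define MG_NO_ENTRY UINT32_MAX

/*
 * Encode a reference into table[0..n-1] as a 32-bit index.  NULL is
 * encoded explicitly, so no pointer subtraction involving NULL is
 * ever performed.
 */
static uint32_t entry_to_index(const demo_entry *table, size_t n,
                               const demo_entry *entry)
{
    if (entry == NULL)
        return MG_NO_ENTRY;
    assert(entry >= table && entry < table + n);
    return (uint32_t) (entry - table);
}

/* Decode a stored 32-bit index back into a pointer, or NULL. */
static const demo_entry *index_to_entry(const demo_entry *table, size_t n,
                                        uint32_t index)
{
    if (index == MG_NO_ENTRY)
        return NULL;
    assert(index < n);
    return table + index;
}
```

With this encoding the value written to the file occupies four bytes
whether the host uses 32-bit or 64-bit pointers, and the NULL case is
handled by comparison with the sentinel rather than by a sign test on
a computed offset.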