Index of /~beebe/support/html_analyzer-0.30

Name               Last modified      Size  Description
Parent Directory                      -
Copyright          1993-08-17 22:10   4.0K
Example            1993-10-07 23:47   5.9K
Example.html       1993-10-07 23:36   6.3K
Installation       1993-10-07 23:47   751
Installation.html  1993-10-09 17:12   1.1K
Makefile           1997-11-21 18:40   2.8K
README             1994-05-17 14:40   8.6K
README.html        1994-05-17 14:39   8.8K
WHERE-FROM         1997-11-21 18:28   331
glimpse.html       1995-01-03 09:54   3.2K
libhtmlw/          1999-08-10 10:46   -
libskiplist/       1999-08-10 10:46   -
libwww2/           1999-08-10 10:47   -
src/               1999-08-10 10:47   -


The html_analyzer-0.30 README file


This file outlines the types of processing performed by the html_analyzer software and gives copyright, disclaimer, and funding information. Please read the file Installation in this directory for information on installing the software. To walk through an example run of the analyzer, see Example.


The software is currently being distributed via anonymous ftp from: in the /Mosaic/misc directory in compress'd and gzip'd forms. It is also available via anonymous ftp from: in pub/gvu/www/pitkow/html_analyzer.


The intent of the html_analyzer is to assist in the maintenance of HyperText Markup Language (HTML) databases. As the number of HTML databases increases, so does the potential for hyperlinks that point to files or servers that no longer exist. This creates a need for an automated hyperlink validation program, which is exactly what the html_analyzer provides. The program also explores the relationship between hyperlinks and the contents of the hyperlink.
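The idea of hyperlink validation can be illustrated with a minimal Python sketch. It is not the html_analyzer's own code (which is written in C on top of libwww2); the names AnchorCollector and broken_local_links are hypothetical, and the sketch assumes that only links to local files are checked, skipping remote servers entirely.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urlparse


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_local_links(html_file):
    """Return the hrefs in html_file that name a local file that does not exist."""
    html_file = Path(html_file)
    collector = AnchorCollector()
    collector.feed(html_file.read_text(errors="replace"))
    broken = []
    for href in collector.hrefs:
        target, _fragment = urldefrag(href)  # drop any "#section" suffix
        parsed = urlparse(target)
        # Assumption of this sketch: remote links (http:, ftp:, ...) and
        # pure "#section" references are skipped rather than checked.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        # Relative links are resolved against the directory of the page.
        if not (html_file.parent / parsed.path).exists():
            broken.append(href)
    return broken
```

Calling broken_local_links on a page returns the list of dangling links found in it; an empty list means every local link resolved to an existing file.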


This directory contains the software to perform analysis of HTML databases. Specifically, the analyzer runs validation, completeness, and consistency tests on the hyperlinks in the database.


We believe that there ought to exist a one-to-one correspondence between hyperlinks and their contents, such that every occurrence of a hyperlink points to only one document (or section of a document). This means that every time a user sees a hyperlink, it will always point to the same section of a document. It also means that each section of a document will have only one hyperlink pointing to it. We hypothesize that such a correspondence is necessary for the user to form a clear internal representation of the connections in the HTML database.
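The one-to-one correspondence described above can be sketched in Python as follows. This is an illustration only, not the analyzer's implementation: it assumes that the "contents" of a hyperlink is the text between <a href=...> and </a>, and the names LinkTextCollector and inconsistencies are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect (link text, href) pairs from the <a href=...> tags of a page."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize runs of whitespace so "Home  page" equals "Home page".
            text = " ".join("".join(self._text).split())
            self.pairs.append((text, self._href))
            self._href = None


def inconsistencies(pages):
    """Report link texts with several targets and targets with several texts."""
    targets_by_text = defaultdict(set)
    texts_by_target = defaultdict(set)
    for html in pages:
        collector = LinkTextCollector()
        collector.feed(html)
        for text, href in collector.pairs:
            targets_by_text[text].add(href)
            texts_by_target[href].add(text)
    many_targets = {t: sorted(h) for t, h in targets_by_text.items() if len(h) > 1}
    many_texts = {h: sorted(t) for h, t in texts_by_target.items() if len(t) > 1}
    return many_targets, many_texts
```

Both returned dictionaries are empty when the pages satisfy the correspondence. A link text that leads to more than one target appears in the first dictionary, and a target reached through more than one link text appears in the second.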


To run the html_analyzer after it has been installed (see the file Installation in this directory), type:

html_analyzer [-val] [-com] [-con] directory [path of repository]

The -val, -com, and -con options turn off the validation, completeness, and consistency tests, respectively. Only a directory name can be given as the target to check; all *.html files within that directory hierarchy will be processed. The path of the temporary repository (default is /var/tmp) can be given if /var/tmp is full or otherwise not desirable. A directory (/html_analyzer) is created inside the repository to store the temporary files generated by execution. The program does not create the temporary repository itself, so it must already exist.
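The file selection and repository rules above can be sketched in Python. This is a rough illustration under stated assumptions, not the analyzer's own code: the function names html_files and work_dir are hypothetical, and raising an error for a missing repository is this sketch's choice, since the README only says that the repository is not created.

```python
from pathlib import Path


def html_files(directory):
    """Return every *.html file below directory, at any depth, in sorted order."""
    return sorted(p for p in Path(directory).rglob("*.html") if p.is_file())


def work_dir(repository="/var/tmp"):
    """Create the html_analyzer subdirectory inside an existing repository."""
    repository = Path(repository)
    if not repository.is_dir():
        # Mirrors the README: the repository itself is never created.
        raise FileNotFoundError(f"repository {repository} does not exist")
    target = repository / "html_analyzer"
    target.mkdir(exist_ok=True)  # only the subdirectory is created
    return target
```

With these helpers, html_files("docs") lists the pages that would be processed for a directory named docs, and work_dir("/scratch") returns /scratch/html_analyzer provided /scratch already exists.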


The libwww2 directory is the modified WWW library that accompanies xmosaic-pre4. The libhtmlw directory is also from the prerelease. Mosaic was developed by Marc Andreessen at the National Center for Supercomputing Applications. This code is available from in the /Web directory. The original WWWLibrary2 library was developed by Tim Berners-Lee at the European Laboratory for Particle Physics (CERN). This code is available from in the /pub/www/src directory. Please see the file Copyright in this directory for more information on the copyrights that apply to these portions of the code.

The Regents of the University of Colorado claim copyright on the other portions of the distribution.

This distribution of the software may be freely distributed, used, and modified but may not be sold as a whole nor in parts without permission of the copyright owners of the parts.


This software is provided as is. The Laboratory for Atmospheric and Space Physics (LASP) and the author are not responsible for support of this distribution.


Development of this software was funded by the NASA Earth Observing System Project under NASA contract NAS5-32392.


Version 0.30 from 0.10

Version 0.10 from 0.02

Version 0.02 from 0.01


Here's a list of things that could be done to improve the html_analyzer:


The purpose of this distribution is to further the development of HTML database creation and maintenance utilities. Comments, questions, and REVISIONS are indeed welcome.

To be added to the html_analyzer mailing list, mail with the subject: html_analyzer add

James E. Pitkow
Graphics, Visualization and Usability Laboratory
Georgia Institute of Technology