[htdig] [ANNOUNCE] ht://Dig 3.2.0b1

From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Feb 04 2000 - 22:05:21 PST

I'm very glad to announce the release of version 3.2.0b1. As the
version number indicates, this is a beta release. We're looking for
feedback on the 3.2 codebase: documentation, performance, features,
suggestions, and of course bugs.

The documentation for the 3.2.0bX series can be found in
The release notes for 3.2.0b1 are at
To download the source, see <http://www.htdig.org/files/htdig-3.2.0b1.tar.gz>

Feedback on the release should be primarily directed to htdig3-dev@htdig.org

-Geoff Hutchison
Williams Students Online

    Release notes for htdig-3.2.0b1, 4 Feb 2000
    This marks the first beta version of the 3.2.0 codebase, over a year
    in the works. Since it has not received as much testing as the 3.1.x
    series, it is *not* recommended for production environments. A full
    description of how to upgrade is provided at

      NOTE: Read this document before upgrading. You have been warned.

      * Fixed a bug in htdig where hopcounts could be calculated
        incorrectly between multiple servers.
      * Fixed a bug that could cause problems with 8-bit characters on
        some systems.
      * Fixed handling of unreachable servers. First, the new
        max_retries attribute allows htdig to attempt multiple
        connections. Secondly, if the server is not available, htdig will
        stop trying to connect.
      * Fixed handling of SGML entities: htdig will still decode them to
        store as single characters in the database, but htsearch now
        encodes them back for compliant results.
      * Rewrote the database formats, allowing room for more sophisticated
        searches and compression of the word database using the new
        attribute wordlist_compress. These changes include the removal
        of the word_list file (db.wordlist) and the addition of the new
        doc_excerpt database.
      * Cleaned up many parts of the code, including the URL and HTML
        parsers. Additionally, on platforms that support it, much of the
        code will be built as shared libraries, which should help memory
        utilization, especially under high load.
      * Removed the modification_time_is_now attribute, which is now on by
        default. This means the time at indexing is taken as the date of
        the document if the server does not return a date.
      * Added the new attribute use_doc_date to use the date specified
        in a META date tag.
      * Merged all heading_factor attributes into one new attribute,
      * As a result of the new database format, all _factor attributes
        (like title_factor and keywords_factor) are now dynamic; you
        do not have to rebuild your database to change the scaling.
      * Changed attributes bad_querystr, exclude_urls,
        limit_urls_to, limit_normalized, http_proxy_exclude to
        allow full regular expressions when the pattern is surrounded
        by [ and ].
      * Changed htsearch fields restrict and exclude to allow regular
        expressions when the pattern is surrounded by [ and ].
      * Added phrase searching support to htsearch--queries enclosed in
        quotes will be checked to ensure the words occur in that exact
        order in the documents.
      * Added the build_select_lists attribute to allow the config
        file to specify <select> form elements in htsearch output as a
        template variable, much like $(SORT) and $(METHOD).
      * Added a regex fuzzy method. This allows searches to include
        regular expressions that match words. The fuzzy method will
        return up to regex_max_words matches.
      * Added a speling [sic] fuzzy method. This tries several simple
        spelling mistakes (like transposed letters and extra letters) to
        find matches. The new attribute minimum_speling_length
        restricts whether small words should be checked, since
        transposing letters in smaller words can give unrelated
        correctly-spelled words.
      * Added support for external transport methods, using the
        external_protocols attribute, an analogue of the
        external_parsers system.
      * Added support for HTTP/1.1, including persistent connections. This
        can be configured using the new attributes
        persistent_connections, head_before_get, and
      * Added support for file:// URLs and support for using the
        mime_types file to decide whether local files are parsable.
      * Added two new formats for variables in htsearch templates,
        $%(var), which escapes the variable for a URL, and $&(var), which
        HTML-escapes the variable as necessary.
      * Added support for reading the list of URLs to index with htdig
        by supplying the command-line option -.
      * Added a flag -m to htdig to index only the files given in the
      * There are many more changes especially to the internal code
        structure, so a huge thank you goes out to everyone who helped
        make this release!
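
As a rough illustration of the speling fuzzy method described in the notes
above, here is a minimal Python sketch. It is not ht://Dig's code: the
function names, the default length of 5, and the exact set of mistakes tried
are assumptions made for this example. It covers the two mistakes the notes
mention (transposed letters and an extra letter) and skips words shorter than
a minimum length.

```python
def speling_variants(word, minimum_length=5):
    """Return candidate corrections for `word` (hypothetical helper).

    Words shorter than `minimum_length` yield no candidates, mirroring the
    idea behind minimum_speling_length. The default of 5 is an arbitrary
    choice for this sketch, not a documented ht://Dig value.
    """
    if len(word) < minimum_length:
        return set()
    variants = set()
    # Transposed letters: swap each pair of adjacent characters.
    for i in range(len(word) - 1):
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # Extra letter: drop each character in turn.
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])
    variants.discard(word)  # swapping identical letters reproduces the word
    return variants


def speling_matches(query, indexed_words, minimum_length=5):
    """Return the candidates that actually occur in the index, sorted."""
    return sorted(speling_variants(query, minimum_length) & set(indexed_words))


index = {"search", "from", "reach"}
print(speling_matches("serach", index))    # transposed letters
print(speling_matches("seearch", index))   # extra letter
print(speling_matches("form", index))      # too short: not checked
print(speling_matches("form", index, 4))   # checked: finds unrelated "from"
```

The last two calls show why a length threshold is useful: once short words
are checked, "form" turns into the unrelated, correctly spelled "from".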

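The difference between the two new template variable formats, $%(var) and
$&(var), can be sketched in Python as follows. This is an illustration only:
the expand function, the handling of unknown variables (replaced with an
empty string), and the exact characters escaped are assumptions for this
example and may differ from what htsearch does.

```python
import re
from html import escape
from urllib.parse import quote

# Matches $(NAME), $%(NAME) and $&(NAME).
_VARIABLE = re.compile(r"\$([%&]?)\((\w+)\)")


def expand(template, variables):
    """Expand template variables (hypothetical helper, not htsearch code)."""
    def substitute(match):
        mode, name = match.group(1), match.group(2)
        value = variables.get(name, "")  # assumption: unknown names expand to ""
        if mode == "%":
            return quote(value, safe="")  # escape for use inside a URL
        if mode == "&":
            return escape(value)          # escape for use inside HTML text
        return value                      # plain $(NAME): inserted unchanged
    return _VARIABLE.sub(substitute, template)


template = '<a href="/search?words=$%(WORDS)">$&(WORDS)</a>'
print(expand(template, {"WORDS": "fish & chips"}))
# prints: <a href="/search?words=fish%20%26%20chips">fish &amp; chips</a>
```

The same value needs different escaping depending on where it lands: inside
a URL the ampersand and spaces are percent-encoded, while in visible HTML
text the ampersand becomes an entity.
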
