BOUNCE htdig: Admin request


owner-htdig@sdsu.edu
Tue, 29 Dec 1998 01:33:34 -0800 (PST)


>From andrew@contigo.com Tue Dec 29 01:33:31 1998
Received: from spartacus (spartacus.a2000.nl [62.108.1.20])
        by sdsu.edu (8.8.7/8.8.7) with ESMTP id BAA20490
        for <htdig@sdsu.edu>; Tue, 29 Dec 1998 01:33:30 -0800 (PST)
Received: from node149c.a2000.nl ([62.108.20.156] helo=albert)
        by spartacus with smtp (Exim 2.02 #4)
        id 0zuvWq-0005w7-00
        for htdig@sdsu.edu; Tue, 29 Dec 1998 10:33:28 +0100
Message-Id: <4.1.19981229095041.040894b0@pop3.demon.nl>
X-Sender: javawoma@pop.javawoman.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.1
Date: Tue, 29 Dec 1998 10:33:21 +0100
To: htdig@sdsu.edu
From: Marjolein Katsma <webmaster@javawoman.com>
Subject: Word sort looping - and solved
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

That subject could have been written only after I solved it. Here's what
happened since it may help others. Report of some detective work (it took
about 24 hours to get to where I'm now...)

OS: BSDI 3.1 (virtual kernel #14)
gcc/g++ 2.7.2.1

This is a *virtual* server where I run two domains, one a virtual domain
mapping to a subdirectory of the main host.
The virtual server gives me root access when I telnet in; with ftp I see
only the root of my home directory. Can take some getting used to: /usr/bin
is something different in both cases; as seen from telnet, there's /usr/bin
and there's /usr/home/<username>/usr/bin . A CGI program run from the web
server can also only "see" the virtual root.

Installed htdig under /usr/home/<username>/usr/local/htdig. Installation
went smoothly (mostly); the only hitch was that htsearch for some reason
was not installed in the cgi-bin but it was easy enough to copy there. Make
is not GNU make, and I didn't want to install an extra piece of software
right now, so I simply went through the Makefiles and made (and commented)
small edits.

To account for the two domains, I simply made separate configuration files
and some separate subdirectories (databases in subdirectories of db, for
instance). By now, I've set it up so I can have separate badwords lists and
even synonyms lists for each. With the excellent documentation that was
easy enough to accomplish.

After first installation, I ran the rundig for one site which currently
still has only a single page (redirecting to the "old" location). No
problem, just a few warnings. I then tested the sample search form, adapted
to point to the site-specific configuration. It didn't work - the error
message made it clear it was looking at paths from the (real) server root,
not the virtual root. So I split up configuration files again: a separate
file for htsearch with paths for the virtual server, the other for
htdig/htmerge/htnotify/htfuzzy. I also changed the Makefile for htsearch to
reflect the different pathing and remade.
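
To make the path mismatch concrete: a CGI program sees paths relative to the
virtual root, while the tools run from telnet see the full path. The sketch
below only illustrates that mapping; the function name to_virtual_path is
made up for this example and is not part of ht://Dig.

```python
import posixpath


def to_virtual_path(real_path, virtual_root):
    """Map a path seen from the real server root to the path a CGI sees.

    Hypothetical helper for illustration: it strips the virtual root prefix.
    """
    real = posixpath.normpath(real_path)
    root = posixpath.normpath(virtual_root)
    if root == "/":
        # No virtual root in effect: both views are identical.
        return real
    if real == root:
        return "/"
    if not real.startswith(root + "/"):
        raise ValueError(f"{real_path!r} is not below {virtual_root!r}")
    return real[len(root):]


# Paths as described in the message; <username> is left as a placeholder.
print(to_virtual_path("/usr/home/<username>/usr/local/htdig/db",
                      "/usr/home/<username>"))
```

The htsearch configuration would carry the shorter, virtual-root form of each
path, and the configuration for the other tools the full form.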

Second test: htsearch now correctly found the right database and
configuration file, and returned the single page.

I was happy, and at this point started fully setting up all the
configuration for the two sites. So now I have two "rundig" scripts, and
four configuration files. (I left configuring the templates for later).

All set - and I ran the rundig script for the other site - and *something*
started looping endlessly. The only way to get out of that was to kill my
telnet session... A message rolled over the screen so fast I could not read
it; I had to fiddle to make a full-screen screen capture to be able to read
what it said: warning:line too long: ignoring - followed by the "line". It
took a while to register that the "line" was actually three lines with the
beginning of a fourth, showing the start of db.wordlist. The same message
(and same "line") repeated endlessly.
Since it seemed reasonable to me that the db.wordlist should be sorted
before further processing, I went digging - how and where was sorting done?
I found that -n and -o options were used as well as -T (but not -t). -T was
not listed in my Unix in a Nutshell...

Next hurdle: In my virtual server situation I actually have access to *two*
sort programs: one in /usr/bin and one in /usr/home/<username>/usr/bin .
Same size, different date. A cmp showed they *were* different. Configure
had picked up the first one. I decided to try the second one. More digging,
found two Makefiles where this was referenced (htcommon and htmerge);
remade the whole thing.
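
For anyone wanting to repeat the cmp check from a script, here is a minimal
sketch. The function name same_contents is made up for this example; the
point is only that a byte-by-byte comparison is needed, since equal size
proves nothing.

```python
import filecmp


def same_contents(path_a, path_b):
    """Return True only if both files hold identical bytes."""
    # shallow=False reads both files instead of trusting file type, size
    # and modification time. The two sort binaries had the same size, so
    # a size check alone would not have told them apart.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

It would be called with the two paths from the message, for example
same_contents("/usr/bin/sort", "/usr/home/<username>/usr/bin/sort") with the
placeholder filled in.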

New test: same result. Ran the rundig for the other (one-page) site again
and now realized one of the warnings there was also significant: warning:
last character not record delimiter. It now seemed that the whole process
here had gone correctly by chance because there was only one page...

Final solution: I tried man sort (yes, I should have done this before....).
What I don't know is which of the two sort programs (or both) this
referred to, but I found a -T parameter. It is used to set the record
delimiter!!! While htdig was using it to set a TMPDIR. Some more digging,
since I couldn't remember where I'd seen TMPDIR set (remember, by now I had
changed *all* Makefiles, some several times, and I had four configuration
files and two scripts...). I found it, and commented it out. Ran rundig
again - and it worked!
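
Why a changed record delimiter produces exactly these two warnings can be
shown with a toy example. This is only an illustration of the idea, not the
actual behaviour of the BSDI sort: the sample data and the "/" delimiter are
invented, and split_records is a made-up helper.

```python
def split_records(data, delimiter):
    """Split data into records; also report if the last one is terminated."""
    records = data.split(delimiter)
    terminated = data.endswith(delimiter)
    if terminated:
        # Drop the empty string that follows the final delimiter.
        records.pop()
    return records, terminated


# Invented sample, one entry per line as a sort program would expect.
wordlist = "apple 1\nbanana 2\ncherry 3\n"

# With newline as the delimiter there are three short records.
print(split_records(wordlist, "\n"))

# With some other delimiter (here "/", purely as an assumption) the whole
# file becomes one huge "line" that does not end in the delimiter.
print(split_records(wordlist, "/"))
```

In the second case the single record spans several real lines, which matches
the "line too long" message showing three lines at once, and the data no
longer ends in the delimiter, which matches "last character not record
delimiter".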

Phew. And I even left out some details ;-) Initial setup took just four
hours (I am the RTFMB type); sorting out the sort problem took the other 20.

Question: is it somehow possible (for configure) to *test* what a parameter
actually does? IOW, is there some way to prevent this? If not, I think it
would be wise to document that TMPDIR is used for the -T parameter of sort
which may not work as expected on all systems...
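
One possible answer, as a sketch only: a build-time probe could run the
candidate sort on a tiny input with -T pointing at a scratch directory and
check that the output is still correctly sorted. This is not how the
ht://Dig configure script works; the function name sort_T_is_tmpdir and the
sample words are assumptions made for this example.

```python
import subprocess
import tempfile


def sort_T_is_tmpdir(sort_cmd):
    """Return True if `sort_cmd -T <dir>` sorts a small sample as expected.

    sort_cmd is a list such as ["/usr/bin/sort"]. A sort that treats -T as
    something other than a temporary directory should fail this check by
    exiting non-zero, producing different output, or hanging.
    """
    sample = "pear\napple\nmango\n"
    expected = "apple\nmango\npear\n"
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            result = subprocess.run(
                sort_cmd + ["-T", tmpdir],
                input=sample,
                capture_output=True,
                text=True,
                timeout=10,  # guards against the endless loop described above
            )
        except (OSError, subprocess.TimeoutExpired):
            return False
    return result.returncode == 0 and result.stdout == expected


if __name__ == "__main__":
    print(sort_T_is_tmpdir(["sort"]))
```

A passing probe does not prove that -T really sets the temporary directory,
only that using it does not break sorting, so documenting the TMPDIR
dependency would still be worthwhile.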

Marjolein Katsma webmaster@javawoman.com
Java Woman - http://javawoman.com/


