BOUNCE htdig: Admin request


owner-htdig@sdsu.edu
Wed, 13 Jan 1999 14:23:21 -0800 (PST)


From andrew@contigo.com Wed Jan 13 14:23:18 1999
Received: from smtp04.primenet.com (daemon@smtp04.primenet.com [206.165.6.134])
        by sdsu.edu (8.8.7/8.8.7) with ESMTP id OAA18596
        for <htdig@sdsu.edu>; Wed, 13 Jan 1999 14:23:18 -0800 (PST)
Received: (from daemon@localhost)
        by smtp04.primenet.com (8.8.8/8.8.8) id PAA14855
        for <htdig@sdsu.edu>; Wed, 13 Jan 1999 15:23:05 -0700 (MST)
Message-Id: <199901132223.PAA14855@smtp04.primenet.com>
Received: from ip-43-215.tus.primenet.com(206.165.43.215), claiming to be "kocgemhz"
 via SMTP by smtp04.primenet.com, id smtpd014801; Wed Jan 13 15:22:58 1999
X-Sender: inmass@pop.primenet.com
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0
Date: Wed, 13 Jan 1999 15:23:08 -0700
To: htdig@sdsu.edu
From: INMASS/MRP <support@inmass.com>
Subject: Indexing error?
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Hello,

I am using htDig through MindSpring. Their techs are also stumped by my
problem and suggested I "ask htDig". I'm hoping someone here can help...<G>.

I am trying to index a specific (/support) directory on our web site and
all of its included subdirectories. htdig indexes if I use the default
start_url of http://wfp14994.w1.com/

However, it only indexes the main directory and none of the subdirectories.
I initially had .htaccess enabled for some sections, so I took those out and
tried again -- still no indexing of the subdirectories.

If I change the start_url to http://wfp14994.w1.com/support/ I get the
following error:

> New server: www.wfp14994.w1.com, 80
> htdig: Run complete
> htdig: 1 server seen:
> htdig: www.wfp14994.w1.com:80 0 documents
>
> htmerge: Unable to open word list file
> '/web/u84/wfp14994/htdig/db/db.wordlist'

The tech folks at mindspring have tried several workarounds which I've
listed at the end of this message. Any ideas? The default config only
contains a few lines:

database_dir: /web/u84/wfp14994/htdig/db
start_url: http://www.wfp14994.w1.com/
limit_urls_to: ${start_url}
exclude_urls: /cgi-bin/ .cgi
max_head_length: 10000
search_algorithm: exact:1 synonyms:0.5 endings:0.1
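
For comparison, a sketch of the /support variant described above. It
assumes the www. form of the host name (as it appears in the htdig output)
and leaves every other attribute as in the default config; only start_url
differs:

# assumption: www. host form, taken from the htdig output above
database_dir: /web/u84/wfp14994/htdig/db
start_url: http://www.wfp14994.w1.com/support/
limit_urls_to: ${start_url}
exclude_urls: /cgi-bin/ .cgi
max_head_length: 10000
search_algorithm: exact:1 synonyms:0.5 endings:0.1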

Following are the solutions already attempted by the techs at mindspring:

>I tried pointing start_url: directly at the index.html file
>in the support/ directory. That didn't work. I also commented out
>limit_urls_to: entirely so that it wouldn't limit it at all. No go there either.

>If I set start_url: http://www.wfp14994.w1.com/
>it works like a charm. I was looking at the output of the index
>from that last one. It listed payroll.htm as one of the pages that
>it indexed. So, I set the start_url to
>start_url: http://www.wfp14994.w1.com/payroll.htm
>
>that worked too. It started at payroll and went from there. So,
>I added a link to support/index.html in index.html so that it
>would be reachable from the main page; I made a period at the
>bottom of the page the link. (I have since renamed the modified
>page to indextest.html and restored the original.) I reindexed
>using
>
>start_url: http://www.wfp14994.w1.com/
>
>It indexed but the indexing didn't seem to include support/
>at all.

Thanks much,

Degan



This archive was generated by hypermail 2.0b3 on Thu Jan 14 1999 - 08:17:18 PST