Re: htdig: Missing Files in Search Results


Jeff Hill (jhill@hronline.com)
Sun, 08 Nov 1998 00:22:13 -0500


Geoff Hutchison wrote:
> > I'm not getting hits in all the documents that I should. If I
> > search using htDig, I miss files that I can get by searching using the
> > exact same word/term with grep. While some of the files in the directory
>
> I guess my first suggestion would be to upgrade to 3.1.0b2 and rebuild
> your databases. It's a pain in the neck, but in previous versions, there's
> been a database corruption problem. I saw a lot of strange stuff in my DB
> disappear when that bug was fixed.

Well, on to debugging, I guess. I upgraded to 3.1.0b2, including
replacing /common and all the other files. I'm using rundig with no
modifications except commenting out htnotify, so we're building
completely new databases.

I've taken a file that was indexed previously and simply copied it to
another file name. The new file has exactly the same contents and exactly
the same permissions. Htdig indexes the old file but not the new one. It
isn't a matter of file age, either: htdig does pick up newer files, since
some of my recent additions are indexed.

I'm indexing via the local file system, with htdig.conf modified to:

                local_urls: http://www.hronline.com/=/httpd/html/
                start_url: http://www.hronline.com/

                max_head_length: 50000
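
For reference, a minimal sketch of how the local_urls setting above is
understood to work: each entry is a URL prefix and a filesystem prefix
joined by "=", and a URL that starts with the first is read from the
matching path under the second. The helper names and the example page
(jobs/index.html) are hypothetical, and this is an illustration in
Python, not htdig's own code.

```python
def parse_local_urls(value):
    """Split a local_urls value into (url_prefix, path_prefix) pairs."""
    pairs = []
    for entry in value.split():
        # Each entry looks like "http://host/=/local/dir/".
        url_prefix, sep, path_prefix = entry.partition("=")
        if sep:
            pairs.append((url_prefix, path_prefix))
    return pairs


def url_to_local_path(url, local_urls_value):
    """Return the local file path for url, or None if no prefix matches."""
    for url_prefix, path_prefix in parse_local_urls(local_urls_value):
        if url.startswith(url_prefix):
            return path_prefix + url[len(url_prefix):]
    return None


# Value taken from the htdig.conf lines above; the page name is made up.
LOCAL_URLS = "http://www.hronline.com/=/httpd/html/"
print(url_to_local_path("http://www.hronline.com/jobs/index.html", LOCAL_URLS))
```

A URL outside the configured prefix yields None here, which corresponds
to a document that would not be read from the local file system.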

Otherwise, everything's pretty much vanilla.

Best I can see, it's arbitrarily skipping files. But of course, it can't
be arbitrary.
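
One way to pin down exactly which files are being skipped is to build
the grep side of the comparison as a list of URLs and check it against
the search results by hand. The sketch below is only an illustration:
the function names are hypothetical, the default prefixes are the ones
from the htdig.conf lines above, and matching is a plain substring test
like an unadorned grep.

```python
import os


def files_containing(root, term, ignore_case=False):
    """Return sorted paths under root whose contents include term."""
    needle = term.lower() if ignore_case else term
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it and keep going
            if ignore_case:
                text = text.lower()
            if needle in text:
                matches.append(path)
    return sorted(matches)


def path_to_url(path, path_prefix="/httpd/html/",
                url_prefix="http://www.hronline.com/"):
    """Map a local path back to the URL it would be indexed under."""
    if not path.startswith(path_prefix):
        return None
    return url_prefix + path[len(path_prefix):]


if __name__ == "__main__":
    # "benefits" is a placeholder search term, not one from this thread.
    for found in files_containing("/httpd/html/", "benefits"):
        print(path_to_url(found))
```

Any URL printed here that never shows up in the search results is a
concrete skipped file to investigate.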

Any suggestions appreciated.

Regards,

Jeff H.


