Re: htdig: Missing Files in Search Results

Jeff Hill
Mon, 09 Nov 1998 01:38:33 -0500

Geoff Hutchison wrote:
> At 12:41 PM -0500 11/8/98, Jeff Hill wrote:
> >and everything (seems to be) indexed.
> OK. Let's just output a list of the URLs in the database. Then you can
> check to see if everything's there.
> Put this in your htdig.conf
> create_url_list: true
> url_list: /name/of/url-list

I didn't know about that test. I tried it and grepped the resulting
url_list, and I find none of the URLs for the files that are missing
from the search results. If they were never dug, they are not likely to
come up in the search results. So, why wouldn't it dig these other
files?

Again, it picks up my file 980522a.html in /httpd/html/news/art/ and in
but when I make an exact copy of this file with exactly the same
permissions and put it in the same directory -- renaming it only -- it
doesn't dig the file. The copy (980522x.html) is linked in
/httpd/html/news/index.html, which is also the only file that links to
980522a.html, so both files are reachable only through that index page.
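
One way to double-check that the two links really look the same is to
print the lines of the index page that mention each file name and
compare them by eye (quoting, case, stray whitespace). This is only a
minimal sketch: show_links is a hypothetical helper name, not part of
ht://Dig, and it assumes the index page is a plain file readable from
the shell.

```shell
# show_links INDEX NAME...
# Hypothetical helper: print every line of INDEX that mentions each NAME.
show_links() {
    index=$1
    shift
    for name in "$@"; do
        echo "== $name"
        # -n prefixes each match with its line number.
        # -F matches the name literally, so "." is not a regex wildcard.
        grep -nF -- "$name" "$index" || echo "(no link found)"
    done
}
```

For the case described here it could be run as
show_links /httpd/html/news/index.html 980522a.html 980522x.html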

As the file names indicate, these are added by date. HtDig is still
indexing a few files here and there throughout the directory up until
9810*, so it doesn't just stop, but seems to skip some files, index
some, and then skip to some more -- in the same directory, same
permissions, same file content.

This is way beyond my abilities to pin down, so I very much appreciate
any help.


Jeff h.

> This will output a file after running htdig with the name specified in
> url_list. You'll probably want to run this through "sort -u" since it will
> give you a list of every URL ht://Dig has seen!
> Then do some grep-ing to see if your test documents have been indexed.
> -Geoff Hutchison
> Williams Students Online
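
The check suggested above (sort the URL list, drop duplicates, then grep
for the test documents) can be sketched roughly as follows. check_urls
is a hypothetical helper name used only for illustration; the first
argument is assumed to be whatever path url_list points to in
htdig.conf.

```shell
# check_urls URL_LIST NAME...
# Hypothetical helper: report whether each NAME appears in the URL list
# that htdig wrote when create_url_list was enabled.
check_urls() {
    list=$1
    shift
    # The list holds every URL htdig has seen, so collapse duplicates first.
    sorted=$(sort -u "$list")
    for name in "$@"; do
        # -F matches the name literally; -q only sets the exit status.
        if printf '%s\n' "$sorted" | grep -qF -- "$name"; then
            echo "found: $name"
        else
            echo "missing: $name"
        fi
    done
}
```

With the placeholder path from the config lines quoted earlier, this
could be run as
check_urls /name/of/url-list 980522a.html 980522x.html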


********* HR On-Line: The Network for Workplace Issues ********
** Ph:416-604-7251 -- Fax:416-604-4708 **

This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:28:46 PST