Jeff Hill (firstname.lastname@example.org)
Fri, 13 Nov 1998 11:43:24 -0500
HtDig is still failing to dig some files, and I'm unable to determine why.
(I tried posting this before, but the message didn't seem to make
it to the list.)
After testing with "create_url_list: true" and "url_list: /name/of/url-list"
in htdig.conf, and grepping that list, I can tell that HtDig is not
indexing certain files, even though they are identical in all respects
to files it does index.
For example, HtDig picks up my file 980522a.html in
and in /httpd/html, but when I make an exact copy of this file with
the same permissions and put it in the same directory -- renaming it
it doesn't dig the file. The copy (980522x.html) is linked in
/httpd/html/news/index.html, the same and only file that links
Both 980522x.html and 980522a.html are only linked through
As the file names indicate, these are added by date. HtDig is still
indexing a few files here and there throughout the directory up until
9810*, so it doesn't just stop, but seems to skip some files, index
some, and then skip to some more -- in the same directory, same
permissions, same file content.
I very much appreciate any help.
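The claim that the copy has the same content and permissions as the original can be double-checked mechanically. The following is only a minimal sketch in Python; the function name and its arguments are hypothetical, and the two paths would be whatever the original and the copy are actually called on disk.

```python
import filecmp
import os
import stat


def same_content_and_mode(path_a, path_b):
    """Return (same_bytes, same_mode) for two files.

    Hypothetical helper for illustration; not part of ht://Dig.
    """
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting size and modification time alone.
    same_bytes = filecmp.cmp(path_a, path_b, shallow=False)

    # S_IMODE keeps only the permission bits (e.g. 0o644).
    mode_a = stat.S_IMODE(os.stat(path_a).st_mode)
    mode_b = stat.S_IMODE(os.stat(path_b).st_mode)
    return same_bytes, mode_a == mode_b
```

If both values come back true, the difference in behaviour is unlikely to lie in the files themselves, which points back at how the files are reached through links.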
Geoff Hutchison wrote:
> At 12:41 PM -0500 11/8/98, Jeff Hill wrote:
> >and everything (seems to be) indexed.
> OK. Let's just output a list of the URLs in the database. Then you can
> check to see if everything's there.
> Put this in your htdig.conf
> create_url_list: true
> url_list: /name/of/url-list
> This will output a file after running htdig with the name specified in
> url_list. You'll probably want to run this through "sort -u" since it will
> give you a list of every URL ht://Dig has seen!
> Then do some grep-ing to see if your test documents have been indexed.
> -Geoff Hutchison
> Williams Students Online
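The check Geoff describes (deduplicate the URL list, then search it for the test documents) can be sketched as follows. This is an illustrative sketch only: it assumes the file named by url_list holds one URL per line, and the function name and the file names passed to it are hypothetical.

```python
from pathlib import Path


def missing_documents(url_list_path, expected_names):
    """Return the expected file names that appear in no URL in the list.

    Assumes one URL per line in the url_list output (an assumption,
    not confirmed by the thread).
    """
    lines = Path(url_list_path).read_text().splitlines()

    # Deduplicate and sort, mirroring `sort -u`.
    seen_urls = sorted({line.strip() for line in lines if line.strip()})

    # Substring match, mirroring a plain `grep name`.
    return [
        name
        for name in expected_names
        if not any(name in url for url in seen_urls)
    ]
```

Calling missing_documents("/name/of/url-list", ["980522a.html", "980522x.html"]) would return the names that htdig never saw, which narrows the problem to link discovery rather than searching.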