Re: htdig: Missing Files in Search Results


Jeff Hill (jhill@hronline.com)
Fri, 13 Nov 1998 11:41:16 -0500


HtDig is still missing files when digging, and I'm unable to determine
why. (I tried posting this before, but the message didn't seem to make
it to the list.)

After testing with "create_url_list: true" and "url_list: /name/of/url-list"
in htdig.conf, and grepping that list, I can tell that HtDig is not
indexing certain files, even though they are identical in all respects
except name.
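
For anyone repeating this check, here is a minimal sketch of the sort and grep step, not the exact commands I ran. The function name check_indexed is made up for illustration, /name/of/url-list is the placeholder path from the config above, and the sketch assumes the list has one URL per line.

```shell
#!/bin/sh
# Sketch: report whether each named document appears in the URL list
# that htdig writes to the file named by url_list in htdig.conf.
# check_indexed is a hypothetical helper, not part of ht://Dig.

check_indexed() {
    list=$1
    shift
    for name in "$@"; do
        # sort -u collapses repeated URLs; grep -F matches the name literally
        if sort -u "$list" | grep -qF -- "/$name"; then
            echo "$name: indexed"
        else
            echo "$name: MISSING"
        fi
    done
}

# Example call (paths and file names are the ones from this message):
#   check_indexed /name/of/url-list 980522a.html 980522x.html
```

A name reported as MISSING only means it is absent from the list; it does not say why htdig skipped it.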

For example, HtDig picks up my file 980522a.html in
/httpd/html/news/art/ and in /httpd/html, but when I make an exact copy
of this file with exactly the same permissions and put it in the same
directory, changing only the name, it doesn't dig the copy. The copy
(980522x.html) is linked in /httpd/html/news/index.html, the same and
only file that links 980522a.html.

As the file names indicate, these are added by date. HtDig is still
indexing a few files here and there throughout the directory up until
9810*, so it doesn't just stop, but seems to skip some files, index
some, and then skip to some more -- in the same directory, same
permissions, same file content.

I very much appreciate any help.

Regards,

Jeff Hill

Geoff Hutchison wrote:
>
> At 12:41 PM -0500 11/8/98, Jeff Hill wrote:
> >and everything (seems to be) indexed.
>
> OK. Let's just output a list of the URLs in the database. Then you can
> check to see if everything's there.
>
> Put this in your htdig.conf
> create_url_list: true
> url_list: /name/of/url-list
>
> This will output a file after running htdig with the name specified in
> url_list. You'll probably want to run this through "sort -u" since it will
> give you a list of every URL ht://Dig has seen!
>
> Then do some grep-ing to see if your test documents have been indexed.
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/

-- 

********* HR On-Line: The Network for Workplace Issues ********
** Ph:416-604-7251 -- Fax:416-604-4708 **
** http://www.hronline.com **
----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-request@sdsu.edu containing the single word "unsubscribe" in
the body of the message.



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:28:48 PST