Re: [htdig] Problem with indexing user homes


Subject: Re: [htdig] Problem with indexing user homes
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Feb 28 2000 - 14:57:57 PST


According to Feczak Szabolcs:
> I searched all over the documentation, the FAQ, and the archive,
> but I don't know what I am doing wrong.
>
> In /etc/htdig/htdig.conf
> I made entries like :
>
> start_url: http://koli.kando.hu/
> local_urls: http://koli.kando.hu/=/pub/www/html/
> local_user_urls: http://koli.kando.hu/=/home/,/public_html/
> limit_urls_to: drama.obuda.kando.hu koli.kando.hu
>
> After that I ran:
>
> drama:/var/spool/htdig# htdig
> drama:/var/spool/htdig# htmerge
> drama:/var/spool/htdig# ls
> db.docdb db.docs.index db.wordlist db.words.db
>
> But it only indexes http://koli.kando.hu and the links that can be
> followed from there; it doesn't index /home/,/public_html/
> as I wanted. Of course it helps if I make a page, linked from the
> html root, that contains the addresses of all the users, but I want to
> know what I did wrong.

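You didn't do anything wrong in your configuration. You're just expecting
htdig to do something it was never designed to do. The local_urls and
local_user_urls attributes only tell htdig where to read a document from
the local filesystem once it already has the URL; they don't make it scan
those directories for new documents.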

The "How it works" document says of htdig that "the program
will act as a regular web user, except that it will follow all hyperlinks
that it comes across." Thus, htdig only follows hypertext links, like a
typical web spider, and does not look at files or directories outside of
this spidering process, unless they are explicitly listed in start_url.
This is done on purpose. When documents are not linked to your main
document tree, or don't have any links at all pointing to them, it's not
expected that spiders will reach them. They are effectively hidden from
view until they appear in some hypertext link. The htdig spider works the
same way.
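
As a rough illustration of the spidering described above, here is a toy
model in Python. It is not htdig's code: the crawl() function and the
link table are made up for this example, and the page names under
koli.kando.hu are hypothetical. It only shows that a page which exists
but is never linked to is not reached unless it is a starting URL.

```python
from collections import deque

def crawl(start_urls, links):
    """Return every URL reachable from start_urls by following links."""
    seen = set(start_urls)
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        # Only URLs found in a fetched page's links are ever queued.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical site: the user page exists, but nothing links to it.
links = {
    "http://koli.kando.hu/": ["http://koli.kando.hu/news.html"],
    "http://koli.kando.hu/news.html": [],
    "http://koli.kando.hu/~user/": [],
}
```

Calling crawl(["http://koli.kando.hu/"], links) returns the root page and
news.html only. The ~user page shows up only if it is added to the
starting list or some reachable page links to it.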

On my system, I automatically generate an HTML document with links to
all the user pages, and list that document's URL in start_url.
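
For anyone wanting to do the same, here is a minimal sketch in Python of
one way to generate such a document. It is not the script used on my
system. It assumes that user pages live in <home>/<user>/public_html and
are served as <base URL>~<user>/, which matches the local_user_urls
mapping quoted above; the function name build_user_index is made up for
this example.

```python
import html
import os
from urllib.parse import quote

def build_user_index(home_root, base_url, subdir="public_html"):
    """Return an HTML page linking to every user who has a web directory.

    Assumes user pages are served as base_url + "~" + username + "/".
    """
    lines = ["<html><head><title>User pages</title></head><body>", "<ul>"]
    for name in sorted(os.listdir(home_root)):
        # Skip accounts that have no public_html directory.
        if os.path.isdir(os.path.join(home_root, name, subdir)):
            href = base_url + "~" + quote(name) + "/"
            lines.append('<li><a href="%s">%s</a></li>'
                         % (html.escape(href), html.escape(name)))
    lines += ["</ul>", "</body></html>"]
    return "\n".join(lines) + "\n"

# Example call (paths are assumptions for the site in question):
#   page = build_user_index("/home", "http://koli.kando.hu/")
```

Write the result somewhere under the document root, for example as a
hypothetical users.html, regenerate it from cron, and add its URL to
start_url next to the main page:

start_url: http://koli.kando.hu/ http://koli.kando.hu/users.html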

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



