Re: [htdig] how it works, question


nets@searchtools.com
Wed, 15 Sep 1999 08:35:04 -0700


If you're using a robot to index pages, it has no way of knowing
the contents of your directory. All it can know about are the pages
it reaches by following links -- the local directory listing is not
available to it.
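
To make that concrete, here is a toy sketch in Python of a robot that
discovers pages only by following links. It is not htdig's actual code:
the SITE dictionary, the example.com URLs, and the crawl() function are
all made up for illustration, and the "web server" is just a dict from
URL to HTML.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(site, start_url):
    """Return the set of URLs reachable from start_url by following links.

    `site` is a hypothetical stand-in for a web server: a dict that maps
    each URL to the HTML served there.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(site.get(url, ""))
        for href in parser.hrefs:
            target = urljoin(url, href)  # resolve relative links
            if target in site and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


SITE = {
    "http://example.com/index.html": '<a href="docs/a.html">A</a>',
    "http://example.com/docs/a.html": '<a href="b.html">B</a>',
    "http://example.com/docs/b.html": "no links here",
    # Present on the server, but no page links to it.
    "http://example.com/hidden/c.html": "orphan page",
}

reached = crawl(SITE, "http://example.com/index.html")
```

The page under hidden/ exists, but since nothing links to it, it never
ends up in `reached`.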

You could make a link page that contains nothing but links to those
pages, and include it with your starting point. Be sure to set the
META ROBOTS tag to NOINDEX,FOLLOW, so the links are followed but the
page itself is not indexed.
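
One way to build such a link page is to generate it from the directory
tree. The following is a minimal Python sketch, assuming the pages are
.html or .htm files under a single document root. The function name
build_link_page and its doc_root and base_url parameters are
hypothetical, so adjust them to your own layout.

```python
import html
import os
from urllib.parse import quote


def build_link_page(doc_root, base_url):
    """Return HTML that links to every .html/.htm file under doc_root."""
    links = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            # Turn the file's path into a URL relative to the site root.
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            url = base_url.rstrip("/") + "/" + quote(rel.replace(os.sep, "/"))
            links.append('<a href="{0}">{0}</a><br>'.format(html.escape(url)))
    return (
        "<html><head>\n"
        # Follow the links on this page, but keep the page out of the index.
        '<meta name="robots" content="noindex,follow">\n'
        "<title>Link page for the indexer</title>\n"
        "</head><body>\n" + "\n".join(links) + "\n</body></html>\n"
    )
```

Write the returned string to a file the web server can serve, and add
that file's URL alongside your starting point.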

Hope that helps,

Avi

At 7:31 AM -0700 9/15/99, Sadhunathan Nadesan wrote:
>hmmmmm .... i have been searching the web site, faqs etc, for how htdig
>actually works, and it doesn't say, although it gives a clue. rather than
>digging in the source, can someone confirm this?
>
>htdig only follows links
>
>
>is that so obvious that everyone assumes it? wasn't obvious to me, if it
>is true. i expected it to recursively search every sub directory under the
>start url looking for all .html or text files. now i am beginning to think
>that perhaps this is a false assumption.
>
>in other words, if i have an index.html page in the start url, and it
>doesn't happen to have any links to the many subdirectories beneath it which
>also have html pages .. none of the other pages get indexed. is that the
>case? if so, perhaps this info ought to be placed in the faq. if not, i
>am still stuck as to why it doesn't find everything under a start url.
>
>the problem being, i have many directories with html pages which are not
>pointed to by any html page on the site, the links are on other servers
>(not necessarily being indexed). so i guess i have to list each directory
>explicitly then???
>
>well, any comments appreciated,
>thank you
>sadhu
>
>
>
>------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig@htdig.org containing the single word unsubscribe in
>the SUBJECT of the message.




This archive was generated by hypermail 2.0b3 on Wed Sep 15 1999 - 08:39:40 PDT