Re: [htdig] excluding file trees from indexing process


Subject: Re: [htdig] excluding file trees from indexing process
From: Torsten Neuer (tneuer@inwise.de)
Date: Thu Dec 02 1999 - 01:15:51 PST


Jens Moellenhoff wrote:
>
> tneuer@inwise.de wrote:
>
> > > As I stated at the beginning of this thread, I want absolutely no search
> > > result showing a directory tree.
> >
> > That's where the robots exclusion standard comes in and that's why you
> > need to customize this default document.
> >
> > Of course, you can also have another tool, gathering the URLs (i.e.
> > documents) to be indexed from the directory structure and include
> > this URL list in the start_url attribute of your Ht://Dig conf.
> >
> > But I'm not sure if this is really required, since any auto-index
> > document which has a <META NAME="robots" CONTENT="noindex,follow">
> > in its header should do just that automatically.
>
> My colleague had that idea with <meta name="robots"...> before; I just
> wanted to check if that's okay with you and others on the list. He also
> said that it might be necessary to put a <meta http-equiv="refresh"...>
> into every index page, because when I put that <meta name="robots"
> content="noindex,follow"> into the index file, it does not show the file
> tree, and therefore the search engine might not know which link to
> follow. With the "refresh" option, I tell the robot where to go next.

The "refresh" tag is a browser-side mechanism for redirecting to a
different page, and only some browsers honor it. It will not work with
most robots.

With the "robots" tag, the robot still parses the document for links and
follows them (unless "nofollow" is given), but it leaves the document
itself out of the search index (if "noindex" is given).
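
To make the noindex/follow behaviour concrete, here is a minimal Python
sketch of how a robot could treat such a page. It is not Ht://Dig's
actual code; the names RobotsMetaParser and process_page are made up for
illustration, and it assumes the robot only looks at <meta name="robots">
and <a href="..."> tags.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the links and the robots meta directives of one HTML page."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def process_page(html):
    """Return (index_page, links_to_follow) for one document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex,nofollow".
    index_page = not ({"noindex", "none"} & parser.directives)
    follow = not ({"nofollow", "none"} & parser.directives)
    return index_page, (parser.links if follow else [])


page = """<html><head>
<META NAME="robots" CONTENT="noindex,follow">
</head><body>
<a href="docs/a.html">a</a>
<a href="docs/b.html">b</a>
</body></html>"""

index_page, links = process_page(page)
print(index_page, links)
```

For the auto-index page above, the sketch reports that the page itself
should not be indexed while both links are still returned for crawling,
so no "refresh" tag is needed to tell the robot where to go.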

However, it is important to have a recent version of Ht://Dig installed
on your system, since previous releases showed some misbehaviour when
handling "robots".

hth,
  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de



