Re: [htdig] excluding file trees from indexing process


From: Torsten Neuer (tneuer@inwise.de)
Date: Tue Nov 30 1999 - 05:17:25 PST


Jens Moellenhoff wrote:
>
> Hello,
>
> This may be just another one of these newbie questions, but how can I
> exclude virtual file trees from being indexed? Whenever I enter the
> keyword "index" in my search form, it returns a lot of hits like
> "Index of folder1/folder2/folder3/" and shows the folder's index when I
> click on one of these hits.
>
> I know this can be avoided (e. g. by using "exclude_urls" or
> "bad_extensions"?), but not how exactly. I searched the mailing list
> database for I don't know how long and read the FAQ, but I don't have a
> clue yet.

If you need the virtual trees to be walked by the indexer (e.g. in order to
fetch some non-HTML documents from them), you cannot use the exclude_urls
directive of Ht://Dig: a URL rejected by exclude_urls is never retrieved,
so the links on it are never followed either. Since the index pages are
generated automatically by your web server, you need to add some indexer
control information to this automatic generation of index documents.
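To see why exclude_urls cannot help here, consider a much-simplified model
of a crawler that, like Ht://Dig, rejects any URL containing one of the
exclude_urls patterns as a substring before fetching it. The site layout,
the URLs and the crawl() function are hypothetical; this is not Ht://Dig's
code, just a sketch of the effect.

```python
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "http://www.example.com/": ["http://www.example.com/folder1/"],
    "http://www.example.com/folder1/": ["http://www.example.com/folder1/report.pdf"],
    "http://www.example.com/folder1/report.pdf": [],
}

def is_excluded(url, exclude_patterns):
    # Simplified exclude_urls rule: reject the URL if any pattern occurs in it.
    return any(pattern in url for pattern in exclude_patterns)

def crawl(start, exclude_patterns):
    """Breadth-first walk that never fetches (or follows links from) excluded URLs."""
    seen, queue, fetched = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        if is_excluded(url, exclude_patterns):
            continue  # rejected before retrieval, so its links stay unknown
        fetched.append(url)
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

print(crawl("http://www.example.com/", []))            # root, listing and PDF
print(crawl("http://www.example.com/", ["/folder1/"]))  # only the root page
```

With the pattern in place the listing is never fetched, so the PDF that is
linked only from it is lost as well.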

A portable approach would be to turn off automatic directory indexing in
the web server and switch to some server-side scripting (server-parsed
HTML, PHP, ASP or a CGI program) that produces the directory listings
(this would also allow you to give them your own design). These listings
should include a proper "robots" meta tag (or contain Ht://Dig-specific
indexer control markup) to steer the dig process.
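As an illustration, here is a hypothetical Python stand-in for such a
listing generator; the function name, the markup and the choice of
"noindex,follow" are assumptions, not anything Ht://Dig prescribes. The
value "noindex,follow" is meant to keep the listing page itself out of the
index while still letting the indexer follow its links into the tree;
check that your Ht://Dig version honours the robots meta tag that way
before relying on it.

```python
import html
import os
import tempfile
from urllib.parse import quote

# Assumed robots value: do not index this listing, but do follow its links.
ROBOTS_META = '<meta name="robots" content="noindex,follow">'

def render_listing(directory, title):
    """Return an HTML directory listing that asks robots not to index it."""
    items = []
    for name in sorted(os.listdir(directory)):
        is_dir = os.path.isdir(os.path.join(directory, name))
        href = quote(name) + ("/" if is_dir else "")   # URL-encode the link target
        label = html.escape(name) + ("/" if is_dir else "")
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return (
        "<html><head>\n"
        f"{ROBOTS_META}\n"
        f"<title>{html.escape(title)}</title>\n"
        "</head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        "<ul>\n" + "\n".join(items) + "\n</ul>\n"
        "</body></html>\n"
    )

if __name__ == "__main__":
    # Demo on a throwaway directory with one subfolder and one file.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "folder3"))
        open(os.path.join(tmp, "report.pdf"), "w").close()
        print(render_listing(tmp, "Index of /folder1/folder2/"))
```

The same idea carries over directly to PHP, ASP or a CGI script: the only
part that matters to the indexer is the meta tag in the page head.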

For the Apache web server, you could also patch mod_autoindex so that the
listings it generates include robots control as well.
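Whatever form such a patch takes, its effect on the output is one extra tag
in the head of each generated listing. The function below is a hypothetical
Python sketch of that transformation applied to an already generated page
(for example in a filter in front of the server); it is not mod_autoindex
code, and its regular-expression matching assumes ordinary, simple listing
markup. (Later Apache releases, 2.2.11 onward, added an IndexHeadInsert
directive to mod_autoindex that inserts text into the head of index pages
without patching, if your server version provides it.)

```python
import re

# Assumed robots value, as for a script-generated listing.
ROBOTS_META = '<meta name="robots" content="noindex,follow">'

def add_robots_meta(page):
    """Insert a robots meta tag right after the first <head> tag of a page."""
    if re.search(r'<meta\s+name="?robots"?', page, re.IGNORECASE):
        return page  # already carries robots control; leave it alone
    match = re.search(r"<head(\s[^>]*)?>", page, re.IGNORECASE)
    if match is None:
        return page  # no <head>; a real filter would need a fallback here
    pos = match.end()
    return page[:pos] + "\n" + ROBOTS_META + page[pos:]

listing = "<HTML><HEAD>\n<TITLE>Index of /folder1/</TITLE>\n</HEAD><BODY></BODY></HTML>"
print(add_robots_meta(listing))
```

The function is idempotent, so running it twice over the same page does
not add a second tag.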

hth,
  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de


This archive was generated by hypermail 2b25 : Tue Nov 30 1999 - 05:30:10 PST