Subject: Re: [htdig] excluding file trees from indexing process
From: Jens Moellenhoff
Date: Tue Nov 30 1999 - 06:33:55 PST


I cannot change the web server settings (I am not an admin), but what about
adding the search term "index" to the "bad_words" file? As far as I
understand it, this will prevent users from seeing the file trees as
search results.

I know that this is not a very elegant solution, but it works; I just
tried it five minutes ago. At least it works perfectly if you do not need
the word "index" as a regular search term in your HTML/PDF/whatever
documents.
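
A toy model of what the bad_words trick does, in plain Python (this is not
htdig's code; it assumes that words listed in bad_words are simply dropped
when the index is built). The URLs and page texts are made up. It also shows
the side effect: "index" becomes unsearchable everywhere, while the listing
pages can still be found through their other words.

```python
import re
from collections import defaultdict

# Toy model only, not htdig's code. Assumption: words in BAD_WORDS are
# skipped while building the index, so they can never match a query.
BAD_WORDS = {"index"}

# Hypothetical pages: one auto-generated listing, one ordinary page.
DOCS = {
    "/folder1/folder2/": "Index of folder1/folder2/ Parent Directory report.pdf",
    "/manual.html": "See the index at the back of the manual.",
}

def tokenize(text):
    """Split text into lower-case runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs, bad_words):
    """Map each word to the set of URLs containing it, skipping bad words."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            if word not in bad_words:
                index[word].add(url)
    return index

def search(index, term):
    """Return the sorted URLs indexed under term (empty if none)."""
    return sorted(index.get(term.lower(), set()))

index = build_index(DOCS, BAD_WORDS)
print(search(index, "index"))   # [] : the listing is gone, but so is manual.html
print(search(index, "report"))  # ['/folder1/folder2/'] : still found via other words
```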

Kind regards,
Jens Moellenhoff wrote:
> Jens Moellenhoff wrote:
> >
> > Hello,
> >
> > This may be just another one of these newbie questions, but how can I
> > exclude virtual file trees from being indexed? Whenever I enter the
> > keyword "index" in my search form, it returns a lot of hits like
> > "Index of folder1/folder2/folder3/" and shows the folder's index when I
> > click on one of these hits.
> >
> > I know this can be avoided (e.g. by using "exclude_urls" or
> > "bad_extensions"?), but not how exactly. I searched the mailing list
> > database for I don't know how long and read the FAQ, but I don't have a
> > clue yet.
> If you need the virtual trees to be walked by the indexer (e.g. in order
> to fetch some non-HTML documents from them), you cannot use the
> exclude_urls directive of Ht://Dig. Since the index is generated
> automatically by your web server, you need to add some indexer control
> information to this auto-generation of index documents.
> A portable approach would be to back off from automatic directory
> indexing by the web server and switch to some server-side scripting
> (server-parsed HTML, PHP, ASP or some CGI) that produces the directory
> listings (this would also allow you to add some design to them). These
> listings should include a proper "robots" meta tag (or be stuffed with
> Ht://Dig-specific indexer control) to control the dig process.
> For the Apache web server, you could also hack mod_autoindex so that
> its listings include robots control.
> hth,
> Torsten
> --
> InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
> Waldhofstraße 14 Tel: +49-4101-403605
> D-25474 Ellerbek Fax: +49-4101-403606
> E-Mail: Internet:
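
A similar toy crawl illustrates Torsten's point about exclude_urls. It
assumes that exclude_urls rejects any URL containing one of the listed
patterns, and that a rejected page is neither indexed nor followed. The site
layout is hypothetical: because every file below the tree carries the
directory path in its URL, and the listing is the only page linking to those
files, excluding the tree also drops the PDFs.

```python
from collections import deque

# Toy model, not htdig's code. Assumption: a URL containing any pattern in
# exclude_urls is rejected, i.e. neither indexed nor followed.

# Hypothetical site: directory listings link to the files beneath them.
LINKS = {
    "/": ["/docs/", "/about.html"],
    "/docs/": ["/docs/manual.pdf", "/docs/old/"],
    "/docs/old/": ["/docs/old/notes.pdf"],
    "/about.html": [],
    "/docs/manual.pdf": [],
    "/docs/old/notes.pdf": [],
}

def crawl(start, exclude_urls):
    """Breadth-first walk that skips URLs containing an excluded pattern."""
    seen, queue, indexed = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        indexed.append(url)
        for link in LINKS.get(url, []):
            if link in seen or any(p in link for p in exclude_urls):
                continue
            seen.add(link)
            queue.append(link)
    return indexed

print(crawl("/", []))          # every page, listings and PDFs included
print(crawl("/", ["/docs/"]))  # ['/', '/about.html'] : the PDFs are lost too
```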

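And a minimal sketch of the script-generated listing Torsten suggests,
written in Python rather than SSI/PHP/ASP. The function names are
hypothetical, and it assumes the indexer honours the standard robots meta
tag: "noindex,follow" is meant to keep the listing page itself out of the
index while still letting the dig follow its links to the files underneath.

```python
import html
import os
from urllib.parse import quote

# Assumption: the indexer honours the robots meta tag. "noindex" keeps this
# page out of the index; "follow" still lets the indexer walk its links.
ROBOTS_META = '<meta name="robots" content="noindex,follow">'

def render_listing(title, entries):
    """Return an HTML listing for the given (name, is_dir) entries."""
    rows = []
    for name, is_dir in sorted(entries):
        suffix = "/" if is_dir else ""
        href = quote(name) + suffix            # URL-encode the link target
        label = html.escape(name) + suffix     # HTML-escape the visible text
        rows.append(f'<li><a href="{href}">{label}</a></li>')
    return "\n".join([
        "<html><head>",
        f"<title>{html.escape(title)}</title>",
        ROBOTS_META,
        "</head><body>",
        "<ul>",
        *rows,
        "</ul>",
        "</body></html>",
    ])

def list_directory(path):
    """Build the listing for a real directory on disk."""
    with os.scandir(path) as it:
        entries = [(e.name, e.is_dir()) for e in it]
    return render_listing(f"Index of {path}", entries)

if __name__ == "__main__":
    print(render_listing("Index of folder1/", [("report.pdf", False), ("sub", True)]))
```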

This archive was generated by hypermail 2b25 : Tue Nov 30 1999 - 08:04:34 PST