Re: [htdig] excluding file trees from indexing process


Subject: Re: [htdig] excluding file trees from indexing process
From: Jens Moellenhoff (Jens.Moellenhoff@partner.bmw.de)
Date: Tue Nov 30 1999 - 06:33:55 PST


Hello,

I cannot change the web server settings (I am not an admin), but what
about adding the word "index" to the "bad_words" file? As far as I
understand it, this keeps searches for "index" from returning the file
trees as results.

I know that this is not a very elegant solution, but it works... I just
tried it 5 minutes ago. At least it works perfectly if you do not need
the word "index" as a regular search term in your HTML/PDF/whatever
files.

Kind regards,
Jens Moellenhoff

tneuer@inwise.de wrote:
>
> Jens Moellenhoff wrote:
> >
> > Hello,
> >
> > This may be just another one of these newbie questions, but how can I
> > exclude virtual file trees from being indexed? Whenever I enter the
> > keyword "index" in my search form, it returns a lot of hits like
> > "Index of folder1/folder2/folder3/" and shows the folder's index when I
> > click on one of these hits.
> >
> > I know this can be avoided (e. g. by using "exclude_urls" or
> > "bad_extensions"?), but not how exactly. I searched the mailing list
> > database for I don't know how long and read the FAQ, but I don't have a
> > clue yet.
>
> If you need the virtual trees to be walked by the indexer (e.g. in order
> to fetch some non-HTML documents from them), you cannot use the
> exclude_urls directive of Ht://Dig. Since the index is generated
> automatically by your web server, you need to add some indexer control
> information to this auto-generation of index documents.
>
> A portable approach would be to back off from automatic indexing by the
> web server and switch to some server-side scripting (server-parsed HTML,
> PHP, ASP or some CGI) that produces the directory listings (this would
> also allow you to add some design to them). These listings should include
> a proper "robots" meta tag (or be stuffed with Ht://Dig specific indexer
> control) to control the dig process.
>
> For the Apache web server, you could also hack mod_autoindex so that
> it includes robots control in the listings it generates.
>
> hth,
> Torsten
>
> --
> InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
> Waldhofstraße 14 Tel: +49-4101-403605
> D-25474 Ellerbek Fax: +49-4101-403606
> E-Mail: info@inwise.de Internet: http://www.inwise.de
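
Torsten's suggestion of a script-generated listing that carries a
"robots" meta tag could look roughly like the sketch below. It is only
an illustration, not something from the thread: the function name
render_listing, the constant ROBOTS_META, and the choice of
"noindex, follow" (do not index the listing page itself, but still
follow its links to the documents) are assumptions, and whether your
Ht://Dig version honors that meta tag should be checked against its
documentation.

```python
import html
import os
from urllib.parse import quote

# Assumed robots policy: keep the listing page out of the index,
# but let the indexer follow the links to the real documents.
ROBOTS_META = '<meta name="robots" content="noindex, follow">'


def render_listing(directory, url_path):
    """Return an HTML listing of `directory`, served under `url_path`.

    Both parameters are hypothetical; a real CGI script would derive
    them from its own configuration or environment.
    """
    if not url_path.endswith("/"):
        url_path += "/"
    title = html.escape("Index of " + url_path)

    items = []
    for name in sorted(os.listdir(directory)):
        if name.startswith("."):
            continue  # skip hidden files
        if os.path.isdir(os.path.join(directory, name)):
            name += "/"  # mark subdirectories with a trailing slash
        href = quote(url_path + name)  # percent-encode spaces etc.
        items.append('<li><a href="%s">%s</a></li>'
                     % (href, html.escape(name)))

    return "\n".join([
        "<html><head>",
        "<title>%s</title>" % title,
        ROBOTS_META,
        "</head><body>",
        "<h1>%s</h1>" % title,
        "<ul>",
    ] + items + [
        "</ul>",
        "</body></html>",
    ])
```

A CGI wrapper would print a "Content-Type: text/html" header and a
blank line, followed by the string returned by render_listing().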



