Re: [htdig] excluding index-pages


Subject: Re: [htdig] excluding index-pages
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue May 23 2000 - 09:05:12 PDT


On 23 May 2000, Andreas Vogt wrote:

> I want htdig to search the links on that index.html, because these are the
> messagexxxx.html, but I don't want to search the text of index.html
> (It's like indexing the whole book and also the index and content pages)
>
> If I add index.html to the exclude patterns, not only the text is gone,
> but also the text of the hyperlinks.

You say that you don't want to *search* the text of index.html, so I would
do exactly that. I would index normally and in the search form use either:

<input type="hidden" name="exclude" value="index.html">

or (more likely to work):

<input type="hidden" name="restrict" value="message">
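
For context, here is a minimal sketch of a complete search form using the
second approach. The action path /cgi-bin/htsearch is only an assumption
about where htsearch is installed on your server, and the value "message"
assumes that all of your message pages have that string in their URLs.

```html
<form method="get" action="/cgi-bin/htsearch">
  <!-- assumed install path above; adjust to your own htsearch location -->
  <!-- only return documents whose URL contains "message" -->
  <input type="hidden" name="restrict" value="message">
  <!-- the query field read by htsearch -->
  <input type="text" name="words" size="30">
  <input type="submit" value="Search">
</form>
```

As long as the URL of index.html itself does not contain "message", the
page is still dug for its links but should not show up in the results.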

One tidbit: when the text of links is indexed, it counts as plain text for
the page it's on, but it counts as a description (i.e. description_factor)
for the page that's the target of the link. So that hyperlink text counts
for the messagexxxx.html pages automatically.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



