Re: [htdig] url_part_aliases


Subject: Re: [htdig] url_part_aliases
From: Torsten Neuer (tneuer@inwise.de)
Date: Sat Sep 09 2000 - 02:55:38 PDT


Jim Cole wrote:
>
> Torsten Neuer's bits of Fri, 8 Sep 2000 translated to:
>
> >This can normally be achieved on the server itself using mod_rewrite on
> >Apache. On other servers I don't know, but I guess that they probably
> >offer similar functionality.
>
> I thought about doing this, but I wasn't convinced it would solve the
> problem. If htdig grabs a page with a bunch of links of the form I am
> trying to avoid, won't it still add those URLs to the database,
> regardless of what mod_rewrite is doing? Or will htdig somehow correct
> those links when it retrieves the corresponding pages?

Well, Ht://Dig will fetch the URLs as it sees them in the HTML, but the
server will then redirect it to another URL, which is then stored in the
search database. The entry for the URL the indexer initially used to
fetch the contents of the actual document is of size 0 and will be
stripped from the search database.
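
To make the sequence concrete, here is a rough Python sketch of the
behaviour described above. It is only a toy model, not Ht://Dig's actual
code: the example.com URLs, the REDIRECTS and DOCUMENTS tables, and the
index() function are all made up for illustration.

```python
# Toy model of the redirect handling described above; not ht://Dig's code.
# All URLs and table contents below are hypothetical.

# What the server (e.g. through a rewrite rule) answers for a requested URL:
# either a redirect target ...
REDIRECTS = {
    "http://example.com/old/page.html": "http://example.com/new/page.html",
}
# ... or a document body.
DOCUMENTS = {
    "http://example.com/new/page.html": "actual page contents",
}


def index(start_urls, max_hops=5):
    """Return {url: body} for the documents that stay in the database."""
    database = {}
    for url in start_urls:
        hops = 0
        # The URL as seen in the HTML only yields an empty (size 0) entry.
        while url in REDIRECTS and hops < max_hops:
            database[url] = ""
            url = REDIRECTS[url]
            hops += 1
        # The redirect target carries the real content.
        database[url] = DOCUMENTS.get(url, "")
    # Entries of size 0 are stripped, so only the target URL remains.
    return {u: body for u, body in database.items() if body}


if __name__ == "__main__":
    print(index(["http://example.com/old/page.html"]))
```

In this model the old-style URL never survives into the result; only the
URL the server redirected to does.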

cheers,

  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de
