Re: [htdig] Excluding URLs with the ? query string character


From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Feb 07 2000 - 12:33:35 PST


According to Manuel Lemos:
> Is there a way to exclude URLs in the search that contain ? query string
> character like most crawlers do?
>
> For instance this URL would not be indexed:
>
> http://www.mysite.com/login.html?user_name=mlemos
>
> But this would be indexed
>
> http://www.mysite.com/login.html

You can add a ? to the exclude_urls attribute in htdig.conf. This won't
strip the query string from URLs that have one; it excludes those URLs
from the index entirely. If htdig also finds links to the same page
without a "?" or query string, it will still index the page.
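
As a rough sketch, assuming your htdig.conf still has the stock value
for this attribute (/cgi-bin/ .cgi; check your own file, since yours
may differ), the line would end up looking something like this:

    # assumed existing patterns, with ? appended as a new pattern
    exclude_urls: /cgi-bin/ .cgi ?

The patterns are separated by spaces, and a URL containing any one of
them is rejected, so the login.html?user_name=mlemos URL above would be
skipped while plain login.html is still indexed.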

See http://www.htdig.org/attrs.html#exclude_urls

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
