Re: [htdig] Exclude_urls (3.1.5).


From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Nov 01 2000 - 09:28:30 PST


According to Sphboc@aol.com:
> documentation says:
> If a URL contains any of the space separated patterns, it will be rejected.
>
> Consider the following:
>
> Exclude_urls: fuseaction=readmessage (in the config file)
> http://www.autobytel.com/content/service/index.cfm?fuseaction=readmessage&m=305&id=4&f=4:
>
> A. Will this url be excluded? (the "exclude" phrase FOLLOWS the ?).
> B. Is the exclusion done PRIOR TO requesting content from the server?
>
> Point being that NOT communicating with the server, when I'm not interested
> in the reply, avoids significant effort for all concerned . . .

Yes, this URL should be excluded by the pattern you give, provided your
"exclude_urls" attribute definition is entered correctly - the attribute
name must be all in lower case. The exclusion is not only done prior
to asking the server for the document in question, but the URL will not
even be queued up if it's excluded. The URLs are tested for validity
against limit_urls_to, exclude_urls, bad_extensions, valid_extensions
and bad_querystr when they're encountered in a document's href, or as
a redirect, and are only queued if they pass these 5 tests.
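
To make the order of operations concrete, here is a rough Python sketch of
"test first, queue only on success". It is not htdig's actual code: the
function names, the CONFIG dictionary and its values are hypothetical, and
plain substring/suffix matching is a simplifying assumption about how each
attribute is applied.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical stand-ins for the five config attributes; the values are
# illustrative only and are not taken from any real htdig.conf.
CONFIG = {
    "limit_urls_to": ["http://www.autobytel.com/"],
    "exclude_urls": ["fuseaction=readmessage"],
    "bad_extensions": [".gif", ".jpg"],
    "valid_extensions": [],   # assumption: an empty list means "no restriction"
    "bad_querystr": [],
}


def passes_filters(url, cfg=CONFIG):
    """Return True only if the URL passes all five tests (simplified model)."""
    parts = urlsplit(url)
    path = parts.path.lower()  # assumption: extensions compared case-insensitively
    if not any(p in url for p in cfg["limit_urls_to"]):
        return False
    # Substring test on the whole URL, so a match after the "?" counts too.
    if any(p in url for p in cfg["exclude_urls"]):
        return False
    if any(path.endswith(ext) for ext in cfg["bad_extensions"]):
        return False
    if cfg["valid_extensions"] and not any(
        path.endswith(ext) for ext in cfg["valid_extensions"]
    ):
        return False
    if any(p in parts.query for p in cfg["bad_querystr"]):
        return False
    return True


def enqueue_links(hrefs, queue):
    """Queue only the links that pass; rejected URLs are never requested."""
    for url in hrefs:
        if passes_filters(url):
            queue.append(url)


queue = deque()
enqueue_links(
    [
        "http://www.autobytel.com/content/service/index.cfm?fuseaction=readmessage&m=30",
        "http://www.autobytel.com/content/service/index.cfm",
    ],
    queue,
)
```

In this sketch the first link is dropped before it ever reaches the queue,
so no request for it would be sent; only the second link is queued.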

BTW, running htdig with -vvv likely would have answered your question
for you.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



