RE: [htdig] htdig


Subject: RE: [htdig] htdig
From: Richard Bethany (richard.bethany@s1.com)
Date: Thu Jan 11 2001 - 19:05:35 PST


Gilles, your suggestion below worked to perfection. I didn't think about
the fact that I only needed a snippet of the whole string to eliminate it.
I ended up using 441 (21 x 21) bad_querystr entries. This will allow the
use of up to 21 menu headings on a page. The whole `rundig` process
finished in about five minutes!
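
For anyone wanting to reproduce this, here is a rough sketch of how the 441 entries could be generated rather than typed by hand. It assumes the `&p=a:b` pattern form discussed in this thread and assumes the attribute takes a whitespace-separated list with backslash line continuations; `bad_querystr_line` and `per_line` are names made up for this illustration, not part of ht://Dig.

```python
def bad_querystr_line(n_items, per_line=7):
    """Build a bad_querystr attribute covering every pair of menu values.

    Hypothetical helper: the '&p=a:b' form comes from this thread, and the
    backslash-continued layout is an assumption about the config syntax.
    """
    values = range(1, n_items + 1)
    patterns = [f"&p={a}:{b}" for a in values for b in values]
    # Group the patterns into rows so the config file stays readable.
    rows = [
        " ".join(patterns[i:i + per_line])
        for i in range(0, len(patterns), per_line)
    ]
    return "bad_querystr: " + " \\\n\t".join(rows)


if __name__ == "__main__":
    line = bad_querystr_line(21)
    print(line.count("&p="))  # 441 patterns for 21 menu headings
    print(line[:60])
```

The output would be pasted into the config file in place of any existing bad_querystr setting.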

Thanks!!!
Richard Bethany
S1 Corporation

-----Original Message-----
From: Gilles Detillieux [mailto:grdetil@scrc.umanitoba.ca]
Sent: Thursday, January 11, 2001 4:01 PM
To: Richard Bethany
Cc: grdetil@scrc.umanitoba.ca; ghutchis@wso.williams.edu;
htdig@htdig.org; Chuck Umeh
Subject: Re: [htdig] htdig

According to Richard Bethany:
> That was my fear as well. For the one link below with eight menu items, I
> need to accept p=1: through p=8: to pick up any/all links in the submenus,
> but I would have to reject the other 40,312 possible combinations of
> values that "p" can have. As you stated, that would be a mite cumbersome
> and, if we had pages with more menu items (we do), it would become
> exponentially more impossible (<-- can something be "more" impossible?
> How about more improbable?) to limit the accepted values.
>
> Does the 3.2 beta release seem pretty stable? Does the regex
> functionality work properly? If so, perhaps I'll give that a shot. If
> not, I suppose I'll just dig around in the code to see if I can find a
> way to get it to do what we need.

The current 3.2 beta release (b2) isn't stable. The latest development
snapshot for 3.2.0b3 is much more so, but IMHO still not quite ready
for prime-time. Ironically, one of the remaining problems is that long,
complex regular expressions seem to be silently failing right now,
so we still need to get to the bottom of that.

However, even if you need to reject 40,312 possible combinations of
values, it doesn't mean you'd need to explicitly list each of those,
as many of them could be covered by the same substring. The current
handling of exclude_urls and bad_querystr does substring matching, so
there's an implied .* on either side of each string you give for these
two attributes. Because any of 1 through 8 can be used as the initial p=
value, it makes the problem more complicated than I assumed, but not
by a huge amount. If I understand correctly, as long as there's only
one menu value specified, it's OK, but if there are two or more, it's
not OK, and only 1 through 8 will appear as possible menu values. Now,
a string of more than two menu values will be matched by a substring of
only two values, so all you need are all possible series of two values,
or 8 x 8 = 64 patterns, &p=1:1 through to &p=8:8. Correct?
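
The counting argument above can be checked with a small sketch. It only imitates plain substring matching in Python and is not ht://Dig code; the query strings, the `id=5` parameter, and the helper names `two_value_patterns` and `is_rejected` are assumptions made up for illustration, based on the `p=1:` form quoted in this thread.

```python
from itertools import product


def two_value_patterns(n_items):
    """Return every '&p=a:b' pattern for menu values 1..n_items."""
    values = range(1, n_items + 1)
    return [f"&p={a}:{b}" for a, b in product(values, repeat=2)]


def is_rejected(query, patterns):
    """Imitate substring matching: reject if any pattern occurs in query."""
    return any(pattern in query for pattern in patterns)


if __name__ == "__main__":
    patterns = two_value_patterns(8)
    print(len(patterns))                           # 64
    # One menu value: no two-value pattern occurs, so the URL is kept.
    print(is_rejected("id=5&p=3:", patterns))      # False
    # Three menu values: the leading '&p=3:1' is enough to reject it.
    print(is_rejected("id=5&p=3:1:7:", patterns))  # True
```

Because the match is on a substring, the pattern for the first two values catches every longer series as well, which is why the list grows as n x n rather than with the number of combinations.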

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


