RE: htdig: indexing a site


Ashley Hindmarsh (ashleyh@yitm.com)
Wed, 9 Sep 1998 16:40:28 +0100


Yes - htdig identifies itself in the User-Agent request header, which a CGI
script sees as the HTTP_USER_AGENT environment variable,
e.g. to test in Perl: $htdig = 1 if ($ENV{'HTTP_USER_AGENT'} =~ /htdig/);

Of course, someone could rig a browser up to pretend to be htdig in order to
get 'free' bandwidth, but I think that it's unlikely!
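
For anyone whose CGI scripts are not in Perl, the same User-Agent test can be
sketched in Python. This is only an illustration: the function name is_htdig
is made up for this example, and it assumes the server passes the header
through as HTTP_USER_AGENT in the usual CGI way. As noted above, the header
can be spoofed, so treat the result as a hint rather than proof.

```python
import os


def is_htdig(environ=os.environ):
    """Return True if the request claims to come from htdig.

    `environ` is any mapping of CGI variables; it defaults to the
    process environment, which is where a CGI script finds them.
    """
    # CGI servers expose the User-Agent request header as HTTP_USER_AGENT.
    user_agent = environ.get("HTTP_USER_AGENT", "")
    # Case-sensitive substring match, mirroring the Perl /htdig/ test.
    return "htdig" in user_agent
```

A script could call is_htdig() once per request and, for example, write a
marker to its own log so that indexing traffic can be separated out later.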

-----Original Message-----
From: owner-htdig@sdsu.edu [mailto:owner-htdig@sdsu.edu] On Behalf Of Colin Viebrock
Sent: 09 September 1998 15:17
To: Adam Crews; htdig@sdsu.edu
Subject: Re: htdig: indexing a site

Also sprach Adam Crews (at 07:00 PM 9/8/98 -0700) ...
>I bill my clients by the amount of bandwidth that they use. I have one
>site that is about 40mb or so.. The indexing of the site is causing a
>large skew in the actual bandwidth that they use. The search engine is a
>"free" benefit of their site. I would like to be able to have htdig simply
>read the html pages from a starting directory and then go from there...

Not in the docs (I don't think) but doesn't htdig identify itself to the
server when requesting files?

If I look at my access logs for www.summerworks.on.ca, I see entries like:

www.summerworks.on.ca - - [09/Sep/1998:10:20:35 -0400] "GET /downloads/app98.pdf HTTP/1.0" 200 74091
www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] "GET /plab-sally.php3 HTTP/1.0" 200 1806
www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] "GET /plab-ken.php3 HTTP/1.0" 200 3762

So, technically, I could exclude all the bandwidth used by
www.summerworks.on.ca to get a count of "real" usage.
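
A rough sketch of that subtraction in Python, assuming the access log is in
Common Log Format as in the entries above (host, ident, user, timestamp,
request, status, bytes). The names CLF_LINE, bytes_by_host and billable_bytes
are invented for this example, and the last sample line, from
client.example.com, is a made-up visitor added for contrast.

```python
import re

# Common Log Format: host ident user [timestamp] "request" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\d+|-)$')

SAMPLE_LOG = [
    'www.summerworks.on.ca - - [09/Sep/1998:10:20:35 -0400] '
    '"GET /downloads/app98.pdf HTTP/1.0" 200 74091',
    'www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] '
    '"GET /plab-sally.php3 HTTP/1.0" 200 1806',
    'www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] '
    '"GET /plab-ken.php3 HTTP/1.0" 200 3762',
    # Hypothetical visitor, not taken from the real log.
    'client.example.com - - [09/Sep/1998:10:21:02 -0400] '
    '"GET /index.html HTTP/1.0" 200 5000',
]


def bytes_by_host(lines):
    """Sum the bytes-sent field per requesting host."""
    totals = {}
    for line in lines:
        match = CLF_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in Common Log Format
        host, _status, size = match.groups()
        # A "-" in the size field means no body was sent.
        sent = 0 if size == "-" else int(size)
        totals[host] = totals.get(host, 0) + sent
    return totals


def billable_bytes(lines, indexer_host):
    """Total bytes sent to every host except the one running the indexer."""
    totals = bytes_by_host(lines)
    return sum(n for host, n in totals.items() if host != indexer_host)
```

Calling billable_bytes(SAMPLE_LOG, "www.summerworks.on.ca") leaves out the
three htdig requests and counts only the other visitor. This only works when
htdig runs from a host that real visitors never browse from; otherwise the
User-Agent field would have to be logged and matched instead.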

________________________________________________________________________
Colin Viebrock Creative Director
cmv@privateworld.com Private World Communications
                                             http://www.privateworld.com

                                                   Your mouse has moved.
                                           Windows must be restarted for
                                              the change to take effect.

----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-request@sdsu.edu containing the single word "unsubscribe" in
the body of the message.




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:27:47 PST