Re: htdig: indexing a site


Colin Viebrock (cmv@privateworld.com)
Wed, 09 Sep 1998 10:17:18 -0400


Also sprach Adam Crews (at 07:00 PM 9/8/98 -0700) ...
>I bill my clients by the amount of bandwidth that they use. I have one
>site that is about 40mb or so.. The indexing of the site is causing a
>large skew in the actual bandwidth that they use. The search engine is a
>"free" benefit of their site. I would like to be able to have htdig simply
>read the html pages from a starting directory and then go from there...

Not in the docs (I don't think) but doesn't htdig identify itself to the
server when requesting files?

If I look at my access logs for www.summerworks.on.ca, I see entries like:

www.summerworks.on.ca - - [09/Sep/1998:10:20:35 -0400] "GET /downloads/app98.pdf HTTP/1.0" 200 74091
www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] "GET /plab-sally.php3 HTTP/1.0" 200 1806
www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] "GET /plab-ken.php3 HTTP/1.0" 200 3762

So, technically, I could exclude all the bandwidth from requests made by
www.summerworks.on.ca (the first field in each entry) to get a count of
"real" usage.
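
For what it's worth, here is a rough Python sketch of that idea. It assumes
the access log is in Common Log Format, as in the entries above, and that
htdig's requests show up under the server's own hostname. The function name
bandwidth_by_source and the client.example.com entry are made up for
illustration. Note that anything else run from the same host would be
counted as indexer traffic too.

```python
import re

# Common Log Format: host ident user [timestamp] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def bandwidth_by_source(lines, indexer_host):
    """Return (indexer_bytes, other_bytes) summed over the log lines."""
    indexer_bytes = 0
    other_bytes = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in Common Log Format
        size = match.group("size")
        sent = 0 if size == "-" else int(size)  # "-" means no body was sent
        if match.group("host") == indexer_host:
            indexer_bytes += sent
        else:
            other_bytes += sent
    return indexer_bytes, other_bytes

sample = [
    'www.summerworks.on.ca - - [09/Sep/1998:10:20:35 -0400] "GET /downloads/app98.pdf HTTP/1.0" 200 74091',
    'www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] "GET /plab-sally.php3 HTTP/1.0" 200 1806',
    'www.summerworks.on.ca - - [09/Sep/1998:10:20:44 -0400] "GET /plab-ken.php3 HTTP/1.0" 200 3762',
    # hypothetical visitor entry, invented for illustration
    'client.example.com - - [09/Sep/1998:10:21:02 -0400] "GET /index.html HTTP/1.0" 200 2048',
]

indexer, real = bandwidth_by_source(sample, "www.summerworks.on.ca")
print("indexer bytes:", indexer)
print("real bytes:", real)
```

To run it against a real log, pass an open file object instead of the
sample list, since the function only needs something it can iterate over
line by line.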

________________________________________________________________________
Colin Viebrock Creative Director
cmv@privateworld.com Private World Communications
                                             http://www.privateworld.com

                                                   Your mouse has moved.
                                           Windows must be restarted for
                                              the change to take effect.



