Re: [htdig] Number of documents in a Web site.


Subject: Re: [htdig] Number of documents in a Web site.
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Feb 22 2000 - 11:59:10 PST


According to Andre LAGADEC:
> For some statistics about our Web site, I have to give the number of
> pages in our Web site.
>
> Can I give the number NNNN of the result of htdig when indexing my web site
> or the number YYYY ?
> + /usr/local/htdig/bin/htdig -a -i -s
> htdig: Run complete
> htdig: 1 server seen:
> htdig: www.my.domain.com:80 NNNN document
> + /usr/local/htdig/bin/htmerge -a -s
> htmerge: Total word count: X
> htmerge: Total documents: YYYY
> htmerge: Total doc db size (in K): Z

That depends on what you want. NNNN is the total number of documents
seen by htdig's spidering, while YYYY, as reported by htmerge, is the
total number retained in the index, which is less than or equal to NNNN.
If it's less, that's because htmerge removed some documents from the
index: they were empty, had noindex tags, were disallowed by robots.txt,
or couldn't be fetched.
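
If you want both figures, and the difference between them, without reading
them off the screen, a small script along these lines might help. It is only
a rough sketch: it assumes the output looks exactly like the lines quoted
above (the format may differ between ht://Dig versions), the numbers in
SAMPLE_LOG are made up for illustration, and count_documents is a
hypothetical helper name, not part of ht://Dig.

```python
import re

# Illustrative log text in the format quoted above; the numbers are made up.
SAMPLE_LOG = """\
htdig: Run complete
htdig: 1 server seen:
htdig: www.my.domain.com:80 120 document
htmerge: Total word count: 4567
htmerge: Total documents: 112
htmerge: Total doc db size (in K): 890
"""


def count_documents(log_text):
    """Return (seen, retained): documents seen by htdig, documents kept by htmerge."""
    seen = 0
    retained = None
    for line in log_text.splitlines():
        # Per-server line, e.g. "htdig: www.my.domain.com:80 120 document".
        # Counts are summed in case more than one server is listed.
        m = re.match(r"htdig:\s+\S+:\d+\s+(\d+)\s+documents?\b", line)
        if m:
            seen += int(m.group(1))
            continue
        # Summary line, e.g. "htmerge: Total documents: 112".
        m = re.match(r"htmerge:\s+Total documents:\s+(\d+)", line)
        if m:
            retained = int(m.group(1))
    return seen, retained


seen, retained = count_documents(SAMPLE_LOG)
print(f"seen by htdig:      {seen}")
print(f"retained in index:  {retained}")
print(f"dropped by htmerge: {seen - retained}")
```

In practice you would capture the real output of the htdig and htmerge runs
to a file and pass its contents to count_documents instead of SAMPLE_LOG.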

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



