Re: [htdig] Number of documents in a Web site.

Subject: Re: [htdig] Number of documents in a Web site.
From: Gilles Detillieux
Date: Tue Feb 22 2000 - 11:59:10 PST

According to Andre LAGADEC:
> For some statistics about our Web site, I have to give the number of
> pages in our Web site.
> Can I give the number NNNN of the result of htdig when indexing my web site
> or the number YYYY ?
> + /usr/local/htdig/bin/htdig -a -i -s
> htdig: Run complete
> htdig: 1 server seen:
> htdig: NNNN document
> + /usr/local/htdig/bin/htmerge -a -s
> htmerge: Total word count: X
> htmerge: Total documents: YYYY
> htmerge: Total doc db size (in K): Z

That depends on what you want. NNNN is the total number of documents
seen by htdig while spidering, while YYYY is the total number htmerge
retained in the index, which may be less than or equal to NNNN.
If it's less, it's because htmerge removed some documents from the index:
they were empty, had noindex tags, were disallowed by robots.txt, or
couldn't be fetched.
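
If you want both figures side by side, something like the following could pull them out of a saved log. This is only a rough sketch: it assumes the output lines look like the ones you quoted ("htdig: NNNN document" and "htmerge: Total documents: YYYY"), and the function names htdig_seen, htmerge_kept and report_counts are made up for illustration; they are not part of ht://Dig.

```shell
#!/bin/sh
# Sketch only: extract the two document counts from saved htdig/htmerge output.
# Assumes the line shapes quoted in this thread; function names are hypothetical.

# Read log text on stdin; print the number from a line such as
# "htdig: 1234 documents" (the "htdig: 1 server seen:" line does not match).
htdig_seen() {
    sed -n 's/^htdig:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*documents\{0,1\}.*$/\1/p' | head -n 1
}

# Read log text on stdin; print the number from a line such as
# "htmerge: Total documents: 1200".
htmerge_kept() {
    sed -n 's/^htmerge:[[:space:]]*Total documents:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Given a log file, print both counts and the difference between them.
report_counts() {
    log=$1
    seen=$(htdig_seen < "$log")
    kept=$(htmerge_kept < "$log")
    if [ -z "$seen" ] || [ -z "$kept" ]; then
        echo "counts not found in $log" >&2
        return 1
    fi
    echo "seen=$seen kept=$kept dropped=$((seen - kept))"
}
```

To use it, you could redirect the output of your two commands (htdig -a -i -s, then htmerge -a -s) into one file, assuming the statistics go to standard output or standard error, and then run report_counts on that file. The "dropped" figure is simply NNNN minus YYYY, i.e. the documents htmerge removed.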

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2b28 : Tue Feb 22 2000 - 12:02:43 PST