[htdig] Real time Indexing

Thu, 27 May 1999 20:10:51 +0530

I want to index all the HTML pages that my corporate users access while
surfing the Internet. I am planning to develop a plugin that will sit
with the web proxy and capture every HTML page before passing it on to
the user. The corporate web administrator can then view these pages by
querying the indexed database. Queries can be run at any time of the day.

Since search engine tools are generally designed to deliver search results
much faster than they index HTML pages, I am not sure which tool to choose
for this requirement. Has anybody tried real-time indexing with htdig? I
would like to know whether htdig is suitable for this requirement. As an
approximation, I would like the system to be able to index up to 5 HTML
pages per second. I also need to keep these pages for 7 days, which will
be around 210K pages.
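
The two figures above can be related with a quick calculation. This is
only a rough sketch: it assumes the 5 pages/sec figure is a peak rate and
the 210K figure is the total over the 7-day window, and the variable names
are illustrative.

```python
# Rough sizing check for the stated requirements.
# Assumption: 5 pages/sec is a peak rate, 210K is the 7-day total.
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 7
PEAK_PAGES_PER_SEC = 5
TOTAL_PAGES = 210_000

retention_seconds = RETENTION_DAYS * SECONDS_PER_DAY

# Average rate implied by 210K pages spread over 7 days.
average_rate = TOTAL_PAGES / retention_seconds

# Pages arriving per day, on average.
pages_per_day = TOTAL_PAGES / RETENTION_DAYS

# Upper bound if the peak rate were sustained for the whole window.
pages_at_sustained_peak = PEAK_PAGES_PER_SEC * retention_seconds

print(f"average rate:       {average_rate:.2f} pages/sec")
print(f"average per day:    {pages_per_day:.0f} pages")
print(f"sustained-peak cap: {pages_at_sustained_peak} pages in 7 days")
```

Under these assumptions the index has to absorb short bursts of 5 pages/sec
while the long-run average stays well below 1 page/sec.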

Raghvendra Varma,
Infosys Technologies Limited
