Re: [htdig] Large Websites Indexing vs Dynamic Database query...pros & cons...


Subject: Re: [htdig] Large Websites Indexing vs Dynamic Database query...pros & cons...
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Sat Dec 16 2000 - 06:53:34 PST


At 3:47 PM +0000 12/16/00, Sanjay Arora wrote:
>If one has a static pages site having 12-15 GB of static web-pages, what
>would be the index size?

It depends on a lot of factors (e.g. whether you use compression, how
much of the excerpts you decide to store), but it would be pretty
large.

>Would HtDig be suitable for such an application? Any pointers to web
>resources for comparisons with other open source search engines?

Perhaps, though you'd need a server platform that can handle
files >2GB in size. Several people do run ht://Dig setups this
large, though in most cases they prefer to split the database into
sections. For example, one such site indexes mailing lists, so each
list has its own database.
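
To illustrate the "one database per section" idea, here is a minimal
sketch, not anyone's actual setup. It groups URLs by their first path
segment and renders one config per group. The attribute names
database_dir, start_url and limit_urls_to are ht://Dig config
attributes; the helper names, the example.com URLs and the
/var/lib/htdig base directory are assumptions made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by first path segment, e.g. one group per mailing list."""
    sections = defaultdict(list)
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        section = parts[0] if parts else "root"
        sections[section].append(url)
    return dict(sections)


def render_config(section, site_root, base_dir="/var/lib/htdig"):
    """Render a per-section config; base_dir is a hypothetical location."""
    lines = [
        f"database_dir: {base_dir}/{section}",   # separate database files
        f"start_url: {site_root}/{section}/",    # crawl only this section
        "limit_urls_to: ${start_url}",           # stay inside the section
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    urls = [
        "http://example.com/htdig/0001.html",
        "http://example.com/htdig/0002.html",
        "http://example.com/htdig-dev/0042.html",
    ]
    for name in sorted(group_by_section(urls)):
        print(f"# {name}.conf")
        print(render_config(name, "http://example.com"))
```

Each generated file could then be indexed on its own (htdig accepts a
config file via -c), which keeps every individual database well below
the size of a single combined index.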

>Would having a Dynamically generated site be a better option? What about
>server load issues of a search engine vs dynamically served pages? Can
>somebody please guide me to resources for further reading on this subject?

It depends a lot on what you mean by dynamic and how much traffic you
expect. Dynamic content certainly requires server horsepower, but you
can buy enough hardware to handle it. In any case, "dynamically
generated" and "search engine" are not mutually exclusive.

As for further reading, there's the excellent SearchTools site:
<http://www.searchtools.com/>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



