[htdig] Speed of Indexing and size of Index


Subject: [htdig] Speed of Indexing and size of Index
From: amit tewari (amit_tewari@yahoo.com)
Date: Tue Jun 06 2000 - 11:23:28 PDT


Hi All,

I am trying to evaluate htDig for indexing speed,
index size, and search speed.

I tried to index a directory that holds around 2080
documents with a total size of around 1 GB.

It took around 50 minutes to index the documents on
a Sun Ultra SPARC 10 with around 1.5 GB of swap and
500 MB of physical RAM. I find that too slow, especially
since I am indexing local files (through the file
system, not through HTTP). On top of that, I never saw
any I/O wait while indexing was going on.

The index directory came to about 300 MB (about 1/3
the size of the documents). I find that too big,
especially compared with some other engines, which
produce an index of about 1/10 of the document size.
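
As a rough sanity check on the figures above, here is a small sketch that turns them into rates and ratios. The inputs are the approximate numbers quoted in this message; treating 1 GB as 1024 MB is an assumption, and the function and key names are made up for illustration only.

```python
# Approximate figures quoted in this message ("around ..." in each case).
DOC_COUNT = 2080        # number of documents indexed
CORPUS_MB = 1024.0      # "around 1 GB"; assumes 1 GB = 1024 MB
INDEX_MB = 300.0        # size of the index directory
ELAPSED_MIN = 50.0      # wall-clock time for the indexing run


def indexing_stats(doc_count, corpus_mb, index_mb, elapsed_min):
    """Derive throughput and index-to-corpus ratio from the raw figures."""
    elapsed_s = elapsed_min * 60.0
    return {
        "mb_per_s": corpus_mb / elapsed_s,             # indexing throughput
        "docs_per_s": doc_count / elapsed_s,           # documents per second
        "avg_doc_kb": corpus_mb * 1024.0 / doc_count,  # mean document size
        "index_ratio": index_mb / corpus_mb,           # index size / corpus size
    }


stats = indexing_stats(DOC_COUNT, CORPUS_MB, INDEX_MB, ELAPSED_MIN)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the run works out to roughly 0.34 MB/s (about 0.7 documents per second, at an average of about 500 KB per document), and the index is about 0.29 of the corpus size, against the 0.10 quoted for the other engines.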

I wonder if I am missing something?

Please find attached the output of htdig -v.

######Contents of htdig.conf file BEGIN##############

database_dir: /ora05/index/index
start_url: http://dev2.mydomain.com/
local_urls: http://dev2.zantaz.com=/uhome/atewari/tmp/Docs/
local_urls_only: true
local_default_doc Index.lst default.html default.htm index.html index.htm
limit_urls_to: ${start_url} /uhome/atewari/tmp/Docs/
exclude_urls: /cgi-bin/ .cgi
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
                .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi

maintainer: unconfigured@htdig.searchengine.maintainer
max_head_length: 100
max_doc_size: 500000000
no_excerpt_show_top: true
search_algorithm: exact:1

######Contents of htdig.conf file END##############



log.txt




This archive was generated by hypermail 2b28 : Tue Jun 06 2000 - 09:13:16 PDT