RE: [htdig] Segfault indexing a site with 3.2.0b2


Subject: RE: [htdig] Segfault indexing a site with 3.2.0b2
From: NEPOTE Charles (Neuilly Gestion) (charles.nepote@cetelem.fr)
Date: Mon May 29 2000 - 02:33:20 PDT


Geoff Hutchison wrote:

> At 11:41 AM +0200 5/24/00, NEPOTE Charles (Neuilly Gestion) wrote:
> >Because of the bug I submitted I need to use
> >wordlist_compress: false.
> >Is that the reason db.words.db grows so much?
>
> Yes. Remember that the most frequently requested feature was phrase
> searching (and near searching). In order to do this properly, you
> really have to store all the word locations instead of just one copy
> of a word per document.

All right. Phrase and near searching are important features.
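Just to make Geoff's point concrete for myself, here is a toy sketch (Python, nothing to do with ht://Dig's actual Berkeley DB layout) of why phrase searching forces the index to store every word location rather than one entry per word per document:

```python
def index_one_per_doc(docs):
    """One entry per (word, document): compact, but no positions,
    so phrase/near queries are impossible."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return index

def index_all_locations(docs):
    """Every occurrence with its position: much larger on disk,
    but enables phrase and near searching."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.split()):
            index.setdefault(word, []).append((doc_id, pos))
    return index

def phrase_match(index, doc_id, w1, w2):
    """True if w1 is immediately followed by w2 in doc_id."""
    p1 = {p for d, p in index.get(w1, []) if d == doc_id}
    p2 = {p for d, p in index.get(w2, []) if d == doc_id}
    return any(p + 1 in p2 for p in p1)
```

With the positional index, a two-word phrase query is just a check that the positions are adjacent; without positions there is nothing to check, which is why db.words.db has to grow.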

 
> You mentioned that you are running on Linux. Can you give me the
> exact versions of ht://Dig (i.e. are you running a snapshot or the
> 3.2.0b2 release), your compiler, and your Linux distribution?
>
> I have never seen problems with the compression before, so we would
> like to work this out.

[after many tests]
The segfault with wordlist_compress: true is half-solved:
I was using the egc++ RPM instead of the libstdc++-devel RPM.
Now the crash comes near the end of the indexing process, not at the beginning.

[What I have done]
I am running ht://Dig 3.2.0b2, downloaded from:
http://www.htdig.org/files/htdig-3.2.0b2.tar.gz

I am using Linux Mandrake 7.0, freshly installed (installed only to test this
ht://Dig beta, with no particular modifications).

I:
-- untarred the archive
-- ran ./configure
-- it reported that fstream.h was missing and asked me to install libstdc++
-- installed the libstdc++-devel package (which contains fstream.h)
-- ran ./configure again
-- make
-- make install
-- created directories and symlinks so the install would work
-- rundig -v -s | tee /opt/www/var/htdig/xxx01.txt
-- [waited more than 10 hours, and then:]
54211:54385:5:http://xxx.zzz.fr/qqq.htm
  DMA memory shortage. Temporaly falling back on virtual DMA
FAT bread failed

Note that it indexed about 20,000 documents in the first hour; after that, the
more it indexes, the slower it gets, and disk usage is quite high after one hour.
The machine has 128 MB of RAM; I set wordlist_cache_size: 50000000
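For the record, the relevant lines of my configuration are roughly as follows (these are the two attributes discussed above, with the values I used; the rest of my htdig.conf is stock):

```
# htdig.conf fragment -- only the attributes relevant to this report
wordlist_compress:   true
wordlist_cache_size: 50000000
```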

 
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/



This archive was generated by hypermail 2b28 : Mon May 29 2000 - 00:23:35 PDT