Re: [htdig] Htmerge: "Deleted, invalid"


Subject: Re: [htdig] Htmerge: "Deleted, invalid"
From: David Adams (D.J.Adams@soton.ac.uk)
Date: Mon Jul 24 2000 - 03:04:11 PDT


Quoting Gilles Detillieux <grdetil@scrc.umanitoba.ca>:

> According to David Adams:
> > I use the standard MIPSpro compiler. The script I use (thanks to my former
> > colleague James Hammick) to set up the Makefile is:
> >
> > #!/bin/sh
> > CFLAGS="-woff all -O2 -mips4 -n32 -DHAVE_ALLOCA_H" ; export CFLAGS
> > CPPFLAGS="-woff all -O2 -mips4 -n32 -DHAVE_ALLOCA_H" ; export CPPFLAGS
> > LDFLAGS="-mips4 -L/usr/lib32 -rpath /opt/local/htdig-3.1.5/lib";
> > export LDFLAGS
> > ./configure --prefix=/opt/local/htdig-3.1.5 \
> > --with-cgi-bin-dir=/opt/local/htdig-3.1.5/cgi-bin \
> > --with-image-dir=/opt/local/htdig-3.1.5/graphics \
> > --with-search-dir=/opt/local/htdig-3.1.5/htdocs/sample
> >
> > A lot of that is site-specific, and the "-rpath <directory>" option is only
> > needed because the compression library is not in a standard place on the
> > machine on which htdig is run.
> >
> > The "-woff all" option suppresses most warning messages. I will remove it,
> > recompile htdig and send the result directly to Gilles, it might contain a
> > clue.
>
> As Sinclair mentioned, 'you need to have the 2.95.2 gcc and the latest
> gnu "make".' I don't know that anyone has ever gotten ht://Dig to work
> with SGI's own compiler. In fact, we got a lot of reports from folks
> who couldn't even get it to compile.
>
> If you're really determined to get to the bottom of this and make it work
> with the SGI compiler, I wish you well, but I doubt I can help much.
> I looked at the output you sent me, and didn't really see any red
> flags pointing to an obvious problem. I know that the Serialize and
> Deserialize functions for the db.docdb records can be a tad finicky, so
> that would probably be a place to look. There could also be problems
> with incorrect assumptions about word sizes, e.g. if the SGI compiler
> has 64-bit long ints. I'd also look at the db.wordlist records (they're
> ASCII text) before and after htmerge, to see if htdig is actually telling
> htmerge to remove some of these documents, or if htmerge is deciding to
> do so on its own.
>
> For the time being, the ht://Dig code hasn't had much of a workout on
> non-GNU compilers, so it doesn't seem to do well on them. If you can
> help remedy that, great. If you want to get the package working as
> quickly and easily as possible, I'd suggest trying the GNU C and C++
> compilers.
>
> --
> Gilles R. Detillieux E-mail: <grdetil@scrc.umanitoba.ca>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
>

I have been using htdig (3.1.2 and then 3.1.5) on an IRIX system for about a
year and I have been very pleased with it. I would say that we've given it a
good workout here. The problem with the "Deleted, invalid" messages only
occurs with a second, relatively new search index.

The first index is made from a single run of htdig covering 33 servers, all in
the local domain, and on this week's initial dig htmerge reports 49,233
documents and not a single "Deleted, invalid".

The second index is made from two runs of htdig covering a total of 969 (yes,
969!) servers using a proxy. Htmerge reports a mere 3,096 documents and 86
"Deleted, invalid".

I have looked at the db.wordlist files (which are written to only by htdig - is
that right?) and it would appear that htdig is flagging the pages for htmerge
to delete and is not finding any words in them.
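
A rough way to make that check repeatable is to tally the lines in db.wordlist
by their first character, once before and once after htmerge. The Python sketch
below is only an illustration: it assumes the file is line-oriented text in
which word entries start with a letter or digit and per-document marker lines
start with a punctuation character. The function name and the sample lines are
hypothetical, not taken from a real db.wordlist.

```python
from collections import Counter

def tally_line_kinds(lines):
    """Count lines by their first character.

    Assumption (not confirmed): word entries start with a letter or digit,
    while per-document marker lines start with a punctuation character.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        first = line[0]
        # Group all word entries together; keep each marker character separate.
        counts["word" if first.isalnum() else first] += 1
    return counts

# Hypothetical sample; the real file contents may look different.
sample = [
    "+12",
    "search i:12 l:3 w:1",
    "index i:12 l:9 w:1",
    "-13",
    "+14",
]
print(tally_line_kinds(sample))
```

For a real file, the same function can be given an open file object, since it
only iterates over lines. A marker count that changes between the two runs
would suggest that htmerge, rather than htdig, is making the decision.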

I can advance these theories:

    It is not a bug, but is due to the use of a proxy. (I use a proxy
    because without one, a portion of the sites on any run of htdig were
    found to be not responding or even unknown. With a proxy, htdig appears
    to have no such problems.)

    It is a bug due to the use of a proxy.

    It is a bug which only shows when compiled under IRIX.

    It is a bug which only occurs when there are many different servers.

I intend to re-build the second index using htdig -vvv and perhaps learn
something.
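
To help choose between these theories, the "Deleted, invalid" lines from a
saved htmerge run could be grouped by server. The sketch below assumes that
each such line also carries the document's URL somewhere on the same line; the
exact layout of the message is not assumed, only that a URL can be found with a
regular expression. The function name and the sample log lines are made up for
illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the first http:// or https:// URL on a line.
URL_RE = re.compile(r"https?://\S+")

def deleted_by_host(log_lines, phrase="Deleted, invalid"):
    """Count lines containing `phrase`, grouped by the host of their URL."""
    counts = Counter()
    for line in log_lines:
        if phrase not in line:
            continue
        match = URL_RE.search(line)
        host = urlsplit(match.group(0)).hostname if match else None
        counts[host or "(no URL found)"] += 1
    return counts

# Hypothetical log lines; the real htmerge output format may differ.
sample_log = [
    "Deleted, invalid: ID: 41 URL: http://www.example.org/a.html",
    "Deleted, invalid: ID: 57 URL: http://www.example.org/b.html",
    "Deleted, invalid: ID: 90 URL: http://other.example.net/",
    "Some unrelated line",
]
for host, n in deleted_by_host(sample_log).most_common():
    print(host, n)
```

If the 86 deletions cluster on a handful of hosts, that would point towards
those servers or the proxy's handling of them; an even spread would make a
compiler or platform problem look more likely.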

--
David Adams
<D.J.Adams@soton.ac.uk>
Computing Services
Southampton University




This archive was generated by hypermail 2b28 : Sun Jul 23 2000 - 17:02:01 PDT