Re: [htdig] has anybody seen this before ?


Subject: Re: [htdig] has anybody seen this before ?
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Sep 11 2000 - 14:37:07 PDT


According to Tod Thomas:
> I'm getting this very infrequently but the message is curious. Do I
> have a problem with my db?
>
> DB2 problem...: /opt/app/htdig/common/word2root.db: Error 0
> DB2 problem...: /opt/app/htdig/common/word2root.db: file size not a
> multiple of the pagesize

This is one of the databases used by the "endings" fuzzy match algorithm.
You can rebuild it using "htfuzzy endings". If the error persists after
that, you may have a problem with the DB package built on your system.

> Right now I'm running both htdig and htmerge nightly in a shell script
> with the -a flag. Once both stages are complete I just mv the .work
> files over the existing db files. Is this the wrong thing to do? Is
> there a better way?

This should be fine. The only snag is that if you want to do an update of
an existing database with the -a flag, rather than reindexing everything
from scratch, you will need the .work copies of db.wordlist and db.docdb
around before starting another htdig -a. The db.wordlist isn't used
by htsearch anyway, so it can just stay with the .work suffix on it.
The db.docdb file is used by htsearch, as well as htdig/htmerge, so you
should either copy db.docdb.work to db.docdb after htmerge -a is done,
and keep both copies around, or move it to db.docdb, and copy it back
to db.docdb.work just before beginning the next htdig -a.

See also the contrib/examples/rundig.sh script.
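
A minimal sketch of the file shuffling described above, for illustration
only. The DBDIR default and the function names (prepare_work, publish,
run_update) are assumptions, not taken from rundig.sh; set DBDIR to the
database_dir from your own configuration. It assumes every .work file
other than db.docdb.work and db.wordlist.work can simply be moved into
place.

```shell
#!/bin/sh
# Sketch only: DBDIR and the function names are hypothetical.
DBDIR="${DBDIR:-/opt/app/htdig/db}"

# Before "htdig -a": make sure db.docdb.work exists, restoring it from
# the live db.docdb if an earlier run moved it away.
prepare_work() {
    if [ ! -f "$DBDIR/db.docdb.work" ] && [ -f "$DBDIR/db.docdb" ]; then
        cp -p "$DBDIR/db.docdb" "$DBDIR/db.docdb.work"
    fi
}

# After "htmerge -a": put the new files where htsearch looks for them.
publish() {
    # Copy rather than move, so the .work copy survives for the next update.
    cp -p "$DBDIR/db.docdb.work" "$DBDIR/db.docdb" || return 1
    for f in "$DBDIR"/*.work; do
        case "$f" in
            # db.docdb.work was copied above; db.wordlist.work is not
            # read by htsearch, so it keeps its .work suffix.
            */db.docdb.work|*/db.wordlist.work) continue ;;
        esac
        if [ -f "$f" ]; then
            mv -f "$f" "${f%.work}"
        fi
    done
}

# One nightly update pass; stops at the first step that fails.
run_update() {
    prepare_work &&
    htdig -a &&
    htmerge -a &&
    publish
}
```

Calling run_update from the nightly job would then replace the plain mv
of all the .work files. Because publish only runs if both htdig and
htmerge succeed, a failed run leaves the live database untouched.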

> Could this be a reason some users are complaining
> that content they know exists doesn't show up in the search? Any ideas
> would be appreciated.

There can be many reasons why documents aren't being indexed, or not
showing up in searches. A problem with the endings database would only
cause this if they were counting on the fuzzy matching to get a match.
It shouldn't prevent exact matches from working.

More likely, the documents in question aren't even being indexed.
The most common cause of this is that none of the documents in your
start_url lead, either directly or indirectly, to these documents
through HTML hypertext links, which is the only means htdig uses to
spider through and index your pages. Another possibility is that the
documents are being disallowed or excluded for any of a number of reasons.

See http://www.htdig.org/FAQ.html#q4.1
    http://www.htdig.org/FAQ.html#q4.15
    http://www.htdig.org/FAQ.html#q5.1
and http://www.htdig.org/FAQ.html#q5.18

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Mon Sep 11 2000 - 14:39:14 PDT