Re: [htdig] Using a different program for digging


Subject: Re: [htdig] Using a different program for digging
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Sep 06 2000 - 15:07:36 PDT


According to Luis Henrique Cassis Fagundes:
> I need a search engine for a heavily loaded website with a lot of
> information, and I'd like to use htdig. The problem is that the texts to
> be indexed are not in pages, they're in an Oracle database, so htdig
> can't index them. I want to write a program (which I believe will be
> much simpler than htdig itself) to read the database and generate
> db.docdb and db.wordlist, so htmerge would create the word database as
> if it were from the website, as I want.
> For that I need the specification of these two files, which I didn't
> find on the site. Has anyone developed something like this before, or
> does anyone know where to find these specifications?

We don't really have any documentation that gives these specifications,
and the specifications do tend to change from time to time, so the source
code is really the only specification to go on.

I can point out a few problems with your approach, though:

1) The database specs are changing quite substantially from 3.1.x to
3.2.x, so by developing for 3.1.x you're hitching your cart to a dying
horse. In 3.2.x, there is no longer a db.wordlist, htmerge is only for
merging databases, and htdig will create the db.docdb, db.docs.index
and db.words.db files itself.

2) The results of an htsearch search are supposed to be web pages,
each with its own URL. I don't know how you'd end up interfacing
htsearch to your database if it doesn't have a web interface of some sort.

If it does have a web interface, why not use htdig to index it through
that interface? You do mention that it is a website, so why do you feel
htdig can't index its contents? It is capable of dealing with dynamic
content, and isn't restricted to static pages.

The usual solution to indexing a database with htdig really comes down
to developing your own web interface to the database, if you don't already
have one, or to using the DBMS's own search facilities instead of htdig.
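
As a rough illustration of the web-interface approach, here is a minimal
sketch in Python. Everything in it is an assumption made for the example:
the "articles" table, the /article/<id> URL scheme, port 8000, and the use
of an in-memory SQLite database as a stand-in for Oracle. The point is only
that each record gets its own URL and that an index page links to all of
them, so a crawler such as htdig can reach every record by following links.

```python
import html
import re
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def open_demo_db():
    """Create an in-memory SQLite table (hypothetical stand-in for Oracle)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (id, title, body) VALUES (?, ?, ?)",
        [
            (1, "First article", "Text stored in the database."),
            (2, "Second <article>", "More text & markup-sensitive characters."),
        ],
    )
    conn.commit()
    return conn


def render_index(conn):
    """Return an HTML page that links to every record."""
    rows = conn.execute("SELECT id, title FROM articles ORDER BY id").fetchall()
    links = "\n".join(
        '<li><a href="/article/%d">%s</a></li>' % (row_id, html.escape(title))
        for row_id, title in rows
    )
    return (
        "<html><head><title>All articles</title></head>"
        "<body><ul>\n%s\n</ul></body></html>" % links
    )


def render_article(conn, article_id):
    """Return the HTML page for one record, or None if it does not exist."""
    row = conn.execute(
        "SELECT title, body FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return None
    title, body = (html.escape(value) for value in row)
    return (
        "<html><head><title>%s</title></head>"
        "<body><h1>%s</h1><p>%s</p></body></html>" % (title, title, body)
    )


def route(conn, path):
    """Map a request path to a (status, page) pair."""
    if path == "/":
        return 200, render_index(conn)
    match = re.fullmatch(r"/article/([0-9]+)", path)
    if match:
        page = render_article(conn, int(match.group(1)))
        if page is not None:
            return 200, page
    return 404, "<html><body>Not found</body></html>"


def make_handler(conn):
    """Build a request handler class bound to the given connection."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, include_body):
            status, page = route(conn, self.path)
            data = page.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            if include_body:
                self.wfile.write(data)

        def do_GET(self):
            self._reply(include_body=True)

        def do_HEAD(self):
            self._reply(include_body=False)

    return Handler


def serve(conn, port=8000):
    """Serve the pages until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), make_handler(conn)).serve_forever()
```

Calling serve(open_demo_db()) starts the server. htdig's start_url
attribute could then point at the index page (http://localhost:8000/ under
the assumptions above), and each search result would carry the URL of the
record it came from.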

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
