Re: [htdig] Using a different program for digging


From: Luis Henrique Cassis Fagundes (lhfagund@ig.com.br)
Date: Thu Sep 07 2000 - 01:10:33 PDT


        Hi,
        Maybe I should have explained why I want to do this, since I know the idea
sounds very strange :-).
        The fact is that all the pages are static, but if a page does not
exist, the server will create it from the database. We have about
250,000 old articles in a database backup and we want a search that can
reach these articles, and we are studying the best way to do this.
Restoring the backup and creating links to the pages so that htdig can
crawl them is not a good solution, because if we change the layout of
the pages tomorrow, we would crash our server regenerating all of them
(we publish about 1000 articles a day). Another problem is that we don't
want to index words that are on the page but not in the article.
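
        As an illustration of that last requirement, here is a minimal sketch
in Python of keeping only the article text from a generated page. It assumes
the page template wraps the article in two marker comments; the marker
strings and the function name are hypothetical and are not part of htdig.

```python
import re

# Hypothetical marker comments that the page template would place
# around the article body (an assumption, not an htdig feature).
START = "<!-- article-start -->"
END = "<!-- article-end -->"


def article_text(html: str) -> str:
    """Return only the text found between the two markers, tags stripped."""
    begin = html.find(START)
    end = html.find(END, begin + len(START)) if begin != -1 else -1
    if begin == -1 or end == -1:
        # No marked article: return nothing rather than the whole page.
        return ""
    body = html[begin + len(START):end]
    text = re.sub(r"<[^>]+>", " ", body)  # crude tag removal, enough for a sketch
    return " ".join(text.split())         # collapse runs of whitespace


page = (
    "<html><body><div>Menu Sports</div>"
    "<!-- article-start --><h1>Title</h1><p>Body text.</p><!-- article-end -->"
    "<div>Footer</div></body></html>"
)
print(article_text(page))
```

Words from the menu and the footer never reach the output, so a layout
change outside the markers would not affect what gets indexed.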
        I'm convinced that digging the database instead of digging pages is not
a good idea. Now I'm looking for a way to make htdig fetch pages from one
address, but index them as if they were at another address, in the way I
need.
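
        The address mapping could look roughly like the sketch below, again in
Python. Both URL prefixes and the function name are made-up placeholders;
this only shows the idea of fetching from one address while recording
another, not how htdig itself would be configured to do it.

```python
# Both prefixes are hypothetical placeholders, not real addresses.
CRAWL_PREFIX = "http://archive.internal.example/articles/"
PUBLIC_PREFIX = "http://www.example.com/articles/"


def public_url(crawled_url: str) -> str:
    """Map the address a page was fetched from to the address to record."""
    if not crawled_url.startswith(CRAWL_PREFIX):
        # Outside the archive: leave the URL untouched.
        return crawled_url
    return PUBLIC_PREFIX + crawled_url[len(CRAWL_PREFIX):]


print(public_url("http://archive.internal.example/articles/2000/09/07/123.html"))
```

Only the prefix changes, so the path of each article stays the same at both
addresses.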
        Thanks,
        []s
        Luis

Gilles Detillieux wrote:
>
> We don't really have any documentation that gives these specifications,
> and the specifications do tend to change from time to time, so the source
> code is really the only specification to go on.
>
> I can point out a few problems with your approach, though:
>
> 1) The database specs are changing quite substantially from 3.1.x to
> 3.2.x, so by developing for 3.1.x you're hitching your cart to a dying
> horse. In 3.2.x, there is no longer a db.wordlist, htmerge is only for
> merging databases, and htdig will create the db.docdb, db.docs.index
> and db.words.db files itself.
>
> 2) The results from searches in htsearch are supposed to be web pages,
> each with their own URL. I don't know how you'd end up interfacing
> htsearch to your database if it doesn't have a web interface of some sort.
>
> If it does have a web interface, why not use htdig to index it through
> that interface? You do mention that it is a website, so why do you feel
> htdig can't index its contents? It is capable of dealing with dynamic
> content, and isn't restricted to static pages.
>
> The usual solution to indexing a database using htdig really comes down
> to developing your own web interface to the database, if you don't already
> have one, or using the DBMS's own search facilities and not using htdig.
>
> --
> Gilles R. Detillieux E-mail: <grdetil@scrc.umanitoba.ca>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
>
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig-unsubscribe@htdig.org
> You will receive a message to confirm this.
> List archives: <http://www.htdig.org/mail/menu.html>
> FAQ: <http://www.htdig.org/FAQ.html>

