Re: [htdig] docs indexed twice


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 30 Jun 1999 11:49:50 -0500 (CDT)


According to Wolfgang Gaida:
> my problem is the following one:
> after indexing a site, in the search results a lot of documents are presented
> twice; once as http://www.foo.com/path/ and once as
> http://www.foo.com/path/index.htm. I tried a "remove_default_doc: index.htm" in
> the *.conf file, but that didn't solve the problem.
> Do You have any idea where I'm wrong ?

I think once the duplicates are in the index, they tend to stay there.
You'd need to rebuild your database from scratch, after defining
remove_default_doc as you want it, in order to have an index that's free
of the superfluous documents. This attribute can also take a list of names.
For instance, if your server recognizes any of index.html, index.htm
or default.htm as a directory index file, you can define:

remove_default_doc: index.html index.htm default.htm

Don't add names to the list that aren't treated as directory index files
by your server(s), or you may remove documents that you don't really
want removed from the index.
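
To illustrate why the list should only contain real directory index
names, here is a minimal Python sketch of the kind of URL collapsing
that remove_default_doc implies. It is not ht://Dig's actual code; the
function name strip_default_doc and the sample URLs under /docs/ are
hypothetical, and query strings are ignored for simplicity.

```python
def strip_default_doc(url, default_docs):
    """Collapse a URL ending in a default document name to its directory URL."""
    head, sep, last = url.rpartition("/")
    if sep and last in default_docs:
        return head + "/"
    return url


# Names the (assumed) server really treats as directory index files.
defaults = ["index.html", "index.htm", "default.htm"]

urls = [
    "http://www.foo.com/path/",
    "http://www.foo.com/path/index.htm",
    "http://www.foo.com/docs/default.htm",
]

# Both forms of /path/ collapse to a single entry.
unique = sorted({strip_default_doc(u, defaults) for u in urls})

# Hypothetical mistake: home.htm is a separate page, not a directory index,
# yet listing it makes it indistinguishable from the directory URL.
wrong = strip_default_doc("http://www.foo.com/docs/home.htm",
                          defaults + ["home.htm"])
```

With the correct list, the two forms of /path/ end up as one URL. With
home.htm wrongly added, a distinct page is folded into /docs/, which is
how a document you wanted can disappear from the index.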

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Wed Jun 30 1999 - 09:05:48 PDT