heddy Boubaker (firstname.lastname@example.org)
09 Apr 1998 11:53:22 +0200
<> "Andrew" == Andrew Scherpbier <email@example.com> writes:
Andrew> There is actually another method that may or may not be easier to
Andrew> maintain. htdig looks for the HTML meta tag name "htdig-noindex".
Andrew> So if the documents you do not want to cover in a search contain
Andrew> "<meta name=htdig-noindex
Andrew> value=foo>", they will not be found in a search.
Andrew> Unfortunately, this only covers HTML documents.
OK, but that is not enough. Let's elaborate a little: suppose we want to have 2
databases: 1 for our Intranet, which will index everything that is accessible
from our local net, and the other for the `Externet' (the Internet), for
only what is accessible from the outside. Your solution is not
good in this case because htdig will not index the document in either case
... The only solution we have for now is to run htdig under IP addresses
matching the local/external setup (as explained in my previous msg). Maybe a
new META would help. BTW it would be nice if htdig took into account a
few other metas (it could go in the TODO list):
new: name=DISTRIBUTION content="(external|extern)|(internal|intern|intranet|local)"
Tells htdig what the distribution of the document is (htdig should know
in what mode it is running; this would be a new option to add).
- the following 2 are often used by other search engines -
use: DESCRIPTION, htsearch should use what is in the description meta tag
instead of the title of the document.
use: ROBOTS - there is a patch for that I think !
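To make the DISTRIBUTION proposal concrete, here is a minimal sketch in Python (not htdig code) of how a crawler could decide whether to index a page. Everything in it is an assumption for illustration: the `should_index` function, the `mode` argument standing in for the new run-time option, and the policy that internal mode indexes everything while external mode skips pages tagged as internal and keeps untagged ones.

```python
from html.parser import HTMLParser

# Values taken from the alternatives listed in the proposal above.
INTERNAL = {"internal", "intern", "intranet", "local"}


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs, lowercasing both."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            content = attr.get("content") or ""
            self.metas[name.lower()] = content.strip().lower()


def should_index(html, mode):
    """Return True if a crawler running in `mode` should index the page.

    `mode` is "internal" or "external" (a hypothetical option).
    """
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    if mode == "internal":
        # Assumption: the Intranet database indexes everything it can reach.
        return True
    # External mode: skip only pages explicitly tagged as internal.
    # Assumption: untagged pages and unknown values are still indexed.
    return parser.metas.get("distribution") not in INTERNAL
```

With this policy a page carrying `<meta name="DISTRIBUTION" content="intranet">` is indexed by the internal run and skipped by the external one, which is the behaviour the single htdig-noindex tag cannot give.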
Another thing that could be changed IMHO is what is displayed in long format
when no keyword is found in the description stored for a doc: the current "none
of the keywords was found in the top of this document" is very confusing for
users; they often think that ht://Dig is buggy and that it shows documents not
matching the request. We have to find a new message for that; maybe
"keywords were found in this document but no description is available" would be
clearer? What do you think of that?
Lastly, another thing to add could be generalizing the use of regexps
instead of substrings (exclude_urls, limit_urls_to ...).
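To show why regexps would help here, a small sketch in Python (not htdig code) comparing the two kinds of URL filter. The function names and example URLs are hypothetical; only the idea of substring versus regexp matching comes from the point above.

```python
import re


def excluded_by_substring(url, patterns):
    """Exclude the URL if any pattern occurs anywhere in it."""
    return any(p in url for p in patterns)


def excluded_by_regex(url, patterns):
    """Exclude the URL if any regular expression matches somewhere in it."""
    return any(re.search(p, url) is not None for p in patterns)


# A substring filter for ".gif" also catches unrelated paths.
print(excluded_by_substring("http://example.org/my.gifts/index.html", [".gif"]))
# A regexp can be anchored so that it only matches a real .gif file.
print(excluded_by_regex("http://example.org/my.gifts/index.html", [r"\.gif$"]))
print(excluded_by_regex("http://example.org/logo.gif", [r"\.gif$"]))
```

The substring filter wrongly excludes the first URL, while the anchored regexp excludes only the actual image.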
BTW where can we find soundex, metaphone and endings rules for French? (Any
other froggies out there? ;-))
ht://Dig is a very useful and very well written tool, but it still needs a
few small improvements to become the perfect search engine of our
dreams. Thanks a lot for it, Andrew - hope you'll have time to work on
it again soon -.
- heddy -
This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:26:01 PST