[htdig] Still not all search hits shown


Subject: [htdig] Still not all search hits shown
From: Reich, Stefan (Stefan.Reich@dgn-service.de)
Date: Wed Sep 06 2000 - 02:47:02 PDT


Sorry, guys and girls,

reindexing with the -i option did not solve the problem.

The problem persists, but only on the site for which I'm using
url_part_aliases.

Maybe a snippet of my config will shed some light on this:

--------- so that's in my htdig config -----------

start_url: \
        http://194.115.222.91/ \
        http://www.arzt.de/ \
        http://www.yavivo.de/ \
        http://community.yavivo.de/Expertenrat/Forum/ \
        http://arzt.dgn.de/doc/public/Anbieter/KVTH/ \
        <long list of other urls>

limit_urls_to: ${start_url} http://www.dgn.de/

url_part_aliases: http://194.115.222.91/ replace#1

--------- and that's in my htsearch config -------------

url_part_aliases: http://www.dgn.de/ replace#1

--------------------------------------------------------
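
To make the intended round trip explicit, here is a minimal sketch of how the two alias pairs above are meant to interact. It is a toy model, not ht://Dig's actual code: the `encode` and `decode` helpers and the path `doc/index.html` are hypothetical, and plain substring replacement is assumed.

```python
# Toy model of url_part_aliases; the helpers below are illustrative
# assumptions, not functions taken from ht://Dig.
DIG_ALIASES = [("http://194.115.222.91/", "replace#1")]   # htdig config
SEARCH_ALIASES = [("http://www.dgn.de/", "replace#1")]    # htsearch config


def encode(url, aliases):
    """Replace each 'from' part with its token, as done before storing a URL."""
    for from_part, token in aliases:
        url = url.replace(from_part, token)
    return url


def decode(stored, aliases):
    """Replace each token with its 'from' part, as done after reading a URL."""
    for from_part, token in aliases:
        stored = stored.replace(token, from_part)
    return stored


# Hypothetical page on the site that is indexed by IP address.
stored = encode("http://194.115.222.91/doc/index.html", DIG_ALIASES)
shown = decode(stored, SEARCH_ALIASES)

print(stored)  # replace#1doc/index.html
print(shown)   # http://www.dgn.de/doc/index.html
```

Under these assumptions, every URL dug by IP address should come back from htsearch under www.dgn.de, which is why restrict=www.dgn.de is expected to match.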

The behaviour is very strange: a search for "Homepage" (restrict=www.dgn.de)
reports 700 results.

Page 1 shows 10 of them, page 2 shows only about 4, pages 3, 4 and 5 are
empty, and pages 6 to 10 all show the same two results.

????????

HELP !!!

P.S.: Restricted searches on all other URLs seem to work fine.

-----Original Message-----
From: Geoff Hutchison [mailto:ghutchis@wso.williams.edu]
Sent: Tuesday, September 5, 2000 14:53
To: Reich, Stefan
Cc: htdig@htdig.org
Subject: Re: [htdig] Not all search hits shown

At 10:53 AM +0200 9/5/00, Reich, Stefan wrote:
>I do a URL replacement (only for this site!): the htdig config replaces the
>IP address with replace#1, and the htsearch config replaces replace#1 with
>the FQDN. Nevertheless, some search results from the suspicious site are shown.

In answer to the later question about "what can happen in an update,"
it depends.

If you've modified the url_part_aliases attribute and then do an
update run, it's not going to re-code the URLs that are already in
the database. And IIRC, htsearch doesn't look for alternative
encodings. So only new URLs will come up in searches.

Once you've re-run with -i, all URLs are now properly encoded with
the new url_part_aliases setting and things should work fine. Updates
after this point should also work fine.
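
The difference between an update run and a full re-index can be illustrated with a toy model. This is only a sketch under stated assumptions: a Python dict stands in for the document database, lookups are assumed to use the current encoding only, and the page names (`a.html`, `b.html`, `c.html`) are hypothetical.

```python
# Toy model only: a dict stands in for the document database, and the
# lookup rule is an assumption based on the description above.

def encode(url, aliases):
    """Replace each aliased URL part with its token before storing."""
    for from_part, token in aliases:
        url = url.replace(from_part, token)
    return url


def findable(db, urls, aliases):
    """Return the URLs whose current encoding exists as a key in db."""
    return [u for u in urls if encode(u, aliases) in db]


OLD_ALIASES = []                                          # before the change
NEW_ALIASES = [("http://194.115.222.91/", "replace#1")]   # after the change

old_urls = ["http://194.115.222.91/a.html", "http://194.115.222.91/b.html"]
new_urls = ["http://194.115.222.91/c.html"]   # hypothetical page found later

# Initial dig with the old setting, then an update run with the new one:
# existing keys are left untouched, only the new URL gets the new encoding.
db = {encode(u, OLD_ALIASES): u for u in old_urls}
db.update({encode(u, NEW_ALIASES): u for u in new_urls})
after_update = findable(db, old_urls + new_urls, NEW_ALIASES)

# Full re-index (-i): the database is rebuilt from scratch.
db = {encode(u, NEW_ALIASES): u for u in old_urls + new_urls}
after_reindex = findable(db, old_urls + new_urls, NEW_ALIASES)

print(after_update)        # ['http://194.115.222.91/c.html']
print(len(after_reindex))  # 3
```

In this model, only the URL added after the change is found following the update run, while all three are found once the database has been rebuilt.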

I'm classifying this as a bug report and I'll see what I can do with
htsearch. However, it won't be able to guess different encodings, so
it will only help if you add a new encoding (vs. changing an old
one).

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

------------------------------------ To unsubscribe from the htdig mailing list, send a message to htdig-unsubscribe@htdig.org You will receive a message to confirm this. List archives: <http://www.htdig.org/mail/menu.html> FAQ: <http://www.htdig.org/FAQ.html>




This archive was generated by hypermail 2b28 : Wed Sep 06 2000 - 02:50:07 PDT