Re: htdig: Searching not working?


Brian Litke (blitke@sedl.org)
Thu, 18 Jun 1998 14:19:55 -0500


Colin,

I agree with Andrew, corrupted db and nice customization!!

I have been running the "rundig" command by hand instead of from cron,
because the cron-run rundig would periodically produce corrupted results
that pointed to pages that didn't include the keyword I was
searching for. I still haven't figured out why. I seem to remember
another post last month saying their cron rundig was occasionally failing.

Regarding the "(None of the search words were found in the top of this
document.)" message, that does appear sometimes even when the database is
fine. For instance, if you search my site for the word "find" and go to
page 9 of the results,
http://www.sedl.org/cgi-bin/htsearch?restrict=&exclude=&config=htdig&method=or&format=builtin%2Dlong&words=find&page=9
you'll see that the third link on the results page has that message. If
you go to the linked page and do a find for "find", you'll see the word is
indeed on the page, just far enough down that HTDIG doesn't print the word
and context on the result page.
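
To make that concrete, here is a rough Python sketch of the behaviour as I
understand it. It is not ht://Dig's actual code: the function names and the
message constant are made up for illustration, and I am assuming that only
the first max_head_length characters of each page are kept for excerpts.

```python
# Illustrative only: a guess at the excerpt logic, not ht://Dig source code.
NO_EXCERPT = "(None of the search words were found in the top of this document.)"


def stored_head(text, max_head_length=10000):
    """Keep only the top of the document, as assumed for excerpt storage."""
    return text[:max_head_length]


def build_excerpt(head, words, excerpt_length=200):
    """Return context around the first search word found in the stored head.

    Falls back to the NO_EXCERPT message when no word occurs in the head,
    even though the word may appear further down the full page.
    """
    lowered = head.lower()
    for word in words:
        pos = lowered.find(word.lower())
        if pos != -1:
            start = max(0, pos - excerpt_length // 2)
            return head[start:start + excerpt_length]
    return NO_EXCERPT


# A page where "find" only appears after the first 10000 characters.
page = "x " * 6000 + "find me"
print(build_excerpt(stored_head(page), ["find"]))
```

With a larger max_head_length the same page would give a real excerpt, which
would be consistent with the message showing up on pages where the word sits
far down.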

By the way, I have a question. I noticed in your e-mail:
>search_algorithm: exact:1 synonyms:0.5 endings:0.1

I've been wondering if HTDIG supports partial word searches, such as
searching for "app" to find both "apple" and "application". The user
interface doesn't have a partial/exact word search function. Is this
something I can change in the configuration file?
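
To be clear about what I mean by a partial word search, here is a tiny
Python sketch of the matching I have in mind. The function name and the word
list are hypothetical, and this says nothing about how HTDIG itself would
do it.

```python
# Illustrative only: the kind of prefix matching being asked about.
def prefix_matches(query, indexed_words):
    """Return the indexed words that begin with the query, ignoring case."""
    q = query.lower()
    return sorted(w for w in indexed_words if w.lower().startswith(q))


print(prefix_matches("app", ["apple", "application", "happy", "map"]))
```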

Thanks,
Brian

>Have a look at:
>
> http://www.summerworks.on.ca
>
>And do a search for "john". There are a bunch of results that come back
>with "(None of the search words were found in the top of this document.)".
>
>First, what does that mean? And second, none of the pages that generate
>this message actually have the string "john" on them ... so why do they
>show up in the search?
>
>Here is the conf file, if it helps:
>
>--- start conf file ---
>
>database_dir: /export/htdig/db/sw98
>start_url: http://www.summerworks.on.ca/
>limit_urls_to: http://www.summerworks.on.ca/
>exclude_urls: /staff/ /search/ .inc .doc .mcw nboard/edit.php3 nboard/delete.php3
>max_head_length: 10000
>search_algorithm: exact:1 synonyms:0.5 endings:0.1
>matches_per_page: 50
>excerpt_legnth: 200
>
>template_map: sw98 sw98 /www/summerworks/search/results-template.html
>
>star_image: http://www.summerworks.on.ca/gifs/x-star.gif
>star_blank: http://www.summerworks.on.ca/gifs/x-nostar.gif
>max_stars: 5
>
>search_results_header: /www/summerworks/search/results-header.html
>search_results_footer:
>nothing_found_file: /www/summerworks/search/results-nomatch.html
>
>
>
>.........................................................................
>Colin Viebrock Creative Director - Private World Communciations
>cmv@privateworld.com 331 - 67 Mowat Avenue
>http://www.privateworld.com Toronto, Ontario, CANADA, M6K 3E3
>ICQ: 11386088
>
> 85.7% of all statistics
> are made up on the spot.
>----------------------------------------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig-request@sdsu.edu containing the single word "unsubscribe" in
>the body of the message.




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:26:34 PST