[htdig] Re: htsearch: No title in search results (htdig-3.2.0b3-112600)(PR#964)


Subject: [htdig] Re: htsearch: No title in search results (htdig-3.2.0b3-112600)(PR#964)
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Sun Dec 10 2000 - 20:16:46 PST


Robert,

This has *always* been the case with server_max_docs. It controls the
number of documents retrieved from the server, not necessarily the number
of documents in your databases. As links are encountered, "stubs" are
added to the document database holding whatever is known so far (like the
link text).

In version 3.1.x, you simply *had* to run htmerge before htsearch could use
the databases. In the 3.2 code this step is gone; technically you can search
the databases right as htdig produces them. Even so, it's generally
advisable to run the databases through htpurge.

The appropriate htpurge attribute to get rid of these "stubs" is
remove_unretrieved_urls:

<http://dev.htdig.org/htdig-3.2/attrs.html#remove_unretrieved_urls>
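
For example, a minimal htdig.conf fragment along these lines should do it.
This is a sketch assuming the htdig 3.2 "attribute: value" config syntax.
The server_max_docs value is just the test value from your message, and
the rest of the config (start_url, database_dir, etc.) is left out:

```
# Sketch of an htdig.conf fragment for htdig 3.2.x; values are illustrative.

# Limit how many documents htdig fetches from each server. URLs that are
# linked to but never fetched still get "stub" entries in the database.
server_max_docs:         2

# Have htpurge drop those unretrieved stubs, so htsearch only returns
# documents that were actually fetched (and so have titles).
remove_unretrieved_urls: true
```

With that in place, run htpurge against the same configuration after htdig
finishes and before searching. Leaving the attribute unset keeps the stubs.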

Personally, I think it's just fine to see stubs in search results, as long
as you know why they're there. (Google and many other engines have similar
effects.)

-Geoff

On Sun, 10 Dec 2000, Robert La Ferla wrote:

> Geoff,
>
> I did what you suggested and I think I know what may be happening. I
> have server_max_docs set to 2 (for testing, but it still occurs at
> much higher values). The documents that don't have titles/descriptions
> have not been retrieved by htdig, so there is no title or description.
> Instead, a parent document that has an HREF to those documents was
> retrieved and parsed. Even if I set server_max_docs to something
> higher, this will still occur for a good number of documents that were
> referenced but not retrieved. I'm not sure if there's a configuration
> option that will correct this or not. I suppose if the results for
> referenced-only documents appeared after the results for retrieved
> documents, it would be a lot better. Is this possible? I will now try
> with the latest snapshot to see if things have improved.
>
> Thanks,
> Robert

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>



This archive was generated by hypermail 2b28 : Sun Dec 10 2000 - 20:26:25 PST