Re: htdig: max_hop_count unreliable?


Geoff Hutchison (ghutchis@wso.williams.edu)
Mon, 28 Dec 1998 12:00:06 -0500 (EST)


On Mon, 28 Dec 1998, Jeff Breidenbach wrote:

> I was reading the documentation today at http://www.htdig.org
> and noticed a disclaimer about the max_hop_count setting.
>
> "Unfortunately, [max_hop_count] only works reliably when a complete
> index is created, not an update."

We've done some work on making the hop_counts more reliable. I don't know
*how* reliable, since it hasn't gotten as much rigorous testing as some
other areas. See below for a little more on this.

> change. So I was going to create a page that links to all the new
> pages, and run an update dig against that page with a max_hop_count of
> 1.

Hmm. This might be great for you, but I get pretty solid results from just
running update digs on the normal config. An update dig checks every URL
in the database, updates the modified ones, and adds new links as
necessary.
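
As a rough illustration of that update pass, here is a minimal Python
sketch. It is not ht://Dig's actual code: the Doc record, the db mapping,
and the fetch callback are all hypothetical stand-ins, and the real digger
may decide what counts as "modified" differently.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical database record for one indexed URL."""
    last_modified: int                      # timestamp seen at the last dig
    links: list = field(default_factory=list)


def update_dig(db, fetch):
    """Revisit every known URL, re-index modified ones, queue new links.

    db maps URL -> Doc; fetch(url) returns (last_modified, links).
    Both are assumptions made for this sketch.
    """
    queue = list(db)                        # start from every URL in the database
    seen = set(queue)
    reindexed = []
    while queue:
        url = queue.pop(0)
        modified, links = fetch(url)
        known = db.get(url)
        if known is None or modified > known.last_modified:
            # New or changed document: store it and pick up unseen links.
            db[url] = Doc(modified, list(links))
            reindexed.append(url)
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return reindexed
```

Unchanged documents are skipped, so only modified pages and the new pages
they link to get re-indexed.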

What you suggest may or may not work. Since the database keeps track of
each document's hop_count, trying max_hop_count 1 probably wouldn't work,
since max_hop_count is the maximum number of hops from the start_url. So
if the page you're starting at has a recorded hop_count of 3, nothing will
be indexed, since links from it will all have hop_counts of 4.
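
To make the arithmetic concrete, here is a small Python sketch of the
check as I understand it. The function name and the exact comparison are
assumptions for illustration only, not taken from the ht://Dig source.

```python
def links_within_limit(recorded_hop_count, max_hop_count, links):
    """Return the links that would still be followed from a page.

    Assumes each link gets the page's recorded hop_count plus one, and
    that links beyond max_hop_count are dropped (hypothetical model).
    """
    link_hop_count = recorded_hop_count + 1
    if link_hop_count > max_hop_count:
        return []                           # every link is past the limit
    return list(links)


# Page recorded at hop_count 3, update run with max_hop_count 1:
# its links would be at hop_count 4, so none are followed.
skipped = links_within_limit(3, 1, ["new1.html", "new2.html"])

# The same page recorded at hop_count 0 would have its links followed.
followed = links_within_limit(0, 1, ["new1.html", "new2.html"])
```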

I haven't spent a lot of time looking at this section of code, so my
impressions might be a little off. Your idea might work better when the db
merging code is finished (next week, hopefully).

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



