Re: htdig: max_hop_count unreliable?


Jeff Breidenbach (jeff@jab.org)
Mon, 28 Dec 1998 13:55:03 -0500


Let's dub the "update with max_hop_count: 1" trick 'Umhop' for the
sake of easier discussion. I'm hoping Umhop will make a substantial
improvement in update dig times for certain situations.

For example, my archive of the linux-kernel mailing list currently
has over 30,000 URLs, and a relatively small number of (known) new
URLs are added every day.

>What you suggest may or may not work. Since the database keeps track
>of the hop_count of the document, trying max_hop_count 1 probably
>wouldn't work, since it's the maximum from the start_url. So if the
>page you're starting at has a recorded hop_count of 3, nothing will
>be indexed, since links from this will all have hop_counts of 4.

Sounds like it may work. I'll continue setting the start_url to
"the mailing list index page containing links to the
n most recent messages".
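
To make the hop_count concern quoted above concrete, here is a rough
toy model in Python. It is only a sketch of the behaviour as described
in the quoted text, not htdig's actual code: the function name, the
file names, and the assumption that a fresh dig records the start page
at hop_count 0 are all mine.

```python
# Toy model of the hop_count behaviour described in the quoted text.
# NOT htdig's implementation; all names here are hypothetical.

def links_to_index(start_hop_count, links, max_hop_count):
    """Return the links that would be followed from the start page.

    Each link found on the start page gets the start page's recorded
    hop_count plus one, and is dropped if that exceeds max_hop_count.
    """
    link_hop_count = start_hop_count + 1
    if link_hop_count > max_hop_count:
        return []
    return list(links)


# Hypothetical new messages linked from the index page.
new_messages = ["msg30001.html", "msg30002.html"]

# Assumed fresh case: start page recorded at hop 0, so links are at hop 1.
fresh = links_to_index(0, new_messages, max_hop_count=1)

# Case from the quoted text: start page recorded at hop 3, so links are at hop 4.
stale = links_to_index(3, new_messages, max_hop_count=1)
```

If the database really does reuse the recorded hop_count, the second
case is the one that would break Umhop: every new message lands at
hop 4 and gets skipped.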

>Your idea might work better when the db merging code is finished
>(next week, hopefully).

Thank you for the advice, and I'll give it a shot after the next
release of htdig and report the results.

Jeff


