htdig: patch to allow URL additions


Edmond Abrahamian (edmond@greencedars.com.lb)
Thu, 16 Apr 1998 22:14:51 +0200


Hi All,

I am submitting this small patch for the file htdig/main.cc. It allows
the option of digging new URLs and adding them to an existing database
without having to re-dig all the URLs that are already in the database.

Please note that this new option is not useful together with the
-i switch, since -i discards the existing database and starts an
initial dig from scratch!

The patch can be obtained by ftp from:

   ftp://ftp.greencedars.com.lb/pub/htdig/htdig_0.2_patch.tar.gz
   please login as user 'guest' and use password 'guest'.

I should mention that the patch is against the htdig-3.0.8b2 version
of htdig's main.cc.

How it works: if you supply the (new) -n option to htdig, it no
longer uses the config variables start_url and limit_urls_to.
Instead, it uses the new config variables new_url and
limit_new_urls_to, and in addition it does not initialize the
retriever object with all the URLs already existing in the
database.
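
To make the switch between the two modes concrete, here is a rough
sketch of the selection logic in C++. It is not the patch itself:
Config, DigPlan, lookup and plan_dig are hypothetical names invented
for this illustration, and the real main.cc works with htdig's own
configuration and retriever classes. Only the four config variable
names come from the description above.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for htdig's configuration: attribute name -> value.
using Config = std::map<std::string, std::string>;

// Illustrative summary of what a dig run would be seeded with.
struct DigPlan {
    std::string start;               // URL(s) the dig starts from
    std::string limit;               // pattern(s) limiting which URLs are followed
    std::vector<std::string> seeds;  // URLs queued in the retriever up front
};

// Return the value of a config attribute, or an empty string if unset.
static std::string lookup(const Config& config, const std::string& key) {
    auto it = config.find(key);
    return it == config.end() ? std::string() : it->second;
}

// add_only corresponds to the new -n option described above.
DigPlan plan_dig(const Config& config, bool add_only,
                 const std::vector<std::string>& urls_in_db) {
    DigPlan plan;
    if (add_only) {
        // -n given: read the new variables and do NOT queue the
        // URLs that are already in the database.
        plan.start = lookup(config, "new_url");
        plan.limit = lookup(config, "limit_new_urls_to");
    } else {
        // Normal update dig: read the usual variables and queue
        // every URL already in the database for re-digging.
        plan.start = lookup(config, "start_url");
        plan.limit = lookup(config, "limit_urls_to");
        plan.seeds = urls_in_db;
    }
    // In both modes the starting URL itself is queued.
    plan.seeds.push_back(plan.start);
    return plan;
}
```

With a database that already holds two URLs, plan_dig(config, true, db)
queues only the value of new_url, while plan_dig(config, false, db)
queues the two existing URLs plus start_url. That difference is where
the time saving comes from.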

Of course, once the new URL(s) have been dug, we still have to run
htmerge afterwards, but we save a great deal of time by adding URLs
incrementally rather than digging everything from scratch.

I have tested it on several small databases and one rather large
database, with success.

Please communicate bugs through the usual channels (the htdig
mailing list?), or email me directly.

    Edmond Abrahamian (edmond@greencedars.com.lb)


