Tim Perdue, Geocrawler.com (firstname.lastname@example.org)
Tue, 27 Apr 1999 19:08:17 -0500
I have over 1.6 million pages on my site, and ht://dig wants to reindex
*all* of them every time it digs.
I tried setting up a page that only includes *new* links for it to dig, but
it goes ahead and digs all the old links in its database as well.
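For reference, a minimal sketch of how such a new-links page could be generated. It assumes a list of (URL, modification time) pairs is available from the site's own records; the function name and input format are hypothetical and are not part of ht://dig. As described above, pointing the dig at a page like this does not by itself stop the old links already in the database from being revisited.

```python
from html import escape


def build_new_links_page(pages, last_dig_time):
    """Return an HTML page linking only to pages modified after last_dig_time.

    pages: iterable of (url, modified_timestamp) pairs (hypothetical input).
    last_dig_time: timestamp of the previous dig, in the same units.
    """
    # Keep only URLs that changed since the previous dig.
    new_urls = [url for url, modified in pages if modified > last_dig_time]

    # Emit one list item per new URL, escaping it for safe use in HTML.
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in new_urls
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

The resulting string would be written to whatever page the dig starts from.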
I am *not* using the -i option.
Why won't it just dig the new links and add those pages to the database?
It's totally impractical to have it reindex the entire web site every day (in
fact, it takes 4 days for each dig).
/atlas18gb/htdig/bin/htdig -c /atlas18gb/htdig/conf/1.conf -s >>
This is my 1.conf, excluding the .gif stuff:
limit_urls_to: <<--- OK I'll fix this.
exclude_urls: /cgi-bin/ .cgi
search_algorithm: exact:1 synonyms:0.5 endings:0.1
Thanks! ht://dig is working really well, if I can just get rid of these last
PHPBuilder.com / GotoCity.com / Geocrawler.com
This archive was generated by hypermail 2.0b3 on Tue Apr 27 1999 - 17:27:00 PDT