Re: [htdig] Exclude_urls


Subject: Re: [htdig] Exclude_urls
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Mar 24 2000 - 06:09:00 PST


At 10:53 AM +0000 3/24/00, bs@hi.is wrote:
>I'm experiencing some major difficulties trying to exclude some 6000 urls
>to userpages for my local domain.
>The urls vary so I generate a list before every dig which I then include
>into the conf I am using. First off I tried putting each url on a different

Well, if you're generating a list, for sanity's sake, you can include
a file in the config file by putting its path in backquotes:

exclude_urls: `/path/to/excludes`
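A minimal sketch of generating that file before each dig, assuming
Python is available. The helper name write_excludes, the /~username/
URL shape, and the user names are hypothetical and not part of
ht://Dig. It also assumes that newlines in the included file are
treated as ordinary whitespace between patterns.

```python
import os
import tempfile


def write_excludes(usernames, path):
    """Write one exclude pattern per user to `path` and return the patterns."""
    # Hypothetical URL shape: user pages live under /~username/
    patterns = sorted({"/~%s/" % name for name in usernames})

    # Write to a temporary file in the same directory, then rename it into
    # place, so a dig that starts mid-write never reads a half-written list.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as out:
        for pattern in patterns:
            out.write(pattern + "\n")
    os.replace(tmp, path)
    return patterns
```

Deduplicating and sorting keeps the file stable between runs, which
makes it easier to diff when the dig starts behaving differently.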

>A dig that normally took 4 hours is now still going after 14 hours. A
>request to /cgi-bin/htsearch is now hogging the system for a long time and

Whoa. Hold on a second. Don't tell me you tried to search while htdig
was running? That's a no-no.

>So I guess my questions are : Can I exclude these 6000 user urls ? If so,
>how ?

Sure. You're probably doing it the right way. You say it's still
going after 14 hours--does it seem like it's hitting the disk more or
swapping in and out of memory? What is the load like on the server? I
would guess that adding 6000 patterns to exclude_urls might be
pushing you "over the top" and you're now using VM.

Also, you should probably try running with htdig -v once and taking a
look at the "trace" that it gives of what URLs it's indexing. Does
this match what you saw before? How fast is it making progress? Is it
taking 14 hours because each individual URL now takes longer to
process or because it's "escaped" somewhere?
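
One rough way to check the trace against the exclude list, as a sketch
only. It assumes the URLs have already been pulled out of the htdig -v
output into a plain list (the trace format is not reproduced here), and
that exclude_urls rejects any URL containing one of the patterns as a
substring. The helper name leaked_urls and the example.com URLs are
hypothetical.

```python
def leaked_urls(indexed_urls, patterns):
    """Return indexed URLs that contain an exclude pattern anyway."""
    return [
        url for url in indexed_urls
        if any(pattern in url for pattern in patterns)
    ]


if __name__ == "__main__":
    # Hypothetical data standing in for URLs taken from the trace.
    indexed = [
        "http://www.example.com/index.html",
        "http://www.example.com/~alice/home.html",
    ]
    for url in leaked_urls(indexed, ["/~alice/", "/~bob/"]):
        print(url)
```

If this prints anything, the dig is visiting pages the patterns were
supposed to keep out, which points at "escaped" rather than slow.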

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
