Brandon LaBonte (B.LaBonte@ttu.edu)
Thu, 13 May 1999 10:30:34 -0500
Thanks for the response...
We think about 250 servers, and probably 100,000 unique URLs... Does this
exceed the reasonable capacity of htdig?
Also, we are running this on a P2 with only 64 MB of RAM and IDE drives.
We are trying to get the RAM to 128 MB (maybe 256), and the drives to SCSI.
Will this make a marked improvement in performance, or is this more an
---
Brandon LaBonte
Micro Computer/Network Support Specialist
Academic Computing Services - Texas Tech University
B.LaBonte@ttu.edu
"Imagination is more important than knowledge" - Albert Einstein
> -----Original Message-----
> From: Geoff Hutchison [mailto:email@example.com]
> Sent: Thursday, May 13, 1999 10:12 AM
> To: Brandon LaBonte
> Cc: firstname.lastname@example.org
> Subject: Re: [htdig] Quick Question
>
>
> On Thu, 13 May 1999, Brandon LaBonte wrote:
>
> > I am indexing a bunch of web servers here at ttu.edu. When I start htdig it
> > runs for little 12+ hours, before I kill it usually. Is this normal, am I
> > missing some obvious optimization?
>
> That depends. How many documents do you have? How many servers? How fast
> do your servers return requests? How big are your documents? etc.
>
> If you're worried about what it's doing, you can run with more verbose
> messages by adding -v flags (or -vv or -vvv or...). One flag will give you
> a short outline of what htdig is doing.
>
> You can also limit the depth of the initial indexing using server_max_docs
> or max_hop_count. If you then index without these, it should go back and
> index pages it didn't visit earlier.
>
> > Secondly, As a fallback position, I would like to be able to index servers
> > that have ttu.edu on the end, AND www in the URL (primary servers
> > only)...Any way to do this? I see that the Limit URL's stuff is all OR'd
> > together.
>
> Not easily. You're right that the limit_url_to patterns are OR'ed. Here at
> Williams we can easily list the servers that we want, for example in a
> separate file, and index only those. e.g.:
>
> start_url: `/opt/htdig/conf/williams.urls`
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
>
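Since the patterns are OR'ed, one way to approximate the "ttu.edu AND www" condition is to apply it outside htdig and write the surviving URLs to a separate file of the kind start_url points at in the example above. What follows is only a minimal sketch in Python, not anything htdig provides. The function names, the sample host names, and the reading of "www in the URL" as "host name begins with www" are all assumptions made for illustration, as is the one-URL-per-line file layout.

```python
from urllib.parse import urlsplit


def primary_server_urls(candidates, domain="ttu.edu", host_prefix="www"):
    """Keep URLs whose host starts with host_prefix AND belongs to domain.

    Both conditions must hold. The names and defaults here are hypothetical.
    """
    kept = []
    for url in candidates:
        host = (urlsplit(url).hostname or "").lower()
        # "ends with ttu.edu": match the domain itself or any subdomain of it,
        # but not look-alikes such as www.notttu.edu.
        in_domain = host == domain or host.endswith("." + domain)
        # "www in the URL": assumed here to mean the host name begins with www.
        is_primary = host.startswith(host_prefix)
        if in_domain and is_primary and url not in kept:
            kept.append(url)
    return kept


def write_url_file(urls, path):
    """Write one URL per line to path (assumed layout for the URL file)."""
    with open(path, "w") as handle:
        for url in urls:
            handle.write(url + "\n")


if __name__ == "__main__":
    # Hypothetical candidate list; a real one would come from your own
    # inventory of servers.
    sample = [
        "http://www.ttu.edu/",
        "http://www.math.ttu.edu/",
        "http://ftp.ttu.edu/",
        "http://www.example.com/",
    ]
    for url in primary_server_urls(sample):
        print(url)
```

The list returned by primary_server_urls can be passed to write_url_file, and the resulting path named in backquotes in start_url, as in the Williams example above.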
This archive was generated by hypermail 2.0b3 on Thu May 13 1999 - 08:34:49 PDT