Re: [htdig] Quick Question


Geoff Hutchison (ghutchis@wso.williams.edu)
Thu, 13 May 1999 11:11:47 -0400 (EDT)


On Thu, 13 May 1999, Brandon LaBonte wrote:

> I am indexing a bunch of web servers here at ttu.edu. When I start htdig it
> runs for 12+ hours before I usually kill it. Is this normal, or am I
> missing some obvious optimization?

That depends. How many documents do you have? How many servers? How fast
do your servers return requests? How big are your documents? etc.

If you're worried about what it's doing, you can get more verbose output
by adding -v flags (-v, -vv, -vvv, and so on). A single -v gives you a
short outline of what htdig is doing.

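You can also limit the scope of the initial indexing with server_max_docs
(a cap on the number of documents fetched from each server) or
max_hop_count (a cap on how many links away from the start URLs htdig will
go). If you then re-run htdig with those limits removed, it should go back
and index the pages it skipped earlier.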
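To illustrate how server_max_docs and max_hop_count cut a dig short in different ways, here is a toy breadth-first crawl in Python over an in-memory link graph. This is NOT htdig's code: the function name `crawl`, the sample graph, and the counting rules (start URLs are hop 0, pages at the hop limit are indexed but their links are not followed, the per-server cap counts documents indexed per hostname) are assumptions for illustration. Check the htdig attribute documentation for how your version actually counts.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(links, start_urls, max_hop_count=None, server_max_docs=None):
    """Toy breadth-first crawl (hypothetical model, not htdig's implementation).

    links: dict mapping a URL to the list of URLs it links to.
    max_hop_count: don't follow links from pages this many hops from a start URL.
    server_max_docs: index at most this many documents per hostname.
    Returns the URLs "indexed", in visit order.
    """
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    per_server = {}
    indexed = []
    while queue:
        url, hops = queue.popleft()
        host = urlsplit(url).hostname
        if server_max_docs is not None and per_server.get(host, 0) >= server_max_docs:
            continue  # this server's quota is used up; skip the page
        per_server[host] = per_server.get(host, 0) + 1
        indexed.append(url)
        if max_hop_count is not None and hops >= max_hop_count:
            continue  # index this page, but don't follow its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return indexed


# Hypothetical two-server site used only for this example.
site = {
    "http://www.a.ttu.edu/": ["http://www.a.ttu.edu/1", "http://www.b.ttu.edu/"],
    "http://www.a.ttu.edu/1": ["http://www.a.ttu.edu/2"],
    "http://www.a.ttu.edu/2": [],
    "http://www.b.ttu.edu/": [],
}

print(crawl(site, ["http://www.a.ttu.edu/"], max_hop_count=1))
# ['http://www.a.ttu.edu/', 'http://www.a.ttu.edu/1', 'http://www.b.ttu.edu/']
print(crawl(site, ["http://www.a.ttu.edu/"], server_max_docs=1))
# ['http://www.a.ttu.edu/', 'http://www.b.ttu.edu/']
```

In this model the hop limit trims deep pages on every server, while the per-server cap trims large servers regardless of depth; running again with no limits picks up the skipped page(s).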

> Secondly, as a fallback position, I would like to be able to index servers
> that have ttu.edu on the end AND www in the URL (primary servers
> only)... Any way to do this? I see that the Limit URLs stuff is all OR'd
> together.

Not easily. You're right that the limit_urls_to patterns are OR'ed. Here at
Williams we simply list the servers we want in a separate file and index
only those. A filename in backquotes in htdig.conf is replaced by that
file's contents, e.g.:

start_url: `/opt/htdig/conf/williams.urls`
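Since limit_urls_to can't express an AND, one workaround in the spirit of the file approach above is to apply the AND test outside htdig and write the result to the URL file. The Python sketch below assumes you already have a list of candidate hostnames, one per line (the file names `hosts.txt` and `ttu.urls` and the helper names are hypothetical); it keeps only hosts that end in ttu.edu AND contain "www", and writes one root URL per server.

```python
def wanted(host, domain="ttu.edu", marker="www"):
    """True if host is inside `domain` AND its name contains `marker`."""
    host = host.strip().lower().rstrip(".")
    in_domain = host == domain or host.endswith("." + domain)
    return in_domain and marker in host


def build_url_list(hosts):
    """Turn matching hostnames into root URLs: one per server, sorted, no duplicates."""
    return sorted(
        {"http://" + h.strip().lower().rstrip(".") + "/" for h in hosts if wanted(h)}
    )


def write_url_file(hosts_path, out_path):
    """Read hostnames from hosts_path, write matching URLs to out_path, return count."""
    with open(hosts_path) as src:
        urls = build_url_list(src)
    with open(out_path, "w") as dst:
        dst.writelines(url + "\n" for url in urls)
    return len(urls)


# Example (hypothetical paths):
# write_url_file("hosts.txt", "/opt/htdig/conf/ttu.urls")
print(build_url_list(["www.ttu.edu", "www.cs.ttu.edu\n", "cs.ttu.edu", "www.example.com"]))
# ['http://www.cs.ttu.edu/', 'http://www.ttu.edu/']
```

You would then point start_url at the generated file with backquotes, as with the Williams file. If limit_urls_to is left at its default (the value of start_url), the dig should also stay on just those servers.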

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig@htdig.org containing the single word "unsubscribe" in
the SUBJECT of the message.



This archive was generated by hypermail 2.0b3 on Thu May 13 1999 - 08:22:25 PDT