Geoff Hutchison (ghutchis@wso.williams.edu)
Sat, 1 May 1999 16:33:43 -0400 (EDT)
On Sat, 1 May 1999, Gabriel Fenteany wrote:
> conundrum, right? The robot after all cannot divine the intent of people
> who write awful sites and pages, and there's no easy way to teach the robot
> this is there?
You can make limit_urls_to differ from start_url. That is, you make up a
list of the server roots, each ending in /, for limit_urls_to and leave
start_url the same. I would agree with you that I cannot understand why
you'd want your site to have a URL like http://www.foo.com/dumbname.html...
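
As a rough sketch of that setup (the host and page are just the placeholder
from above, not anything from your real config):

```
# The crawl begins at the oddly named page...
start_url:      http://www.foo.com/dumbname.html
# ...but any URL on that server is allowed, since limit_urls_to
# is matched as a substring of each candidate URL.
limit_urls_to:  http://www.foo.com/
```

With several servers, you would list each server root in limit_urls_to,
separated by whitespace.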
> to index, but I couldn't find how to do it? I'd really appreciate it if you
> told me, and also should the file be a .txt file with tab- or
See http://www.htdig.org/cf_variables.html
You can delimit with whatever whitespace you want. I usually do
one-per-line since it's easy to read.
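
A minimal sketch, assuming your version supports the backquote file
inclusion described on that page; the file path and the second URL are
made up for illustration:

```
# Hypothetical file /opt/htdig/conf/start_urls.txt, one URL per line:
#   http://www.foo.com/dumbname.html
#   http://www.bar.org/
#
# In htdig.conf, backquotes substitute the contents of that file:
start_url:      `/opt/htdig/conf/start_urls.txt`
```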
> file? And, does this file have to be located on the local server or can
> htdig use http to find it elsewhere (may be useful in the future, though
The file itself must be on the local filesystem. If you define
limit_urls_to separately, though, you could use a "starting" page served
over HTTP.
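
Something along these lines, where both the links page and the file path
are hypothetical and the backquote inclusion is assumed to be available in
your version:

```
# Hypothetical page of links, fetched over HTTP as the crawl's start.
start_url:      http://www.foo.com/links.html
# The allowed servers still come from a local file (hypothetical path),
# one server root per line, each ending in /.
limit_urls_to:  `/opt/htdig/conf/limit_urls.txt`
```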
> Finally, can you also use an external file for limit_url too?
Yes. See above.
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/