Re: [htdig] How to include external list to start_url:


Geoff Hutchison (ghutchis@wso.williams.edu)
Sat, 1 May 1999 16:33:43 -0400 (EDT)


On Sat, 1 May 1999, Gabriel Fenteany wrote:

> conundrum, right? The robot after all cannot divine the intent of people
> who write awful sites and pages, and there's no easy way to teach the robot
> this is there?

You can make limit_urls_to != start_url. That is, you list the server
roots, each ending in /, in limit_urls_to and leave start_url as it is.
I would agree with you that I cannot understand why you'd want your
site to have a URL like http://www.foo.com/dumbname.html...
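
As a rough sketch of that setup in the config file (the attribute is
spelled start_url there; www.bar.org is a made-up second server used only
for illustration):

```
# Start the dig at whatever pages the sites actually use as entry points.
start_url:      http://www.foo.com/dumbname.html \
                http://www.bar.org/

# Allow anything on those servers to be followed, not just the start pages.
# Each entry is a server root ending in /.
limit_urls_to:  http://www.foo.com/ \
                http://www.bar.org/
```

If limit_urls_to is left unset it defaults to the value of start_url, which
is why a start URL naming a single page ends up restricting the dig to that
page.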

> to index, but I couldn't find how to do it? I'd really appreciate it if you
> told me, and also should the file be a .txt file with tab- or

See http://www.htdig.org/cf_variables.html
You can delimit with whatever whitespace you want. I usually do
one-per-line since it's easy to read.

> file? And, does this file have to be located on the local server or can
> htdig use http to find it elsewhere (may be useful in the future, though

You must have the file on the local filesystem. If you define
limit_urls_to separately, though, start_url could point at a "starting"
page served over HTTP.
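
A minimal sketch of the external-file form, assuming your version of
ht://Dig supports a file name in backquotes as an attribute value (check
the page referenced above). The paths are hypothetical:

```
# Each attribute takes its value from the contents of the named local file.
start_url:      `/opt/www/htdig/conf/start.url`
limit_urls_to:  `/opt/www/htdig/conf/limit.url`

# start.url would hold whitespace-separated URLs, for example one per line:
#   http://www.foo.com/dumbname.html
#   http://www.bar.org/
#
# limit.url would hold the matching server roots:
#   http://www.foo.com/
#   http://www.bar.org/
```

Keeping the two lists in separate files means you can add a start page
without touching the main config file.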

> Finally, can you also use an external file for limit_url too?

Yes, see above.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



