Gabriel Fenteany (fenteany@calvin.bwh.harvard.edu)
Tue, 04 May 1999 07:42:12 -0400

Sorry about all the questions, but since I am dealing with about 300 sites,
I want to make sure before hand-editing for the sites without an index page.

In limit_urls_to can I do:

limit_urls_to: {start_url} http://foo3.com/foodirectory/

etc., where {start_url} covers all the sites that have an index page, for
which I was able to just type http://foo.com/ into start_url, while
http://foo3.com/foodirectory/ is the directory that holds the additional
start_url for a site without an index file.

In other words, can I mix {start_url} with additional specific domains for
the "troublesome" sites without index pages in limit_urls_to?

Thanks a bunch.


Gabriel Fenteany, Ph.D.
Post-doctoral Fellow
Tel: (617) 278-0390; Fax: (617) 734-2248

>>> For the example above, what they would want me to do is index everything
>> located in "http://foo3.com/foostuff/" BUT only if it is linked to
>> "foofile.html"
>
> Right. The start_url is the list of pages that ht://Dig uses for starting
> an indexing run. It follows the links as long as they're within
> limit_urls_to. It won't find pages that aren't linked to those pages and
> it won't go outside of the limit_urls_to directive.
>
> The limit_urls_to directive is not used except as a comparison for new
> URLs. If a new URL doesn't match something in the limit_urls_to directive,
> it's ignored.
>
> So the example I gave is *exactly* what you want. Promise.
>
>> But, for all but the most terrible sites I need to index, maybe the solution
>> you give will work. For the truly terrible ones, well either they fall in
>> line or only their starting_url gets indexed.
>
> I don't think you'll need to make people "fall in line." Life gets
> a bit complicated if people want you to index a random assortment of pages
> on a server, but it can still be done.
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
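
The behaviour described in the quoted reply can be sketched roughly as
follows. This is only an illustration, not ht://Dig's actual code: it
assumes limit_urls_to patterns are compared as plain substrings, and it
replaces page fetching with a hypothetical `links` table. The page names
about.html, page2.html, unlinked.html and the host elsewhere.example are
made up for the example.

```python
from collections import deque


def allowed(url, limit_urls_to):
    """True if url contains at least one pattern (assumed substring match)."""
    return any(pattern in url for pattern in limit_urls_to)


def crawl(start_urls, limit_urls_to, links):
    """Return the URLs reached from start_urls, in visiting order.

    `links` is a hypothetical dict mapping a URL to the URLs it links to;
    it stands in for fetching and parsing real pages.
    """
    seen = []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.append(url)
        # The limit is only consulted for newly discovered URLs.
        for target in links.get(url, []):
            if target not in seen and allowed(target, limit_urls_to):
                queue.append(target)
    return seen


# Made-up link structure for two sites, one without an index page.
links = {
    "http://foo.com/": [
        "http://foo.com/about.html",
        "http://elsewhere.example/",
    ],
    "http://foo3.com/foodirectory/foofile.html": [
        "http://foo3.com/foodirectory/page2.html",
        "http://foo3.com/other/x.html",
    ],
    "http://foo3.com/foodirectory/unlinked.html": [],
}

start_urls = [
    "http://foo.com/",
    "http://foo3.com/foodirectory/foofile.html",
]
# Mimics mixing the start URLs with one extra directory in limit_urls_to.
limit_urls_to = start_urls + ["http://foo3.com/foodirectory/"]

result = crawl(start_urls, limit_urls_to, links)
```

In this sketch page2.html is reached because foofile.html links to it and
it falls under the extra directory pattern, while unlinked.html is never
found because nothing links to it, and x.html is dropped because it matches
no pattern.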

