Re: htdig:start_url problem


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 15 Dec 1998 14:18:37 -0600 (CST)


According to Jesús Arribi:
> I want to index the next documents:
>
> http://www.cesga.es/gacetatri/galego/textos/textosaxenda/arquivoscorunha.html
> http://www.cesga.es/gacetatri/galego/textos/textosaxenda/cinemascor.html
> http://www.cesga.es/gacetatri/galego/textos/textosaxenda/copascorunha.html
> http://www.cesga.es/gacetatri/galego/textos/textosaxenda/excursionsan.html
> http://www.cesga.es/gacetatri/galego/textos/textosaxenda/expomuseoscor.html
> http://www.cesga.es/gacetatri/galego/textos/textosaxenda/hoteiscorunha.html
>
> I set:
> start_url: http://www.cesga.es/
> and
> limits_urls_to: ${start_url}/gacetatri/galego/textos/textosaxenda/
>
> I execute rundig and I obtain the next output:
>
> htdig: 1 server seen:
> htdig: www.cesga.es:80 1 document
> htmerge: Total word count: 72
> htmerge: Total documents: 1

Unless the main index page at your start_url links directly to the
pages listed above, htdig won't even see them. This is because every
href it finds in your start_url, or in any subsequent document it digs,
is checked against the limit_urls_to list. If it doesn't match, it
doesn't get looked at.
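
To make that filtering concrete, here is a rough Python sketch of the
idea. It is not htdig's actual code: the crawl() helper and the link
graph are hypothetical, and it assumes that the start page is always
fetched and that a pattern matches when it occurs as a plain substring
of the href.

```python
from collections import deque


def crawl(start, links, limits):
    """Breadth-first walk that follows only hrefs containing a limit pattern."""
    seen = [start]          # assumption: the start page itself is always fetched
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for href in links.get(page, []):
            if href in seen:
                continue
            if not any(pattern in href for pattern in limits):
                continue    # rejected before it is ever fetched
            seen.append(href)
            queue.append(href)
    return seen


BASE = "http://www.cesga.es/"
AGENDA = BASE + "gacetatri/galego/textos/textosaxenda/"

# Hypothetical link graph: the home page links only to an intermediate page.
links = {
    BASE: [BASE + "gacetatri/"],
    BASE + "gacetatri/": [AGENDA + "cinemascor.html"],
    AGENDA: [AGENDA + "cinemascor.html", AGENDA + "hoteiscorunha.html"],
}

from_home = crawl(BASE, links, [AGENDA])   # intermediate page is filtered out
from_dir = crawl(AGENDA, links, [AGENDA])  # starts inside the wanted directory

print(len(from_home), "document(s) when starting at the home page")
print(len(from_dir), "document(s) when starting in the directory")
```

In this made-up graph, starting from the home page never gets past the
first page, because the only link on it falls outside the limit. That
is consistent with the "1 document" in your output.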

What you should probably do is set your parameters as:

start_url: http://www.cesga.es/gacetatri/galego/textos/textosaxenda/
limit_urls_to: ${start_url}

and make sure the index.html in .../textos/textosaxenda links to all the
documents you want in that directory, or let the server generate the
index from the directory on the fly.

If the start_url you're using now does point directly to the documents
you want to index, the problem may be that the start_url ends with a
slash, and you're appending another slash in limit_urls_to, so it's
trying to match a double slash. Try:

limit_urls_to: ${start_url}gacetatri/galego/textos/textosaxenda/

without a "/" after the "}". Note also that the attribute is spelled
limit_urls_to, not limits_urls_to.
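
The double slash is easy to see if you expand the variable by hand.
The following Python sketch imitates the substitution; expand() and
allowed() are hypothetical helpers, not htdig functions, and the match
is again assumed to be a plain substring test.

```python
import re


def expand(value, attrs):
    """Replace each ${name} reference with its value from attrs."""
    return re.sub(r"\$\{(\w+)\}", lambda m: attrs[m.group(1)], value)


def allowed(url, patterns):
    """Assumed rule: a URL passes if any pattern occurs inside it."""
    return any(pattern in url for pattern in patterns)


attrs = {"start_url": "http://www.cesga.es/"}
target = ("http://www.cesga.es/gacetatri/galego/textos/"
          "textosaxenda/cinemascor.html")

# With the extra "/" after the "}", the expanded pattern has a double slash.
broken = expand("${start_url}/gacetatri/galego/textos/textosaxenda/", attrs)
# Without it, the pattern is a proper prefix of the wanted URLs.
fixed = expand("${start_url}gacetatri/galego/textos/textosaxenda/", attrs)

print(broken, allowed(target, [broken]))
print(fixed, allowed(target, [fixed]))
```

The first pattern expands to http://www.cesga.es//gacetatri/... and so
cannot occur in any of the URLs you listed; the second one does.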

By the way, Jesús, if you haven't heard, 3.1.0b3 is out now, so if you
want to try the latest version, it should compile properly on your system.
There's a lot that's changed since 3.0.7!

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


