Re: [htdig] htdig not seeing links


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Mon, 7 Jun 1999 09:17:21 -0500 (CDT)


According to scottb:
> I am trying to index a site that has around 800 or so documents. For
> some reason, htdig fails to see the links.
>
> My .conf file resembles:
>
> start_url: http://www.somewhere.com
> limit_urls_to: ${start_url}
> exclude_urls: /cgi-bin/ .cgi
>
> Okay, so far, no voodoo there. I have this one HTML file that has a
> large table in it, about 70 x 3 (200+ cells). Yep, you guessed it, htdig
> fails to see ANY of the links in the table. According to my logs, it
> never even retrieves it (i.e. there is no "Retrieval command for
> http://whatever" line for this particular file).
>
> The file is definitely linked within the site - off the front page for
> that matter (as well as a few other places). The log files show that it
> sees the actual link TO the file (from other files), but it never
> attempts to retrieve it.... :-(

What version of htdig are you running? On what system? How does the
front page, as well as the other places, link to the page that's not
being indexed? Remember that htdig doesn't handle JavaScript links, only
standard HTML links. Also, versions before 3.1.2 didn't handle links
that were missing the closing </a> tag at all, and even 3.1.2 may still
have problems with this. Make sure that the links to this document are
properly structured. Running htdig -vvv on any document that links to
your missing document should show how all the links in that first
document are being parsed, so you can see what's being picked up and
what isn't.
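
As a rough cross-check outside htdig, something like the Python sketch
below could be run over the HTML of a page that links to the missing
document. It is only an illustration, not htdig's own parser: the names
LinkAudit and audit are made up for this example, and it assumes that an
<a href> opened while a previous one is still open means the earlier
anchor is missing its closing </a>. It lists ordinary href links, flags
javascript: links separately, and reports anchors that were never closed.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Hypothetical helper: collect <a href> targets and flag suspect anchors."""

    def __init__(self):
        super().__init__()
        self.links = []         # ordinary href values, in document order
        self.script_links = []  # href values that start with "javascript:"
        self.unclosed = []      # hrefs of anchors with no closing </a> seen
        self._in_anchor = False
        self._open_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return  # e.g. <a name="...">, not a link
        if self._in_anchor:
            # Assumption: a new link starting inside an open one means the
            # earlier anchor is missing its </a>.
            self.unclosed.append(self._open_href)
        self._in_anchor = True
        self._open_href = href
        if href.strip().lower().startswith("javascript:"):
            self.script_links.append(href)
        else:
            self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False
            self._open_href = None

    def close(self):
        super().close()
        if self._in_anchor:
            # Anchor still open at end of input.
            self.unclosed.append(self._open_href)
            self._in_anchor = False


def audit(html_text):
    """Parse html_text and return the populated LinkAudit instance."""
    parser = LinkAudit()
    parser.feed(html_text)
    parser.close()
    return parser


if __name__ == "__main__":
    sample = (
        '<a href="a.html">A</a> '
        '<a href="b.html">B <a href="javascript:go()">C</a>'
    )
    result = audit(sample)
    print("links:", result.links)
    print("javascript links:", result.script_links)
    print("missing </a>:", result.unclosed)
```

If the link to the missing page turns up under "javascript links" or
"missing </a>", that would be the first thing to fix before re-running
the dig.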

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930