Re: [htdig] htdig not seeing links


Gabriel Fenteany (fenteany@calvin.bwh.harvard.edu)
Sat, 05 Jun 1999 07:50:44 -0400


  Hmmm. I have links in tables (and for that matter in pull-down menus too)
  that are followed and indexed just fine. (ht://Dig is the greatest thing
  since sliced bread!) Are the linked files in the same directory as the
  entry page, or in a sub-directory of it? If not (that is, if their URLs do
  not contain the start_url string), then those paths have to be listed in
  limit_urls_to as well; you can include a list of URLs of arbitrary length
  in both limit_urls_to and start_url, with the items separated by
  whitespace. Do you have a robots.txt file, or do you use robots meta tags?
  Maybe you inadvertently excluded certain files or directories?
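
  To make the limit_urls_to and robots.txt checks concrete, here is a rough
  Python sketch (not ht://Dig's actual code). It assumes the patterns are
  matched as plain substrings, reuses the start_url, limit_urls_to and
  exclude_urls values from the quoted config, and tries a few made-up URLs
  against a made-up robots.txt. The function names and the "htdig"
  user-agent string are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser


def passes_url_filters(url, limit_urls_to, exclude_urls):
    """Simplified model: keep a URL if it contains at least one
    limit_urls_to pattern and none of the exclude_urls patterns."""
    limits = limit_urls_to.split()      # items are whitespace-separated
    excludes = exclude_urls.split()
    if not any(pattern in url for pattern in limits):
        return False
    return not any(pattern in url for pattern in excludes)


def allowed_by_robots(url, robots_txt, agent="htdig"):
    """Check a URL against the text of a robots.txt file.
    The agent name is an assumption; use whatever your robot sends."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Values taken from the quoted configuration.
START_URL = "http://www.somewhere.com"
LIMIT = START_URL                # limit_urls_to: ${start_url}
EXCLUDE = "/cgi-bin/ .cgi"

# Hypothetical robots.txt and URLs, for illustration only.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
CANDIDATES = (
    "http://www.somewhere.com/docs/table.html",
    "http://somewhere.com/docs/table.html",        # no "www."
    "http://www.somewhere.com/cgi-bin/list.cgi",
    "http://www.somewhere.com/private/table.html",
)

for url in CANDIDATES:
    print(url,
          "filters:", passes_url_filters(url, LIMIT, EXCLUDE),
          "robots:", allowed_by_robots(url, ROBOTS))
```

  Under this model, a link written without the "www." (the second URL)
  falls outside limit_urls_to even though it points at the same site, which
  is the kind of mismatch worth looking for.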

  After checking these things, I'd start a completely new dig with htdig -i
  (and -a if you want alternate work files used) and then look again. If
  that fails, I'd explicitly add the page to start_url and see what happens.
  The more info the better.

  Good luck.

  Gabriel

>
>
> Hi gang - this one is absolutely making me crazy. I'm not sure if this
> has come up on the list before, but I have no other options at this
> point:
>
> I am trying to index a site that has around 800 or so documents. For
> some reason, htdig fails to see the links.
>
> My .conf file resembles:
>
> start_url: http://www.somewhere.com
> limit_urls_to: ${start_url}
> exclude_urls: /cgi-bin/ .cgi
>
> Okay, so far, no voodoo there. I have this one HTML file that has a
> large table in it, about 70 x 3 (200+ cells). Yep, you guessed it, htdig
> fails to see ANY of the links in the table. According to my logs, it
> never even retrieves it... (i.e. there is no "Retrieval command for
> http://whatever" for this particular file).
>
> The file is definitely linked within the site - off the front page for
> that matter (as well as a few other places). The log files show that it
> sees the actual link TO the file (from other files), but it never
> attempts to retrieve it.... :-(
>
> Any ideas on this one? I'm about to take my own life - hehe.
>
> Cheers.
> Scott
>
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig@htdig.org containing the single word "unsubscribe" in
> the SUBJECT of the message.
>

  --
  Gabriel Fenteany, Ph.D.
  Post-doctoral Fellow &
  WWW VL: Cell Biology Maintainer
  http://vl.bwh.harvard.edu



This archive was generated by hypermail 2.0b3 on Sat Jun 05 1999 - 04:06:32 PDT