Re: [htdig] indexing robots and weird links?


Geoff Hutchison (ghutchis@wso.williams.edu)
Sat, 29 May 1999 16:50:33 -0400


At 3:41 PM -0400 5/29/99, nets@searchtools.com wrote:

>For example, do some robots follow links in forms? I've read that
>plug-ins and multimedia data can keep robots from spidering a page --
>is that still true? Any advice for framed sites beyond using the

I doubt they keep robots from spidering a page. However, I doubt many
robots (if any) follow forms, much less <EMBED> or <OBJECT> tags. The
robots will still index the rest of the page, but they won't treat these
as links.

Most spiders are good about following frames, though there doesn't seem to
be a good solution for how to return frames in search results. For example,
how do you specify a URL to the frameset with one frame changed from the
default?
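
To make the distinction concrete, here is a minimal sketch (in Python, not
ht://Dig's actual code) of a link collector that follows <A>, <AREA> and
<FRAME> but ignores <FORM>, <EMBED> and <OBJECT>. The FOLLOWED table and
the names LinkCollector and extract_links are assumptions made up for this
illustration; a real robot's policy may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical policy: tag -> attribute that this sketch treats as a link.
FOLLOWED = {"a": "href", "area": "href", "frame": "src"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the tags listed in FOLLOWED."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = FOLLOWED.get(tag)
        if attr is None:
            return  # <form>, <embed>, <object>, etc. are not treated as links
        for name, value in attrs:
            if name == attr and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The rest of the page text would still be indexed; only the set of URLs
queued for later visits is affected by this table.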

><noframes> tags? What about complex JavaScripts that expect the user
>to pick or type something?

JavaScript is bad news from a robot's perspective. People expect that
robots will somehow be able to pick up URLs in scripts. But there's no easy
way to do this without knowing what the script writer intended: relative
or absolute URLs may be constructed on the fly, and some strings look
something like a URL but aren't one.

The best solution is to ensure pages have ordinary HTML links in addition
to JavaScript navigation. This also ensures users can browse with
non-JavaScript browsers.
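
As an illustration of why guessing at URLs inside scripts is unreliable,
here is a minimal sketch of the naive approach: scan quoted strings for
anything that looks like a URL. The pattern URL_LIKE, the function
guess_urls and the sample script are all hypothetical, invented for this
example.

```python
import re

# Naive heuristic: a quoted string that starts with http(s):// or
# ends in .htm/.html is assumed to be a URL.
URL_LIKE = re.compile(r"""["'](https?://[^"']+|[^"'\s]+\.html?)["']""")


def guess_urls(script):
    """Return every quoted string in the script that looks like a URL."""
    return URL_LIKE.findall(script)


# Hypothetical script of the kind a robot might run into.
SAMPLE_SCRIPT = '''
var base = "http://www.foo.com/docs/";
location.href = base + page + ".html";
var pattern = "*.html";
'''

guesses = guess_urls(SAMPLE_SCRIPT)
```

On this sample the heuristic finds the base prefix and the "*.html"
pattern, which is not a URL at all, and it misses the address that is
actually assembled on the fly from base, page and ".html".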

>views. And I'm sure that there are some other wild ones out there,
>so send them to me (and let me know if you want to be identified
>and/or quoted).

Apache's FancyIndexing listings can cause problems. In the early Apache 1.3
betas, they added links on the indexes that let you sort by the columns in
increasing or decreasing order. This alone presents problems to a robot,
since there are several pages with essentially the same information but
with different URLs. Worse still, the early beta releases would fall into
an infinite loop of URLs. After you selected a sorting option, the links on
the resulting page would be relative to the sorted URL, e.g.:

http://www.foo.com/ <- main FancyIndexing listing
http://www.foo.com/?N=D <- one sorting option
http://www.foo.com/?N=D?S=D <- a link on the above page.
... ad nauseam

Fortunately, this bug was fixed quickly. :-)
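
For robots that still run into listings like this, one defensive approach
is to fold the sorted variants back onto the plain listing URL before
deciding whether a page has been seen. The following is a minimal sketch
under stated assumptions, not what ht://Dig does: the names canonical_url
and should_visit are hypothetical, and the pattern only covers query
strings shaped like the N=D and S=D examples above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: sort options look like "N=D" or "S=D", possibly chained
# with extra "?" separators as in the buggy listings shown above.
SORT_QUERY = re.compile(r"^[A-Z]=[AD](\?[A-Z]=[AD])*$")


def canonical_url(url):
    """Drop sort-only query strings from directory-style URLs."""
    parts = urlsplit(url)
    if parts.path.endswith("/") and SORT_QUERY.match(parts.query):
        parts = parts._replace(query="")
    return urlunsplit(parts)


def should_visit(url, seen):
    """Return True the first time a canonical URL is offered, else False."""
    key = canonical_url(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With a shared `seen` set, the three example URLs above all map to
http://www.foo.com/, so only the first one is fetched and the chain of
sorted listings is never followed.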

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

