Geoff Hutchison (firstname.lastname@example.org)
Sat, 29 May 1999 16:50:33 -0400
At 3:41 PM -0400 5/29/99, email@example.com wrote:
>For example, do some robots follow links in forms? I've read that
>plug-ins and multimedia data can keep robots from spidering a page --
>is that still true? Any advice for framed sites beyond using the
I doubt they keep robots from spidering a page. However, I doubt many (if
any) follow forms, much less <EMBED> or <OBJECT> tags. The robots will
still index the rest of the page, but they won't treat these as links.
Most spiders themselves are good about following frames, though there
doesn't seem to be a good solution for how to return frames in search
results. For example, how do you specify a URL to the frameset with one
frame changed from the default?
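
To make the distinction concrete, here is a minimal sketch of a link
extractor that behaves the way I'd expect most robots to: it collects
<A HREF> and <FRAME SRC> targets and ignores <FORM>, <EMBED> and <OBJECT>.
The names LinkCollector and extract_links are made up for illustration,
and this is not how any particular robot is implemented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect only the URLs a simple, hypothetical robot would follow."""

    # Assumption: the robot follows anchors and frames, nothing else.
    # HTMLParser lowercases tag and attribute names before we see them.
    LINK_ATTRS = {"a": "href", "frame": "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            # <form>, <embed>, <object> and everything else land here,
            # so their URLs are never queued for spidering.
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html):
    """Return the followable links in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Feeding it a page with a frameset, an anchor, a form and an embedded
movie returns only the frame and anchor URLs. The text of the page would
still be indexed; only the link list is affected.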
>to pick or type something?
much less if there are relative or absolute URLs constructed on the fly,
much less strings that look something like a URL that aren't.
>views. And I'm sure that there are some other wild ones out there,
>so send them to me (and let me know if you want to be identified
Apache's FancyIndexing listings can cause problems. In the early Apache 1.3
betas, they added links on the indexes that let you sort by the columns in
increasing or decreasing order. Already this presents problems to a robot
since there are several pages with essentially the same information, but
with different URLs. Worse still, the early beta releases would fall into
an infinite loop of URLs. After selecting a sorting option, the URLs would
be relative to that: e.g.
http://www.foo.com/ <- main FancyIndexing listing
http://www.foo.com/?N=D <- one sorting option
http://www.foo.com/?N=D?S=D <- a link on the above page.
... ad nauseam
Fortunately, this bug was fixed quickly. :-)
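
A robot can also defend itself against this kind of thing. The following
is only a sketch, assuming that a sort option is always one capital
letter, "=", and then A or D (as in the ?N=D and ?S=D examples above), and
that it only appears on URLs ending in "/". The names canonical_url and
crawl_order are hypothetical. The idea is to strip the sort options before
checking whether a URL has been seen, so all the sorted views collapse
into the main listing.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a sort query looks like "N=D", possibly chained with "?"
# as in the buggy "N=D?S=D" form. Real servers may use other patterns.
SORT_QUERY = re.compile(r"^[A-Z]=[AD](\?[A-Z]=[AD])*$")


def canonical_url(url):
    """Drop index sort options so every sorted view maps to one URL."""
    parts = urlsplit(url)
    if parts.path.endswith("/") and SORT_QUERY.match(parts.query):
        parts = parts._replace(query="")
    return urlunsplit(parts)


def crawl_order(urls):
    """Return canonical URLs in first-seen order, skipping duplicates."""
    seen = set()
    kept = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept
```

With the three URLs listed above, crawl_order keeps a single entry for
the main listing, so the loop never gets started. Ordinary query strings
such as search?q=A are left alone because the path does not end in "/"
and the query does not match the pattern.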
Williams Students Online
This archive was generated by hypermail 2.0b3 on Sat May 29 1999 - 13:09:00 PDT