Re: [htdig] How to get htdig to search through java servlets

Torsten Neuer
Wed, 16 Jun 1999 10:01:42 +0200

According to Lim Swee Tat IS NCS:
>Basically, I'm supposed to implement a search engine on the company's
>intranet. But due to the way the intranet works now, I'm having major
>trouble getting any spider/robot or agent to build the basic database
>at all.
>When a user makes a request to the site on a prespecified port number,
>say 81, a servlet thread class is called to set up the connection and
>authenticate the user. Thereafter, each page he goes to is generated by
>the different servlets.
>The basic problem is that, because the servlets produce their output
>directly through the server, the robot/spider or agent is best placed to
>search the information by sending its requests directly to the server, but
>there is no method to authenticate the robot/spider or agent. And, since the
>servlets are the ones generating the output, how can the robot/spider or
>agent search that output rather than the .java file, which it is not
>searching?

Basically, servlets are not a problem as long as you go via HTTP and have
not excluded the relevant URLs (in ht://Dig, via the exclude_urls attribute).

Your problem seems to be with authentication. AFAIK, ht://Dig supports
the standard (basic) authentication scheme.
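
For reference, "basic" authentication just means the client sends an
Authorization header containing the base64 encoding of "username:password"
with each request. The Python sketch below only illustrates how that header
value is formed; the function name and the credentials are made up for the
example, and it is not how ht://Dig itself is implemented. As far as I
remember, htdig takes the credentials on the command line as
-u username:password, but please check that against the documentation of
your version.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Authorization header for Basic auth.

    Hypothetical helper for illustration only.
    """
    # Join user name and password with a colon, then base64-encode the bytes.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # "spider" / "secret" are placeholder credentials.
    print("Authorization:", basic_auth_header("spider", "secret"))
```

If the servlets authenticate users some other way (a login form, cookies, or
a session id in the URL), a header like this will not help, which is why the
questions below matter.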

What authentication scheme are you using?
What does htdig tell you in verbose mode (-v)?
What do the server logs say?


InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606

