Re: htdig: Htdig and wwwoffle


Geoff Hutchison (Geoffrey.R.Hutchison@williams.edu)
Fri, 28 Aug 1998 10:46:49 -0400 (EDT)


> 2) htdig would need to use only the URLs provided to be searched, not
> follow links.

This is already there. You could simply provide the list as the start_url
attribute and set max_hop_count to 0. Though I haven't tried this sort of
thing, this should do what you want: ht://Dig will index the URLs you
provide and won't follow any links.
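
For illustration, here is a rough sketch of generating such a config from
a list of cached URLs. It assumes the attribute spellings start_url and
max_hop_count from the ht://Dig configuration documentation; the function
name, its parameters, and the example URLs are made up, and I haven't run
the result against a wwwoffle cache.

```python
def build_htdig_config(urls, database_dir=None):
    """Return the text of an ht://Dig config that indexes only `urls`.

    Hypothetical helper: the name and parameters are not part of ht://Dig.
    """
    if not urls:
        raise ValueError("need at least one URL to index")
    lines = []
    if database_dir is not None:
        # Optional: where the index databases should be written.
        lines.append(f"database_dir: {database_dir}")
    # start_url takes a whitespace-separated list; a trailing backslash
    # continues a long attribute value onto the next line.
    joined = " \\\n\t".join(urls)
    lines.append(f"start_url: {joined}")
    # A hop count of 0 means: index the start URLs, follow no links.
    lines.append("max_hop_count: 0")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example URLs are placeholders for whatever the cache holds.
    print(build_htdig_config(
        ["http://www.example.com/", "http://www.example.com/docs/a.html"]
    ), end="")
```

The output can be saved as a config file and passed to htdig with -c.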

> 3) htdig would need to not use the robots.txt because these will not
> have been cached.

Hm. Well ht://Dig checks for the existence of the file. Perhaps a request
to wwwoffle for the robots.txt should just return 404? This may already be
the current behavior. If ht://Dig doesn't find the file, it assumes there
are no restrictions (as per the standard).
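
The logic can be sketched as follows. This is not ht://Dig's own code
(which is C++); it only illustrates the "404 means no restrictions" rule
using Python's standard robots.txt parser. The function name and the
"htdig" user-agent string are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser


def rules_from_response(status, body=""):
    """Build robots rules from an already-fetched robots.txt response.

    Hypothetical helper: `status` is the HTTP status code returned by the
    cache (for example wwwoffle) and `body` is the response text.
    """
    parser = RobotFileParser()
    if status == 404:
        # No robots.txt: parse an empty rule set, so nothing is disallowed.
        parser.parse([])
    else:
        parser.parse(body.splitlines())
    return parser


if __name__ == "__main__":
    missing = rules_from_response(404)
    print(missing.can_fetch("htdig", "http://www.example.com/any/page.html"))
```

So if the cache answers 404 for an uncached robots.txt, every URL on that
host is treated as allowed.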

> 4) wwwoffle will need to provide the CGI interface to htdig.

In answer to your question and the question on writing a Java servlet,
you don't have to use htsearch directly to interface to ht://Dig. For one,
the databases and config files are all there for anyone to use. For
another, htsearch will run from the command line, which can circumvent CGI
problems somewhat.
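
As a sketch of the second option, a wrapper (a servlet, or wwwoffle itself)
could run htsearch as a child process and hand it the query the way a web
server would, through the CGI environment variables. The binary path below
is a placeholder, the helper names are made up, and I am assuming the
usual "words" form field; treat it as an outline, not tested code.

```python
import os
import subprocess
from urllib.parse import urlencode

HTSEARCH = "/usr/local/bin/htsearch"  # hypothetical install path


def cgi_environment(words, base_env=None):
    """Return an environment that presents `words` as a CGI GET query."""
    env = dict(os.environ if base_env is None else base_env)
    env["REQUEST_METHOD"] = "GET"
    # "words" is assumed to be the search form's query field.
    env["QUERY_STRING"] = urlencode({"words": words})
    return env


def run_htsearch(words, binary=HTSEARCH):
    """Run htsearch once and return whatever it writes to stdout."""
    result = subprocess.run(
        [binary],
        env=cgi_environment(words),
        capture_output=True,
        text=True,
        check=True,  # raise if htsearch exits with an error
    )
    return result.stdout
```

The caller gets back the same output a web server would receive and can
pass it on or reformat it as needed.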

> Is htdig still in development and would these changes be possible?

Yes, ht://Dig is still in development. I'll be releasing a new version
called 3.1.0b1 in a week or so (I'm the maintainer of htdig3 as
development on htdig4 begins). I don't think these would be major changes
for either ht://Dig or WWWoffle.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



