htdig: WWWOFFLE and htdig


Peter Stamfest (e9125730@student.tuwien.ac.at)
Thu, 7 Jan 1999 11:15:28 +0100 (CET)


Hi,

By now I'm a very happy user of both WWWOFFLE and htdig, so I want to share
some of my thoughts regarding the two packages.

This message is about both of them, so if you are a user of only one of
them, please feel free to ignore the parts talking about the other :)

MAINLY WWWOFFLE:

I'm quite happy that the wwwoffle cache can be searched with htdig, but I
have some reservations about the current implementation, which I find
rather inefficient.

Wouldn't it be better to have some way to hand URL/filename pairs directly
to htdig, perhaps together with a filter that removes the wwwoffle
peculiarities from the cache files [1]? This would avoid the overhead of
passing every single cached file through the HTTP protocol (including all
the networking, socket and forking work in between).

I also dislike the very strong coupling between the two packages: having
to configure one package (htdig) especially for the other (wwwoffle) is
not something I am comfortable with...

Since I did not use the htdig CONFIG file provided with wwwoffle (I want to
be able to use htdig independently of wwwoffle as well), I ran into a
problem: there is no clear way to give the path to the htsearch binary in
wwwoffle-htsearch other than editing the script (or putting htsearch on
the PATH used by wwwoffled, something I do not want to do).
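The lookup order wished for here (an explicit setting first, then the
PATH) could look like the following sketch. The HTSEARCH environment
variable and the find_htsearch function are hypothetical; as described
above, wwwoffle-htsearch offers no such setting.

```python
import os
import shutil


def find_htsearch(env=None, search_path=None):
    """Locate the htsearch binary.

    1. An explicit path in the (hypothetical) HTSEARCH variable wins.
    2. Otherwise search a PATH-style string, which defaults to the PATH of
       the current process.
    Returns None if htsearch cannot be found.
    """
    env = os.environ if env is None else env
    explicit = env.get("HTSEARCH")
    if explicit:
        return explicit
    return shutil.which("htsearch", path=search_path)
```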

On the WWWOFFLE side I would also suggest some generic way to provide the
URLs for ANY indexing engine via HTTP (not just for htdig). Preventing
such robots from requesting new URLs through wwwoffle could be done with a
new configuration section, like

NoNewRequestThroughUserAgents
{
        htdig/*
        webcrawler/*
}
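A minimal sketch of how such a section might be applied, assuming the
patterns are shell-style wildcards matched against the User-Agent header.
The section name comes from the proposal above and is not an existing
WWWOFFLE option; the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns from the proposed (hypothetical) NoNewRequestThroughUserAgents
# section, assumed to be shell-style wildcards.
NO_NEW_REQUEST_AGENTS = ("htdig/*", "webcrawler/*")


def may_request_new_urls(user_agent: str, patterns=NO_NEW_REQUEST_AGENTS) -> bool:
    """False if the User-Agent matches one of the restricted patterns."""
    return not any(fnmatchcase(user_agent, p) for p in patterns)


def decide(user_agent: str, url: str, cached_urls: set) -> str:
    """Decide what the proxy should do with a request.

    'serve':  the page is already in the cache
    'refuse': a restricted robot asked for an uncached page
    'fetch':  a normal client asked for an uncached page (handled as usual)
    """
    if url in cached_urls:
        return "serve"
    if not may_request_new_urls(user_agent):
        return "refuse"
    return "fetch"
```

Robots can then still index everything already in the cache without
filling the request queue with pages nobody asked for.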

I would also suggest either extending the minimalistic cgi-bin interface
that wwwoffle currently provides just for the sake of htdig, or removing
it entirely and instead making it possible to simply hand URLs (or HTML
code) to wwwoffle for interpolation into its pages, perhaps via magic
variables expanded within documents served by wwwoffle. This way, wwwoffle
and htdig could simply be split apart.
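One way to picture the magic-variable idea is plain placeholder
substitution on a page before it is served. This sketch uses Python's
string.Template, so the ${name} syntax and the variable name search_url
are assumptions, not anything wwwoffle defines.

```python
from string import Template


def expand_magic_variables(page: str, variables: dict) -> str:
    """Replace ${name} / $name placeholders in a page before serving it.

    safe_substitute leaves unknown placeholders and stray '$' signs alone,
    so ordinary page text is not damaged.
    """
    return Template(page).safe_substitute(variables)


page = '<a href="${search_url}">Search the cache</a>'
print(expand_magic_variables(page, {"search_url": "/search/"}))
# prints <a href="/search/">Search the cache</a>
```

With this, wwwoffle would only need to know a URL or HTML snippet to
insert, not anything about htdig itself.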

HTDIG:

The htdig side of the game might also need some work. There should be a
way to specify image_url_prefix at runtime. In theory this should already
work, but the references to image_url_prefix in htdig's
htcommon/defaults.cc seem to be resolved before it is used in star_blank
and star_image. That makes it necessary either to hardcode it into the
program at compile time [2] or to specify every star_blank and star_image
in every htdig config file.

One possible solution would be late binding of the variables in
defaults.cc, i.e. resolving a config variable at the time it is used
(which does not seem to be the case right now). I would also suggest
allowing such variables to be interpolated into the documents in the
common directory; this would make it easier to have very generic headers
and footers with no hardcoded URLs for the images.
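Late binding can be sketched as a config store that keeps the raw strings
and expands ${name} references only when a value is read, so overriding
image_url_prefix at runtime also changes star_image and star_blank. The
class, the ${name} syntax and the default values below are illustrative
assumptions, not htdig's actual defaults.cc. The same expansion is then
applied to header and footer text.

```python
import re

# ${name} references inside config values (assumed syntax).
_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


class LateBindingConfig:
    """Config whose values may reference other attributes as ${name}.

    References are resolved when a value is read, not when defaults load.
    """

    def __init__(self, defaults: dict):
        self._values = dict(defaults)

    def set(self, name: str, value: str) -> None:
        self._values[name] = value

    def get(self, name: str, _seen: frozenset = frozenset()) -> str:
        # _seen tracks the chain of names being resolved to catch cycles.
        if name in _seen:
            raise ValueError(f"circular reference involving {name!r}")
        raw = self._values[name]
        return _REF.sub(lambda m: self.get(m.group(1), _seen | {name}), raw)

    def expand(self, text: str) -> str:
        """Interpolate ${name} references into arbitrary text, e.g. a header."""
        return _REF.sub(lambda m: self.get(m.group(1)), text)


cfg = LateBindingConfig({
    "image_url_prefix": "/htdig",
    "star_image": "${image_url_prefix}/star.gif",
    "star_blank": "${image_url_prefix}/star_blank.gif",
})
cfg.set("image_url_prefix", "/images/htdig")  # e.g. from a per-site config file
print(cfg.get("star_image"))  # prints /images/htdig/star.gif
```

Because nothing is expanded until get() or expand() is called, a binary
built with one prefix can be reused with any other.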
 
****
[1] which would probably correspond to a commandline version of wwwoffles
[2] Which makes it nearly impossible to distribute binary versions of
htdig.

I'm happy to provide you with any clarifications on my thoughts ;-)

Keep up the good work!

peter

--
*  peter stamfest                  +-- i do believe what i say --+
** i got something better than love - how you like me now?        
**                                                          (beck)




This archive was generated by hypermail 2.0b3 on Sun Jan 10 1999 - 16:36:29 PST