[htdig] htdig+wwwoffle=knowledge base tool

Subject: [htdig] htdig+wwwoffle=knowledge base tool
From: Giancarlo (ping@alt.it)
Date: Tue Nov 23 1999 - 03:30:43 PST


I've discovered some interesting uses of htdig in conjunction with wwwoffle.

Some info first:
wwwoffle is a caching proxy and offline browser. It lets you surf the web
and keep copies of the pages you visit in a private or public cache.

Recent versions explain how to index your wwwoffle proxy cache with
htdig and run searches on it.

What's interesting is that multiple instances of the wwwoffle daemon can
be started, each on its own port and with its own config file, so more
than one collecting proxy can be running at once. Moving a host's cached
pages from one 'collection' to another is simply a matter of moving a
directory.
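A minimal Python sketch of moving one host's cache from one collection
to another. It assumes a spool layout of <spool>/http/<host>/, with one
directory per host (check your own wwwoffle spool before relying on
this); move_host and the example paths are hypothetical names used only
for illustration.

```python
import shutil
from pathlib import Path


def move_host(src_spool: Path, dst_spool: Path, host: str,
              scheme: str = "http") -> Path:
    """Move one host's cached pages between two wwwoffle collections.

    Assumes (hypothetically) the layout <spool>/<scheme>/<host>/, so the
    whole cache for a host is a single directory that moves as a unit.
    """
    src = src_spool / scheme / host
    if not src.is_dir():
        raise FileNotFoundError(f"no cache for {host} in {src_spool}")
    dst_parent = dst_spool / scheme
    dst_parent.mkdir(parents=True, exist_ok=True)
    dst = dst_parent / host
    if dst.exists():
        # Refuse to clobber a host already cached in the target collection.
        raise FileExistsError(f"{host} already cached in {dst_spool}")
    shutil.move(str(src), str(dst))
    return dst
```

Stop the wwwoffle instances involved before moving directories, so
neither daemon is writing to the host's cache while it moves.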

Indexing is done by pointing htdig at (any particular instance of)
wwwoffle as a proxy, so all retrieving and digging goes through the
proxy.
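A minimal Python sketch that renders one htdig config per wwwoffle
instance, so each collection is dug through its own proxy port.
database_dir, start_url and http_proxy are standard htdig.conf
attributes; the Instance class, the htdig_conf helper, the ports and
the start URLs are hypothetical values you would replace with your own.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One wwwoffle daemon (all values are hypothetical, caller-supplied)."""
    name: str       # label for this collection, e.g. "work"
    host: str       # where the daemon listens
    port: int       # its proxy port; each instance needs its own
    start_url: str  # page htdig should begin digging from


def htdig_conf(inst: Instance, db_root: str) -> str:
    """Render a minimal htdig.conf that digs through one wwwoffle instance."""
    lines = [
        f"# htdig config for wwwoffle collection '{inst.name}'",
        # Separate databases per collection keep the indexes independent.
        f"database_dir: {db_root}/{inst.name}",
        f"start_url: {inst.start_url}",
        # Every fetch goes through this proxy, so pages come from its cache.
        f"http_proxy: http://{inst.host}:{inst.port}",
    ]
    return "\n".join(lines) + "\n"
```

For two daemons on, say, ports 8080 and 8081 you would call htdig_conf
once per Instance and save each result to its own file.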

Apart from these well-documented features, I have found at least a
couple of interesting uses for these tools.

1) Keep a humanly selected (better, eh?), incremental knowledge base: by
doing htdig updates instead of 'initial' indexing on a wwwoffle cache,
you can maintain a trace of all the words you've seen without actually
keeping the documents in the cache. Preparing 'clean' or 'selected'
caches for indexing is also easy. Each host's data is kept in a separate
directory, so moving it in and out of the cache area, or from one cache
collection to another, is a snap.
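A minimal Python sketch of the update-versus-initial choice. It relies
only on htdig's -c (config file), -i (initial dig, discarding the old
databases) and -v (verbose) options; htdig_command and dig are
hypothetical helper names.

```python
import subprocess
from typing import List


def htdig_command(conf_path: str, initial: bool = False,
                  verbose: bool = False) -> List[str]:
    """Build an htdig command line for one collection's config.

    Without -i, htdig updates the existing databases; with -i it
    throws them away and starts from scratch.
    """
    cmd = ["htdig", "-c", conf_path]
    if initial:
        cmd.append("-i")
    if verbose:
        cmd.append("-v")
    return cmd


def dig(conf_path: str, initial: bool = False) -> int:
    """Run htdig (which must be on PATH) and return its exit status."""
    return subprocess.run(htdig_command(conf_path, initial)).returncode
```

The rundig script shipped with ht://Dig 3.1 calls htdig with -i and then
runs htmerge, so for the incremental approach you would run htdig
without -i and follow it with htmerge yourself.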

2) The second usage is more of a collaborative tool. Different people,
each with their own cache collection(s), can select the data on which to
build their indexes and then merge them into a wider knowledge base (for
a group, an editorial team, etc.). Or you can feed your external search
engine's document base with selected URLs you have personally viewed.
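ht://Dig keeps its own database formats and its own tools (such as
htmerge) for combining them; the sketch below does not touch those. It
only models the collaborative idea in Python, assuming each person's
index is a hypothetical plain mapping from word to the URLs it was seen
on. merge_indexes and url_list are illustrative names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

WordIndex = Dict[str, Set[str]]  # word -> URLs it was seen on


def merge_indexes(indexes: Iterable[WordIndex]) -> WordIndex:
    """Union several people's word indexes into one shared knowledge base."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for index in indexes:
        for word, urls in index.items():
            # Fresh set per word, so the input indexes are never mutated.
            merged[word] |= urls
    return dict(merged)


def url_list(index: WordIndex) -> List[str]:
    """All selected URLs, e.g. to feed an external search engine."""
    return sorted(set().union(*index.values()))
```

Usage: merge_indexes([alice, bob]) gives the group index, and
url_list() on the result gives the deduplicated URLs to hand to another
engine.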


