Re: [htdig] Questions about what's possible with ht://Dig...


Geoff Hutchison (ghutchis@wso.williams.edu)
Tue, 06 Jul 1999 15:39:47 -0400


Albert Lunde wrote:
> From what I've read so far ht://Dig seems like a pretty flexible spider;
> which could be configured to spider remote systems, or to access the local
> server directly.

Yes on both counts.

> It sounds like http://www.htdig.org/files/contrib/scripts/multidig.tar.gz
> might be useful for running a series of indexes on various servers.

That is its intent. It also makes merging indexes fairly easy.

> (1) Is the only way to deal with queries across multiple indexes to combine
> the indexes with htmerge, or is there a way to query more than one index
> and aggregate the results?

There is currently no way to aggregate indices. You must merge
them.

> (2) Can your data files be copied between systems (e.g. doing local
> indexing on one server, then copying with ftp or scp to another server for
> merging or searching)? I can think of several sorts of issues:
> - absolute path names
> - byte order or floating point across architectures

Pathnames are never used in the databases. Byte order is a snag, but
you can use the standard Berkeley DB tools to dump the database to a
text file, then reload it on another machine. Not elegant, but it
works.
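
For what it's worth, the dump-and-reload step could be scripted roughly
like this. It is only a sketch: it assumes the Berkeley DB utilities are
installed under their usual names, db_dump and db_load, and that both
accept -f for the text file. The database file name in the comments is a
placeholder, not necessarily what your installation uses.

```python
import subprocess


def dump_command(db_file, dump_file):
    """Build the db_dump call that writes db_file out as portable text."""
    # Assumption: the utility is on PATH as "db_dump" and takes -f <output>.
    return ["db_dump", "-f", dump_file, db_file]


def load_command(dump_file, db_file):
    """Build the db_load call that recreates db_file from the text dump."""
    # Assumption: the utility is on PATH as "db_load" and takes -f <input>.
    return ["db_load", "-f", dump_file, db_file]


def run(command):
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)


# Hypothetical usage ("db.docdb" is a placeholder file name):
#   on the indexing machine:  run(dump_command("db.docdb", "db.docdb.txt"))
#   copy db.docdb.txt across with ftp or scp
#   on the search machine:    run(load_command("db.docdb.txt", "db.docdb"))
```

The text dump is what gets copied between machines, so the byte order of
the original binary file no longer matters.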

> (2) Is there a way to index all the HTML files in a directory tree,
> regardless of how they are linked, (or some other arbitrary list of files
> on the local system)?

Nope. It follows links, so if there isn't a link to it, it won't find
it. This way you can actually have "private" directories that aren't
indexed...
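
Since the spider only follows links, one possible workaround (my own
sketch, not a feature of ht://Dig) is to generate a page that links to
every HTML file under a directory and let the spider start from that
page. The function name, the root directory and the base URL below are
all made up for illustration, and the sketch assumes the tree is served
by the web server under that base URL.

```python
import os
from html import escape
from urllib.parse import quote


def build_link_page(root, base_url):
    """Return an HTML page with one link per .html/.htm file under root."""
    links = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            # Path relative to root, with forward slashes for use in a URL.
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            url = base_url.rstrip("/") + "/" + quote(rel)
            links.append('<a href="%s">%s</a><br>'
                         % (escape(url, quote=True), escape(rel)))
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>\n"


# Hypothetical usage: write the page somewhere the web server can serve it,
# then point the spider's starting URL at it.
#   page = build_link_page("/var/www/docs", "http://www.example.com/docs")
#   with open("/var/www/docs/all-files.html", "w") as out:
#       out.write(page)
```

Anything you leave out of the generated page stays unindexed, so the
"private" directory trick still works as long as those directories are
not under the root you pass in.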

> (3) Is it feasible to use the ht://Dig spider with some different search
> and index software?

I don't see how you'd use it with a different indexer. The spider is the
indexer. Searching might be possible, but you're probably much better
off writing a wrapper in Perl, PHP or something else--I don't know of
anyone else who reads the ht://Dig database format.

> I guess the last two questions depend on what the interface is between the
> spider and indexing software: to what extent it is exported in a form that
> external software could be added or to what extent the whole package is too
> interconnected to pick apart.

You can add external parsers for different file formats. In 3.2, you'll
be able to add external transport protocol helpers, and hopefully
external "decoders" for decompression, decryption, etc. There are also
any number of wrappers in Perl and PHP, several of which may be found in
the contrib/ section of the website.

> If you'd care to comment on the pros or cons of any of this, I'd be interested.

There are a number of search-tool sites that compare these products,
including http://www.searchtools.com/

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


