Re: htdig: MS Office files -- help indexing them, please!

Geoff Hutchison
Tue, 24 Nov 1998 14:21:55 -0500

At 11:40 AM -0500 11/24/98, Tyson Bigler wrote:
>powerpoint). Does anyone have an external parser for me??!! My peers keep
>telling me that AltaVista has all of these "filters" (aka parsers), but I
>haven't seen/used them...

You never know--you might be able to "steal" the AltaVista filters. I don't
know the details of their filters, but if they use external programs too,
you can use those (at least as examples). If not, I suggest looking in
something like Yahoo for a PowerPoint -> HTML converter.

>I am also having difficulty with htmerge on a fairly large (and it will only
>grow larger) index. The specific error seems to be coming from the sort
>command. When using the standard sort included with Solaris 2.5.1 I get:
>sort: can't create /home/atlantis8/bigler/stmAAAa00598/a: Not a directory
>htmerge: Word sort failed

Use GNU sort. The sort program on Solaris seems to have some nasty bugs
like this.
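
If you're not sure which sort you are actually getting, here is a rough
sketch of a check in Python. It assumes that GNU sort answers --version with
a banner containing "GNU" and that other sorts reject the option; the
function names are made up for illustration. It only reports what the shell
would find on PATH. htmerge may have been built with a fixed path to sort,
so treat the result as a hint, not proof.

```python
import shutil
import subprocess


def looks_like_gnu_sort(version_text: str) -> bool:
    """Return True if the text resembles the banner GNU sort prints for --version."""
    return "GNU" in version_text


def describe_sort_on_path() -> str:
    """Report which `sort` is first on PATH and whether it appears to be GNU sort."""
    path = shutil.which("sort")
    if path is None:
        return "no sort found on PATH"
    try:
        # Assumption: GNU sort accepts --version, while other implementations
        # typically reject the option and exit non-zero.
        result = subprocess.run(
            [path, "--version"],
            stdin=subprocess.DEVNULL,  # never wait on terminal input
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{path}: could not run ({exc})"
    if result.returncode == 0 and looks_like_gnu_sort(result.stdout):
        return f"{path}: looks like GNU sort"
    return f"{path}: does not look like GNU sort"


if __name__ == "__main__":
    print(describe_sort_on_path())
```

If it reports a non-GNU sort, putting the GNU sort directory earlier in PATH
and running the check again is a reasonable next step.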

># htmerge -c conf/unix.conf -v -s
>htmerge: Sorting...
>/home/atlantis3/bigler/opt/bin/sort: read error: Invalid argument
>htmerge: Word sort failed

This may be my bug. Try the latest snapshot (if there's a great need, I'll
"bless" something soon as a beta, even though there are some unresolved
issues).

>Any help would be *greatly* appreciated. I had rather not go the other
>direction and be forced into AltaVista.... ;-D And I'd like to deliver a
>solution way ahead of the "other guy". ;-D

Sounds good to me. :-)

-Geoff Hutchison
Williams Students Online

