Re: htdig: MS Office files -- help indexing them, please!

Tyson Bigler
Tue, 01 Dec 1998 06:22:58 -0600 (CST)

$TMPDIR is set to a partition with ~10GB of free space. I think I may have
figured out the problem (and a nasty workaround): it seems that GNU sort is
interpreting the sort column of db.wordlist as a command-line argument; i.e.,
if a line in db.wordlist begins with "-something", sort treats it as an
option and fails! I simply ran `sed 's?^-??'` on db.wordlist and then moved
the new file into place. htmerge then ran successfully. That said, there's
got to be a better way!!
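
For anyone wanting to repeat the workaround described above, here is a minimal sketch of it as a script. The helper name strip_leading_dash, the .bak/.new suffixes, and the sample word list are assumptions made for illustration; none of them come from htdig. Note that stripping the dash changes the stored words themselves, which is part of why this is only a stopgap.

```shell
#!/bin/sh
# Sketch of the workaround: remove a leading "-" from every line of a
# word list so no line can be mistaken for a command-line option.
# strip_leading_dash is a hypothetical helper, not an htdig tool.
strip_leading_dash() {
    wordlist=$1
    # Keep a backup so the original list can be restored.
    cp "$wordlist" "$wordlist.bak" || return 1
    # Same sed expression as in the message; "?" is the delimiter.
    sed 's?^-??' "$wordlist.bak" > "$wordlist.new" || return 1
    # Move the cleaned file into place.
    mv "$wordlist.new" "$wordlist"
}

# Demo on a throwaway file (contents invented for illustration).
demo_dir=$(mktemp -d)
printf '%s\n' '-something 1' 'apple 2' 'co-op 3' > "$demo_dir/db.wordlist"
strip_leading_dash "$demo_dir/db.wordlist"
cat "$demo_dir/db.wordlist"
```

Only a dash at the very start of a line is removed; a dash inside a word (as in "co-op") is left alone. On a real index the function would be pointed at the actual db.wordlist before running htmerge.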

On a different note, I am having other problems. I know that this is not an
htdig limitation, but the Solaris 2.5.1 machine I'm running this on has a 2GB
file size limit. Is there any way to have htdig split its databases into
multiple 2GB files? I know that I can manually limit things to the point
where the various dbs are <2GB, but that's not really a solution either. I
need a dynamic db! I guess I could move to Solaris 2.6, which doesn't have
the 2GB limit. I'd like to hear how other folks have dealt with this problem.
As you can see, I'm indexing a *huge* number of documents...



On 01-Dec-98 Geoff Hutchison wrote:
> At 4:56 PM -0500 11/24/98, Tyson Bigler wrote:
>>I dl'd the latest snapshot (htdig-3.1.0b3-112298) and I'm using GNU sort,
>>I still get the same 'invalid argument' error from sort... Maybe I need to
>>rebuild the index because I'm trying to merge the same index every time...
> This shouldn't be a problem. What is the environment variable TMPDIR? (If
> you're using rundig, it should be set in there somewhere.)
> -Geoff Hutchison
> Williams Students Online

M. Tyson Bigler                  SEPTCo Computing Solutions Group
Infrastructure Support           Bellaire Technology Center
                                 3737 Bellaire Blvd., Room 1007B
    713-245-7476                 Houston, TX 77025

