Re: [htdig] Index word doc meta tags (toc)


Subject: Re: [htdig] Index word doc meta tags (toc)
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Apr 18 2000 - 21:14:42 PDT


At 4:11 PM +0000 4/18/00, Steve Wambolt3 wrote:
>I have just installed htdig -- plus the parse.pl script to index pdf and
>word documents .. so far it looks great ...
>[snip]
>Example - I have a 50 page Word doc - it has a 3 page table of contents (when
>you reveal codes in the Word doc you get this) {TOC \o "1-2"} - What I
>would like to be able to do is tell htdig to index ONLY the table of
>contents - I guess by passing it the metatag above ????

You don't mention what program you're using to convert the Word
documents, so I'll assume catdoc. I would use it to convert one of
your documents and look at the output to see whether there's an easy
way to separate the TOC section from the rest of the document. Then
you'd want to hack the Perl script to ignore everything but the TOC.
(I'm guessing from your comments that you're using parse_doc.pl;
conv_doc.pl or the new doc2html scripts should work as well.)
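
To make the idea concrete, here is a rough sketch of such a filter,
written in Python as a standalone illustration rather than as a patch
to any of the Perl scripts. It rests on an assumption I have not
checked against real converter output: that each TOC entry comes out
as a heading followed by whitespace or dot leaders and a page number.
The names extract_toc and TOC_LINE are made up for this example. Look
at your own converted text first and adjust the pattern to match.

```python
import re

# Assumption (not verified against real converter output): each TOC entry
# is a heading followed by whitespace or dot leaders and a page number,
# for example "Introduction\t3" or "Installing ..... 4".
TOC_LINE = re.compile(r"^\s*(?P<title>\S.*?)[\s.]+(?P<page>\d+)\s*$")


def extract_toc(text):
    """Return the titles of lines that look like TOC entries."""
    titles = []
    for line in text.splitlines():
        match = TOC_LINE.match(line)
        if match:
            titles.append(match.group("title"))
    return titles


# Hypothetical sample of converted text: a TOC followed by body text.
sample = "Contents\nIntroduction\t1\nInstalling ..... 4\nThis is body text.\n"
print(extract_toc(sample))
```

A pattern this loose will also pick up any body line that happens to
end in a number, so if the converted text marks where the TOC starts
and stops, it would be safer to cut on those markers instead.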

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



