Subject: Re: [htdig] htdig / Suse 6.2: very long run ?
From: Torsten Neuer (email@example.com)
Date: Wed Apr 26 2000 - 08:55:14 PDT
Geoff Hutchison wrote:
> At 2:15 PM +0300 4/26/00, Peter L. Peres wrote:
> > It's me again ;-) Has anyone tried to index a C/java/C++/ASM source tree
> >using htdig ? Perhaps by placing a list of mnemonics and reserved words
> >in the bad word list ?
For C/C++/Java it should be quite easy to write a lex/yacc parser which
eliminates reserved words, operators and other "noise" characters. In
addition, such a parser could map globally declared functions and
variables to <H> tags.
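
As a rough illustration of the filtering idea (not a lex/yacc parser, just a regular-expression sketch in Python), the following keeps identifiers and drops reserved words, operators and punctuation. The keyword set is the C89 list; the name index_terms is made up for this example, and C++/Java would need their own keyword lists.

```python
import re

# The 32 reserved words of C89. C++ and Java need their own lists.
C_RESERVED = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}

# An identifier starts with a letter or underscore. The leading \b keeps
# the pattern from matching the tail of a numeric literal such as 0x1F.
IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*")


def index_terms(source):
    """Return the identifiers in C-like source, minus reserved words.

    Operators, digits and punctuation never match IDENTIFIER, so they are
    dropped implicitly. Words inside comments and strings are kept, on the
    assumption that they are useful search terms.
    """
    return [w for w in IDENTIFIER.findall(source) if w not in C_RESERVED]


if __name__ == "__main__":
    print(index_terms("static int count = 0; while (count < max_len) { count++; }"))
```

A real lexer would also tell comments, strings and preprocessor lines apart; this sketch treats them all as plain text.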
There should be some source->html converters somewhere at freshmeat
that already do some nice markup. Either plugging such a converter into the
web-server for converting plain source files on-the-fly or having such
a tool (perhaps with little modifications) generate input for the digger
should be no problem.
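
For the second variant (a tool that generates input for the digger), a minimal Python sketch could look like this. It is not one of the freshmeat converters. The regular expression is a heuristic I am assuming for illustration, not a C parser, and the choice of <h2> and the name source_to_html are likewise hypothetical.

```python
import html
import re

# Heuristic, not a parser: a line that starts in column 0, consists of a
# type, a name and a parenthesised argument list, and does not end in ";".
# It will miss multi-line declarations and can misfire on unusual layouts.
FUNC_DEF = re.compile(
    r"^[A-Za-z_][\w \t*]*?[ \t*]([A-Za-z_]\w*)[ \t]*\([^;]*\)[ \t]*\{?[ \t]*$"
)


def source_to_html(source, title="source"):
    """Wrap source text in HTML, listing function names as headings."""
    headings = []
    for line in source.splitlines():
        match = FUNC_DEF.match(line)
        if match:
            headings.append("<h2>%s</h2>" % html.escape(match.group(1)))
    # The escaped source goes into <pre> so the full text is still indexed.
    body = "<pre>%s</pre>" % html.escape(source)
    return "<html><head><title>%s</title></head><body>\n%s\n%s\n</body></html>" % (
        html.escape(title),
        "\n".join(headings),
        body,
    )


if __name__ == "__main__":
    demo = "int main(void) {\n    return 0;\n}\n"
    print(source_to_html(demo, "demo.c"))
```

The output could be written to files for the digger to pick up, or produced on the fly by the web server, as described above.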
> > Is there some support for parsing dvi and ps files ? dvi can be turned
> >into (ugly) text using dvi2ascii and there is a corresponding converter
> >for ps.
> I would check the conv_doc.pl script and plug in a dvi->txt
> converter. I believe it already handles PostScript files nicely.
Perhaps it is easier (and better, although slower) to convert dvi->ps
and use the PostScript feature of conv_doc.pl. dvi2ascii and similar
tools might lead to some unwanted effects with regard to embedded
graphics, which would probably cause a lot of noise in the document
database (the excerpts will contain lots of dashes, vertical bars etc.
for rulers).
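
If a dvi->txt converter is used anyway, the ruler noise could be filtered before the text reaches the digger. A minimal Python sketch, assuming that rulers show up as lines made almost entirely of characters like "-", "|", "+", "_" and "=". The character set, the 0.8 threshold and the name drop_ruler_lines are my assumptions, not anything conv_doc.pl does.

```python
# Characters assumed to make up ASCII rulers and box drawings.
RULER_CHARS = set("-_=|+")


def drop_ruler_lines(text, threshold=0.8):
    """Remove lines whose non-blank characters are mostly ruler characters.

    A line is dropped when the share of ruler characters among its
    non-whitespace characters is at least `threshold`. Empty lines are kept.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.replace(" ", "").replace("\t", "")
        if stripped:
            noise = sum(1 for ch in stripped if ch in RULER_CHARS)
            if noise / len(stripped) >= threshold:
                continue  # looks like a ruler, skip it
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "Results\n+------+------+\n| a    | b    |\n+------+------+\nPlain sentence here."
    print(drop_ruler_lines(sample))
```

This only removes whole ruler lines. Vertical bars inside otherwise normal lines stay, so excerpts may still contain some noise.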
-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14          Tel: +49-4101-403605
D-25474 Ellerbek          Fax: +49-4101-403606
E-Mail: firstname.lastname@example.org
Internet: http://www.inwise.de