Torsten Neuer (firstname.lastname@example.org)
Tue, 6 Jul 1999 10:14:22 +0200
According to kimsg:
>I'm using the NT version of HTDig 3.1.2 and am developing an external parser
>for it under Windows NT.
>The external parser is a Windows console application, but I have found that
>the interaction between the external parser and HTDig is not simple, so I
>have had to modify ExternalParser.cc.
>My proposal and question:
>1. How about changing the parse logic of ExternalParser.cc to hand off to Plaintext.cc?
> - Get the external parser from htdig.conf.
> - Execute this program and get a temporary text file.
> - Go to the Plaintext parser.
Plaintext has restrictions which IMO forbid your method of working with an
external parser. This is because an external document may have hyperlinks
to other documents and also HTML docs. Plain text does not have any of
these, neither does it have headings or titles. Therefore, using your
approach a lot of information could get lost either by not reaching it
when digging (i.e. not being able to follow hyperlinks) or having bad
search results resulting from the unavailability of titles and headers.
I think that the interaction between ht://Dig and an external parser is
fairly easy and straightforward, btw., so what is your problem?
If you're able to convert a document into plain text, then about 95% of
writing the external parser is already done ;-)
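To illustrate how little is left once plain text is available, here is a
minimal sketch of an external parser in Python. It rests on assumptions that
should be checked against the external_parsers documentation for your
ht://Dig version: that the parser is called with the input file, content
type, URL and config file as arguments, and that it writes tab-separated
records to standard output, with "t" for the title, "w" for a word with a
relative location (0-1000) and a heading flag, and "h" for the excerpt text.
The function name records_from_text and the fallback title are hypothetical.

```python
import re
import sys

# Assumption: "words" are plain runs of ASCII letters and digits.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def records_from_text(text, title="Untitled"):
    """Yield tab-separated records for a plain-text document.

    The record codes (t, w, h) follow the external parser output format
    as recalled from the ht://Dig documentation; verify before relying
    on them.
    """
    yield "t\t" + title
    length = max(len(text), 1)
    for match in WORD_RE.finditer(text):
        # Relative position of the word in the document, scaled to 0..999.
        location = match.start() * 1000 // length
        # Last field 0 = word is not part of a heading (assumed meaning).
        yield "w\t%s\t%d\t0" % (match.group().lower(), location)
    # Collapse whitespace and keep a short head for use as the excerpt.
    yield "h\t" + " ".join(text.split())[:80]


def main(argv):
    # Assumed argument order: infile content-type url configfile
    if len(argv) < 2:
        sys.stderr.write("usage: parser infile [content-type url config]\n")
        return 2
    with open(argv[1], errors="replace") as handle:
        text = handle.read()
    title = argv[3] if len(argv) > 3 else "Untitled"
    for record in records_from_text(text, title=title):
        print(record)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

A real parser would also emit link records for any hyperlinks it finds in
the document, which is exactly the information the Plaintext route cannot
carry.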
>2. How are excerpts displayed for external documents?
I haven't worked much with external docs, but AFAIK all excerpts are
taken from the doc.db, so nobody has to care about that once a doc has
--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14, D-25474 Ellerbek
Tel: +49-4101-403605  Fax: +49-4101-403606
E-Mail: email@example.com  Internet: http://www.inwise.de