Gilles Detillieux (email@example.com)
Thu, 25 Feb 1999 16:00:36 -0600 (CST)
According to Ismael Olea:
> Gilles Detillieux wrote:
> > Seriously, the main page at that URL does mention it. If you scroll down
> > to the Features section, it says:
> > - Searching of HTML and text files
> > Both HTML documents and plain text files can be searched.
> > Searching of other file types will be supported in future versions.
> Can htdig handle SGML files too? And can it manage meta tags in HTML
No, I don't think it can handle SGML. I'm not familiar with SGML, but my
understanding is that a lot of its tags are quite different from HTML's.
Also, the http server would likely assign a different content-type to
SGML documents, so htdig won't even attempt to parse them.
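
To illustrate the content-type point, here is a minimal Python sketch of
that kind of check. It is not htdig's actual code. The set of types is an
assumption taken from the Features list quoted above (HTML and plain text
only), the function name is made up for illustration, and it ignores PDF
and external parsers.

```python
# Content types assumed to be parsed natively (HTML and plain text only).
# This set is an illustration, not htdig's real internal table.
PARSEABLE_TYPES = {"text/html", "text/plain"}


def will_parse(content_type_header):
    """Return True if the server-supplied Content-Type is one we parse."""
    # Drop parameters such as "; charset=..." and normalise the case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in PARSEABLE_TYPES
```

With this check, a document served as "text/html; charset=iso-8859-1" is
accepted, while one served as "text/sgml" is skipped without ever being
parsed.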
Meta tags in HTML are supported by htdig.
> > That's not quite the whole story, though. There is some support for
> > PDF documents right now, if you have acroread (Adobe Acrobat Reader) on
> > your system. Also, with external parsers, you can index a whole lot more.
> Must these external parsers be htdig aware, or can they be Unix-like? Where
> can I find them?
> > The parse_doc.pl script in ht://Dig 3.1.1's contrib directory can handle
> Looks very interesting.
External parsers must definitely be htdig aware. Their output must adhere
to the format specified in the documentation. See
for details. The parse_doc.pl script, and its earlier versions as perl
and shell scripts, is the only external parser around that's publicly
available, as far as I know. Someone on the list can correct me if I'm
wrong. parse_doc.pl is also a good starting point if you want to set
up an interface between htdig and any number of more Unix-like document
parsers. Any filter that can extract plain text from a document can
easily be plugged into this script, and it handles the generation of
records for htdig.
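
To give a rough idea of what "generation of records" means, here is a
minimal Python sketch of a wrapper that turns already-extracted plain text
into records on standard output. It is not parse_doc.pl. The record layout
(tab-separated "t" title, "w" word with a 0-1000 location and a heading
level, "h" excerpt) and the argument order (infile, content-type, URL,
config file) are assumptions from my reading of the external parser
documentation, so check them against the docs for your version. The
function names are made up for illustration.

```python
import re
import sys


def text_to_records(text, title=""):
    """Build tab-separated records from plain text (assumed record layout)."""
    records = []
    if title:
        records.append(f"t\t{title}")
    words = re.findall(r"[A-Za-z0-9]+", text)
    total = len(words)
    for i, word in enumerate(words):
        # Location is assumed to be a relative position from 0 to 1000.
        location = (i * 1000) // total
        # Heading level 0 is assumed to mean ordinary body text.
        records.append(f"w\t{word.lower()}\t{location}\t0")
    # "h" is assumed to carry the excerpt; here, the first 80 characters.
    records.append("h\t" + " ".join(text.split())[:80])
    return records


def main(argv):
    # Assumed invocation: parser infile content-type URL config-file
    if len(argv) < 2:
        sys.stderr.write("usage: parser infile [content-type URL config]\n")
        return 2
    # A real setup would run a text-extraction filter on the file here;
    # this sketch simply treats the input file as plain text already.
    with open(argv[1], encoding="latin-1") as f:
        text = f.read()
    sys.stdout.write("\n".join(text_to_records(text)) + "\n")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

The point of the design is that the text-extraction step and the record
generation are separate, so swapping in a different filter for a new
document type only touches the first step.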
-- 
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930