[htdig] external_parsers


htdig@narwhal.cisti.nrc.ca
Mon, 4 Oct 1999 12:28:45 -0400 (EDT)


Hi,

I'd like to customize the htsearch output for PDF files so that each
result shows the name(s) of the author(s) of the PDF's contents, along
with a link to an HTML abstract.

According to the documentation on external_parsers, there is no field for
the author of the contents, although there are other pertinent fields
like title, words, etc. I can obtain the names of the author(s) from a
corresponding SGML file located on a separate machine from the one that
runs htdig. The URL of the HTML abstract is based on the URL of the PDF
document.
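
To illustrate what I mean by "based on the URL", here is a rough Python
sketch. The mapping itself (same directory, same base name, ".html" in
place of ".pdf") and the function name are only assumptions for the sake
of the example, not necessarily the real layout on our server:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def abstract_url_for(pdf_url: str) -> str:
    """Derive the URL of the HTML abstract from the URL of a PDF.

    Hypothetical rule: the abstract sits next to the PDF and differs
    only in its extension (.pdf -> .html).
    """
    parts = urlsplit(pdf_url)
    root, ext = posixpath.splitext(parts.path)
    if ext.lower() != ".pdf":
        raise ValueError(f"not a PDF URL: {pdf_url}")
    # Drop any query string or fragment; keep scheme, host and directory.
    return urlunsplit((parts.scheme, parts.netloc, root + ".html", "", ""))


if __name__ == "__main__":
    print(abstract_url_for("http://example.org/docs/v77/paper01.pdf"))
```

Whatever the real rule turns out to be, the point is that the link can
be computed from the PDF's URL alone, without contacting the other
machine.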

So my main question is: what would be the easiest way to add a few fields
to the database? If it's not easy, what would be the best kludge so that
HTML is preserved when data is stored in and retrieved from a field?

In other words, if adding a field is too complicated, which existing field
would be best to use, keeping in mind that some HTML markup will be stored
in it (for look and feel) in order to customize the results? The reason I
want to play with the fields is that the SGML files and htdig are on two
separate machines, and I'd rather have the digging take longer than make
the user wait 20 or more seconds at search time for the SGML parsing and
the HTTP requests.

Pat
