Michael J. Long (email@example.com)
Tue, 30 Sep 1997 09:22:20 -0400
> if i got the problem right.
> you want htdig be able to parse eg. gif files and extract the
> relevant information to store in its database?
> this you want to have done by a somehow separate module?
Yep. Looking at the source code, this is what is done now.
There is a separate module for text and another for postscript.
For the GIF files, a module could index the comments in the
GIF. Of course, I know very few people that actually use that
feature of GIF.
> that would be really great!!!!!!
I know. A very intelligent move by the htdig folks.
> so what about the approach to put all parsers into a dynamic
> loadable library?
What if each module would be a separate dynamic library that
htdig would load at startup? Sort of like how Netscape deals
with plug-ins. For those of you not familiar, NS searches
a directory path and queries all plug-ins in those paths for
the MIME type that they handle. NS then registers the plug-in
to handle any file that matches that MIME type.
This would work perfectly for htdig. That way, instead of
having to alter the source code (Document::getParsable) every
time a new module is created, htdig could get this information
dynamically from the "plug-ins" themselves.
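To make the idea concrete, here is a rough sketch of such a registry
in C++. It is not htdig code: ParserPlugin, PluginRegistry, and
PlainTextPlugin are names made up for illustration, and the step of
actually loading shared libraries from a directory is left out. It
only shows each plug-in reporting the MIME types it handles and the
registry routing a document to whichever plug-in claimed its type.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface that every parser plug-in would implement.
class ParserPlugin {
public:
    virtual ~ParserPlugin() = default;
    // MIME types this plug-in claims to handle.
    virtual std::vector<std::string> mimeTypes() const = 0;
    // Turn the raw document into indexable text.
    virtual std::string extractText(const std::string &raw) const = 0;
};

// Hypothetical registry: maps a MIME type to the plug-in that claimed it.
class PluginRegistry {
public:
    // Ask the plug-in which types it handles and register it for each.
    void add(std::shared_ptr<ParserPlugin> plugin) {
        for (const auto &type : plugin->mimeTypes())
            handlers_[type] = plugin;
    }

    // Returns the handler for a MIME type, or nullptr if none registered.
    const ParserPlugin *find(const std::string &mimeType) const {
        auto it = handlers_.find(mimeType);
        return it == handlers_.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::shared_ptr<ParserPlugin>> handlers_;
};

// Trivial example plug-in: plain text needs no conversion.
class PlainTextPlugin : public ParserPlugin {
public:
    std::vector<std::string> mimeTypes() const override {
        return {"text/plain"};
    }
    std::string extractText(const std::string &raw) const override {
        return raw;
    }
};
```

With something like this, supporting a new format would mean adding
a plug-in rather than touching the lookup code.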
I will be more than happy to help add this functionality.
> this library would have to export a certain set of functions
> (eg getTitle(file) ). if the requested type of information can
> be extracted by the parser, the library delivers it.
Sounds like Markus is describing the exact same thing.
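As I read it, the exported interface could look roughly like the
sketch below. Everything here is an assumption on my part:
ParserExports, queryTitle, and demoTitle are hypothetical names, and
only getTitle comes from the quoted message. A null entry stands for
"this parser cannot extract that kind of information".

```cpp
#include <string>

// Hypothetical table of entry points a parser library would export.
// A null pointer means the parser cannot extract that field.
struct ParserExports {
    bool (*getTitle)(const std::string &file, std::string &out);
    bool (*getKeywords)(const std::string &file, std::string &out);
};

// Ask a library for a title. Returns true and fills `out` only if the
// library exports getTitle and that call reports success.
bool queryTitle(const ParserExports &lib, const std::string &file,
                std::string &out) {
    return lib.getTitle != nullptr && lib.getTitle(file, out);
}

// Stand-in implementation, for illustration only: a real parser
// would open the file and read the title from its contents.
bool demoTitle(const std::string &file, std::string &out) {
    out = "Title of " + file;
    return true;
}
```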
> one would have to supply an additional configuration directive
> to associate a file type with a library like mime types.
Well, maybe he is thinking of something similar, but not the same. :^)
> the most generous approach would be using the nss interface of
> glibc2 that would make it a service available to any software
> on that machine.
Unfortunately, I am not familiar with nss. Could anyone guide
me to some documentation/information??
> everybody in the world would have to supply such a library for
> his file formats :) that would cause adobe a lot of work.
If life were perfect, I would agree, but we all know what life
is, or rather is not.
Cause Adobe a lot of work?? I doubt it. They most likely already
have a PDF (or PS, or Frame, or Pagemaker, etc.) to text converter
written. It would just be a matter of passing that text to the
text parser that is already included with htdig.
While I am thinking of it, I am thinking about writing a module
for Frame files and I need a little (theoretical) help. What I
was thinking of doing was calling an external program (fmbatch)
to convert the Frame file to text and then have the text parser
parse the text file. That way, I can avoid the work of having to
decode the Frame file and I also wouldn't have to write a text
parser all over again. Is this possible??
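Roughly what I have in mind, as a C++ sketch. The names runConverter
and parseText are made up, the command line is whatever the caller
passes in (I am not assuming any particular fmbatch options), and
parseText is only a whitespace splitter standing in for htdig's real
text parser. It relies on POSIX popen to capture the converter's
standard output.

```cpp
#include <stdio.h>    // popen, pclose (POSIX)
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Run an external converter and return everything it writes to stdout.
// `command` is supplied by the caller; nothing here is specific to
// any one converter.
std::string runConverter(const std::string &command) {
    FILE *pipe = popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("could not start converter");

    std::string output;
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);

    // A non-zero status means the converter failed or could not be run.
    if (pclose(pipe) != 0)
        throw std::runtime_error("converter failed");
    return output;
}

// Stand-in for the existing text parser: split the text into words.
std::vector<std::string> parseText(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}
```

The Frame module would then amount to parseText(runConverter(cmd)),
with cmd being the conversion command for the file in question.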
Michael J. Long
P.S. Thanks for such a great tool. All we need for it is a
--
* Michael J. Long * #include <disclaimer.h>
* Summa Four      * Work: mjlong@Summa4.COM
* Manchester, NH  * Play: firstname.lastname@example.org