htdig: Patch for external_parser in attrs.html: Corrected and extended documentation


Hans-Peter Nilsson (hans-peter.nilsson@axis.com)
Tue, 5 Jan 1999 01:27:50 +0100


The current description of external parsers is wrong: they do
*not* take input on stdin. Instead, the document is passed as the
name of a temporary file on the command line; see
htdig3/htdig/ExternalParser.cc.

Here's an update. I also changed one of the examples to show
how parameters can be passed.
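
To make the calling convention concrete, here is a minimal sketch of
how a parser might pick its arguments apart. It assumes (this is not
stated in the patch) that htdig appends its four parameters after any
options given in the quoted command string, so they are always the
last four. The name split_args is made up for illustration.

```python
def split_args(argv):
    """Split argv into (extra, infile, content_type, url, config_file).

    Assumption: the four parameters supplied by htdig come last, after
    any options from the quoted command string (e.g. "-w").
    """
    if len(argv) < 5:
        raise ValueError(
            "expected: [options] infile content-type URL configuration-file")
    extra = argv[1:-4]  # options from the command string, if any
    infile, content_type, url, config_file = argv[-4:]
    return extra, infile, content_type, url, config_file
```

A real parser would call split_args(sys.argv), read the temporary
file named by infile, and write its records to standard output.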

It should also be noted that the "u" field should specify a
complete, non-relative URL. Maybe this is a bug, since the "i"
field can be relative. The safe way to go here IMHO is to
update the documentation, *then* perhaps fix the code; here we go.
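
For a parser written in a scripting language, resolving links against
the URL parameter is usually a one-liner. A small sketch using
Python's standard urllib.parse.urljoin; the function name
absolute_link and the example links are only illustrations:

```python
from urllib.parse import urljoin


def absolute_link(base_url, href):
    """Resolve href against the URL parameter passed to the parser.

    Links that are already absolute are returned unchanged.
    """
    return urljoin(base_url, href)
```

With base_url set to http://www.htdig.org/attrs.html, a link to
"cf_byname.html" resolves to http://www.htdig.org/cf_byname.html.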

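No empty fields are allowed. Think strtok ("\t\t","\t"): adjacent
tabs count as a single delimiter, so an empty field simply vanishes
and the remaining fields no longer line up. Or try it yourself;
you'll get an "external parser error".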
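
The effect can be imitated outside of C. The sketch below is not
htdig code: strtok_fields only mimics the way strtok() collapses runs
of delimiters, and make_record is a hypothetical helper that refuses
to build a record with an empty field. The fields each record type
actually takes are described in attrs.html, not here.

```python
def strtok_fields(record, delim="\t"):
    """Tokenize the way C strtok() does: empty fields are dropped."""
    return [field for field in record.split(delim) if field]


def make_record(record_type, *fields):
    """Build one newline-terminated, tab-separated output record.

    Raises ValueError for anything that would produce an empty or
    ambiguous field.
    """
    if len(record_type) != 1:
        raise ValueError("record type must be a single character")
    texts = [str(field) for field in fields]
    for text in texts:
        if text == "" or "\t" in text or "\n" in text:
            raise ValueError("fields must be non-empty, without tabs or newlines")
    return "\t".join([record_type] + texts) + "\n"
```

Here strtok_fields("u\t\tsome text") yields two fields, not three,
which is why an empty field has to be caught before the record is
written.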

There's also a random typo fix for "second string [of] each
pair" on the first line.

htdoc/ChangeLog:
Tue Jan 5 00:47:22 1999 Hans-Peter Nilsson <hp@axis.se>

        * attrs.html: Correct and add more verbose description of external
        parser program parameters and fields.

Index: attrs.html
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdoc/attrs.html,v
retrieving revision 1.9
diff -p -c -r1.9 attrs.html
*** attrs.html 1998/12/13 05:44:54 1.9
--- attrs.html 1999/01/05 00:25:01
***************
*** 1208,1220 ****
               The external parsers are specified as pairs of
              strings. The first string of each pair is the
              content-type that the parser can handle while the
! second string each pair is the path to the external
! parsing program. The parsing program will get the
! document to be parsed on its standard input and it is
! to write information for htdig on its standard
! output.<br>
               The output consists of records, each record terminated
! with a newline. Each record is a series of tab
              separated fields. The first field is a single character
              that specifies the record type. The rest of the fields
              are determined by the record type.
--- 1208,1281 ----
               The external parsers are specified as pairs of
              strings. The first string of each pair is the
              content-type that the parser can handle while the
! second string of each pair is the path to the external
! parsing program. If quoted, it may contain parameters,
! separated by spaces.<p>
! The parser program takes four command-line
! parameters, not counting any parameters
! given in the command string:<br>
! <em>infile content-type URL configuration-file</em><br>
! <table border="1">
! <tr>
! <th>
! Parameter
! </th>
! <th>
! Description
! </th>
! <th>
! Example
! </th>
! </tr>
! <tr>
! <td valign="top">
! infile
! </td>
! <td>
! A temporary file with the contents to be parsed.
! </td>
! <td>
! /var/tmp/htdext.14242
! </td>
! </tr>
! <tr>
! <td valign="top">
! content-type
! </td>
! <td>
! The MIME-type of the contents.
! </td>
! <td>
! text/html
! </td>
! </tr>
! <tr>
! <td valign="top">
! URL
! </td>
! <td>
! The URL of the contents.
! </td>
! <td>
! http://www.htdig.org/attrs.html
! </td>
! </tr>
! <tr>
! <td valign="top">
! configuration-file
! </td>
! <td>
! The configuration-file in effect.
! </td>
! <td>
! /etc/htdig/htdig.conf
! </td>
! </tr>
! </table><p>
! The external parser is to write information for
! htdig on its standard output.<br>
               The output consists of records, each record terminated
! with a newline. Each record is a series of non-empty tab
              separated fields. The first field is a single character
              that specifies the record type. The rest of the fields
              are determined by the record type.
***************
*** 1340,1346 ****
                  </td>
                  <td>
                    A hyperlink to another document that is
! referenced by the current document.
                  </td>
                </tr>
                <tr>
--- 1401,1409 ----
                  </td>
                  <td>
                    A hyperlink to another document that is
! referenced by the current document. It must be
! complete and non-relative, using the URL parameter to
! resolve any relative references found in the document.
                  </td>
                </tr>
                <tr>
***************
*** 1409,1415 ****
            </dt>
            <dd>
              external_parsers: text/html /usr/local/bin/htmlparser
! application/ms-word /usr/local/bin/mswordparser
            </dd>
          </dl>
        </dd>
--- 1472,1478 ----
            </dt>
            <dd>
              external_parsers: text/html /usr/local/bin/htmlparser
! application/ms-word "/usr/local/bin/mswordparser -w"
            </dd>
          </dl>
        </dd>

brgds, H-P