Hans-Peter Nilsson (email@example.com)
Tue, 9 Feb 1999 00:04:28 +0100
While tracing the whereabouts of some "spuriously deleted"
documents, I came to debug the Postscript::parse() function.
It's just that there's not much to trace -- it immediately
returns (line 56).
Looking through this year's mailing list contents, I see that
people think ht://Dig can actually parse PostScript, and
someone posted a problem description about not getting any
output while indexing PostScript documents. Small wonder...
This "disabling" of PostScript parsing predates CVS logs.
Now, if I enable it by removing the "return", everything seems
to work as expected, but debugging output also appears: there
are "naked" cout writes that do not test the "debug" flag.
I say "as expected" because not all words in PostScript files
are complete or easily parseable; often one or two characters
are expressed in ways that the PostScript parser cannot grok,
so a chopped or otherwise munged word gets indexed.
See for example <URL:http://egcs.cygnus.com/scheduler.ps>.
This leads me to think that the PostScript parser is not as
complete as needed, and was possibly "disabled" for a good reason.
Maybe it should be rewritten using PDF.cc, or maybe the PDF
parser has the same problems.
I don't know. Maybe someone has some good answers?
If your local_urls documents have modification times before the
epoch (1 Jan 1970), they may (on Linux) get a date earlier than
zero (a negative date, if your time_t is signed), and will then
not be indexed. See Document.cc around line 550 (the date is
zero for newly encountered documents).
Not that this urgently needs fixing at this level; maybe a debug
message saying "Whoops! You have some really old documents here"
is in order (I may fix it).
Hope all systems get a 64-bit time_t, or at least an unsigned one.
--
Hans-Peter Nilsson, Axis Communications AB, S - 223 70 LUND, SWEDEN
Hans-Peter.Nilsson@axis.se | Tel +46 462701867,2701800
Fax +46 46136130 | RFC 1855 compliance implemented; report loss of brain.