Subject: Re: [htdig] Request for new htdig META property: htdig-description
From: Patrick Jennings (email@example.com)
Date: Fri Apr 07 2000 - 15:16:19 PDT
> > description_meta_tag_names: htdig-description description
> > which generates the behaviour:
> > if no htdig-description then look for description,
> > if no description then automatically generated
> > excerpt
> > description_meta_tag_names: htdig-description
> > if no htdig-description, then automatically generated
> > excerpt.
> > This is a more elegant and configurable implementation. I like
> > it. It'll just take a bit more code to implement.
> The precedence aspect would be more difficult to implement. The
> code parses HTML in one pass, and deals with tags as they occur, so
> it would require storing the description tags until you find the
> precedence one, then ignoring subsequent tags, or something like
> that, and only indexing the words in the tag after you know you have the
> right one. I guess that makes sense in any case, as you probably don't
> want to index more than one of these, which is what your quick fix does.
Yep. But much of the code could live in the meta_dsc object, with
very little structural change resulting. The HTML.do_tag method would
contain lines like:
    if (conf["name"] && conf["content"])
        if (meta_dsc.listed(cache))             // if the type string matches one in the list
            meta_dsc.save_data(cache, content)  // write only if new data, or higher priority

    case ("/head" pattern)    // "/head" added to tags.Pattern, I imagine
        if (meta_dsc.found)   // if we called meta_dsc.save_data at least once
            meta_dsc.done     // let the object know the description is final
            add description.text to the index
Or something like that.
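
To make the idea a little more concrete, here is a rough, standalone C++ sketch of what such an object might look like. It is not taken from the ht://Dig sources: the class name MetaDescription, its members, and the way the precedence list is passed in are all assumptions made for illustration. It also assumes the caller has already lowercased the tag name, since it compares names exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the meta_dsc object in the pseudocode.
// None of these names come from the ht://Dig sources.
class MetaDescription {
public:
    // `names` is the precedence list, highest priority first, e.g. the
    // values given for description_meta_tag_names.
    explicit MetaDescription(std::vector<std::string> names)
        : names_(std::move(names)), best_rank_(names_.size()) {}

    // True if `name` appears in the precedence list.
    bool listed(const std::string& name) const {
        return rank(name) < names_.size();
    }

    // Keep `content` only if nothing is stored yet, or if `name` outranks
    // the tag stored so far. A later tag of equal rank is ignored.
    void save_data(const std::string& name, const std::string& content) {
        assert(!done_ && "save_data called after done()");
        const std::size_t r = rank(name);
        if (r < best_rank_) {
            best_rank_ = r;
            text_ = content;
        }
    }

    // True once at least one listed tag has been saved.
    bool found() const { return best_rank_ < names_.size(); }

    // Called at </head>: no further description tags are expected.
    void done() { done_ = true; }

    const std::string& text() const { return text_; }

private:
    // Position of `name` in the list, or names_.size() if it is not listed.
    std::size_t rank(const std::string& name) const {
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == name) return i;
        return names_.size();
    }

    std::vector<std::string> names_;
    std::size_t best_rank_;
    std::string text_;
    bool done_ = false;
};
```

With the list {"htdig-description", "description"}, a page whose description tag appears before its htdig-description tag still ends up with the htdig-description text, because the later tag has the lower rank number and overwrites the earlier one. If found() is false at </head>, the parser would fall back to the automatically generated excerpt.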
Fortunately, in my case I want the pages with htdig-descriptions to
show up first in a search. So the quick-fix side-effect of adding
both sets of descriptions to the index increases keyword repetition.
Not a bug, it's a feature!
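
For what it's worth, the repetition effect is easy to see with a toy word counter. This is only an illustration: count_indexed_words is a made-up helper, and it does not model how ht://Dig actually weights or scores words.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Count whitespace-separated words across all description strings that
// are handed to the indexer. Purely illustrative; not ht://Dig code.
std::map<std::string, int> count_indexed_words(
        const std::vector<std::string>& descriptions) {
    std::map<std::string, int> counts;
    for (const std::string& text : descriptions) {
        std::istringstream words(text);
        std::string word;
        while (words >> word)
            ++counts[word];
    }
    return counts;
}
```

Passing both the htdig-description and the description text counts any word they share twice, while passing only one of them counts it once.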