Re: [htdig] Getting URL names to show up in index.


Subject: Re: [htdig] Getting URL names to show up in index.
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Mon Jun 05 2000 - 11:29:58 PDT


According to Geoff Hutchison:
> On Mon, 5 Jun 2000 naughton@domino.danielwoodhead.com wrote:
> > I hope this is a simple one. I am trying to have URL names show up in
> > search results. I have thousands of files that are in the following
> > format:
> >
> > 123456_latest.pdf
> >
> > I would like to get hits on 123456. The following is the way I have the
> > htdig.conf setup
>
> I think what you meant to say is that you want to *search* on parts of a
> filename. (You can already get URL names to show up in search
> results--this is part of the $(URL) variable).
>
> This has been requested a few times, but no one has offered anything in
> terms of implementation. It probably needs something in Retriever.cc after
> it gets through parsing a file to "parse" the URL.
>
> Personally, I'd put the string in your files somewhere (doesn't PDF have a
> "comments" or "keywords" portion?). This will also make it easier for other
> search engines or browsers to get the information.

Since PDFs must be converted or parsed with an external converter or
parser, it's easy to modify that external program so it also outputs
the file name along with the body text and/or title. htdig passes the
full URL to the external converter or parser as its third argument.
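As a rough illustration of that idea, here is a minimal Python sketch of a wrapper converter. It assumes htdig calls the converter with the input file, content type, URL and config file as its arguments (URL third, as described above), that `pdftotext` from xpdf is installed, and that the converter is set up to produce text/plain. The script itself and the `filename_words()` and `convert()` helpers are hypothetical, not part of htdig.

```python
#!/usr/bin/env python3
# Hypothetical wrapper converter: PDF -> plain text, plus the file name's words.
import os
import re
import subprocess
import sys
from urllib.parse import unquote, urlparse


def filename_words(url):
    """Split the file name in a URL (minus its extension) into alphanumeric
    words, e.g. "http://host/docs/123456_latest.pdf" -> ["123456", "latest"]."""
    path = unquote(urlparse(url).path)
    stem, _ext = os.path.splitext(os.path.basename(path))
    return re.findall(r"[A-Za-z0-9]+", stem)


def convert(infile, url):
    """Write the PDF's text to stdout, followed by the file name words."""
    # pdftotext writes its output to stdout when the output file name is "-"
    result = subprocess.run(["pdftotext", infile, "-"],
                            stdout=subprocess.PIPE, universal_newlines=True)
    sys.stdout.write(result.stdout)
    # One extra line so the parts of the file name get indexed like body text
    sys.stdout.write("\n" + " ".join(filename_words(url)) + "\n")
    return result.returncode


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Assumed argument order from htdig: infile, content-type, URL, config file
    sys.exit(convert(sys.argv[1], sys.argv[3]))
```

Splitting on non-alphanumerics means "123456" ends up as a word of its own rather than glued to "latest". Hooking it in would presumably take an htdig.conf line along the lines of the following (path and script name are made up; check the external_parsers documentation for your htdig version):

    external_parsers: application/pdf->text/plain /usr/local/bin/pdfname2text.py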

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Mon Jun 05 2000 - 09:19:43 PDT