Re: [htdig] Problems with GET URLS


Subject: Re: [htdig] Problems with GET URLS
From: Paul Wolstenholme (wolstena@sfu.ca)
Date: Mon Apr 10 2000 - 15:37:15 PDT


Hi Gilles,

I should have posted a follow-up to my problem. I identified at least
three problems. None of them, I think, is a problem with htdig; they are
all related to the co-processor between the chair and keyboard.

1. I was using persistent variables defined in a previous page (PHPLIB).
If the page was accessed directly, the variables did not exist and hence
the page had no content. This was easily fixed and now it works great.

2. There was a meta tag formatting problem. I was mistakenly putting two
authors in the same meta tag,
e.g. <meta name="DC.Creator" content="Lorimer, Smith">

For some reason only the first author is found during a search. I still
need to double-check this, but I think it is so. Is it possible that the
comma is mucking things up?

If the proper DC syntax is used:
<meta name="DC.Creator" content="Lorimer">
<meta name="DC.Creator" content="Smith">

the proper results seem to be returned, but I am still testing.
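
To illustrate the one-tag-per-author form, here is a minimal Python sketch
(the pages in this thread are PHP; Python is used only for illustration).
The function name creator_meta_tags is hypothetical, and it assumes the
authors are already available as a list of separate names rather than one
comma-joined string.

```python
from html import escape


def creator_meta_tags(creators):
    """Return one DC.Creator meta tag per author, one tag per line.

    `creators` is assumed to be a list of individual author names
    (hypothetical input; the original pages build this from a database).
    """
    tags = []
    for name in creators:
        name = name.strip()
        if not name:
            # Skip empty entries rather than emit an empty tag.
            continue
        # Escape quotes and angle brackets so the attribute stays well formed.
        tags.append('<meta name="DC.Creator" content="%s">' % escape(name, quote=True))
    return "\n".join(tags)


print(creator_meta_tags(["Lorimer", "Smith"]))
```

Emitting the tags from a list avoids any guesswork about whether a comma
separates two authors or a surname from a given name.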

3. You need to be diligent when using GET.
http://my.org/index.php3?a=1&b=2
http://my.org/index.php3?b=2&a=1

While the two yield the same page, HtDig treats them as separate
results instead of one. If you want to avoid this, the order of your GET
variables needs to be consistent.
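
One way to keep the order consistent is to generate every link through a
helper that sorts the query parameters. The sketch below is in Python for
illustration only (the site in question is PHP); canonical_url is a
hypothetical name, and it assumes the receiving script does not care about
the order of its GET variables.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Return `url` with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters like "?a=" are not dropped.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


first = canonical_url("http://my.org/index.php3?a=1&b=2")
second = canonical_url("http://my.org/index.php3?b=2&a=1")
print(first == second)
```

If every link on the site is written this way, the indexer only ever sees
one spelling of each page's URL.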

As a side note, has anyone hacked Htdig to check whether a DOI or
PII number is assigned to the document? Wouldn't this be a good way
to identify whether the files are the same or not?

Regards,
Paul

Gilles Detillieux wrote:
>
> Hi, Paul. I didn't see any response to this in the archives.
>
> I can't imagine anything within htdig which would explain the behaviour
> you're describing. Rather, it seems to me that perhaps your PHP scripts
> aren't putting out correct DC meta tags for all pages. They should
> look like
>
> <meta name="DC.Creator" content="Lorimer">
>
> Also look around these tags for improperly formed tags which might cause
> htdig to swallow the meta tag as part of another tag, or tags that would
> turn off indexing.
>
> According to Paul Wolstenholme:
> > I'm not sure whether this is a HtDig bug or a glitch in my PHP scripts.
> > I'm trying to use Htdig 3.1.5 to index some PHP pages that get their data
> > from a MySQL database. The data displayed by the page is determined by a
> > couple of GET variables that the script uses as input, e.g.
> > title.php3?page=1&journal_id=4.
> >
> >
> > I have set up a database that is supposed to search by one meta
> > tag element, DC.Creator, via:
> >
> > keywords_meta_tag_names: DC.Creator
> >
> > keyword_factor: 100
> > text_factor: 0
> > title_factor: 0
> > heading_factor_1: 0
> > heading_factor_2: 0
> > heading_factor_3: 0
> > heading_factor_4: 0
> > heading_factor_5: 0
> > heading_factor_6: 0
> >
> > The problem is that when I do a search using a known DC.Creator value,
> > htsearch will only return a result if I enter the value of the first
> > DC.Creator for a particular journal_id. For example, I know that Lorimer
> > is a DC.Creator value for the following page:
> >
> > title.php3?page=1&journal_id=22
> >
> > When I enter this value into my search form, htsearch returns incorrect
> > results. It returns all the page values for a particular journal_id
> > value. htdig appears to think that the following are equal even though the
> > content is different.
> >
> > title.php3?page=1&journal_id=3 is the same as
> > title.php3?page=2&journal_id=3 and
> > title.php3?page=3&journal_id=3 ...
> >
> >
> > If I enter the DC.Creator value for any other page value but page=1 I get
> > no results. Htdig seems to work fine on pages that do not pass any
> > variables via GET.
> >
> > /Paul
> >
> >
> > --
> > ________________________________________________________________
> > Paul Wolstenholme
> > Simon Fraser University
> > Vancouver, BC Canada
> >
> >
> > ------------------------------------
> > To unsubscribe from the htdig mailing list, send a message to
> > htdig-unsubscribe@htdig.org
> > You will receive a message to confirm this.
> >
>
> --
> Gilles R. Detillieux E-mail: <grdetil@scrc.umanitoba.ca>
> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930




This archive was generated by hypermail 2b28 : Mon Apr 10 2000 - 13:22:25 PDT