Re: [htdig] Problems with GET URLS

Subject: Re: [htdig] Problems with GET URLS
From: Gilles Detillieux
Date: Mon Apr 10 2000 - 15:59:37 PDT

According to Paul Wolstenholme:
> Hi Gilles,
> I should have posted a follow-up to my problem. I identified at least
> three problems. None of them, I think, is a problem with htdig; they're
> related to the co-processor between the chair and keyboard.

Hadn't heard that one before. :)

> 1. I was using persistent variables defined in a previous page (PHPLIB).
> If the page was accessed directly, the variable did not exist, and hence
> there was no content. This was easily fixed, and now it works great.
> 2. There was a meta tag formatting problem. I mistakenly put two
> authors in the same meta tag,
> e.g. <meta name="DC.Creator" content="Lorimer, Smith">
> For some reason, only the first author is found during a search. I still
> need to double-check this, but I think it is so. Is it possible that the
> comma is mucking things up?
> If the proper DC syntax is used:
> <meta name="DC.Creator" content="Lorimer">
> <meta name="DC.Creator" content="Smith">

I don't know what the proper Dublin Core syntax is, but htdig will treat
the two as equivalent. In earlier versions, it allowed commas and white
space as separators for keywords. In the current version, any space or
punctuation is allowed.
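To illustrate the separator rule above, here is a rough Python sketch (not htdig's own code, which is written in C++) that collects every DC.Creator value from a page and splits it on any run of whitespace or punctuation. Under that assumption, the single-tag "Lorimer, Smith" form and the two-tag form yield the same words. MetaCollector and meta_words are made-up names for this example.

```python
import re
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the content="..." values of every <meta> tag with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # attribute values can be None (e.g. <meta name>), so guard with "or"
        if (a.get("name") or "").lower() == self.name and a.get("content") is not None:
            self.values.append(a["content"])


def meta_words(html, name="DC.Creator"):
    """Return the words found in all matching meta tags, in document order."""
    parser = MetaCollector(name)
    parser.feed(html)
    parser.close()
    words = []
    for value in parser.values:
        # Assumption: approximate htdig's current rule by treating any run of
        # whitespace or punctuation as a separator.
        words.extend(w for w in re.split(r"[\W_]+", value) if w)
    return words
```

So meta_words('<meta name="DC.Creator" content="Lorimer, Smith">') and the two-tag version both give ["Lorimer", "Smith"], which is why the comma by itself shouldn't be the culprit.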

> The proper results seem to be returned, but I'm still testing.
> 3. You need to be diligent when using GET.
> While two URLs whose GET variables differ only in order yield the same page,
> HtDig treats them as separate results instead of one. To avoid this, keep
> the order of your GET variables consistent.

Yes, this is an important point. htdig tracks visited documents by URL, so
any distinct URL is considered a separate document, even if it returns the
same page as another URL.
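If you generate the links yourself, one way to keep the GET variable order consistent is to pass every URL through a small canonicalizing helper before writing it into the page. This Python sketch (canonical_url is a hypothetical name, and htdig does not do this for you) sorts the query parameters so that page.php?b=2&a=1 and page.php?a=1&b=2 come out identical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return url with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    # keep_blank_values=True so "?a=&b=1" keeps the empty "a" parameter
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), parts.fragment))
```

Note that re-encoding may also normalize escapes (a space written as %20 comes back as +), which is harmless as long as every link goes through the same helper.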

> As a side note, has anyone hacked Htdig to check whether a DOI or
> PII number is assigned to the document? Wouldn't this be a good way
> to tell whether the files are the same?

I haven't seen DOI or PII numbers mentioned here before, so I'd assume no one
has proposed it yet. I'm not familiar with these, but if they're a
common standard, then it might make good sense to use them. I don't expect
they'd do the job for everyone who has requested duplicate suppression, though.
There's been talk of using MD5 checksums for this purpose. It's on the
to-do list, but I don't know of anyone actively working on it.
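For what it's worth, the MD5 idea would look roughly like the following Python sketch: hash each fetched document body and keep only the first URL seen for each digest. This is an assumption about how it might work, not an existing htdig feature, and dedupe_by_md5 is a hypothetical name.

```python
import hashlib


def dedupe_by_md5(pages):
    """pages: iterable of (url, body_bytes) pairs.

    Returns a dict mapping each body's MD5 hex digest to the first URL
    that produced it; later URLs with byte-identical bodies are dropped.
    """
    seen = {}
    for url, body in pages:
        digest = hashlib.md5(body).hexdigest()
        seen.setdefault(digest, url)  # keep only the first URL per digest
    return seen
```

It only catches byte-identical pages; a page that embeds a timestamp or session ID would hash differently on every fetch.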

> Gilles Detillieux wrote:
> >
> > Hi, Paul. I didn't see any response to this in the archives.
> >
> > I can't imagine anything within htdig which would explain the behaviour
> > you're describing. Rather, it seems to me that perhaps your PHP scripts
> > aren't putting out correct DC meta tags for all pages. They should
> > look like
> >
> > <meta name="DC.Creator" content"Lorimer">

Woops! That should have been content="Lorimer". I forgot the "=" sign.

> >
> > Also look around these tags for improperly formed tags which might cause
> > htdig to swallow the meta tag as part of another tag, or tags that would
> > turn off indexing.
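Since the missing "=" is an easy typo to make (and to copy), a quick heuristic check over your generated pages may help. This Python sketch flags any <meta> tag where name or content is followed directly by a quote with no "=" in between. It is a rough regex test, not an HTML validator, and it can give a false positive if the word "name" or "content" happens to end a quoted value; find_meta_typos is a hypothetical name.

```python
import re

# An attribute name followed by a quote with no "=" in between,
# e.g. content"Lorimer" (the typo discussed above).
MISSING_EQUALS = re.compile(r'\b(name|content)\s*"', re.IGNORECASE)
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)


def find_meta_typos(html):
    """Return the text of every <meta> tag that looks like it is missing an '='."""
    return [m.group(0) for m in META_TAG.finditer(html)
            if MISSING_EQUALS.search(m.group(0))]
```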

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW:
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


This archive was generated by hypermail 2b28 : Mon Apr 10 2000 - 13:44:43 PDT