Subject: Re: [htdig3-dev] Redirects still counted as documents?
From: Gilles Detillieux (firstname.lastname@example.org)
Date: Thu Oct 26 2000 - 10:41:54 PDT
According to Patrick:
> Could someone give me a clue as to how to make sure that the
> document counter during a dig is not incremented when encountering
> a server redirect (HTTP code 301/302)? I'm in Retriever.cc
> and it looks like the total index size is incremented in
> the routine Retriever::Start(), but redirect handling is in the
> routine Retriever::parse_url, and I'm not certain as to how
> to make the total index size NOT increase by one in the first
> routine (by seeing that it's a redirect) or how to DECREMENT the
> total index size by one in the second routine.
I don't understand why you feel this is a problem. The way the
databases are structured, in both 3.1.x and 3.2.x (despite their profound
differences), you must have a unique document ID for each URL. Therefore,
because the redirect gives htdig a new URL, and the actual document
must be indexed using this URL, htdig must assign it a new document ID
and give it a separate record in the database. The old record for the
pre-redirect URL will get tossed out of the database by htmerge/htpurge,
and the total index size should be corrected at that point.
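To make that bookkeeping concrete, here is a toy model of it. This is not htdig code: ToyIndex, ToyRecord and the method names are made up for illustration, and purge() is only a stand-in for the cleanup that htmerge/htpurge does. It just shows why the count goes up by one at the redirect and comes back down after the purge.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-URL record; not an htdig class.
struct ToyRecord {
    int id;           // unique document ID for this URL
    bool redirected;  // true if the server answered 301/302 for this URL
};

// Hypothetical index keyed by URL, one record (and one ID) per URL.
class ToyIndex {
public:
    // Return the ID for a URL, assigning a fresh one the first time it is seen.
    int add(const std::string& url) {
        auto it = records_.find(url);
        if (it != records_.end()) return it->second.id;
        int id = next_id_++;
        records_[url] = ToyRecord{id, false};
        return id;
    }

    // A redirect marks the old record and gives the target its own record.
    int redirect(const std::string& old_url, const std::string& new_url) {
        assert(old_url != new_url);  // self-redirects are not modelled here
        add(old_url);
        records_[old_url].redirected = true;
        return add(new_url);
    }

    // Stand-in for the cleanup pass: drop records that only led to a redirect.
    void purge() {
        for (auto it = records_.begin(); it != records_.end();) {
            if (it->second.redirected) it = records_.erase(it);
            else ++it;
        }
    }

    std::size_t size() const { return records_.size(); }

private:
    std::map<std::string, ToyRecord> records_;
    int next_id_ = 0;
};
```

In this model, adding a URL and then redirecting it to a second URL leaves size() at 2 until purge() runs, after which it is 1.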
The only potential problem I can see in this method of doing things is
that some of the link description text can be lost. At the time the
redirect is encountered, htdig transfers all stored descriptions from the
record of the old URL to that of the new URL, but if it later encounters
links to the old URL, I think it will continue to append the new link
descriptions to the old record instead of the new one. If this is the problem
you're trying to solve, then the solution might be to put a record of
the redirect in the old URL's DocumentRef record. I think trying to use
the same record for both the old and the new URL is asking for trouble.
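A rough sketch of what recording the redirect in the old record could look like, under stated assumptions: ToyRef is a hypothetical stand-in and does not reflect the real DocumentRef layout, and ToyDocDB, note_redirect, add_description and resolve are invented names. The point is only that each URL keeps its own record, while link descriptions that arrive later for the old URL get forwarded to the new one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-URL record; not htdig's DocumentRef.
struct ToyRef {
    std::vector<std::string> descriptions;  // link text pointing at this URL
    std::string redirect_to;                // empty unless the URL redirected
};

class ToyDocDB {
public:
    // Record a redirect: move the descriptions stored so far to the new
    // record and remember the target in the old record.
    void note_redirect(const std::string& old_url, const std::string& new_url) {
        assert(old_url != new_url);  // self-redirects are not modelled here
        ToyRef& to = refs_[new_url];
        ToyRef& from = refs_[old_url];
        to.descriptions.insert(to.descriptions.end(),
                               from.descriptions.begin(),
                               from.descriptions.end());
        from.descriptions.clear();
        from.redirect_to = new_url;
    }

    // Follow recorded redirects, with a hop limit as a guard against loops.
    std::string resolve(std::string url) const {
        for (int hops = 0; hops < kMaxHops; ++hops) {
            auto it = refs_.find(url);
            if (it == refs_.end() || it->second.redirect_to.empty()) break;
            url = it->second.redirect_to;
        }
        return url;
    }

    // Append a link description to whichever record the URL now resolves to.
    void add_description(const std::string& url, const std::string& text) {
        refs_[resolve(url)].descriptions.push_back(text);
    }

    const std::vector<std::string>& descriptions(const std::string& url) {
        return refs_[url].descriptions;
    }

private:
    static constexpr int kMaxHops = 10;  // arbitrary limit for this sketch
    std::map<std::string, ToyRef> refs_;
};
```

With this, a description added for the old URL after the redirect has been noted ends up in the new URL's record, and the old record stays empty apart from its redirect target.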
If this isn't the problem you're trying to solve, could you be more
explicit as to what your goal is?
-- 
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930