BOUNCE htdig: Admin request


owner-htdig@sdsu.edu
Mon, 23 Nov 1998 08:47:47 -0800 (PST)


From andrew@contigo.com Mon Nov 23 08:47:45 1998
Received: from maila.central.susx.ac.uk (maila.central.susx.ac.uk [139.184.14.12])
        by sdsu.edu (8.8.7/8.8.7) with SMTP id IAA03985
        for <htdig@sdsu.edu>; Mon, 23 Nov 1998 08:47:41 -0800 (PST)
Received: from libhub302.lib.susx.ac.uk [139.184.66.12]
        by maila.central.susx.ac.uk with smtp (Exim 1.82 #16)
        id 0zhz8h-0003Zy-00; Mon, 23 Nov 1998 16:47:03 +0000
Date: Mon, 23 Nov 1998 16:47:08 +0000 (GMT)
To: htdig@sdsu.edu
Cc: safe3@susx.ac.uk
Subject: duplicate indexing
X-Mailer: Siren Mail (Windows Version 4.0.2 (Windows 95/NT))
X-Sender: safe3@mailhost.central.sussex.ac.uk
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET="US-ASCII"
Message-Id: <E0zhz8h-0003Zy-00@maila.central.susx.ac.uk>
From: D.P.Birchall@sussex.ac.uk (Danny Birchall)

Hi

I've just joined this list, and I'm asking a question straight away... I
hope somebody can help me, or at least point me in the right direction.

We're running htdig on a server which has a number of symbolic links and
server aliases, requested by the owners of individual directories to
make more memorable URLs: e.g. www.sussex.ac.uk/Units/thisunit/ is
aliased (or linked) to www.sussex.ac.uk/thisunit/. Because pages on the
site inevitably link to both forms, a dig finds and indexes both paths.
A search is then liable to turn up two copies of some documents in the
results, listed one after the other (naturally, because they are the
same document and thus have the same content).
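
To make the problem concrete, here is a minimal Python sketch of how the
two paths could be collapsed to one before indexing. It is not part of
htdig; the alias table, the function names, and the choice of
/Units/thisunit/ as the canonical form are all assumptions made for
illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: short (aliased) prefix -> canonical prefix.
# Treating the longer /Units/ path as canonical is an assumption.
PATH_ALIASES = {
    "/thisunit/": "/Units/thisunit/",
}


def canonical_url(url, aliases=PATH_ALIASES):
    """Rewrite an aliased path prefix to its canonical form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for alias, target in aliases.items():
        if path.startswith(alias):
            path = target + path[len(alias):]
            break
    # Host names are case-insensitive, so lower-case the host; drop the fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def unique_urls(urls, aliases=PATH_ALIASES):
    """Return canonical URLs in first-seen order, without repeats."""
    seen = set()
    result = []
    for url in urls:
        key = canonical_url(url, aliases)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

With this table, http://www.sussex.ac.uk/thisunit/index.html and
http://www.sussex.ac.uk/Units/thisunit/index.html map to the same
canonical URL, so only one of them would be kept.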

Is there any way of telling htdig to ignore these duplicates? Duplicate
server names can be controlled with the server_aliases directive, but is
there an equivalent for paths? I see that 'Eliminate or detect duplicate
documents' is in the todo list: is this the same as what I am asking, or
is there already a solution to this problem?
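
For comparison, one generic way to detect duplicate documents (not
necessarily what the htdig todo item has in mind) is to compare a
checksum of each page body and keep only the first URL seen for each
checksum. The sketch below assumes the pages are already fetched as
(url, body) pairs; the function name is hypothetical.

```python
import hashlib


def drop_duplicate_documents(pages):
    """Keep the first (url, body) pair for each distinct body.

    pages is an iterable of (url, body) tuples, where body is bytes.
    Only byte-for-byte identical bodies are treated as duplicates.
    """
    seen_digests = set()
    kept = []
    for url, body in pages:
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen_digests:
            continue  # same content already indexed under another URL
        seen_digests.add(digest)
        kept.append((url, body))
    return kept
```

This catches the symbolic-link case because both paths serve identical
bytes, but it would miss pages that differ only trivially, for example
by an embedded timestamp.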

TIA for *any* help on this. I promise to be a good citizen and stick
around on the list to answer others' questions when I've learned some
myself...

Danny

--------------------------------------------------
Danny Birchall
Editor
University of Sussex Information Service
http://www.sussex.ac.uk/

D.P.Birchall@sussex.ac.uk
Tel: (0)1273 678745
Fax: (0)1273 678441
---------------------------------------------------



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:28:50 PST