Re: [htdig] Post-processing removal of dups?

Torsten Neuer
Fri, 18 Jun 1999 21:49:49 +0200

According to Gilles Detillieux:
>According to Aaron Turner:
>> On a similar note, I'm having a major dilemma. Basically I have a SQL DB
>> with content that is accessed via PHP. Each "article" in the DB has a URL
>> like:
>> /articles/article.php3?id=x&loc=a.b.c.d
>> where x, a, b, c, d are positive integers. Basically the id is a unique
>> identifier for the article, and loc is the location in the 'tree'. Each
>> article can be in 1 or more places in the tree. So:
>> /articles/article.php3?id=11&loc=
>> /articles/article.php3?id=11&loc=
>Here are a couple more ideas. If you can produce a list of locations that
>you want to be excluded from searches, you can add them to the list in the
>exclude_urls attribute, or put them as disallow records in robots.txt.
>Alternatively, you could change the article.php3 script to add a noindex
>tag to its output for any article that's not at its "primary" location,
>i.e. the one where you want it to be for search results.
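To make the first two suggestions concrete, here is roughly what they would look like (the paths are illustrative only, not taken from the message above). Note the different matching rules: a robots.txt Disallow record is a simple URL-prefix match, while htdig's exclude_urls patterns are matched as plain substrings anywhere in the URL:

```
# robots.txt -- prefix match, so it can only block whole
# path subtrees, not particular loc= values:
User-agent: htdig
Disallow: /articles/old-tree/

# htdig.conf -- space-separated substring patterns; a URL
# containing any of them is rejected by the digger:
exclude_urls: /cgi-bin/ /articles/old-tree/
```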

The latter is probably the best solution to the problem, since it also
keeps down the size of the database and thus reduces response times
for queries.
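A rough sketch of what that change to article.php3 could look like. The variable and helper names here are assumptions, not from the original message; ht://Dig honours the standard robots META tag, so emitting it is enough to keep the duplicate out of the index:

```php
<?php
// Sketch only.  In PHP3, GET parameters arrive as global
// variables, so /articles/article.php3?id=11&loc=1.2.3
// gives us $id and $loc directly.
//
// get_primary_loc() stands for whatever SQL lookup returns
// the article's canonical location (hypothetical helper).
if ($loc != get_primary_loc($id)) {
    // Not the primary location: tell robots (including
    // htdig) not to index this copy.
    echo "<meta name=\"robots\" content=\"noindex\">\n";
}
?>
```

The tag must of course end up inside the document's HEAD section for indexers to honour it.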

The first solution would be best if regular expressions could be
used in the exclude_urls directive of the configuration file.
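For illustration, this is the kind of pattern one would want to write but cannot, since exclude_urls takes literal substrings rather than regular expressions (the pattern below is not valid htdig syntax):

```
# NOT valid -- exclude_urls does plain substring matching:
exclude_urls: article\.php3\?id=[0-9]+&loc=[0-9.]+
```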

just my 2cc,

InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606


This archive was generated by hypermail 2.0b3 on Fri Jun 18 1999 - 12:13:37 PDT