Re: [htdig] Post-processing removal of dups?

Aaron Turner
Fri, 18 Jun 1999 12:56:06 -0700 (PDT)

Thanks for the ideas Gilles, but unfortunately they won't work since there
is no set 'primary location'. Maybe the best way to explain is with a
filesystem example.

Say I have:

/usr/local/bin/somefile which is a link to: /usr/bin/somefile

Joe does:

find /usr -name somefile

This will turn up two matches which are really duplicates since they're
the same file. This is what I want to avoid. I don't care which match it
returns, but it should only return one. If Joe does:

find /usr/local -name somefile

the problem goes away naturally because find restricts itself to
/usr/local and its sub-tree. This is what I can currently do with

Problem is that if find were told to ignore linked files (or articles that
aren't in their 'primary location') then:

find /usr/local -name somefile

wouldn't return any matches, which would be bad.
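
To make the filesystem analogy concrete, here is a minimal Python sketch of the behaviour I'm after. It is only an illustration: find_unique is a made-up helper, not something find or htdig provides. It treats two paths as duplicates when they resolve to the same device and inode, keeps whichever one it sees first, and still returns a match when the search is limited to a sub-tree.

```python
import os


def find_unique(root, name):
    """Return one path per underlying file called `name` under `root`.

    Hypothetical helper for illustration only. Paths that resolve to the
    same file (symlinks or hard links) are reported once; the first path
    encountered wins, since no location is treated as 'primary'.
    """
    seen = set()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name not in filenames:
            continue
        path = os.path.join(dirpath, name)
        try:
            st = os.stat(path)  # follows symlinks to the real file
        except OSError:
            continue  # e.g. a dangling symlink; skip it
        key = (st.st_dev, st.st_ino)  # identifies the underlying file
        if key not in seen:
            seen.add(key)
            matches.append(path)
    return matches
```

The point is that the duplicates are dropped after matching, so restricting the search root never hides the only copy that lives under that root.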


On Fri, 18 Jun 1999, Gilles Detillieux wrote:

>
> According to Aaron Turner:
> > On a similar note, I'm having a major dilemma. Basically I have a SQL DB
> > with content that is accessed via PHP. Each "article" in the DB has a URL
> > like:
> >
> > /articles/article.php3?id=x&loc=a.b.c.d
> >
> > where x, a, b, c, d are positive integers. Basically the id is a unique
> > identifier for the article, and loc is the location in the 'tree'. Each
> > article can be in 1 or more places in the tree. So:
> >
> > /articles/article.php3?id=11&loc=
> > /articles/article.php3?id=11&loc=
>
> Here are a couple more ideas. If you can produce a list of locations that
> you want to be excluded from searches, you can add them to the list in the
> exclude_urls attribute, or put them as disallow records in robots.txt.
>
> Alternatively, you could change the article.php3 script to add a noindex
> tag to its output for any article that's not at its "primary" location,
> i.e. the one where you want it to be for search results.
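
For the article URLs themselves, the post-processing step might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: dedupe_by_article_id is a hypothetical function, the loc values in the usage example are invented, and it works on a plain list of result URLs rather than on anything htdig actually exposes. It keeps the first URL seen for each id, and an optional loc prefix plays the role of restricting find to a sub-tree.

```python
from urllib.parse import parse_qs, urlsplit


def dedupe_by_article_id(urls, loc_prefix=None):
    """Keep the first URL seen for each article id (hypothetical helper).

    If loc_prefix is given (e.g. "5.6"), only URLs whose loc equals that
    prefix or lies beneath it in the tree are considered at all.
    """
    seen_ids = set()
    unique = []
    for url in urls:
        query = parse_qs(urlsplit(url).query, keep_blank_values=True)
        article_id = query.get("id", [None])[0]
        loc = query.get("loc", [""])[0]
        if loc_prefix is not None:
            # Match on whole components so "1.2" does not match "1.23".
            if loc != loc_prefix and not loc.startswith(loc_prefix + "."):
                continue
        if article_id is None:
            unique.append(url)  # not an article URL; pass it through
            continue
        if article_id not in seen_ids:
            seen_ids.add(article_id)
            unique.append(url)
    return unique


# Usage with made-up loc values:
results = [
    "/articles/article.php3?id=11&loc=1.2.3.4",
    "/articles/article.php3?id=11&loc=5.6.7.8",
    "/articles/article.php3?id=12&loc=1.2.3.4",
]
print(dedupe_by_article_id(results))
print(dedupe_by_article_id(results, loc_prefix="5.6"))
```

Because duplicates are removed only among the URLs that survive the loc filter, an article is still found when its only copy in the restricted sub-tree is not the one that would have been kept in a full search.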

