Re: [htdig] Meta description tags


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 23 Feb 1999 13:11:43 -0600 (CST)


According to Brett Baugh:
> Gilles Detillieux wrote:
> >
> > Could you elaborate on this new site? Is it the same OS version as the
> > old site? Same library versions? I'm assuming the same processor and
> > OS on both, or else I wouldn't expect the binaries to work at all on the
> > new site, but what processor and OS are you using? Are you sure the sites
> > are supposed to be binary compatible? Can you try rebuilding the binaries
> > on the problem site?
>
> I use one single Linux box (dual P-II 233, 128 M ram, 2.0.35 kernel,
> apache 1.2.6 with php2 and php3 modules) to serve about 20 different
> sites (virtual sites). All the sites have everything in common;
> binaries, libraries, OS, the works.

I see. That rules out binary incompatibility, doesn't it! ;-)

> > Sorry to provide a whole lot more questions than answers, but without more
> > details about the environment, we're really working blindly, and can't be
> > of much help. Maybe these questions will lead you to the problem yourself.
>
> That it did. One of our brilliant production people decided that it
> would benefit this one particular client to have TWO title tags in
> each document - a normal one and then one that just repeated the
> contents of the meta description tag - so it would get more
> preferential treatment in search engines. GAAAAH. It's a good thing
> she doesn't work here anymore... heh.

Oooh! Search engine spamming! You were right earlier when you said
something evil and nasty was happening on that site!

> So I guess now the question
> is... can you tell htdig to only grab the first title tag it sees? I
> suppose taking out the second <title> is an option; I doubt anyone
> would notice at this point... but that's a lot of typing.

You could probably insert something like this at the start of the case 0
clause of the switch statement that handles the title tag, at line 390
of htdig/HTML.cc (in version 3.1.1), just before in_title is set to 1:

            // A title was already captured: ignore this extra <title> tag.
            if (title.length())
            {
                if (debug)
                    cout << "More than one <title> tag in document!"
                         << " (possible search engine spamming)" << endl;
                break;
            }

And again, at the start of the case 1 clause, before resetting in_title
to 0, insert this:

            // No accepted <title> is open, so ignore this closing tag.
            if (!in_title)
                break;

This should cause any additional titles to be indexed just like regular text.
I haven't tried it, though, so test carefully. Let me know how it goes.
This may be worth including in the next release.
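
To make the intent of the two guards easier to test outside of htdig, here
is a minimal standalone sketch of the same idea. It is not the HTML.cc
code: the Token and Parsed types and the parseTokens function are
hypothetical names made up for this illustration, and it assumes the
document has already been split into tokens, which the real parser does
not do. Only the in_title flag and the "title already set" check are taken
from the patch above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; ht://Dig's real parser works
// on raw HTML text, not on a pre-tokenized stream.
enum class Kind { TitleOpen, TitleClose, Text };

struct Token {
    Kind kind;
    std::string text;   // only meaningful for Kind::Text
};

struct Parsed {
    std::string title;  // contents of the first non-empty <title> only
    std::string body;   // all other text, including text of later titles
};

Parsed parseTokens(const std::vector<Token>& tokens)
{
    Parsed out;
    bool in_title = false;

    for (const Token& tok : tokens) {
        switch (tok.kind) {
        case Kind::TitleOpen:
            // First guard: once a title has been captured, ignore any
            // further <title> tag so in_title stays false.
            if (!out.title.empty())
                break;
            in_title = true;
            break;
        case Kind::TitleClose:
            // Second guard: a </title> that does not close an accepted
            // <title> is ignored.
            if (!in_title)
                break;
            in_title = false;
            break;
        case Kind::Text:
            if (in_title) {
                out.title += tok.text;
            } else {
                // Text outside an accepted title is treated as body text.
                if (!out.body.empty())
                    out.body += ' ';
                out.body += tok.text;
            }
            break;
        }
    }
    return out;
}
```

Note that, like the patch, this only rejects a later title when the first
one was non-empty. A document starting with an empty <title></title> would
still have its second title accepted.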

> I still can't believe how long I stared at those doc headers without
> seeing that. I guess my brain just filters out certain things without
> asking after, say, the fifth pot of coffee in a day. Thanks for
> putting up with me...

Glad I could steer you in the right direction.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930