Gilles Detillieux (email@example.com)
Tue, 23 Feb 1999 13:11:43 -0600 (CST)
According to Brett Baugh:
> Gilles Detillieux wrote:
> > Could you elaborate on this new site? Is it the same OS version as the
> > old site? Same library versions? I'm assuming the same processor and
> > OS on both, or else I wouldn't expect the binaries to work at all on the
> > new site, but what processor and OS are you using? Are you sure the sites
> > are supposed to be binary compatible? Can you try rebuilding the binaries
> > on the problem site?
> I use one single Linux box (dual P-II 233, 128 M ram, 2.0.35 kernel,
> apache 1.2.6 with php2 and php3 modules) to serve about 20 different
> sites (virtual sites). All the sites have everything in common;
> binaries, libraries, OS, the works.
I see. That rules out binary incompatibility, doesn't it! ;-)
> > Sorry to provide a whole lot more questions than answers, but without more
> > details about the environment, we're really working blindly, and can't be
> > of much help. Maybe these questions will lead you to the problem yourself.
> That it did. One of our brilliant production people decided that it
> would benefit this one particular client to have TWO title tags in
> each document - a normal one and then one that just repeated the
> contents of the meta description tag - so it would get more
> preferential treatment in search engines. GAAAAH. It's a good thing
> she doesn't work here anymore... heh.
Oooh! Search engine spamming! You were right earlier when you said
something evil and nasty was happening on that site!
> So I guess now the question
> is... can you tell htdig to only grab the first title tag it sees? I
> suppose taking out the second <title> is an option; I doubt anyone
> would notice at this point... but that's a lot of typing.
You could probably insert something like this at the start of the
switch statement's case 0 clause, which handles the title tag, at line 390
of htdig/HTML.cc (in version 3.1.1), just before in_title is set to 1:
	cout << "More than one <title> tag in document!"
	     << " (possible search engine spamming)" << endl;
And again, at the start of the case 1 clause, before resetting in_title
to 0, insert this:
This should make any additional titles be indexed just like regular text.
I haven't tried it, though, so test carefully. Let me know how it goes.
This may be worth including in the next release.
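To make the idea concrete outside of htdig, here is a minimal standalone sketch of "keep the first title, index any later ones as regular text". It is not the HTML.cc patch and does not use any ht://Dig classes: ParsedDoc, parse_first_title_only, have_title and extra_titles are hypothetical names made up for this example, and the toy scanner assumes bare <title> tags with no attributes. Only the in_title flag and the warning text come from the message above.

```cpp
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical result type for this sketch; not part of ht://Dig.
struct ParsedDoc {
    std::string title;     // text of the first <title> only
    std::string body;      // all other text, including later titles
    int extra_titles = 0;  // number of <title> tags after the first
};

// Toy tag scanner (an assumption, standing in for the tag switch in
// HTML.cc). Only <title> and </title> are interpreted; every other
// tag is skipped and its surrounding text goes to the body.
ParsedDoc parse_first_title_only(const std::string &html) {
    ParsedDoc doc;
    bool in_title = false;    // currently inside the first <title>
    bool have_title = false;  // a <title> has already been seen
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t end = html.find('>', i);
            if (end == std::string::npos)
                break;  // unterminated tag: stop scanning
            std::string tag = html.substr(i + 1, end - i - 1);
            for (char &c : tag)
                c = static_cast<char>(
                    std::tolower(static_cast<unsigned char>(c)));
            if (tag == "title") {
                if (have_title) {
                    // Later title: warn, and leave in_title off so
                    // its text is collected as regular body text.
                    ++doc.extra_titles;
                    std::cout << "More than one <title> tag in document!"
                              << " (possible search engine spamming)"
                              << std::endl;
                } else {
                    in_title = true;
                    have_title = true;
                }
            } else if (tag == "/title") {
                in_title = false;
            }
            i = end + 1;
        } else {
            (in_title ? doc.title : doc.body) += html[i];
            ++i;
        }
    }
    return doc;
}
```

Feeding it a document with two titles should leave the first one in title and push the text of the second into body, which is the behaviour the suggested change is aiming for. A real parser would also have to cope with attributes, comments and entities, which this sketch ignores.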
> I still can't believe how long I stared at those doc headers without
> seeing that. I guess my brain just filters out certain things without
> asking after, say, the fifth pot of coffee in a day. Thanks for
> putting up with me...
Glad I could steer you in the right direction.
--
Gilles R. Detillieux              E-mail: <firstname.lastname@example.org>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
email@example.com containing the single word "unsubscribe" in
the SUBJECT of the message.
This archive was generated by hypermail 2.0b3 on Fri Feb 26 1999 - 14:34:12 PST