Re: htdig: meta robots tag problem


Geoff Hutchison (Geoffrey.R.Hutchison@williams.edu)
Fri, 25 Sep 1998 15:48:51 -0400


>However, htsearch doesn't report the pages with this meta tag (I
>assume they were not indexed, right?). Is this an htdig bug, or a
>mistake on my part? What is the correct way of achieving the
>desired effect?

So to all those afflicted with this bug (probably many of us), I
apologize. I made a big thinko while juggling differing versions of my META
robots patches. Below is a patch to HTML.cc with a variety of fixes,
including the META robots behavior. I hope it fixes the problems it was
meant to fix. :-) With luck it will also fix the error seen in this file
under gcc-2.7.2. Since I no longer have that compiler around, please let me
know whether the error disappears.
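
To make the META robots part concrete, here is a minimal sketch of why the
constant compared against the indexOf result matters. It assumes that
indexOf returns -1 when the substring is absent, as the comparison in the
htdig3.dev side of the diff suggests. The names indexOf (a free function
here), RobotsFlags and parse_robots are hypothetical stand-ins built on
std::string, not the actual htdig String class or HTML.cc code. Note that
the diff below lists htdig3.dev as its first file, so the lines comparing
against -1 appear in the `***` half of the hunk.

```cpp
#include <string>

// Hypothetical stand-in for htdig's String::indexOf (assumption): offset of
// the first match, or -1 when the substring does not occur.
int indexOf(const std::string &haystack, const std::string &needle)
{
    std::string::size_type pos = haystack.find(needle);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

struct RobotsFlags
{
    int doindex;
    int dofollow;
};

// Mirrors the if / else-if chain in the diff. `not_found` is the value the
// indexOf result is compared against: -1 on the htdig3.dev side, 0 on the
// 3.1.0b1 side.
RobotsFlags parse_robots(const std::string &content, int not_found)
{
    RobotsFlags flags = {1, 1};

    if (indexOf(content, "noindex") != not_found)
        flags.doindex = 0;
    else if (indexOf(content, "nofollow") != not_found)
        flags.dofollow = 0;
    else if (indexOf(content, "none") != not_found)
    {
        flags.doindex = 0;
        flags.dofollow = 0;
    }
    return flags;
}
```

With the comparison against 0, parse_robots("index,follow", 0) clears
doindex, because the -1 returned for the missing "noindex" is not equal to
0. With the comparison against -1, the same content leaves both flags set.
That would be consistent with the symptom in the quoted report, where pages
carrying a robots tag did not show up in htsearch.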

I'll point out that these kinds of fixes go into the CVS tree all the time.
If you don't mind living on the edge, you might consider the public CVS
tree. For instructions, check out
<http://dev.htdig.org/cvsinstructions.html>

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

*** htdig3.dev/htdig/HTML.cc Wed Sep 23 14:55:44 1998
--- htdig-3.1.0b1/htdig/HTML.cc Tue Sep 8 15:12:30 1998
***************
*** 4,25 ****
  // Implementation of HTML
  //
  // $Log: HTML.cc,v $
- // Revision 1.13 1998/09/23 14:58:21 ghutchis
- //
- // Many, many bug fixes
- //
- // Revision 1.12 1998/09/18 18:45:55 ghutchis
- //
- // YABF (Yet another bug fix)
- //
- // Revision 1.11 1998/09/18 02:38:08 ghutchis
- //
- // Bug fixes for 3.1.0b2
- //
- // Revision 1.10 1998/09/10 04:16:25 ghutchis
- //
- // More bug fixes.
- //
  // Revision 1.9 1998/09/08 03:29:09 ghutchis
  //
  // Clean up for 3.1.0b1.
--- 4,9 ----
***************
*** 52,58 ****
  //
  //
  #if RELEASE
! static char RCSid[] = "$Id: HTML.cc,v 1.13 1998/09/23 14:58:21 ghutchis Exp $";
  #endif

  #include "htdig.h"
--- 36,42 ----
  //
  //
  #if RELEASE
! static char RCSid[] = "$Id: HTML.cc,v 1.9 1998/09/08 03:29:09 ghutchis Exp $";
  #endif

  #include "htdig.h"
*************** HTML::do_tag(Retriever &retriever, Strin
*** 622,628 ****
                  {
                      if (strlen(w) >= minimumWordLength)
                          retriever.got_word(w, 1, 10);
!                     w = strtok(0, " ,\t\r\n");
                  }
              }

--- 606,612 ----
                  {
                      if (strlen(w) >= minimumWordLength)
                          retriever.got_word(w, 1, 10);
!                     w = strtok(0, " \t\r\n");
                  }
              }

*************** HTML::do_tag(Retriever &retriever, Strin
*** 642,648 ****
                      {
                          if (strlen(w) >= minimumWordLength)
                              retriever.got_word(w, 1, 10);
!                         w = strtok(0, " ,\t\r\n");
                      }
                  }
                  else if (mystrcasecmp(cache, "htdig-email") == 0)
--- 626,632 ----
                      {
                          if (strlen(w) >= minimumWordLength)
                              retriever.got_word(w, 1, 10);
!                         w = strtok(0, " \t\r\n");
                      }
                  }
                  else if (mystrcasecmp(cache, "htdig-email") == 0)
*************** HTML::do_tag(Retriever &retriever, Strin
*** 668,681 ****
                    {
                      String content_cache = conf["content"];

!                     if (content_cache.indexOf("noindex") != -1)
                        {
                          doindex = 0;
                          retriever.got_noindex();
                        }
!                     else if (content_cache.indexOf("nofollow") != -1)
                        dofollow = 0;
!                     else if (content_cache.indexOf("none") != -1)
                        {
                          doindex = 0;
                          dofollow = 0;
--- 652,665 ----
                    {
                      String content_cache = conf["content"];

!                     if (content_cache.indexOf("noindex") != 0)
                        {
                          doindex = 0;
                          retriever.got_noindex();
                        }
!                     else if (content_cache.indexOf("nofollow") != 0)
                        dofollow = 0;
!                     else if (content_cache.indexOf("none") != 0)
                        {
                          doindex = 0;
                          dofollow = 0;
*************** HTML::do_tag(Retriever &retriever, Strin
*** 690,699 ****
                      //
                      meta_dsc = conf["content"];
                      if (meta_dsc.length() > max_meta_description_length)
!                       {
!                         String temp = meta_dsc.sub(0, max_meta_description_length);
!                         meta_dsc = temp;
!                       }
                      if (debug > 1)
                        cout << "META Description: " << conf["content"] << endl;
                      retriever.got_meta_dsc(meta_dsc);
--- 674,680 ----
                      //
                      meta_dsc = conf["content"];
                      if (meta_dsc.length() > max_meta_description_length)
!                       meta_dsc = meta_dsc.sub(0, max_meta_description_length);
                      if (debug > 1)
                        cout << "META Description: " << conf["content"] << endl;
                      retriever.got_meta_dsc(meta_dsc);
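
The two strtok hunks above differ only in whether a comma is part of the
delimiter set. As an illustration of what that changes for a comma-separated
keyword list, here is a minimal sketch. split_words is a hypothetical helper
written for this example, not a function from HTML.cc, and min_len plays the
role that minimumWordLength plays in the patch.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split `content` on any character in `delims` and
// keep only the tokens that are at least `min_len` characters long.
std::vector<std::string> split_words(const std::string &content,
                                     const char *delims,
                                     std::size_t min_len)
{
    // strtok modifies its input, so work on a NUL-terminated private copy.
    std::vector<char> buf(content.begin(), content.end());
    buf.push_back('\0');

    std::vector<std::string> words;
    for (char *w = std::strtok(&buf[0], delims); w != nullptr;
         w = std::strtok(nullptr, delims))
    {
        if (std::strlen(w) >= min_len)
            words.push_back(w);
    }
    return words;
}
```

With the delimiter set " ,\t\r\n", the content "search,engine, htdig" splits
into search, engine and htdig. With " \t\r\n" the first token keeps its
commas and comes out as "search,engine," instead.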
