Re: [htdig] Sort by Date from Meta Tags [patch]


Subject: Re: [htdig] Sort by Date from Meta Tags [patch]
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Apr 11 2000 - 11:00:47 PDT


According to Geoff Hutchison:
> On Tue, 11 Apr 2000, Gilles Detillieux wrote:
> > Adding these capabilities to the 3.1.5 release would take a fair bit more
> > effort, so if you want to try the bleeding edge, and don't mind hacking the
> > code a bit, just wait a little longer and 3.2.0b2 should be released.
>
> Funny this should come up. I actually wrote it for the 3.1.x code first
> because someone asked me to. :-) At the time, I didn't feel it was worth
> putting into the tree for future 3.1.x releases (in part because at the
> time, I didn't think there would be any). I put the code into the 3.2 tree
> because it seemed like a useful feature.
>
> I no longer have the patch to 3.1.x, but I could backport it from the 3.2
> tree pretty easily. IIRC, it was about a 30-line patch. I'll post this to
> the list sometime tomorrow, probably.

Good thing I'm such a pack rat. I found your patch, and some of our
discussions with Mike Grommet. Here's an updated version of your patch,
with Mike's correction added in, as well as a test for the use_doc_date
attribute. I imagine one might need to tweak the default tm_hour setting
in got_time() to keep the date shown in the search results from falling
on the previous day, due to timezone offsets: htsearch displays times in
the server's local timezone, while htdig parses all dates in UTC.
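
To make the timezone concern concrete, here is a minimal sketch, separate from
the patch. The helper names (days_from_civil, utc_timestamp, displayed_mday)
are hypothetical, and a fixed hour offset stands in for the server's real
timezone rules. It only illustrates why a tm_hour of 0 can display as the
previous day west of UTC, and why something like noon is more forgiving.

```cpp
#include <cassert>
#include <ctime>

// Days from 1970-01-01 to the given Gregorian date (month is 1..12).
static long days_from_civil(int y, int m, int d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                       // treat Jan/Feb as end of prior year
    const long era = (y >= 0 ? y : y - 399) / 400;
    const long yoe = y - era * 400;      // year of era, 0..399
    const long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Timestamp for the given UTC date at the given hour. The hour argument
// plays the role of the tm_hour default in got_time().
static time_t utc_timestamp(int y, int m, int d, int hour)
{
    return (time_t)(days_from_civil(y, m, d) * 86400L + hour * 3600L);
}

// Day of the month a viewer sees when local time is UTC + offset_hours.
// ASSUMPTION: a fixed offset, with no daylight-saving rules.
static int displayed_mday(time_t t, int offset_hours)
{
    time_t shifted = t + offset_hours * 3600L;
    struct tm *g = gmtime(&shifted);
    return g->tm_mday;
}
```

With these helpers, displayed_mday(utc_timestamp(2000, 4, 11, 0), -5) gives 10,
while displayed_mday(utc_timestamp(2000, 4, 11, 12), -5) gives 11. Noon is not
a cure-all: offsets of twelve hours or more can still cross a day boundary.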

This should free you to continue on 3.2.0b2. :-)

On Tue, 6 Apr 1999, Geoff Hutchison wrote:
> On Tue, 6 Apr 1999, mike grommet wrote:
> > My thoughts are to take a meta tag, named something like "Document-date" and
> > a value
> > just like the standard GMT time returned by a web server for a Last
> > Modification
>
> There is already a standard for this, specified by the Dublin Core
> standard. The tag is named "DATE" and has the ISO-8601 format YYYY-MM-DD.
>
> > Would you happen to have this code handy? It would be useful to me at least
>
> Here you go... I should probably make this an option with something like
> 'use_doc_date' when I commit it.
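
For readers unfamiliar with the format, a page would carry something like
<meta name="date" content="2000-04-11">. The sketch below shows one way such
a value could be checked and split into fields. It is not the patch's code,
which relies on ht://Dig's own mystrptime(); parse_dc_date is a hypothetical
helper using only standard sscanf, and it is stricter than the patch in that
it rejects trailing characters.

```cpp
#include <cstdio>
#include <ctime>

// Hypothetical helper (not part of ht://Dig): parse "YYYY-MM-DD" into a
// struct tm set to midnight. Returns false for anything malformed.
static bool parse_dc_date(const char *text, struct tm *out)
{
    int year, month, day;
    char extra;

    // The trailing %c catches junk after the day: a clean date produces
    // exactly three conversions.
    if (sscanf(text, "%4d-%2d-%2d%c", &year, &month, &day, &extra) != 3)
        return false;
    if (year < 1 || month < 1 || month > 12 || day < 1 || day > 31)
        return false;

    struct tm tm = {};            // zero every field, including tm_isdst
    tm.tm_year = year - 1900;     // struct tm counts years from 1900
    tm.tm_mon = month - 1;        // and months from 0
    tm.tm_mday = day;
    *out = tm;
    return true;
}
```

A caller would treat a false return the same way the patch treats a failed
conversion: ignore the tag and keep the date reported by the server.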

*** htdig/HTML.cc.orig Thu Feb 24 20:29:10 2000
--- htdig/HTML.cc Tue Apr 11 12:58:15 2000
*************** HTML::do_tag(Retriever &retriever, Strin
*** 893,898 ****
--- 893,903 ----
                  {
                      retriever.got_meta_email(transSGML(conf["content"]));
                  }
+                 else if (mystrcasecmp(cache, "date") == 0 &&
+                          config.Boolean("use_doc_date",0))
+                 {
+                     retriever.got_time(transSGML(conf["content"]));
+                 }
                  else if (mystrcasecmp(cache, "htdig-notification-date") == 0)
                  {
                      retriever.got_meta_notification(transSGML(conf["content"]));
*** htdig/Retriever.cc.orig Thu Feb 24 20:29:10 2000
--- htdig/Retriever.cc Tue Apr 11 12:59:53 2000
*************** Retriever::RetrievedDocument(Document &d
*** 561,566 ****
--- 561,567 ----
      current_ref = ref;
      current_anchor_number = 0;
      current_title = 0;
+     current_time = 0;
      current_head = 0;
      current_meta_dsc = 0;
  
*************** Retriever::RetrievedDocument(Document &d
*** 583,589 ****
      //
      ref->DocHead(current_head);
      ref->DocMetaDsc(current_meta_dsc);
!     ref->DocTime(doc.ModTime());
      ref->DocTitle(current_title);
      ref->DocSize(doc.Length());
      ref->DocAccessed(time(0));
--- 584,593 ----
      //
      ref->DocHead(current_head);
      ref->DocMetaDsc(current_meta_dsc);
!     if (current_time == 0)
!         ref->DocTime(doc.ModTime());
!     else
!         ref->DocTime(current_time);
      ref->DocTitle(current_title);
      ref->DocSize(doc.Length());
      ref->DocAccessed(time(0));
*************** Retriever::got_title(char *title)
*** 1098,1103 ****
--- 1102,1142 ----
      current_title = title;
  }
  
+ //*****************************************************************************
+ // void Retriever::got_time(char *time)
+ //
+ void
+ Retriever::got_time(char *time)
+ {
+     time_t new_time;
+     struct tm tm;
+
+     tm.tm_hour = 0;
+     tm.tm_min = 0;
+     tm.tm_sec = 0;
+     tm.tm_mon = 0;
+     tm.tm_mday = 1;
+     tm.tm_year = 0;
+
+     if (debug > 1)
+         cout << "\ntime: " << time << endl;
+     //
+     // As defined by the Dublin Core, this should be YYYY-MM-DD
+     // In the future, we'll need to deal with the scheme portion
+     // in case someone picks a different format.
+     //
+     if (mystrptime(time, "%Y-%m-%d", &tm))
+     {
+ #if HAVE_TIMEGM
+         new_time = timegm(&tm);
+ #else
+         new_time = mytimegm(&tm);
+ #endif
+         current_time = new_time;
+     }
+     // If we can't convert it, current_time stays the same and we get
+     // the default--the date returned by the server...
+ }
  
  //*****************************************************************************
  // void Retriever::got_anchor(char *anchor)
*** htdig/Retriever.h.orig Thu Feb 24 20:29:10 2000
--- htdig/Retriever.h Tue Apr 11 12:34:26 2000
*************** public:
*** 51,56 ****
--- 51,57 ----
      void got_word(char *word, int location, int heading);
      void got_href(URL &url, char *description);
      void got_title(char *title);
+     void got_time(char *time);
      void got_head(char *head);
      void got_meta_dsc(char *md);
      void got_anchor(char *anchor);
*************** private:
*** 83,88 ****
--- 84,90 ----
      String current_title;
      String current_head;
      String current_meta_dsc;
+     time_t current_time;
      int current_id;
      DocumentRef *current_ref;
      int current_anchor_number;
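
The fallback logic in the patch can be summarised in isolation. This is only
a sketch: DocState and its members are hypothetical stand-ins for the relevant
bits of Retriever, with meta_time playing the role of current_time. It shows
why the value is reset for each document, and that 0 doubles as "no date seen".

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for the per-document state kept by Retriever.
struct DocState {
    time_t meta_time = 0;   // 0 means "no usable date meta tag seen"

    // Called at the start of each document, like current_time = 0.
    void reset() { meta_time = 0; }

    // Called when a date tag parses successfully; a failed parse
    // (represented here by 0) leaves the previous value untouched.
    void got_time(time_t t) { if (t != 0) meta_time = t; }

    // Pick the date to store: the meta tag if present, otherwise the
    // modification time reported by the server.
    time_t doc_time(time_t server_mod_time) const {
        return meta_time == 0 ? server_mod_time : meta_time;
    }
};
```

Without the reset, a date picked up from one page would carry over to the next
page that has no date tag. One consequence of using 0 as the sentinel is that
a meta date of exactly 1970-01-01 would be treated as absent.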

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



