htdig: patch: search log


jmoore (jmoore@sober.com)
Wed, 22 Apr 1998 22:10:33 -0400 (EDT)


Hi,

Maybe it was the search engine backdoors (e.g.
http://www.metaspy.com/spymagic/Spy?filter=false) that got me on this
train of thought, but I thought it would be useful and interesting to
examine how users actually _use_ search engines. The GET requests in
the server logs didn't carry enough information, so I added a search
logging feature to htsearch.

To enable it, add the following to your index .conf file:

log_file: /path/to/the/index.log

It logs each search request as a single line in the following format
(wrapped here for mail):

[timestamp] - [remote host] (and|or|boolean) [user entered keywords]
[formatted keywords] ([total results]/[results per page]) - [page
number]\n

e.g.
Wed Apr 22 20:54:02 1998 - 100.1.1.100 (or) [linux dos] [linux or dos]
(28/10) - 1
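
For anyone who wants to analyse the log afterwards, here is a rough sketch of reading one entry back in. It is not part of the patch: SearchLogEntry and parseSearchLogLine are made-up names, it uses std::regex (so it needs a C++11 compiler), and it assumes the keywords contain no "] [" sequence and that the results-per-page value is numeric.

```cpp
#include <regex>
#include <string>

// Hypothetical record type for one log entry (names are illustrative only).
struct SearchLogEntry {
    std::string timestamp;     // e.g. "Wed Apr 22 20:54:02 1998"
    std::string host;          // remote host or address
    std::string method;        // and | or | boolean
    std::string words;         // keywords as the user typed them
    std::string logicalWords;  // keywords after formatting
    int totalResults;
    int perPage;
    int page;
};

// Parse one log line (without the trailing newline).
// Returns false if the line does not match the expected layout.
bool parseSearchLogLine(const std::string &line, SearchLogEntry &out)
{
    static const std::regex pattern(
        R"re(^(.+?) - (\S+) \((\w+)\) \[(.*?)\] \[(.*)\] \((\d{1,9})/(\d{1,9})\) - (\d{1,9})$)re");

    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false;

    out.timestamp    = m[1].str();
    out.host         = m[2].str();
    out.method       = m[3].str();
    out.words        = m[4].str();
    out.logicalWords = m[5].str();
    out.totalResults = std::stoi(m[6].str());  // at most 9 digits, fits in int
    out.perPage      = std::stoi(m[7].str());
    out.page         = std::stoi(m[8].str());
    return true;
}
```

Feeding it the example line above (joined back onto one line) should fill in host "100.1.1.100", method "or", and the counts 28, 10 and 1.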

NOTES:

- there is code for file locking (flock), but it is commented out. I'm
not sure which locking call is supported on the most platforms. Please
let me know if you have any thoughts on this.

- storing all the search info in a hash/dictionary and then passing that
reference to the logSearch() function would probably be cleaner than
passing 5 args.

- tested on htdig-3.0.8b2 linux 2.0.33 i686
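
To make the first two notes concrete, here is a minimal sketch of what they could look like together. It is not what the patch below does. It swaps flock() for fcntl() record locking, which is the locking interface POSIX specifies (flock() comes from BSD and is not in POSIX), and it blocks until the lock is granted instead of retrying five times. SearchLogRecord and appendLogRecord are hypothetical names, and std::string stands in for htdig's own String class.

```cpp
#include <cstdio>
#include <cstring>
#include <ctime>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical bundle of everything one log line needs, passed as a
// single argument instead of five separate ones.
struct SearchLogRecord {
    std::string host;          // REMOTE_HOST, or "-" if the server did not set it
    std::string method;        // and | or | boolean
    std::string words;         // keywords as the user typed them
    std::string logicalWords;  // keywords after formatting
    int totalResults;
    int perPage;
    int page;
};

// Append one entry to the log while holding an exclusive fcntl() lock.
// Returns true if the whole line was written.
bool appendLogRecord(const char *path, const SearchLogRecord &r)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (fd < 0)
        return false;

    struct flock lk;
    std::memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;      // exclusive (write) lock
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;             // 0 means "to end of file", i.e. the whole file
    if (fcntl(fd, F_SETLKW, &lk) < 0) {   // blocks until the lock is granted
        close(fd);
        return false;
    }

    // Same layout as ctime(), minus the trailing newline.
    char stamp[32] = "";
    std::time_t now = std::time(NULL);
    std::tm *tm = std::localtime(&now);
    if (tm != NULL)
        std::strftime(stamp, sizeof stamp, "%a %b %e %H:%M:%S %Y", tm);

    std::string line = std::string(stamp) + " - " + r.host +
        " (" + r.method + ") [" + r.words + "] [" + r.logicalWords + "] (" +
        std::to_string(r.totalResults) + "/" + std::to_string(r.perPage) +
        ") - " + std::to_string(r.page) + "\n";

    ssize_t n = write(fd, line.data(), line.size());
    close(fd);                // closing the descriptor also releases the lock
    return n == static_cast<ssize_t>(line.size());
}
```

A caller would fill in a SearchLogRecord once and hand it over, e.g. appendLogRecord("/path/to/the/index.log", record). Whether blocking in a CGI program is acceptable is a judgment call; F_SETLK is the non-blocking variant if a bounded retry loop is preferred.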

By the way, the whole idea of sending patches to a mailing list is a
little new to me, so please let me know if there's anything I should do
differently.

enjoy!

:jason

jmoore@sober.com

--- htsearch.cc.orig Fri Aug 15 01:59:46 1997
+++ htsearch.cc Wed Apr 22 21:34:49 1998
@@ -42,6 +42,10 @@
 #include <ctype.h>
 #include <signal.h>
 
+#include <sys/file.h>
+#include <sys/stat.h>
+#include <fcntl.h> // O_WRONLY, O_CREAT, O_APPEND
+
 
 typedef void (*SIGNAL_HANDLER) (...);
 
@@ -52,6 +56,7 @@
 void convertToBoolean(List &words);
 void doFuzzy(WeightWord *, List &, List &);
 void addRequiredWords(List &, StringList &);
+void logSearch(cgi &, Configuration &, String &, ResultList *, int);
 
 int debug = 0;
 int minimum_word_length = 3;
@@ -198,6 +203,13 @@
                          doc_db.get()));
     }
 
+ String log_file = config["log_file"];
+ if (log_file.length() != 0)
+ {
+ // log this search info
+ logSearch(input, config, logicalWords, results, pageNumber);
+ }
+
     Display display(index, doc_db);
     display.setResults(results);
     display.setSearchWords(&searchWords);
@@ -628,3 +640,71 @@
     cout << "<pre>\n" << msg << "\n</pre>\n</body></html>\n";
     exit(1);
 }
+
+
+//*****************************************************************************
+// Log the search info to the search log
+//
+// Each log entry is one line. The format is:
+// [timestamp] - [REMOTE_HOST] (and|or|boolean) [user entered terms] [formatted terms]
+// ([total hits]/[hits per page]) - [page number]\n
+//
+
+void
+logSearch(cgi &input, Configuration &config, String &logicalWords, ResultList *results, int page)
+{
+ FILE *search_log;
+ int f_log;
+ char *path;
+ char *timestr;
+ char *env_host;
+ time_t t;
+
+ String words = input["words"]; // user input
+
+ // char *index = config["config"];
+ char *log = config["log_file"];
+
+ static int f_mode = ( O_WRONLY | O_CREAT | O_APPEND ); // open() flags
+ static mode_t p_mode = ( S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH ); // permissions
+
+ t = time(NULL);
+ timestr = ctime(&t);
+ timestr[strlen(timestr) - 1] = '\0'; // overwrite newline
+ env_host = getenv("REMOTE_HOST"); // set by http daemon
+
+ if ((f_log = open(log, f_mode, p_mode)) < 0)
+ {
+ // unable to open
+ return;
+ }
+
+/****** TODO: file locking
+// Q: How does file locking work on non-linux platforms?
+
+ int lock = 0;
+ for (int i=0; i<5;i++)
+ {
+ if (flock(f_log, (LOCK_EX | LOCK_NB)) == 0)
+ {
+ lock = 1;
+ break;
+ }
+ }
+ if (lock == 0)
+ return;
+
+*******/
+
+ search_log = fdopen(f_log, "a");
+
+ fprintf(search_log, "%s - %s (%s) [%s] [%s] (%d/%s) - %d\n",
+ timestr, env_host ? env_host : "-", config["match_method"],
+ words.get(), logicalWords.get(),
+ results->Count(), config["matches_per_page"],
+ page
+ );
+
+ fclose(search_log);
+}
+
