Re: [htdig] htdig indexing using local_url question


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Mon, 25 Oct 1999 16:01:49 -0500 (CDT)


According to Stephen Yeoh:
> Even if htdig does not like the https, I'm surprised that it did not parse
> the remaining html files in the specified directory. Can you explain why
> that is not happening?

I could, but the code says it better. See the IsValidURL() function in
htdig/Retriever.cc, lines 628-633:

    if (strstr(u, "/../") || strncmp(u, "http://", 7) != 0)
      {
        if (debug > 2)
          cout << endl <<" Rejected: Not an http or relative link!";
        return FALSE;
      }

This rejects any URL it comes across while parsing the original
document(s) specified in start_url if the URL contains "/../" or
doesn't begin with "http://". An "https://" link fails the second
test, so it is never followed.
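
If you want to see for yourself which URLs get through that test, here
is a minimal stand-alone sketch of the same condition. The function name
passes_quoted_check() and the example URLs are made up for illustration.
This is not the real IsValidURL(), which does other checks as well.

```cpp
#include <cstring>

// Hypothetical helper for illustration only, not part of ht://Dig.
// It reproduces just the condition quoted above from IsValidURL().
bool passes_quoted_check(const char *u)
{
    // Reject anything containing "/../", and anything whose first
    // seven characters are not exactly "http://".
    if (std::strstr(u, "/../") || std::strncmp(u, "http://", 7) != 0)
        return false;
    return true;
}
```

With this, "http://www.example.com/index.html" passes, while
"https://www.example.com/index.html" and
"http://www.example.com/a/../b.html" are both rejected.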

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



