This is a partial list of literature relevant to ht://Dig development, including sources on other web search engines, search algorithms, databases, fuzzy searching, and other topics. It is by no means a complete bibliography of these topics, but it should include some good resources.

  1. Agirre, E., K. Gojenola, et al. (1998). ``Towards a single proposal in spelling correction.'' http://xxx.lanl.gov/ps/cmp-lg/9806010.
  2. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text http://theory.stanford.edu/people/raghavan/www7/181.html
  3. Brin, S., J. Davis, et al. (1995). Copy Detection Mechanisms for Digital Documents. ACM SIGMOD Annual Conference, San Francisco, California. http://www-db.stanford.edu/~shiva/copy.pdf.
  4. Brin, S. and L. Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. The Seventh Annual International WWW Conference. http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm.
  5. Sergey Brin, Rajeev Motwani, Lawrence Page, Terry Winograd. What can you do with a Web in your pocket? http://www.research.microsoft.com/research/db/debull/98june/webbase.ps
  6. Chakrabarti, S., B. Dom, et al. (1998). Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. Seventh International WWW Conference, Brisbane, Australia. http://decweb.ethz.ch/WWW7/1898/com1898.htm.
  7. Cho, J., H. Garcia-Molina, et al. (1998). Efficient Crawling Through URL Ordering. The Seventh Annual International World Wide Web Conference. http://www-db.stanford.edu/~cho/crawler-paper/.
  8. User Interface Engineering (1998). Why On-Site Searching Stinks. http://world.std.com/~uieweb/searchart.htm.
  9. Fang, M., N. Shivakumar, et al. (1998). Computing Iceberg Queries Efficiently. 1998 International Conference on Very Large Databases (VLDB '98), New York. http://www-db.stanford.edu/~shiva/Pubs/iceberg-full.ps.
  10. Golding, A. R. and Y. Schabes (1996). Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction. The 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, CA. http://xxx.lanl.gov/ps/cmp-lg/9605037.
  11. Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, Marc Najork. Measuring Index Quality Using Random Walks on the Web http://www8.org/w8-papers/2c-search-discover/measuring/measuring.html
  12. Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. The Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. http://simon.cs.cornell.edu/home/kleinber/auth.ps.
  13. Kukich, K. (1992). ``Techniques for automatically correcting words in text.'' ACM Computing Surveys 24(4): 377-439. http://www.acm.org/pubs/toc/Abstracts/surveys/146380.html.
  14. Lawrence, S. and C. L. Giles (1998). Searching the World Wide Web. Science. 280: 98-100. http://www.sciencemag.org/cgi/content/full/280/5360/98?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&author1=Lawrence&author2=Giles&searchid=QID_NOT_SET&FIRSTINDEX=.
  15. Manber, U. (1997). ``A Text Compression Scheme that Allows Fast Searching Directly in the Compressed File.'' ACM Transactions on Information Systems 15(2). ftp://ftp.cs.arizona.edu/people/udi/CAS.ps.
  16. Marchiori, M. (1997). The quest for correct information on the web: Hyper search engines. The Sixth International WWW Conference, Santa Clara, USA. http://www6.nttlabs.com/HyperNews/get/PAPER222.html.
  17. Mayfield, J. (1998). Research on N-Grams in Information Retrieval. http://www.cs.umbc.edu/ngram/.
  18. Members of the Clever Project. Hypersearching the Web http://www.sciam.com/1999/0699issue/0699raghavan.html
  19. Muth, R. and U. Manber (1996). Approximate Multiple String Search. Seventh Annual Combinatorial Pattern Matching Symposium, Laguna Beach, CA. ftp://ftp.cs.arizona.edu/people/udi/approx-multi.ps.
  20. Page, L., S. Brin, et al. (1998). ``The PageRank Citation Ranking: Bringing Order to the Web.'' (work in progress). http://google.stanford.edu/~backrub/pageranksub.ps.
  21. Pinkerton, B. (1994). Finding What People Want: Experiences with the WebCrawler. The Second International WWW Conference, Chicago, USA. http://info.webcrawler.com/bp/WWW94.html.
  22. Pollock, J. J. and E. M. Zamora (1984). ``Automatic spelling correction in scientific and scholarly text.'' Communications of the ACM 27(4): 358-368.
  23. Rapp, R. (1997). Text-Detector. c't: 386. http://www.heise.de/ct/english/9704386/.
  24. Shivakumar, N. and H. Garcia-Molina (1995). SCAM: A Copy Detection Mechanism for Digital Documents. 2nd International Conference in Theory and Practice of Digital Libraries, Austin, Texas. http://www-db.stanford.edu/~shiva/Pubs/scam.pdf.
  25. Shivakumar, N. and H. Garcia-Molina (1996). Building a scalable and accurate copy detection mechanism. First ACM Conference on Digital Libraries, Bethesda, Maryland.
  26. Shivakumar, N. and H. Garcia-Molina (1998). Finding near-replicas of documents on the web. Workshop on Web Databases. http://www-db.stanford.edu/~shiva/Pubs/web.ps.
  27. Tillman, H. N. (1997). Evaluating quality on the net. Internet Librarian, Monterey, California. http://www.hopetillman.com/findqual.html.
  28. Wu, S. and U. Manber (1992). ``Fast Text Searching Allowing Errors.'' Communications of the ACM 35(October 1992): 83-91.
  29. Wu, S. and U. Manber (1994). ``A Fast Algorithm for Multi-Pattern Searching.'' ftp://ftp.cs.arizona.edu/reports/1994/TR94-17.ps.

This is a growing list of various useful RFCs and other standards.


Last modified: $Date: 2001/06/13 14:31:49 $