Reading List
EE 380L - A Practicum in Data Mining
Spring 2002

I. General Reading (but not for class presentations)

  1. Information retrieval on the web
    Mei Kobayashi and Koichi Takeda
    ACM Computing Surveys, vol.32, no.2, 144-173, 2000
  2. Data mining for hypertext: A tutorial survey
    S. Chakrabarti
    ACM SIGKDD Explorations, 1(2), 1-11, 2000
  3. Authoritative sources in a hyperlinked environment
    Jon Kleinberg.
    Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999). Also appears as IBM Research Report RJ 10076, May 1997.
  4. The PageRank Citation Ranking: Bringing Order to the Web
    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd
  5. The Web as a graph
    S.R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and Eli Upfal
    Proceedings of the 19th ACM Symposium on Principles of Database Systems, pp. 1-10, 2000.
  6. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery
    Soumen Chakrabarti, Martin van den Berg, Byron Dom, WWW8
  7. Text-Learning and Related Intelligent Agents: A Survey
    Dunja Mladenic, IEEE Intelligent Systems, July/August 1999
  8. Impact of Similarity Measures on Web-page Clustering
    A.Strehl, J. Ghosh and R. Mooney
    Proc. AAAI workshop on AI for Web Search, K. Bollacker (Ed)
    TR WS-00-01, AAAI Press, July 2000, pp. 58-64
  9. Web Mining Research:  A Survey
    R. Kosala and H. Blockeel
    SIGKDD Explorations, June 2000. Volume 2, Issue 1
  10. Data Preparation for Mining World Wide Web Browsing Patterns
    Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava
    Knowledge and Information Systems, V1(1), 1999
  11. An Internet-enabled Knowledge Discovery Process
    Alex Buchner et al., MINEit Software Ltd., 1999

II. Hyperlinks

  1. Improved Algorithms for Topic Distillation in a Hyperlinked Environment
    Krishna Bharat and Monika Henzinger
    Research and Development in Information Retrieval, 104-111, 1998
    Lorenzo Thione 2002-02-13
  2. Stable Methods for Link Analysis
    A. Ng, A. Zheng, M. Jordan. SIGIR-01
    Sreangsu Acharyya 2002-02-13
  3. Random Walks with "Back Buttons"
    Ronald Fagin et al., Proc. 2000 ACM Symposium on Theory of Computing
  4. Stochastic models for the Web graph
    R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and Eli Upfal
    Proc. of the 41st IEEE Symp. on Foundations of Computer Science. 2000
  5. Link Prediction and Path Analysis Using Markov Chains
    Ramesh Sarukkai, WWW9
  6. Trawling the web for emerging cyber-communities
    Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, WWW8

III. Information Retrieval

  1. Learning Approaches for Detecting and Tracking News Events
    Y. Yang et al., IEEE Intelligent Systems, 14(4):32-43, 1999
  2. Translingual Information Retrieval: A Comparative Evaluation
    J. G. Carbonell et al., IJCAI-97
  3. Latent Semantic Kernels
    Nello Cristianini, Huma Lodhi, and John Shawe-Taylor
    Journal of Intelligent Information Systems, 18:2/3, 127-152, 2002
    Rasmus Pederson 2002-02-18

   (a) Document Classification

  1. Improving Short-Text Classification using Unlabeled Background Knowledge to Assess Document Similarity
    Sarah Zelikovitz and Haym Hirsh
    Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000).
  2. Improving Text Categorization Methods for Event Tracking
    Y. Yang, T. Ault, T. Pierce & C. Lattimer. SIGIR-01.
    Suju Rajan 2002-02-27
  3. Transductive Inference for Text Classification using Support Vector Machines
    T. Joachims. ICML-99.
    Rasmus Pederson 2002-03-06
  4. Text Classification in a Hierarchical Mixture Model for Small Training Sets
    Kristina Toutanova, Francine Chen, Kris Popat, and Thomas Hofmann
    Proceedings of the Tenth International ACM Conference on Information and Knowledge Management, CIKM 2001
  5. Text Classification by Bootstrapping with Keywords, EM and Shrinkage
    Andrew McCallum and Kamal Nigam
    ACL Workshop for Unsupervised Learning in Natural Language Processing, 1999
    Lorenzo Thione 2002-02-27
  6. A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections
    Alexei Vinokourov and Mark Girolami
    Information Processing and Management, 2002
    Adrian Agogino 2002-02-27

   (b) Document Clustering

  1. On Feature Distributional Clustering for Text Categorization
    Ron Bekkerman, Ran El-Yaniv, Naftali Tishby and Yoad Winter, SIGIR-01
    Juan Du 2002-03-20
  2. ProbMap: A Probabilistic Approach for Mapping Large Document Collections
    Thomas Hofmann, IDAJ 2000
  3. Unsupervised and Supervised Clustering for Topic Tracking
    M. Franz, J. S. McCarley, T. Ward & W. Zhu, SIGIR-01
  4. On Clustering Validation Techniques
    M. Halkidi, Y. Batistakis & M. Vazirgiannis
    To appear in Journal of Intelligent Information Systems
  5. Co-clustering documents and words using Bipartite Spectral Graph Partitioning
    I. S. Dhillon. KDD-01.
  6. Learning the Similarity of Documents
    Thomas Hofmann, NIPS-12, MIT Press 2000
    Shi Zhong 2002-03-20
  7. A fast and high quality multilevel scheme for partitioning irregular graphs
    George Karypis and Vipin Kumar, SIAM Journal on Scientific Computing, 1998

IV. Contents + Links

  1. Enhanced Topic Distillation Using Text, Markup Tags, and Hyperlinks.
    S. Chakrabarti, M. Joshi & V. Tawde, SIGIR-01
  2. The missing link - a probabilistic model of document content and hypertext connectivity
    David Cohn and Thomas Hofmann, NIPS-13, 2001
    Sreangsu Acharyya 2002-04-01
  3. HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering
    Ron Weiss et al.
    Proceedings of the Seventh ACM Conference on Hypertext, Washington, DC, March 1996
    Sreangsu Acharyya 2002-04-22
  4. Context and Page Analysis for Improved Web Search
    Steve Lawrence and C. Lee Giles
    IEEE Internet Computing, vol.2, no.4, pp. 38-46, 1998
    Alexander Strehl 2002-03-18

V. Click-Stream Analysis

  1. Analysing navigation behaviour in web sites integrating multiple information systems
    Bettina Berendt and Myra Spiliopoulou
    VLDB Journal, Special Issue on Databases and the Web, 2000
    Suju Rajan 2002-04-22
  2. Navigation Analysis Tool based on the Correlation between Contents Distribution and Access Patterns
    Hiroki Kato, Takehiro Nakayama, Yohei Yamane
    Proc. Web Mining Workshop, KDD2000
    Alexander Strehl 2002-04-15

VI. Personalization

  1. Adaptive Web Navigation for Wireless Devices
    C. Anderson, P. Domingos & D. Weld. IJCAI-01.
    Adrian Agogino 2002-04-15
  2. Generative Models for Cold-Start Recommendations
    Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar and David M. Pennock
    SIGIR-01 Workshop on Recommender Systems
  3. Is it all about connections? Factors affecting the performance of a link-based recommender system
    Miles Efron and Gary Geisler, SIGIR-01 Workshop on Recommender Systems
    Lorenzo Thione 2002-04-22
  4. Dependency networks for inference, collaborative filtering and data visualizations
    David Heckerman et al., Tech. Report, Microsoft Research
    Rasmus Pederson 2002-04-24
  5. The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users
    Eric Horvitz et al., Proc. of the 14th Conf. on Uncertainty in Artificial Intelligence, 1998
    Alexander Strehl 2002-04-24
  6. PVA: A Self-Adaptive Personal View Agent
    Chien Chin Chen, Meng Chang Chen and Yeali Sun
    Journal of Intelligent Information Systems, 18:2/3, 173-194, 2002
    Adrian Agogino 2002-04-24
  7. Discovery of Aggregate Usage Profiles for Web Personalization
    B. Mobasher, H. Dai, T. Luo, and M. Nakagawa
    Proceedings of the Web Mining for E-Commerce Workshop (WebKDD'2000), held in conjunction with the ACM-SIGKDD Conference on Knowledge Discovery in Databases (KDD'2000), August 2000, Boston.
    Juan Du 2002-04-24

VII. Miscellaneous

  1. Iterative Residual Rescaling: An Analysis and Generalization of LSI
    R. Ando & L. Lee. SIGIR-01.
  2. Learning Probabilistic Models of the Web
    Thomas Hofmann, ACM SIGIR 2000
    Shi Zhong 2002-04-15
  3. The small-world phenomenon: An algorithmic perspective
    Jon Kleinberg. 
    Cornell Computer Science Technical Report 99-1776, October 1999.
    Suju Rajan 2002-02-13
  4. Web Caching with Consistent Hashing
    David Karger et al., WWW8
  5. Mining the Web for Relations
    Neel Sundaresan and Jeonghee Yi, WWW9
    Juan Du 2002-04-15
  6. A Dynamic Probabilistic Model to Visualize Topic Evolution in Text Streams
    Ata Kaban and Mark A. Girolami
    Journal of Intelligent Information Systems, 18:2/3, 107-125, 2002

Last updated 02/2002