Reading List
EE 380L - A Practicum in Data Mining
Spring 2002
Notices
Paper Selection Policy
- Students must select and present one of the following papers (except the general reading papers).
Note: we have more papers than students, so some will be left uncovered.
- Papers are allocated on a first-come, first-served basis.
- To select a paper, email your selection to the TA.
- Scheduling of your talk depends on the topic you have chosen. Ideally, it will take place in the
"Student Paper Presentations" slot (in the course schedule) right after that topic has been covered in class.
- "Selected" means the paper has been selected for presentation and cannot be selected again.
I. General Reading (but not for class presentations)
- Information retrieval on the web
Mei Kobayashi and Koichi Takeda
ACM Computing Surveys, vol.32, no.2, 144-173, 2000
- Data mining for hypertext: A tutorial survey
S. Chakrabarti
ACM SIGKDD Explorations, 1(2), 1-11, 2000
- Authoritative sources in a hyperlinked environment.
Jon Kleinberg.
Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
Extended version in Journal of the ACM 46(1999). Also appears as IBM
Research Report RJ 10076, May 1997.
- The PageRank Citation Ranking: Bringing Order to the Web
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd
- The Web as a graph
S.R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and Eli Upfal
Proceedings of the 19th ACM Symposium on Principles of Database
Systems, pp 1-10, 2000.
- Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery,
Soumen Chakrabarti, Martin van den Berg, Byron Dom, WWW8
- Text-Learning and Related Intelligent Agents: A Survey
Dunja Mladenic, IEEE Intelligent Systems, July/August 1999
- Impact of Similarity Measures on Web-page Clustering
A. Strehl, J. Ghosh and R. Mooney
Proc. AAAI workshop on AI for Web Search, K. Bollacker (Ed)
TR WS-00-01, AAAI Press, July 2000, pp. 58-64
- Web Mining Research: A Survey
R. Kosala and H. Blockeel
SIGKDD Explorations, June 2000. Volume 2, Issue 1
- Data Preparation for Mining World Wide Web Browsing Patterns
Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava
Knowledge and Information Systems, V1(1), 1999
- An Internet-enabled Knowledge Discovery Process
Alex Buchner et al., MINEit Software Ltd., 1999
II. Hyperlinks
- Improved Algorithms for Topic Distillation in a Hyperlinked Environment
Krishna Bharat and Monika Henzinger
Research and Development in Information Retrieval, 104-111, 1998
Lorenzo Thione 2002-02-13
- Stable Methods for Link Analysis
A. Ng, A. Zheng, M. Jordan. SIGIR-01
Sreangsu Acharyya 2002-02-13
- Random Walks with "Back Buttons"
Ronald Fagin et al., Proc. 2000 ACM Symposium on Theory of Computing
- Stochastic models for the Web graph.
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and Eli Upfal,
Proc. of the 41st IEEE Symp. on Foundations of Computer Science, 2000
- Link Prediction and Path Analysis Using Markov Chains,
Ramesh Sarukkai, WWW9
- Trawling the web for emerging cyber-communities,
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, WWW8
III. Information Retrieval
1. Learning Approaches for Detecting and Tracking News Events
Y. Yang et al., IEEE Intelligent Systems, 14(4):32-43, 1999
2. Translingual Information Retrieval: A Comparative Evaluation
J. G. Carbonell et al., IJCAI-97
3. Latent Semantic Kernels
Nello Cristianini, Huma Lodhi, and John Shawe-Taylor
Journal of Intelligent Information Systems, 18:2/3, 127-152, 2002
Rasmus Pederson 2002-02-18
(a) Document Classification
1. Improving Short-Text Classification using Unlabeled Background Knowledge to Assess Document Similarity
Sarah Zelikovitz and Haym Hirsh
Proceedings of the Seventeenth International Conference on Machine Learning
(ICML-2000).
2. Improving Text Categorization Methods for Event Tracking.
Y. Yang, T. Ault, T. Pierce & C. Lattimer. SIGIR-01.
Suju Rajan 2002-02-27
3. Transductive Inference for Text Classification using Support Vector Machines.
T. Joachims. ICML-99.
Rasmus Pederson 2002-03-06
4. Text Classification in a Hierarchical Mixture Model for Small Training Sets
Kristina Toutanova, Francine Chen, Kris Popat, and Thomas Hofmann
Proceedings of the Tenth International ACM Conference on Information and
Knowledge Management, CIKM 2001
5. Text Classification by Bootstrapping with Keywords, EM and Shrinkage
Andrew McCallum and Kamal Nigam
ACL Workshop for Unsupervised Learning in Natural Language Processing, 1999
Lorenzo Thione 2002-02-27
6. A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections
Alexei Vinokourov and Mark Girolami
BUBL journals: Information Processing and Management, 2002
Adrian Agogino 2002-02-27
(b) Document Clustering
- On Feature Distributional Clustering for Text Categorization
Ron Bekkerman, Ran El-Yaniv, Naftali Tishby and Yoad Winter, SIGIR-01
Juan Du 2002-03-20
- ProbMap: A Probabilistic Approach for Mapping Large Document Collections
Thomas Hofmann, IDAJ 2000
- Unsupervised and Supervised Clustering for Topic Tracking.
M. Franz, J. S. McCarley, T. Ward & W. Zhu, SIGIR-01
- On Clustering Validation Techniques
M. Halkidi, Y. Batistakis & M. Vazirgiannis.
To appear in Journal of Intelligent Information Systems
- Co-clustering documents and words using Bipartite Spectral Graph Partitioning.
I. S. Dhillon. KDD-01.
- Learning the Similarity of Documents
Thomas Hofmann, NIPS-12, MIT Press 2000
Shi Zhong 2002-03-20
- A fast and high quality multilevel scheme for partitioning irregular graphs
George Karypis and Vipin Kumar, SIAM Journal on Scientific Computing, 1998
IV. Contents + Links
- Enhanced Topic Distillation Using Text, Markup Tags, and Hyperlinks.
S. Chakrabarti, M. Joshi & V. Tawde, SIGIR-01
- The missing link - a probabilistic model of document content and hypertext connectivity
David Cohn and Thomas Hofmann, NIPS-13, 2001
Sreangsu Acharyya 2002-04-01
- HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering
Ron Weiss et al.,
Proceedings of the Seventh ACM Conference on Hypertext, Washington, DC, March 1996
Sreangsu Acharyya 2002-04-22
- Context and Page Analysis for Improved Web Search
Steve Lawrence and C. Lee Giles
IEEE Internet Computing, vol.2, no.4, pp. 38-46, 1998
Alexander Strehl 2002-03-18
V. Click-Stream Analysis
1. Analysing navigation behaviour in web sites integrating multiple information systems
Bettina Berendt and Myra Spiliopoulou
VLDB Journal, Special Issue on Databases and the Web, 2000
Suju Rajan 2002-04-22
2. Navigation Analysis Tool based on the Correlation between Contents Distribution and Access Patterns
Hiroki Kato, Takehiro Nakayama, Yohei Yamane
Proc. Web Mining Workshop, KDD2000
Alexander Strehl 2002-04-15
VI. Personalization
1. Adaptive Web Navigation for Wireless Devices.
C. Anderson, P. Domingos & D. Weld. IJCAI-01.
Adrian Agogino 2002-04-15
2. Generative Models for Cold-Start Recommendations
Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar and David M. Pennock
SIGIR-01 Workshop on Recommender Systems
3. Is it all about connections? Factors affecting the performance of a link-based recommender system
Miles Efron and Gary Geisler, SIGIR-01 Workshop on Recommender Systems
Lorenzo Thione 2002-04-22
4. Dependency networks for inference, collaborative filtering and data visualization
David Heckerman et al., Tech. Report, Microsoft Research
Rasmus Pederson 2002-04-24
5. The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users
Eric Horvitz et al., Proc. of the 14th Conf. on Uncertainty in Artificial Intelligence, 1998
Alexander Strehl 2002-04-24
6. PVA: A Self-Adaptive Personal View Agent
Chien Chin Chen, Meng Chang Chen and Yeali Sun
Journal of Intelligent Information Systems, 18:2/3, 173-194, 2002
Adrian Agogino 2002-04-24
7. Discovery of Aggregate Usage Profiles for Web Personalization
B. Mobasher, H. Dai, T. Luo, and M. Nakagawa
in Proceedings of the Web Mining for E-Commerce Workshop (WebKDD'2000),
held in conjunction with the ACM-SIGKDD Conference on Knowledge Discovery in
Databases (KDD'2000), August 2000, Boston.
Juan Du 2002-04-24
VII. Miscellaneous
- Iterative Residual Rescaling: An Analysis and Generalization of LSI.
R. Ando & L. Lee. SIGIR-01.
- Learning Probabilistic Models of the Web,
Thomas Hofmann, ACM SIGIR 2000
Shi Zhong 2002-04-15
- The small-world phenomenon: An algorithmic perspective.
Jon Kleinberg.
Cornell Computer Science Technical Report 99-1776, October 1999.
Suju Rajan 2002-02-13
- Web Caching with Consistent Hashing,
David Karger et al., WWW8
- Mining the Web for Relations,
Neel Sundaresan and Jeonghee Yi, WWW9
Juan Du 2002-04-15
- A Dynamic Probabilistic Model to Visualize Topic Evolution in Text Streams
Ata Kaban and Mark A. Girolami
Journal of Intelligent Information Systems, 18:2/3, 107-125, 2002
Last updated 02/2002