EE 379K: Introduction to Data Mining
Tentative Course Topics List and Schedule
1. Introduction and Overview (1 lecture)
(TSK, Ch 1)
2. Classification (4 lectures): decision trees, nearest-neighbor, Bayesian approaches, SVM
(TSK, Ch 4, 5.2, 5.3, 5.5)
3. Predictive Modeling/Regression (2 lectures): common issues; linear, online and non-linear approaches
(TSK, Appendix D)
3a. Linking Regression and Classification (2 lectures): logistic regression, feedforward neural networks
(TSK, Ch 5.4)
4. Combining Multiple Models (1 lecture): ensemble learning; bagging and boosting
(TSK, Ch 5.6)
5. Collaborative Filtering and the Netflix Challenge (1 lecture)
6. Clustering/Segmentation (4 lectures): issues and challenges; k-means; hierarchical methods; graph partitioning; co-clustering; semi-supervised learning
(TSK, Ch 8, 9)
7. Data Pre-Processing, Cleaning, Reduction, Feature Extraction and Visualization (4 lectures): data quality; curse of dimensionality; PCA
(TSK, Ch 2, 3.1-3.3; Appendix B)
8. Intro to Web Mining and Cloud Computing (2 lectures): Google's PageRank; hubs and authorities; social networks; Hadoop/MapReduce
Misc. Topics (time permitting): data warehousing and OLAP; mining association rules; market basket analysis
Term Paper Presentations (about 3 classes)
Course Wrap-up and Outlook (1 lecture)