EE 379K: Introduction to DATA MINING

Tentative COURSE TOPICS list and Schedule

 

1. Introduction and Overview (1 lecture)
(TSK, Ch 1)

2. Classification (4 lectures): decision trees, nearest-neighbor, Bayesian approaches, SVM
(TSK, Ch 4, 5.2, 5.3, 5.5)

3. Predictive Modeling/Regression (2 lectures): common issues; linear, online and non-linear approaches
(TSK, Appendix D)

3a. Linking Regression and Classification (2 lectures): logistic regression, feedforward neural networks
(TSK, Ch 5.4)

4. Combining Multiple Models (1 lecture): ensemble learning; bagging and boosting
(TSK, Ch 5.6)

5. Collaborative Filtering and the Netflix Challenge (1 lecture)

6. Clustering/Segmentation (4 lectures): issues and challenges; k-means; hierarchical methods; graph partitioning; co-clustering; semi-supervised learning
(TSK, Ch 8, 9)

7. Data Pre-Processing, Cleaning, Reduction, Feature Extraction and Visualization (4 lectures): data quality; curse of dimensionality; PCA
(TSK, Ch 2, 3.1-3.3; Appendix B)

8. Intro to Web Mining and Cloud Computing (2 lectures): Google's PageRank; hubs and authorities; social networks; Hadoop/MapReduce

Misc Topics (time permitting): Data Warehousing and OLAP; Mining Association Rules, Market Basket Analysis

Term Paper Presentations (about 3 classes)

Course wrap-up and outlook (1 lecture)