The following is a preliminary schedule for CS 5083 Knowledge Discovery and Data Mining for Spring 2011. This schedule will be updated as the semester progresses. Also, although it is not listed on each day, there will be project discussions every class period.
Date | Topic | Due |
Jan 18 (Week 1) | What is data mining? Seminar style classes, project discussion | |
Jan 20 | Snow day! | Project ideas |
Jan 25 (Week 2) | No class (Dr McGovern traveling) | |
Jan 27 | Is data mining just statistics? Project discussion, Barn-Raising paper, Statistics and Data Mining: Intersecting Disciplines | Project vote |
Feb 1 (Week 3) | Snow/Blizzard day! | |
Feb 3 | Snow/Blizzard day! | Papers on turbulence prediction on D2L, Summary 1 |
Feb 8 (Week 4) | Fast Algorithms for Mining Association Rules | Summary 2 |
Feb 10 | Random Forests (full paper, Wikipedia article, webpage describing the RF software) | Summary 3 |
Feb 15 (Week 5) | SVMs (see email) | Summary 4 |
Feb 17 | Bayesian Network paper on D2L | Summary 5 |
Feb 22 (Week 6) | 1st half of chapter 7 of Weka book (on D2L) | Project presentations |
Feb 24 | 2nd half of chapter 7 of Weka book (on D2L) | Summary 6 |
Mar 1 (Week 7) | A Complexity-Invariant Distance Measure for Time Series | Summary 7 |
Mar 3 | Time Series Shapelets: A New Primitive for Data Mining | Summary 8 |
Mar 8 (Week 8) | Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics | Summary 9 |
Mar 10 | Pick a paper from Eamonn Keogh | Summary 10 |
Mar 12-20 | Spring Break! | |
Mar 22 (Week 9) | Netflix Prize paper (pick one, but this one seems good) | Summary 11, list of KDD/ICDM papers |
Mar 24 | Paper from KDD Cup 2007 | Summary 12 |
Mar 29 (Week 10) | Kim: Feature Shaping for Linear SVM Classifiers | |
Mar 31 | Josh: Combining Predictions for Accurate Recommender Systems; Shiblee: Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing; Daniel: Restricted Boltzmann Machines for Collaborative Filtering | |
Apr 5 (Week 11) | Bei: Rotation Forest: A New Classifier Ensemble Method; Diana: Ensemble Pruning via Individual Contribution Ordering; Yujia: Mining Positive and Negative Patterns for Relevance Feature Discovery | |
Apr 7 | Singular Value Decomposition. Read a tutorial (you can watch the Strang lecture on YouTube or search Google for an SVD tutorial; however you want to learn about it is fine). Also read Simon Funk's webpage on using SVD for the Netflix Prize. | Summary 13 |
Apr 12 (Week 12) | Reducing the Dimensionality of Data with Neural Networks (and this is supporting/extra material if you want) and Rotation Forests (paper on D2L) | Summary 14 |
Apr 14 | A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database | Summary 15 |
Apr 19 (Week 13) | Machine Learning from Imbalanced Data Sets 101 and Why Label when you can Search? Alternatives to Active Learning for Applying Human Resources to Build Classification Models Under Extreme Class Imbalance | Summary 16 |
Apr 21 | An Efficient Boosting Algorithm for Combining Preferences | Summary 17 |
Apr 26 (Week 14) | MapReduce: Simplified Data Processing on Large Clusters | Summary 18 |
Apr 28 | Resisting Structural Re-identification in Anonymized Social Networks | Summary 19 |
May 3 (Week 15) | Research the Vermont law on data mining of prescription data, the related news articles, and the Supreme Court case, and come prepared to discuss (quick news article link 1, and link 2) | |
May 5 | Final Project Presentations | |
May 12 | No final! | |