CS 5083: Knowledge Discovery and Data Mining

The following is a preliminary schedule for CS 5083 Knowledge Discovery and Data Mining for Spring 2011. This schedule will be updated as the semester progresses. Although it is not listed for each day, every class period will include a project discussion.

Date Topic Due
Jan 18 (Week 1) What is data mining? Seminar style classes, project discussion
Jan 20 Snow day! Project ideas
Jan 25 (Week 2) No class (Dr. McGovern traveling)
Jan 27 Is data mining just statistics? Project discussion, Barn-Raising paper, Statistics and Data Mining: Intersecting Disciplines Project vote
Feb 1 (Week 3) Snow/Blizzard day!
Feb 3 Snow/Blizzard day! Papers on turbulence prediction on D2L, Summary 1
Feb 8 (Week 4) Fast Algorithms for Mining Association Rules Summary 2
Feb 10 Random Forests (full paper, Wikipedia article, webpage describing the RF software) Summary 3
Feb 15 (Week 5) SVMs (see email) Summary 4
Feb 17 Bayesian Network paper on D2L Summary 5
Feb 22 (Week 6) 1st half of chapter 7 of Weka book (on D2L) Project presentations
Feb 24 2nd half of chapter 7 of Weka book (on D2L) Summary 6
Mar 1 (Week 7) A Complexity-Invariant Distance Measure for Time Series Summary 7
Mar 3 Time Series Shapelets: A New Primitive for Data Mining Summary 8
Mar 8 (Week 8) Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics Summary 9
Mar 10 Pick a paper by Eamonn Keogh Summary 10
Mar 12-20 Spring Break!
Mar 22 (Week 9) Netflix Prize paper (pick one; this one seems good) Summary 11, list of KDD/ICDM papers
Mar 24 Paper from KDD Cup 2007 Summary 12
Mar 29 (Week 10)
Kim: Feature Shaping for Linear SVM Classifiers
Scott: Mixture Models for Learning Low-dimensional Roles in High-dimensional Data
Miguel: Fast Online Learning through Offline Initialization for Time-sensitive Recommendation
Mar 31
Josh: Combining Predictions for Accurate Recommender Systems
Shiblee: Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing
Daniel: Restricted Boltzmann Machines for Collaborative Filtering
Apr 5 (Week 11)
Bei: Rotation Forest: A New Classifier Ensemble Method
Xiaolei: Link Prediction Based on Graph Topology: The Predictive Value of the Generalized Clustering Coefficient
Diana: Ensemble Pruning via Individual Contribution Ordering
Yujia: Mining Positive and Negative Patterns for Relevance Feature Discovery
Apr 7 Singular Value Decomposition. Read a tutorial (you can watch Gilbert Strang's lecture on YouTube or search for an SVD tutorial; however you prefer to learn about SVD is fine). Also read Simon Funk's webpage on applying SVD to the Netflix Prize. Summary 13
Apr 12 (Week 12) Reducing the Dimensionality of Data with Neural Networks (plus optional supporting material) and Rotation Forests (paper on D2L) Summary 14
Apr 14 A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database Summary 15
Apr 19 (Week 13) Machine Learning from Imbalanced Data Sets 101 and Why Label when you can Search? Alternatives to Active Learning for Applying Human Resources to Build Classification Models Under Extreme Class Imbalance Summary 16
Apr 21 An Efficient Boosting Algorithm for Combining Preferences Summary 17
Apr 26 (Week 14) MapReduce: Simplified Data Processing on Large Clusters Summary 18
Apr 28 Resisting Structural Re-identification in Anonymized Social Networks Summary 19
May 3 (Week 15) Research the Vermont law on data mining of prescription data, the associated news articles, and the Supreme Court case, and come prepared to discuss them (quick news articles: link 1 and link 2)
May 5 Final Project Presentations
May 12 No final!