CS 5083: Knowledge Discovery and Data Mining

The following is a preliminary schedule for CS 5083 Knowledge Discovery and Data Mining for Spring 2012. This schedule will be updated as the semester progresses.

Date Topic Reading Assigned Due
Jan 17 (Week 1: Introduction) Introduction, what is data mining, how the class works, project discussion      
Jan 19 What is data mining? Project discussion
  1. Barn-Raising paper
  2. Chapter 1 from the book
  Project ideas
Jan 24 (Week 2: )
No class
Jan 26 What is data mining? Ethics and data mining, Is data mining just statistics? Project discussion

Finish chapter 1 from the book

Find and read a paper on ethics and data mining Project vote
Jan 31 (Week 3: ) Association rules Fast Algorithms for Mining Association Rules Summary 1  
Feb 2 Algorithm overview Chapter 4 Summary 2  
Feb 7 (Week 4: ) Evaluation Chapter 5 Summary 3  
Feb 9 Real algorithms Chapter 6: 1st half Summary 4  
Feb 14 (Week 5: ) Real algorithms Chapter 6: through section 2.4   Project updates
Feb 16 Real algorithms Chapter 6: 2nd half Summary 5  
Feb 21 (Week 6: ) Finish real algorithms   Project updates
Feb 23 Visit from our data source representative    
Feb 28 (Week 7: ) Logical Shapelets

Logical-Shapelets: An Expressive Primitive for Time Series Classification

Summary 6  
Mar 1 Indexing time series efficiently iSAX 2.0: Indexing and Mining One Billion Time Series Summary 7  
Mar 6 (Week 8: ) Time series paper presentations

Tim: Qiang Zhu and Eamonn Keogh (2010) Using CAPTCHAs to Index Cultural Artifacts. The Ninth International Symposium on Intelligent Data Analysis.

Chris: Li Wei and Eamonn Keogh (2006) Semi-Supervised Time Series Classification. SIGKDD 2006.

Scott: Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (2003) A Symbolic Representation of Time Series, with Implications for Streaming Algorithms.

Caleb: Eamonn Keogh, Li Wei, Xiaopeng Xi, Stefano Lonardi, Jin Shieh, Scott Sirowy (2006). Intelligent Icons: Integrating Lite-Weight Data Mining and Visualization into GUI Operating Systems

Carlos: Jin Shieh and Eamonn Keogh (2008) iSAX: Indexing and Mining Terabyte Sized Time Series. SIGKDD 2008.

Allen: Bing Hu, Thanawin Rakthanmanon, Yuan Hao, Scott Evans, Stefano Lonardi, and Eamonn Keogh (2011). Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL

Wayne: Gustavo Batista, Xiaoyue Wang and Eamonn J. Keogh (2011) A Complexity-Invariant Distance Measure for Time Series.

James: E. Keogh, J. Lin and A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence

Sonya: Thanawin Rakthanmanon, Eamonn Keogh, Stefano Lonardi, and Scott Evans (2011). Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data.

Nathan: Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, Brandon Westover (2009). Exact Discovery of Time Series Motifs.

Mar 8   Finish up presentations from Tuesday    
Mar 13 (Week 9: ) Multi-dimensional time series mining Identifying Predictive Multi-Dimensional Time Series Motifs: An application to severe weather prediction Summary 8, pick a good paper to help with the project  
Mar 15 Multi-dimensional time series mining

Genetic Algorithm Search for Predictive Patterns in Multidimensional Time Series

Summary 9  
Mar 17-25
Spring Break!
Mar 27 (Week 10: ) Data transformations and SVMs Chapter 7 of the book, Support Vector Machines: Hype or Hallelujah? Summaries on both papers  
Mar 29 SVMs on real data David Goldberg's thesis Summary  
Apr 3 (Week 11: ) Random Forests
  1. Random Forest journal paper
  2. Random forests website
Summary 13  
Apr 5
Student papers
Mining Sensor Streams for Discovering Human Activity Patterns Over Time and Patent Maintenance Recommendation with Patent Information Network Model
Apr 10 (Week 12: )
Student papers
Clustering by Synchronization and Enabling Fast Lazy Learning for Data Streams
Apr 12
Student papers
Random Forest Based Feature Induction
Apr 17 (Week 13: ) Student papers Fast and Flexible Multivariate Time Series Subsequence Search  
Apr 19
Cancelled: Dr McGovern sick
Apr 24 (Week 14: ) Student papers Clustering Very Large Multi-dimensional Datasets with MapReduce and

Causality Quantification and Its Applications: Structuring and Modeling of Multivariate Time Series

Apr 26 Student papers Finish discussion of Causality Quantification and Its Applications: Structuring and Modeling of Multivariate Time Series, then Mining Periodic Behaviors for Moving Objects and Large Linear Classification When Data Cannot Fit In Memory
May 1 (Week 15: )        
May 3 (Course wrapup) Final presentations Final presentations