The following is a preliminary schedule for CS 5083 Knowledge Discovery and Data Mining for Spring 2012. This schedule will be updated as the semester progresses.
Date | Topic | Reading | Assigned | Due |
Jan 17 (Week 1: Introduction) | Introduction, what is data mining, how the class works, project discussion | |||
Jan 19 | What is data mining? Project discussion | | Project ideas | |
Jan 24 (Week 2: ) | No class | |||
Jan 26 | What is data mining? Ethics and data mining. Is data mining just statistics? Project discussion | Finish chapter 1 from the book | Find and read a paper on ethics and data mining | Project vote |
Jan 31 (Week 3: ) | Association rules | Fast Algorithms for Mining Association Rules | Summary 1 | |
Feb 2 | Algorithm overview | Chapter 4 | Summary 2 | |
Feb 7 (Week 4: ) | Evaluation | Chapter 5 | Summary 3 | |
Feb 9 | Real algorithms | Chapter 6: 1st half | Summary 4 | |
Feb 14 (Week 5: ) | Real algorithms | Chapter 6: through section 2.4 | Project updates | |
Feb 16 | Real algorithms | Chapter 6: 2nd half | Summary 5 | |
Feb 21 (Week 6: ) | Finish real algorithms | Project updates | ||
Feb 23 | Visit from our data source representative | |||
Feb 28 (Week 7: ) | Logical Shapelets | Logical-Shapelets: An Expressive Primitive for Time Series Classification | Summary 6 | |
Mar 1 | Indexing time series efficiently | iSAX 2.0: Indexing and Mining One Billion Time Series | Summary 7 | |
Mar 6 (Week 8: ) | Time series paper presentations | Tim: Qiang Zhu and Eamonn Keogh (2010) Using CAPTCHAs to Index Cultural Artifacts. The Ninth International Symposium on Intelligent Data Analysis. Chris: Li Wei and Eamonn Keogh (2006) Semi-Supervised Time Series Classification. SIGKDD 2006. Scott: Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (2003) A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. Caleb: Eamonn Keogh, Li Wei, Xiaopeng Xi, Stefano Lonardi, Jin Shieh, Scott Sirowy (2006). Intelligent Icons: Integrating Lite-Weight Data Mining and Visualization into GUI Operating Systems. Carlos: Jin Shieh and Eamonn Keogh (2008) iSAX: Indexing and Mining Terabyte Sized Time Series. SIGKDD 2008. Allen: Bing Hu, Thanawin Rakthanmanon, Yuan Hao, Scott Evans, Stefano Lonardi, and Eamonn Keogh (2011). Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL. Wayne: Gustavo Batista, Xiaoyue Wang and Eamonn J. Keogh (2011) A Complexity-Invariant Distance Measure for Time Series. James: E. Keogh, J. Lin and A. Fu (2005). HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence. Sonya: Thanawin Rakthanmanon, Eamonn Keogh, Stefano Lonardi, and Scott Evans (2011). Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data. Nathan: Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, Brandon Westover (2009). Exact Discovery of Time Series Motifs. | ||
Mar 8 | Finish up presentations from Tuesday | |||
Mar 13 (Week 9: ) | Multi-dimensional time series mining | Identifying Predictive Multi-Dimensional Time Series Motifs: An application to severe weather prediction | Summary 8, pick a good paper to help with the project | |
Mar 15 | Multi-dimensional time series mining | Genetic Algorithm Search for Predictive Patterns in Multidimensional Time Series | Summary 9 | |
Mar 17-25 | Spring Break! | |||
Mar 27 (Week 10: ) | Data transformations and SVMs | Chapter 7 of the book, Support Vector Machines: Hype or Hallelujah? | Summaries on both papers | |
Mar 29 | SVMs on real data | David Goldberg's thesis | Summary | |
Apr 3 (Week 11: ) | Random Forests | Summary 13 | ||
Apr 5 | Student papers | |||
Apr 10 (Week 12: ) | Student papers | |||
Apr 12 | Student papers | |||
Apr 17 (Week 13: ) | Student papers | Fast and Flexible Multivariate Time Series Subsequence Search | ||
Apr 19 | Cancelled: Dr McGovern sick | |||
Apr 24 (Week 14: ) | Student papers | Clustering Very Large Multi-dimensional Datasets with MapReduce and Causality Quantification and Its Applications: Structuring and Modeling of Multivariate Time Series | ||
Apr 26 | Student papers | Finish discussion of Causality Quantification and Its Applications: Structuring and Modeling of Multivariate Time Series and then Mining periodic behaviors for moving objects and Large Linear Classification When Data Cannot Fit In Memory | ||
May 1 (Week 15: ) | ||||
May 3 (Course wrap-up) | Final presentations | Final presentations |