CSE/STAT 598E, Data Mining II

3 credits, TR 9:45-11:00, 223B IST Building.

Office hours: TR 2:30-3:30 p.m.

This is part two of a two-semester course on data mining. Since most of the topics covered in this part are largely independent of those covered in part one (CSE 598E/STAT 597E Data Mining I), you can take this course without having taken part one, provided you have some basic knowledge of multivariate statistics and linear algebra and a certain level of mathematical sophistication. Most of the material will be developed from scratch and should be easy to follow for graduate students from CSE, MATH, STAT, and other engineering and science departments.

Instructor

Hongyuan Zha

Textbook

The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

Grading

  1. Projects: 70%
  2. Participation and discussion: 30%

Schedule

Course Notes and Links

Projects

Project Teams

  1. Team 1:
    • Tianjiang Li (tzl109@psu.edu)
    • Bingjun Sun (bsun@cse.psu.edu)
    • Viswanath Avasarala
    • Padmapriya Ayyagari
  2. Team 2:
    • Vlad Morariu (morariu@cse.psu.edu)
    • Roberto Lublinerman (lubliner@cse.psu.edu)
    • Dimitrios Zarpalas (dxz141@psu.edu)
    • James Taylor (james@bx.psu.edu)
  3. Team 3:
    • Qingzhao Tan
    • Xin Yang
    • Ding Zhou
    • Yang Song
  4. Team 4:
    • Ritendra Datta (datta@cse.psu.edu)
    • Shiva Kasiviswanathan
    • Ashish Parulekar
    • Siddharth Pal (sup111@psu.edu)
  5. Team 5:
    • Amitayu Das (adas@cse.psu.edu)
    • Yiyu Chen (yzc107@psu.edu)

Topics

  1. Unsupervised Learning (Chapter 14)
    • Principal component analysis (both discrete and continuous)
    • Multidimensional scaling, Isomap, and maximum variance unfolding
    • Local methods for manifold learning
    • Semi-supervised learning
    Project 1: Face image analysis
  2. Support Vector Machines (Chapter 12)
    • VC dimension, statistical learning theory, and large-margin classifiers
    • Quadratic programming problems
    • Design of kernels and kernel alignment
    • One-class SVM
    Project 2: Document classification
  3. Boosting (Chapter 10)
    • Additive model and boosting
    • AdaBoost and exponential loss
    • Regularization
    • Interpretation
    Project 3: TBA