CSE/STAT 598E, Data Mining II
3 credits, TR 9:45-11:00, 223B IST Building.
Office hours: TR 2:30-3:30 p.m.
This is part two of a two-semester course on data mining. Since
most of the topics we cover in this part
are largely independent of those covered in
part one (CSE 598E/STAT 597E Data Mining I), you can take this course without
having taken part one as long as you have some basic knowledge of
multivariate statistics and linear algebra, and a certain level
of mathematical sophistication. Most of the material will
be developed from scratch and should be easy to follow for graduate
students from CSE, MATH, STAT and other engineering
and science departments.
Instructor
Hongyuan Zha
Textbook
The Elements of Statistical Learning,
by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Grading
- Projects: 70%
- Participation and discussion: 30%
Schedule
Course Notes and Links
Projects
Teams for Project
- Team 1:
- Tianjiang Li (tzl109@psu.edu)
- Bingjun Sun (bsun@cse.psu.edu)
- Viswanath Avasarala
- Padmapriya Ayyagari
- Team 2:
- Vlad Morariu (morariu@cse.psu.edu)
- Roberto Lublinerman (lubliner@cse.psu.edu)
- Dimitrios Zarpalas (dxz141@psu.edu)
- James Taylor (james@bx.psu.edu)
- Team 3:
- Qingzhao Tan
- Xin Yang
- Ding Zhou
- Yang Song
- Team 4:
- Ritendra Datta (datta@cse.psu.edu)
- Shiva Kasiviswanathan
- Ashish Parulekar
- Siddharth Pal (sup111@psu.edu)
- Team 5:
- Amitayu Das (adas@cse.psu.edu)
- Yiyu Chen (yzc107@psu.edu)
Topics
- Unsupervised Learning (Chapter 14)
- Principal component analysis (both discrete and continuous)
- Multi-dimensional scaling, Isomap and maximum variance unfolding
- Local methods for manifold learning
- Semi-supervised learning
Project 1: Face image analysis
- Support Vector Machines (Chapter 12)
- VC dimension, statistical learning theory and
large-margin classifiers
- Quadratic programming problems
- Design of kernels and kernel alignment
- One-class SVM
Project 2: Document classification
- Boosting (Chapter 10)
- Additive model and boosting
- AdaBoost and exponential loss
- Regularization
- Interpretation
Project 3: TBA