Research

Areas

Statistical learning
The focus is on the generative modeling approach to supervised and unsupervised learning. In particular, we investigate multilayer mixture models and two-way mixture models to tackle high dimensionality and non-standard cluster distributions. We explore clustering via mode association, which allows the density to be estimated nonparametrically; see the package HMAC. We have also developed D2-clustering, which extends k-means (the Lloyd algorithm in vector quantization) from vector data to bags of vectors, and a generalized mixture modeling method for non-vector data.
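Mode association groups data points by the density mode that each one climbs to. Below is a minimal sketch, assuming a Gaussian kernel density estimate with one shared isotropic bandwidth `sigma` (a hypothetical tuning parameter). In that special case the modal EM ascent reduces to a weighted-mean fixed-point update, the same step as Gaussian mean shift. The function names are illustrative and are not the HMAC package's API; any multi-bandwidth hierarchy is not shown.

```python
import numpy as np


def modal_em(data, sigma, tol=1e-6, max_iter=500):
    """Ascend from every data point to a mode of a Gaussian kernel density estimate.

    Assumes equal-weight kernels centered at the data points with a shared
    isotropic bandwidth sigma (a hypothetical tuning parameter).
    """
    data = np.asarray(data, dtype=float)
    modes = np.empty_like(data)
    for i, start in enumerate(data):
        x = start.copy()
        for _ in range(max_iter):
            # E-step: posterior weight of each kernel given the current point x.
            logw = -np.sum((data - x) ** 2, axis=1) / (2.0 * sigma ** 2)
            w = np.exp(logw - logw.max())  # subtract the max for numerical stability
            w /= w.sum()
            # M-step: with equal isotropic kernels the maximizer is the weighted mean.
            x_new = w @ data
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        modes[i] = x
    return modes


def group_by_mode(modes, merge_tol):
    """Give the same label to points whose ascents end at (numerically) the same mode."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_tol:
                labels[i] = k
                break
        else:  # no existing mode is close enough: start a new cluster
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
labels, centers = group_by_mode(modal_em(data, sigma=1.0), merge_tol=0.1)
print(len(centers))  # prints 2
```

On two well-separated blobs, the ascents collapse onto two modes. `merge_tol` only needs to exceed the numerical spread of converged ascents; unlike k-means, the number of clusters is not fixed in advance but follows from the density and the bandwidth.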

Applications explored: document retrieval/classification, image annotation/retrieval/segmentation/compression, social networks, information visualization, genomics, etc.

Sample talk | Free software | Basics on data mining & learning
Stochastic modeling
Spatial stochastic models attempt to characterize the inherent dependence among image pixels. This dependence can then be exploited for tasks such as segmentation, compression, and classification. We have developed the 2-D hidden Markov model (2-D HMM, i.e., a spatial HMM), with extensions to a multiresolution model (MHMM) and to 3-D models for volume data.
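In a 2-D HMM, each image block's hidden state depends on the states of its upper and left neighbors. The sketch below makes that dependence concrete under stated assumptions: Gaussian emissions with one shared scalar variance, separate first-row and first-column transition tables at the boundary, and MAP decoding by exhaustive search. Exhaustive search only works for toy grids; realistic image sizes require an approximate search instead. All names and parameter shapes are hypothetical.

```python
from itertools import product

import numpy as np


def log_joint_2d_hmm(states, feats, pi, a_left, a_up, a2, means, var):
    """Log P(states, feats) for a 2-D HMM on an R x C grid of blocks.

    Hypothetical parameterization:
      pi[l]         P(s_00 = l)
      a_left[n, l]  P(s_0j = l | s_0,j-1 = n)                 first row
      a_up[m, l]    P(s_i0 = l | s_i-1,0 = m)                 first column
      a2[m, n, l]   P(s_ij = l | s_i-1,j = m, s_i,j-1 = n)    interior blocks
      means[l], var Gaussian emission mean (d-vector) and shared scalar variance
    """
    R, C = states.shape
    d = feats.shape[2]
    lp = 0.0
    for i in range(R):
        for j in range(C):
            s = states[i, j]
            if i == 0 and j == 0:
                lp += np.log(pi[s])
            elif i == 0:
                lp += np.log(a_left[states[i, j - 1], s])
            elif j == 0:
                lp += np.log(a_up[states[i - 1, j], s])
            else:
                lp += np.log(a2[states[i - 1, j], states[i, j - 1], s])
            diff = feats[i, j] - means[s]
            lp += -0.5 * (diff @ diff) / var - 0.5 * d * np.log(2.0 * np.pi * var)
    return lp


def map_states_bruteforce(feats, n_states, params):
    """Exhaustive MAP decoding over n_states ** (R * C) configurations (toy grids only)."""
    R, C = feats.shape[:2]
    best, best_lp = None, -np.inf
    for flat in product(range(n_states), repeat=R * C):
        states = np.array(flat).reshape(R, C)
        lp = log_joint_2d_hmm(states, feats, *params)
        if lp > best_lp:
            best, best_lp = states, lp
    return best, best_lp


# Toy example: 2 states, 1-D features, left column dark (0.0), right column bright (5.0).
sticky = np.array([[0.9, 0.1], [0.1, 0.9]])
a2 = np.full((2, 2, 2), 0.5)
a2[0, 0] = [0.9, 0.1]
a2[1, 1] = [0.1, 0.9]
params = (np.array([0.5, 0.5]), sticky, sticky, a2, np.array([[0.0], [5.0]]), 1.0)
feats = np.array([[[0.0], [5.0]], [[0.0], [5.0]]])
best, _ = map_states_bruteforce(feats, 2, params)
print(best.tolist())  # [[0, 1], [0, 1]]
```

The "sticky" transitions favor neighboring blocks sharing a state. Here the emissions dominate, so the decoded grid splits along the dark and bright columns, which is the kind of labeling a segmentation would use.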

Applications explored: general-purpose photographs, satellite images, Chinese classical paintings, Van Gogh paintings, etc.

Sample talk | Tutorial on HMM
Image annotation
Image annotation is the automatic tagging of pictures with words, using only pixel information. We have developed ALIPR, a real-time computerized image annotation system. The work is rooted in the ALIP system developed in 2002. Relevant methodologies: 2-D MHMM, D2-clustering, and generalized mixture modeling.
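As an illustration only (this is not ALIPR's actual scoring), assume each image category has a trained statistical model that returns a log-likelihood for a new image, along with a manually assigned word list. One simple way to turn these into tags is to compute the posterior over categories under a uniform prior and rank each word by the total posterior of the categories that carry it. The function name and inputs are hypothetical.

```python
import numpy as np


def annotate(log_liks, category_words, top_k=3):
    """Rank candidate words for one image (illustrative scheme, not ALIPR's).

    log_liks[c]       : hypothetical log-likelihood of the image under category c's model
    category_words[c] : words assumed to be manually assigned to category c
    """
    ll = np.asarray(log_liks, dtype=float)
    post = np.exp(ll - ll.max())  # posterior over categories, uniform prior
    post /= post.sum()
    scores = {}
    for p, words in zip(post, category_words):
        for word in words:
            scores[word] = scores.get(word, 0.0) + float(p)
    # Highest total posterior first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:top_k]


print(annotate([-10.0, -12.0, -30.0],
               [["ocean", "lighthouse", "beach"], ["beach", "sand", "sky"], ["flower", "garden"]]))
# ['beach', 'lighthouse', 'ocean']
```

A word shared by several plausible categories ("beach" here) accumulates more posterior mass than a word tied to a single category.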

Sample talk | alipr.com | In the news: MIT Tech Review ...
Example annotations (images not shown): flower, holiday, garden | ocean, lighthouse, beach | medicine, seed, science
Image retrieval
Content-based image retrieval systems search for similar pictures using only pixel information. We have developed the SIMPLIcity retrieval system, which has been deployed at several real-world Web sites (e.g., airliners.net, mindat.org, terragalleria.com) and requested for educational purposes by dozens of universities. We continue to work on image retrieval to bring in new aspects such as aesthetics, semantic learning, and story picturing.
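Region-based systems such as SIMPLIcity compare images through their segmented regions. The sketch below is loosely modeled on the integrated region matching (IRM) idea. Each image is treated as a set of region feature vectors with significance weights summing to 1, and weight is assigned greedily to the closest region pairs first. The Euclidean region distance and the function names are assumptions for illustration, not SIMPLIcity's code.

```python
import numpy as np


def region_match_distance(feats_a, w_a, feats_b, w_b):
    """Greedy 'closest pairs first' region matching between two images (sketch).

    feats_*: (n_regions, d) region feature vectors; w_*: significance weights summing to 1.
    """
    fa = np.asarray(feats_a, dtype=float)
    fb = np.asarray(feats_b, dtype=float)
    ra = np.array(w_a, dtype=float)  # remaining significance, consumed as pairs are matched
    rb = np.array(w_b, dtype=float)
    dist = np.linalg.norm(fa[:, None, :] - fb[None, :, :], axis=2)  # pairwise Euclidean
    total = 0.0
    for flat in np.argsort(dist, axis=None, kind="stable"):
        i, j = np.unravel_index(flat, dist.shape)
        s = min(ra[i], rb[j])  # credit this pair with as much weight as both sides have left
        if s > 0:
            total += s * dist[i, j]
            ra[i] -= s
            rb[j] -= s
    return float(total)


def rank_database(query, database):
    """Return database image names ordered from most to least similar to the query."""
    scored = sorted((region_match_distance(*query, *img), name) for name, img in database.items())
    return [name for _, name in scored]


query = ([[0.0, 0.0]], [1.0])
db = {"half_match": ([[0.0, 0.0], [3.0, 4.0]], [0.5, 0.5]), "far": ([[10.0, 10.0]], [1.0])}
print(region_match_distance(*query, *db["half_match"]))  # 2.5
print(rank_database(query, db))  # ['half_match', 'far']
```

Matching closest pairs first lets a partial match between images still contribute, so one mis-segmented region does not dominate the overall distance.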

Sample talk | Demo | Slashdot news
Social networks
Statistical modeling and learning techniques are used to discover e-communities and to study academic collaboration networks, with applications to CiteSeer.

Comparative genomics
Data mining and statistical modeling methods are used to study the evolution and function of DNA segments, based on aligned DNA sequences from multiple species.

Sample talk
Source coding theory
We study the asymptotic behavior of vector quantizers at high bit rates when perceptually based distortion measures are used.
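For intuition in one dimension, assume the perceptual measure is approximated by an input-weighted squared error w(x)(x - x̂)². Bennett's high-resolution approximation then predicts D ≈ (1/(12N²)) ∫ w(x) f(x) / λ(x)² dx for an N-cell quantizer with source density f and normalized point density λ. The sketch checks this prediction for a uniform source and uniform cells on [0, 1) with the hypothetical weight w(x) = 1 + x, for which the prediction is 1.5/(12N²). The functions are illustrative, not code from the original work.

```python
import numpy as np


def uniform_quantize(x, n_cells):
    """Uniform scalar quantizer on [0, 1): map each sample to the midpoint of its cell."""
    idx = np.minimum((x * n_cells).astype(int), n_cells - 1)
    return (idx + 0.5) / n_cells


def weighted_distortion(x, x_hat, weight):
    """Empirical input-weighted squared error, a scalar stand-in for a perceptual measure."""
    return float(np.mean(weight(x) * (x - x_hat) ** 2))


def bennett_prediction(n_cells, weight, grid_size=100_000):
    """High-rate estimate (1 / (12 N^2)) * integral of w(x) f(x) / lambda(x)^2 dx,
    specialized to f = 1 and lambda = 1 on [0, 1) (midpoint-rule integral)."""
    xs = (np.arange(grid_size) + 0.5) / grid_size
    return float(np.mean(weight(xs))) / (12 * n_cells ** 2)


def weight(t):
    return 1.0 + t  # hypothetical perceptual weight w(x) = 1 + x


rng = np.random.default_rng(1)
x = rng.random(200_000)
for n in (8, 32, 128):
    print(n, weighted_distortion(x, uniform_quantize(x, n), weight), bennett_prediction(n, weight))
```

Each quadrupling of N cuts the distortion by a factor of 16, the familiar 6 dB per bit of high-rate theory. The weight enters only through the integral, which is why the optimal point density for such measures concentrates cells where w(x) f(x) is large.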

Sample talk


Software for the public (author: Jia Li)

© Jia Li, updated August 2005