Apply semi-supervised learning and/or domain adaptation techniques to the problem of sentiment analysis of opinionated text in natural language.
If a person writes a review of the iDroid smartphone from the Samokia brand saying that “it has a beautiful screen, great camera, and excellent sound quality”, this should be classified as a positive review. If another person writes a review saying that “this phone is terrible, the camera stopped working after two months”, this should be classified as a negative review, and so on.
The goal of sentiment analysis is to take the text of product reviews and classify each review as positive or negative. The goal of this project is to apply weakly supervised machine learning techniques to this problem, so that we can infer the sentiment of movie reviews by training a classifier on a corpus of cell phone reviews, or classify Portuguese reviews with a system trained on English reviews.
The training data will contain a set of reviews in some domain (e.g., movies) which have been manually labeled as positive or negative. At test time, we want to classify reviews in a different domain (e.g., cell phones). While some words are good predictors in both domains (e.g., “excellent” probably indicates a positive review, while “terrible” probably indicates a negative one), other words carry opposite polarities depending on the domain (e.g., “unpredictable” would typically have a positive polarity in the movie domain and a negative one in the cell phone domain). The system should be able to detect these domain biases automatically from the data.
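The domain shift described above can be illustrated with a minimal sketch, assuming a simple bag-of-words Naive Bayes model with equal class priors. The toy reviews, the function names (`word_polarity`, `classify`, `accuracy`), and the smoothing value are all hypothetical and chosen only for illustration; they are not part of the project data or of any prescribed method. The phone-domain polarities are computed from labeled phone reviews purely to expose the mismatch, whereas in the project setting such target-domain labels would not be available.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would need something better.
    return text.lower().split()


def word_polarity(labeled_reviews, smoothing=1.0):
    """Per-word log-odds log P(w|pos) - log P(w|neg) with add-one smoothing.

    Positive values lean towards positive reviews, negative values towards
    negative reviews.
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled_reviews:
        counts[label].update(tokenize(text))
    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {
        label: sum(c.values()) + smoothing * len(vocab)
        for label, c in counts.items()
    }
    return {
        w: math.log((counts["pos"][w] + smoothing) / totals["pos"])
        - math.log((counts["neg"][w] + smoothing) / totals["neg"])
        for w in vocab
    }


def classify(text, polarity):
    # Sum the log-odds of known words; words unseen in training contribute 0.
    score = sum(polarity.get(w, 0.0) for w in tokenize(text))
    return "pos" if score >= 0 else "neg"


def accuracy(labeled_reviews, polarity):
    correct = sum(classify(t, polarity) == y for t, y in labeled_reviews)
    return correct / len(labeled_reviews)


# Hypothetical toy data: labeled source domain (movies).
movies = [
    ("an excellent and unpredictable plot", "pos"),
    ("unpredictable twists and excellent acting", "pos"),
    ("a terrible and boring script", "neg"),
    ("boring pacing and terrible dialogue", "neg"),
]

# Hypothetical toy data: target domain (cell phones). Labels are used here
# only for evaluation and to show the polarity flip.
phones = [
    ("excellent screen and great camera", "pos"),
    ("great sound and excellent battery", "pos"),
    ("unpredictable battery and random crashes", "neg"),
    ("terrible reception and dropped calls", "neg"),
]

movie_polarity = word_polarity(movies)
phone_polarity = word_polarity(phones)  # oracle view, for illustration only

for word in ("excellent", "terrible", "unpredictable"):
    print(word, round(movie_polarity[word], 2), round(phone_polarity[word], 2))

print("movie model on phone reviews:", accuracy(phones, movie_polarity))
```

In this toy setting, “excellent” and “terrible” keep the same sign in both domains, while “unpredictable” flips, so the movie-trained model misclassifies the phone review that relies on it. A domain adaptation method would aim to recover this flip without access to the phone labels.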
There are no mandatory prerequisites. Some programming experience (in languages such as C/C++, Java, Python, or Matlab) is preferred. Having completed a Machine Learning course at IST is also preferred, but not required.
At the end of the project, the student should have created a system able to classify a review as either positive or negative, robust to domain changes, using weakly supervised learning techniques.
Bo Pang, Lillian Lee, Shivakumar Vaithyanathan. “Thumbs up? Sentiment Classification using Machine Learning Techniques.” Proceedings of EMNLP, 2002.
John Blitzer, Mark Dredze, Fernando Pereira. “Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification.” Proceedings of ACL, 2007.