We take inspiration from recent research on sentiment analysis that interprets text based on the subjective attitude of the author. We consider related tasks in which a piece of text is interpreted to predict some extrinsic, real-valued outcome of interest that can be observed in non-text data. For example:
- The interpretation of an annual financial report from a company to its shareholders is the risk incurred by investing in the company in the coming year.
- The interpretation of a critic’s review of a film is the film’s box office success.
- The interpretation of a political blog post is the response it garners from readers.
- The interpretation of a day’s microblog feeds is the public’s opinion about a particular issue.
In all of these cases, one aspect of the text’s meaning is observable from objective real-world data (respectively: return volatility, gross revenue, user comments, and traditional polls), although perhaps not immediately at the time the text is published. We propose a generic approach to text-driven forecasting that is expected to benefit from linguistic analysis while remaining neutral to different theories of language. A highly attractive property of this line of research is that evaluation is objective, inexpensive, and theory-neutral. This approach also introduces some methodological challenges.
We conjecture that forecasting tasks, when considered in concert, will be a driving force in domain-specific, empirical, and extrinsically useful natural language analysis. Further, this research direction will push NLP to consider the language of a more diverse subset of the population, and may support inquiry in the social sciences about foreknowledge and communication in societies.
This talk includes joint work with Ramnath Balasubramanyan, William Cohen, Dipanjan Das, Kevin Gimpel, Mahesh Joshi, Shimon Kogan, Dimitry Levin, Brendan O’Connor, Bryan Routledge, Jacob Sagi, and Tae Yano.
Noah Smith is an assistant professor in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science, as a Hertz Foundation Fellow, from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, especially unsupervised methods, machine learning for structured data, and applications of natural language processing. He serves on the editorial board of the journal Computational Linguistics and received a best paper award at the ACL 2009 conference. His ten-person group, Noah’s ARK, is supported by the NSF, DARPA, Qatar NRF, Portugal FCT, and gifts from Google, HP Labs, and IBM Research.