
May 31st - João Graça

João Graça (L2F @INESC-ID)

Rich Prior Knowledge in Learning for Natural Language Processing

Abstract:

We possess a wealth of prior knowledge about most prediction problems, and particularly so for many of the fundamental tasks in natural language processing. Unfortunately, it is often difficult to make use of this type of information during learning, as it typically does not come in the form of labeled examples, may be difficult to encode as a prior on parameters in a Bayesian setting, and may be impossible to incorporate into a tractable model. Instead, we usually have prior knowledge about the values of output variables. For example, linguistic knowledge or an out-of-domain parser may provide the locations of likely syntactic dependencies for grammar induction. Motivated by the prospect of being able to naturally leverage such knowledge, four different groups have recently developed similar, general frameworks for expressing and learning with side information about output variables.

These frameworks are Constraint-Driven Learning (UIUC), Posterior Regularization (UPenn), Generalized Expectation Criteria (UMass Amherst), and Learning from Measurements (UC Berkeley).

This tutorial describes how to encode side information about output variables, and how to leverage this encoding and an unannotated corpus during learning. We survey the different frameworks, explaining how they are connected and the trade-offs between them. We also survey several applications that have been explored in the literature, including applications to grammar and part-of-speech induction, word alignment, information extraction, text classification, and multi-view learning. Prior knowledge used in these applications ranges from structural information that cannot be efficiently encoded in the model, to knowledge about the approximate expectations of some features, to knowledge of some incomplete and noisy labellings. These applications also address several different problem settings, including unsupervised, lightly supervised, and semi-supervised learning, and utilize both generative and discriminative models. The diversity of tasks, types of prior knowledge, and problem settings explored demonstrates the generality of these approaches, and suggests that they will become an important tool for researchers in natural language processing.
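As a rough illustration of what "knowledge about the approximate expectations of some features" can look like in practice, the sketch below projects a model distribution p over output labels onto the set of distributions q whose expected feature value is at most a bound b, choosing the q closest to p in KL divergence. This is in the spirit of Posterior Regularization, but it is only a minimal sketch under stated assumptions: the function name, the single-constraint setting, and the bisection search are illustrative choices, not the toolkit or the algorithms presented in the tutorial.

```python
import math


def project_onto_constraint(p, f, b, iters=100):
    """Return q minimizing KL(q || p) subject to E_q[f] <= b.

    p: list of probabilities over outputs (assumed to sum to 1).
    f: list of feature values, one per output.
    b: upper bound on the expected feature value.
    The minimizer has the form q(y) proportional to p(y) * exp(-lam * f(y))
    with lam >= 0; lam is found here by bisection (a hypothetical choice).
    """
    f_min = min(fi for pi, fi in zip(p, f) if pi > 0)
    if b <= f_min:
        raise ValueError("b must exceed the smallest feature value on the support of p")

    def tilt(lam):
        # Shifting by f_min leaves q unchanged but keeps the normalizer positive.
        w = [pi * math.exp(-lam * (fi - f_min)) if pi > 0 else 0.0
             for pi, fi in zip(p, f)]
        z = sum(w)
        return [wi / z for wi in w]

    def expect(q):
        return sum(qi * fi for qi, fi in zip(q, f))

    if expect(p) <= b:
        return list(p)  # constraint already satisfied, nothing to change

    lo, hi = 0.0, 1.0
    while expect(tilt(hi)) > b:  # grow the bracket until the constraint holds
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):  # bisection on the multiplier lam
        mid = 0.5 * (lo + hi)
        if expect(tilt(mid)) > b:
            lo = mid
        else:
            hi = mid
    return tilt(hi)


# Model believes label 0 with probability 0.5; side information says its
# expected frequency should be at most 0.2.
q = project_onto_constraint([0.5, 0.3, 0.2], [1.0, 0.0, 0.0], 0.2)
```

In this toy case the projected distribution is approximately [0.2, 0.48, 0.32]: mass is removed from the constrained label and the remaining labels keep their relative proportions.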

The tutorial will provide the audience with the theoretical background to understand why these methods have been so effective, as well as practical guidance on how to apply them. Specifically, we discuss issues that come up in implementation, and describe a toolkit that provides "out-of-the-box" support for the applications described in the tutorial, and is extensible to other applications and new types of prior knowledge.

--

Bio: João Graça is a postdoctoral researcher at L2F, INESC-ID. He obtained his PhD in Computer Science Engineering at Instituto Superior Técnico, Technical University of Lisbon, where he was advised jointly by Luisa Coheur, Fernando Pereira and Ben Taskar. His main research interests are Machine Learning and Natural Language Processing. His current research focuses on unsupervised learning with high-level supervision in the form of domain-specific prior knowledge, and on the utility of unsupervised methods for real-world applications.

 

 
March 9th - Noah Smith
March 23rd - Nuno Brás
March 30th - Shadab Khan
April 13th - David Batista
April 29th - Ruben Martinez-Cantin
May 14th - Xavier Anguera Miro
May 25th - Francisco Melo
June 8th - Matthijs Spaan
June 22nd - João Graça
July 2nd - Ricardo Vigário
November 2nd - Andras Hartmann
November 16th - Rui Guerreiro
November 30th - Gopala Anumanchipalli
December 14th - Mário Figueiredo
January 18th - Ivan Selesnick
February 2nd - Mariana Almeida
February 14th - Sara Silva
March 1st - Artur Ferreira
March 15th - Jorge Marques
March 29th - André Lourenço
April 4th - Kalyanmoy Deb
May 3rd - André Martins
May 17th - José Santos
May 31st - João Graça

Instituto Superior Técnico

