A New Approach to Cross-Modal Multimedia Retrieval

22 May 2012

José Costa Pereira, University of California, San Diego, USA

The problem of cross-modal retrieval from multimedia repositories is considered. This problem addresses the design of retrieval systems that support queries across content modalities, e.g. using text to search for images. A mathematical formulation is proposed, equating the design of cross-modal retrieval systems to that of isomorphic feature spaces for different content modalities. Two hypotheses are investigated, regarding the fundamental attributes of these spaces. The first is that low-level cross-modal correlations should be accounted for. The second is that the space should enable semantic abstraction. Three new solutions to the cross-modal retrieval problem are then derived from these hypotheses: correlation matching (CM), which models cross-modal correlations, semantic matching (SM), which relies on semantic representation, and semantic correlation matching (SCM), which combines both.
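The correlation-matching idea can be illustrated with a minimal canonical correlation analysis (CCA) in NumPy: project text and image features into a shared space where their correlation is maximal, then rank database items by cosine similarity to the projected query. This is only a sketch of the general technique; the function names and the toy retrieval routine are illustrative, not the talk's actual implementation.

```python
import numpy as np

def fit_cca(X, Y, d):
    """Minimal CCA: find d projection directions that maximally correlate
    two paired feature sets X (n x p) and Y (n x q), e.g. text and image
    features. Returns projection matrices Wx (p x d) and Wy (q x d)."""
    Xc = X - X.mean(0)
    Yc = Y - Y.mean(0)
    eps = 1e-6  # small ridge term for numerically stable inversion
    Cxx = Xc.T @ Xc / len(X) + eps * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / len(Y) + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / len(X)
    # whiten each modality, then take the SVD of the cross-covariance
    Kx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Ky = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky.T)
    return Kx.T @ U[:, :d], Ky.T @ Vt.T[:, :d]

def retrieve(query, database, Wq, Wd):
    """Rank (centered) database items of one modality by cosine similarity
    to a (centered) query from the other modality, in the shared space."""
    q = query @ Wq
    D = database @ Wd
    q = q / np.linalg.norm(q)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    return np.argsort(-(D @ q))  # indices, best match first
```

Under this reading, SM would replace the learned projections with classifier posteriors over a shared vocabulary of semantic classes, and SCM would apply such classifiers on top of the CCA subspace.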

In the second part of this talk, the problem of image retrieval under the query-by-example paradigm is considered.

Recent research efforts in semantic representations and context modeling are based on the principle of task expansion: that vision problems such as object recognition, scene classification, or retrieval (RCR) cannot be solved in isolation. The extended principle of modality expansion, that RCR problems cannot be solved from visual information alone, is investigated by augmenting a semantic image labeling system with text.

Pairs of images and text are mapped to a semantic space, and the text features used to regularize their image counterparts. This is done with a new cross-modal regularizer, which learns the mapping of the image features that maximizes their average similarity to those derived from text. The proposed regularizer is class-sensitive, combining a set of class-specific denoising transformations and nearest neighbor interpolation of text-based class assignments.
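One way to read the class-sensitive regularizer is as a bank of per-class linear denoising maps from image semantics toward text semantics, blended by soft class assignments. The NumPy sketch below uses plain ridge regression for each class-specific transform and a simple weighted sum in place of the nearest-neighbor interpolation described above; all names and choices are illustrative assumptions, not the talk's formulation.

```python
import numpy as np

def fit_class_maps(img_sem, txt_sem, labels, lam=1e-3):
    """Learn one linear 'denoising' transform per class that maps image
    semantic vectors toward their paired text counterparts, via ridge
    regression (lam is the regularization strength)."""
    maps = {}
    for c in np.unique(labels):
        I = img_sem[labels == c]   # n_c x d image semantic vectors
        T = txt_sem[labels == c]   # n_c x d paired text semantic vectors
        d = I.shape[1]
        maps[c] = np.linalg.solve(I.T @ I + lam * np.eye(d), I.T @ T)
    return maps

def regularize(img_vec, maps, class_probs):
    """Apply the class-specific transforms to one image semantic vector,
    weighted by soft class assignments (a dict class -> probability),
    standing in for the interpolation of text-based class assignments."""
    out = np.zeros_like(img_vec)
    for c, W in maps.items():
        out += class_probs[c] * (img_vec @ W)
    return out
```

With a one-hot class assignment this reduces to applying a single class-specific map; softer assignments interpolate between the maps of the most likely classes.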



José Costa Pereira received a Licenciatura in Computer Science and Engineering from Faculdade de Engenharia (2000) and an M.S. in Computational Methods in Science and Engineering from a joint program between Faculdade de Ciencias and Faculdade de Engenharia (2003), both from the University of Porto, Portugal. He worked for Vodafone Portugal in the Data Networks group from 2000 to 2005, then joined the IP Division at Alcatel-Lucent in September 2005, shortly before entering graduate school. Since 2008, he has been a Ph.D. student in the Statistical and Visual Computing Lab (SVCL) of the Electrical and Computer Engineering Department at the University of California, San Diego, USA. His current research focuses on image retrieval and classification, exploring contextual relations for image annotation. Previously, as part of his M.S. thesis, he developed an Independent Component Analysis (ICA) based method for Blind Source Separation (BSS).