Modeling Subcellular Location from Images and Other Sources of Information

12 June 2012

Luís Pedro Coelho IMM

Subcellular location is an important property of proteins and is carefully regulated by the cell machinery. To determine subcellular location on a proteome-wide scale, fluorescence microscopy images are most commonly used, and a classification system is employed to analyse them. These systems assign each protein to one of a small set of predefined location classes (typically the major organelles).

In the past, classification performance was too often evaluated on datasets in which multiple images of the same protein served as the representatives of a class. I will argue that the resulting performance estimates are overly optimistic and that such systems generalise poorly.
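
As an illustration of the evaluation issue, the following is a minimal sketch of a split in which all images of one protein are held out together, so that no protein contributes images to both training and testing. It is an assumption about how such an evaluation could be organised, not necessarily the protocol presented in the talk; the function name and the toy protein labels are hypothetical. It also assumes each location class is represented by more than one protein, since otherwise a class would vanish from training when its only protein is held out.

```python
from collections import defaultdict


def leave_one_protein_out(protein_ids):
    """Yield (held_out_protein, train_indices, test_indices).

    protein_ids has one entry per image. All images of the held-out
    protein go to the test set; every other image goes to training.
    """
    by_protein = defaultdict(list)
    for index, protein in enumerate(protein_ids):
        by_protein[protein].append(index)
    for protein in sorted(by_protein):
        test = by_protein[protein]
        train = [i for i, p in enumerate(protein_ids) if p != protein]
        yield protein, train, test


# Hypothetical toy data: the protein shown in each of five images.
proteins = ["tubulin", "tubulin", "actin", "actin", "lamin"]
splits = list(leave_one_protein_out(proteins))
```

A random split over images, by contrast, would usually place images of the same protein on both sides, which is the situation the paragraph above warns about.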

In the second part of my talk, I will discuss how classification implies a limited representation of the underlying biology, as proteins are often present in multiple organelles. I will present techniques that go beyond single-location assignment to fractional assignment. These techniques were applied to a large collection of images of fluorescently tagged mouse proteins, which included several proteins for which no location assignment had previously been reported in the literature.
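
To make the idea of fractional assignment concrete, here is a toy sketch for the two-location case. It assumes, purely for illustration, that the feature vector of a mixed pattern is a linear combination of the feature vectors of two pure patterns, and it estimates the fraction by least squares. This is not claimed to be the technique presented in the talk; the function name and the numbers are hypothetical.

```python
def mixture_fraction(x, pure_a, pure_b):
    """Estimate alpha in [0, 1] such that x ~ alpha * pure_a + (1 - alpha) * pure_b.

    This is the least-squares projection of x onto the segment between
    the two pure patterns, clipped to the valid range for a fraction.
    """
    diff = [a - b for a, b in zip(pure_a, pure_b)]
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0:
        raise ValueError("the two pure patterns must differ")
    alpha = sum((xi - b) * d for xi, b, d in zip(x, pure_b, diff)) / norm_sq
    return min(1.0, max(0.0, alpha))


# Hypothetical feature vectors for two pure location patterns.
pure_a = [4.0, 0.0]
pure_b = [0.0, 2.0]
# A pattern constructed as 25% of pattern A and 75% of pattern B.
mixed = [1.0, 1.5]
fraction_a = mixture_fraction(mixed, pure_a, pure_b)
```

With more than two locations, the same idea becomes a constrained least-squares problem over non-negative fractions that sum to one.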

This work was performed at Carnegie Mellon University with Prof. Robert F. Murphy and Dr. Tao Peng.



Luís Pedro Coelho is a PhD candidate in computational biology at Carnegie Mellon University (CMU). His work focuses on the automatic understanding of large collections of bioimage data. Previously, at CMU, he was involved in the Structure Literature Image Finder (SLIF) project, which mined the academic literature using both the text of the papers and the images therein. This project was one of the finalists in the Elsevier Grand Challenge (4 out of 70 teams were present at the final). Before coming to CMU, Luís earned a BS and an MS from Instituto Superior Técnico, in Lisbon. His MS research was on learning from noisy data with Bayesian networks.