A Web Tool for Building Parallel Corpora of Spoken and Sign Languages

28 June 2016

Fabio Kepler, L2F / University of Pampa, Brazil

Sign languages are the main means of communication within the Deaf community and between it and the hearing population. There are about 70 million deaf people and over 200 distinct sign languages in the world. Unfortunately, not all deaf people know a sign language, and many cannot read or write in a spoken language. Moreover, when deafness is prelingual, a sign language becomes the child’s native language, and a spoken language is hard to learn as a second language. This affects these children's learning at school, where there is usually no dedicated material in sign language. In this talk we will describe our work on building an online tool for manually annotating spoken-language texts with sign language, using the SignWriting system. Such a tool will allow the creation of parallel corpora between spoken and sign languages, which can then be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus of English and American Sign Language could be used to train machine learning models for automatic translation between the two languages. By building a collaborative, online, easy-to-use annotation tool, we aim to help develop proper resources for sign languages that can be used with the state-of-the-art models currently used in tools for spoken languages. There are several issues and difficulties in creating this kind of resource, and we will discuss the main ones, as well as alternatives for building better resources.



Fabio Kepler is a professor (roughly equivalent to assistant professor) in Brazil, currently on a year-long sabbatical/postdoctorate at L2F/INESC-ID. He is interested in NLP problems in general, such as POS tagging, parsing, sentiment analysis, and machine translation, and has more recently become interested in a specific subset of natural languages, namely sign languages. He holds a PhD from the University of São Paulo, Brazil.