Within the text-as-data paradigm, new types of analysis and metrics are now available to political science research, enabling new and different scientific approaches. Political science has benefited particularly from this paradigm. National parliaments have traditionally produced comprehensive data on the whole legislative process, including a faithful transcription of all parliamentary debates. These transcriptions constitute a rich database, as they give us access to one of the most relevant components of political dynamics: the (public) parliamentary debate. NLP techniques such as text classification, named entity recognition, and keyword extraction make possible statistical studies that were previously infeasible due to the size of the data. We will present a few studies that used the collection of speeches given in the plenary sessions of the Portuguese Parliament since 1999 to answer simple questions posed by political scientists.
The press is another rich source of data. Now that virtually all media outlets are present online, most media data, in particular news articles, can be collected and further processed. In line with our work in political science, we have focused our studies on the political commentary present in every newspaper. The political commentator usually writes about current national and international political affairs and is known to play a dual role in the dynamics of public opinion: as a key influencer and as a reflection of daily public opinion. We have collected more than 80,000 articles, written by approximately 3,500 distinct authors and published by the most relevant Portuguese newspapers between 2008 and 2016. This is a unique corpus that can be used not only for academic purposes but also for public scrutiny of past public opinion, of which it is perhaps the only preserved record. We have built a web application, named “Arquivo de Opinião”, that provides user-friendly access to the collected data.