Monday, August 31, 2015

G. WATERS / GETTY IMAGES – Journal du CNRS

A specialist in language processing, Xavier Tannier has designed a tool that analyzes, from articles and tweets, the opinions of different political currents and their members. He explains how it works.

You have just received a Google Award for the project "Event thread extraction for viewpoint analysis", an application that can detect, analyze and graphically represent the views of the various political parties and their members on specific social issues. Can you explain what language processing is?
Xavier Tannier:
It is a discipline of artificial intelligence, halfway between computer science and linguistics, which aims to analyze human language, in this case text. Examples of applications are machine translation, spell checking, automatic summarization, information extraction and text mining. My laboratory, the Limsi, works on almost all of these areas, as well as on the processing of spoken language. My own work is more oriented towards information extraction and text mining. I focus on analyzing large quantities of textual documents (typically several million) to extract the information relevant to a given situation. We speak of artificial intelligence as soon as we try to make a machine perform a task that previously required human intelligence and that is not pure calculation. In natural language processing (NLP), we try to simulate the language skills of a human, whether in production, translation or understanding. That is the case with this tool, which organizes texts to better decipher opinions and power relations in the political world.

How did the idea for the project "Event thread extraction for viewpoint analysis" come about?
X. T.:
It is a collaboration with Ioana Manolescu, of INRIA Saclay, and with Les Décodeurs, the team at the newspaper Le Monde led by Samuel Laurent, which is dedicated to fact checking. This practice, now widespread in newspapers, consists of checking, sometimes in real time, the accuracy of factual statements made by a politician, for example. These may be figures (a drop in unemployment, or spending on a budget) whose accuracy is verified, or statements that may contradict earlier ones.

The team is also working on data visualization, primarily for the newspaper's website. Language processing is particularly well suited to the analysis of press articles. I work, for example, on the notion of an event, and I try to build, from a large body of texts, a chronology of the important events that occurred on a specific theme. For example, if the user wants the list of important events in the Arab Spring, in Iraq, or concerning a particular person, the system will automatically search the newspaper, try to determine what matters most on these questions, and provide the user with a chronology (ANR Chronolines).

What is the improvement over a simple search engine?
X. T.: There is a ranking of the events cited by importance, supported by a large volume of data and articles. To take the example of a theme such as "secularism", we will have a very large database, with ten or fifteen significant events, in an area where temporal information is of great importance. We will assign significance coefficients that take into account aspects related to rumors or political trends. There are two main criteria, each then divided into sub-criteria: redundancy, that is to say the number of articles repeating an event, and temporal analysis, which considers that if an event is still being talked about long afterwards, it is an important one. Journalistic writing is highly valued because content is abundant and because it is not a free-for-all as on blogs: there will be no spelling mistakes, sentences are well constructed, and so on. That is one less problem to solve. We work across the whole Web, but news sites are mostly preferred, so that we handle complex data with fewer disparate elements.
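As an illustration only, the two criteria mentioned above (redundancy and temporal persistence) could be combined into a score roughly as follows. This is a minimal sketch, not the Chronolines method: the `Mention` record, the log-based weighting and the `w_redundancy` / `w_persistence` parameters are all assumptions introduced here, and event detection itself is taken as given.

```python
from dataclasses import dataclass
from datetime import date
from math import log1p


@dataclass
class Mention:
    event: str          # hypothetical event label from an upstream extractor
    event_date: date    # day the event took place
    article_date: date  # publication date of the article citing it


def importance(mentions, w_redundancy=1.0, w_persistence=1.0):
    """Score each event; the weights and log scaling are illustrative assumptions."""
    by_event = {}
    for m in mentions:
        by_event.setdefault(m.event, []).append(m)

    scores = {}
    for event, ms in by_event.items():
        # Redundancy: number of articles repeating the event
        redundancy = len(ms)
        # Persistence: longest delay (in days) at which the event is still cited
        persistence = max(max((m.article_date - m.event_date).days, 0) for m in ms)
        scores[event] = (w_redundancy * log1p(redundancy)
                         + w_persistence * log1p(persistence))
    return scores


def chronology(mentions, top_k=10):
    """Keep the top_k highest-scoring events and return them in date order."""
    scores = importance(mentions)
    dates = {m.event: m.event_date for m in mentions}
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return sorted(top, key=dates.get)
```

With such a score, an event cited by several articles, some of them a year later, ranks above one mentioned once on the day it happened; `chronology` then orders the retained events by date.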

What kind of information do you treat?
X. T.: We collect as much data as possible to study these phenomena, from press articles, politicians' statements, their websites and their Twitter accounts. We will focus on politics because that is what interests the people at Le Monde the most: for example, in the run-up to the regional elections, the right-wing primaries or, later, the presidential election. The goal is to collect political statements, either on a specific event or on a given topic, and spread them across the political spectrum (extreme left, left, center, right, extreme right), with a visualization that makes it possible to decode fairly quickly the vocabulary and the opinion carried by each party. We will thus be able to pick out verbal tics or posturing without real political content. There are many political parties in France, and the question is whether the range of views is distributed by party or rather by individuals who stray a little from their party. We will focus on elements of language, or on the comments and reactions of the parties, which are quite eloquent on sensitive events, and thus identify those who deviate from the discourse imposed for reasons of political strategy. Often, a party displays a mainstream line, and then two or three personalities use completely different language within that same party. On the subject of secularism, for example, this can be measured, and it is even more visible on topics such as same-sex marriage ("marriage for all"). This is what we have called "dissonant expressions".
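To make the idea of "dissonant expressions" concrete, here is a minimal sketch that compares each speaker's vocabulary with that of the rest of their party and flags those whose wording diverges. It is not the project's method (which the interview does not describe): the bag-of-words representation, the cosine similarity, the whitespace tokenizer and the `threshold` value are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def dissonant_speakers(statements, threshold=0.5):
    """statements: iterable of (party, speaker, text) tuples.

    Returns (party, speaker, similarity) for speakers whose vocabulary is
    less similar than `threshold` to the rest of their own party.
    """
    by_speaker = {}
    for party, speaker, text in statements:
        by_speaker.setdefault((party, speaker), Counter()).update(bag_of_words(text))

    flagged = []
    for (party, speaker), vocab in by_speaker.items():
        # Party profile built from the *other* members only
        rest = Counter()
        for (p, s), v in by_speaker.items():
            if p == party and s != speaker:
                rest.update(v)
        if not rest:
            continue  # single-member party: nothing to compare against
        sim = cosine(vocab, rest)
        if sim < threshold:
            flagged.append((party, speaker, sim))
    return sorted(flagged, key=lambda t: t[2])
```

A real system would need proper tokenization, stop-word handling and far more text per speaker; the sketch only shows the comparison of one member against the party's mainstream vocabulary.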

But isn't your starting material already formatted and very homogeneous?
X. T.: Yes, even if some tweets stray a little; but we do not take all tweets into account, only those linked to politicians' accounts. The vocabulary is a bit rowdier, but it remains relatively controlled. One could imagine that this type of tool contributes to the standardization of thought, but we already see that the fact checking practiced in recent years prevents politicians from passing on erroneous figures. They are then forced to lie more subtly. The purpose of this application is to help journalists and citizens understand how and why politicians do not say what they think, but repeat what has been formatted by each party. The goal is not predictive: it is only to decode political discourse and help visualize it. Typically, this project will make it possible to see, for example on a European issue, how parties that appear to be total opposites actually adopt the same line.

How far along is the project, and will it be marketed?
X. T.: The Google Award, contrary to what the name suggests, is not a reward for completed work, but one year of financial support for work in progress (with INRIA and Le Monde). We are still in the fundraising phase, which is starting rather well, since the French National Research Agency (ANR) has just agreed to fund this program (ANR ContentCheck). The work we have carried out so far consists of prerequisites: our work on "chronolines", the journalistic timelines we have been developing for several months on AFP dispatches, is now complete. But the winning project as such does not yet exist. I think the application will be free; it is not intended to be marketed immediately, because for the moment it is research work. Google has not interfered in our work, but rather gathers ideas. Symbolically, the award is very important to us, but it is significantly less so than the ANR in financial terms. It will just about allow us to fund an engineer for one year.

Sunday, August 30, 2015

[Festival of Huma] The 4 values of free software match .. – Numerama