Why
The main objective of EUMSSI is to develop technologies for identifying and aggregating data presented as unstructured information in sources of very different natures (video, image, audio, speech, text and social context), covering both online media (e.g. YouTube) and traditional media (e.g. audiovisual repositories), and for dealing with information at very different levels of granularity.
What
The resulting platform will potentially be useful for any application that needs cross-media data analysis and interpretation, such as intelligent content management, personalized recommendation, real-time event tracking and content filtering.
How
The multimodal analytics will help organize, classify and cluster cross-media streams by enriching their associated metadata. A core idea is that content from different media sources is integrated interactively, so that the data extracted from one medium help reinforce the aggregation of information from the others, within a cross-modal, interoperable semantic representation framework. This will be achieved by integrating state-of-the-art information extraction and analysis techniques from the fields involved into a single multimodal platform. Interoperability, interactive reinforcement of the data aggregation, and a high-level semantic, conceptual and event-based representation distinguish this proposal from others that incorporate multimodal search.
Main Concept
Nowadays, a journalist has access to a vast amount of data from many types of sources to document a story. Some sources provide structured data streams (e.g. AFP, Reuters), while others provide unstructured, heterogeneous information (e.g. the Social Web). One task of a multimedia journalist is to monitor, gather, curate and contextualise the information relevant to the target audience. This means going through an enormous number of records at very diverse levels of granularity in order to put information into context and tell the story from all significant angles, while at the same time reducing the noise of irrelevant content. This is extremely time-consuming, especially when a topic or event is interconnected with multiple entities from different domains.
At a different level, many TV viewers are getting used to navigating on their tablets or iPads while watching TV, the tablet effectively functioning as a second screen that provides background information on the program or supports interaction in social networks about what is being watched.
Both the journalist and the TV viewer would greatly benefit from a system capable of automatically analysing and interpreting an unstructured multimedia data stream together with its social background and, with this understanding, able to do such things as contextualise the data, make further suggestions, contribute new, related information, and filter unwanted content.
The EUMSSI consortium has been created with the main objective of developing methodologies and technologies for identifying and aggregating data presented as unstructured information in sources of very different natures (video, image, audio, speech, text and social context), covering both online media (e.g. YouTube) and traditional media (e.g. audiovisual repositories), and for dealing with information at very different levels of granularity.
A core idea is that content from different media sources is integrated interactively, so that the data extracted from one medium help reinforce the aggregation of information from the others, within a cross-modal, interoperable semantic representation framework. Once all the available descriptive information has been collected, an interpretation component will dynamically reason over the semantic representation to derive hidden, or implicit, knowledge, following an event-centered structure, so as to answer questions such as: What are the themes or topics in the data stream? What type of situation is being represented? What kind of participants does it involve? When and where is it happening? What attitudes and sentiments are being expressed?
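As an illustrative sketch only (the field names and types below are our assumptions, not a specification from the project), such an event-centered representation could be modelled as a record gathering what each modality contributes about a single event:

```python
from dataclasses import dataclass, field

@dataclass
class Participant:
    name: str   # e.g. a recognized person or organization
    role: str   # e.g. "speaker", "subject"

@dataclass
class Event:
    """Hypothetical event-centered record aggregating cross-media metadata."""
    topics: list[str]                 # themes detected in the data stream
    situation_type: str               # e.g. "press conference", "protest"
    participants: list[Participant]   # who is involved, and in what role
    time: str | None = None           # when the event happens, if known
    location: str | None = None       # where it happens, if known
    sentiment: float = 0.0            # aggregate attitude, from -1.0 to 1.0
    sources: dict[str, list[str]] = field(default_factory=dict)
                                      # modality -> supporting evidence

# Hypothetical example: an event assembled from speech, video and social data.
event = Event(
    topics=["climate policy"],
    situation_type="press conference",
    participants=[Participant("Jane Doe", "speaker")],
    time="2014-03-12",
    location="Brussels",
    sentiment=0.2,
    sources={"speech": ["ASR transcript segment 42"], "video": ["face match #7"]},
)
```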
This will be achieved by integrating state-of-the-art information extraction and analysis techniques from the different fields involved (image, audio and text analysis) into a multimodal platform, together with the information generated by exploring the social context. In fact, the mechanisms of social intelligence will be a core asset for interacting with the other modalities and reinforcing the aggregation and interpretation of the data, as sketched below.
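To make the reinforcement idea concrete, here is a minimal sketch, assuming per-modality analysers that emit candidate annotations with confidence scores; the `reinforce` function and its noisy-OR fusion rule are our illustrative choices, not a method prescribed by the project:

```python
def reinforce(candidates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Fuse per-modality confidences; independent agreement raises the score.

    candidates maps modality -> {annotation: confidence in [0, 1]}.
    Noisy-OR combination: an annotation survives if at least one modality
    supports it, and support from several modalities compounds.
    """
    fused: dict[str, float] = {}
    for scores in candidates.values():
        for annotation, confidence in scores.items():
            miss = 1.0 - fused.get(annotation, 0.0)
            fused[annotation] = 1.0 - miss * (1.0 - confidence)
    return fused

# Hypothetical example: "Jane Doe" is only weakly supported by each single
# modality, but cross-modal agreement makes it the top annotation.
fused = reinforce({
    "speech": {"Jane Doe": 0.6, "John Roe": 0.5},
    "video":  {"Jane Doe": 0.7},
    "social": {"Jane Doe": 0.5, "Brussels": 0.8},
})
print({annotation: round(score, 2) for annotation, score in fused.items()})
# {'Jane Doe': 0.94, 'John Roe': 0.5, 'Brussels': 0.8}
```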
Our Motivation