1. Image: The Yalta Conference

Scenario

Let us imagine that an employee of an encyclopedia company wants to create a multimedia presentation of the Yalta Conference. For that purpose, the employee uses an MPEG-7 compliant authoring tool that detects and labels relevant multimedia objects automatically. On the web, the employee finds three different face recognition web services, which give very good results for detecting Winston Churchill, Franklin D. Roosevelt and Josef Stalin respectively. With these tools, the employee would like first to run the face recognition web services on images and then to import the extraction results into the authoring tool, in order to automatically generate links from the detected face regions to detailed textual information about Churchill, Roosevelt and Stalin. Figure 1 is an example of such an image; the bounding boxes are generated by the face recognition web services and linked to textual data by the authoring tool.

MPEG-7 annotation example of the Yalta Conference (Image adapted from Wikipedia)

Using the COMM ontology

The interoperability problem the employee faces, namely importing the results of three independent web services into a single authoring tool, can be solved by using the COMM ontology to represent the metadata of all relevant multimedia objects and of the presentation itself throughout the whole creation workflow. The employee is shielded from the details of the multimedia ontology because it is embedded in the authoring tools and feature analysis web services.

Applying the Winston Churchill face recognizer results in an annotation RDF graph that is depicted in Figure 2 (visualized as a UML object diagram using the usual instance:Concept notation). The decomposition of Figure 1-A, whose content is represented by id0, into one still region (the bounding box of Churchill's face) is represented by the large middle part of the UML diagram. The segment is represented by the ImageData instance id1, which plays the StillRegionRole srr1. It is located by the DigitalData instance dd1, which expresses the RegionLocatorDescriptor rld1 (lower part of the diagram). Through the semantic annotation pattern, the face recognizer can annotate the still region by connecting it with the instance Churchill of a domain ontology that contains historic Persons (upper part of Figure 2).
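As a rough illustration, the graph described above can be written down as a set of subject-predicate-object triples. This is a minimal sketch in plain Python, not the COMM serialization: the instance and concept names (id0, id1, srr1, dd1, rld1, Churchill) come from the text, while every predicate name with the `ex:` prefix is a hypothetical placeholder chosen for readability.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def churchill_annotation() -> Set[Triple]:
    """Triples for the Churchill still region (predicates are placeholders)."""
    return {
        # Typing of the instances named in the text (instance:Concept).
        ("id0", "rdf:type", "ImageData"),
        ("id1", "rdf:type", "ImageData"),
        ("srr1", "rdf:type", "StillRegionRole"),
        ("dd1", "rdf:type", "DigitalData"),
        ("rld1", "rdf:type", "RegionLocatorDescriptor"),
        # Decomposition: the image content id0 has one still region, id1.
        ("id0", "ex:decomposedInto", "id1"),
        ("id1", "ex:plays", "srr1"),
        # Localization: dd1 locates the region and expresses the descriptor.
        ("dd1", "ex:locates", "id1"),
        ("dd1", "ex:expresses", "rld1"),
        # Semantic annotation: link the region to a domain ontology instance.
        ("id1", "ex:annotatedWith", "Churchill"),
    }


def objects(graph: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return all objects of (subject, predicate, ?) in sorted order."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


graph = churchill_annotation()
print(objects(graph, "id1", "ex:annotatedWith"))
```

An authoring tool could query such a graph for every region carrying a semantic annotation and turn each hit into a link to the corresponding encyclopedia entry.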

Running the two remaining face recognizers for Roosevelt and Stalin extends the decomposition by two further still regions, i.e. the ImageData instances id2 and id3, together with the corresponding StillRegionRoles, SpatialMaskRoles and DigitalData instances expressing two more RegionLocatorDescriptors (indicated at the right border of Figure 2). The domain ontologies that provide the instances Roosevelt and Stalin for annotating id2 and id3 with the semantic annotation pattern do not have to be identical with the one that contains Churchill. If several domain ontologies are used, the employee can use the OWL constructs owl:sameAs and owl:equivalentClass to align the three face recognition results to the domain ontology that is best suited for enhancing the automatic generation of the multimedia presentation.
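The alignment step can be sketched as follows. The example assumes three hypothetical domain ontologies with the made-up prefixes `histA:`, `histB:` and `histC:`, and treats owl:sameAs statements as an equivalence relation that is resolved with a small union-find structure. It only illustrates the idea; a real system would normally delegate this to an OWL reasoner.

```python
from typing import Dict, List, Tuple


def align(annotations: Dict[str, str],
          same_as: List[Tuple[str, str]],
          preferred_prefix: str) -> Dict[str, str]:
    """Rewrite each annotation to an equivalent term of the preferred ontology."""
    parent: Dict[str, str] = {}

    def find(term: str) -> str:
        # Find the representative of the equivalence class (path halving).
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    # Merge the classes of every pair connected by owl:sameAs.
    for left, right in same_as:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    # Collect the members of each equivalence class.
    groups: Dict[str, List[str]] = {}
    for term in list(parent):
        groups.setdefault(find(term), []).append(term)

    def preferred(term: str) -> str:
        members = groups.get(find(term), [term])
        candidates = sorted(m for m in members if m.startswith(preferred_prefix))
        # Fall back to the original term if no preferred equivalent exists.
        return candidates[0] if candidates else term

    return {region: preferred(term) for region, term in annotations.items()}


# Hypothetical results of the three face recognizers.
annotations = {
    "id1": "histA:Churchill",
    "id2": "histB:Roosevelt",
    "id3": "histC:Stalin",
}
# Hypothetical owl:sameAs statements linking the ontologies.
same_as = [
    ("histB:Roosevelt", "histA:Roosevelt"),
    ("histC:Stalin", "histA:Stalin"),
]
aligned = align(annotations, same_as, "histA:")
print(aligned)
```

After alignment, all three still regions refer to instances of the same domain ontology, so the authoring tool needs to know only that one ontology when generating links.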

Annotation example of the Yalta Conference image with COMM


2. Image: The Treachery of Images

Scenario

Let us imagine that Martin, a student of semiotics in art, would like to annotate the image depicting the painting The Treachery Of Images by René Magritte (Figure 3) on the (semantic) web. (Martin Lefebvre is a student at the University Paris 1, Panthéon-Sorbonne, who has analyzed some of Magritte's paintings; see http://imagesanalyses.univ-paris1.fr/.)

The picture shows a pipe that looks as though it might come from a tobacco store advertisement. Below the pipe, Magritte painted "Ceci n'est pas une pipe" ("This is not a pipe"), which seems a contradiction but is actually true: the painting is not a pipe; it is an image of a pipe. As Magritte himself commented: "Just try to stuff it with tobacco! So if I had written on my picture 'This is a pipe' I would have been lying." This painting and its paradox perfectly illustrate the duality between describing an object or a scene and describing an image or a video that depicts or represents this object or scene.

René Magritte, The Treachery Of Images (Image adapted from Wikipedia)

Using the COMM ontology

Martin would now like to adopt our proposed multimedia ontology for annotating this painting. He does not have to manipulate or even understand the various patterns of the COMM ontology directly. Instead, he could use an image annotation tool, such as an extended version of the M-OntoMat-Annotizer, that generates annotations according to our proposed multimedia ontology.

The decomposition of the content of Magritte's image (physically represented by the instance img0) into three still regions is represented by the large middle part of the UML diagram (Figure 4-A). The still regions are represented by the ImageData instances id1, id2 and id3. The first plays a StillRegionRole, while the other two each play an ImageTextRole. The boundaries of the regions are described by a Polygon and two Rectangles respectively, which are values for the SegmentLocalizations that are requisites for the SegmentRoles.

With the ontology, the annotations provided by the image annotation tool can be clearly distinguished from the extraction results of the OCRAlgorithms. The former are described by instantiating the semantic annotation pattern (Figure 4-B): the pipe still region (id1) can be annotated with the instance pipe1 of the concept Pipe of a possibly existing domain ontology. The strings extracted by running the OCRAlgorithms are attached to the two image text segments id2 and id3 by instantiating the content annotation pattern (Figure 4-C and D). To keep the diagram simple, the extracted strings are not displayed; only the two DigitalData instances ceciData and pipeData are shown. These two DigitalData instances are about the extracted strings "Ceci" and "pipe" respectively.
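The separation between the two kinds of annotation can be made concrete with a small data model. The sketch below is an assumption-laden simplification: the class `Segment`, its field names and the function `split_by_pattern` are hypothetical, the pairing of id2 with ceciData and id3 with pipeData is inferred from the word "respectively", and region coordinates are left out because the text does not give them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    data_id: str                    # ImageData instance, e.g. "id1"
    role: str                       # "StillRegionRole" or "ImageTextRole"
    boundary: str                   # "Polygon" or "Rectangle"
    semantic: Optional[str] = None  # domain instance (semantic annotation pattern)
    content: Optional[str] = None   # DigitalData instance (content annotation pattern)


# The three still regions of the Magritte image as described in the text.
segments: List[Segment] = [
    Segment("id1", "StillRegionRole", "Polygon", semantic="pipe1"),
    Segment("id2", "ImageTextRole", "Rectangle", content="ceciData"),
    Segment("id3", "ImageTextRole", "Rectangle", content="pipeData"),
]

# Strings that the DigitalData instances are about (results of the OCR step).
extracted_strings: Dict[str, str] = {"ceciData": "Ceci", "pipeData": "pipe"}


def split_by_pattern(
    segs: List[Segment], strings: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate tool-provided annotations from OCR extraction results."""
    from_tool = {s.data_id: s.semantic for s in segs if s.semantic is not None}
    from_ocr = {s.data_id: strings[s.content] for s in segs if s.content is not None}
    return from_tool, from_ocr


tool_annotations, ocr_results = split_by_pattern(segments, extracted_strings)
print(tool_annotations)
print(ocr_results)
```

Keeping the two patterns in separate fields means a consumer can trust or discard OCR output independently of the annotations that came from the annotation tool.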

Annotation example of the Magritte painting with COMM