Project Description

The LSCOM workshop has developed an expanded multimedia concept lexicon of more than 2000 concepts, of which slightly over 400 have been annotated in 80 hours of video. Concepts related to events, objects, locations, people, and programs were selected through a multi-step process involving input solicitation, expert critiquing, comparison with related ontologies, and performance evaluation. Participants in the process included representatives of intelligence community users, ontology specialists, and multimedia analytics researchers. In addition, each concept was qualitatively assessed against criteria such as utility (usefulness), observability (by humans), and feasibility (of automatic detection). The annotation process was completed in late 2005 by student annotators at Columbia University and CMU, over the entire development set of TRECVID 2005 videos. Human subjects judged the presence or absence of each concept in the key frame of each subshot, resulting in a total of 61901 labels for each concept.

The first version of the LSCOM annotations consists of keyframe-based labels for 449 visual concepts, out of the 834 initially selected concepts, over the entire TRECVID 2005 development set (61901 subshots).

The LSCOM-Lite annotations include 39 high-level features (concepts), which are interim results from the effort to develop a Large-Scale Concept Ontology for Multimedia (LSCOM). Most of the concepts in LSCOM-Lite overlap with the concepts in LSCOM; however, some concepts in LSCOM-Lite are not in LSCOM. The concepts were selected based on a semi-automatic mapping of 26377 noun search terms from BBC query logs in late 1998 to WordNet senses, a division of the semantic concept space into a small number of orthogonal dimensions, and an evaluation of the 2003 and 2004 TRECVID search topics. The dimensions consist of program category, setting/scene/site, people, object, activity, event, and graphics. A collaborative effort among participants in the TRECVID 2005 benchmark was completed in the summer of 2005 to produce annotations of the 39 concepts over the entire development set of TRECVID 2005 videos. Human subjects judged the presence or absence of each concept in the key frame of each subshot, resulting in a total of 61901 labels for each concept. Ten of the LSCOM-Lite concepts were chosen for evaluation in the TRECVID 2005 high-level feature detection task, and 20 LSCOM-Lite concepts were evaluated at TRECVID 2006.
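
To give a sense of scale, the per-concept label count above can be multiplied out across each annotation set. The following is only a rough sketch in Python: the names SUBSHOTS, CONCEPTS, and total_labels are illustrative and are not part of any LSCOM distribution, and the calculation assumes every concept was labeled on every one of the 61901 subshots, as the descriptions above state.

```python
# Illustrative arithmetic only; names here are hypothetical, not from the LSCOM release.
SUBSHOTS = 61901  # subshots in the TRECVID 2005 development set

# Number of annotated concepts per annotation set, as stated in the text.
CONCEPTS = {"LSCOM v1.0": 449, "LSCOM-Lite": 39}


def total_labels(num_concepts, num_subshots=SUBSHOTS):
    """Total presence/absence labels, assuming one label per concept per subshot."""
    return num_concepts * num_subshots


totals = {name: total_labels(n) for name, n in CONCEPTS.items()}

for name, total in totals.items():
    print(f"{name}: {total:,} labels")
```

Under that assumption, the first LSCOM release corresponds to roughly 27.8 million individual judgments and LSCOM-Lite to roughly 2.4 million.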

The Revised Event/Activity annotations cover 24 concepts that have a temporal component. These concepts were originally annotated in the LSCOM v1.0 release using a single keyframe for each shot. Because some of these concepts can only be recognized from motion, keyframe-based labeling gave unreliable results, so this subset of concepts was re-annotated by having human subjects watch the actual video clips instead of viewing single keyframes.

Contact Alex Hauptmann for more information.

Designed by Anlu Wang, copyright 2007