Integrating user-generated metadata into video collections

Michiel Hildebrand, VU University Amsterdam

Analysis of search behavior at the Netherlands Institute for Sound & Vision*

  • People predominantly order video fragments
    • broadcasts (33%)
    • stories (17%)
    • fragments (49%)
  • Finding fragments takes much longer than finding broadcasts
    • stories (2x)
    • fragments (3x)
  • Vocabulary mismatch between searchers and cataloguers
    • 35% of clicked results are not found by title or thesaurus term
* Search Behavior of Media Professionals at an Audiovisual Archive: A Transaction Log Analysis
B. Huurnink, L. Hollink, W. van den Heuvel, and M. de Rijke.

Metadata to support within-video search?

  • multimedia content analysis (computer vision)
  • user-generated metadata (crowdsourcing)

... this could also enable

CROWDSOURCING

Crowdsourcing blends open innovation concepts with top-down, traditional management structures so that crowdsourcing organizations can effectively tap the collective intelligence of online communities for specific purposes
http://dbrabham.wordpress.com/crowdsourcing/

don't underestimate the Web

The Web as facilitator for big impact

  • enables grassroots movement
  • spreads across domains
  • allows contribution and participation
Pillowfight @ de Dam 03-04-2010: www.thehospages.com

CAPTCHA

± 200,000,000 captchas entered daily

reCAPTCHA

reuse to improve Optical Character Recognition (OCR)

Indirectly use human intelligence for digitisation

e.g. Google Books, NY Times
reCAPTCHA: Human-Based Character Recognition via Web Security Measures
Luis von Ahn, Benjamin Maurer, Colin McMillen, David Abraham and Manuel Blum

Dutch National Archive on Flickr: The Commons

After two years:

don't overestimate the crowd

organisational structure required

  • goals, rules, policies, ...
  • quality control
  • motivation for users

Quality of tags?

eyes nose mouth

Consensus among users

Games with a purpose: score points for matching tags

http://www.gwap.com/
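
A minimal sketch of the consensus idea behind such games: a tag counts as "matched" when two different players enter the same tag close together in time. The function name, the tuple layout, and the 10-second window are assumptions made for illustration, not the scoring rules of any particular game.

```python
from collections import defaultdict
from itertools import combinations


def matched_tags(entries, window=10.0):
    """Return tags entered by at least two different players
    within `window` seconds of each other.

    entries: iterable of (player, tag, time_in_seconds) tuples.
    The window length is an assumed parameter, not a documented rule.
    """
    by_tag = defaultdict(list)
    for player, tag, t in entries:
        # Normalise case and whitespace so "Nose" and "nose" agree.
        by_tag[tag.strip().lower()].append((player, t))

    matched = set()
    for tag, hits in by_tag.items():
        for (p1, t1), (p2, t2) in combinations(hits, 2):
            if p1 != p2 and abs(t1 - t2) <= window:
                matched.add(tag)
                break
    return matched


# Toy data: three tags entered by two players.
entries = [
    ("anna", "Nose", 12.0),
    ("bram", "nose", 15.5),
    ("anna", "eyes", 20.0),
    ("bram", "mouth", 21.0),
    ("anna", "mouth", 48.0),
]
print(matched_tags(entries))  # {'nose'}
```

Here "mouth" is entered by both players but 27 seconds apart, so it only counts as a match if the window is widened.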

Wikipedia quality assurance

Information quality work organization in Wikipedia
Besiki Stvilia et al.
Different roles and processes:

Wikipedia edit history

Revision history of article on Chocolate
Talk Before You Type: Coordination in Wikipedia
Fernanda B. Viégas et al.

Two considerations for crowdsourcing

CROWDSOURCING
VIDEO ANNOTATION

Crowdsourcing at Sound & Vision

Motivation:
  1. Collect time-based annotations
  2. Bridging the ‘semantic gap’
  3. Engagement with the public

Web factor

Engagement with the public

Time-based annotations

Bridging the ‘semantic gap’

Waisda? statistics from November 2009

Farmer seeks wife

BBC News

crowd control

Engagement with the public

>> constant effort required

Time-based annotations

Each tag is assigned to a single timepoint

Bridging the ‘semantic gap’

Automatically matched to vocabularies

Vocabulary               46,792 unique tags    7,372 matched tags
B&G vocabulary (GTAA)    2,517 (5.3%)          822 (11%)
Cornetto WordNet         10,757 (23%)          3,858 (52%)
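
A rough sketch of how such vocabulary coverage could be computed. The slides do not say how tags were matched, so exact string match after lowercasing and whitespace normalisation is an assumption, and the function names and toy data are invented for illustration.

```python
def normalise(term):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def vocabulary_coverage(tags, vocabulary):
    """Return (count, fraction) of unique tags that occur in the vocabulary.

    Exact match after normalisation is an assumed matching rule.
    """
    unique_tags = {normalise(t) for t in tags}
    vocab = {normalise(v) for v in vocabulary}
    hits = unique_tags & vocab
    fraction = len(hits) / len(unique_tags) if unique_tags else 0.0
    return len(hits), fraction


# Toy data, not taken from the archive.
tags = ["Farmer", "farmer", "cow", "tractor ", "lol"]
vocabulary = ["farmer", "cow", "agriculture"]
count, fraction = vocabulary_coverage(tags, vocabulary)
print(count, f"{fraction:.0%}")  # 2 50%

# The Cornetto percentages above follow the same arithmetic.
print(f"{10757 / 46792:.0%}")  # 23%
print(f"{3858 / 7372:.0%}")    # 52%
```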

Judged by a cataloguer

Genre influences tag specificity and usefulness

INTEGRATION INTO ARCHIVE?

crowdsourced video tagging       ?????                              in-house controlled annotation
crowd                            bridge communities                 professionals
tags                             link tags to concepts              concepts
time-based descriptions          descriptions of fragments          descriptions of video/shot
user perspective (subjective)    combine perspectives               expert perspective (objective)
community consensus              quality assurance by moderation    cataloguers' experience
http://www.flickr.com/photos/10110263@N03/4608152854

TAG GARDENING

Edit your own tags

Roles and processes

Tagger: edits own tags, e.g. after a game
  • correct spelling
  • delete tags
  • merge tags
  • ...
Gardener: moderates tags of multiple taggers to create fragment annotations
  • select and flag tags
  • link tags to fragments
  • link tags to concepts
  • ...
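
The two roles above can be sketched as operations on a small data model. All class names, function names, and the concept identifier are hypothetical, chosen only to illustrate the listed operations; this is not the actual tag gardening tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    text: str
    time: float   # seconds from the start of the video
    tagger: str


@dataclass
class FragmentAnnotation:
    start: float
    end: float
    tags: list = field(default_factory=list)
    concepts: dict = field(default_factory=dict)  # tag text -> concept id


# Tagger operations: each returns a new list of tags.
def correct_spelling(tags, wrong, right):
    return [Tag(right, t.time, t.tagger) if t.text == wrong else t for t in tags]


def delete_tag(tags, text):
    return [t for t in tags if t.text != text]


def merge_tags(tags, variants, target):
    return [Tag(target, t.time, t.tagger) if t.text in variants else t for t in tags]


# Gardener operations: group tags into a fragment and link them to concepts.
def link_tags_to_fragment(tags, start, end):
    return FragmentAnnotation(start, end, [t for t in tags if start <= t.time < end])


def link_tag_to_concept(fragment, text, concept_id):
    fragment.concepts[text] = concept_id


tags = [
    Tag("cwo", 12.0, "anna"),
    Tag("cow", 14.0, "bram"),
    Tag("cows", 15.0, "carla"),
    Tag("lol", 16.0, "bram"),
    Tag("tractor", 95.0, "anna"),
]
tags = correct_spelling(tags, "cwo", "cow")
tags = merge_tags(tags, {"cows"}, "cow")
tags = delete_tag(tags, "lol")

fragment = link_tags_to_fragment(tags, 10.0, 30.0)
link_tag_to_concept(fragment, "cow", "concept:cow")  # hypothetical identifier
print([t.text for t in fragment.tags])  # ['cow', 'cow', 'cow']
print(fragment.concepts)                # {'cow': 'concept:cow'}
```

Keeping the tagger operations as pure functions over a list makes it easy to record each edit separately, which is one possible basis for the moderation step.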

combine community consensus with cataloguers/domain experience

Garden tags

... another role and process?

Landlord: creates high-quality annotations of video scenes

combine community consensus with cataloguers/domain experience and video analysis

Garden scenes

m.hildebrand@few.vu.nl