Annotation with multiple thesauri

Design and evaluation of an open annotation tool

http://e-culture.multimedian.nl/

Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber & Geertje Jacobs

Rijksmuseum Amsterdam: Print Room online

Registration and digitization of Rijksmuseum Print Room

  • Agreement with the State Inspectorate for Cultural Heritage
    • Every object findable in 10 minutes
  • Publish digitized collection online
    • Appeal to a wider public
  • 700,000 prints, drawings and photographs
  • 100,000 in 3 years
  • Employees
    • 7 (4 full-time, 3 part-time) cataloguers
    • 2 (part-time) photographers
    • 2 (part-time) digital photo-editors
    • one (full-time) museum assistant

Annotation setting

Why controlled subject annotation?

when annotating

  • thesauri prevent:
    • incomplete annotations
    • inconsistent annotations
    • duplicates
  • save time when annotating

when searching

  • subject is an important entry point (general public)
  • thesauri support
    • multilingual search
    • alternative spelling
    • synonyms
    • link to background knowledge

Current annotation process

current annotation process at PK Online

Metadata sources at the Rijksmuseum
use multiple thesauri directly!

annotation process with multiple thesauri

Using multiple sources for subject annotation

demo

problems

multiple thesauri

  • large
  • heterogeneous
  • overlap
    • semantic interoperability required

interface challenge

  • large coverage
  • effective presentation of results

User study

Requirements for multi-thesaurus annotation in terms of:

  • data modeling (thesauri)
  • search algorithm functionality
  • user interface design

Overview of study

[Figure: timeline of the study, from requirements (req) through the prototype iterations]
Requirements analysis
Goal
acquire initial requirements for annotation
People
project leader, lead annotator, curator
Sources
  • documents
  • f2f discussions
  • annotation sessions
Findings
  • focus on subject annotation
  • user interface design
    • annotation fields for person, event, place, iconography, object, date
  • vocabulary data
    • ULAN, DBpedia persons, IconClass, WordNet, TGN
  • search functionality
    • find known term
    • find most suitable term
    • determine term does not exist
  • result visualization and organization
Prototype I
People
project leader
Sources
  • prototype demo
  • f2f discussion
Findings
  • equivalent terms
    • align thesauri
  • add hierarchical navigation
Prototype II
People
project leader, lead annotator
Sources
  • unsupervised exploration
  • email feedback
  • f2f discussion
Findings
  • term disambiguation
    • add provenance
  • use different sorting strategies
    • alphabetical for persons
  • merge "What" field
    • IconClass, WordNet and events
Prototype III
People
project leader, lead annotator, curator
Sources
  • unsupervised exploration
  • email feedback
  • f2f discussion
Findings
  • term disambiguation
    • add examples
  • equivalent terms
    • visualize aligned terms
Prototype IV
People
project leader, lead annotator, annotator, curator
Sources
  • 2 walkthroughs with prototype
  • f2f discussion
Findings
  • multilingualism
  • compound queries
Prototype V
People
project leader, lead annotator
Sources
  • unsupervised exploration
  • email feedback
  • f2f discussion
Findings
  • interface usability
  • support annotation process as a whole
    • add new annotation
    • finish/cancel annotation
    • view annotations
Final prototype
Pilot
Goal
test experimental setup
People
1 participant
To be tested
  • whether title and description together with the subject annotation fields are sufficient
  • incorporate online tutorial
Experiment
Setting
  • 5+2 participants
  • 1 hour annotation session
    • annotate about 6 "new" prints each
  • screen recording
  • live observation
  • 30-minute questionnaire
    • term disambiguation
    • term alignments
    • query construction
    • sorting and grouping strategies
    • compound queries

Challenge

Support three search tasks on multiple thesauri

Quickly find known term

problem → solution
many search results
  • filter results
  • fast feedback
  • support compound queries
duplicates
  • "align" data
comparable results
  • disambiguate terms
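
The solutions listed above (filtering, compound queries, aligned data, disambiguation) can be illustrated with a minimal sketch. It is not the tool's actual implementation: the `Term` record, the toy terms, the URIs and the `ALIGNED` mapping are all invented for illustration. Each query word must match the start of some word in a label, results can be filtered by field type, and terms declared equivalent are grouped so that duplicates appear once, with the source thesaurus kept for disambiguation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    uri: str     # hypothetical identifier, not a real thesaurus URI
    label: str
    source: str  # thesaurus the term comes from (provenance)
    kind: str    # annotation field, e.g. "person" or "place"


# Toy data, invented for illustration only.
TERMS = [
    Term("ulan:1", "Rembrandt van Rijn", "ULAN", "person"),
    Term("dbp:1", "Rembrandt", "DBpedia", "person"),
    Term("tgn:1", "Rembrandtplein", "TGN", "place"),
]

# Hypothetical alignment: maps a term URI to the URI of its equivalent.
ALIGNED = {"dbp:1": "ulan:1"}


def find_terms(query, kind=None):
    """Group matching terms by aligned URI.

    Every word of the (possibly compound) query must be a prefix of
    some word in the label. `kind` optionally filters the results.
    """
    words = query.lower().split()
    groups = {}
    for term in TERMS:
        if kind is not None and term.kind != kind:
            continue
        label_words = term.label.lower().split()
        if all(any(lw.startswith(w) for lw in label_words) for w in words):
            key = ALIGNED.get(term.uri, term.uri)
            groups.setdefault(key, []).append(term)
    return groups


for key, terms in find_terms("rem", kind="person").items():
    # One result row per aligned group, listing where each label comes from.
    print(key, [(t.label, t.source) for t in terms])
```

In this sketch the compound query "van rem" narrows the result to the single ULAN entry, while the filter on `kind` removes the place name from a search for persons.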

Thesaurus exploration

problem → solution
hierarchical navigation
  • combine search with navigation
compare terms
  • present extra information
consider alternatives
  • combine complementary thesauri
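
One way to read "combine search with navigation" is to show, for each term found by search, its position in the hierarchy, so the annotator can step to a broader term. The sketch below assumes a simple child-to-parent mapping; the `BROADER` dictionary is a toy hierarchy written for this example and is not taken from any of the thesauri.

```python
# Toy child -> parent ("broader") relation, invented for illustration.
BROADER = {
    "Amsterdam": "Noord-Holland",
    "Noord-Holland": "Netherlands",
    "Netherlands": "Europe",
}


def path_to_root(term, broader):
    """Return the chain from a term up to its top-level ancestor."""
    path = [term]
    seen = {term}
    while path[-1] in broader:
        parent = broader[path[-1]]
        if parent in seen:  # guard against cycles in the data
            break
        path.append(parent)
        seen.add(parent)
    return path


# A search hit can then be displayed together with its ancestors.
print(" > ".join(reversed(path_to_root("Amsterdam", BROADER))))
```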

Exhaustive search

problem → solution
in which thesaurus does a term occur?
  • search in multiple thesauri simultaneously
when to stop searching?
  • alphabetical sorting
  • indicate number of results
quickly try alternative queries
  • fast feedback
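
The exhaustive search solutions above can be sketched as follows, under the assumption that each thesaurus is simply a list of labels. The labels in `THESAURI` are invented; only the thesaurus names come from the slides. The query runs over all thesauri at once, hits are sorted alphabetically, and the per-thesaurus counts (including zero) help the annotator decide when to stop searching.

```python
# Invented labels for illustration; only the thesaurus names are from the slides.
THESAURI = {
    "WordNet": ["windmill", "mill", "sawmill"],
    "IconClass": ["windmill in landscape", "Mill"],
    "TGN": ["Amsterdam"],
}


def search_all(query, thesauri):
    """Search every thesaurus for labels containing the query.

    Returns a dict from thesaurus name to an alphabetically sorted
    (case-insensitive) list of matching labels.
    """
    q = query.lower()
    return {
        name: sorted((label for label in labels if q in label.lower()),
                     key=str.lower)
        for name, labels in thesauri.items()
    }


def count_results(results):
    """Number of hits per thesaurus; zero means the term is absent there."""
    return {name: len(hits) for name, hits in results.items()}


results = search_all("mill", THESAURI)
print(count_results(results))
```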

To think about

annotation interface

  • how generic are the design implications?
  • annotation for the general public?

integration of open linked data into workflow

  • implications for search, maintenance, ...
michiel.hildebrand@cwi.nl
http://e-culture.multimedian.nl