introduction multimedia

intelligent multimedia @ VU

http://www.cs.vu.nl/~eliens/research/note-intelligent-multimedia.html



project leader: dr A. Eliëns
researcher: dr. Z. Huang
programmer: drs. C. Visser
student: M. Hildebrand

project description

We are developing a high-level platform for 3D and rich media virtual environments based on agent technology, using the languages DLP, Java, and VRML. On top of this platform we have developed a scripting language, STEP, for specifying humanoid movements and gestures, based on dynamic logic.

Our goal is to study aspects of the deployment and architecture of virtual environments as an interface to multimedia information systems.

Our platform also supports embodied conversational agents, see [Eliëns et al. (2002)]. As demonstrators we have developed a distributed soccer-game prototype with intelligent autonomous avatar-embodied agents as players, [Huang et al. (2002)], a humanoid animation demonstrating Tai-Chi, [Huang et al. (2002b)], avatars presenting a dialog in a mixed media presentation environment, [Eliëns et al. (2003)], a domestic agent that can be addressed in natural language, [Hildebrand et al. (2003)], an avatar that uses reasoning and inverse kinematics to reach for objects, [Huang et al. (2003d)], and an avatar conducting music, [Ruttkay et al. (2003a)].

As a student project in the Casus Practicum, we have developed a 3D presentation for INCCA, the International Network for the Conservation of Contemporary Art, in cooperation with ICN, the Dutch Cultural Heritage Institute.

funding

Our research has been supported by two NWO projects, WASP and RIF, whose history is outlined below.

The combined effort of these projects led to the DLP+X3D platform and the development of the STEP language.

In addition, we have one pending proposal with Michiel Hildebrand as candidate researcher:

  • (submitted) IMMEDIATE -- Intelligent MultiMEDIA Transactions Environment

cooperation

The work on embodied conversational agents is being done in cooperation with dr. Z. Ruttkay from CWI. The work on the cultural heritage application is done in cooperation with drs. T. Scholte from ICN.

publications

Apart from publications in international conference proceedings and workshops, we have demonstrated our work at the ICT Kenniscongres 2002, under the theme intelligent multimedia.

A prestigious book on Life-like Characters is in preparation, to which we have contributed a chapter describing our platform and STEP, [Huang et al. (2003c)]. The editor of this book, Helmut Prendinger from Tokyo University, commented on our chapter: "I really have to say that your chapter is very very well done, very interesting, very comprehensive, and very readable. It says exactly the things people want to know when they are looking for some scripting language to animate their characters. Above that, your STEP system is very well motivated and nicely embedded in other strands of computer science research."

a brief history

Although the WASP proposal was written in 1996, long before the RIF proposal (1998), dr. Z. Huang started as a post-doc on the WASP project when the RIF project had already been running for more than six months. Since we felt by then that the WASP project proposal was slightly outdated, we made an effort to merge the WASP and RIF projects by focussing on agents in 3D virtual environments, which resulted in a paper presenting a taxonomy of Web agents, [Huang et al. (2000)]. However, after about a year the continuity of the RIF project was endangered by a change of personnel at CWI. NWO was therefore asked for permission to use the RIF funds to prolong the WASP project. This was granted, provided that the research goals of the RIF project were sufficiently covered within WASP.

In the RIF project we used the blaxxun Community Server and VRML to realize information retrieval and delivery in multi-user 3D environments, see [van Ballegooij and Eliëns (2001)]. Agents were then conceived as server-side extensions, using blaxxun's native agents enriched with embedded logic. However, at the same time that the RIF project funds were transferred to WASP, the first prototype of DLP in Java became available, and we decided, somewhat radically, to drop the more low-level blaxxun technology in favor of a unified approach in DLP. Thus, we extended DLP with primitives for manipulating VRML worlds, using the Java External Authoring Interface that is part of VRML. The DLP+VRML framework proved to be surprisingly effective, as testified by the following references:

DLP+VRML

The language DLP itself has quite a long history, [Eliëns (1992)]. It has also been described in [Eliëns (2000)]. As a language, DLP offers an object-oriented extension of (traditional, Edinburgh-style) Prolog, with multi-threaded objects, non-logical instance or state variables, communication by rendezvous and (distributed) backtracking. After preliminary prototypes in C++, we focussed on an implementation in Java, to be able to use DLP for Web programming, [Visser and Eliëns (2000)]. The first Java implementation became available relatively late, but just in time to create the DLP+VRML extension when needed.
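To give an impression of what communication by rendezvous between threaded objects amounts to, the following Java sketch may help. It is not DLP code and does not reflect how DLP is implemented; the class names RendezvousSketch and Counter and the method add are hypothetical, and the term rendezvous is used in its general sense: the caller and the object's own thread meet at the hand-over of the request and again at the hand-over of the answer.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration of rendezvous-style communication; not DLP itself.
class RendezvousSketch {

    // An active object: it has its own thread and accepts one request at a time.
    static class Counter implements Runnable {
        private final SynchronousQueue<Integer> requests = new SynchronousQueue<>();
        private final SynchronousQueue<Integer> replies = new SynchronousQueue<>();
        private int total = 0; // state variable, touched only by the object's own thread

        @Override
        public void run() {
            try {
                while (true) {
                    int amount = requests.take(); // accept: blocks until a caller arrives
                    total += amount;
                    replies.put(total);           // answer: blocks until the caller takes it
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop serving when interrupted
            }
        }

        // Called from another thread; synchronized so that each caller
        // receives the reply that belongs to its own request.
        synchronized int add(int amount) {
            try {
                requests.put(amount);
                return replies.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        Thread objectThread = new Thread(counter);
        objectThread.setDaemon(true); // let the JVM exit when main is done
        objectThread.start();
        System.out.println(counter.add(2)); // prints 2
        System.out.println(counter.add(3)); // prints 5
    }
}
```

A SynchronousQueue has no capacity, so a put only completes when another thread performs the matching take, which is what makes each exchange a meeting point rather than a buffered message.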

As for the acceptance of our approach within the Web3D community, we wish to point to the acceptance of our [Huang et al. (2002)] paper for the highly competitive international Web3D Conference 2002 (acceptance: 1 out of 13), and the acceptance of [Huang et al. (2003a)] for the Web3D 2003 Conference. More recently, we have made an effort to publish in the ECA (Embodied Conversational Agents) community, as testified by our contribution to the Life-like Characters book, [Huang et al. (2003c)]. We also received an invitation for the Dagstuhl seminar on Evaluating Embodied Conversational Agents in March 2004.

embedding in education: focus on multimedia

The intelligent multimedia research has a strong impact on the educational activities for the specialisation(s) Multimedia with Computer Science and Multimedia and Culture for Information Science.

In the first year students start with a general Introduction to Multimedia. This course centers around three themes: the convergence between media, platforms and delivery technology, the availability of broadband communication and its impact on the development of standards such as MPEG-4, and multimedia information retrieval as an essential ingredient of the growing multimedia information repository on the Web.

There are two follow-up courses, which are given in the second and third year, respectively:

courses

The first of these courses deals with the technology for creating 3D scenes and worlds, whereas the second is more focused on providing intelligent services in virtual environments. Students use the DLP+VRML framework for their assignment in the second course. See  [Huang et al. (2003)].

In addition, for Multimedia and Culture, there is a Multimedia Development Casus Practicum in which the technology is applied in an assignment developed with the Dutch Cultural Heritage Institute (ICN).

For both specialisations, Multimedia and Multimedia and Culture, we plan to offer a course on XML-based Multimedia Technology, to be developed by dr. Z. Huang, to make students familiar with advanced topics in XML-based information processing. Our platform already supports the use of XML and XSLT stylesheets, [Huang et al. (2003b)], and we are migrating to a DLP+X3D platform, with X3D as the XML-based successor of VRML.

research directions

Apart from the issues involved in the modelling and realization of embodied agents in rich media 3D environments, there are also issues with regard to the architecture and implementation of our DLP+X3D platform.

parallelism and synchronization

Complex humanoid gestures are of a highly parallel nature. The STEP scripting language supports a direct way of modelling parallel gestures by offering a parallel construct (par), which results in the simultaneous execution of (possibly compound) actions. To avoid unconstrained thread creation, the STEP engine makes use of a thread pool, containing a fixed number of threads, from which threads are allocated to actions. Once an action is finished, its thread is put back in the pool. This approach works well for most examples. However, when many threads are needed, as in the conductor example (which requires approximately 60 threads), problems may occur, in particular when there are many background jobs.
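The combination of a parallel construct and a fixed-size thread pool can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code of the STEP engine: the class ParSketch, the method par and the gesture names are hypothetical, and the standard java.util.concurrent executor is used in place of whatever pool the engine maintains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a par-like construct on top of a fixed thread pool.
class ParSketch {

    // Submits all actions to a pool of poolSize threads, waits until every
    // action has finished, and returns the results in submission order.
    static List<String> par(int poolSize, List<Callable<String>> actions) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> done : pool.invokeAll(actions)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // release the pool threads
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> gesture = new ArrayList<>();
        gesture.add(() -> "raise left arm");
        gesture.add(() -> "raise right arm");
        gesture.add(() -> "nod head");
        // Three actions on two threads: the third action has to wait
        // until one of the pool threads becomes free.
        System.out.println(par(2, gesture));
    }
}
```

In such a scheme, actions beyond the pool size are queued rather than started at once, so they are no longer strictly simultaneous. This is one way in which a gesture that needs more threads than the pool provides could lose its timing, although the sketch makes no claim about the actual cause of the problems observed with the conductor example.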

modeling and representation

Our agent model may be characterized as a BDI-model, extended with sensors and effectors needed for the interaction in a virtual environment. The STEP scripting language has been developed to facilitate the specification of communicative acts, like gestures. However, we would also like to explore text-to-speech synthesis as an extra modality of communication.

One interesting research issue is how to specify a reusable library of gestures that accommodates differences in (personal) style. This is currently being investigated by Z. Ruttkay from CWI.

Another interesting issue is the use of inverse kinematics to grasp objects. When an object is not within reach, the agent has to reason about the best way to get close enough to the object to be able to reach it.
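The interplay between reaching and moving closer can be made concrete with a deliberately simplified model. The sketch below assumes a planar arm of two rigid segments attached at the shoulder and uses the law of cosines; the class ReachSketch, its method names and the segment lengths are hypothetical and do not describe how our avatars are actually driven.

```java
// Hypothetical two-segment planar arm; a simplification for illustration only.
class ReachSketch {

    // True if an arm with the given upper and lower segment lengths can
    // touch a point at the given distance from the shoulder.
    static boolean withinReach(double upper, double lower, double distance) {
        return distance <= upper + lower && distance >= Math.abs(upper - lower);
    }

    // Elbow bend in radians (0 means fully stretched) needed to touch a point
    // at the given distance, from d^2 = u^2 + l^2 + 2*u*l*cos(bend).
    static double elbowBend(double upper, double lower, double distance) {
        double cos = (distance * distance - upper * upper - lower * lower)
                / (2.0 * upper * lower);
        // Clamp against rounding errors just outside [-1, 1].
        return Math.acos(Math.max(-1.0, Math.min(1.0, cos)));
    }

    // Distance the agent has to move towards the object before the
    // fully stretched arm can touch it; 0 if it is already close enough.
    static double walkDistance(double upper, double lower, double distance) {
        return Math.max(0.0, distance - (upper + lower));
    }

    public static void main(String[] args) {
        double upper = 3.0, lower = 4.0; // segment lengths in arbitrary units
        for (double distance : new double[] {5.0, 8.0}) {
            if (withinReach(upper, lower, distance)) {
                System.out.println("reach with elbow bend " + elbowBend(upper, lower, distance));
            } else {
                System.out.println("move " + walkDistance(upper, lower, distance) + " closer first");
            }
        }
    }
}
```

The reachability test is the point where reasoning takes over from kinematics: only when it fails does the agent need a plan for approaching the object, and deciding on the best approach is the part that a purely geometric computation like this one does not answer.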

architecture and implementation

Solving the problem of reliable timing would require not only a modification of the STEP engine, but also a rather different implementation of the DLP threads supporting the parallelism in STEP. Currently, the implementation only allows for best-effort parallelism and does not provide the means for deadline scheduling.

However, it is our impression that we have reached the utmost efficiency feasible within the Java platform. We have therefore been considering redeveloping the DLP+X3D platform in a .NET environment. An additional advantage of migrating to the .NET environment would be the possible integration of functionality such as text-to-speech synthesis, which is not readily available in the Java environment.

references

Eliëns (1992)
Eliëns A. (1992), DLP -- A language for Distributed Logic Programming, Wiley
Eliëns (2000)
Eliëns A. (2000), Principles of Object-Oriented Software Development, Addison-Wesley Longman, 2nd edn.
Eliëns et al. (2003)
Eliëns A., Dormann C., Huang Z. and Visser C. (2003), A framework for mixed media -- emotive dialogs, rich media and virtual environments, Proc. TIDSE03, 1st Int. Conf. on Technologies for Interactive Digital Storytelling and Entertainment, Göbel S., Braun N., Spierling U., Dechau J. and Diener H. (eds.), Fraunhofer IRB Verlag, Darmstadt, Germany, March 24-26, 2003
Eliëns et al. (2002)
Eliëns A., Huang Z. and Visser C. (2002), A platform for Embodied Conversational Agents based on Distributed Logic Programming, AAMAS Workshop -- Embodied conversational agents - let's specify and evaluate them!, Bologna, 17/7/2002
Hildebrand et al. (2003)
Hildebrand M., Eliëns A., Huang Z. and Visser C. (2003), Interactive Agents Learning their Environment, Proc. Intelligent Virtual Agents 2003, Irsee, September 15-17, 2003, J.G. Carbonell and J. Siekmann (eds.), LNAI 2792, Springer, pp. 13-17
Huang et al. (2001)
Huang Z., Eliëns A. and De Bra P. (2001), An Architecture for Web Agents, Proceedings of the Conference EUROMEDIA 2001, 2001
Huang et al. (2000)
Huang Z., Eliëns A., van Ballegooij A. and De Bra P. (2000), A Taxonomy of Web Agents, IEEE Proceedings of the First International Workshop on Web Agent Systems and Applications (WASA '2000), 2000
Huang et al. (2001)
Huang Z., Eliëns A. and Visser C. (2001), Programmability of Intelligent Agent Avatars, Proceedings of the Agent'01 Workshop on Embodied Agents, June 2001, Montreal, Canada
Huang et al. (2002)
Huang Z., Eliëns A. and Visser C. (2002), 3D Agent-based Virtual Communities, in: Proc. Int. Web3D Symposium, Wagner W. and Beitler M. (eds.), ACM Press, pp. 137-144
Huang et al. (2002b)
Huang Z., Eliëns A. and Visser C. (2002b), STEP -- a scripting language for Embodied Agents, PRICAI-02 Workshop -- Lifelike Animated Agents: Tools, Affective Functions, and Applications, Tokyo, 19/8/2002
Huang et al. (2003)
Huang Z., Eliëns A. and Visser C. (2003), Intelligent Multimedia Technology: An Approach to Combine Agent Technologies with Multimedia, in preparation
Huang et al. (2003a)
Huang Z., Eliëns A. and Visser C. (2003a), Implementation of a scripting language for VRML/X3D-based embodied agents, Proc. Web3D 2003 Symposium, Saint Malo, France, S. Spencer (ed.), ACM Press, pp. 91-100
Huang et al. (2003b)
Huang Z., Eliëns A. and Visser C. (2003b), XSTEP: A Markup Language for Embodied Agents, Proc. CASA03, The 16th Int. Conf. on Computer Animation and Social Agents
Huang et al. (2003c)
Huang Z., Eliëns A. and Visser C. (2003c), STEP: a Scripting Language for Embodied Agents, in: Helmut Prendinger and Mitsuru Ishizuka (eds.), Life-like Characters, Tools, Affective Functions and Applications, Springer-Verlag (to appear)
Ruttkay et al. (2003a)
Ruttkay Z., Huang Z. and Eliëns A. (2003a), The Conductor: Gestures for Embodied Agents with Logic Programming, Joint Annual ERCIM/CoLogNet Workshop on Constraint and Logic Programming, Budapest, Hungary, 30 June - 2 July, 2003
van Ballegooij and Eliëns (2001)
van Ballegooij A. and Eliëns A. (2001), Navigation by Query in Virtual Worlds, Web3D 2001 Conference, Paderborn, Germany, 19-22 Feb 2001
Visser and Eliëns (2000)
Visser C. and Eliëns A. (2000), A High-Level Symbolic Language for Distributed Web Programming, Internet Computing 2000, June 26-29, Las Vegas


eliens@cs.vu.nl

draft version 1 (16/5/2003)