Shared Reality Classroom

This project is a component of the McGill Advanced Learnware Network project, funded by CANARIE Inc. and Cisco Systems. Additional funding comes from the McGill Faculty of Engineering, the Royal Bank Teaching and Learning Improvement Fund, a Petro-Canada Young Innovator Award, the Natural Sciences and Engineering Research Council of Canada, and the Fonds pour la Formation de Chercheurs et l'Aide à la Recherche (FCAR).

The Augmented Classroom Teaching in Engineering component addresses the limited sense of co-presence afforded by current videoconferencing technology. This project builds upon our ongoing research to enhance the experience of participants by simplifying operation and by automatically recording and indexing the lecture for later Web-based retrieval. Extending this work through high-bandwidth connectivity to another location will permit the environment to select appropriate views for display at each end, based on the context of current activity. For example, knowing that a McGill instructor is listening to a question from a student in Calgary, the system could automatically provide a zoomed-in view of that student, permitting eye contact and improved interaction without the need for manual camera pan and zoom control. Although the project will concentrate on teaching applications, the objective is to greatly expand the potential use of CA*net 3 for videoconferencing in a wide variety of areas.
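To make the idea of context-driven view selection concrete, the sketch below maps detected classroom activity to camera commands. It is a minimal illustration, not the system's design: the event set, camera names, and zoom values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Event(Enum):
    """Activity cues the environment might detect (hypothetical set)."""
    INSTRUCTOR_LECTURING = auto()
    INSTRUCTOR_AT_BOARD = auto()
    REMOTE_STUDENT_SPEAKING = auto()
    SILENCE = auto()


@dataclass
class ViewCommand:
    camera: str   # which camera feed to route to the far end
    zoom: float   # 1.0 = wide shot; larger values = tighter framing
    target: str   # region or person to frame


def select_view(event: Event, speaker_id: Optional[str] = None) -> ViewCommand:
    """Map the current classroom activity to an appropriate view.

    A rule table like this replaces manual pan/zoom control: when a
    remote student asks a question, the system frames that student
    tightly so the instructor can make eye contact.
    """
    if event is Event.REMOTE_STUDENT_SPEAKING and speaker_id:
        return ViewCommand(camera="audience", zoom=3.0, target=speaker_id)
    if event is Event.INSTRUCTOR_AT_BOARD:
        return ViewCommand(camera="board", zoom=1.5, target="whiteboard")
    if event is Event.INSTRUCTOR_LECTURING:
        return ViewCommand(camera="podium", zoom=2.0, target="instructor")
    return ViewCommand(camera="room", zoom=1.0, target="wide")
```

In a deployed system, such rules would be driven by the presenter tracker and audio localization described below rather than by hand-labelled events.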

While an increasing number of universities offer some distance education classes using videotape distribution, satellite feeds, or the Internet, current systems allow for very limited interaction between instructors and students. Videotape and television-based lecture dissemination are, by their very nature, unidirectional. While videoconferencing technology permits the bi-directional exchange of audio and video, the constrained affordances and quality of the medium impede the use of gestures and eye contact. Although high-bandwidth networks permit the transmission of high-fidelity video between parties, the limits of display resolution still prevent the instructor from obtaining a clear view of students at the back of a remote classroom and, often, vice versa. These obstacles discourage the natural interaction that takes place in a physical classroom, such as the instructor's awareness of class comprehension and students' spontaneous questions.

Furthermore, unless a camera operator is present to follow the instructor's movements, the instructor cannot walk around the room freely without remote participants losing sight of the presenter. Conventional motorized camera control algorithms cannot cope with the demands of tracking a presenter in a typical classroom setting. Many current tracking algorithms face the same problems seen in traditional image processing: determining what to track, extracting only the information needed, and coping with various forms of noise. Some algorithms attempt to achieve robustness by requiring special clothing or equipment such as data gloves, portable computers, or active badges, while others involve an initial calibration phase to identify the object of interest. However, for a tracker to operate seamlessly in a teaching or conferencing environment, it is imperative that the system neither impede the users' tasks physically nor require a cumbersome initialization procedure. This motivates us to consider robust video tracking systems that can be calibrated without user involvement and can track a user in the noisy environment typified by a classroom. This technology has important ramifications not only for distance education, but also for any application in which a human camera operator is typically required.
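As a rough sketch of marker-free, calibration-free tracking, the following uses OpenCV's adaptive background subtraction to localize the largest moving region, which a motorized camera could then follow. This is an illustrative stand-in, not the project's tracker; the camera index and thresholds are assumptions.

```python
import cv2

# Background subtraction needs no special clothing, markers, or manual
# calibration phase: the model of the static scene is learned online.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

cap = cv2.VideoCapture(0)  # classroom camera; device index is an assumption
while True:
    ok, frame = cap.read()
    if not ok:
        break

    mask = subtractor.apply(frame)
    # Suppress small noise (flicker, projection changes) before tracking.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Assume the largest moving region is the presenter.
        presenter = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(presenter)
        # A motorized camera would pan so the region's centre approaches
        # the frame centre; here we simply draw the detection.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("presenter", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```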

Combined with a high-speed network for low-latency, high-resolution transfer of video data, we envision a system that provides a remote audience with a close-up, live view of the instructor, at a fidelity similar to or better than that available to a physically present audience. To overcome the barrier of distance, two underlying technologies are imperative: (1) high-quality audio-visual communication and (2) computer mediation at each end of the channel, in order to select an appropriate view at all times. Computerized tracking of the presenter is one component of such mediation, but research must also be conducted to determine appropriate responses to various gestures.

[Video: video-based hand-raise detection (MP4)]
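The clip above demonstrates detecting a raised hand from video. As a rough sketch of such a detector (a modern stand-in, not the implementation shown in the clip), the heuristic below flags a raised hand when a wrist rises above the nose, using MediaPipe's single-person pose model; a classroom deployment would need a multi-person detector.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose


def hand_raised(landmarks) -> bool:
    """Heuristic: a wrist above the nose counts as a raised hand."""
    nose = landmarks[mp_pose.PoseLandmark.NOSE]
    left = landmarks[mp_pose.PoseLandmark.LEFT_WRIST]
    right = landmarks[mp_pose.PoseLandmark.RIGHT_WRIST]
    # Image coordinates grow downward, so "above" means a smaller y.
    return left.y < nose.y or right.y < nose.y


cap = cv2.VideoCapture(0)  # audience camera; device index is an assumption
with mp_pose.Pose(min_detection_confidence=0.5) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks and hand_raised(results.pose_landmarks.landmark):
            # Here the mediation layer could cue the instructor or zoom in.
            cv2.putText(frame, "hand raised", (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("hand-raise", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```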

For example, if a student raises her hand to ask a question, should the system provide an auditory or visual cue to the presenter, or simply zoom in on that student? These questions make sense only when the underlying video quality and the encoding/decoding latencies can support effective interaction, given appropriate views. Hence, the application must take place in the context of high-fidelity, low-latency communications, supported by high-bandwidth networks. Although formal experimental measures have yet to be determined, the effectiveness of the system can then be evaluated in terms of the number and classification of instructor-student interactions taking place in both the physically present class and the remote, network-supported class.
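To give a sense of the bandwidth such communication demands, a back-of-envelope calculation follows; the frame size, colour depth, frame rate, and compression ratio are illustrative assumptions, not the project's specification.

```python
# Back-of-envelope bandwidth for one video stream (illustrative numbers).
width, height = 640, 480      # pixels per frame
bits_per_pixel = 24           # RGB, 8 bits per channel
frames_per_second = 30

bits_per_second = width * height * bits_per_pixel * frames_per_second
print(f"Uncompressed: {bits_per_second / 1e6:.0f} Mb/s")        # ~221 Mb/s

# Even 20:1 compression leaves roughly 11 Mb/s per stream -- beyond
# commodity links, but comfortably within a high-bandwidth research
# network, which is why the application targets CA*net 3.
print(f"20:1 compressed: {bits_per_second / 20 / 1e6:.1f} Mb/s")
```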