Available Projects

The following projects are suitable for graduate students or strong undergraduate students, either as Honours Thesis projects or, in groups, as design projects. Team descriptions assume that the projects are undertaken by undergraduate students. Please read the FAQ and then contact me if you are interested in getting involved in any of these projects.

Wine Recommender
Within the context of an industrial collaboration, we are undertaking the design and prototype development of a recommendation system for wines that begins with limited user data, and over time, becomes tailored to the individual consumer's tastes and profiles of similar users. The objective is not only to offer recommendations that the user is likely to enjoy, but also to help educate users as to specific characteristics of the wines. Much of this project can be viewed as a conventional machine learning challenge, but there is an arguably even more important component that relates to the user experience. Thus, significant effort will be allocated to gaining an understanding of how the target audience for the app currently makes their wine selections, and ensuring that the app supports existing habits.
Sub-projects include:

  1. User modeling: Develop tools to fit individual users into general groupings based on their interest in and knowledge of wines, and on any initially known wine preferences, with minimal interaction requirements, e.g., a lightweight app enrollment process.
  2. Interaction design: Develop and test several interface iterations that offer wine recommendations based on the user's profile and current interests, in a manner that includes an explanation of the characteristics of the wine that make up part of the logic for particular recommendations.
  3. Exploratory recommendation engine: Develop a content-based filtering algorithm with a tunable exploration bias, that generates recommendations based on a user profile and input related to the user's immediate interests.
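
As a rough illustration of the third sub-project, the sketch below ranks wines by cosine similarity between a user profile vector and each wine's feature vector, then blends in a random term controlled by an exploration parameter. The feature encoding, the function names, and the linear blending rule are assumptions made for this example, not the project's actual design.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile, wines, k=3, exploration=0.0, rng=None):
    """Return the names of the top-k wines for a user profile.

    exploration is a hypothetical knob in [0, 1]: 0 ranks purely by
    similarity to the profile, 1 ranks purely at random.
    """
    rng = rng or random.Random()
    scored = []
    for name, features in wines.items():
        similarity = cosine(profile, features)
        score = (1 - exploration) * similarity + exploration * rng.random()
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

With exploration set to 0 the ranking is deterministic; raising it lets less similar wines surface occasionally, which is one simple way to support the educational goal described above.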

Project Team: 2-4 members
Requires strong software skills, design experience, and at least one member with mobile development experience. An Honours Thesis student could undertake either the first or last two components independently. Otherwise, the group will ideally consist of a mix of students with machine learning and user experience interests.

Wearable Haptics
Our high-level objective is to explore the design of wearable haptics as an interaction paradigm in everyday conditions, using wireless devices that can be attached to the body or inserted into regular clothing, capable of sensing human input and delivering richly expressive output to the wearer. In collaboration with an industrial partner, we are building on our experience with foot-ground sensing and actuation to explore this design space with a focus on foot-based haptics, for which potential applications span the gamut from rehabilitation therapy and sports training to information communication, virtual reality, and mobile gaming.
Sub-projects include:

  1. Building on our ongoing hardware prototyping efforts, which resulted in several generations of insoles capable of sensing and actuation to produce tactile patterns on the user's foot, the student will help develop the next iteration with more capable (voice-coil) actuators and additional output modalities, including audio, embedded in the insole, and possibly including prototyping of the inclusion of a graphical display to provide spatially coherent multimodal feedback to the user.
    Project Team: 1-2 members
    Requires strong electrical design experience, ideally suited to robotics club members.
  2. Pattern perception research, including signal generation design strategies, is needed to produce the haptic sensations experienced by users wearing these devices, whether in the form of discrete patterns of vibration ("tactons") or more continuous textures evocative of interaction with objects or ground surfaces. To support this objective, the student will help us build an initial version of haptic pattern authoring tools that allow the designer to specify haptic effects, abstracting some of the specifics of actuator locations and capabilities.
    Project Team: 1-2 members
    Requires strong software design and development experience, and interest in incorporating ideas from recent research literature.
  3. As an initial exploration of the potential for wearable haptic devices, the students will explore two compelling applications, one involving ground surface simulation, to replicate and enhance the capabilities of our haptic floor, and another involving gaming or training of a physical activity such as dance or sports.
    Project Team: 1-2 members
    Requires strong software design and development experience, and interest in conducting experiments with human participants.
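
To make the authoring-tool idea in sub-project 2 more concrete, here is a minimal sketch of how a tacton might be described as a sequence of sine-wave pulses and then routed to actuators by body region rather than by channel number. The `Pulse` fields, the sample rate, and the `ACTUATOR_LAYOUT` mapping are hypothetical and do not describe the actual insole hardware.

```python
import math
from dataclasses import dataclass


@dataclass
class Pulse:
    frequency_hz: float
    duration_s: float
    amplitude: float  # normalized drive level, 0..1


def render_tacton(pulses, gap_s=0.05, sample_rate=8000):
    """Render pulses as one list of samples, with silent gaps between pulses."""
    samples = []
    for i, pulse in enumerate(pulses):
        n = int(round(pulse.duration_s * sample_rate))
        for t in range(n):
            phase = 2 * math.pi * pulse.frequency_hz * t / sample_rate
            samples.append(pulse.amplitude * math.sin(phase))
        if i < len(pulses) - 1:
            samples.extend([0.0] * int(round(gap_s * sample_rate)))
    return samples


# Hypothetical layout: region name -> actuator channel indices.
ACTUATOR_LAYOUT = {"heel": [0], "toe": [1, 2]}


def route(samples, region, layout=ACTUATOR_LAYOUT, channels=3):
    """Copy the signal to the channels serving a region; silence the rest."""
    active = set(layout[region])
    return [[s if ch in active else 0.0 for s in samples] for ch in range(channels)]
```

A designer would then write something like `route(render_tacton([Pulse(250, 0.1, 1.0)]), "toe")` without needing to know which channels sit under the toes.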

Variable-Friction Surface Mechanism

We are exploring the use of in-shoe variable-friction mechanisms for rehabilitation, virtual reality, and gaming experiences. This project involves development, specification, prototyping and testing of a third-generation variable-friction surface mechanism for installation in an athletic shoe. The intent is to induce, under external control, the experience of foot-ground contact on a slippery surface, such as ice, to simulate natural walking conditions in potentially hazardous conditions. After refining the mechanism, our intent is to carry out experiments to verify whether such artificially induced slips exhibit similar biomechanical properties to real-world slip events, and ultimately, incorporate such hardware into rehabilitation and training environments.


Project Team: this project could be carried out by 2-4 members with strong mechanical design experience, ideally with a mechatronics background.

Development of a new haptic interface for the feet

Haptic perception through the feet informs a wide range of dynamic and static human activity. Stimulating the foot, for example to render virtual ground surface reactions, requires comparatively strong, and thus large actuators due to their placement between the ground and a human loading the foot. In stationary setups, actuators can be integrated into static assemblies at the ground surface. However, this is not feasible for mobile applications, for which the stimuli must be provided wherever the user happens to be. In such scenarios, delivery of sufficiently strong stimuli through conventional haptic actuators, such as voice coils, poses a significant challenge in terms of the associated electrical power requirements. Our project will implement new approaches to help render the stimuli with sufficient force, thereby overcoming this challenge.

Specifically, we would like to implement a variant of a design suggested by Berrezag, Visell and Hayward for an amorphous haptic interface for reproducing effects of compressibility and crushability (Berrezag et al., EuroHaptics 2012). Their design is based on two deformable chambers, made of oriented polymers, such as biaxially oriented polypropylene (BOPP), and connected by a conduit filled with magnetorheological (MR) fluid. By varying the viscosity of the fluid through changes to the applied magnetic field, the system can be used to render various haptic effects, simulating different textures and material behavior. Our proposed variant offers certain advantages, and has potential applications to rehabilitation, gaming, and VR.


Project Team: this project could be carried out by 1-2 members with strong mechanical design experience, ideally with a mechatronics background.

Perceptually Realistic Multimodal Walking
Our immersive CAVE environment provides compelling graphical, auditory, and haptic (sense of touch) effects related to foot-ground interaction during walking on various (simulated) ground surfaces, such as ice, snow, gravel, and water. The haptic effects are produced by a physics engine and delivered through the floor, driven by an array of vibrotactile actuators. This project will allow for seamless transitions between the various graphical environments through foot-based gestures, and will change the ground properties, including density (e.g., snow compaction) and friction, that the user experiences in each of these.
Sub-projects:

  1. Adding parameter control to a low-cost graphics engine recently developed for simulating water effects on a graphics display, overtop of the shoes, for simulating solid-liquid interaction. Integrating these results with the vibrotactile and acoustic effects rendered by our haptic shoes as the user steps into simulated liquids, e.g., puddles or pools of water.
    Project Team: 1-2 members
    Requires Android experience and strong programming skills; some experience with computer graphics libraries would be valuable.
  2. Port the Max/MSP and Pd patches responsible for rendering the haptic effects to an Android platform and integrate these with our new "haptic shoes" architecture, allowing the same effects to be delivered to users outside of a special lab environment.
    Project Team: 1-2 members
    Requires Android experience and strong programming skills; some experience with Max/MSP and/or Pd would be valuable.
  3. Rendering architecture updates and refinements for:
    1. geospatial navigation: update to the current version of Processing
    2. multitexture exploration: integration with motion capture and Unity shader coding to render footprints where users walk on various simulated ground surfaces such as snow, ice, and sand. This should additionally include publishing updated parameters to the haptics engine that controls the physical response of the floor in the impacted regions.
    3. stereographic rendering: using Unity's 3D capabilities for more immersive 3D environments
    Project Team: 1-2 members
    Requires Unity experience and strong systems skills, as the architecture involves many components, both hardware and software.

Mixed reality human-robot interaction for reduction of workplace injury
As part of a multi-site FRQNT-funded project, we are investigating the use of mixed reality in a human-robot interaction scenario to reduce the risk to workers arising from musculo-skeletal injury. The concept is to provide workers with an interface that adequately conveys the visual, auditory, and haptic cues to permit efficient manipulation and control of their tools, but in a safe manner.

Our recent efforts resulted in the development of a lightweight replica of the tool handle, equipped with sensors and actuators, that allows the user to manipulate a mixed-reality model of the actual tool. Graphical augmentation, using a CAD model of the tool, and optionally, video overlay through a see-through display, will give the operator a visual impression of how the tool is responding, while recorded or synthesized sound, measured forces, torques, and vibrations, acquired by sensors at the tool end, will be mapped to auditory and haptic feedback cues delivered at safe levels to the operator, facilitating effective operation while avoiding repetitive strain injury (RSI).

At present, manipulation of the handle is tracked with an optical motion capture system, and the mixed reality display is rendered through an Epson Moverio BT-200 display, but we are seeking to migrate to an Acer mixed-reality device, providing full-screen immersion and built-in motion tracking.

The project includes the following sub-tasks:

  1. instrumentation of actual tool with sensors for acquisition of force, torque, and vibration data
  2. reproduction of acquired sensor data at replica tool handle, complemented by task-specific graphical display
  3. improved rendering of tool state information, including graphical, auditory, and vibrotactile modalities, in the user's workspace
  4. integration of the 3D-printed tool handle with a force-feedback haptic device to enrich the perceptual experience of (tele-)manipulation of the actual tool; this will eventually be replaced by an actual cable-driven robotic assembly being developed by colleagues at Université Laval

Project Team: 3-4 members
Skills: Unity development experience, interest in virtual and mixed reality

Multimodal alarms for the OR and ICU

At present, the operating room (OR) and intensive care unit (ICU) are noisy environments, exacerbated by frequent alarms. Regardless of whether the alarms are valid or false, all command attention, raise stress, and are often irrelevant to the responsibilities of individual clinicians. To cope with these problems, this project investigates the possibility of using multimodal alarms, preserving audio for those alarms that should be announced to the entire team, but delivering certain alarm cues individually, through haptics (vibrations) to the feet.

As a first step, the project will involve designing and conducting an experiment to determine the degree to which both haptic and audio alarms can be learned and recognized in the context of other demanding activities, and to quantify the reaction times and accuracy in response to such cues, comparing unisensory auditory and multisensory auditory and haptic stimuli. We will employ the stop-signal reaction task (SSRT) and Profile of Mood States (POMS), before and after exposure to the paradigm, to quantify the fluidity of attentional decision-making and fatigue, respectively, under the unisensory and multisensory conditions.
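
As a sketch of the kind of analysis this experiment implies, the snippet below summarizes accuracy and mean reaction time per stimulus condition. The trial record layout, the condition labels, and the choice to average reaction times over correct trials only are assumptions for illustration; the actual analysis plan may differ.

```python
from collections import defaultdict
from statistics import mean


def summarize(trials):
    """Summarize trials given as (condition, correct, reaction_time_ms) tuples."""
    by_condition = defaultdict(list)
    for condition, correct, rt_ms in trials:
        by_condition[condition].append((correct, rt_ms))

    summary = {}
    for condition, rows in by_condition.items():
        # Assumption: reaction time is averaged over correct responses only.
        correct_rts = [rt for ok, rt in rows if ok]
        summary[condition] = {
            "accuracy": len(correct_rts) / len(rows),
            "mean_rt_ms": mean(correct_rts) if correct_rts else None,
        }
    return summary
```

Comparing the resulting entries for, say, an "audio" condition and an "audio+haptic" condition gives a first look at whether the multisensory cues change speed or accuracy, before any formal statistical testing.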

Through these experiments, we hope to determine preliminary guidelines for the number of distinct alarms that can be conveyed effectively through haptics, leading to a reduction in the demands on the audio channel. (This project is being conducted in collaboration with a US-based professor of Anesthesiology Critical Care Medicine.)
Project Team: 1-2 members
Students should have interest in human factors and experiment design, and general coding ability, in particular with scripting languages.

Intelligent Agent for the Visually Impaired: Vision-based scene description and contextual awareness for Autour.

Autour is an eyes-free mobile system designed to give blind users a better sense of their surroundings, using spatialized audio to reveal the kind of information that visual cues such as neon signs provide to sighted users. Sounds appear to come from locations surrounding the user, thereby conveying a sense of directionality and distance. This allows for parsimony of representation and less intrusive sound cues. Imagine the difference between a mechanical voice stating, "Restaurant, 50 meters, 60 degrees to your left" vs. a very short "Restaurant" spatialized in the correct direction.
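
To spatialize a sound in the correct direction, the system needs the distance to a point of interest and its angle relative to the direction the user is facing. The sketch below uses the standard haversine and initial-bearing formulas on a spherical Earth; the function names and the sign convention (negative means to the left) are choices made for this example and are not taken from the Autour code.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (distance in metres, compass bearing in degrees) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    # Initial bearing, measured clockwise from north.
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = math.degrees(math.atan2(y, x)) % 360
    return distance, bearing


def relative_azimuth(bearing_deg, heading_deg):
    """Signed angle in [-180, 180) from the user's heading; negative is to the left."""
    return (bearing_deg - heading_deg + 180) % 360 - 180
```

For a user facing north, a point of interest at a compass bearing of 300 degrees yields a relative azimuth of -60, matching the "60 degrees to your left" case in the example above.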

Our plans for enhancing the current system include the following sub-projects:
  1. Android port: While the core functionality of the iOS code, written in C++, runs through the JNI layer, UI-specific components of the app need to be rewritten for Android devices; the student(s) will also need to debug several critical run-time bugs in the JNI code.
  2. Improved dialogue management: rather than reading out information about nearby points of interest, continuously, until passed, we would like to empower Autour with the ability to learn from context (e.g., location, time of day, day of week) and from the user's feedback, what sort of information is of interest to the user, and what should be suppressed. In this manner, the intent is to help Autour become more of an intelligent assistant, rather than a naive automaton.
  3. Image-based environment description: Integration of a deep-learning image understanding layer, applied to the output of the smartphone camera, will provide verbal descriptions of the environment that can be spoken to users, enhancing their understanding of what's around them. This will complement the information available from Google Places and Foursquare databases. Coupled with an OCR capability, this will additionally serve to support indoor navigation, where GPS is unavailable.
  4. Other video-based functionality: Additional uses of smartphone video that we intend to integrate in Autour include guidance to doorway entrances, obstacle detection, sensing of traffic light status, and text recognition. Most of these capabilities can build on existing image processing libraries or well-documented algorithms. However, tuning the operation of such code and delivering the appropriate feedback for the mobile use context will require non-trivial additional effort.
  5. Comparison of spatial audio rendering strategies. The team will compare the efficacy of our existing naive HRTF audio display strategy with another filter bank implementation developed by our colleagues, and investigate changes to the spatialization algorithm as relevant for the use of bone-conduction headphones.
  6. Updated characterization of smartphone sensor accuracy: In 2012, we carried out a comprehensive test of the accuracy of the GPS, accelerometer, gyroscope, and magnetometer on three commodity smartphones, providing guidance to developers who rely on these sensors for their applications. Since then, the quality of smartphone sensors may well have improved, motivating a repeat of our original experiment.
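
One simple way to picture the dialogue-management goal in sub-project 2 is a per-context interest score that is nudged by user feedback and compared against a threshold before a point of interest is announced. The class below is only a sketch: the `InterestModel` name, the exponential update rule, and the default parameters are assumptions, and a real solution would likely use a richer learning method.

```python
from collections import defaultdict


class InterestModel:
    """Tracks how interested the user seems in each (category, context) pair."""

    def __init__(self, learning_rate=0.3, threshold=0.5):
        self.learning_rate = learning_rate
        self.threshold = threshold
        # Start at 1.0 so that everything is announced until the user objects.
        self.scores = defaultdict(lambda: 1.0)

    def feedback(self, category, context, liked):
        """Move the score toward 1 (liked) or 0 (disliked)."""
        key = (category, context)
        target = 1.0 if liked else 0.0
        self.scores[key] += self.learning_rate * (target - self.scores[key])

    def should_announce(self, category, context):
        return self.scores[(category, context)] >= self.threshold
```

Here the context could be any hashable summary of location, time of day, or day of week, for example the string "weekday-morning".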

Project Team: each sub-project could be carried out by 1-2 members of a design project team or by an Honours Thesis student. Most of these sub-projects would benefit from at least one member with smartphone development experience.

Real-Time Emergency Response
Real-Time Emergency Response (rtER) was a winner of the Mozilla Ignite challenge that called on teams to design and build apps for the faster, smarter internet of the future. Specifically, rtER allows emergency responders to collaboratively filter and organize real-time information including live video streams, Twitter feeds, and other social media to help improve situational awareness of decision makers in emergency response scenarios.

A concern related to the use of such a system in real emergency response scenarios is the danger of information overload and the associated demands on limited resources. To mitigate this concern, we are employing simple video summarization algorithms and techniques that match similar videos into clusters, thereby significantly reducing the amount of content that must be viewed in order to gain an understanding of an emergency situation. These mechanisms will be integrated into the rtER platform.
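
The clustering step can be pictured with a greedy threshold scheme: each video is represented by a feature vector and joins the first cluster whose representative is similar enough, otherwise it starts a new cluster. This is a simplified stand-in rather than the algorithm used in rtER; the feature vectors, the cosine measure, and the threshold value are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_videos(features, threshold=0.9):
    """Group video ids; the first id in each cluster acts as its representative."""
    clusters = []
    for video_id, vector in features.items():
        for cluster in clusters:
            representative = features[cluster[0]]
            if cosine_similarity(vector, representative) >= threshold:
                cluster.append(video_id)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([video_id])
    return clusters
```

A reviewer would then only need to watch one representative per cluster, which is the reduction in viewing load described above.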

We would like to incorporate additional capabilities as summarized in the following sub-projects:

  1. updates to iOS port of mobile client
  2. integration of audio communication and recording from mobile client streams
  3. migrating our existing HLS-based streaming architecture to the newer Dynamic Adaptive Streaming over HTTP (DASH) standard
  4. implementing a security layer involving user accounts, access control, secure identification of video streams, and HTTPS encryption
  5. enhancing the existing visualization architecture to better integrate live video and embed emergency-related information display
  6. investigating feature-matching for dynamic video mosaicing from moving or multiple video streams
  7. developing the infrastructure to support crowdsourced video analytics
  8. implementing chronology-based "event timelines" that allow viewers to scroll back to previous states during review of an emergency or crisis event

Project Team: 1-4 members
Skills: strong software development ability, plus specific skills as relevant to the individual sub-project(s)

Augmented Reality Tools for Enhanced Training of First Responders
This project is intended to equip firefighters with a heads-up display (similar to Google Glass) that provides them with valuable information related to their task, e.g., pointers to the nearest exit point and a breadcrumb trail indicating the path taken to the present location. The system was developed initially with support from the Mozilla Gigabit Community Fund and trialled with firefighters in a simple training scenario. Recent updates have integrated indoor positioning information, along with other sensor data from the TI SensorTag. Now, these data must be integrated to render the appropriate view of virtual content, overlaid correctly with the real-world scene. Sub-projects include:

  1. Building 3D indoor maps using the Project Tango tablet and leveraging this information for improved accuracy of indoor position and visualization of environment in low-visibility conditions.
  2. Incorporating position and orientation knowledge to render the relevant virtual information, including maps, waypoints, beacons, exit markers, and locations of other responders, as a see-through augmented reality display.
  3. Integrating new interactivity, allowing the firefighters to share information through the system, correlate their position with a map display, mark locations within the environment, and access additional data from external sensors such as the TI SensorTag.

Project Team: 1-3 members
Skills: strong software development ability, in particular on Android platform (for our augmented reality display); computer graphics experience would be highly desirable

Enhanced Remote Viewing Capabilities from a Camera Array
Our camera array architecture, initially developed for remote viewing of surgical (medical) procedures, provides real-time viewpoint interpolation capabilities, allowing users to look around the scene as if physically present. We are now interested in applying this architecture to more general video-mediated activities, including face-to-face videoconferencing, and exploring the potential to leverage mobile interaction with the array in a manner that compensates for the limited screen real estate of mobile devices.

Sub-project 1: Camera Array Enhancement
The students will first become acquainted with the existing software and hardware architecture, and then progress to designing, implementing and testing solutions to one or more of the following problems:

  1. rendering a 3D stereoscopic display from the dual-interpolated views that the array can generate
  2. implementing a dynamic camera-swapping selection mechanism based on the current set of requested viewpoints to achieve maximum quality and smoothness as users move between views
  3. evaluating the limits of scalability of competing rendering architectures, in which interpolation is carried out client-side rather than server-side
  4. incorporating recording functionality so that a sufficient subset of the camera outputs can be saved for on-demand viewing of the sessions at a later date

Project Team: 1-2 members
Skills: strong C programming experience and systems skills

Sub-project 2: Mobile Telepresence Experiment
This sub-project will examine the qualitative experience of telepresence when using a smartphone display as a mobile window into the remote environment by preparing and conducting an experiment comparing the mobile telepresence capabilities with a pan-tilt-zoom camera, and a fixed large-screen display. Tasks include:

  1. Acquisition of test video footage from multiple calibrated high-definition video sources.
  2. Configuring and testing the existing software architecture to make use of the pre-recorded video sources, which may involve direct retrieval of uncompressed data from RAM or possibly real-time decoding of compressed video from SSD.
  3. Recording of the actual experimental video content, which will involve some human activity that must be "judged" by the experiment participants.
  4. Implementing the virtual pan-tilt-zoom camera to support one of the experiment conditions.
  5. Carrying out the user study and evaluating the results

Project Team: 1-2 members
Skills: systems experience, good programming knowledge, interest in human-computer interaction and experimental studies

Past Projects

Video tagger and classifier UI
We developed the infrastructure for a web-based video tagging interface and a prototype object-and-event detector. Together, these tools could allow for both manual and automated tagging of video clips in popular repositories such as YouTube and Vimeo. We now wish to build on this architecture by improving the toolkit of object and event detection capabilities that can be tailored for a variety of general-purpose video analytics purposes. Our long-term objective is to combine these tools with user feedback on the automated detection to train more complex recognizers using machine learning techniques.

Open Orchestra
The orchestral training of professional and semi-professional musicians and vocalists requires expensive resources that are not always available when and where they are needed even if the funding for them were made available. What is needed is the musical equivalent of an aircraft simulator that gives the musician or vocalist the very realistic experience of playing or singing with an orchestra. The purpose of making this experience available through a next generation network-enabled platform is to provide the extensive tools and resources necessary at very low cost and wherever there is access to a high speed network.


Health Services Virtual Organization
The HSVO aims to create a sustainable research platform for experimental development of shared ICT-based health services. This includes support for patient treatment planning as well as team and individual preparedness in the operating room, emergency room, general practice clinics, and patients' bedsides. In the context of the Network-Enabled Platforms program, the project seeks to offer such support to distributed communities of learners and health-care practitioners. Achieving these goals entails the development of tools for simultaneous access to the following training and collaboration resources: remote viewing of surgical procedures (or cadaveric dissections), virtual patient simulation involving medical mannequins and software simulators, access to 3D anatomical visualization resources, and integration of these services with the SAVOIR middleware along with the Argia network resource management software.


Simulating a Food Analysis Instrument
We build on HTML5 and other web-related technologies to implement a simulator used for teaching the use of a spectrometer for the detection of food bacteria (e.g., in yogurt, milk, or chicken). Accurate detection of these bacteria is an important topic in the food industry, which directly impacts on our health and wellbeing. Importantly, making such simulators available through the web allows access to the underlying pedagogical content and training of students in third-world countries, where the Internet is available, but qualified educators are in short supply. Hands-on experience with a simulator of sufficient fidelity, especially one designed with instructional case scenarios, can provide invaluable educational and training opportunities for these students that would not otherwise be possible. In our simulator, the student is presented with a case scenario of food poisoning in a Montreal restaurant, and is then given the task of analyzing a food sample. Various options are presented, ranging from watching a brief documentary of the operation of the machine, to a guided set of steps that the student is invited to perform in a laboratory to solve the task. Users can directly control the knobs and buttons of the simulated spectrometer and are provided with a rich visual experience of the consequences of their actions, as the appropriate video clip is played back (forward or in reverse, e.g., to illustrate the effects of a switch being turned off).


3D Visualization and Gestural Interaction with Multimodal Neurological Data
This project deals with the challenges of medical image visualization, in particular within the domain of neurosurgery. We wish to provide an effective means of visualizing and interacting with data of the patient's brain, in a manner that is natural to surgeons, for training, planning, and surgical tasks. This entails three fundamental objectives: advanced scientific visualization, robust recognition of an easily learned and usable set of input gestures for navigation and control, and real-time communication of the data between multiple participants to permit effective understanding and interpretation of the contents. The required expertise to accomplish these tasks spans the areas of neurosurgery, human-computer interaction, image processing, visualization, and network communications.


Mobile Game Device for Amblyopia Treatment
Amblyopia is a visual disorder affecting a significant proportion of the population. We are developing a prototype device for assessment and treatment of this condition, based on a modified game application running on a compact autostereoscopic display platform. By sending a calibrated "balanced-point" representation to both eyes, we aim for a therapeutic process that gradually engages the weaker eye in the visual process. The adaptation of this approach from a lab-based and controlled environment to a portable device for daily use has the potential to make amblyopia treatment more accessible.


Enhanced Virtual Presence and Performance
This project will enhance the next generation of virtual presence and live performance technologies in a manner that supports the task-specific demands of communication, interaction, and production. The goals are to: improve the functionality, usability, and richness of the experience; support use by multiple people, possibly at multiple locations, engaged in work, artistic performance, or social activities; and avoid inducing greater fatigue than the alternative (non-mediated) experience. This work builds on recent activities in Shared Spaces and the World Opera Project.


World Opera
Can opera be performed if the opera singers are standing on different stages in different time zones in different countries? This question is at the heart of the World Opera Project, a planned joint, real-time live opera performance to take place simultaneously in several Canadian, U.S. and European cities. The project is envisioned as a worldwide opera house located in cyberspace.


Underwater High Definition Video Camera Platform
The Undersea Window transmits live full broadcast high definition video from a camera on the undersea VENUS network, 100 m below the surface of the Saanich Inlet on Vancouver Island, to scientists, educators and the public throughout Canada and around the world via CA*net 4 and inter-connected broadband networks. The project will serve as a test bed for subsequent high definition video camera deployment on the NEPTUNE network in the Pacific Ocean. We subsequently worked on the development of Web services software that matches a common set of underwater video camera control inputs and video stream outputs to the bandwidth available to a particular scientist and allows scientists to collaborate through sharing the same underwater view in real time. We then produced a web-based video camera user interface that makes use of the controls and features available through these web services. In addition, we tested an existing automated event detection algorithm for possible integration into the "live" system.


Adaptive streaming for Interactive Mobile Audio
This work involves evaluation of audio codec quality in the context of end-to-end network transmission systems, development of adaptive streaming protocols for wireless audio with low latency and high fidelity characteristics, and testing of these protocols in real-world settings. Our freely downloadable streaming engine, nStream, is available for Linux, OS X, and Gumstix platforms.


Augmented Reality Board Games
As novel gaming interfaces increase in popularity, we are investigating the possibilities afforded by augmenting traditional game play with interactive digital technology. The intent is to overcome the physical limitations of game play to create new, more compelling experiences, while retaining the physicality, social aspects, and engagement of board games.


Natural Interactive Walking (aka Haptic Snow)
This project is based on the synthesis of ground textures to create the sensation of walking on different surfaces (e.g. on snow, sand, and through water). Research issues involve sensing and actuation methods, including both sound and haptic synthesis models, as well as the physical architecture of the floor itself.


Audioscape: Mobile Immersive Interaction with Sound and Music
This project involves the creation of a compelling experience of immersive 3D audio for each individual in a group of users, located in a common physical space of arbitrary scale. The architecture builds upon our earlier immersive real-time audiovisual framework: a modeled audio performance space consisting of sounds and computational sound objects, represented in space as graphical objects. Current and planned activities include experimentation with different technologies for low-latency wireless audio communication, a large-scale augmented reality environment to support immersive interaction, and embedding of 3D video textures (e.g., other human participants) into the displayed space.


User interface paradigms for manipulation of and interaction with a 3D audiovisual environment
We would like to develop an effective interface for object instantiation, position, view, and other parameter control, which moves beyond the limited (and often bewilderingly complex) keyboard and mouse devices, in particular within the context of performance. The problem can be divided into a number of actions (or gestures that the user needs to perform), the choice of sensor (to acquire these input gestures), and appropriate feedback (to indicate to the user what has been recognized and/or performed).


Evaluation of Affective User Experience
The goal of this project is to develop and validate a suite of reliable, valid, and robust quantitative and qualitative, objective and subjective evaluation methods for computer game, new media, and animation environments that address the unique challenges of these technologies. Our work in this area at McGill spans biological and neurological processes involved in human psychological and physiological states, pattern recognition of biosignals for automatic psychophysiological state recognition, biologically inspired computer vision for automatic facial expression recognition, physiological responses to music, and stress/anxiety measurement using physiological data.


Automatic multi-projector calibration
Multiple video projectors can be used to provide a seamless, undistorted image or video over one or more display surfaces. Correct rendering requires calibration of the projectors with respect to these surface(s) and an efficient mechanism to distribute and warp the frame buffer data to the projectors. Typically, the calibration process involves some degree of manual intervention or embedding of optical sensors in the display surface itself, neither of which is practical for general deployment by non-technical users. We show that an effective result can in fact be achieved without such intervention or hardware augmentation, allowing for a fully automatic multi-projector calibration that requires nothing more than a low-cost uncalibrated camera and the placement of paper markers to delimit the boundaries of the desired display region. Both geometric and intensity calibration are performed by projection of Gray-coded binary patterns, observed by the camera. Finally, the frame buffer contents for display are distributed in real time by a remote desktop transport to multiple rendering machines, connected to the various projectors.
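As a rough illustration of the Gray-coded patterns mentioned above, the following sketch generates one stripe pattern per bit of the Gray code of each projector column and then recovers a column index from the bits a camera pixel would observe. The function names and the column-only (1D) layout are assumptions made for this example; they are not taken from the project's software.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by folding the shifted code back in with XOR."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def column_patterns(width: int) -> list[list[int]]:
    """One binary stripe pattern per Gray-code bit, most significant bit first.

    patterns[k][x] is 1 if projector column x is lit in pattern k.
    """
    bits = max(1, (width - 1).bit_length())
    return [
        [(to_gray(x) >> (bits - 1 - k)) & 1 for x in range(width)]
        for k in range(bits)
    ]


def decode_column(observed_bits: list[int]) -> int:
    """Recover the projector column from the bits seen at one camera pixel."""
    g = 0
    for b in observed_bits:
        g = (g << 1) | b
    return from_gray(g)
```

Because the Gray codes of neighbouring columns differ in exactly one bit, a camera pixel that straddles a stripe boundary can be misread in at most one pattern, which is the usual reason for preferring Gray codes over plain binary stripes.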


Virtual Rear Projection
We transform the walls of a room into a single logical display using front-projection of graphics and video. The output of multiple projectors is pre-warped to correct misalignment and the intensity reduced in regions where these overlap to create a uniformly illuminated display. Occlusions are detected and compensated for in real-time, utilizing overlapping projectors to fill in the occluded region, thereby producing an apparently shadow-free display. Ongoing work is aimed at similar capabilities without any calibration steps as well as using deliberately projected graphics content on the occluding object to augment interaction with the environment.
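A minimal sketch of the overlap and occlusion handling described above, assuming a simplified model in which every projector that can reach a pixel contributes an equal share of the intensity. The function name `blend_weights` and the boolean coverage representation are hypothetical; the sketch also ignores projector response curves and smooth edge ramps.

```python
def blend_weights(coverage):
    """Per-projector intensity weights for each display pixel.

    coverage[p][i] is True when projector p can light pixel i without
    obstruction. Each lit pixel's weights sum to 1, so overlapping regions
    are dimmed per projector and occluded projectors are compensated for
    by the remaining ones.
    """
    n_pixels = len(coverage[0])
    weights = [[0.0] * n_pixels for _ in coverage]
    for i in range(n_pixels):
        active = [p for p, cov in enumerate(coverage) if cov[i]]
        for p in active:
            weights[p][i] = 1.0 / len(active)
    return weights
```

Marking a pixel as not covered by one projector (for example, because a person is standing in its beam) shifts the full weight for that pixel to the other projectors that still reach it.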


Efficient Super-Resolution Algorithms
Super-resolution attempts to recover a high-resolution image or video sequence from a set of degraded and aliased low-resolution ones. We are working on efficient preconditioning methods that accelerate super-resolution algorithms without reducing the quality of the results achieved. These methods apply equally to image restoration problems and compressed video sequences, and have been demonstrated to work effectively for rational magnification factors.


Dynamic Image Mosaicing with Robustness to Parallax
Image mosaicing is commonly used to generate wide field-of-view results by stitching together many images or video frames. Existing methods are constrained by the camera motion model and by the amount of overlap required between adjoining images. For example, they cope poorly with parallax introduced by general camera motion, translation in non-planar scenes, or cases with limited overlap between adjacent camera views. Our research aims to resolve these limitations effectively to support real-time video mosaicing at high resolution.


Dynamic View Synthesis
Acquiring video of users in a CAVE-like environment and regenerating it at a remote location poses two problems: segmentation, the extraction of objects of interest, i.e., people, from the background, and arbitrary view generation or view synthesis, to render the video from an appropriate virtual camera. As our background is dynamic and complex, naive segmentation techniques such as blue screening are inappropriate. However, we can exploit available geometric information, registering all background pixels with the environment empty and then, during operation, determine whether each pixel corresponds to the background through color consistency tests. Our view synthesis approach is to build a volumetric model through an efficient layered approach, in which input images are warped into a sequence of planes in the virtual camera space. For each pixel in each plane, we determine its occupancy and color through color consistency, using this to compose the novel image in a back-to-front manner.


Machine Learning Techniques for Closed-Loop Gestural Interaction
This project seeks to model the dynamics of movement for the purpose of sensory motor interaction design. The goal is to learn continuous models of movement or gesture, capturing the most salient features of the dynamics as well as the normative ranges of variability, and to do so in a way that facilitates using the movement models in closed loop interaction. The idea is to facilitate the acquisition and use of internal models of the dynamics in question on the part of users. Two main approaches are being explored: The learning of movement primitives by a kind of parametric semi-Bayesian nonlinear dynamical system (based on the Dynamic Movement Primitives of Ijspeert, Schaal, and Nakanishi), and the modeling of movement by nonparametric Bayesian dynamical systems. The novel aspect is the tight integration of statistical models with nonvisual feedback designed to aid interaction.


High-Resolution Video Synthesis from Mixed-Resolution Video
To increase the frame rate at high resolution of CMOS image sensors, we propose using their non-destructive read-out capabilities to simultaneously generate high-resolution frames H at frame rate h and low-resolution frames L at frame rate l > h. Our method applies an image-processing algorithm to both sequences in order to synthesize a high-resolution video sequence S, at high frame rate l, containing the high-resolution details and the low-resolution motion dynamics. A motion evaluation algorithm is used to evaluate pixel motion in a coarse manner between the last interpolated (synthesized) high-resolution frame S_{t-1} and the current low-resolution frame L_t generated by the camera.


Automated Door Attendant
The ADA is an interactive agent that serves the role of a simplified secretary, tailored for a university environment. The agent greets visitors with a "talking head," takes messages, schedules appointments, and allows the browsing of selected documents. Components include a video monitor, speaker, microphone, and camera. The attendant is presently being augmented with an animated face that allows for dynamic control of its movement in order to simulate the acts of speaking, turning to look in the direction of a visitor, and even yawning. We wish to carry out such control of the head as appropriate to the activity currently taking place.


Peripheral Communications
We consider two problems related to communication between geographically distributed family members. First, we examine the problem of supporting peripheral awareness, in order to improve both emotional well-being and awareness of family activity. This is based on a field study to determine the role and importance of various peripheral cues in different aspects of everyday activities. The results from the study were used to guide the design of our proposed augmented communications environment. Second, we consider the choice of mechanism to facilitate the on-demand transition to foreground communication in such an environment. The design suggests an expansion of Buxton's taxonomy of foreground and background interaction technologies to encompass a third class of peripheral communications.


Disparity from contour for object segmentation with occlusion
A new disparity-based segmentation method is proposed that exploits the static 3D geometry of a background and produces disparity-embedded object contours, which can be used to separate objects via a multi-histogram scheme. This method does not require identical cameras or frame-by-frame full stereo reconstruction. It has low computational cost and can be applied to various vision applications that require object segmentation as a first processing step. Experimental results show that the proposed method is able to segment multiple objects despite occlusions.


Hierarchical Image Coding and Region of Interest Selection
We are developing low-complexity hierarchical encoding algorithms that provide modest data reduction at low cost for transmission over computer networks. A key feature is that the encoding is progressive, permitting truncation of the data stream at an arbitrary position with reduction in image quality rather than loss of content. On a related theme, we note that transmission of the entire data content of a video stream does not take into account the potentially diverse interests or capabilities of heterogeneous clients nor the relative importance of different components of the scene. Assuming operation on a multicast network, the challenge here is to ensure that individual client requests are balanced against overall system constraints, such as total available server bandwidth and the limit on the number of multicast channels. Our long-term goal is for such region selection to be automated with the assistance of intelligent agents, possibly given some hints from the user, for example, "I'm interested in this person's face" or "follow that object."
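To illustrate what "progressive" means here, the sketch below uses simple bit-plane ordering as a stand-in: the most significant bit of every pixel is sent first, so cutting the stream early yields a coarser image rather than a missing region. This is an assumed, generic scheme chosen for brevity, not the hierarchical encoder developed in the project, and the function names are hypothetical.

```python
def encode_bitplanes(pixels, bits=8):
    """Split pixel values into bit planes, most significant plane first."""
    return [[(p >> (bits - 1 - k)) & 1 for p in pixels] for k in range(bits)]


def decode_bitplanes(planes, n_pixels, bits=8):
    """Rebuild pixel values from however many planes were received.

    Missing low-order bits are replaced by the midpoint of the unknown
    range, so the error shrinks as more planes arrive.
    """
    values = [0] * n_pixels
    for k, plane in enumerate(planes):
        shift = bits - 1 - k
        for i, b in enumerate(plane):
            values[i] |= b << shift
    missing = bits - len(planes)
    if missing > 0:
        half = 1 << (missing - 1)
        values = [v + half for v in values]
    return values
```

Decoding only the first three planes of an 8-bit image, for instance, places every pixel within 16 grey levels of its true value, while decoding all eight planes reproduces the image exactly.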


Interaction Paradigms in a Large Screen Environment
Virtual interaction metaphors for two-handed control have been studied in the past primarily in terms of speed and efficiency. We concentrate our analysis instead on the cognitive effects such metaphors have on users within a large screen environment. Based on a series of experiments we determine how best to manage the division of labour between hands in order to minimize conceptual error. Empirical evidence suggests that the proficiency of bimanual paradigms, such as toolglasses or pieglasses, varies according to a number of factors, for instance the amount of effort required by the non-preferred hand.


Parsing and Interpreting Gestures in a Multimodal Virtual Environment
Human-computer interaction based on the traditional input mode of keyboard and mouse fails to scale to the demands of large immersive environments, where users may be standing and moving about the space. Instead, we propose a gestural interaction paradigm in which users employ physical gestures to communicate their intentions. We are developing a framework for the acquisition and parsing of such gestures, using input from either video camera, data glove, or computer mouse (as a prototype). The architecture is fully configurable through XML files and uses a common data type in order to facilitate integration with other software components distributed over the network.


Statistical Multi-Object Tracking
We are developing a generic object tracker capable of following, in real-time, multiple objects in a dynamic, real-world, possibly cluttered environment, in which lighting levels can change dramatically, for example, a classroom where the instructor walks in front of a projection screen. Our tracker uses a combination of movement detection and statistical feature extraction to locate and maintain objects within the camera's field of view. A final step matches the various features found in the current image with the objects previously identified by the system.


Hand and Fingertip Tracking for Gesture Recognition
In augmented reality environments, traditional input interfaces such as the keyboard-mouse combination are no longer adequate. We turn, instead, to gestural language, long an important component of human interaction, employing computer vision techniques to perform hand tracking and gesture recognition. Our approach employs edge detection for foreground segmentation and tracks the wrist location with a particle filter. Based on the wrist location and orientation, we then determine the positions of the fingertips, exploiting their semi-circular shape by modelling the fingertip extremities as a circular arc. The fingertips can be located by looking for maximal responses of a circular Hough transform, applied to the hand boundary image, followed by several heuristic tests to filter out false positives and duplicate detections.
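The circular Hough transform step can be sketched as follows, assuming a single known radius and a list of boundary pixel coordinates as input. Each boundary point votes for every cell that could be the centre of a circle of that radius passing through it, and the strongest cells are the candidate centres. The function name, the fixed radius, and the one-degree angular sampling are choices made for this example, not details of the project's tracker.

```python
import math
from collections import Counter


def hough_circle_centers(edge_points, radius, angle_steps=360):
    """Accumulate votes for circle centres at a fixed radius.

    Returns a Counter mapping (cx, cy) cells to the number of edge points
    that lie on a circle of the given radius around that cell.
    """
    votes = Counter()
    for x, y in edge_points:
        cells = set()  # each edge point votes at most once per cell
        for k in range(angle_steps):
            theta = 2.0 * math.pi * k / angle_steps
            cells.add((round(x - radius * math.cos(theta)),
                       round(y - radius * math.sin(theta))))
        votes.update(cells)
    return votes
```

A fingertip contributes only an arc rather than a full circle, so its centre receives fewer votes than a complete circle would; heuristic tests such as those mentioned above are then needed to reject weak or duplicate peaks.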


Stochastic Parsing with Semantic Constraints in Multimodal Interaction
This project uses typed feature structures and syntactic/semantic constraints to interpret user actions through arbitrary modes such as speech, gesture, and handwriting. To this end we have developed a unique parsing algorithm that takes advantage of this approach to search through partially specified hierarchical descriptions of user activity. This algorithm is the core of a larger multimodal framework that can generically incorporate many existing techniques in multimodal interaction such as temporal constraints, prosodic effects, and dialogue management. We intend to demonstrate these capabilities in a handful of applications, among them a simple multimodal game and a multimodal map navigation system.


Parallel Distributed Camera Arrays
To provide more robust and efficient object tracking for Intelligent Environments, we are working with colleagues to create a set of networked low-cost camera arrays that collectively provide high resolution and large field-of-view image processing capabilities. Our approach involves the development of a number of novel technologies, such as smart cameras with on-board reconfigurable image processing and network communication capabilities, techniques for cooperative parallel distributed image processing that are suitable for multi-camera image data, and techniques for reconstruction of arbitrary viewpoints from a network of video cameras viewing a scene. Our present efforts are aimed at developing algorithms to support an array of cameras for parallel distributed processing of image sequences. This involves synchronized video acquisition, monocular processing of the individual images, stereo processing of nearby pairs, matching and triangulation for depth extraction, and finally, integration of the stereo information from multiple pairs to generate a rich model of the objects.


Camera Calibration Methods
We conducted a thorough study investigating the effects of training data quantity, pixel coordinate noise, training data measurement error, and the choice of camera model on camera calibration results. The study includes a detailed comparison of various camera models, in order to determine the relative importance of the various radial and decentering distortion coefficients. While Tsai's world-reference based method yielded the most accurate results when trained on data of low measurement error, this, however, is difficult to achieve in practice without an expensive and time-consuming setup. In contrast, Zhang's planar calibration method, although sensitive to noise in training data, requires only relative measurements between adjacent calibration points, which can be accomplished accurately with trivial effort, suggesting that in the absence of sophisticated measurement apparatus, this may easily outperform Tsai's method.
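For readers unfamiliar with the radial and decentering coefficients compared in such studies, the sketch below applies one widely used parameterization (the Brown-Conrady model, with two radial terms k1, k2 and two decentering terms p1, p2) to a point in normalized image coordinates. The exact camera models evaluated in the study are not specified here, so this particular form and the function name are assumptions for illustration.

```python
def distort(x, y, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Map an ideal normalized image point to its distorted position.

    k1, k2 are radial coefficients; p1, p2 are decentering (tangential)
    coefficients. With all coefficients zero the point is unchanged.
    """
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd
```

Dropping a coefficient from the model corresponds to fixing it at zero, which is one way to compare the relative importance of individual terms.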


Recording Studio that Spans a Continent
On Saturday September 23, 2000, a jazz group performed in a concert hall at McGill University in Montreal and the recording engineers mixing the 12 channels of audio during the performance were not in a booth at the back of the hall, but rather in a theatre at the University of Southern California in Los Angeles.


Intelligent Classroom Project
Classroom presentation technology was augmented with sensors, wired to computers for context-sensitive processing. Now, rather than require manual control, the room activates and configures the appropriate equipment automatically, in response to instructor activity. For example, when an instructor logs on to the computer, the system infers that a lecture is being started, automatically turns off the lights, lowers the screen, turns on the projector, and switches the projector to computer input. The simple act of placing an overhead transparency on the document viewer causes the slide to be displayed and the room lights adjusted to an appropriate level. Similarly, audiovisual sources such as the VCR or laptop computer output are displayed automatically in response to activation cues. Together, these mechanisms assume the role of skilled operator, taking responsibility for the low-level control of the technology, thereby freeing the instructor to concentrate on the lecture itself, rather than the user interface.


RoboCup Legged Competition
From 1999 through 2002, McGill was the only Canadian university and one of only four North American schools to participate in the Sony Legged league of the RoboCup Competition. This competition pitted our Sony legged robots against teams from other universities in a "cat-eat-cat" test of artificial intelligence and soccer skills.


Phidgets Interface
Based on the work of Greenberg and Fitchett, a project group designed and prototyped an elegant, USB-based I/O system to allow for easy and rapid development of software that interfaces to analog and digital inputs, digital outputs, and stepper motor control. The software environment surrounding this system was initially limited to running under Visual Basic on Windows systems but we are now extending the libraries with more advanced graphical capabilities and porting the system to Linux.


GraffitiBoard
The GraffitiBoard is a wall-sized computer display that tracks the position of a pointer (such as a user's finger) and displays the resulting penstrokes as if the user were writing on the wall. A video projector produces the displayed image while a video camera captures the user's actions. By applying a simple colour tracking algorithm or a more complex cross-correlation technique, it is possible to recognize certain actions and respond accordingly. For example, if the user's hand is placed on the wall, a palette with various painting options can be generated at that location. For our demonstration program, we use both colour tracking and correlation techniques to track the movement of the user's finger and draw pictures and letters.


UbiVCR speech interface
This project uses speech recognition software and video overlay text messages to provide an intuitive VCR interface. Current projects include rebuilding a Perl script that generates an electronic TV guide from the web, improving the grammar to deal with context-sensitive help, and running a formal experiment comparing the UbiVCR with other VCR-programming methods.

Millenium Exhibit
This project involved the development of two components of a fictitious house of the future for the Ontario Science Center. The exhibit consists of a dining room and living room scenario. Each room reacts to user activity, utilizing information from video cameras, voice recognition, and various low-level sensors, providing output through synthesized speech, audio and video clips.


Reactive Room
This project (1993-1995) developed a state of the art videoconferencing facility, augmented with various sensors, which reacted to user activity by automatically selecting appropriate configurations of audio and video sources. The system infers the intentions of users and reacts accordingly, allowing them to conduct both local and videoconference meetings, making full use of the presentation technology (document camera, VCR, digital whiteboard) without needing to interact with the computer.


Adaptive File Distribution Protocol
AFDP is a protocol for the efficient and reliable distribution of large files to many hosts on a LAN or internetwork. The protocol is built on top of UDP, and uses a rate-based flow control mechanism following the publishing metaphor.
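Rate-based flow control, in contrast to window-based schemes, paces packets at a target sending rate and adjusts that rate from receiver feedback. The sketch below shows the general idea under assumed rules (additive increase, multiplicative decrease on reported loss); the thresholds, step sizes, and function names are hypothetical and are not taken from the AFDP specification.

```python
def packet_interval(rate_bps, packet_bytes):
    """Seconds to wait between packets to hold the target bit rate."""
    return packet_bytes * 8 / rate_bps


def adjust_rate(rate_bps, loss_fraction, min_rate=64_000,
                max_rate=100_000_000, loss_threshold=0.01,
                step_bps=100_000, backoff=0.5):
    """Return the next sending rate given the loss reported by receivers.

    Assumed policy: back off multiplicatively when loss exceeds the
    threshold, otherwise probe upward by a fixed step, clamped to
    [min_rate, max_rate].
    """
    if loss_fraction > loss_threshold:
        new_rate = rate_bps * backoff
    else:
        new_rate = rate_bps + step_bps
    return max(min_rate, min(max_rate, new_rate))
```

With many receivers, a sender would typically feed the worst reported loss into `adjust_rate` so that the slowest host is not overrun, though that policy is also an assumption of this sketch.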


NOVICE: Neural network robotic control
A robotic system using simple visual processing and controlled by neural networks was developed. The robot performs docking and target reaching without prior geometric calibration of its components. All effects of control signals on the robot are learned by the controller through visual observation during a training period, and refined during actual operation. Minor changes in the system's configuration result in a brief period of degraded performance while the controller adapts to the new mappings.


Last update: 3 September 2017