Disparity from contours for object segmentation with occlusions

This work was the research of Ph.D. student Wei Sun.


Object segmentation in a static or dynamic environment is a necessary first step for person or gaze tracking, gesture recognition, and various human-computer interaction applications.

Most systems are based on 2D object segmentation using a single camera. There are two general approaches: one treats each image in a sequence as static, and the other takes into account the temporal information in the sequence. Static methods include background subtraction and appearance modeling based on Principal Component Analysis or wavelets. Color, edge, and shape information has also been used, but is often coupled with motion information. The simplest and most efficient temporal approach is frame differencing. A more sophisticated method involves computing optical flow. Most of these approaches have one of two drawbacks. Some, such as background subtraction and frame differencing, rest on oversimplified assumptions about objects and background and have difficulty dealing with occlusions between multiple objects. Others, such as appearance models and optical flow, are computationally expensive.
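
To make the two simplest methods concrete, the following is a minimal sketch, assuming 8-bit grayscale frames stored as NumPy arrays. The function name change_mask and the threshold value are illustrative choices, not part of this work. Background subtraction compares a frame against a fixed background image, while frame differencing compares it against the previous frame.

```python
import numpy as np

def change_mask(reference, frame, threshold=25):
    """Mark pixels whose intensity differs from the reference by more than threshold.

    Hypothetical helper for illustration: with a static background image as the
    reference this is background subtraction; with the previous frame as the
    reference it is frame differencing.
    """
    # Widen to a signed type so the uint8 subtraction cannot wrap around.
    ref = reference.astype(np.int16)
    cur = frame.astype(np.int16)
    return np.abs(cur - ref) > threshold

# Synthetic scene: a uniform background and a bright 2x2 object moving one pixel right.
background = np.full((4, 6), 100, dtype=np.uint8)
frame1 = background.copy()
frame1[1:3, 1:3] = 200
frame2 = background.copy()
frame2[1:3, 2:4] = 200

bg_mask = change_mask(background, frame2)   # background subtraction
diff_mask = change_mask(frame1, frame2)     # frame differencing
```

In this toy scene, background subtraction recovers the whole object in frame2, whereas frame differencing marks only the pixels the object entered or left and misses the overlapping interior. Neither mask says anything about which object a pixel belongs to, which is why occlusions between multiple objects are difficult for both.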

Some systems solve the occlusion problem by using 3D information reconstructed from multiple calibrated cameras. However, frame-by-frame stereo reconstruction is time-consuming. Existing stereo matching algorithms are generally sensitive to intensity differences between cameras, and thus require multiple cameras with identical intensity response, which may not be achievable in practice. Another well-known problem of stereo matching is its poor performance on uniform or repetitive textures, which are common in indoor scenes and virtual reality environments.

A new disparity-based segmentation method is proposed that exploits the static 3D geometry of the background and produces disparity-embedded object contours, which can be used to separate objects via a multi-histogram scheme. This method requires neither identical cameras nor frame-by-frame full stereo reconstruction. It has low computational cost and can be applied to various vision applications that require object segmentation as a first processing step. The experimental results show that the proposed method is able to segment multiple objects despite occlusions.
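
The details of the multi-histogram scheme are not given here. As a rough illustration of why disparity on contours helps with occlusion, the sketch below groups contour points by a single one-dimensional disparity histogram. It is a simplified stand-in under stated assumptions, not the proposed method: the function name, bin_width, and min_count are hypothetical, and each contour point is assumed to already carry a disparity value.

```python
import numpy as np

def group_contour_points_by_disparity(disparities, bin_width=2.0, min_count=1):
    """Label contour points by the histogram cluster their disparity falls in.

    Illustrative assumption: objects at different depths form separate runs of
    occupied bins in the disparity histogram, so each run is treated as one object.
    """
    d = np.asarray(disparities, dtype=float)
    if d.size == 0:
        return np.empty(0, dtype=int)

    # Histogram the disparities with fixed-width bins starting at the minimum.
    bins = np.floor((d - d.min()) / bin_width).astype(int)
    counts = np.bincount(bins)
    occupied = counts >= min_count

    # Give every run of consecutive occupied bins its own label.
    bin_labels = np.full(counts.shape, -1, dtype=int)
    label = -1
    previous_occupied = False
    for i, is_occupied in enumerate(occupied):
        if is_occupied:
            if not previous_occupied:
                label += 1
            bin_labels[i] = label
        previous_occupied = bool(is_occupied)

    # Map each point back to the label of its bin (-1 if the bin was too sparse).
    return bin_labels[bins]

# Contour points from two objects that overlap in the image but lie at different depths.
labels = group_contour_points_by_disparity([10.0, 10.5, 11.0, 30.0, 31.0, 30.5, 10.2])
```

Here the points with disparities near 10 and those near 30 receive different labels even if their contours touch or cross in the image, which is the property a 2D-only method lacks.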

Experimental Results

Figure 1: Segmentation results on an image sequence containing a dynamic background.

Last update: 19 January 2007