Multi-view video coding

Recent advances in multi-view display technology enable the presentation of multi-view images and multi-view video sequences; such displays will soon become a standard feature not only in specialized applications, such as medical imaging, but also in the consumer market. These multi-view video processing systems will shift the definition of video towards a more structured data representation, most probably a mixture of geometric information and video textures. This requires new approaches for the easy acquisition of multi-view content and new video coding standards for low-bit-rate communication. In a joint project between Philips Research and the Eindhoven University of Technology, I have participated in the development of an end-to-end multi-view video processing system, from multi-view content creation to visualization. The main building blocks of the developed system include multi-view video acquisition, camera calibration, depth estimation, H.264/MPEG-4 AVC multi-view video coding and multi-view image rendering.

Multi-view video acquisition and calibration

A multi-view video stream is typically obtained from a set of synchronized cameras capturing the same scene. In practice, we have implemented a multi-view capturing system based on a set of digital FireWire Sony cameras. The cameras are synchronized using an external trigger and can deliver up to 15 color images per second at a resolution of 1024x768 pixels.


The multi-view camera acquisition system is fully calibrated using a planar chessboard pattern. The calibration software consists of a C++ implementation of Zhang's camera calibration algorithm and a real-time corner-point detector. The software includes a simple user interface, so that calibration of the multi-camera system can be performed within minutes.
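
To give a concrete impression of this step, below is a minimal sketch of chessboard-based calibration in the spirit of Zhang's method. It is written against the OpenCV API rather than our own implementation; the board dimensions, square size, image count and file names are illustrative assumptions.

    #include <cstdio>
    #include <string>
    #include <vector>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>

    int main() {
        const cv::Size boardSize(9, 6);    // inner corners of the pattern
        const float squareSize = 0.025f;   // 25 mm squares (assumption)

        // 3-D corner positions on the planar pattern (z = 0).
        std::vector<cv::Point3f> boardCorners;
        for (int y = 0; y < boardSize.height; ++y)
            for (int x = 0; x < boardSize.width; ++x)
                boardCorners.emplace_back(x * squareSize, y * squareSize, 0.0f);

        std::vector<std::vector<cv::Point3f>> objectPoints;
        std::vector<std::vector<cv::Point2f>> imagePoints;
        cv::Size imageSize;

        // One image per pose of the pattern; file names are placeholders.
        for (int i = 0; i < 10; ++i) {
            cv::Mat img = cv::imread("view_" + std::to_string(i) + ".png",
                                     cv::IMREAD_GRAYSCALE);
            if (img.empty()) continue;
            imageSize = img.size();

            std::vector<cv::Point2f> corners;
            if (!cv::findChessboardCorners(img, boardSize, corners)) continue;
            // Refine the detected corners to sub-pixel accuracy.
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::EPS +
                                              cv::TermCriteria::COUNT, 30, 0.01));
            imagePoints.push_back(corners);
            objectPoints.push_back(boardCorners);
        }

        // Estimate the intrinsics, distortion and per-view extrinsics.
        cv::Mat K, dist;
        std::vector<cv::Mat> rvecs, tvecs;
        double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                         K, dist, rvecs, tvecs);
        printf("reprojection RMS error: %.3f pixels\n", rms);
        return 0;
    }

For a multi-camera rig, the same corner correspondences can additionally be fed to a stereo or multi-view calibration step to recover the relative poses between cameras.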

Depth-image/depth-map compression

The transmission of multi-view images can be simplified by transmitting only the texture image of the central view together with a corresponding depth image, or depth map. In addition to the coding of the texture data, this technique requires an efficient compression of the depth image. We propose an algorithm that models depth images using piecewise-linear functions (platelets). To adapt to varying scene detail, we employ a quadtree decomposition that divides the image into blocks of variable size, each block being approximated by one platelet. In order to preserve sharp object boundaries, the support area of each platelet is adapted to the object boundary. The subdivision of the quadtree and the selection of the platelet type are jointly optimized such that a global rate-distortion trade-off is realized. Experimental results show that the described method improves the picture quality of compressed depth images by 1-3 dB when compared to a JPEG-2000 encoder. A simplified sketch of the quadtree optimization follows the publication reference below.
Publication:
Platelet-based coding of depth maps for the transmission of multiview images
Yannick Morvan, Dirk Farin and Peter H. N. de With
Proceedings of SPIE, Stereoscopic Displays and Applications
vol. 6055, January 2006, San Jose (CA), USA
Download pdf from TU/e website
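
To illustrate the rate-distortion optimization behind the platelet coder, here is a strongly simplified sketch: each quadtree block is approximated by a single least-squares plane, and a block is split only when the Lagrangian cost J = D + lambda*R of the four children is lower than that of the parent. The real coder additionally selects among several platelet types and adapts the platelet support to object boundaries; the rate model (fixed bits per leaf, 2 bits per split flag) and the lambda value here are assumptions for illustration.

    #include <cstdio>
    #include <vector>

    struct Image {
        int w, h;
        std::vector<double> pix;                  // depth values, row-major
        double at(int x, int y) const { return pix[y * w + x]; }
    };

    // Least-squares fit of the plane z = a*x + b*y + c over an n x n block;
    // returns the squared error (the distortion of one platelet).
    static double fitPlaneSSE(const Image& im, int x0, int y0, int n) {
        double Sxx=0, Sxy=0, Sx=0, Syy=0, Sy=0, S1=0, Sxz=0, Syz=0, Sz=0;
        for (int y = y0; y < y0 + n; ++y)
            for (int x = x0; x < x0 + n; ++x) {
                double z = im.at(x, y);
                Sxx+=x*x; Sxy+=x*y; Sx+=x; Syy+=y*y; Sy+=y; S1+=1;
                Sxz+=x*z; Syz+=y*z; Sz+=z;
            }
        // Solve the 3x3 normal equations with Cramer's rule.
        double det = Sxx*(Syy*S1-Sy*Sy) - Sxy*(Sxy*S1-Sy*Sx) + Sx*(Sxy*Sy-Syy*Sx);
        if (det == 0) return 0;
        double a = (Sxz*(Syy*S1-Sy*Sy) - Sxy*(Syz*S1-Sy*Sz) + Sx*(Syz*Sy-Syy*Sz)) / det;
        double b = (Sxx*(Syz*S1-Sz*Sy) - Sxz*(Sxy*S1-Sy*Sx) + Sx*(Sxy*Sz-Syz*Sx)) / det;
        double c = (Sxx*(Syy*Sz-Sy*Syz) - Sxy*(Sxy*Sz-Sx*Syz) + Sxz*(Sxy*Sy-Syy*Sx)) / det;
        double sse = 0;
        for (int y = y0; y < y0 + n; ++y)
            for (int x = x0; x < x0 + n; ++x) {
                double e = im.at(x, y) - (a*x + b*y + c);
                sse += e * e;
            }
        return sse;
    }

    // Lagrangian cost J = D + lambda*R of the best subtree for one block:
    // either approximate the block with one plane, or split into four children.
    static double quadtreeCost(const Image& im, int x0, int y0, int n,
                               double lambda, double bitsPerLeaf) {
        double leafCost = fitPlaneSSE(im, x0, y0, n) + lambda * bitsPerLeaf;
        if (n <= 4) return leafCost;              // minimum block size
        int m = n / 2;
        double splitCost = lambda * 2             // ~2 bits to signal the split
            + quadtreeCost(im, x0,     y0,     m, lambda, bitsPerLeaf)
            + quadtreeCost(im, x0 + m, y0,     m, lambda, bitsPerLeaf)
            + quadtreeCost(im, x0,     y0 + m, m, lambda, bitsPerLeaf)
            + quadtreeCost(im, x0 + m, y0 + m, m, lambda, bitsPerLeaf);
        return splitCost < leafCost ? splitCost : leafCost;
    }

    int main() {
        // Synthetic depth map: two slanted regions with a sharp edge between.
        Image im{64, 64, std::vector<double>(64 * 64)};
        for (int y = 0; y < 64; ++y)
            for (int x = 0; x < 64; ++x)
                im.pix[y * 64 + x] = (x < 32) ? 0.5 * x : 100.0 + 0.2 * y;
        printf("best R-D cost: %.1f\n", quadtreeCost(im, 0, 0, 64, 0.1, 64));
        return 0;
    }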


H.264/MPEG-4 AVC multi-view video coding

We also investigated the coding of multi-view images obtained from the multi-camera system. To exploit the inter-view correlation, two view-prediction tools have been implemented and used in parallel: a block-based motion-compensation scheme and a Depth Image Based Rendering (DIBR) technique. Whereas DIBR relies on an accurate depth image, the block-based motion-compensation scheme can be performed without any geometry information. Our encoder adaptively selects the most appropriate prediction scheme using a rate-distortion criterion for optimal prediction-mode selection. The attractiveness of the algorithm is that the compression is robust against inaccurately estimated depth images and requires only a single reference camera for fast random access to different views. Experimental results for several multi-view sequences show a quality improvement of up to 1.4 dB compared to H.264/MPEG-4 AVC compression.
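
The mode decision itself reduces to a standard Lagrangian comparison. The sketch below shows this selection step in isolation; in a real encoder the distortion and rate values come from actually coding the block with each predictor, and lambda is derived from the quantization parameter. The numbers in main() are purely illustrative.

    #include <cstdio>

    enum class PredMode { MotionComp, DIBR };

    struct RDCost { double distortion; double rateBits; };

    // Lagrangian cost J = D + lambda * R, as in H.264-style RD optimization.
    static double cost(const RDCost& c, double lambda) {
        return c.distortion + lambda * c.rateBits;
    }

    // Pick the predictor with the lower Lagrangian cost for this block.
    static PredMode selectMode(const RDCost& motionComp, const RDCost& dibr,
                               double lambda) {
        return cost(motionComp, lambda) <= cost(dibr, lambda)
                   ? PredMode::MotionComp : PredMode::DIBR;
    }

    int main() {
        // Illustrative numbers: DIBR predicts well here but costs more bits.
        RDCost mc  {1200.0,  96.0};
        RDCost dibr{ 800.0, 160.0};
        double lambda = 5.0;             // derived from QP in a real encoder
        PredMode m = selectMode(mc, dibr, lambda);
        printf("selected: %s\n", m == PredMode::DIBR ? "DIBR" : "motion comp.");
        return 0;
    }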

Multi-view image rendering

A single texture image and a corresponding depth image are sufficient to synthesize novel views from arbitrary camera positions. For rendering novel multi-view synthetic images, we have employed relief texture mapping, which we have expressed in an alternative formulation that better fits the camera calibration framework. Using the developed framework, it is possible to render synthetic or virtual video with a "bullet time" effect as employed in the movie "The Matrix".
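
The core of such depth-based rendering is the 3-D image warping equation: a reference pixel is back-projected using its depth value and re-projected into the virtual camera. The following minimal sketch implements this per-pixel warp as a bare projection, without the pre-warp/texture-mapping factorization of relief texture mapping; it assumes the reference camera sits at the world origin, and the intrinsics, rotation and baseline in main() are illustrative values.

    #include <cstdio>

    struct Vec3 { double x, y, z; };
    struct Mat3 { double m[3][3]; };         // row-major 3x3 matrix

    static Vec3 mul(const Mat3& A, const Vec3& v) {
        return { A.m[0][0]*v.x + A.m[0][1]*v.y + A.m[0][2]*v.z,
                 A.m[1][0]*v.x + A.m[1][1]*v.y + A.m[1][2]*v.z,
                 A.m[2][0]*v.x + A.m[2][1]*v.y + A.m[2][2]*v.z };
    }

    // Warp pixel (u, v) with depth z from the reference view into the virtual
    // view. Kref_inv is the inverse of the reference intrinsics; Kv, R, t are
    // the intrinsics and pose of the virtual camera.
    static Vec3 warp(double u, double v, double z,
                     const Mat3& Kref_inv, const Mat3& Kv,
                     const Mat3& R, const Vec3& t) {
        Vec3 ray = mul(Kref_inv, {u, v, 1.0});          // back-project
        Vec3 X   = {ray.x * z, ray.y * z, ray.z * z};   // 3-D point
        Vec3 Xc  = mul(R, X);                           // into virtual camera
        Xc = {Xc.x + t.x, Xc.y + t.y, Xc.z + t.z};
        Vec3 p = mul(Kv, Xc);                           // project
        return {p.x / p.z, p.y / p.z, p.z};             // pixel + new depth
    }

    int main() {
        // Identity rotation and a small horizontal baseline (assumed values).
        Mat3 K     = {{{800, 0, 512}, {0, 800, 384}, {0, 0, 1}}};
        Mat3 K_inv = {{{1.0/800, 0, -512.0/800},
                       {0, 1.0/800, -384.0/800},
                       {0, 0, 1}}};
        Mat3 R     = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
        Vec3 t     = {-0.05, 0, 0};                     // 5 cm baseline
        Vec3 p = warp(512, 384, 2.0, K_inv, K, R, t);
        printf("warped to (%.1f, %.1f)\n", p.x, p.y);
        return 0;
    }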
See also 3D video coding.

Past projects overview

Implementation of noise filtering algorithms for X-ray images using FPGAs (Philips Medical Systems)

In this project, we conducted a case study on the applicability of the Xilinx System Generator design flow for implementing medical-imaging algorithms on FPGAs. System Generator is a high-level architecture-synthesis tool for designing DSP algorithms on FPGAs. In the analysis, we focused on the implementation of two medical-imaging algorithms: multi-resolution decomposition and adaptive noise filtering. Experimental results showed that the investigated architecture-synthesis tool constitutes an attractive design flow, dramatically simplifying and shortening the development cycle.
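
As a rough illustration of the algorithm class involved, the sketch below performs a one-level multi-resolution decomposition: the image is split into a low-pass band and a detail band, and small detail values (noise) are attenuated before reconstruction while large ones (edges) are kept. The 3x3 box filter, threshold and gain are stand-ins for the actual kernels and adaptive rules used in the project.

    #include <cstdio>
    #include <vector>

    using Img = std::vector<float>;        // row-major W x H image

    static const int W = 8, H = 8;

    static float at(const Img& im, int x, int y) {
        // Clamp coordinates at the image border.
        x = x < 0 ? 0 : (x >= W ? W - 1 : x);
        y = y < 0 ? 0 : (y >= H ? H - 1 : y);
        return im[y * W + x];
    }

    // 3x3 box low-pass filter (stand-in for the real kernel).
    static Img lowpass(const Img& im) {
        Img out(W * H);
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                float s = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        s += at(im, x + dx, y + dy);
                out[y * W + x] = s / 9.0f;
            }
        return out;
    }

    int main() {
        Img img(W * H, 100.0f);
        img[3 * W + 3] = 104.0f;           // a slightly noisy pixel

        Img low = lowpass(img);
        Img detail(W * H);
        for (int i = 0; i < W * H; ++i)    // detail band = image - low-pass
            detail[i] = img[i] - low[i];

        // Adaptive attenuation: suppress small detail values (noise), keep
        // large ones (edges). Threshold and gain are illustrative.
        Img rec(W * H);
        for (int i = 0; i < W * H; ++i) {
            float g = (detail[i] > -5.0f && detail[i] < 5.0f) ? 0.3f : 1.0f;
            rec[i] = low[i] + g * detail[i];
        }
        printf("noisy pixel before: %.1f  after: %.1f\n",
               img[3 * W + 3], rec[3 * W + 3]);
        return 0;
    }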

Real-time object tracking with a low-cost smart camera (Philips Research Eindhoven)

Reliable object tracking at video speed is a task that requires high performance. We have analyzed the various parts of typical algorithms and found that some parts of the program flow are best performed on a parallel platform, while others are better performed on a sequential processor. Combining the two processing profiles in one platform gives the best of both worlds and enables real-time, reliable performance. To prove this concept, we have designed and built a programmable architecture consisting of a massively parallel pixel processor and a sequential DSP. On this platform, we mapped the challenging task of high-speed object tracking. We found that offloading the pixel-intensive tasks to the parallel processor leaves ample processing power on the sequential processor for reliable vision decisions and environment control. Real-time results are demonstrated for tracking a selected object of interest; a simplified sketch of the parallel/sequential split is given after the reference below.
Publication:
Real time object tracking with a low-cost smart camera
Yannick Morvan, Richard Kleihorst, Anteneh Abbo, Harry Broers, Peter Raedts
14th Workshop on Circuits, Integrated Systems and Signal Processing
vol. 1, pp. 533-538, November 2003, Veldhoven, The Netherlands
Download pdf from TU/e website and the corresponding report pdf.
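
The sketch below illustrates the parallel/sequential split referenced above: a per-pixel segmentation step (the regular, data-parallel work of the kind that was offloaded to the pixel processor) followed by a centroid computation (the aggregation and decision logic of the kind that ran on the DSP). Simple intensity thresholding stands in for the actual segmentation, and the synthetic frame is an illustrative assumption.

    #include <cstdio>
    #include <vector>

    static const int W = 320, H = 240;

    int main() {
        std::vector<unsigned char> frame(W * H, 10);
        // Synthetic bright object around (200, 120).
        for (int y = 110; y < 130; ++y)
            for (int x = 190; x < 210; ++x)
                frame[y * W + x] = 200;

        // "Parallel" stage: one independent operation per pixel, the kind of
        // regular workload suited to a massively parallel pixel processor.
        std::vector<unsigned char> mask(W * H);
        for (int i = 0; i < W * H; ++i)
            mask[i] = frame[i] > 128 ? 1 : 0;

        // "Sequential" stage: aggregate the mask into a tracking decision,
        // the irregular control logic suited to a sequential DSP.
        long sx = 0, sy = 0, n = 0;
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x)
                if (mask[y * W + x]) { sx += x; sy += y; ++n; }
        if (n > 0)
            printf("object centroid: (%ld, %ld)\n", sx / n, sy / n);
        return 0;
    }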