
Acivs 2005

Advanced Concepts for Intelligent Vision Systems

Sept 20-23, 2005

University of Antwerp, Antwerp, Belgium

http://acivs.org/acivs2005/

Acivs 2005 Abstracts


Invited papers

Paper 106: A system approach towards 3D-in-the-box

Author(s): Marc Op de Beeck

As early as the 1920s, TV pioneers dreamed of developing high-definition three-dimensional color TV, as only this would provide the most natural viewing experience. The early black-and-white prototypes have evolved into high-quality color TV, but the hurdle of 3D-TV still remains. From a commercial point of view, 3D-TV can only be introduced successfully if both 3D content and 3D displays are widely available at the same time. To circumvent this chicken-and-egg problem, we have designed algorithms that automatically generate depth information for legacy 2D video inside a consumer device.

Information present in 2D video sequences is often incomplete. In general, a geometrically correct depth map cannot be reconstructed. However, qualitative depth cues, such as focus and motion, are often present. By combining these physical depth cues with image heuristics, qualitative depth maps can be generated. In our system approach, we have designed a matching lenticular 3D display that emits multiple views in discrete directions. These views can be rendered from the existing 2D images and the calculated depth maps.

A real-time PC-based demonstrator of the "3D-in-the-box" system will be shown.

Paper 108: Processing Challenges in Intelligent Video Adaptation

Author(s): Fernando Pereira

Multimedia data and services are nowadays omnipresent in our society. These services, especially those involving communications, engage technology, with its associated limitations, as well as human users, who also have their own limitations, or more generally speaking characteristics and preferences. In this context, the service goal is typically to maximize 'quality of service' for the available resources or to minimize the required resources for a prescribed quality of service. The growing heterogeneity of networks, terminals and users, together with the increasing availability and usage of multimedia content, has raised the relevance of content adaptation technologies that can fulfill the needs associated with all usage conditions, without multiplying the number of versions of the same piece of content, while simultaneously maximizing user satisfaction.

In a heterogeneous world, the delivery path for multimedia content to a multimedia terminal is not straightforward. The notion of Universal Multimedia Access (UMA) calls for the provision of different presentations of the same information, with more or less complexity, suiting different usage environments (i.e., the context) in which the content will be consumed; for this purpose, multimedia content has to be adapted either off-line or in real-time.

While universal multimedia adaptation is still in its infancy, it has already become clear that, as delivery technology evolves, the human factors associated with multimedia consumption assume increasing importance. In particular, it is becoming clear that the user, rather than the terminal, is the final point in the multimedia consumption chain. We are starting to speak about Universal Multimedia Experiences (UME), which provide users with adapted, informative (in the sense of cognition) and exciting (in the sense of feelings) experiences. Following the same trends, the notion of 'quality of service' has to evolve into something more encompassing, such as 'quality of experience', where user satisfaction considers not only the sensorial and perceptual dimensions but also the important emotional dimension.

While the current vision of the video adaptation process sees it as mostly conditioned by the available resources, especially in terms of networks and devices, this is not always the case, since maximizing user satisfaction may require some adaptation processing even when there are no resource constraints. Here the driving force for adaptation would not be the 'resource constraints' part of the equation but the 'satisfaction maximization' part of it. Content and usage environment (or context) descriptions are central to video adaptation, since they provide information that can control a suitable adaptation process.

This talk will address the processing challenges in advanced, intelligent video adaptation. After analyzing the major motivations for video adaptation, the talk will discuss the major processes involved in video adaptation from content retrieval to transcoding, transmoding and semantic filtering.

Paper 109: High Resolution Images from a Sequence of Low Resolution Observations: A Bayesian Perspective

Author(s): Rafael Molina

Super-resolution of images and video is the research area devoted to the problem of obtaining a high resolution (HR) image or sequence of HR images from a set of low resolution (LR) observations. The LR images are undersampled, and they are acquired either by multiple sensors imaging a single scene or by a single sensor imaging the scene over a period of time.

The field of super-resolution processing for uncompressed and compressed low resolution image sequences is surveyed. The introduction of motion vectors, observation noise and additional redundancies within the image sequence make this problem fertile ground for novel processing methods. In conducting this survey though, we develop and present all techniques within the Bayesian framework. This adds consistency to the presentation and facilitates comparison between the different methods.

We first describe the models used in the literature to relate the HR image we want to estimate to the observed LR images. Then we examine the available prior information on the original high resolution image intensities and displacement values. Next we discuss solutions for the super-resolution problem and provide examples of several approaches. Finally, we consider future research directions as well as areas of application.

Regular papers

Paper 112: Gamma-Convergence Approximation to Piecewise Constant Mumford-Shah Segmentation

Author(s): Jianhong Shen

Piecewise constant Mumford-Shah segmentation has been rediscovered by Chan and Vese in the context of region based active contours. The work of Chan and Vese demonstrated many practical applications thanks to their clever numerical implementation using the level-set technology of Osher and Sethian. The current work proposes a Gamma-convergence formulation to the piecewise constant Mumford-Shah model, and demonstrates its simple implementation by the iterated integration of a linear Poisson equation. The new formulation makes unnecessary some intermediate tasks like normal data extension and level-set reinitialization, and thus lowers the computational complexity.

Paper 119: Mirror Symmetry in Perspective

Author(s): Rudi Penne

We present an automatic procedure to reduce pixel noise in the image of a scene that contains a plane of symmetry (mirror). More precisely, we obtain a "homological constraint" for the (perspective) image of coplanar feature points and their mirror reflections. In a first stage, the vertex and the axis of the involved homology are computed. Then, by means of a nonlinear numerical optimization, we obtain the most likely correction of the given noisy image data.

Paper 120: Cognition Theory Based Performance Characterization in Computer Vision

Author(s): Aimin Wu, De Xu, Zhaozheng Nie, Xu Yang

It is currently very difficult to evaluate the performance of computer vision algorithms. We argue that visual cognition theory can be used to address this task. In this paper, we first illustrate why and how vision cognition theory can be used to evaluate the performance of computer vision algorithms. Then, from the perspective of computer science, we summarize some of the important assumptions of visual cognition theory. Finally, some cases are introduced to show the effectiveness of our methods.

Paper 127: Hidden Markov Model Based 2D Shape Classification

Author(s): Ninad Thakoor, Jean Gao

In this paper, we propose a novel two-step shape classification approach consisting of a description and a discrimination phase. In the description phase, curvature features are extracted from the shape and are utilized to build a Hidden Markov Model (HMM). The HMM provides a robust Maximum Likelihood (ML) description of the shape. In the discrimination phase, a weighted likelihood discriminant function is formulated, which weights the likelihoods of the curvature at individual points of the shape to minimize the classification error. The weighting scheme emulates a feature selection procedure in which the features important for classification are selected. A method based on Generalized Probabilistic Descent (GPD) is proposed for estimating the weights. To demonstrate the accuracy of the proposed method, we present classification results for fighter planes in terms of classification accuracy and discriminant functions.

Paper 130: Estimation of Intensity Uncertainties for Computer Vision Applications

Author(s): Alberto Ortiz and Gabriel Oliver

The irradiance measurement performed by vision cameras is not noise-free due to both processing errors during CCD fabrication and the behaviour of the electronic device itself. A proper characterization of sensor performance, however, allows accounting for it within image processing algorithms. This paper proposes a robust algorithm named R2CIU for characterizing the noise sources affecting CCD performance with the aim of estimating the uncertainty of the intensity values yielded by vision cameras. Experimental results can be found at the end of the paper.

Paper 132: A Novel Histogram Based Fuzzy Impulse Noise Restoration Method for Colour Images

Author(s): Stefan Schulte, Valérie De Witte, Mike Nachtegael, Dietrich Van der Weken, Etienne Kerre

In this paper, we present a new restoration technique for colour images corrupted with impulse noise. The estimated histograms of the colour component differences (red-green, red-blue and green-blue) are used to construct fuzzy sets. These fuzzy sets are then incorporated in a fuzzy rule-based system in order to filter out the impulse noise. Experiments illustrate the shortcomings of conventional methods in contrast to the proposed method.
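The starting point of the method described above is a set of histograms of the colour component differences. A minimal sketch of how such histograms could be computed (illustrative Python; the bin count and normalization are assumptions, not taken from the paper):

```python
import numpy as np

def colour_difference_histograms(img, bins=64):
    """Histograms of the colour component differences (R-G, R-B, G-B).

    img: H x W x 3 uint8 array. Differences range over [-255, 255].
    """
    r = img[..., 0].astype(np.int16)
    g = img[..., 1].astype(np.int16)
    b = img[..., 2].astype(np.int16)
    hists = {}
    for name, diff in (("rg", r - g), ("rb", r - b), ("gb", g - b)):
        h, _ = np.histogram(diff, bins=bins, range=(-255, 256))
        hists[name] = h / diff.size  # normalize to a distribution
    return hists

# Usage: a flat grey image has all component differences at zero.
grey = np.full((8, 8, 3), 128, dtype=np.uint8)
h = colour_difference_histograms(grey, bins=511)
```

In the paper these histograms parameterize fuzzy membership functions; here they are only computed and normalized.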

Paper 136: An Alternative Fuzzy Compactness and Separation Clustering Algorithm

Author(s): Miin-Shen Yang and Hsu-Shen Tsai

This paper presents a fuzzy clustering algorithm, called alternative fuzzy compactness & separation (AFCS), based on an exponential-type distance function. The proposed AFCS algorithm is more robust than the fuzzy c-means (FCM) and the fuzzy compactness & separation (FCS) algorithms proposed by Wu et al. (2005). Some numerical experiments are performed to assess the performance of the FCM, FCS and AFCS algorithms. The numerical results show that AFCS performs better than FCM and FCS from a robustness point of view.
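One common exponential-type distance in robust fuzzy clustering is d2(x, c) = 1 - exp(-beta * ||x - c||^2); the paper's exact definition may differ. A sketch of such a distance plugged into the standard FCM membership update (illustrative Python, not the AFCS objective itself):

```python
import numpy as np

def exponential_distance(x, centers, beta=1.0):
    # Bounded "squared distance" 1 - exp(-beta * ||x - c||^2): outliers
    # cannot dominate the objective as with plain squared Euclidean.
    sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return 1.0 - np.exp(-beta * sq)

def fcm_memberships(dist, m=2.0):
    # Standard FCM update on squared distances:
    # u_ik = 1 / sum_j (d2_ik / d2_ij)^(1/(m-1))
    d = dist + 1e-12  # avoid division by zero at exact centers
    ratio = (d[:, :, None] / d[:, None, :]) ** (1.0 / (m - 1))
    return 1.0 / ratio.sum(axis=2)
```

A point lying on a cluster center receives a membership of essentially 1 for that cluster, and memberships over clusters always sum to one.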

Paper 138: A New Reference Free Approach for the Quality Assessment of MPEG Coded Videos

Author(s): Rémi Barland, Abdelhakim Saadane

Currently, the growth of digital video delivery leads to video sequences being compressed at high ratios. Different coding algorithms such as MPEG-4 introduce different artifacts (blocking, blurring, ringing) that degrade the perceptual video quality. Such impairments are generally exploited by No-Reference quality assessment. In this paper, we propose to use the principal distortions introduced by MPEG-4 coding to design a new reference-free metric. Using the frequency and spatial features of each image, distortion measures for the blocking, blurring and ringing effects are computed. The blocking measure on the one hand, and a joint measure of the blurring and ringing effects on the other, are perceptually validated, which ensures the relevance of the distortion measures. To produce the final quality score, a new pooling model is also proposed. High correlation between the objective scores of the proposed metric and subjective assessment ratings has been achieved.

Paper 139: Reduced-Bit, Full Search Block-Matching Algorithms and their Hardware Realizations

Author(s): Vincent M Dwyer, Shahrukh Agha and Vassilios A Chouliaras

The Full Search Block Matching Motion Estimation (FSBME) algorithm is often employed in video coding for its regular dataflow and straightforward architectures. By iterating over all candidates in a defined Search Area of the reference frame, a motion vector is determined for each current frame macroblock by minimizing the Sum of Absolute Differences (SAD) metric. However, the complexity of the method is prohibitively high, amounting to 60 to 80% of the encoder's computational burden and making it unsuitable for many real-time video applications. One means of alleviating the problem is to calculate SAD values using fewer bits (Reduced-Bit SAD, RBSAD); however, the reduced dynamic range may compromise picture quality. The current work presents an algorithm that corrects the RBSAD to full resolution under appropriate conditions. Our results demonstrate that the optimal conditions for correction include knowledge of the motion vectors of neighboring blocks in space and/or time.
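The SAD metric and its reduced-bit variant at the heart of the trade-off described above can be sketched as follows (illustrative Python; the paper's correction mechanism is not reproduced here):

```python
import numpy as np

def sad(block_a, block_b):
    # Full-resolution Sum of Absolute Differences between two blocks.
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def reduced_bit_sad(block_a, block_b, drop_bits=4):
    # RBSAD: compare only the upper bits of each 8-bit pixel, shrinking
    # the adders needed in hardware at the cost of dynamic range.
    a = block_a >> drop_bits
    b = block_b >> drop_bits
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def full_search(cur, ref, top, left, size=8, radius=4, metric=sad):
    # Exhaustive search over a (2*radius+1)^2 window; returns best (dy, dx).
    block = cur[top:top + size, left:left + size]
    best = (None, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue
            cost = metric(block, ref[y:y + size, x:x + size])
            if cost < best[1]:
                best = ((dy, dx), cost)
    return best[0]
```

Swapping `metric=reduced_bit_sad` into `full_search` shows where the reduced dynamic range enters: candidate blocks that differ only in their low-order bits become indistinguishable.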

Paper 142: Distance and Nearest Neighbor Transforms of Gray-Level Surfaces Using Priority Pixel Queue Algorithm

Author(s): Leena Ikonen, Pekka Toivanen

This article presents a nearest neighbor transform for gray-level surfaces. It is based on the Distance Transform on Curved Space (DTOCS) calculated using an efficient priority pixel queue algorithm. A simple extension of the algorithm produces the nearest neighbor transform simultaneously with the distance map. The transformations can be applied for example to estimate surface roughness.
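A priority pixel queue (Dijkstra-style) computation of a gray-level distance map, with the nearest-neighbor labels propagated in the same pass as the abstract describes, might look like this (the local cost 1 + |gray-level difference| is one common DTOCS definition; the paper's exact one may differ):

```python
import heapq
import numpy as np

def dtocs(gray, seeds):
    """Distance Transform on Curved Space via a priority pixel queue.

    Moving to a 4-neighbour costs 1 + |gray-level difference|.
    Returns (distance map, nearest-seed index map).
    """
    h, w = gray.shape
    dist = np.full((h, w), np.inf)
    nearest = np.full((h, w), -1, dtype=int)
    heap = []
    for i, (y, x) in enumerate(seeds):
        dist[y, x] = 0.0
        nearest[y, x] = i
        heapq.heappush(heap, (0.0, y, x))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale queue entry; a shorter path was found already
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + 1.0 + abs(float(gray[ny, nx]) - float(gray[y, x]))
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    nearest[ny, nx] = nearest[y, x]  # propagate seed label
                    heapq.heappush(heap, (nd, ny, nx))
    return dist, nearest
```

On a flat surface the distances reduce to the city-block distance, and the nearest-neighbor map partitions the image among the seeds at no extra cost, as the abstract notes.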

Paper 147: Reduction of Blocking Artifacts in Block-based Compressed Images

Author(s): G.A. Triantafyllidis, D. Tzovaras, M.G. Strintzis

A novel frequency domain technique for image blocking artifact reduction is presented in this paper. For each block, its DC and AC coefficients are recalculated for artifact reduction. To achieve this, a closed form representation of the optimal correction of the DCT coefficients is produced by minimizing a novel enhanced form of the Mean Squared Difference of Slope (MSDS), for every frequency separately. Experimental results illustrating the performance of the proposed method are presented and evaluated.
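The basic (non-enhanced) MSDS idea, comparing the intensity slope across a block boundary with the adjacent interior slopes, can be sketched as follows (illustrative Python for vertical boundaries only; the paper minimizes an enhanced, per-frequency form in the DCT domain):

```python
import numpy as np

def msds_vertical(img, block=8):
    """Mean Squared Difference of Slopes across vertical block boundaries.

    The slope across each boundary is compared with the average of the
    two adjacent interior slopes; blocking artifacts inflate the result.
    """
    img = img.astype(np.float64)
    diffs = []
    for x in range(block, img.shape[1], block):
        s_boundary = img[:, x] - img[:, x - 1]
        s_left = img[:, x - 1] - img[:, x - 2]
        s_right = img[:, x + 1] - img[:, x]
        diffs.append((s_boundary - 0.5 * (s_left + s_right)) ** 2)
    return float(np.mean(diffs))
```

A smooth ramp crossing the block grid scores zero, while an artificial step at a block boundary is penalized heavily, which is exactly the behavior a deblocking criterion needs.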

Paper 149: Natural Scene Classification and Retrieval Using Ridgelet-Based Image Signatures

Author(s): Herve Le Borgne, Noel O'Connor

This paper deals with knowledge extraction from visual data for content-based image retrieval of natural scenes. Images are analysed using a ridgelet transform that enhances information at different scales, orientations and spatial localizations. The main contribution of this work is to propose a method that reduces the size and the redundancy of this ridgelet representation, by defining both global and local signatures that are specifically designed for semantic classification and content-based retrieval. An effective recognition system can be built when these descriptors are used in conjunction with a support vector machine (SVM). Classification and retrieval experiments are conducted on natural scenes, to demonstrate the effectiveness of the approach.

Paper 150: Do Fuzzy Techniques Offer an Added Value for Noise Reduction in Images?

Author(s): Mike Nachtegael, Stefan Schulte, Dietrich Van der Weken, Valérie De Witte, Etienne E. Kerre

In this paper we discuss an extensive comparative study of 38 different classical and fuzzy filters for noise reduction, for both impulse noise and Gaussian noise. The goal of this study is twofold: (1) we want to select the filters that perform very well for a specific noise type of a specific strength; (2) we want to find out whether fuzzy filters offer an added value, i.e. whether fuzzy filters outperform classical filters. The first aspect is relevant since large comparative studies have not appeared in the literature so far; the second aspect is relevant in the context of the use of fuzzy techniques in image processing in general.
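Comparative studies of this kind typically score each filter by PSNR against the clean image. A minimal sketch of such a harness, with a plain 3x3 median filter standing in for one of the classical filters (illustrative Python; none of the 38 filters from the paper is reproduced):

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    # Peak signal-to-noise ratio, the usual score in such comparisons.
    mse = np.mean((clean.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)

def median3x3(img):
    # Plain 3x3 median filter (border pixels left unchanged), a classical
    # baseline that removes isolated impulses.
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = np.median(img[y - 1:y + 2, x - 1:x + 2])
    return out
```

Ranking each of the candidate filters by such a score, per noise type and strength, is the mechanical part of the study; the paper's contribution is the breadth of the comparison.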

Paper 152: Flow Coherence Diffusion: Linear and Nonlinear Case

Author(s): Terebes Romulus, Olivier Lavialle, Monica Borda, Pierre Baylou

The paper proposes a novel tensor-based diffusion filter dedicated to filtering images composed of line-like structures. We propose a linear version of a previously presented nonlinear diffusion partial differential equation. Instead of introducing nonlinearity into the image evolution process, we include it only in the computation of the diffusion tensor. The unique tensor construction is based on an adaptive orientation estimation step and yields a significant reduction of the computational complexity. The properties of the filter are analyzed both theoretically and experimentally.

Paper 155: Multi-Banknote Identification Using a Single Neural Network

Author(s): Adnan Khashman, Boran Sekeroglu

Real-life applications of neural networks require a high degree of success, usability and reliability. Image processing is important both for data preparation and for human vision, increasing the success and reliability of pattern recognition applications. The combination of image processing and neural networks can provide sufficient and robust solutions to problems where intelligent recognition is required. This paper presents an implementation of neural networks for the recognition of various banknotes. One combined neural network is trained to recognize all banknotes of the Turkish Lira and the Cyprus Pound, as these are the main currencies used in Cyprus. The flexibility, usability and reliability of this Intelligent Banknote Identification System (IBIS) are shown through the results, and a comparison is drawn between using a separate neural network for each currency and a single combined network.

Paper 158: A New Voting Algorithm for Tracking Human Grasping Gestures

Author(s): Pablo Negri, Xavier Clady, Maurice Milgram

This article deals with a monocular vision system for grasping gesture acquisition. This system could be used for medical diagnosis, robot control or game control. We describe a new algorithm, the Chinese Transform, for the segmentation and localization of the fingers. This approach is inspired by the Hough Transform, utilizing the position and the orientation of the gradient of the image edge pixels. Kalman filters are used for gesture tracking. We present some results obtained from image sequences recording a grasping gesture. These results are in accordance with medical experiments.

Paper 159: Video Pupil Tracking for Iris Based Identification

Author(s): W. Ketchantang, S. Derrode, S. Bourennane and L. Martin

Currently, iris identification systems are not easy to use, since they need strict cooperation of the user during the snapshot acquisition process. Several acquisitions are generally needed to obtain a workable image of the iris for recognition purposes. To make the system more flexible and open to large public applications, we propose to work on the entire sequence acquired by a camera during enrolment. Hence, the recognition step can be applied to a selected number of the "best representative images" of the iris within the sequence. In this context, the aim of the paper is to present a method for pupil tracking based on a dynamic Gaussian Mixture Model (GMM), together with Kalman prediction of the pupil position along the sequence. The method has been tested on a real video sequence captured by a near infra-red (IR) sensitive camera and has shown its effectiveness in near real time.
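The Kalman prediction component can be sketched with a standard constant-velocity filter (illustrative Python; the GMM appearance model and all parameter values are assumptions, not taken from the paper):

```python
import numpy as np

def make_cv_kalman(dt=1.0, q=1e-2, r=1.0):
    # Constant-velocity model: state [x, y, vx, vy], measured [x, y].
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)
    R = r * np.eye(2)
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    # Predict where the pupil will be, then correct with the new detection z.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                   # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```

The prediction `F @ x` gives the region of the next frame in which the appearance model needs to be evaluated, which is what keeps the tracking near real time.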

Paper 160: Three Dimensional Fingertip Tracking in Stereovision

Author(s): Simon Conseil, Salah Bourennane, Lionel Martin

This paper presents a real-time estimation method for the three-dimensional trajectory of a fingertip. Pointing with the finger is indeed a natural gesture for Human Computer Interaction. Our approach is based on stereoscopic vision, with two standard webcams. The hand is segmented with skin color detection, and the fingertip is detected by analyzing the curvature of the finger boundary. The fingertip tracking is carried out by a three-dimensional Kalman filter, in order to improve the detection with a local search centered on the prediction of the 3-D position, and to smooth the trajectory so as to reduce the estimation error.
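The stereo step ultimately reduces to triangulating the fingertip from its two image positions. For rectified cameras this is the classical disparity relation Z = f*B/d (illustrative Python; the paper's calibration procedure is not given in the abstract):

```python
def triangulate_rectified(xl, yl, xr, focal, baseline, cx, cy):
    """3-D point from a rectified stereo pair (pinhole model, pixel units).

    xl, yl: fingertip in the left image; xr: its column in the right image
    (same row after rectification). Disparity d = xl - xr must be positive.
    """
    d = xl - xr
    if d <= 0:
        raise ValueError("non-positive disparity")
    Z = focal * baseline / d          # depth from disparity
    X = (xl - cx) * Z / focal         # back-project through the left camera
    Y = (yl - cy) * Z / focal
    return X, Y, Z
```

The triangulated point is what feeds the three-dimensional Kalman filter; projecting the filter's prediction back into each image gives the local search window mentioned above.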

Paper 163: Heuristic Algorithm for Computing Fast Template Motion in Video Streams

Author(s): Elena Sánchez-Nielsen, Mario Hernández-Tejera

Many vision problems require computing fast template motion in dynamic scenes. These problems can be formulated as exploration problems and thus can be expressed as a search in a state-space based representation. However, these problems are hard to solve because they involve searching through a high dimensional space. In this paper, we propose a heuristic algorithm through the space of transformations for computing target 2D motion. Three features are combined in order to compute motion efficiently: (1) a match-quality function based on a holistic similarity measurement, (2) the Kullback-Leibler measure as a heuristic to guide the search process and (3) the incorporation of target dynamics into the search process for computing the most promising search alternatives. The paper includes experimental evaluations that illustrate the efficiency and suitability of the approach for real-time vision-based tasks.

Paper 165: A Wavelet Statistical Model for Characterizing Chinese Ink Paintings

Author(s): Xiqun Lu

This paper addresses a wavelet statistical model for characterizing Chinese ink painting styles. The distinct digital profile of an artist is defined as a set of feature-tons and their distribution, which characterize the strokes and the stochastic nature of the painting style. Specifically, the feature-tons are modeled by a set of high-order wavelet statistics and high-order correlation statistics across scales and orientations, while the feature-ton distribution is represented by a finite mixture of Gaussians estimated by an unsupervised learning algorithm from multivariate statistical features. To measure the extent of association between an unknown painting and the captured style, the likelihood of the occurrence of the image under the characterizing stochastic process is computed. A high likelihood indicates a strong association. The research has the potential to provide a computer-aided tool for art historians to study connections among artists or periods in the history of Chinese ink painting.

Paper 170: A Novel Region-Based Image Retrieval Algorithm Using Selective Visual Attention Model

Author(s): Feng Songhe, Xu De, Yang Xu, Wu Aimin

Selective Visual Attention Models (SVAM) play an important role in region-based image retrieval. In this paper, a robust and accurate method for salient region detection is proposed which integrates SVAM and image segmentation. The concept of salient region adjacency graphs (SRAGs) is then introduced for image retrieval. The whole process consists of three levels. First, at the pixel level, the salient value of each pixel is calculated using an improved spatially based attention model. Then, at the region level, the salient region detection method is presented. Furthermore, at the scene level, SRAGs are introduced to represent the salient groups in the image, taking the salient regions as root nodes. Finally, the constructed SRAGs are used for image retrieval. Experiments show that the proposed method works well.

Paper 171: Region Analysis of Business Card Images Acquired in PDA Using DCT and Information Pixel Density

Author(s): Ick Hoon Jang, Chong Heun Kim, and Nam Chul Kim

In this paper, we present a method of region analysis for business card images acquired in a PDA (personal digital assistant) using DCT and information pixel (IP) density. The proposed method consists of three parts: region segmentation, information region (IR) classification, and character region (CR) classification. In the region segmentation, an input business card image is partitioned into 8x8 blocks and the blocks are classified into information blocks (IBs) and background blocks (BBs) by a normalized DCT energy. The input image is then segmented into IRs and background regions (BRs) by region labeling on the classified blocks. In the IR classification, each IR is classified into CR or picture region (PR) by using a ratio of DCT energy of edges in horizontal and vertical directions to DCT energy of low frequency components and a density of IPs. In the CR classification, each CR is classified into large CR (LCR) or small CR (SCR) by using the density of IPs and an averaged run length of IPs. Experimental results show that the proposed region analysis yields good performance for test images of several types of business cards acquired in a PDA under various surrounding conditions. In addition, error rates of the proposed method are shown to be 2.2-10.1% lower in region segmentation and 7.7% lower in IR classification than those of the conventional methods.
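The block classification step, where a normalized DCT energy separates information blocks from background blocks, can be sketched as follows (illustrative Python; the threshold and the exact energy normalization are assumptions, not taken from the paper):

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix.
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C

def ac_energy(block):
    # Fraction of block energy outside the DC coefficient: text and
    # graphics blocks carry far more AC energy than flat background.
    C = dct_matrix(block.shape[0])
    coeffs = C @ block.astype(np.float64) @ C.T
    total = (coeffs ** 2).sum()
    return 0.0 if total == 0 else 1.0 - coeffs[0, 0] ** 2 / total

def classify_blocks(img, threshold=0.01, block=8):
    # True = information block (IB), False = background block (BB).
    h, w = img.shape
    labels = np.zeros((h // block, w // block), dtype=bool)
    for by in range(h // block):
        for bx in range(w // block):
            tile = img[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            labels[by, bx] = ac_energy(tile) > threshold
    return labels
```

Region labeling over the resulting boolean grid then yields the information regions and background regions described above.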

Paper 172: Updating Geospatial Database: an Automatic Approach Combining Photogrammetry and Computer Vision Techniques

Author(s): In-Hak Joo, Tae-Hyun Hwang, and Kyoung-Ho Choi

In this paper, we suggest an automatic approach based on photogrammetry and computer vision techniques to build and update geospatial databases more effectively. Stereo images and video are spotlighted as useful media for constructing and representing geospatial databases. We can acquire the coordinates of geospatial objects appearing in image frames captured by a camera with a mobile mapping system, but quite a lot of manual input is required. We suggest a change detection method for geospatial objects in video frames that combines computer vision techniques and photogrammetry. With the suggested scheme, we can make the construction and update process more efficient and reduce the update cost of geospatial databases.

Paper 173: Design of a Hybrid Object Detection Scheme for Video Sequences

Author(s): Nikolaos Markopoulos, Michalis Zervakis

A method is presented for extracting object information from an image sequence taken by a static monocular camera. The method was developed with low computational complexity in mind, in order to be used in real-time surveillance applications. Our approach makes use of both intensity and edge information in each frame and works efficiently in an indoor environment. It consists of two major parts: background processing and foreground extraction. The background estimation and updating make the object detection robust to environment changes such as illumination changes and camera jitter. The fusion of intensity and edge information allows a more precise estimation of the position of the different foreground objects in a video sequence. The results obtained are quite reliable under a variety of environmental conditions.

Paper 174: FIMDA: A Fast Intra-Frame Mode Decision Algorithm for MPEG-2/H.264 Transcoding

Author(s): Gerardo Fernández-Escribano, Pedro Cuenca, Luis Orozco-Barbosa and Antonio Garrido

The H.264 video compression standard provides tools for coding improvements of at least 2 dB, in terms of PSNR, and at least 50% in bit-rate savings compared with the MPEG-2 video compression standard. It is expected that H.264/MPEG-4 AVC will take over the digital video market, replacing the use of MPEG-2 in most digital video applications. The complete migration to the new video coding algorithm will take several years, given the wide-scale use of MPEG-2 in the marketplace today. This creates an important need for transcoding technologies to convert the large volume of existing video material from the MPEG-2 format into H.264 and vice versa. However, given the significant differences between the MPEG-2 and H.264 encoding algorithms, the transcoding process for such systems is much more complex than other heterogeneous video transcoding processes. In this paper, we introduce and evaluate two versions of a fast intra-frame mode decision algorithm to be used as part of a highly efficient MPEG-2 to H.264 transcoder. In this work, we utilize a pixel-domain video transcoding architecture, but we use the DC coefficients of the MPEG-2 8x8 DCT blocks. Our evaluation results show that the proposed algorithm considerably reduces the complexity involved in intra-frame prediction.

Paper 178: Affine Invariant Feature Extraction Using Symmetry

Author(s): Arasanathan Anjulan, Nishan Canagarajah

This paper describes a novel method for extracting affine-invariant regions from images, based on an intuitive notion of symmetry. We define a local affine-invariant symmetry measure and derive a technique for obtaining symmetry regions. Compared to previous approaches, the regions obtained are salient regions of the image. We apply the symmetry-based technique to obtain affine-invariant regions in images with large scale differences and demonstrate superior performance compared to existing methods.

Paper 181: Image Formation in Highly Turbid Media by Adaptive Fusion of Gated Images

Author(s): Andrzej Sluzek, Ching Seong Tan

A visibility enhancement technique for highly scattering media (e.g. turbid water) using an adaptive fusion of gated images is proposed. Returning signal profiles produced by gated imaging contain two peaks: the backscattering peak followed by the target-reflected peak. The timing of the backscattering peak is determined by the laser pulse parameters, while the location of the second peak depends on the target distance. Thus, a sequence of gated images ranged over a variety of distances can be used to visualize scenes of diverse depths. For each fragment of the scene, the gated image containing the maximum signal strength (after ignoring the backscattering peak) is identified to form the corresponding fragment of the fused image. This unique capability of capturing both visual and depth information can lead to the development of fast and robust sensors for vision-guided navigation in extremely difficult conditions.
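The per-pixel fusion rule, keeping for each pixel the gate with the strongest return after discarding the backscattering peak, can be sketched as follows (illustrative Python; how the backscatter gates are identified in practice is not reproduced):

```python
import numpy as np

def fuse_gated(stack, backscatter_gates=1):
    """Adaptive fusion of a stack of gated images.

    stack: (n_gates, H, W) intensities, ordered by gate delay. The first
    `backscatter_gates` slices are assumed dominated by the backscattering
    peak and are ignored; each output pixel is taken from the remaining
    gate with the strongest return, whose index is a coarse depth cue.
    """
    usable = stack[backscatter_gates:]
    idx = usable.argmax(axis=0)                 # per-pixel gate of max signal
    h, w = idx.shape
    fused = usable[idx, np.arange(h)[:, None], np.arange(w)[None, :]]
    depth_index = idx + backscatter_gates       # gate number ~ target range
    return fused, depth_index
```

The returned `depth_index` map is the "both visual and depth information" capability mentioned above: the gate delay at which a pixel peaks encodes its range.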

Paper 182: Impulse Noise Detection Based on Robust Statistics and Genetic Programming

Author(s): Nemanja Petrovic, Vladimir Crnojevic

A new impulse detector design method for image impulse noise is presented. Robust statistics of the local pixel neighborhood provide the features for a binary classification scheme. The classifier is developed through an evolutionary process realized by genetic programming. The proposed filter shows very good results in suppressing both fixed-valued and random-valued impulse noise, for any noise probability and on all test images.
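The robust local statistics that feed the classifier can be sketched as follows; the simple threshold rule below merely stands in for the classifier evolved by genetic programming (illustrative Python, all thresholds assumed):

```python
import numpy as np

def robust_features(img, y, x):
    # Robust statistics of the 3x3 neighbourhood used as classifier inputs:
    # the pixel's deviation from the local median, and the local MAD
    # (median absolute deviation), a robust measure of spread.
    win = img[y - 1:y + 2, x - 1:x + 2].astype(np.float64)
    med = np.median(win)
    mad = np.median(np.abs(win - med))
    return abs(img[y, x] - med), mad

def is_impulse(img, y, x, t=40.0):
    # Stand-in for the evolved classifier: flag a pixel whose deviation
    # from the local median greatly exceeds the local spread.
    dev, mad = robust_features(img, y, x)
    return dev > t + 2.0 * mad
```

Because the median and MAD are barely affected by a single corrupted pixel, such features separate impulses from edges better than mean-based statistics, which is what makes them good classifier inputs.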

Paper 186: A Likelihood Ratio Test for Functional MRI Data Analysis to Account for Colored Noise

Author(s): Jan Sijbers, Arnold Jan den Dekker, and Robert Bos

Functional magnetic resonance (fMRI) data are often corrupted with colored noise. To account for this type of noise, many pre-whitening and pre-coloring strategies have been proposed to process the fMRI time series prior to statistical inference. In this paper, a generalized likelihood ratio test for brain activation detection is proposed in which the temporal correlation structure of the noise is modelled as an autoregressive (AR) model. The order of the AR model is determined from practical null data sets (acquired in the absence of activity). The test proposed is based on an exact expression for the likelihood function of the data. Simulation tests reveal that, for a fixed false alarm rate, the proposed test is slightly (2-3%) better than current tests incorporating colored noise in terms of detection rate.
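The AR noise modelling step can be sketched for the simplest case, AR(1): estimate the coefficient from null data and pre-whiten a series with it (illustrative Python; the paper determines the model order from null data sets and works with an exact likelihood rather than this two-step shortcut):

```python
import numpy as np

def fit_ar1(x):
    # Yule-Walker estimate of the AR(1) coefficient from a (null) series.
    x = x - x.mean()
    return float((x[1:] * x[:-1]).sum() / (x[:-1] ** 2).sum())

def whiten_ar1(x, phi):
    # Remove the AR(1) correlation: e_t = x_t - phi * x_{t-1}.
    return x[1:] - phi * x[:-1]
```

After whitening, the residual series should show essentially no lag-1 correlation, which is the condition under which standard activation statistics keep their nominal false alarm rate.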

Paper 188: A Fast Method to Detect and Recognize Scaled and Skewed Road Signs

Author(s): Yi-Sheng Liou, Der-Jyh Duh, Shu-Yuan Chen, and Jun-Wei Hsieh

A fast method to detect and recognize scaled and skewed road signs is proposed in this paper. The input color image is first quantized in the HSV color model. Border tracing of regions with the same colors as road signs is adopted to find regions of interest (ROIs). Verification is then performed to find those ROIs satisfying specific constraints as road sign candidates. The candidate regions are extracted and a normalization is automatically calculated to handle scaled and skewed road signs. Finally, matching based on distance maps is adopted to measure the similarity between the scene and model road signs to accomplish recognition. Experimental results show that the proposed method is effective and efficient, even for scaled and skewed road signs in complicated scenes. On average, detection takes 4--50 ms and recognition 11 ms. Thus, the proposed method is well suited to real-time implementation.

Paper 192: Configurable Complexity-Bounded Motion Estimation for Real-Time Video Encoding

Author(s): Zhi Yang, Jiajun Bu, Chun Chen, Linjian Mo

Motion estimation (ME) is by far the main bottleneck in real-time video coding applications. In this paper, a configurable complexity-bounded motion estimation (CCBME) algorithm is presented. This algorithm is based on prediction-refinement techniques, which make use of spatial correlation to predict the search center and then use a local refinement search to obtain the final motion field. During the search process, the ME complexity is kept bounded through three configuration schemes: 1) configuring the number of predictors; 2) configuring the search range of the local refinement; 3) configuring the subset pattern of the matching-criterion computation. Different configurations lead to different distortions. Through joint optimization, we obtain a near-optimal complexity-distortion (C-D) curve. Based on the C-D curve, we preserve 6 effective configurable modes to realize complexity scalability, which achieves a good tradeoff between ME accuracy and complexity. Experimental results show that the proposed CCBME exhibits higher efficiency than some well-known ME algorithms when applied to a wide set of video sequences. At the same time, it possesses a configurable complexity-bounded feature, which can adapt to devices with a wide range of computational capabilities for real-time video coding applications.
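
The prediction-refinement idea can be sketched as follows (illustrative only; `pred` stands in for a spatially predicted vector and `search_range` for the configurable bound on refinement complexity):

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block at (bx, by)
    in `cur` and the block displaced by (dx, dy) in `ref`."""
    h, w = len(ref), len(ref[0])
    total = 0
    for j in range(bs):
        for i in range(bs):
            ry, rx = by + j + dy, bx + i + dx
            if not (0 <= ry < h and 0 <= rx < w):
                return float('inf')  # displacement leaves the frame
            total += abs(cur[by + j][bx + i] - ref[ry][rx])
    return total

def refine(cur, ref, bx, by, bs, pred=(0, 0), search_range=2):
    """Local full search of +/-search_range around the predicted vector
    `pred`; the range bounds the per-block matching cost."""
    best, best_cost = pred, sad(cur, ref, bx, by, pred[0], pred[1], bs)
    for dy in range(pred[1] - search_range, pred[1] + search_range + 1):
        for dx in range(pred[0] - search_range, pred[0] + search_range + 1):
            c = sad(cur, ref, bx, by, dx, dy, bs)
            if c < best_cost:
                best, best_cost = (dx, dy), c
    return best
```

Shrinking `search_range` or subsampling the SAD computation are exactly the kinds of knobs the configuration schemes above adjust.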

Paper 193: Skeletonization of Noisy Images via the Method of Legendre Moments

Author(s): Khalid Zenkouar, Hakim El Fadili and Hassan Qjidaa

This paper presents a new concept of skeletonization which produces a graph containing all the topological information needed to derive a skeleton of noisy shapes. The proposed statistical method is based on Legendre moment theory controlled by the Maximum Entropy Principle (MEP). We propose a new approach for estimating the underlying probability density function (p.d.f.) of the input data set: the p.d.f. is expanded in terms of Legendre polynomials by means of the Legendre moments, and the order of the expansion is selected according to the MEP. The points corresponding to the local maxima of the selected p.d.f. are taken as true points of the skeleton extracted by the proposed algorithm. We have tested the proposed Legendre Moment Skeletonization Method (LMSM) on a variety of real and simulated noisy images; it produces excellent and visually appealing results in comparison with some well-known methods.

Paper 197: Interactive Object-Based Retrieval Using Relevance Feedback

Author(s): Sorin Sav, Hyowon Lee, Noel O'Connor, Alan F. Smeaton

In this paper we present an interactive, object-based video retrieval system which features a novel query formulation method that is used to iteratively refine an underlying model of the search object. As the user continues query composition and browsing of retrieval results, the system's object modeling process, based on Gaussian probability distributions, becomes incrementally more accurate, leading to better search results. To make the interactive process understandable and easy to use, a custom user-interface has been designed and implemented that allows the user to interact with segmented objects in formulating a query, in browsing a search result, and in re-formulating a query by selecting an object in the search result.

Paper 198: Pseudo-Stereo Conversion from 2D Video

Author(s): Yue Feng, Jianmin Jiang

In this paper, we propose a fast and effective pseudo-stereo conversion algorithm to transform conventional 2D videos into their stereo versions. As conventional 2D videos do not normally contain sufficient true depth information for stereo conversion, we explore the principle of extracting the closest disparity to reconstruct the stereo frame pair, following a simple content-based approach. The proposed algorithm features: (i) the original 2D video frame is taken as the reference frame; (ii) the closest disparity information is extracted by texture-based matching inside a library of stereo image pairs; and (iii) the extracted disparity is then used to reconstruct the right video frame to complete the pseudo-stereo conversion. Our experiments show that a certain level of stereo effect has been achieved; all test video clips are publicly available on the Internet to facilitate reproduction of our work.

Paper 199: An Automated Facial Pose Estimation Using Surface Curvature and Tetrahedral Structure of a Nose

Author(s): Ik-Dong Kim, Yeunghak Lee, Jae-Chang Shim

This paper introduces an automated 3D face pose estimation method using the tetrahedral structure of a nose. The method is based on feature points extracted from the face surface using curvature descriptors. The nose is the most protruding component in a 3D face image, and the nose shape, composed of feature points such as the nasion, nose tip, nose base, and nose lobes, is similar to a tetrahedron. Face pose can be estimated by fitting this tetrahedron to the coordinate axes. Each feature point is localized by curvature descriptors, and the method can be established using the nasion, nose tip, and nose base. It can be applied to face tracking and face recognition.

Paper 203: A Hybrid Color-Based Foreground Object Detection Method for Automated Marine Surveillance

Author(s): Daniel Socek, Dubravko Culibrk, Oge Marques, Hari Kalva and Borko Furht

This paper proposes a hybrid foreground object detection method suitable for marine surveillance applications. Our approach combines an existing foreground object detection method with an image color segmentation technique to improve accuracy. The foreground segmentation method employs a Bayesian decision framework, while the color segmentation part is graph-based and relies on the local variation of edges. We also establish the set of requirements any practical marine surveillance algorithm should fulfill, and show that our method conforms to them. Experiments show good results in the domain of marine surveillance sequences.

Paper 204: Lossy Compression of Images with Additive Noise

Author(s): Nikolay Ponomarenko, Vladimir Lukin, Mikhail Zriakhov, Karen Egiazarian, Jaakko Astola

Lossy compression of noise-free images differs from that of noisy images. While in the first case image quality decreases as the compression ratio increases, in the second case the coded image quality, evaluated with respect to the noise-free image, can be improved over some range of compression ratios. This paper is devoted to the problem of lossy compression of noisy images, which arises, e.g., in the compression of remote sensing data. The efficiency of several approaches to this problem is studied. Image pre-filtering is shown to be expedient for coded image quality improvement and/or an increase of the compression ratio. Some recommendations on how to set the compression ratio to provide quasi-optimal quality of coded images are given. A novel DCT-based image compression method is briefly described and its performance is compared to JPEG and JPEG2000 with application to lossy noisy image coding.

Paper 205: Morse Connections Graph for Shape Representation

Author(s): David Corriveau, Madjid Allili, and Djemel Ziou

We present an algorithm for constructing efficient topological shape descriptors of three dimensional objects. Given a smooth surface S and a Morse function f defined on S, our algorithm encodes the relationship among the critical points of the function f by means of a connection graph, called the Morse Connections Graph, whose nodes represent the critical points of f. Two nodes are related by an edge if a connection is established between them. This graph structure is extremely suitable for shape comparison and shape matching and inherits the invariant properties of the given Morse function f.

Paper 206: Gender Classification in Human Gait Using Support Vector Machine

Author(s): Jang-Hee Yoo, Doosung Hwang, and Mark S. Nixon

We describe an automated system that classifies gender by utilising a set of human gait data. The gender classification system consists of three stages: i) detection and extraction of the moving human body and its contour from image sequences; ii) extraction of the human gait signature from the joint angles and body points; and iii) motion analysis and feature extraction for classifying gender from the gait patterns. A sequential set of 2D stick figures is used to represent the gait signature, which is the primitive data for feature generation based on motion parameters. An SVM classifier is then used to classify gender from the gait patterns. In experiments, a high gender classification performance of 96% for 100 subjects has been achieved on a considerably larger database.

Paper 208: Background Modeling Using Color, Disparity, and Motion Information

Author(s): Jong Weon Lee, Hyo Sung Jeon, Sung Min Moon and Sung W. Baik

A new background modeling approach is presented in this paper. In most background modeling approaches, input images are categorized into foreground and background regions using pixel-based operations. Because pixels of the input image are considered individually, parts of foreground regions are frequently turned into background, and these errors cause incorrect foreground detections. The proposed approach reduces these errors and improves the accuracy of background modeling. Instead of two regions, each input image is categorized into three: the traditional foreground region is divided into two sub-regions, intermediate background and foreground, using activity measurements computed from optical flow at each pixel. Another difference is that the proposed approach groups pixels into objects and uses those objects in the background updating procedure. Pixels of each object are turned into background at the same rate, and the rate of each object is computed differently depending on its category. By accurately controlling the rate at which input pixels are turned into background, the proposed approach can model the background accurately.
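
The per-object update idea can be sketched as a running-average model with category-dependent learning rates (the category names and rate values below are hypothetical placeholders, not the paper's settings):

```python
# hypothetical per-category learning rates: background adapts quickly,
# intermediate-background objects slowly, foreground objects not at all
RATES = {'background': 0.5, 'intermediate': 0.1, 'foreground': 0.0}

def update_background(bg, frame, labels):
    """Blend each pixel of `frame` into the background model `bg` at the
    rate of the category its object belongs to, so that all pixels of one
    object are absorbed into the background at the same speed."""
    out = []
    for row_bg, row_f, row_l in zip(bg, frame, labels):
        out.append([(1 - RATES[l]) * b + RATES[l] * f
                    for b, f, l in zip(row_bg, row_f, row_l)])
    return out
```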

Paper 212: Multistage Face Recognition Using Adaptive Feature Selection and Classification

Author(s): Fei Zuo, Peter H. N. de With and Michiel van der Veen

In this paper, we propose a cascaded face-identification framework for enhanced recognition performance. During each stage, the classification is dynamically optimized to discriminate a set of promising candidates selected from the previous stage, thereby incrementally increasing the overall discriminating performance. To ensure improved performance, the base classifier at each stage should satisfy two key properties: (1) adaptivity to specific populations, and (2) high training and identification efficiency such that dynamic training can be performed for each test case. To this end, we adopt a base classifier with (1) dynamic person-specific feature selection, and (2) voting of an ensemble of simple classifiers based on selected features. Our experiments show that the cascaded framework effectively improves the face recognition rate by up to 5% compared to a single stage algorithm, and it is 2-3% better than established well-known face recognition algorithms.

Paper 213: Fast Face Detection Using a Cascade of Neural Network Ensembles

Author(s): Fei Zuo and Peter H. N. de With

We propose a (near) real-time face detector using a cascade of neural network (NN) ensembles for enhanced detection accuracy and efficiency. First, we form a coordinated NN ensemble by sequentially training a set of neural networks with the same topology. The training implicitly partitions the face space into a number of disjoint regions, and each NN is specialized in a specific sub-region. Second, to reduce the total computation cost for the face detection, a series of NN ensembles are cascaded by increasing complexity of base networks. Simpler NN ensembles are used at earlier stages in the cascade, which are able to reject a majority of non-face patterns in the backgrounds. Our proposed approach achieves up to 94% detection rate on the CMU+MIT test set, a 98% detection rate on a set of video sequences and 3-4 frames/sec. detection speed on a normal PC (P-IV, 3.0GHz).
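
Independent of the underlying neural networks, the cascade's early-rejection control flow can be sketched as:

```python
def cascade_detect(window, stages):
    """Evaluate `stages` (an ordered list of (classifier, threshold)
    pairs, cheapest first) on a candidate window; reject as soon as one
    stage's score falls below its threshold, so most non-face windows
    never reach the expensive later ensembles."""
    for classify, threshold in stages:
        if classify(window) < threshold:
            return False          # early rejection
    return True                   # survived every stage: face
```

In the paper's setting each `classify` would be an NN ensemble of increasing base-network complexity; here any callable returning a score works.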

Paper 214: Latency Insensitive Task Scheduling for Real-Time Video Processing and Streaming

Author(s): Richard Y. D. Xu, Jesse S. Jin

In recent times, computer vision and pattern recognition (CVPR) techniques have made automatic feature extraction and event detection possible in real-time, on-the-fly video processing and streaming. However, these multiple, computationally expensive video processing tasks usually require specialized processors to ensure a high frame-rate output. We propose a framework for achieving a high video frame rate on a single-processor high-end PC while multiple computational video tasks, such as background subtraction, object tracking, recognition and facial localization, are performed simultaneously. We present the framework in detail, illustrating our unique scheduler using latency-insensitive task distribution and the execution content parameter generation function (PGF). The experiments have indicated successful results using a high-end consumer-type PC.

Paper 216: Fuzzy Linguistic Rules Classifier for wooden board color sorting

Author(s): Emmanuel Schmitt, Vincent Bombardier and Raphaël Vogrig

This article presents a method for classifying wood pieces according to their color. The main difficulties encountered by the Company lie primarily in recognizing color along a certain gradation, and in making a decision for the whole board from its different sides. These problems imply the use of a flexible, robust model and an "intelligent" management of the information delivered by the sensors. In order to improve the current system, we propose to integrate a method whose principle is a fuzzy inference system, itself built from fuzzy linguistic rules. The results obtained with our method show a real improvement of the recognition rate compared to the Bayesian classifier already used by the Company.

Paper 218: Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures

Author(s): Hamed Fatemi, Henk Corporaal, Twan Basten, Richard Kleihorst and Pieter Jonker

Image processing is widely used in many applications, including medical imaging, industrial manufacturing and security systems. In these applications, the size of the image is often very large, the processing time should be very small and the real-time constraints should be met. Therefore, during the last decades, there has been an increasing demand to exploit parallelism in applications. It is possible to explore parallelism along three axes: data-level parallelism (DLP), instruction-level parallelism (ILP) and task-level parallelism (TLP).

This paper explores the limitations and bottlenecks of increasing support for parallelism along the DLP and ILP axes in isolation and in combination. To scrutinize the effect of DLP and ILP in our architecture (template), an area model based on the number of ALUs (ILP) and the number of processing elements (DLP) in the template is defined, as well as a performance model. Based on these models and template, a set of kernels of image processing applications has been studied to find Pareto optimal architectures in terms of area and number of cycles via multi-objective optimization.

Paper 220: On the Performance Improvement of Sub-Sampling MPEG-2 Motion Estimation Algorithms with Vector/SIMD Architectures

Author(s): Vassilios A. Chouliaras, Vincent M. Dwyer, Sharukh Agha

The performance improvement of a number of motion estimation algorithms is evaluated following vector/SIMD instruction set extensions for MPEG-2 TM5 video encoding. Simulation-based results indicate a substantial complexity metric reduction for Full-Search, Three Step Search, Four Step Search and Diamond Search, making the latter three appropriate for execution on a high-performance embedded VLSI platform. A simple model is developed to explain the simulated results, and a compound performance/power metric, the complexity-power-product (CPP), is proposed for algorithmic optimisation in vectorized applications for low-power consumer devices.

Paper 221: Image Registration Using Uncertainty Transformations

Author(s): Kristof Teelen and Peter Veelaert

In this work we introduce a new technique for a frequently encountered problem in computer vision: image registration. The registration is computed by matching features, points and lines, in the reference image to their corresponding features in the test image among many candidate matches. Convex polygons are used to capture the uncertainty of the transformation from the reference features to uncertainty regions in the test image in which candidate matches are to be found. We present a simple and robust method to check the consistency of the uncertainty transformation for all possible matches and to construct a consistency graph. The distinction between the good matches and the rest can be computed from the information in the consistency graph. Once the good matches are determined, the registration transformation can easily be computed.

Paper 222: 3DSVHT: Extraction of 3D Linear Motion via Multi-View, Temporal Evidence Accumulation

Author(s): Jose Artolazábal, John Illingworth

Shape recognition and motion estimation are two of the most difficult problems in computer vision, especially for arbitrary shapes undergoing severe occlusion. Much work has concentrated on tracking over short temporal scales and the analysis of 2D image-plane motion from a single camera. In contrast, in this paper we consider the global analysis of extended stereo image sequences and the extraction of specified objects undergoing linear motion in full 3D. We present a novel Hough Transform based algorithm that exploits both stereo geometry constraints and the invariance properties of the cross-ratio to accumulate evidence for a specified shape undergoing 3D linear motion (constant velocity or otherwise). The method significantly extends some of the ideas originally developed in the Velocity Hough Transform, VHT, where detection was limited to 2D image motion models. We call our method the 3D Stereo Velocity Hough Transform, 3DSVHT. We demonstrate 3DSVHT on both synthetic and real imagery and show that it is capable of detecting objects undergoing linear motion with large depth variation and in image sequences where there is significant object occlusion.

Paper 224: Road Markings Detection and Tracking Using Hough Transform and Kalman Filter

Author(s): Vincent Voisin, Manuel Avila, Bruno Emile, Stephane Begot, Jean-Christophe Bardet

A lane marking tracking method using the Hough Transform (HT) and Kalman filtering is presented. Since the HT is a global feature extraction algorithm, it leads to a detection that is robust to noise or partial occlusion. The Kalman filter is used to track the roadsides detected in the image by the HT. The Kalman prediction step predicts the road marking parameters in the next frame, so the detection algorithm can be applied in smaller regions of interest, consequently reducing the computational cost.
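
A minimal Hough transform for straight lines, of the kind used in the roadside detection step above (a textbook sketch, not the authors' implementation):

```python
import math

def hough_lines(points, n_theta=180):
    """Vote in (theta, rho) space for each edge point; the strongest
    accumulator cell gives the dominant line
    x*cos(theta) + y*sin(theta) = rho."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, rho)] = acc.get((t, rho), 0) + 1
    (t, rho), votes = max(acc.items(), key=lambda kv: kv[1])
    return 180 * t // n_theta, rho, votes
```

The detected (theta, rho) parameters are exactly the quantities a Kalman filter would track from frame to frame.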

Paper 225: A Fully Unsupervised Image Segmentation Algorithm Based on Wavelet Domain Hidden Markov Tree Models

Author(s): Qiang Sun, Yuheng Sha, Xinbo Gao, Biao Hou, and Licheng Jiao

A fully unsupervised image segmentation algorithm is presented in this paper, in which the wavelet-domain hidden Markov tree (HMT) model is exploited together with cluster analysis and validity techniques. The true number of textures in a given image is determined by calculating the likelihood disparity of textures using the modified partition fuzzy degree (MPFD) function at one suitable scale. Then, possibilistic C-means (PCM) clustering is performed to determine the training sample data of the different textures according to the number of textures obtained. The unsupervised segmentation is thus changed into a supervised one, and the HMTseg algorithm is used to achieve the final segmentation. The algorithm is applied to segment a variety of composite texture images into distinct homogeneous regions, and good segmentation results are reported.

Paper 227: Use of Human Motion Biometrics for Multiple-View Registration

Author(s): László Havasi, Zoltán Szlávik, Tamás Szirányi

A novel image-registration method is presented which is applicable to multi-camera systems viewing human subjects in motion. The method is suitable for use with indoor or outdoor surveillance scenes. The paper summarizes an efficient walk-detection and biometric method for extraction of image characteristics which enables the walk properties of the viewed subjects to be used to establish corresponding image-points for the purpose of image-registration between cameras. The leading leg of the walking subject is a good feature to match, and the presented method can identify this from two successive walksteps (one walk cycle). Using this approach, the described method can detect a sufficient number of corresponding points for the estimation of correspondence between views from two cameras. An evaluation study has demonstrated the method's feasibility in the context of an actual indoor real-time surveillance system.

Paper 229: A Clustering Approach for Color Image Segmentation

Author(s): F. Hachouf, N. Mezhoud

This paper describes a clustering approach for color image segmentation using fuzzy classification principles. The method uses classification to group pixels into homogeneous regions. Both global and local information are taken into account; this is particularly helpful in taking care of small objects and local variation in color images. Color, mean and standard deviation are used as the data source. The classification is achieved by a new version of the self-organizing map algorithm, equivalent to the classic fuzzy C-means (FCM) algorithm with a modified objective function. Code vectors, which constitute the centers of the classes, are distributed on a regular low-dimension grid, and a penalization term is added to guarantee a smooth distribution of the code vector values on the grid. Tests on color images, followed by an automatic evaluation, revealed the good performance of the proposed method.
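
For reference, the classic FCM iteration the proposed method is compared against can be sketched on 1-D data (standard FCM with a deterministic initialization, not the proposed self-organizing variant):

```python
def fuzzy_c_means(data, c=2, m=2.0, iters=50):
    """Classic FCM: alternate fuzzy membership updates and weighted
    center updates, minimizing sum_ik u_ik^m * (x_i - v_k)^2."""
    srt = sorted(data)
    # deterministic init: spread initial centers over the data range
    centers = [srt[i * (len(srt) - 1) // (c - 1)] for i in range(c)]
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        u = []
        for x in data:
            d = [max(abs(x - v), 1e-9) for v in centers]
            u.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c)) for k in range(c)])
        # center update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m
        centers = [sum(u[i][k] ** m * data[i] for i in range(len(data)))
                   / sum(u[i][k] ** m for i in range(len(data)))
                   for k in range(c)]
    return sorted(centers)
```

The proposed method adds a grid topology over the centers and a smoothness penalty to this objective.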

Paper 230: The Hough Transform Application Including its Hardware Implementation

Author(s): Witold Zorski

This paper presents an application of the Hough transform to the tasks of identifying irregular patterns. The presented method is based on the Hough transform for irregular objects, with a parameter space defined by translation, rotation and scaling operations. The technique may be used in a robotic system, identification system or for image analysis, directly on grey-level images. An example application of the Hough transform to a robot monitoring within computer vision systems is presented. A hardware implementation of the Hough technique is introduced which accelerates the calculations considerably.

Paper 236: Image Pattern Recognition with Separable Trade-off Correlation Filters

Author(s): César San Martín, Asticio Vargas, Juan Campos and Sergio Torres

In this paper, a method to design separable trade-off correlation filters for optical pattern recognition is developed. The proposed method is able not only to include information about the desired correlation peak value but also to minimize both the average correlation energy and the effect of additive noise on the correlation output. These optimization criteria are achieved by employing multiple training objects. The main advantage of the method is that it uses multiple sources of information to improve optical pattern recognition on images containing various objects. The separable trade-off filter is experimentally tested using both digital and optical pattern recognition.

Paper 237: A Bayesian Approach for Weighting Boundary and Region Information for Segmentation

Author(s): Mohand Saïd Allili, Djemel Ziou

Variational image segmentation combining boundary and region information was and still is the subject of many recent works. This combination is usually subject to arbitrary weighting parameters that control the contributions of the boundary and region features during the segmentation process. However, since the objective functions of the boundary and region features are different in nature, their arbitrary combination may lead to local conflicts, stemming principally from abrupt illumination changes or the presence of texture inside the regions. In the present paper, we investigate an adaptive estimation of the weighting parameters (hyper-parameters) on the region data during the segmentation by using a Bayesian method. This permits adequate contributions of the boundary and region features to the segmentation decision for each pixel and, therefore, improves the accuracy of the localization of region boundaries. We validated the approach on examples of real-world image segmentation.

Paper 240: Approximation of Linear Discriminant Analysis for Word Dependent Visual Features Selection

Author(s): Hervé Glotin, Sabrina Tollari and Pascale Giraudet

Automatically determining a set of keywords that describes the content of a given image is a difficult problem, because of (i) the huge dimensionality of the visual space and (ii) the unsolved object segmentation problem. To address issue (i), we present a novel method based on an Approximation of Linear Discriminant Analysis (ALDA), from both the theoretical and practical points of view. ALDA is more generic than usual LDA because it does not require explicit class labelling of each training sample, yet it allows efficient estimation of the discrimination power of the visual features. This is particularly interesting because of (ii) and the expensive manual object segmentation and labelling tasks on large visual databases. In the first step of ALDA, for each word W, the training set is split in two according to whether or not images are labelled by W. Then, under weak assumptions, we show theoretically that the between and within variances of these two sets give good estimates of the most discriminative features for W. Experiments are conducted on the COREL database, showing an efficient word-adaptive feature selection and a great enhancement (+37%) of an image Hierarchical Ascendant Classification (HAC), for which ALDA also saves computational cost by reducing the visual feature space by 90%.
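
The between/within-variance scoring at the heart of this scheme can be sketched for a single feature (an illustrative Fisher-style ratio over the two label-induced sets; the paper's exact estimator may differ):

```python
def discrimination_power(feature_w, feature_not_w):
    """Score one visual feature for word W: ratio of the between-class
    variance of the two set means to the pooled within-class variance.
    Larger values mean the feature better separates W from not-W."""
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        mu = mean(v)
        return sum((x - mu) ** 2 for x in v) / len(v)
    mu_all = mean(feature_w + feature_not_w)
    n1, n2 = len(feature_w), len(feature_not_w)
    between = (n1 * (mean(feature_w) - mu_all) ** 2
               + n2 * (mean(feature_not_w) - mu_all) ** 2) / (n1 + n2)
    within = (n1 * var(feature_w) + n2 * var(feature_not_w)) / (n1 + n2)
    return between / max(within, 1e-12)
```

Feature selection then keeps, per word, the features with the highest scores.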

Paper 242: Dynamic Pursuit With a Bio-Inspired Neural Model

Author(s): Claudio Castellanos Sánchez and Bernard Girau

In this paper we present a bio-inspired connectionist model for visual perception of motion and its pursuit. It is organized in three stages: a causal spatio-temporal filtering of Gabor-like type, an antagonist inhibition mechanism and a densely interconnected neural population. These stages are inspired by the processing in the primary visual cortex, the middle temporal area and superior visual areas. This model has been evaluated on natural image sequences.

Paper 245: A Fast Sequential Rainfalling Watershed Segmentation Algorithm

Author(s): Johan De Bock, Patrick De Smet, and Wilfried Philips

In this paper we present a new implementation of a rainfalling watershed segmentation algorithm. Our previous algorithm was a one-run algorithm. All the steps needed to compute a complete watershed segmentation were done in one run over the input data. In our new algorithm we tried another approach. We separated the watershed algorithm in several low-complexity relabeling steps that can be performed sequentially on a label image. The new implementation is approximately two times faster for parameters that produce visually good segmentations. The new algorithm also handles plateaus in a better way. First we describe the general layout of a rainfalling watershed algorithm. Then we explain the implementations of the two algorithms. Finally we give a detailed report on the timings of the two algorithms for different parameters.
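
The rainfalling principle, in which each pixel drains along the steepest-descent path to a minimum, can be sketched as follows (a naive one-run version that, unlike the algorithms above, makes no special provision for plateaus):

```python
def rainfalling_watershed(grad):
    """Each pixel follows the steepest-descent path on the gradient
    image until it reaches a local minimum; pixels draining into the
    same minimum receive the same catchment-basin label."""
    h, w = len(grad), len(grad[0])

    def downhill(y, x):
        # steepest strictly-lower 4-neighbour, or None at a local minimum
        best, best_v = None, grad[y][x]
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and grad[ny][nx] < best_v:
                best, best_v = (ny, nx), grad[ny][nx]
        return best

    labels = [[0] * w for _ in range(h)]
    minima = {}
    for y in range(h):
        for x in range(w):
            p = (y, x)
            while True:                 # the raindrop runs downhill
                nxt = downhill(*p)
                if nxt is None:
                    break
                p = nxt
            if p not in minima:
                minima[p] = len(minima) + 1
            labels[y][x] = minima[p]
    return labels
```

Caching the labels of already-visited paths, and relabeling in sequential low-complexity passes, are the kinds of refinements the paper's implementations add.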

Paper 246: A New Rate-Distortion Optimization Using Structural Information in H.264 I-Frame Encoder

Author(s): Zhi-Yi Mai, Chun-Ling Yang, Lai-Man Po, Sheng-Li Xie

Rate-distortion (RD) optimization is the key technique in video coding standards for efficiently determining a set of coding parameters. In the RD optimization of the H.264 I-frame encoder, the distortion (D) is measured as the sum of squared differences (SSD) between the reconstructed and original blocks, which is equivalent to the MSE. Recently, a new image quality measure called Structural Similarity (SSIM), based on the degradation of structural information, was brought forward. It has been shown that SSIM provides a better approximation to the perceived image distortion than the currently used PSNR (or MSE). In this paper, a new rate-distortion optimization for the H.264 I-frame encoder using SSIM as the distortion metric is proposed. Experimental results show that the proposed algorithm can reduce the bit rate by 2.2-6.45% while maintaining perceptual quality.
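
The SSIM index used as the distortion term can be sketched in its global form (computed here over whole images as flattened lists rather than the usual local windows; the constants are the standard choices for 8-bit data, C1=(0.01*255)^2 and C2=(0.03*255)^2):

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM between two equal-size grey images given as
    flattened lists of pixel values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)
            / ((mx * mx + my * my + c1) * (vx + vy + c2)))
```

In an RD loop, 1 - SSIM (or a monotone function of it) would replace the SSD term in the Lagrangian cost.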

Paper 250: Affine Coregistration of Diffusion Tensor Magnetic Resonance Images Using Mutual Information

Author(s): Alexander Leemans, Jan Sijbers, Steve De Backer, Evert Van der Vliet, and Paul Parizel

In this paper, we present an affine image coregistration technique for Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) data sets based on mutual information. A multi-channel approach has been developed in which the diffusion weighted images are aligned according to the corresponding acquisition gradient directions. In addition to the coregistration of the DT-MRI data sets, an appropriate reorientation of the diffusion tensor is worked out in order to remain consistent with the corresponding underlying anatomical structures. This reorientation strategy is determined from the spatial transformation while preserving the diffusion tensor shape. The entropy-based method is fully automatic and has the advantage of being independent of the applied diffusion framework.
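
The mutual information criterion driving such a coregistration can be sketched via a joint histogram (a simplified version on intensity lists; real implementations evaluate it on the resampled image overlap with interpolation):

```python
import math

def mutual_information(a, b, bins=8):
    """Mutual information (in bits) of two equal-length intensity
    lists, estimated from a `bins` x `bins` joint histogram."""
    n = len(a)
    lo_a, hi_a = min(a), max(a)
    lo_b, hi_b = min(b), max(b)

    def bin_of(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    joint = {}
    for x, y in zip(a, b):
        k = (bin_of(x, lo_a, hi_a), bin_of(y, lo_b, hi_b))
        joint[k] = joint.get(k, 0) + 1
    pa, pb = {}, {}
    for (i, j), c in joint.items():
        pa[i] = pa.get(i, 0) + c
        pb[j] = pb.get(j, 0) + c
    mi = 0.0
    for (i, j), c in joint.items():
        pij = c / n
        mi += pij * math.log2(pij / ((pa[i] / n) * (pb[j] / n)))
    return mi
```

The registration optimizer varies the affine parameters to maximize this quantity between the reference and the transformed floating image.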

Paper 252: Scene-Cut Processing in Motion-Compensated Temporal Filtering

Author(s): Maria Trocan, Béatrice Pesquet-Popescu

Motion-compensated temporal filtering (MCTF) is a powerful technique used in scalable video coding schemes. However, its performance decreases significantly when the video signal correlation is poor and, in particular, when scene cuts occur. In this paper we propose an improved MCTF structure that detects and processes the scene cuts that may appear in video sequences. It significantly reduces the ghosting artefacts in the temporal approximation subband frames, provides higher-quality temporal scalability, and dramatically improves the global coding efficiency when such abrupt transitions happen.

Paper 254: Fast Mode Decision and Motion Estimation with Object Segmentation in H.264/AVC Encoding

Author(s): Marcos Nieto, Luis Salgado, and Julián Cabrera

In this paper we present a new and complete mode decision and motion estimation scheme, compliant with H.264/AVC encoding and decoding systems, based on moving object segmentation. It is particularly suited for applications, such as surveillance, where a moving object segmentation is available. Knowledge of the moving object and background areas makes it possible to reduce the set of modes permitted by the standard. This subset is selected to obtain more accurate motion estimation for active objects and less intensive estimation for the quasi-static background. The number of comparisons needed to find the best motion vectors is reduced, yielding a simple and fast encoding process that meets the real-time requirements of a surveillance application. An improved motion vector prediction is computed from the result of the object segmentation, which prevents erroneous predictions from being carried out. Results show that the number of comparisons needed to perform inter prediction is reduced by 60%-70% depending on the sequence, while keeping the same image quality and bit rate obtained without segmentation.
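The idea of restricting the tried modes per region can be sketched as a simple lookup; the split between the two subsets below is a hypothetical illustration (the mode names are the standard H.264 inter partitions, but the paper's actual subsets may differ).

```python
# Hypothetical mode subsets keyed by segmentation label: fine partitions
# for active objects, cheap modes for the quasi-static background.
MODE_SUBSETS = {
    "moving_object": ["16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"],
    "background":    ["SKIP", "16x16"],
}

def candidate_modes(label):
    """Return the inter modes to evaluate for a macroblock, given its
    moving-object segmentation label."""
    return MODE_SUBSETS[label]
```

The encoder then runs its RD search only over `candidate_modes(label)`, which is where the reported 60%-70% reduction in comparisons comes from.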

Paper 255: BISK Scheme Applied to Sign Encoding and to Magnitude Refinement

Author(s): Maria Bras-Amorós, Pere Guitart-Colom, Jorge González-Conejero, Joan Serra-Sagristà, Fernando García-Vílchez

A shape-adaptive search based on the BISK scheme is defined and applied to the sign encoding and magnitude refinement of images. It can be generalized to a complete bitplane encoder whose performance is comparable to that of other state-of-the-art encoders.

Paper 256: Multi-Object Digital Auto-Focusing Using Image Fusion

Author(s): Jeongho Shin, Vivek Maik, Jungsoo Lee, and Joonki Paik

This paper proposes a novel digital auto-focusing algorithm using image fusion, which restores an out-of-focus image containing multiple, differently out-of-focus objects. The proposed auto-focusing algorithm consists of (i) building a prior set of point spread functions (PSFs), (ii) image restoration, and (iii) fusion of the restored images. Instead of designing a single restoration filter for multi-object auto-focusing, we propose an image fusion-based algorithm that fuses multiple restored images obtained from the prior estimated set of PSFs. The prior estimated PSFs avoid heavy computational overhead and make the algorithm suitable for real-time applications. By utilizing both the redundant and the complementary information provided by the different images, the proposed fusion algorithm can restore images with multiple out-of-focus objects. Experimental results demonstrate the performance of the proposed auto-focusing algorithm.

Paper 260: Moving Object Segmentation Based on Automatic Foreground / Background Identification of Static Elements

Author(s): Laurent Isenegger, Luis Salgado, Narciso García

A new segmentation strategy is proposed to precisely extract moving objects in video sequences. It is based on the automatic detection of static elements and their classification as background or foreground using static differences and contextual information. Additionally, tracking information is incorporated to reduce the computational cost. Finally, the segmentation is refined through a Markov random field (MRF) change detection analysis that includes the foreground information, which improves the accuracy of the segmentation. This strategy is presented in the context of low-quality sequences from surveillance applications, but it could be applied to other applications, the only requirement being a static or quasi-static background.

Paper 261: Optimum Design of Dynamic Parameters of Active Contours for Improving their Convergence in Image Segmentation

Author(s): Rafael Verdú, Juan Morales, Rafael Berenguer, and Luis Weruaga

Active contours are useful tools for segmenting images. The classical formulation is given in the spatial domain and is based on a second order system. The formulation based on a frequency-domain analysis offers a new perspective for studying the convergence of the snake. This paper addresses an analysis and optimization for a snake-based segmentation algorithm. The study allows us to choose optimum values of the system dynamic parameters in the design of the active contour for improving its speed of convergence in a segmentation problem.

Paper 268: Selective Color Edge Detector Based on a Neural Classifier

Author(s): Horacio M. González-Velasco, Carlos J. García-Orellana, Miguel Macías-Macías, Ramón Gallardo-Caballero

Conventional edge detectors are not very useful for generating an edge map to be used in the search for a concrete object with deformable models or genetic algorithms. In this work, a selective color edge detector is presented, which is able to obtain the edges in the image and determine whether or not those edges originate from a concrete object. The system is based on a multilayer perceptron neural network, which classifies the edges previously detected by the multidimensional gradient (for color images), and is trained using images of the searched object whose edges are known. The method has been successfully applied to bovine livestock images, obtaining edge maps to be used for boundary extraction with a genetic algorithm technique.

Paper 269: Entropy Reduction of Foveated DCT Images

Author(s): Giovanni Iacovoni, Salvatore Morsa, Alessandro Neri

This contribution addresses the problem of theoretically assessing the bit rate reduction that can be achieved through foveated image compression for codecs operating in the DCT domain. Modeling the image components as Compound Gaussian Random Fields (CGRFs), we extend the mathematical analysis of the DCT coefficient distributions reported in [1] to foveated images. As a general result, we demonstrate that the DCT coefficients of low-pass filtered image blocks can be effectively modelled with Laplacian distributions. This property allows us to express the Shannon rate reduction achievable with foveated compression in a simple and compact form as a function of the foveal filter coefficients. Experimental results used to validate the theoretical analysis are also included.

Paper 273: Majority Ordering and the Morphological Pattern Spectrum

Author(s): Alessandro Ledda and Wilfried Philips

Binary and grayscale mathematical morphology have many applications in different areas. Colour morphology, on the other hand, is not widespread. The reason is the lack of a unique ordering of colours, which makes the extension of grayscale morphology to colour images far from straightforward.

We will introduce a new majority sorting scheme that can be applied to binary, grayscale and colour images. It is based on the area of each colour or grayvalue present in the image, and has the advantage of being independent of the values of the colours or grayvalues.

We will take a closer look at the morphological pattern spectrum and show the possible differences between the pattern spectrum of colour images and that of grayscale images.

Paper 274: A Restoration and Segmentation Unit for the Historic Persian Documents

Author(s): Shahpour Alirezaee, Alireza Shayesteh Fard, Hassan Aghaeinia and Karim Faez

This paper aims to provide a document restoration and segmentation algorithm for historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and the connected component concept to segment the overlapping lines, words, and characters in the Middle-age Persian documents in preparation for OCR. To evaluate the performance of the restoration algorithm, 200 pages of Pahlavi documents are used as experimental data in our test. Numerical results indicate that the proposed algorithm can remove noise and destructive effects. The results also show 99.14% accuracy for baseline detection, 97.35% accuracy for text line extraction and the removal of overlaps with other lines, and 99.5% accuracy for segmenting the extracted text lines into their components.

Paper 277: A Quantitative Criterion to Evaluate Color Segmentations. Application to Cytological Images

Author(s): Estelle Glory, Vannary Meas-Yedid, Christian Pinset, Jean-Christophe Olivo-Marin and Georges Stamon

Evaluating a segmentation is a non-trivial task that is most often carried out by visual inspection for a qualitative validation. Until now, only a small number of objective and parameter-free criteria have been proposed to automatically assess the segmentation of color images. Moreover, existing criteria generally produce incorrect results on cytological images because they favor segmentations with a limited number of regions. Therefore, this paper suggests a new formulation based on two normalized terms which control the number of small regions and the color heterogeneity. This new criterion is applied to find an algorithm parameter for segmenting biological images.

Paper 278: Flexible Storage of Still Images with a Perceptual Quality Criterion

Author(s): Vincent Ricordel, Patrick Le Callet, Mathieu Carnec and Benoit Parrein

The purpose of this paper is to introduce a new method for the flexible storage of still images. The complete design of the system is described: the scalable encoding, the distortion computation, the bit allocation strategy, and the memory management method. The main improvement is the full exploitation of a perceptual metric to precisely assess the distortion introduced when removing a layer from a scalable coding stream. Experimental results are given and compared with a system that uses the PSNR as the distortion metric.

Paper 281: Image De-quantizing via Enforcing Sparseness In Overcomplete Representations

Author(s): Luis Mancera, Javier Portilla

We describe a method for removing quantization artifacts (de-quantizing) in the image domain, by enforcing a high degree of sparseness in its representation with an overcomplete oriented pyramid. For that purpose we devise a linear operator that returns the minimum L2-norm image preserving a set of significant coefficients, and estimate the original by minimizing the cardinality of that subset, always ensuring that the result is compatible with the quantized observation. We implement this solution by alternated projections onto convex sets, and test it through simulation with a set of standard images. Results are highly satisfactory in terms of performance, robustness and efficiency.
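A toy 1-D sketch of the alternated-projections idea: project onto a sparsity constraint in a transform basis, then back onto the set of signals compatible with the quantized observation. The paper uses an overcomplete oriented pyramid; here a plain FFT basis stands in for it purely for illustration.

```python
import numpy as np

def dequantize_pocs(q_signal, step, keep, iters=50):
    """Toy POCS de-quantizer: alternate (1) hard sparsity in an FFT basis
    (keep the `keep` largest-magnitude coefficients) with (2) projection
    onto the quantization constraint set |x - q| <= step/2."""
    lo, hi = q_signal - step / 2.0, q_signal + step / 2.0
    x = q_signal.astype(np.float64).copy()
    for _ in range(iters):
        c = np.fft.rfft(x)
        idx = np.argsort(np.abs(c))[:-keep]   # indices of the smallest coeffs
        c[idx] = 0.0                          # enforce sparseness
        x = np.fft.irfft(c, n=len(x))
        x = np.clip(x, lo, hi)                # stay compatible with observation
    return x
```

For a signal that truly is sparse in the chosen basis, the iterate moves inside the quantization cell toward the sparse original, reducing the quantization error.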

Paper 282: Non-Rigid Tracking Using 2-D Meshes

Author(s): Pascaline Parisot, Vincent Charvillat, and Géraldine Morin

Mesh motion estimation is a tracking technique useful in particular for low-bitrate compression, object-based coding and virtual view synthesis. In this article we present a new triangular mesh tracking algorithm that preserves the mesh connectivity. Our method generalizes the rigid template tracking method proposed by Jurie and Dhome to the case of non-rigid objects. Thanks to a learning step that can be done offline for a given number of nodes, the tracking step can be performed in real time.

Paper 283: Cleaning and Enhancing Historical Document Images

Author(s): Ergina Kavallieratou, Hera Antonopoulou

In this paper we present a recursive algorithm for the cleaning and enhancing of historical documents. Most of the algorithms used to clean and enhance documents, or to transform them into binary images, implement combinations of complicated image processing techniques which increase the computational cost and complexity. Our algorithm simplifies the procedure by taking into account special characteristics of the document images. Moreover, the fact that the algorithm consists of iterated steps makes it more flexible with regard to the needs of the user. In the experimental results, a comparison with other methods is provided.

Paper 285: An Offline Bidirectional Tracking Scheme

Author(s): Tom Caljon, Valentin Enescu, Peter Schelkens and Hichem Sahli

A generic bidirectional scheme is proposed that robustifies the estimation of the maximum a posteriori (MAP) sequence of states of a visual object. It enables creative, non-technical users to obtain the path of interesting objects in offline video material, which can then be used to create interactive movies. To guard against tracker failure, the proposed scheme merges the filtering distributions of a forward-tracking particle filter and a backward-tracking particle filter at some timesteps, using a reliability-based voting scheme as in democratic integration. The MAP state sequence is obtained by running the Viterbi algorithm on reduced state sets per timestep derived from the merged distributions, and is interpolated linearly where tracking failure is suspected. The presented scheme is generic, simple and efficient, and shows good results for a color-based particle filter.
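The Viterbi step over per-timestep candidate sets can be sketched generically as below; the observation and transition scores are abstract stand-ins for the merged particle-filter weights and motion model, which the paper defines.

```python
import numpy as np

def viterbi(obs_ll, trans_ll):
    """MAP state sequence via Viterbi over per-timestep candidate sets.
    obs_ll[t][i]: log-likelihood of candidate i at time t.
    trans_ll(t, i, j): log transition score from candidate i at t-1 to j at t."""
    T = len(obs_ll)
    delta = [np.asarray(obs_ll[0], float)]   # best scores so far
    back = []                                # backpointers
    for t in range(1, T):
        scores = np.array([[delta[-1][i] + trans_ll(t, i, j)
                            for i in range(len(delta[-1]))]
                           for j in range(len(obs_ll[t]))])
        back.append(scores.argmax(axis=1))
        delta.append(np.asarray(obs_ll[t], float) + scores.max(axis=1))
    path = [int(delta[-1].argmax())]         # backtrack from the best end state
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

Because each timestep only keeps a reduced candidate set, the T x K x K search stays cheap even for long sequences.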

Paper 286: Real Time Tracking of Multiple Persons on Colour Image Sequences

Author(s): Ghiles Mostafaoui, Catherine Achard, Maurice Milgram

We propose a real-time algorithm to track moving persons without any a priori knowledge of the person model, their size, or their number, all of which can evolve over time. It handles several problems such as occlusion and under- or over-segmentation. The first step, motion detection, produces regions that have to be assigned to trajectories. This tracking step is achieved using a new concept: elementary tracks. They make it possible both to manage the tracking and to detect the exit from occlusion by introducing coherent sets of regions. These sets enable the definition of temporal kinematic, shape, and colour models. Significant results have been obtained on several sequences with ground truth.

Paper 288: Image Indexing by Focus Map

Author(s): Levente Kovács, Tamás Szirányi

Content-based indexing and retrieval (CBIR) of still and motion picture databases is an area of ever-increasing attention. In this paper we present a method for still image information extraction which in itself provides a somewhat higher level of features and can also serve as a basis for high-level, i.e. semantic, image feature extraction and understanding. In our proposed method we use blind deconvolution to classify image areas by interest regions, which is a novel use of the technique. We prove its viability for this and similar uses.

Paper 290: Recovering the Shape from the Texture Using Logpolar Filters

Author(s): Corentin Massot, Jeanny Herault

How does the visual cortex extract perspective information from textured surfaces? To answer this question, we propose a biologically plausible algorithm based on a simplified model of visual processing. First, new log-normal filters are presented as a replacement for the classical Gabor filters. In particular, these filters are separable in frequency and orientation, and this characteristic is used to derive a robust method to estimate the local mean frequency in the image. Based on this new approach, a local decomposition of the image into patches, after a retinal pre-treatment, leads to the estimation of the local frequency variation all over the surface. The analytical relation between the local frequency and the geometrical parameters of the surface, under perspective projection, is derived and finally allows solving the so-called shape-from-texture problem. The accuracy of the method is evaluated and discussed on different kinds of textures, both regular and irregular, and also on natural scenes.

Paper 291: A Dynamic Bayesian Network-Based Framework for Visual Tracking

Author(s): Hang-Bong Kang and Sang-Hyun Cho

In this paper, we propose a new tracking method based on a dynamic Bayesian network. A dynamic Bayesian network provides a unified probabilistic framework for integrating multiple modalities through a graphical representation of the dynamic system. For visual tracking, we adopt a dynamic Bayesian network to fuse multi-modal features and to handle various target appearance models. We extend this framework to multiple-camera environments to deal with severe occlusions of the object of interest. The proposed method was evaluated in several real situations and promising results were obtained.

Paper 292: Affine Normalization of Symmetric Objects

Author(s): Tomas Suk, Jan Flusser

A new normalization method is used for the construction of affine moment invariants. The affine transform is decomposed into translation, scaling, stretching, two rotations, and mirror reflection. The object is successively normalized with respect to these elementary transforms by means of low-order moments. After normalization, the remaining moments of the normalized object can be used as affine-invariant features of the original object. We pay special attention to the normalization of symmetric objects.
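A sketch of the first normalization steps for a 2-D point set, using only first- and second-order moments: translation, scaling and stretching are cancelled by whitening, leaving a rotation/reflection ambiguity (which the paper resolves with further moments). This is an illustration of the moment-normalization idea, not the paper's full procedure.

```python
import numpy as np

def affine_whiten(points):
    """Normalize a 2-D point set with respect to translation, scaling and
    stretching via its moments: afterwards the centroid is zero and the
    second-order moment matrix is the identity."""
    p = np.asarray(points, dtype=float)
    p = p - p.mean(axis=0)                  # cancel translation
    cov = p.T @ p / len(p)                  # second-order central moments
    evals, evecs = np.linalg.eigh(cov)
    return (p @ evecs) / np.sqrt(evals)     # unit moments along the eigenaxes
```

Any two affinely related shapes map to the same whitened shape up to rotation and reflection, so moments of the whitened points are (up to that ambiguity) affine invariants.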

Paper 293: Object Recognition Using Local Characterisation and Zernike Moments

Author(s): A. Choksuriwong, H. Laurent, C. Rosenberger, C. Maaoui

Although many invariant object descriptors have been proposed in the literature, putting them into practice to obtain a system robust to several perturbations is still an open problem. Comparative studies between the most commonly used descriptors highlight the invariance of Zernike moments under simple geometric transformations and their ability to discriminate objects. However, these moments can prove insufficiently robust to perturbations such as partial object occlusion or the presence of a complex background. In order to improve system performance, we propose in this article to combine Zernike descriptors with a local approach based on the detection of image points of interest. We present the Zernike invariant moments, the Harris keypoint detector and the supervised classification method, and experimental results showing the contribution of the local approach compared to the global one.

Paper 294: Video Denoising Algorithm in Sliding 3D DCT Domain

Author(s): Dmytro Rusanovskyy and Karen Egiazarian

The problem of denoising video signals corrupted by additive Gaussian noise is considered in this paper. A novel 3D DCT-based video-denoising algorithm is proposed. Video data are locally filtered in sliding/running 3D windows (arrays) consisting of highly correlated spatial layers taken from consecutive frames of video. Their selection is done by means of block matching or similar techniques. Denoising in the local windows is performed by hard thresholding of the 3D DCT coefficients of each 3D array. Final estimates of the reconstructed pixels are obtained by a weighted average of the local estimates from all overlapping windows. Experimental results show that the proposed algorithm provides performance competitive with state-of-the-art video denoising methods, both in terms of PSNR and visual quality.
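A numpy-only sketch of the core local operation — hard thresholding in a 3-D DCT basis — with the 3-D transform built from an orthonormal DCT-II matrix; the block matching, window weighting and aggregation steps of the full algorithm are omitted.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def denoise_3d_window(window, threshold):
    """Hard-threshold the 3-D DCT coefficients of one spatio-temporal window."""
    mats = [dct_matrix(s) for s in window.shape]
    c = np.einsum('ai,bj,ck,ijk->abc', *mats, window)   # forward 3-D DCT
    c[np.abs(c) < threshold] = 0.0                      # kill noise-level coefficients
    return np.einsum('ai,bj,ck,abc->ijk', *mats, c)     # inverse (orthonormal)
```

In the full method each pixel lies in many overlapping windows, and the per-window outputs are combined by a weighted average.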

Paper 295: FPGA Design and Implementation of a Wavelet-Domain Video Denoising System

Author(s): Mihajlo Katona, Aleksandra Pizurica, Nikola Teslic, Vladimir Kovacevic, and Wilfried Philips

Multiresolution video denoising has become an increasingly popular research topic in recent years. Although several wavelet-based algorithms reportedly outperform classical single-resolution approaches, their concepts are often considered prohibitive for real-time processing. Little research has been done so far towards the hardware customization of wavelet-domain video denoising. A number of recent works have addressed the implementation of critically sampled orthogonal wavelet transforms and the related image compression schemes in Field Programmable Gate Arrays (FPGAs). However, the existing literature on FPGA implementations of overcomplete (non-decimated) wavelet transforms, and on manipulations of the wavelet coefficients more complex than thresholding, is very limited.

In this paper we develop an FPGA implementation of an advanced wavelet-domain noise filtering algorithm, which uses a non-decimated wavelet transform and spatially adaptive Bayesian wavelet shrinkage. A standard composite television video stream is digitized and used as the source of real-time video sequences. The results demonstrate the effectiveness of the developed scheme for real-time video processing.

Paper 297: Identification of Intestinal Motility Events of Capsule Endoscopy Video Analysis

Author(s): Panagiota Spyridonos, Fernando Vilariño, Jordy Vitria and Petia Radeva

In this paper we introduce a system for assisting the analysis of capsule endoscopy (CE) data and identifying sequences of frames related to small intestine motility. The imbalanced recognition task of intestinal contractions was addressed by employing an efficient two-level video analysis system. At the first level, each video was processed, resulting in a number of possible sequences of contractions. At the second level, the recognition of contractions was carried out by means of an SVM classifier. To encode patterns of intestinal motility, a panel of textural and morphological features of the intestinal lumen was extracted. The system exhibited an overall sensitivity of 73.53% in detecting contractions, with a false alarm ratio of about 59.92%. These results are a first step towards assisting tools for computer-based CE video analysis, drastically reducing the physician's time spent in image evaluation and enhancing the diagnostic potential of the CE examination.

Paper 298: Computing Stereo-Vision in Video Real-Time with Low-Cost SIMD-Hardware

Author(s): Gerold Kraft and Richard Kleihorst

The XETAL chip by Philips Electronics is a low-cost hardware solution for image processing at the pixel level. The XETAL architecture targets a low-energy environment and is therefore highly suited for integration into mobile vision systems and intelligent cameras. While hardware support for 2D vision has reached the level of affordable state-of-the-art technology through thorough research, real-time 3D vision by stereo, supported by low-cost and low-energy hardware, appears able to reach this level soon as well.

Paper 299: Noise Reduction of Video Sequences Using Fuzzy Logic Motion Detection

Author(s): Stefan Schulte, Vladimir Zlokolica, Aleksandra Pizurica, Wilfried Philips and Etienne Kerre

In this paper we present a novel video denoising method based on a fuzzy logic recursive motion detection scheme. For each pixel a fuzzy quantity (motion confidence) is calculated, indicating its membership degree in the fuzzy set ``motion''. Next, this fuzzy quantity is used to perform adaptive temporal filtering, where the amount of filtering is inversely proportional to the determined membership degree. Since large motion reduces the temporal filtering, the residual noise is non-stationary; hence a new fuzzy spatial filter is subsequently applied in order to obtain the final denoised image sequence. Experimental results show that the proposed method outperforms other state-of-the-art non-multiscale video denoising techniques and is comparable with some multiscale (wavelet-based) video denoising techniques.
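The adaptive temporal step can be sketched as a per-pixel recursive average whose weight shrinks with the motion membership degree; the maximum temporal weight of 0.5 and the linear weighting rule below are illustrative assumptions, not the paper's exact membership functions.

```python
import numpy as np

def fuzzy_temporal_filter(prev_filtered, current, motion_membership):
    """Adaptive temporal filtering: the amount of recursive averaging is
    inversely proportional to the per-pixel motion membership m in [0, 1].
    m = 1 (sure motion)  -> keep the current pixel;
    m = 0 (sure static)  -> average strongly with the previous frame."""
    m = np.clip(motion_membership, 0.0, 1.0)
    w = 0.5 * (1.0 - m)          # temporal weight, at most 0.5 (plain average)
    return w * prev_filtered + (1.0 - w) * current
```

The fuzzy spatial filter of the paper would then clean up the residual noise left wherever m is close to 1.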

Paper 300: A Study on Non-Intrusive Facial and Eye Gaze Detection

Author(s): Kang Ryoung Park, Min Cheol Whang and Joa Sang Lim

This paper presents an accurate gaze detection method that tracks facial and eye movement at the same time. To this end, we implemented a gaze detection system with a wide-view and a narrow-view stereo camera. To make it easier to detect the facial and eye feature positions, dual IR-LED illuminators are also used in our system. The performance of facial feature detection is enhanced by a Support Vector Machine, and the eye gaze position on a monitor is computed by a multi-layer perceptron. Experimental results show that the RMS error of gaze detection is about 2.4 degrees (1.68 degrees on the X axis and 1.71 degrees on the Y axis at a Z distance of 50 cm).

Paper 301: A Real-Time Iris Image Acquisition Algorithm Based on Specular Reflection and Eye Model

Author(s): Kang Ryoung Park and Jang Hee Yoo

In this paper, we propose a new method to quickly capture a user's focused iris image based on the corneal specular reflection and a model of the human eye. Experimental results show that the focused iris image acquisition time for users with and without glasses is 450 ms on average, and that our method can be used in a real-time iris recognition camera.

Paper 302: An Image Sensor with Global Motion Estimation for Micro Camera Module

Author(s): F. Gensolen, G. Cathebras, L. Martin and M. Robert

In this paper we describe the design of a vision sensor able to provide video capture together with the associated global motion between two consecutive frames. Our objective is to propose embedded solutions for mobile applications. The global motion considered here is the one typically produced by the movement of handheld devices, which is required for our purpose of video stabilization. We extract this global motion from local motion measurements at the periphery of the image acquisition area. Thanks to this particular, task-oriented configuration, the resulting system architecture can take advantage of CMOS focal-plane processing capabilities without sacrificing the sensor fill factor. Our approach is currently being implemented in a 0.13 µm CMOS technology.

Paper 304: Parameterization of Tubular Surfaces on the Cylinder

Author(s): Toon Huysmans, Jan Sijbers and Brigitte Verdonk

Not available

Paper 305: Automated quality assessment of road network data based on VHR images

Author(s): Leyden Martinez-Fonte, Werner Goeman, Sidharta Gautama and Johan D'Haeyer

Not available


The software generating these pages is © (not Ghent University), 2002-2024. All rights reserved.

The data on this page is © Acivs 2005. All rights reserved.

The server hosting this website is owned by the department of Telecommunications and Information Processing (TELIN) of Ghent University.

Problems with the website should be reported to .


This page was generated on Friday May 17th, 2024 09:18:08.