Advanced Concepts for Intelligent Vision Systems
Aug. 22-25, 2011
Het Pand, Ghent, Belgium
Acivs 2011 Abstracts
Paper 226: Camera-based sensory substitution and augmented reality for the blind
Rapid developments in mobile computing and sensing with smartphones open up new opportunities for augmenting our reality with information and experiences that our senses could not directly provide. One current trend is towards augmented reality applications based on location-based services (LBS) and computer vision. Apart from mass-market uses, there also arise new uses in niche markets such as technology for the blind.
Despite its more limited commercial value, in my talk I will discuss how this particular niche market is extremely interesting for bringing together research on man-machine interfaces, computer vision, brain plasticity, synesthesia, and even contemporary philosophy. It is also an area where fundamental research (e.g. on brain plasticity) may prove directly socially relevant through applications that are readily made globally available over the web, and that run on mass-market devices.
Hybrid applications convey via sound or touch the raw visual information from live camera views as well as semantic information for nearby items of interest, as recognized through computer vision or identified through location databases. Moreover, neuroscience research has in the past decade established that the visual cortex of blind people becomes responsive to sound and touch, thus adding some biological plausibility to the idea of creating non-invasive sensory by-passes in the form of sensory substitution.
Paper 227: Distributed Smart Cameras for Health and Wellbeing
With the increasing elderly population, there is a growing interest in systems that are able to monitor the activities of the elderly and to use this information for coaching or for raising alarms. In this way, people are able to live independently in their homes for a longer period of time. I will give an overview of the types of activities that are relevant for monitoring, and how cameras can be used for these applications. In this context I will present our work on fall detection with different camera systems. Apart from alarm functions, cameras are also used for therapies and gaming. I will present some of our work in this field. Finally, I will present some of our work that studies privacy issues related to camera monitoring.
Paper 228: The tale of 1000 cameras
More and more cameras are appearing in public areas. Such cameras mostly collect images for later inspection and provide little more. The presentation discusses what keeps us from exploiting them in more intelligent, collaborative ways. We will look at cameras with multiple vision sensors and at intelligent mobile networks. From there, we will outline the 1000 camera project for quality control on factory lines.
Paper 229: Computers Seeing Humans --- Vision-based Perception of Humans for Smart Environments and Other Applications
Vision-based perception of humans has a wide range of applications, from building human-friendly technical systems such as human-friendly robots or smart environments to applications in surveillance and image retrieval. In this talk I will present some of our recent efforts towards building such systems. In particular I will talk about an ongoing smart control room project, where our aim is to build an attentive smart room to support crisis control work. In this room, real-time perception of people is used to enable personalized workspaces that follow people in the room, and to allow gesture- and gaze-based interaction with large displays and across devices in the room. I will also talk about some ongoing efforts in person identification and retrieval in multimedia data and camera networks. Finally, I will mention some commercial use cases of such technology, such as video-based customer monitoring, including age and gender recognition.
Paper 230: Optical Issues in the Paintings of Jan Van Eyck
The major decade in the artistic life of Jan Van Eyck (c. 1390-1441) was the decade in which linear perspective was codified as the backbone of the art of painting by Leon Battista Alberti in De pictura (1435). According to the archival research of art historian Hugo van der Velden of Harvard University, the completely assembled Ghent Altarpiece, by far the largest work of the Van Eycks, came about in that very same year (and not in 1432, as currently indicated on the frame). Art historians have valued Jan Van Eyck for what seems painstaking realism in depicting materials, but they have deplored his failure to adopt the strict rules of linear perspective. In the talk, Marc De Mey will highlight some of the clever tricks and ingenious devices of Jan Van Eyck which indicate advanced optical understanding complementary and even superior to Albertian perspective. The talk will be illustrated with materials and macro photographs in a PowerPoint presentation based on the digitized Dierick collection of high-quality photographic negatives made available to Ghent University by the family of the late father Alfons Dierick.
Paper 104: A Low-Cost System to Detect Bunches of Grapes in Natural Environment from Color Images
Despite the benefits of precision agriculture and precision viticulture production systems, their adoption rate in the Portuguese Douro Demarcated Region remains low. One of the most demanding tasks in wine making is harvesting. Even for humans, the environment makes grape detection difficult, especially when the grapes and leaves have a similar color, which is generally the case for white grapes. In this paper, we propose a system for the detection and location, in the natural environment, of bunches of grapes in color images. The system is also able to distinguish between white and red grapes, while at the same time calculating the location of the bunch stem. The proposed system achieved 97% and 91% correct classifications for red and white grapes, respectively.
Paper 107: Content Makes the Difference in Compression Standard Quality Assessment
In traditional compression standard quality assessment, compressor parameters and performance measures are the main experimental variables. In this paper, we show that the image content is an equally crucial variable which still remains unused. We compare JPEG, JPEG2000 and a proprietary JPEG2000 implementation on four visually different datasets. We base our comparison on PSNR, SSIM, time, and bit rate measures. This approach reveals that the JPEG2000 vs. JPEG comparison strongly depends on the visual content of the compressed images.
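One of the distortion measures named above, PSNR, is simple enough to sketch. The snippet below (an illustration, not the authors' code) computes PSNR between a reference and a distorted image:

```python
import numpy as np

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-size images."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a flat 8x8 image vs. a copy with a single-pixel error of 2.
ref = np.full((8, 8), 128, dtype=np.uint8)
dist = ref.copy()
dist[0, 0] = 130
# MSE = 4/64 = 0.0625, so PSNR = 10*log10(255^2 / 0.0625) ≈ 60.2 dB
```

SSIM, by contrast, compares local luminance, contrast, and structure statistics, which is why the two measures can rank codecs differently on different content.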
Paper 109: A Geographical Approach to Self-Organizing Maps Algorithm Applied to Image Segmentation
Image segmentation is one of the most challenging steps in image processing. Its results are used by many other tasks regarding information extraction from images. In remote sensing, segmentation generates regions according to targets found in a satellite image, such as roofs, streets, trees, vegetation, agricultural crops, or deforested areas. Such regions differentiate land uses by classification algorithms. In this paper we investigate a way to perform segmentation using a strategy to classify and merge spectrally and spatially similar pixels. For this purpose we use a geographical extension of the Self-Organizing Maps (SOM) algorithm, which exploits the spatial correlation among nearby pixels. The neurons in the SOM will cluster the objects found in the image, and such objects will define the image segments.
Paper 111: Temporal Prediction and Spatial Regularization in Differential Optical Flow
In this paper we present an extension to the Bayesian formulation of multi-scale differential optical flow estimation by Simoncelli et al. We exploit the observation that optical flow is consistent in consecutive time frames and thus propagating information over time should improve the quality of the flow estimation. This propagation is formulated via insertion of additional Kalman filters that filter the flow over time by tracking the movement of each pixel. To stabilize these filters and the overall estimation, we insert a spatial regularization into the prediction step. Through the recursive nature of the filter, the regularization has the ability to perform filling-in of missing information over extended spatial extents. We benchmark our algorithm, which is implemented in the NVIDIA CUDA framework to exploit the processing power of modern graphical processing units (GPUs), against a state-of-the-art variational flow estimation algorithm that is also implemented in CUDA. The comparison shows that, while the variational method yields somewhat higher precision, our method is more than an order of magnitude faster and can thus operate in real-time on live video streams.
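The temporal propagation idea can be illustrated with a per-pixel scalar Kalman filter. This is a simplified sketch under assumed noise values, not the authors' implementation (which also couples in the spatial regularization):

```python
def kalman_step(x, P, z, q=0.01, r=0.1):
    """One predict/update cycle for one flow component at one pixel.
    x, P: prior flow estimate and its variance
    z: new per-frame flow measurement
    q, r: process / measurement noise variances (assumed values)."""
    # Predict: flow is assumed consistent across frames (identity dynamics),
    # so only the uncertainty grows.
    x_pred, P_pred = x, P + q
    # Update: blend prediction and measurement according to their uncertainties.
    K = P_pred / (P_pred + r)            # Kalman gain
    x_new = x_pred + K * (z - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

# Repeated noisy measurements of a constant flow of 1.0 pull the estimate
# toward it while shrinking its variance.
x, P = 0.0, 1.0
for z in [1.0, 1.0, 1.0]:
    x, P = kalman_step(x, P, z)
```

The recursion is what allows information to accumulate over frames; the paper's spatial regularization additionally lets confident neighbors fill in pixels whose own measurements are missing.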
Paper 112: Improved Support Vector Machines with Distance Metric Learning
This paper introduces a novel classification approach which improves the performance of support vector machines (SVMs) by learning a distance metric. The metric learned is a Mahalanobis metric previously trained so that examples from different classes are separated with a large margin. The learned metric is used to define a kernel function for SVM classification. In this context, the metric can be seen as a linear transformation of the original inputs before applying an SVM classifier that uses Euclidean distances. This transformation increases the separability of classes in the transformed space where the classification is applied. Experiments demonstrate significant improvements in classification tasks on various data sets.
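The key observation in the abstract — that a Mahalanobis metric is equivalent to a Euclidean metric after a linear transformation — can be verified numerically. In this sketch the transform L is a random placeholder standing in for the learned large-margin metric:

```python
import numpy as np

# For a Mahalanobis metric d_M(x, y)^2 = (x - y)^T M (x - y) with
# M = L^T L, the distance equals the Euclidean distance between L x
# and L y, so an SVM with a Euclidean kernel can simply be run on
# the transformed inputs L x.
rng = np.random.default_rng(0)
L = rng.standard_normal((3, 3))        # hypothetical learned transform
M = L.T @ L                            # induced Mahalanobis matrix
x, y = rng.standard_normal(3), rng.standard_normal(3)

d_mahalanobis_sq = (x - y) @ M @ (x - y)
d_euclid_sq = np.sum((L @ x - L @ y) ** 2)
```

This identity is why the learned metric can be folded into any distance-based kernel (e.g. an RBF kernel) without modifying the SVM solver itself.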
Paper 115: Fourier Fractal Descriptors for Colored Texture Analysis
This work proposes the use of a texture descriptor based on the Fourier fractal dimension applied to the analysis of colored textures. The technique consists of transforming the color space of the texture through a colorimetry approach, followed by the extraction of fractal descriptors from each transformed color channel. The fractal descriptors are obtained from the Fourier fractal dimension of the texture and are finally concatenated, composing the complete descriptors. The performance of the proposed technique is compared to the classical chromaticity moments and the state-of-the-art multispectral Gabor filters in the classification of samples from the VisTex dataset.
Paper 117: A Biologically Inspired Image Coder with Temporal Scalability
We present a novel bio-inspired and dynamic coding scheme for static images. Our coder aims at reproducing the main steps of the visual stimulus processing in the mammalian retina, taking into account its time behavior. The main novelty of this work is to show how to exploit the time behavior of the retina cells to ensure, in a simple way, scalability and bit allocation. To do so, our main source of inspiration is the biologically plausible retina model called Virtual Retina. Following a similar structure, our model has two stages. The first stage is an image transform which is performed by the outer layers in the retina; here it is modelled by filtering the image with a bank of difference-of-Gaussians filters with time-delays. The second stage is a time-dependent analog-to-digital conversion which is performed by the inner layers in the retina. By design, our coder enables scalability and bit allocation across time. Also, compared to the JPEG standards, our decoded images do not show annoying artefacts such as ringing and block effects. As a whole, this article shows how to capture the main properties of a biological system, here the retina, in order to design a new efficient coder.
Paper 122: Human Identification Based on Gait Paths
Gait paths are spatial trajectories of selected body points during a person's walk. We have proposed and evaluated features extracted from gait paths for the task of person identification. We have used the following gait paths: skeleton root element, feet, hands and head. In our motion capture laboratory we have collected a human gait database containing 353 different motions of 25 actors. We have proposed four approaches to extract features from motion clips: statistical, histogram, Fourier transform, and timeline. We have prepared motion filters to reduce the impact of the actor's location and height on the gait path. We have applied supervised machine learning techniques to classify gaits described by the proposed feature sets. We have prepared feature selection scenarios for every approach and iterated classification experiments. On the basis of the obtained classification results we have discovered the most significant features for the identification task. We have achieved almost 97% identification accuracy for normalized paths.
Paper 123: DTW for Matching Radon Features - A Pattern Recognition and Retrieval Method
In this paper, we present a method for the recognition and retrieval of patterns such as graphical symbols and shapes. It is based on dynamic programming for matching Radon features. The key characteristic of the method is to use the DTW algorithm to match corresponding pairs of histograms at every projection angle. Thanks to the DTW, this allows us to exploit the property of the Radon transform to capture both the boundary and the internal structure of shapes, while avoiding compressing the pattern representation into a single vector and thus losing information. Experimental results show that the method is robust to distortion and degradation, including affine transformations.
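The histogram matching step can be sketched with a textbook DTW distance between two 1-D sequences, standing in for the Radon projection histograms at one angle (a minimal illustration, not the paper's implementation):

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    e.g. two Radon projection histograms at the same angle."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of best alignment of a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# A sequence aligns perfectly with a time-stretched copy of itself,
# which is why DTW tolerates distortion that a rigid bin-to-bin
# comparison would penalize.
```

Summing this distance over all projection angles gives one plausible overall dissimilarity between two shapes.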
Paper 124: Supervised Visual Vocabulary with Category Information
The bag-of-words model has been widely employed in image classification and object detection tasks. The performance of bag-of-words methods depends fundamentally on the visual vocabulary that is obtained by quantizing the image features into visual words. Traditional vocabulary construction methods (e.g. k-means) are unable to capture the semantic relationship between image features. In order to increase the discriminative power of the visual vocabulary, this paper proposes a technique to construct a supervised visual vocabulary by jointly considering image features and their class labels. The method uses a novel cost function in which a simple and effective dissimilarity measure is adopted to deal with category information. We also adopt a prototype-based approach which tries to find prototypes for clusters instead of the means used in the k-means algorithm. The proposed method works, like the k-means algorithm, by efficiently minimizing a clustering cost function. Experiments on different data sets show that the proposed vocabulary construction method is effective for image classification.
Paper 125: Robust Visual Odometry using Uncertainty Models
In dense, urban environments, GPS by itself cannot be relied on to provide accurate positioning information. Signal reception issues (e.g. occlusion, multi-path effects) often prevent the GPS receiver from getting a positional lock, causing holes in the absolute positioning data. In order to keep assisting the driver, other sensors are required to track the vehicle motion during these periods of GPS disturbance. In this paper, we propose a novel method to use a single on-board consumer-grade camera to estimate the relative vehicle motion. The method is based on the tracking of ground plane features, taking into account the uncertainty on their backprojection as well as the uncertainty on the vehicle motion. A Hough-like parameter space vote is employed to extract motion parameters from the uncertainty models. The method is easy to calibrate and designed to be robust to outliers and bad feature quality. Preliminary testing shows good accuracy and reliability, with a positional estimate within 2 metres for a 400 metre elapsed distance. The effects of inaccurate calibration are examined using artificial datasets, suggesting a self-calibrating system may be possible in future work.
Paper 127: A Space-Time Depth Super-Resolution Scheme For 3D Face Scanning
Current 3D imaging solutions are often based on rather specialized and complex sensors, such as structured light camera/projector systems, and require explicit user cooperation for 3D face scanning under more or less controlled lighting conditions. In this paper, we propose a cost-effective 3D acquisition solution with a 3D space-time super-resolution scheme which is particularly suited to 3D face scanning. The proposed solution uses low-cost and easily movable hardware involving a calibrated camera pair coupled with an uncalibrated projector device. We develop a hybrid stereovision and phase-shifting approach using two shifted patterns and a texture image, which not only takes advantage of the assets of stereovision and structured light but also overcomes their weaknesses. We introduce a new super-resolution scheme to correct the 3D facial model and to enrich the 3D scanned view. Our scheme performs the super-resolution despite facial expression variation using CPD non-rigid matching. We demonstrate both visually and quantitatively the efficiency of the proposed technique.
Paper 128: Robust Active Contour Segmentation with an Efficient Global Optimizer
Active contours or snakes are widely used for segmentation and tracking. Recently a new active contour model was proposed, combining edge and region information. The method has a convex energy function, thus becoming invariant to the initialization of the active contour. This method is promising, but has no regularization term. Therefore its segmentation results are highly dependent on the quality of the images. We propose a new active contour model which also uses region and edge information, but which has an extra regularization term. This work provides an efficient optimization scheme based on Split Bregman for the proposed active contour method. It is experimentally shown that the proposed method achieves significantly better results in the presence of noise and clutter.
Paper 132: Image Sharpening by DWT-Based Hysteresis
Improvement of edge details in an image is basically a process of extracting high-frequency details from the image and then adding this information to the blurred image. In this paper we propose an image sharpening technique in which high-frequency details are extracted using wavelet transforms and then added to the blurred image to enhance the edge details and visual quality. Before this addition, we perform some spatial-domain processing on the high-pass images, based on hysteresis, to suppress the pixels which may not belong to the edges but are retained in the high-pass image.
Paper 134: Underwater Image Enhancement: Using Wavelength Compensation and Image Dehazing (WCID)
Underwater environments often cause color scatter and color cast during photography. Color scatter is caused by haze effects occurring when light reflected from objects is absorbed or scattered multiple times by particles in the water. This in turn lowers the visibility and contrast of the image. Color cast is caused by the varying attenuation of light in different wavelengths, rendering underwater environments bluish. To address distortion from color scatter and color cast, this study proposes an algorithm to restore underwater images that combines a dehazing algorithm with wavelength compensation (WCID). Once the distance between the objects and the camera is estimated using the dark channel prior, the haze effects from color scatter are removed by the dehazing algorithm. Next, the photography scene depth is estimated from the residual energy ratios of each wavelength in the background light of the image. According to the amount of attenuation of each wavelength, reverse compensation is conducted to restore the distortion from color cast. An underwater video downloaded from the YouTube website was processed using WCID, histogram equalization, and a traditional dehazing algorithm. Comparison of the results revealed that WCID simultaneously resolved the issues of color scatter and color cast and enhanced image contrast, producing high-quality underwater images and videos.
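The dark channel prior mentioned above rests on a simple statistic: in haze-free regions, at least one color channel tends to be dark within any small patch. A simplified sketch of that statistic (not the paper's full depth-estimation pipeline) is:

```python
import numpy as np

def dark_channel(image, patch=3):
    """Dark channel of an H x W x 3 float image: the per-pixel minimum
    over the color channels and over a local patch.  Low values suggest
    haze-free content; high values suggest haze or bright background
    light.  Edge pixels are handled by replicating the border."""
    min_rgb = image.min(axis=2)                  # minimum over channels
    h, w = min_rgb.shape
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

Dehazing methods typically convert this statistic into a per-pixel transmission estimate, from which scene depth (and hence the attenuation to compensate) can be inferred.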
Paper 136: Dynamic Texture Analysis and Classification using Deterministic Partially Self-avoiding Walks
Dynamic texture has been attracting extensive attention in the field of computer vision in recent years. These patterns can be described as moving textures, in which the idea of self-similarity presented by static textures is extended to the spatio-temporal domain. Although promising results have been achieved by recent methods, most of them cannot model multiple regions of dynamic textures and/or both motion and appearance features. To overcome these drawbacks, a novel approach for dynamic texture modeling based on deterministic partially self-avoiding walks is proposed. In this method, deterministic partially self-avoiding walks are performed in three orthogonal planes to combine appearance and motion features of the dynamic textures. Experimental results on two databases indicate that the proposed method improves the correct classification rate compared to existing methods.
Paper 137: Self-Similarity Measure for Assessment of Image Visual Quality
The opportunity of using self-similarity in the evaluation of image visual quality is considered. A method for estimating self-similarity for a given image fragment that takes into account the contrast sensitivity function is proposed. Analytical expressions for describing the proposed parameter distribution are derived, and their importance to full-reference image visual quality evaluation based on the human vision system is proven. A corresponding metric is calculated as a mean squared difference between the considered parameter maps of the distorted and reference images. Correlation between this metric and mean opinion score (MOS) is calculated for the five largest openly available specialized image databases. It is demonstrated that the proposed metric provides a correlation at the level of the best known metrics of visual quality. This, in turn, shows the importance of fragment self-similarity in image perception.
Paper 141: Swarm Intelligence Based Searching Schemes for Articulated 3D Body Motion Tracking
We investigate swarm intelligence based searching schemes for effective articulated human body tracking. The fitness function is smoothed in an annealing scheme and then quantized. This allows us to extract a pool of candidate best particles. The algorithm selects a global best from such a pool. We propose a global-local annealed particle swarm optimization to alleviate the inconsistencies between the observed human pose and the estimated configuration of the 3D model. At the beginning of each optimization cycle, the pose of the whole body is estimated, and then the limb poses are refined locally using a smaller number of particles. The investigated searching schemes were compared both qualitatively, through visual evaluations, and quantitatively, through the use of motion capture data as ground truth. The experimental results show that our algorithm outperforms the other swarm intelligence searching schemes. The images were captured using a multi-camera system consisting of calibrated and synchronized cameras.
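For readers unfamiliar with the underlying optimizer, the canonical particle swarm update (generic PSO, not the paper's annealed global-local variant) can be sketched as follows; the inertia and attraction coefficients w, c1, c2 are standard textbook values, not the paper's settings:

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One canonical PSO update for 1-D particles: each particle is
    drawn stochastically toward its personal best (pbest) and the
    swarm's global best (gbest), with inertia w on its old velocity."""
    for i in range(len(positions)):
        r1, r2 = random.random(), random.random()
        velocities[i] = (w * velocities[i]
                         + c1 * r1 * (pbest[i] - positions[i])
                         + c2 * r2 * (gbest - positions[i]))
        positions[i] += velocities[i]
    return positions, velocities
```

In pose tracking, each "particle" is a full (or partial) body configuration and the fitness measures how well the projected 3D model matches the image observations; the paper's annealing progressively sharpens that fitness across optimization cycles.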
Paper 144: A Multi-Layer `gas of circles' Markov Random Field Model for the Extraction of Overlapping Near-Circular Objects
We propose a multi-layer binary Markov random field (MRF) model that assigns high probability to object configurations in the image domain consisting of an unknown number of possibly touching or overlapping near-circular objects of approximately a given size. Each layer has an associated binary field that specifies a region corresponding to objects. Overlapping objects are represented by regions in different layers. Within each layer, long-range interactions favor connected components of approximately circular shape, while regions in different layers that overlap are penalized. Used as a prior coupled with a suitable data likelihood, the model can be used for object extraction from images, e.g. cells in biological images or densely-packed tree crowns in remote sensing images. We present a theoretical and experimental analysis of the model, and demonstrate its performance on various synthetic and biomedical images.
Paper 148: Relation Learning - A New Approach to Face Recognition
Most current machine learning methods used in face recognition systems require sufficient data to build a face model or face data description. However, data insufficiency is a common issue. This paper presents a new learning approach to tackle this issue. The proposed learning method employs not only the data in facial images but also the relations between them to build relational face models. Preliminary experiments performed on the AT&T and FERET face corpora show a significant improvement in face recognition rate when only a small facial data set is available for training.
Paper 149: Parallel Implementation of the Integral Histogram
The integral histogram is a recently proposed preprocessing technique to compute histograms of arbitrary rectangular gridded (i.e. image or volume) regions in constant time. We formulate a general parallel version of the integral histogram and analyse its implementation in Star Superscalar (StarSs). StarSs provides a uniform programming and runtime environment and facilitates the development of portable code for heterogeneous parallel architectures. In particular, we discuss the implementation for the multi-core IBM Cell Broadband Engine (Cell/B.E.) and provide extensive performance measurements and tradeoffs using two different scan orders or histogram propagation methods. For 640x480 images, a tile or block size of 28x28, and 16 histogram bins, the parallel algorithm is able to reach greater than real-time performance of more than 200 frames per second.
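The constant-time property comes from the same inclusion-exclusion trick as the integral image, applied per histogram bin. A minimal sequential sketch (the paper's contribution is the parallel formulation, not this basic algorithm):

```python
import numpy as np

def integral_histogram(img, bins):
    """ih[i, j, b] = number of pixels with bin value b in img[:i, :j].
    img is a 2-D array of integer bin indices in [0, bins)."""
    h, w = img.shape
    ih = np.zeros((h + 1, w + 1, bins), dtype=np.int64)
    for b in range(bins):
        ih[1:, 1:, b] = np.cumsum(np.cumsum(img == b, axis=0), axis=1)
    return ih

def region_histogram(ih, r0, c0, r1, c1):
    """Histogram of img[r0:r1, c0:c1] in O(bins) time, independent of
    region size, by inclusion-exclusion on the four corners."""
    return ih[r1, c1] - ih[r0, c1] - ih[r1, c0] + ih[r0, c0]
```

The parallel versions discussed in the paper differ in how the cumulative sums are propagated across tiles (the scan orders), which is where the Cell/B.E. tradeoffs arise.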
Paper 152: 3D Facial Expression Recognition Based on Histograms of Surface Differential Quantities
3D face models accurately capture facial surfaces, making precise description of facial activities possible. In this paper, we present a novel mesh-based method for 3D facial expression recognition using two local shape descriptors. To characterize the shape information of the local neighborhood of facial landmarks, we calculate weighted statistical distributions of surface differential quantities, including the histogram of mesh gradient (HoG) and the histogram of shape index (HoS). A curvature estimation method based on normal cycle theory is employed for the first time on 3D face models, alongside the common cubic fitting curvature estimation method for the purpose of comparison. Based on the basic fact that different expressions involve different local shape deformations, the SVM classifier with both linear and RBF kernels achieves state-of-the-art results on a subset of the BU-3DFE database with the same experimental setting.
Paper 153: Enhancing the Texture Attribute with Partial Differential Equations: a Case Study with Gabor Filters
Texture is an important visual attribute used to discriminate images. Although statistical features have been successful, texture descriptors do not capture the richness of details present in the images. In this paper we propose a novel approach for texture analysis based on the partial differential equation (PDE) of Perona and Malik. Basically, an input image f is decomposed into two components f = u + v, where u represents the cartoon component and v represents the textural component. We show how this procedure can be employed to enhance the texture attribute. Based on the enhanced texture information, Gabor filters are applied in order to compose a feature vector. Experiments on two benchmark datasets demonstrate the superior performance of our approach, with an improvement of almost 6%. The results strongly suggest that the proposed approach can be successfully combined with different methods of texture analysis.
Paper 154: Real-Time Depth Estimation with Wide Detectable Range using Horizontal Planes of Sharp Focus
We have been investigating a real-time depth estimation technique with a wide detectable range. This technique employs tilted optics imaging to exploit the variation of the depth of field on the horizontal planes of sharp focus. It requires considerably fewer multi-focus images than the conventional passive methods, e.g., the depth-from-focus and depth-from-defocus methods. Hence, our method helps avoid the bottleneck of the conventional methods: the fact that the motion speed of the optical mechanics is significantly slower than that of the image processing parts. Therefore, it is suitable for applications, such as in automobiles and robotic tasks, involving depth estimation with a wide detectable range and real-time processing.
Paper 156: Comparison of Visual Registration Approaches of 3D Models for Orthodontics
We propose to apply vision techniques to develop a main tool for orthodontics: the virtual occlusion of two dental casts. For that purpose, we process photos of the patient's mouth and match points between these photos and the dental 3D models. From a set of 2D/3D matches of the two arches, we calculate the projection matrix before registering the mandible relative to the maxilla through a rigid transformation. We perform the mandible registration by minimizing the reprojection errors. Two computation methods, depending on the knowledge of the camera's intrinsic parameters, are compared. Tests are carried out both on virtual and real images. In the virtual case, assumed to be perfect, we evaluate the robustness against noise and the increase in performance when using several views. Projection matrices and registration efficiency are evaluated respectively by reprojection errors and by the differences between the rigid transformation and the reference pose, recorded on the six degrees of freedom.
Paper 157: Feasibility Analysis of Ultra High Frame Rate Visual Servoing on FPGA and SIMD Processor
Visual servoing has been proven to obtain better performance than mechanical encoders for position acquisition. However, the often computationally intensive vision algorithms and the ever growing demands for higher frame rate make its realization very challenging. This work performs a case study on a typical industrial application, organic light emitting diode (OLED) screen printing, and demonstrates the feasibility of achieving ultra high frame rate visual servoing applications on both field programmable gate array (FPGA) and single instruction multiple data (SIMD) processors. We optimize the existing vision processing algorithm and propose a scalable FPGA implementation, which processes a frame within 102 µs. Though a dedicated FPGA implementation is extremely efficient, lack of flexibility and a considerable amount of implementation time are two of its clear drawbacks. As an alternative, we propose a reconfigurable wide SIMD processor, which balances efficiency, flexibility, and implementation effort. For input frames of 120 × 45 resolution, our SIMD processor can process a frame within 232 µs, sufficient to provide a throughput of 1000 fps with less than 1 ms latency for the whole visual servoing system. Compared to the reference realization on MicroBlaze, the proposed SIMD processor achieves a 21× performance improvement.
Paper 159: Efficiency Optimization of Trainable Feature Extractors for a Consumer Platform
This paper proposes an algorithmic optimization for the feature extractors of biologically inspired Convolutional Neural Networks (CNNs). CNNs are successfully used for different visual pattern recognition applications such as OCR, face detection and object classification. These applications require complex networks exceeding 100,000 interconnected computational nodes. To reduce the computational complexity, a modified algorithm is proposed; real benchmarks show a 65-83% reduction, with equal or even better recognition accuracy. Exploiting the available parallelism in CNNs is essential to reduce the computational scaling problems. Therefore, the modified version of the algorithm is implemented and evaluated on a GPU platform to demonstrate its suitability for a cost-effective parallel platform. A speedup of 2.5x with respect to the standard algorithm is achieved.
Paper 160: hSGM: Hierarchical Pyramid based Stereo Matching Algorithm
In this paper, we propose hSGM, a variant of Semi-Global Matching (SGM) that performs hierarchical, pyramid-based dense stereo matching. Our method aggregates the matching costs from coarse to fine scale in multiple directions to determine the optimal disparity for each pixel. It has several advantages over the original SGM: low space complexity and an efficient GPU implementation. We show several experimental results to demonstrate that our method is efficient and produces good-quality disparity maps.
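The aggregation step described in the abstract builds on the standard SGM recurrence. As a point of reference, a minimal NumPy sketch of cost aggregation along a single left-to-right path is shown below; the `P1`/`P2` penalty names follow common SGM convention and are our assumption, not the paper's notation, and the hierarchical coarse-to-fine scheme is omitted.

```python
import numpy as np

def aggregate_left_to_right(cost, P1=1.0, P2=4.0):
    # cost: (width, ndisp) matching-cost slice for one scanline.
    # Standard SGM recurrence along one path direction: each pixel's path
    # cost is its matching cost plus the cheapest transition from the
    # previous pixel (same disparity, +-1 with penalty P1, any jump with
    # penalty P2), minus the previous minimum to keep values bounded.
    W, D = cost.shape
    L = np.empty_like(cost)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()
        same = prev
        up = np.concatenate(([np.inf], prev[:-1])) + P1
        down = np.concatenate((prev[1:], [np.inf])) + P1
        best = np.minimum(np.minimum(same, np.minimum(up, down)), m + P2)
        L[x] = cost[x] + best - m
    return L
```

A full (h)SGM implementation sums such path costs over several directions before taking the per-pixel disparity minimum.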
Paper 162: Analysis of Wear Debris Through Classification
This paper introduces a novel method of wear debris analysis through classification of the particles based on machine learning. Wear debris consists of metal particles found in, e.g., lubricant oils used in engineering equipment. Analytical ferrography is one of the methods for wear debris analysis, and it is very important for early detection or even prevention of failures in engineering equipment such as combustion engines, gearboxes, etc. The proposed method relies on classification of wear debris particles into several classes defined by the origin of the particles. Unlike earlier methods, the proposed classification approach is based on visual similarity of the particles and supervised machine learning. The paper describes the method itself, demonstrates its experimental results, and draws conclusions.
Paper 163: System on Chip Coprocessors for High Speed Image Feature Detection and Matching
Successfully establishing point correspondences between consecutive image frames is important in tasks such as visual odometry, structure from motion, and simultaneous localization and mapping. In this paper, we describe the architecture of compact, energy-efficient dedicated hardware coprocessors that enable fast feature detection and matching.
Paper 165: Combining Linear Dimensionality Reduction and Locality Preserving Projections with Feature Selection for Recognition Tasks
Recently, a graph-based method was proposed for Linear Dimensionality Reduction (LDR). It is based on Locality Preserving Projections (LPP) and has been successfully applied to many practical problems such as face recognition. In order to overcome the Small Sample Size problem that usually affects face recognition, LPP is preceded by a Principal Component Analysis (PCA) step. This paper has two main contributions. First, we propose a recognition scheme based on the concatenation of the features provided by PCA and LPP. We show that this concatenation can improve recognition performance. Second, we propose a feasible approach to the problem of selecting the best features in this mapped space. We have tested our proposed framework on several public benchmark data sets. Experiments on the ORL, UMIST, PF01, and YALE face databases and the MNIST handwritten digit database show significant improvements in recognition performance.
Paper 166: Hierarchical Blurring Mean-Shift
In recent years, various Mean-Shift methods have been used for filtering and segmentation of images and other datasets. These methods achieve good segmentation results, but the computational speed is sometimes very low, especially for large images and some specific settings. In this paper, we propose an improved segmentation method that we call Hierarchical Blurring Mean-Shift. The method achieves a significant reduction of computation time with minimal influence on segmentation quality. A comparison of our method with traditional Blurring Mean-Shift and Hierarchical Mean-Shift with respect to segmentation quality and computational time is presented. Furthermore, we study the influence of parameter settings at various hierarchy depths on computational time and the number of segments. Finally, results promising reliable and fast image segmentation are presented.
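For readers unfamiliar with the blurring variant of Mean-Shift, a minimal one-dimensional sketch follows. It uses a plain Gaussian kernel and a fixed iteration count; this is our own toy version for illustration, not the hierarchical scheme of the paper.

```python
import numpy as np

def blurring_mean_shift(X, bandwidth, n_iter=10):
    # Blurring mean-shift in 1D: at every iteration ALL points are moved to
    # the Gaussian-weighted mean of their neighbours, so the dataset itself
    # is progressively "blurred" until points collapse into cluster centres.
    X = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        d2 = (X[:, None] - X[None, :]) ** 2
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        X = (w @ X) / w.sum(axis=1)
    return X
```

The pairwise weight matrix makes each iteration quadratic in the number of points, which is exactly the cost a hierarchical scheme tries to cut.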
Paper 168: Evaluation of Image Segmentation Algorithms from the Perspective of Salient Region Detection
The present paper addresses the problem of image segmentation evaluation by comparing seven different approaches. We present a new method of salient object detection with very good results relative to already known object detection methods. We developed a simple evaluation framework in order to compare the results of our method with other segmentation methods. The results of our experimental work offer good perspectives for our algorithm in terms of efficiency and precision.
Paper 170: Ridges and Valleys Detection in Images using Difference of Half Rotating Smoothing Filters
In this paper we propose a new ridge/valley detection method in images based on the difference of rotating Gaussian half filters. The novelty of this approach lies in mixing ideas from both directional filters and the difference-of-Gaussians (DoG) method. We obtain a new anisotropic DoG ridge/valley detector enabling very precise detection of ridge/valley points. Moreover, this detector performs correctly on crest lines even when they are highly bent, and is precise on junctions. The detector has been tested successfully on various image types presenting difficult problems for classical ridge/valley detection methods.
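For context, the classical isotropic difference-of-Gaussians response that the rotating half filters generalize can be sketched in a few lines. This is a generic toy version with our own parameter choices, not the paper's anisotropic detector.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x * x / (2 * sigma * sigma))
    return k / k.sum()

def smooth(image, sigma):
    # Separable Gaussian smoothing: convolve rows, then columns.
    r = int(3 * sigma)
    k = gaussian_kernel1d(sigma, r)
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode='same'), 1, image)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode='same'), 0, tmp)

def dog_response(image, sigma1=1.0, sigma2=2.0):
    # Isotropic DoG ridge/valley response: positive on bright ridges,
    # negative in dark valleys (for sigma1 < sigma2).
    return smooth(image, sigma1) - smooth(image, sigma2)
```

The half-filter idea of the paper replaces these isotropic kernels with oriented half kernels rotated around each pixel, which is what makes the detector behave well on bent crest lines and junctions.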
Paper 171: Surface Reconstruction of Rotating Objects from Monocular Video
The ability to model 3D objects from monocular video allows for a number of very useful applications, for instance 3D face recognition, fast prototyping, and entertainment. At present there are a number of methods available for 3D modelling from this and similar data. However, many of them are either not robust when presented with real-world data, or tend to bias their results towards a prior model. Here we use energy minimisation of a restricted circular motion model to recover the 3D shape of an object from video of it rotating. The robustness of the algorithm to noise in the data and to deviations from the assumed motion is tested, and a 3D model of a real polystyrene head is created.
Paper 172: Combining Plane Estimation with Shape Detection for Holistic Scene Understanding
Structural scene understanding is an interconnected process wherein modules for object detection and supporting structure detection need to co-operate in order to extract cross-correlated information, thereby utilizing the maximum possible information rendered by the scene data. Such an inter-linked framework provides a holistic approach to scene understanding, while obtaining the best possible detection rates. Motivated by recent research in coherent geometrical contextual reasoning and object recognition, this paper proposes a unified framework for robust 3D supporting plane estimation using a joint probabilistic model which uses results from object shape detection and 3D plane estimation. Maximization of the joint probabilistic model leads to robust 3D surface estimation while reducing false perceptual grouping. We present results on both synthetic and real data obtained from an indoor mobile robot to demonstrate the benefits of our unified detection framework.
Paper 173: Calibration and Reconstruction Algorithms for a Handheld 3D Laser Scanner
We develop a precise calibration algorithm and an efficient three-dimensional reconstruction algorithm for a handheld 3D laser scanner. Our laser scanner consists of a color camera and a line laser oriented in a fixed relation to each other. Besides the three-dimensional coordinates of the observed object our reconstruction algorithm returns a comprehensive measure of uncertainty for the reconstructed points. We experimentally evaluate the applicability of our methods on several interesting practical examples. In particular, for a calibrated sensor setup we can estimate for each pixel a human-interpretable upper bound for the reconstruction quality. This determines a ``working area'' in the image of the camera where the pixels have a reasonable accuracy. This helps to remove outliers and to increase the computation speed of our implementation.
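The per-pixel quality bound mentioned above can be illustrated with the textbook camera-laser triangulation relation z = f·b/d and its first-order error propagation. This is a simplified sketch with hypothetical numbers, not the paper's calibration model.

```python
import numpy as np

def triangulate_with_uncertainty(disp_px, focal_px, baseline, sigma_disp=0.5):
    # Depth from triangulation: z = f * b / d, with focal length in pixels
    # and baseline in metres.  First-order propagation of the disparity
    # uncertainty gives |dz/dd| = z^2 / (f * b), so the depth uncertainty
    # grows quadratically with depth -- the kind of per-pixel bound that
    # defines a "working area" of acceptable accuracy.
    d = np.asarray(disp_px, float)
    z = focal_px * baseline / d
    sigma_z = (z ** 2) / (focal_px * baseline) * sigma_disp
    return z, sigma_z
```

Thresholding `sigma_z` is one simple way to reject outlying reconstructed points, in the spirit of the working-area idea described above.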
Paper 178: A Method to Generate Artificial 2D Shape Contours Based on the Fourier Transform and Genetic Algorithms
This work presents a simple method to generate 2D contours from a small number of samples. The method uses the Fourier transform and genetic algorithms: applying crossover and mutation operators, new samples are generated. An application case is presented in which the generated samples are used to build a classifier. The results obtained indicate that the method can be a good solution to the small-sample problem for feature vectors based on shape characteristics.
Paper 180: Facial Feature Tracking for Emotional Dynamic Analysis
This article presents a feature-based framework to automatically track 18 facial landmarks for emotion recognition and emotional dynamic analysis. With a new way of using multi-kernel learning, we combine two methods: the first matches facial feature points between consecutive images, and the second uses offline learning of facial landmark appearance. Matching points results in jitter-free tracking, while the offline learning prevents the tracking framework from drifting. We train the tracking system on the Cohn-Kanade database and analyze the dynamics of emotions and Action Units on the MMI database sequences. We perform accurate detection of the temporal segments of facial expressions and report experimental results.
Paper 181: Salient Region Detection using Discriminative Feature Selection
Detecting visually salient regions is useful for applications such as object recognition/segmentation, image compression, and image retrieval. In this paper, we propose a novel method based on discriminative feature selection to detect salient regions in natural images. To accomplish this, salient region detection is formulated as a binary labeling problem, where the features that best distinguish a salient region from its surrounding background are empirically evaluated and selected based on a two-class variance ratio. A large image data set was employed to compare the proposed method to six state-of-the-art methods. The experimental results confirm that the proposed method outperforms the six algorithms, achieving higher precision and better F-measures.
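The two-class variance ratio criterion can be illustrated with a small sketch for scalar features. This is our own simplified version (function names are ours): features whose pooled variance is large relative to the within-class variances separate the two classes well.

```python
import numpy as np

def variance_ratio(fg, bg, eps=1e-9):
    # Two-class variance ratio of one scalar feature: total variance of the
    # pooled samples divided by the sum of within-class variances.  Features
    # that separate salient (fg) pixels from background (bg) score high.
    fg = np.asarray(fg, float)
    bg = np.asarray(bg, float)
    pooled = np.concatenate([fg, bg])
    return pooled.var() / (fg.var() + bg.var() + eps)

def select_best_feature(fg_feats, bg_feats):
    # fg_feats, bg_feats: (n_samples, n_features) arrays.  Returns the index
    # of the most discriminative feature and all scores.
    scores = [variance_ratio(fg_feats[:, j], bg_feats[:, j])
              for j in range(fg_feats.shape[1])]
    return int(np.argmax(scores)), scores
```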
Paper 182: Detection of Human Groups in Videos
In this paper, we consider the problem of finding and localizing social human groups in videos, which can form a basis for further analysis and monitoring of groups in general. Our approach is motivated by the collective behavior of individuals and is grounded in sociological studies. We design a detection-based multi-target tracking framework capable of handling short-term occlusions and producing stable trajectories. Human groups are discovered by clustering trajectories of individuals in an agglomerative fashion. A novel similarity function based on distances between group members robustly measures the similarity of noisy trajectories. We have evaluated our approach on several test sequences and achieved acceptable miss rates (19.4%, 29.7% and 46.7%) at reasonable numbers of false positive detections per frame (0.129, 0.813 and 0.371). The relatively high miss rates are caused by a strict evaluation procedure, whereas the visual results are quite acceptable.
Paper 183: Precise Registration of 3D Images Acquired from a Hand-Held Visual Sensor
This paper presents a method for precise registration of 3D images acquired from a new 3D digitization sensor moved manually by an operator around an object. The system is equipped with visual and inertial devices and with a speckle pattern projector. The presented method has been developed to address the problem that a speckle pattern moving during a sequence prevents correlating points between images acquired from two successive viewpoints. Several solutions are therefore proposed, based on images acquired with a moving speckle pattern. The approach improves on ICP-based methods classically used for precise registration of two clouds of 3D points.
Paper 184: Image Analysis Applied to Morphological Assessment in Bovine Livestock
Morphological assessment is one important parameter considered in conservation and improvement programs for bovine livestock. This assessment process consists of scoring an animal according to its morphology, and is normally carried out by highly qualified staff.
In this paper, a system designed to provide an assessment based on a lateral image of the cow is presented. The system consists of two main parts: a feature extraction stage, which reduces the information about the cow in the image to a set of parameters, and a neural network stage, which provides a score from that set of parameters. For the image analysis section, a model of the object is constructed by means of point distribution models (PDM). That model is then used in the search process within each image, which is carried out using genetic algorithm (GA) techniques. As a result of this stage, the vector of weights that describes the deviation of the given shape from the mean is obtained. This vector is used in the second stage, where a multilayer perceptron is trained to provide the desired assessment, using the scores given by experts for selected cows.
The system has been tested with 124 images corresponding to 44 individuals of a special rustic breed, with very promising results, taking into account that the information contained in only one view of the cow is not complete.
Paper 187: Simple Single View Scene Calibration
This paper addresses automatic calibration of images, where the main goal is to extract information about objects and relations in the scene based on the information contained in the image itself. The purpose of such calibration is to enable, for example, determination of object coordinates, measurement of distances or areas between objects in the image, etc. The idea of the work presented here is to detect objects in the image whose size is known (e.g., traffic signs in the presented case) and to exploit their relative sizes and positions in the image in order to perform the calibration, under some assumptions about the possible spatial distribution of the objects (e.g., their positioning on a plane in the presented case). This paper describes related research and the method itself. It also shows and discusses the results and proposes possible extensions.
Paper 188: Mutual Information Refinement for Flash-no-Flash Image Alignment
Flash-no-flash imaging aims to combine ambient light images with details available in flash images. Flash can alter color intensities radically, leading to changes in gradient directions and strengths; natural shadows may be removed and new ones created. This makes flash-no-flash image pair alignment a challenging problem. In this paper, we present a new image registration method utilizing mutual-information-driven refinement of point matching accuracy. For a phase correlation based method, the accuracy improvement through the suggested point refinement was over 40%. The new method also performed better than the reference methods SIFT and SURF, by 3.0% and 9.1% respectively, in alignment accuracy. Visual inspection also confirmed that in several cases the proposed method succeeded in registering flash-no-flash image pairs where the tested reference methods failed.
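The mutual-information score that drives such refinement can be sketched with a standard histogram-based estimator. This is a generic textbook version, not the paper's implementation: MI peaks when two patches are correctly aligned, even when flash has changed absolute intensities.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    # Histogram-based mutual information between two equally sized image
    # patches: I(A;B) = sum p(a,b) * log(p(a,b) / (p(a) p(b))), in nats.
    hist, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p = hist / hist.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal over a
    py = p.sum(axis=0, keepdims=True)   # marginal over b
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))
```

A refinement loop would shift one patch over a small neighbourhood and keep the offset that maximizes this score.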
Paper 192: A New Anticorrelation-based Spectral Clustering Formulation
This paper introduces the Spectral Clustering Equivalence (SCE) algorithm, intended as an alternative to spectral clustering (SC) that improves both speed and quality of segmentation. Instead of solving for the spectral decomposition of a similarity matrix as in SC, SCE converts the similarity matrix to a column-centered dissimilarity matrix and searches for the pair of most anticorrelated columns. The orthogonal complement to these columns is then used to create an output feature vector (analogous to the eigenvectors obtained via SC), which is used to partition the data into discrete clusters. We demonstrate the performance of SCE on a number of artificial and real datasets by comparing its classification and image segmentation results with those returned by kernel-PCA and the Normalized Cuts algorithm. The column-wise processing makes SCE applicable to very large-scale problems and asymmetric datasets.
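The column-centering and anticorrelation search can be sketched as follows. This is only a toy version of the seed step as we read the abstract (the full SCE algorithm involves more, and the function name is ours); on a block-structured similarity matrix, columns from different clusters are strongly anticorrelated after centering.

```python
import numpy as np

def most_anticorrelated_columns(S):
    # Column-centre the matrix, then return the index pair of the two
    # columns with the lowest Pearson correlation.
    D = S - S.mean(axis=0, keepdims=True)
    C = np.corrcoef(D, rowvar=False)
    np.fill_diagonal(C, np.inf)      # exclude self-correlation
    i, j = np.unravel_index(np.argmin(C), C.shape)
    return i, j
```

Note that only column-wise statistics are needed, which is what allows this style of processing to scale to large and asymmetric matrices.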
Paper 193: Nonparametric Estimation of Fisher Vectors to Aggregate Image Descriptors
We investigate how to represent a natural image in order to recognize the visual concepts within it. The core of the proposed method is a new approach to aggregating local features, based on a non-parametric estimation of the Fisher vector, which results from deriving the gradient of the log-likelihood. For this, we use low-level local descriptors that are learned with independent component analysis and thus provide a statistically independent description of the images. The resulting signature has a very intuitive interpretation, and we propose an efficient implementation as well. We show on publicly available datasets that the proposed image signature performs very well.
Paper 195: A 3-D Tube Scanning Technique based on Axis and Center Alignment of Multi-Laser Triangulation
This paper presents a novel 3D tube scanning technique based on multi-laser triangulation. A multi-laser and camera module, mounted in front of a mobile robot, captures a sequence of 360-degree shapes of the inner surface of a cylindrical tube. In each scan of the sequence, a circular shape composed of four partial ellipses is reconstructed using a multi-laser triangulation technique. To reconstruct a complete shape of the tube, the center and axis of the circular shape in each scan are aligned to a common tube model. To overcome inherent alignment noise due to off-axis robot motion, sensor vibration, etc., we derive and apply a 3D Euclidean transformation matrix in each scan. In experimental results, we show that the proposed technique reconstructs very accurate 3D shapes of a tube even in the presence of motion vibration.
Paper 196: Curve-Skeletons Based on the Fat Graph Approximation
We present a new definition of the 3D curve-skeleton. This definition provides a mathematically strict way to compare and evaluate various approaches to the skeletonization of 3D shapes. The definition is based on the use of fat curves. A fat curve is a 3D object that approximates tubular fragments of the shape. A set of fat curves is used to approximate the entire shape; such a set can be considered a generalization of the 2D medial axis. We also present a robust and efficient algorithm that builds curve-skeletons according to the given definition.
Paper 199: Segmentation Based Tone-mapping for High Dynamic Range Images
In this paper, we present a novel segmentation-based method for displaying high dynamic range images. We segment images into regions and then carry out adaptive contrast and brightness adjustment, applying a global tone mapping operator within the local regions to reproduce local contrast and brightness and ensure better quality. We propose a weighting scheme to eliminate the boundary artifacts caused by the segmentation, and we adaptively decrease the local contrast enhancement in uniform areas to suppress the noise it would otherwise introduce. We demonstrate that our methods are easy to use and that a fixed set of parameter values produces good results for a wide variety of images.
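A typical global tone mapping operator of the kind applied per region is the Reinhard-style operator sketched below. The choice of this particular operator family and its `a` key value are our assumptions for illustration, not necessarily what the paper uses.

```python
import numpy as np

def reinhard_global(lum, a=0.18, eps=1e-6):
    # Reinhard-style global tone mapping of a luminance map: scale by the
    # log-average (geometric mean) luminance, then compress with L/(1+L),
    # mapping arbitrary dynamic range monotonically into [0, 1).
    lum = np.asarray(lum, float)
    log_avg = np.exp(np.mean(np.log(lum + eps)))
    L = a * lum / log_avg
    return L / (1.0 + L)
```

In a segmentation-based scheme, an operator like this is applied per region with region-adapted parameters and the results are blended across region boundaries.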
Paper 201: Feature Space Warping Relevance Feedback With Transductive Learning
Relevance feedback is a widely adopted approach to improve content-based information retrieval systems by keeping the user in the retrieval loop. In particular, feature space warping has been proposed as an effective approach for bridging the gap between high-level semantics and low-level features. Recently, a combination of feature space warping and query point movement techniques has been proposed as an alternative to learning-based approaches, showing good performance under different data distributions. In this paper, we propose to merge feature space warping and transductive learning, in order to benefit both from the ability to adapt data to the user's hints and from the information coming from unlabeled samples. Experimental results on an image retrieval task reveal significant performance improvements from the proposed method.
Paper 202: Estimation of Human Orientation in Images Captured with a Range Camera
Estimating the orientation of the observed person is a crucial task for some application fields like home entertainment, man-machine interaction, or intelligent vehicles. In this paper, we discuss the usefulness of conventional cameras for estimating the orientation, present some limitations, and show that 3D information improves the estimation performance.
Technically, the orientation estimation is solved in terms of a regression problem and supervised learning. This approach, combined with a slicing method for the 3D volume, provides mean errors as low as 9.2° or 4.3°, depending on the set of considered poses. These results are consistent with those reported in the literature; however, our technique is faster and easier to implement than existing ones.
Paper 205: Automatic Occlusion Removal from Facades for 3D Urban Reconstruction
Object removal and inpainting approaches typically require a user to manually create a mask around occluding objects. While creating masks for a small number of images is possible, it rapidly becomes untenable for longer image sequences. Instead, we accomplish this step automatically using an object detection framework to explicitly recognize and remove several classes of occlusions. We propose using this technique to improve 3D urban reconstruction from street level imagery, in which building facades are frequently occluded by vegetation or vehicles. By assuming facades in the background are planar, 3D scene estimation provides important context to the inpainting process by restricting input sample patches to regions that are coplanar to the occlusion, leading to more realistic final textures. Moreover, because non-static and reflective occlusion classes tend to be difficult to reconstruct, explicitly recognizing and removing them improves the resulting 3D scene.
Paper 207: Fast Hough Transform on GPUs: Exploration of Algorithm Trade-offs
The Hough transform is a commonly used algorithm to detect lines and other features in images. It is robust to noise and occlusion, but has a large computational cost. This paper introduces two new implementations of the Hough transform for lines on a GPU. One focuses on minimizing processing time, while the other has an input-data independent processing time. Our results show that optimizing the GPU code for speed can achieve a speed-up over naive GPU code of about 10x. The implementation which focuses on processing speed is the faster one for most images, but the implementation which achieves a constant processing time is quicker for about 20% of the images.
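The voting loop that such GPU implementations parallelize is the classical Hough transform for lines; a straightforward CPU reference version (ours, not the paper's GPU code) looks like this:

```python
import numpy as np

def hough_lines(edges, n_theta=180, n_rho=None):
    # Every edge pixel votes for all (rho, theta) parameterized lines
    # through it: rho = x*cos(theta) + y*sin(theta).  Peaks in the
    # accumulator correspond to lines in the image.
    H, W = edges.shape
    diag = int(np.ceil(np.hypot(H, W)))
    if n_rho is None:
        n_rho = 2 * diag + 1                     # rho in [-diag, diag]
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for theta_idx, t in enumerate(thetas):
        rho = (xs * np.cos(t) + ys * np.sin(t)).round().astype(int) + diag
        np.add.at(acc, (rho, theta_idx), 1)      # unbuffered accumulation
    return acc, thetas, diag
```

The data-dependent inner loop over edge pixels is precisely what makes a fixed-time GPU mapping non-trivial, as the abstract notes.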
Paper 209: Knowledge Driven Saliency: Attention to the Unseen
This paper deals with attention in 3D environments based upon knowledge-driven cues, as an advancement of the state of the art in visual attention modeling. Using learnt 3D scenes as top-down influence, the proposed system is able to assign high saliency to locations occupied by objects that are new, changed, or even missing from their location as compared to the already learnt situation. The proposal addresses a system-level solution covering learning of 3D objects and scenes using visual, range, and odometry sensors; storage of spatial knowledge using multiple-view theory from psychology; and validation of scenes using recognized objects as anchors. The proposed system is designed to handle the complex scenario of recognition with partially visible objects when revisiting the scene from an arbitrary direction. Simulation results have shown the success of the designed methodology under ideal sensor readings from range and odometry sensors.
Paper 210: Quantifying Appearance Retention in Carpets using Geometrical Local Binary Patterns
Quality assessment in carpet manufacturing is performed by humans who evaluate the appearance retention (AR) grade of carpet samples. To quantify AR grades objectively, several approaches based on computer vision have been developed. Among them, the Local Binary Pattern (LBP) descriptor and its variations have shown promising results. Nevertheless, the requirements of quality assessment over a wide range of carpets have not yet been met. One of the difficulties is distinguishing between consecutive AR grades in carpets. For this, we adopt an extension of LBP called Geometrical Local Binary Patterns (GLBP) that we recently proposed. The basis of GLBP is to evaluate the grey-scale differences between adjacent points defined on a path in a neighbourhood. Symmetries of the paths in the GLBPs are evaluated. The proposed technique is compared with an invariant rotational mirror-based LBP technique. The results show that the GLBP technique is better at distinguishing consecutive AR grades in carpets.
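The plain 8-neighbour LBP descriptor that GLBP extends can be sketched as follows (a radius-1 reference version of the standard operator, not the paper's geometrical variant):

```python
import numpy as np

def lbp_8_1(image):
    # Local Binary Pattern with 8 neighbours at radius 1: each interior
    # pixel gets a byte whose bits record whether each neighbour is >= the
    # centre pixel.  Texture statistics are then built from histograms of
    # these codes.
    img = np.asarray(image, float)
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (n >= c).astype(np.uint8) << bit
    return code
```

GLBP replaces the circle of neighbours with grey-scale differences along geometric paths in the neighbourhood, which is what gives it sensitivity to the subtle texture changes between consecutive AR grades.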
Paper 211: Virtual Restoration of the Ghent Altarpiece Using Crack Detection and Inpainting
In this paper, we present a new method for virtual restoration of digitized paintings, with a special focus on the Ghent Altarpiece (1432), one of Belgium's greatest masterpieces. The goal of the work is to remove cracks from the digitized painting, thereby approximating how the painting looked before ageing for nearly 600 years and aiding art-historical and palaeographical analysis. For crack detection, we employ a multiscale morphological approach, which can cope with greatly varying thickness of the cracks as well as with their varying intensities (from dark to light). Due to the content of the painting (with extremely many fine details) and the complex type of cracks (including inconsistent whitish clouds around them), the available inpainting methods do not provide satisfactory results on many parts of the painting. We show that patch-based methods outperform pixel-based ones, but still leave much room for improvement in this application. We propose a new method for candidate patch selection, which can be combined with different patch-based inpainting methods to improve their performance in crack removal. The results demonstrate improved performance, with fewer artefacts and better preserved fine details.
Paper 212: Image Segmentation Based on Electrical Proximity in a Resistor-Capacitor Network
Measuring distances is an important problem in many image-segmentation algorithms. The distance should tell whether two image points belong to a single image segment or to two different ones. The paper deals with the problem of measuring distance along the manifold defined by the image. We start from a discussion of the difficulties that arise if the geodesic distance, diffusion distance, or certain other known metrics are used. Starting from the diffusion equation and inspired by the diffusion distance, we propose to measure the proximity of points as the amount of substance transferred in a diffusion process. The analogy between images and electrical circuits is used in the paper; i.e., we measure the proximity as the amount of electrical charge transported, during a certain time interval, between two nodes of a resistor-capacitor network. We show how the introduced quantity can be used in algorithms for supervised (seeded) and unsupervised image segmentation. We also show that the distance between areas consisting of more than one point (pixel) can be easily introduced in a meaningful way. Experimental results are also presented.
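The flavour of such a proximity can be imitated in a few lines by integrating the diffusion equation dq/dt = -Lq on a graph Laplacian and reading off how much charge reaches the target node. This is a toy explicit-Euler discretization for intuition only, not the paper's resistor-capacitor formulation.

```python
import numpy as np

def diffusion_proximity(L_graph, src, dst, t=1.0, n_steps=1000):
    # Place a unit charge at `src`, let it diffuse over the network for
    # time t (explicit Euler on dq/dt = -L q, with L the graph Laplacian),
    # and return the charge found at `dst`.  Nearby nodes receive more
    # charge than distant ones, so this acts as a proximity measure.
    q = np.zeros(L_graph.shape[0])
    q[src] = 1.0
    dt = t / n_steps
    for _ in range(n_steps):
        q = q - dt * (L_graph @ q)
    return q[dst]
```

For image segmentation, the graph nodes would be pixels and the edge weights would encode intensity similarity between neighbours.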
Paper 213: Contrast Enhanced Ultrasound Image Restoration
In this paper, we propose a new anisotropic diffusion scheme to restore contrast-enhanced ultrasound images for a better quantification of liver arterial perfusion. We exploit the image statistics to define a new edge-stopping function. The method has been tested on liver lesions. The results show that the assessment of lesion vascularization from our process can potentially be used for the diagnosis of liver carcinoma.
Paper 216: A Comparative Study of Vision-based Lane Detection Methods
Lane detection consists of detecting the limits of the lane in which the vehicle carrying the camera is moving. The aim of this study is to propose a lane detection method based on digital image processing. Morphological filtering, the Hough transform, and linear-parabolic fitting are applied to realize this task. The results of our proposed method are compared with three previously proposed approaches. The method presented here was tested on video sequences filmed by the authors on Tunisian roads, on a video sequence provided by Daimler AG, and on the PETS2001 dataset provided by the University of Essex.
Paper 218: Separating Occluded Humans by Bayesian Pixel Classifier with Re-weighted Posterior Probability
This paper proposes a Bayesian pixel classification method with re-weighted posterior probability for separating multiple occluded humans. We separate the occluded humans by treating the occlusion region as a pixel classification problem. First, we detect an isolated human using a human detector. We then divide it into three body parts (head, torso, and legs) using a body part detector, and model the color distributions of each body part using a naive Bayes classifier. Next, we detect an occlusion region by associating the occluded humans in consecutive frames. Finally, we identify the pixels associated with a human or body parts in the occlusion region using the Bayesian pixel classifier with re-weighted posterior probability, which classifies them more accurately. Experimental results show that our proposed method can classify pixels in an occlusion region and separate multiple occluded humans.
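The per-part colour model and posterior-maximising pixel labeling can be sketched as follows. This is a diagonal-Gaussian naive Bayes toy version; the `priors` argument marks where a re-weighting of the posterior would plug in, and all names are ours rather than the paper's.

```python
import numpy as np

def fit_gaussian_parts(part_pixels):
    # part_pixels: {part_name: (n, 3) RGB samples}.  Fit mean and diagonal
    # variance per body part, i.e. a naive-Bayes colour model of the kind
    # built for head / torso / legs.
    return {name: (p.mean(axis=0), p.var(axis=0) + 1e-6)
            for name, p in part_pixels.items()}

def classify_pixel(rgb, models, priors=None):
    # Return the label maximising the (optionally re-weighted) posterior:
    # log p(rgb | part) + log prior(part), with a diagonal Gaussian
    # likelihood per channel.
    names = list(models)
    logp = []
    for name in names:
        mu, var = models[name]
        ll = -0.5 * np.sum((rgb - mu) ** 2 / var + np.log(2 * np.pi * var))
        prior = 1.0 if priors is None else priors[name]
        logp.append(ll + np.log(prior))
    return names[int(np.argmax(logp))]
```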
Paper 219: A New Multi-Camera Approach For Lane Departure Warning
In this paper, we present a new multi-camera approach to Lane Departure Warning (LDW). Upon acquisition, the captured images are transformed to a bird's-eye view using a modified perspective removal transformation. Then, camera calibration is used to accurately determine the positions of the two cameras relative to a reference point. Lane detection is performed on the front and rear camera images, which are combined using data fusion. Finally, the distance between the vehicle and the adjacent lane boundaries is determined, allowing LDW to be performed. The proposed system was tested on real-world driving videos and shows good results when compared to ground truth.
Paper 220: An Edge-based Approach for Robust Foreground Detection
Foreground segmentation is an essential task in many image processing applications and a commonly used approach to obtain foreground objects from the background. Many techniques exist, but due to shadows and changes in illumination, the segmentation of foreground objects from the background remains challenging. In this paper, we present a powerful framework for detection of moving objects in real-time video processing applications under various lighting changes. The novel approach is based on a combination of edge detection and recursive smoothing techniques. We use edge dependencies as statistical features of foreground and background regions and define the foreground as regions containing moving edges. The background is described by short-term and long-term estimates. Experiments demonstrate the robustness of our method in the presence of lighting changes, compared to other widely used background subtraction techniques.
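Recursive smoothing of the background is typically an exponential running average; a minimal sketch of the idea (our simplification, not the authors' exact update rule) is:

```python
import numpy as np

def update_background(bg, frame, alpha):
    # Recursive (exponential running-average) background estimate.  A
    # framework like the one above keeps two of these with different
    # alphas: a fast short-term model and a slow long-term model.
    return (1.0 - alpha) * bg + alpha * frame

def moving_edges(frame_edges, bg_edges, thresh=0.5):
    # Foreground = edges present in the current frame but not supported by
    # the background edge estimate.
    return (frame_edges > thresh) & (bg_edges <= thresh)
```

Comparing the short-term and long-term models additionally helps distinguish genuine foreground motion from gradual illumination drift.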
Paper 223: Simultaneous Partitioned Sampling for Articulated Object Tracking
In this paper, we improve the Partitioned Sampling (PS) scheme to better handle high-dimensional state spaces. PS can be explained in terms of conditional independences between the random variables of states and observations, which can be modeled by Dynamic Bayesian Networks. We propose to exploit these networks to determine conditionally independent subspaces of the state space. This allows us to perform propagations and corrections simultaneously over smaller spaces, which reduces the number of necessary resampling steps and, in addition, focuses particles in high-likelihood areas. This new methodology, called Simultaneous Partitioned Sampling, is successfully tested and validated for articulated object tracking.
Paper 224: Video Stippling
In this paper, we consider rendering color videos using a non-photorealistic art technique commonly called stippling. Stippling is the art of rendering images using point sets, possibly with various attributes such as sizes, elementary shapes, and colors. Producing good stippling is attractive not only for the sake of image depiction but also because it yields a compact vectorial format for storing the semantic information of media. In order to create stippled videos, our method improves over the naive scheme by considering dynamic point creation and deletion according to the current scene's semantic complexity. Furthermore, we explain how to produce high-quality stippled videos (i.e., fully dynamic spatio-temporal point sets) for media containing various fading effects. We report on the practical performance of our implementation, and present several stippled video results rendered on the fly using our viewer, which allows dynamic spatio-temporal rescaling (e.g., vectorial upscaling and frame-rate rescaling).