Acivs 2010 Advanced Concepts for Intelligent Vision Systems
Photo: Sally Mayman, Courtesy of Tourism New South Wales
Dec. 13-16, 2010 Macquarie University, Sydney, Australia
Acivs 2010 Abstracts
Invited papers
Paper 251: UAV Video Analysis
Video surveillance and monitoring is one of the most active areas of research in Computer Vision. The main steps in a video surveillance system include: detection and categorization of objects of interest in video (e.g. people, vehicles), tracking of those objects from frame to frame, and recognition of their activities, behavior and patterns.
In this talk, I will present an overview of our work in airborne video surveillance. First, I will present our method for tracking thousands of objects in high-resolution, low-frame-rate, multiple-camera aerial videos. The proposed algorithm avoids the pitfalls of global minimization of data association costs and instead maintains multiple local associations for each track. In contrast with the 1-1 correspondence constraints of bipartite graph matching and multiple hypotheses tracking algorithms, the proposed method allows representation of object state in terms of many-to-many data associations per track.
Next, I will present a method for detecting humans in aerial imagery. The method uses three geometric constraints, namely the orientation of ground plane normal, orientation of shadows cast by humans, and relationship between height and shadow length. This information is used to estimate locations of humans in the scene, and then candidates for detected humans are classified using wavelet features and a Support Vector Machine.
Finally, I will present our approach for detecting key term motion patterns such as a vehicle turning or a pedestrian crossing the road. A motion pattern is a spatial segment of the image that has a high degree of local similarity of speed and flow direction within the segment, and not outside it. We employ a mixture model representation of salient patterns of optical flow for learning motion patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low-level cues of noisy optical flow, K-means is used to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered, and instances of motion patterns are computed using a simple motion model by linking components across space and time.
Paper 252: Remote sensing options
A prime use of remote sensing is agricultural measurement. Analysing multi-spectral images of the earth gives feedback on the quality of the soil and vegetation, and can also predict crop failures and insect plagues, leaving time for action.
Various options are possible for remote sensing, from orbiting satellites to monitoring from unmanned airplanes. They differ in the trade-off between viewing angle and resolution and in the number of spectral bands available. At VITO, Belgium, various remote sensing techniques are used and developed to feed our data-center, where giga-pixel sized images of the earth are processed daily. We also work on the use of remote sensing systems for disaster monitoring and on calibration procedures for already installed remote sensing devices.
The talk will give a high level overview of the options in remote sensing.
Paper 253: Convex and Discrete optimization techniques in Computer Vision
Paper 254: Image Analysis - is it just applied statistical analysis and approximation theory?
If one looks across the multitude of papers on solving image analysis problems, there are a limited number of central themes and variations in overall methodology.
For example, a paradigm stretching back well into the 70's, and probably earlier, is: characterize the problem as the solution of an objective function, then study the methods for efficiently solving that optimization problem (including replacing it with an approximate characterization that is easier to solve).
Popular "schools" such as scale-space analysis, wavelets and (more recently) compressed sensing follow a line of attack that chooses a representation that has certain advantages - particularly leading to approximations with "nice" properties. Much, if not all, of machine learning is a form of statistical approximation (and sometimes entwined with non-statistical, such as geometric, approximation).
This talk will take some influential examples from image-processing/analysis and illustrate how they represent variations on treating the problem as a (possibly statistical) approximation problem. In a sense, the challenge boils down to finding "the right" approximation that allows a solution methodology that is efficient and robust.
Regular papers
Paper 108: Optimisation-Based Image Grid Smoothing for SST Images
The present paper focuses on smoothing techniques for Sea Surface Temperature (SST) satellite images. Due to the non-uniformity of the noise in the images as well as their relatively low spatial resolution, automatic analysis of SST images usually gives poor results. This paper presents a new framework to smooth and enhance the information contained in the images. The gray levels in the image are filtered using a mesh smoothing technique called SOWA, while a new technique for resolution enhancement, named grid smoothing, is introduced and applied to the SST images. Both techniques (SOWA and grid smoothing) represent an image using an oriented graph. In this framework, a quadratic criterion is defined according to the gray levels (SOWA) and the spatial coordinates of each pixel (grid smoothing) and minimised using non-linear programming. The two-step enhancement method is tested on real SST images originating from the Meteosat first-generation satellite.
Paper 110: Dense Stereo Matching From Separated Views of Wide-Baseline Images
In this paper, we present a dense stereo matching algorithm for multiple wide-baseline images with separated views. The algorithm utilizes a coarse-to-fine strategy to propagate the sparse feature matching to dense stereo for image pixels. First, the images are segmented into non-overlapping homogeneous partitions. Then, in the coarse step, the initial disparity map is estimated by assigning the sparse feature correspondences, where the spatial location of these features is incorporated with the over-segmentation. The initial occlusion status is obtained by a cross-checking test. Finally, the stereo maps are refined by the proposed discontinuity-preserving regularization algorithm, which directly couples the disparity and occlusion labeling. The experimental results obtained on real data sets of challenging samples, including wide-baseline image pairs with both identical scale and different scale, demonstrate the good subjective performance of the proposed method.
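The cross-checking test mentioned above can be illustrated with a minimal sketch. It assumes a single rectified scanline with integer disparities, which is a simplification and not the wide-baseline setting of the paper; the function name `cross_check` and the tolerance parameter `tol` are hypothetical.

```python
def cross_check(disp_left, disp_right, tol=0):
    """Flag pixels of one scanline as occluded when the left-to-right and
    right-to-left disparities disagree (assumed rectified, integer disparities)."""
    width = len(disp_left)
    occluded = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the matching pixel in the right scanline
        consistent = 0 <= xr < width and abs(disp_right[xr] - d) <= tol
        occluded.append(not consistent)
    return occluded
```

A pixel whose match falls outside the image, or whose match points back with a different disparity, is treated as occluded; the refinement step described in the abstract would then start from this initial labeling.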
Paper 111: Image Recognition through Incremental Discriminative Common Vectors
An incremental approach to the discriminative common vector (DCV) method for image recognition is presented. Two different but equivalent ways of computing both common vectors and corresponding subspace projections have been considered in the particular context in which new training data becomes available and learned subspaces may need continuous updating. The two algorithms are based on either scatter matrix eigendecomposition or difference subspace orthonormalization, as with the original DCV method. The proposed incremental methods keep the same good properties as the original one but with a dramatic decrease in computational burden when used in this kind of dynamic scenario. Extensive experimentation assessing the properties of the proposed algorithms using several publicly available image databases has been carried out.
Paper 112: Regularized Kernel Locality Preserving Discriminant Analysis for Face Recognition
In this paper, a regularized kernel locality preserving discriminant analysis (RKLPDA) method is proposed for facial feature extraction and recognition. The proposed RKLPDA inherits the characteristic of LPDA that encodes both the geometrical and discriminant structure of the data manifold, and improves the classification ability for linearly non-separable data by introducing the kernel trick. Meanwhile, by regularizing the eigenvectors of the kernel locality preserving within-class scatter, RKLPDA utilizes all the discriminant information and eliminates the small sample size (SSS) problem. Experiments on the ORL and FERET face databases are performed to test and evaluate the proposed algorithm. The results demonstrate the effectiveness of RKLPDA.
Paper 113: Fire Detection in Color Images using Markov Random Fields
Automatic video-based fire detection can greatly reduce fire alert delay in large industrial and commercial sites, at a minimal cost, by using the existing CCTV camera network. Most traditional computer vision methods for fire detection model the temporal dynamics of the flames, in conjunction with simple color filtering. An important drawback of these methods is that their performance degrades at lower frame rates, and they cannot be applied to still images, limiting their applicability. Also, real-time operation often requires significant computational resources, which may be infeasible for large camera networks. This paper presents a novel method for fire detection in static images, based on a Markov Random Field but with a novel potential function. The method detects 99.6% of fires in a large collection of test images, while generating fewer false positives than a state-of-the-art reference method. Additionally, parameters are easily trained on a 12-image training set with minimal user input.
Paper 120: A Caustic Approach of Panoramic Image Analysis
In this article, the problem of blur in a panoramic image from a catadioptric camera is analyzed through the determination of the virtual image. This determination is done first with an approximate method, and second through the caustic approach. This leads us to a general caustic approach to panoramic image analysis, where equations of virtual images are given. Finally, we give some direct applications of our analysis, such as depth of field (blur) or image resolution.
Paper 121: Watershed Based Document Image Analysis
Document image analysis is used to segment and classify regions of a document image into categories such as text, graphic and background. In this paper we first review existing document image analysis approaches and discuss their limits. Then we adapt the well-known watershed segmentation in order to obtain a very fast and efficient classification. Finally, we compare our algorithm with three others, by running all the algorithms on a set of document images and comparing their results with a ground-truth segmentation designed by hand. Results show that the proposed algorithm is the fastest and obtains the best quality scores.
Paper 122: Modeling Wavelet Coefficients for Wavelet Subdivision Transforms of 3D Meshes
In this paper, a Laplacian Mixture (LM) model is proposed to accurately approximate the observed histogram of the wavelet coefficients produced by lifting-based subdivision wavelet transforms. On average, the proposed mixture model gives better histogram fitting for both normal and non-normal meshes compared to the traditionally used Generalized Gaussian (GG) distributions. Exact closed-form expressions for the rate and the distortion of the LM probability density function quantized using a generic embedded deadzone scalar quantizer (EDSQ) are derived, without making high-rate assumptions. Experimental evaluations carried out on a set of 3D meshes reveal that, on average, the D-R function for the LM model closely follows and gives a better indication of the experimental D-R compared to the D-R curve of the competing GG model. Optimal embedded quantization for the proposed LM model is experimentally determined. In this sense, it is concluded that the classical Successive Approximation Quantization (SAQ) is an acceptable, but in general not an optimal, embedded quantization solution in wavelet-based scalable coding of 3D meshes.
Paper 124: Dynamic Facial Expression Recognition using Boosted Component-Based Spatiotemporal Features and Multi-Classifier Fusion
Feature representation and classification are critical to facial expression recognition. In facial expression recognition, static images cannot provide as much discriminative information as dynamic sequences. In addition, approaches for extracting geometric features are sensitive to shape and resolution variations, and those for extracting appearance features have much redundant information. In this paper, we propose an effective component-based approach to facial expression recognition by computing spatiotemporal features from facial areas centered at 38 detected fiducial interest points and concatenating these histograms for describing facial expressions. Furthermore, since not all features are important to facial expression recognition, the issue of how to select the most discriminative features for facial components is addressed by using AdaBoost for boosting the strong appearance and motion features. As far as we know, there are few applications of multi-classifier fusion for facial expression recognition. We also present a framework for multi-classifier fusion to recognize expressions based on the median rule, mean rule, and product rule. Experimental studies conducted on the Cohn-Kanade database show that our approach, which combines boosted component-based spatiotemporal features with multi-classifier fusion, provides better performance for expression recognition compared with earlier approaches.
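As a rough illustration of the median, mean, and product fusion rules named above, the following sketch combines per-class scores from several classifiers and picks the class with the highest fused score. It is a generic textbook form, not the authors' framework; the function name `fuse` and the score layout are assumptions.

```python
from math import prod
from statistics import mean, median

def fuse(scores, rule="mean"):
    """scores[k][c] is the (assumed) score of classifier k for class c.
    Returns the index of the class with the highest fused score."""
    combine = {"mean": mean, "median": median, "product": prod}[rule]
    n_classes = len(scores[0])
    fused = [combine([s[c] for s in scores]) for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)
```

The rules can disagree: with scores `[[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]]` the median rule favours class 0, while the mean and product rules favour class 1, because a single confident classifier weighs more heavily on them.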
Paper 125: Improved Grouping and Noise Cancellation for Automatic Lossy Compression of AVIRIS Images
An improved method for the lossy compression of AVIRIS hyperspectral images is proposed. It is automatic and presumes blind estimation of the noise standard deviation in component images, their scaling (normalization) and grouping. A 3D DCT based coder is then applied to each group to carry out both the spectral and the spatial decorrelation of the data. To minimize distortions and provide a sufficient compression ratio, the quantization step is to be set at about 4.5. This allows removing the noise present in the original images practically without deterioration of the useful information. It is shown that for real-life images the attained compression ratios can be on the order of 8 to 35.
Paper 126: A Fast External Force Field for Parametric Active Contour Segmentation
Active contours or snakes are widely used for segmentation and tracking. We propose a new active contour model, which converges reliably even when the initialization is far from the object of interest. The proposed segmentation technique uses an external energy function where the energy slowly decreases in the vicinity of an edge. Based on this energy a new external force field is defined. Both energy function and force field are calculated using an efficient dual scan line algorithm. The proposed force field is tested on computational speed, its effect on the convergence speed of the active contour, and the segmentation result. The proposed method achieves segmentation results similar to those of the gradient vector flow and vector field convolution active contours, but the force field needs significantly less time to calculate.
Paper 131: Adaptive Constructive Polynomial Fitting
To extract geometric primitives from edges, we use an incremental linear-time fitting algorithm, which is based on constructive polynomial fitting. In this work, we propose to determine the polynomial order by observing the regularity and the increase of the fitting cost. When using a fixed polynomial order, under- or even overfitting could occur. Second, due to a fixed threshold on the fitting cost, arbitrary endpoints are detected for the segments, which are unsuitable as feature points. We propose to allow a variable segment thickness by detecting discontinuities and irregularities in the fitting cost. Our method is evaluated on the MPEG-7 core experiment CE-Shape-1 database part B. In the experimental results, the edges are approximated closely by the polynomials of variable order. Furthermore, the polynomial segments have robust endpoints, which are suitable as feature points. When comparing adaptive constructive polynomial fitting (ACPF) to non-adaptive constructive polynomial fitting (NACPF), the average Hausdorff distance per segment decreases by 8.85% and the object recognition rate increases by 10.24%, while preserving simplicity and computational efficiency.
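For readers unfamiliar with the Hausdorff distance used as the evaluation measure above, here is a minimal sketch of the standard symmetric definition on two finite point sets. How the paper samples edge points and polynomial segments is not stated in the abstract, so the inputs here are purely illustrative.

```python
from math import dist

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

A smaller value means that every point of one set lies close to some point of the other, which is why a decrease in the average distance per segment indicates a closer fit.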
Paper 132: Football Players Classification in a Multi-camera Environment
In order to perform automatic analysis of sport videos acquired from a multi-sensing environment, it is fundamental to face the problem of automatic football team discrimination. A correct assignment of each player to the relative team is a preliminary task that, together with player detection and tracking algorithms, can strongly affect any high-level semantic analysis. Supervised approaches for object classification require the construction of ad hoc models before the processing and also a manual selection of different player patches belonging to the team classes. The idea of this paper is to collect the player patches coming from six different cameras and, after a pre-processing step based on CBTF (Cumulative Brightness Transfer Function), to study and compare different unsupervised methods for classification. The pre-processing step based on CBTF has been implemented in order to mitigate differences in appearance between images acquired by different cameras. We tested three different unsupervised classification algorithms (MBSAS - a sequential clustering algorithm; BCLS - a competitive one; and k-means - a hard-clustering algorithm) on the transformed patches. Results obtained by comparing different sets of features with different classifiers are presented. Experiments have been carried out on different real matches of the Italian Serie A.
Paper 133: A Criterion of Noisy Images Quality
This work describes an objective criterion for estimating the quality of fine details in noisy images in the normalized equal color space. A comparison with the standard PSNR criterion is presented for noisy images.
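The standard PSNR criterion used as the baseline above can be sketched as follows. This is the usual definition for flat lists of pixel values with an assumed peak value of 255; the proposed fine-detail criterion itself is not specified in the abstract and is not reproduced here.

```python
from math import inf, log10

def psnr(reference, noisy, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized
    flat lists of pixel values (peak=255 assumes 8-bit images)."""
    mse = sum((r - n) ** 2 for r, n in zip(reference, noisy)) / len(reference)
    return inf if mse == 0 else 10 * log10(peak ** 2 / mse)
```

Because PSNR averages the squared error over all pixels, errors confined to small fine-detail regions have little effect on it, which is the kind of limitation a detail-oriented criterion would aim to address.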
Paper 134: Spectral Matching Functions and Ellipse Mappings in Search for More Uniform Chromaticity and Color Spaces
In this study, modifying the CIE xyz color matching functions was considered to achieve a more uniform chromaticity space. New color matching functions resulting both from the non-negative tensor factorization and from the optimization were combined with two ellipse mapping approaches. In both approaches the original MacAdam ellipses were mapped to the new space. The first mapping approach depended on the dominant wavelengths and the second one on the spectral information for the five points on the locus of each ellipse. Equal semiaxis lengths (a constant radius) and equal areas for the mapped MacAdam ellipses were the characteristics for the uniformity of the new chromaticity space. The new color matching functions were modelled with non-uniform rational B-splines (NURBS), and the optimization modified the independent parameters, namely the control points, for NURBS. The cost function was based on the size and shape of the mapped MacAdam ellipses. NURBS were also utilized as a smoothing operator when the color matching functions were directly output from the optimization task. The results indicate that the modified color matching functions yield a more uniform chromaticity space. There still remains uncertainty about the ellipse mapping approaches and the formulation of the cost function in the optimization tasks.
Paper 135: Projection Selection Algorithms for Discrete Tomography
In this paper we study how the choice of projection angles affects the quality of the discrete tomographic reconstruction of an object. We supply four different strategies for selecting projection angle sets and compare them by conducting experiments on a set of software phantoms. We also discuss some consequences of our observations. Furthermore, we introduce a possible application of the proposed angle selection algorithms.
Paper 136: Real-Time Retrieval of Near-duplicate Fragments in Images and Video-clips
Detection and localization of unspecified similar fragments in random images is one of the most challenging problems in CBVIR (classic techniques focusing on full-image or sub-image retrieval usually fail in such a problem). We propose a new method for near-duplicate image fragment matching using a topology-based framework. The method works on visual data only, i.e. no semantics or a priori knowledge is assumed. Near-duplicity of image fragments is modeled by topological constraints on sets of matched keypoints (instead of the geometric constraints typically used in image matching). The paper reports a time-efficient (i.e. capable of working in real time with a video input) implementation of the proposed method. The application can be run using a mid-range personal computer and a medium-quality video camera.
Paper 137: Object Tracking over Multiple Uncalibrated Cameras using Visual, Spatial and Temporal Similarities
Developing a practical multi-camera tracking solution for autonomous camera networks is a very challenging task, due to numerous constraints such as limited memory and processing power, heterogeneous visual characteristics of objects between camera views, and limited setup time and installation knowledge for camera calibration. In this paper, we propose a unified multi-camera tracking framework, which can run online in real-time and can handle both independent field of view and common field of view cases. No camera calibration, knowledge of the relative positions of cameras, or entry and exit locations of objects is required. The memory footprint of the framework is minimised by the introduction of reusing kernels. The heterogeneous visual characteristics of objects are addressed by a novel location-based kernel matching method. The proposed framework has been evaluated using real videos captured in multiple indoor settings. The framework achieves efficient memory usage without compromising tracking accuracy.
Paper 138: A Virtual Curtain for the Detection of Humans and Access Control
Biometrics has become a popular field for the development of techniques that aim at recognizing humans based upon one or more intrinsic physical or behavioral traits. In particular, many solutions dedicated to access control integrate biometric features like fingerprinting or face recognition.
This paper describes a new method designed to interpret what happens when crossing an invisible vertical plane, called virtual curtain hereafter, at the footstep of a door frame. It relies on the use of two laser scanners located in the upper corners of the frame, and on the classification of the time series of the information provided by the scanners after registration. The technique is trained and tested on a set of sequences representative of multiple scenarios of normal crossings by a single person and of attempts to fool the system.
We present the details of the technique and discuss classification results. It appears that the technique is capable of recognizing many scenarios, which may lead to the development of new commercial applications.
Paper 142: Mapping GOPs in an Improved DVC to H.264 Video Transcoder
In mobile-to-mobile video communications, both transmitter and receiver should keep low complexity constraints during video compression and decompression processes. Traditional video codecs have highly complex encoders and less complex decoders, whereas the Wyner-Ziv video coding paradigm inverts the complexity by using more complex decoders and less complex encoders. For this reason, transcoding from Wyner-Ziv to H.264 provides a suitable framework where both devices have low complexity constraints. This paper proposes a flexible Wyner-Ziv to H.264 transcoder which allows us to map from a Wyner-Ziv GOP pattern to a traditional I11P H.264 GOP. Furthermore, the transcoding process is improved by reusing the motion vectors that have been calculated during the Wyner-Ziv decoding process to reduce the H.264 motion estimation complexity. Accordingly, a time reduction of up to 72% is obtained without significant rate-distortion loss.
Paper 144: Segmentation of Inter-Neurons in Three Dimensional Brain Imagery
Segmentation of neural cells in three dimensional fluorescence microscopy images is a challenging image processing problem. In addition to being important to neurobiologists, accurate segmentation is a vital component of an automated image processing system. Due to the complexity of the data, particularly the extreme irregularity in neural cell shape, generic segmentation techniques do not perform well. This paper presents a novel segmentation technique for segmenting neural cells in three dimensional images. Accuracy rates of over 90% are reported on a data set of 100 images containing over 130 neural cells and subsequently validated using a novel data set of 64 neurons.
Paper 146: An Efficient Mode Decision Algorithm for Combined Scalable Video Coding
Scalable video coding (SVC) is an extension of H.264/AVC that is used to provide a video standard for scalability. Scalability refers to the capability of recovering physically meaningful image or video information by decoding only partial compressed bitstreams. Scalable coding is typically accomplished by providing multiple layers of a video, in terms of quality resolution, spatial resolution, temporal resolution, or combinations of these options. To increase the coding efficiency, SVC adopts inter-layer prediction, which uses the information of the base layer to encode the enhancement layers. Due to the inter-layer prediction, the computational complexity of SVC is much higher than that of H.264/AVC, for example in mode decision based on rate-distortion optimization (RDO) and hierarchical bi-directional motion estimation. In this paper, we propose a fast mode decision algorithm for combined scalability to reduce the complexity. Experimental results show that the proposed algorithm achieves up to a 48% decrease in the encoding time with a negligible loss of visual quality and increment of bit rates.
Paper 148: A New Approach of GPU Accelerated Visual Tracking
In this paper a fast and robust visual tracking approach based on GPU acceleration is proposed. It is an effective combination of two GPU-accelerated algorithms. One is a GPU-accelerated visual tracking algorithm based on the Efficient Second-order Minimization (GPU-ESM) algorithm. The other is a GPU-based Scale Invariant Feature Transform (SIFT) algorithm, which is used in cases that are extreme for the GPU-ESM tracking algorithm, i.e. large image differences, occlusions etc. System performance has been greatly improved by our combination approach. We have extended the tracking region from a planar region to a 3D region. Translation details of both GPU algorithms and their combination strategy are described. System performance is evaluated with experimental data. Optimization techniques are presented as a reference for GPU application developers.
Paper 151: An Effective Rigidity Constraint for Improving RANSAC in Homography Estimation
A homography is a projective transformation which can relate two images of the same planar surface taken from two different points of view. It can hence be used to register images of scenes that can be assimilated to planes. For this purpose a homography is usually estimated by solving a system of equations involving several couples of points detected at different coordinates in two different images, but located at the same position in the real world. A usual and efficient way to obtain a set of good point correspondences is to start from a putative set obtained by some means and to sort out the good correspondences (inliers) from the wrong ones (outliers) using the so-called RANSAC algorithm. This algorithm relies on a statistical approach which requires iteratively estimating many homographies from randomly chosen sets of four correspondences. Unfortunately, homographies obtained in this way do not necessarily reflect a rigid transformation. Depending on the number of outliers, evaluating such degenerate cases in RANSAC drastically slows down the process and can even lead to wrong solutions. In this paper we present the expression of a lightweight rigidity constraint and show that it speeds up the RANSAC process and prevents degenerate homographies.
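To give a feel for why the number of outliers matters so much, the following sketch computes the usual textbook estimate of how many random samples RANSAC needs so that, with a chosen confidence, at least one sample of four correspondences is outlier-free. It does not implement the rigidity constraint of the paper, whose expression is not given in the abstract; the function name and the default confidence of 0.99 are assumptions.

```python
from math import ceil, log

def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Textbook estimate of the number of random samples needed so that,
    with probability `confidence`, at least one sample is outlier-free."""
    if not 0 < inlier_ratio <= 1:
        raise ValueError("inlier_ratio must be in (0, 1]")
    p_clean = inlier_ratio ** sample_size  # chance that one sample is all inliers
    if p_clean == 1:
        return 1
    return ceil(log(1 - confidence) / log(1 - p_clean))
```

With half of the putative matches being outliers, this estimate already asks for 72 homography evaluations, so cheaply rejecting samples that cannot correspond to a valid transformation before evaluating them is an attractive saving.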
Paper 153: Content-Based Retrieval of Aurora Images Based on the Hierarchical Representation
The boundary-based image segmentation and representation system takes a thinned edge image and produces a unique hierarchical representation using a graph/tree data structure. The feature extraction algorithms have been developed to obtain geometric features by directly processing the graph/tree hierarchical data structure for diverse image processing and interpretation applications. This paper describes a content-based image retrieval system for the retrieval of aurora images utilizing the graph/tree hierarchical representation and the associated geometric feature extraction algorithms, which extract features for the purpose of classification. Experimental results are presented which show that the hierarchical representation supports the fast and reliable computation of several geometric features that are useful for content-based image retrieval.
Paper 154: 3D Surface Reconstruction Using Structured Circular Light Patterns
Reconstructing a 3D surface from a 2D projected image has been a widely studied issue as well as one of the most important problems in image processing. In this paper, we propose a novel approach to reconstructing 3D coordinates of a surface from a 2D image taken by a camera using projected circular light patterns. Known information (i.e. intrinsic and extrinsic parameters of the camera, the structure of the circular patterns, a fixed optical center of the camera and the location of the reference plane of the surface) provides a mathematical model for surface reconstruction. The reconstruction is based on a geometrical relationship between a given pattern projected onto a 3D surface and a pattern captured in a 2D image plane from a viewpoint. This paper chiefly deals with a mathematical proof of concept for the reconstruction problem.
Paper 156: Computing Saliency Map from Spatial Information in Point Cloud Data
Saliency detection in 2D and 3D images has been extensively used in many computer vision applications such as obstacle detection, object recognition and segmentation. In this paper we present a new saliency detection method which exploits the spatial irregularities in an environment. A Time-of-Flight (TOF) camera is used to obtain 3D points that represent the available spatial information in an environment. Two separate saliency maps are calculated by employing local surface properties (LSP) in different scales and the distance between the points and the camera. Initially, residuals representing the irregularities are obtained by fitting planar patches to the 3D points in different scales. Then, residuals within the spatial scales are combined, and a saliency map is generated in which the points with high residual values represent non-trivial regions of the surfaces. Another saliency map is generated by using the proximity of each point in the point cloud data. Finally, the two saliency maps are integrated by averaging and a master saliency map is generated.
Paper 158: A Practical Approach For Calibration of Omnidirectional Stereo Cameras
This paper presents a calibration method of an omnidirectional stereo camera (ODSC) for the purpose of long distance measurement. Existing calibration methods can be used for calibration of an ODSC, but they may be applicable either to calibration of the ODSC with a small baseline or to individual calibration of its two cameras independently. In practice, it is difficult to calibrate the ODSC with a large baseline. A calibration test pattern, which is simultaneously captured by the two cameras of an ODSC system, appears very small in at least one of the cameras. Nevertheless, the baseline should be large enough for long distance measurement to ensure consistency of the estimated distance. In this paper, therefore, we propose a calibration method of the ODSC with a large baseline and verify its feasibility by presenting the experimental results of its distance estimation.
Paper 168: Fusing Large Volumes of Range and Image Data for Accurate Description of Realistic 3D Scenes
Hand-held time-of-flight laser scene scanners provide very large volumes of 3D range (coordinate) and optical (colour) measurements for modelling visible surfaces of real 3D scenes. To obtain an accurate model, the measurement errors resulting e.g. in gaps, uncertain edges and small details have to be detected and corrected. This paper discusses possibilities of using multiple calibrated scene images collected simultaneously with the range data for getting a more complete and accurate scene model. Experiments show that the proposed approach eliminates a number of range errors, although it may still fail on intricate disjoint surfaces encountered in practice.
Paper 173: Fast Mean Shift Algorithm Based on Discretisation and Interpolation
A fast mean shift algorithm for processing image data is presented. Although it is based on the known basic principles of the original mean shift method, it improves the computational speed substantially. It is assumed that the spatial image coordinates and range coordinates can be discretised by introducing a regular grid. First, the algorithm precomputes the values of the shifts at the grid points. The mean shift iterations are then carried out by making use of the grid values and trilinear interpolation. In the paper, it is shown that this can be done effectively. Measured by the order of complexity, the values at all grid points can be precomputed in the time that the original method requires for computing only one mean shift iteration for all image points. The interpolation step is computationally inexpensive. Experimental results confirming the theoretical expectations are presented. The method requires the step kernel for computing the shifts (corresponding to the Epanechnikov kernel for estimating the densities) and images with only a single value at each pixel.
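The precompute-then-interpolate idea can be sketched in one dimension. This is a simplified illustration, not the paper's implementation: it uses a 1-D grid with linear interpolation in place of the joint spatial-range grid with trilinear interpolation, and the function names, grid layout and stopping rule are assumptions.

```python
def precompute_shifts(data, lo, step, count, bandwidth):
    """Mean shift vector (step kernel) at each of `count` grid points."""
    shifts = []
    for i in range(count):
        g = lo + i * step
        near = [x for x in data if abs(x - g) <= bandwidth]
        # With no data inside the window the grid point does not move.
        shifts.append(sum(near) / len(near) - g if near else 0.0)
    return shifts


def interpolate(shifts, lo, step, x):
    """Linearly interpolate the precomputed shift at position x."""
    pos = (x - lo) / step
    pos = min(max(pos, 0.0), len(shifts) - 1.0)  # clamp to the grid
    i = min(int(pos), len(shifts) - 2)
    t = pos - i
    return (1.0 - t) * shifts[i] + t * shifts[i + 1]


def mean_shift(x, shifts, lo, step, tol=1e-9, max_iter=100):
    """Iterate x <- x + shift(x) using only the precomputed grid values."""
    for _ in range(max_iter):
        s = interpolate(shifts, lo, step, x)
        if abs(s) < tol:
            break
        x += s
    return x
```

The data are touched only while the grid is built; each iteration afterwards costs one table lookup and one interpolation, independent of the number of data points.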
Paper 176: A Novel Rate Control Method for H.264/AVC Based on frame complexity and importance
In this paper, we present a new rate-control algorithm based on frame complexity and importance (CI) for H.264/AVC video coding. The proposed method aims at selecting accurate quantization parameters for inter-coded frames according to the target bit rates, by accurately predicting frame CI using the statistics of previously encoded frames. The bit budget allocated to each frame is adaptively updated according to its CI, combined with the buffer status. We compare the proposed method with JVT-G012, used by H.264/AVC, with the software JM10.1. The proposed method improves the PSNR performance of video coding by 0.142 to 0.953 dB and the BDPSNR performance by 0.248 to 0.541 dB. It also provides more consistent visual quality and alleviates sharp quality drops in frames caused by high motion or scene changes, with the PSNR standard deviation decreasing by 0.134 to 1.514 dB.
Paper 177: An Edge-Sensing Universal Demosaicing Algorithm
In this paper, we introduce an edge detection algorithm for mosaiced images which can be used to enhance generic demosaicing algorithms. The algorithm is based on pixel color differences in the horizontal, vertical and diagonal directions. By using our edge-detection technique to enhance the universal demosaicing algorithm of Lukac et al., experimental results show that the presence of color shifts and artefacts in demosaiced images is reduced. This is confirmed by both subjective and objective evaluation.
Paper 178: Estimating 3D Polyhedral Building Models by Registering Aerial Images
We describe a model-driven approach for extracting simple 3D polyhedral building models from aerial images. The novelty of the approach lies in the use of featureless and direct optimization based on raw image brightness. The 3D polyhedral model is estimated by optimizing a criterion that combines a global dissimilarity measure and a gradient score over several aerial images. The proposed approach gives more accurate 3D reconstruction than feature-based approaches since it does not involve intermediate noisy data (e.g., the 3D points of a noisy Digital Elevation Map). We provide experiments and evaluations of performance. Experimental results show the feasibility and robustness of the proposed approach.
Paper 179: Scalable H.264 Wireless Video Transmission over MIMO-OFDM Channels
A cross-layer optimization scheme is proposed for scalable video transmission over wireless Multiple Input Multiple Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM) systems. The scalable video coding (SVC) extension of H.264/AVC is used for video source coding. The proposed cross-layer optimization scheme jointly optimizes application layer parameters and physical layer parameters. The objective is to minimize the expected video distortion at the receiver. Two methods have been developed for the estimation of video distortion at the receiver, which is essential for the cross-layer optimization. In addition, two different priority mappings of the SVC scalable layers are considered. Experimental results are provided and conclusions are drawn.
Paper 181: Toward the Detection of Urban Infrastructure Edge Shadows
In this paper, we propose a novel technique to detect the shadows cast by urban infrastructure, such as buildings, billboards, and traffic signs, using a sequence of images taken from a fixed camera. In our approach, we compute two different background models in parallel: one for the edges and one for the reflected light intensity. An algorithm is proposed to train the system to distinguish between moving edges in general and edges that belong to static objects, creating an edge background model. Then, during operation, a background intensity model allows us to distinguish between moving and static objects. Those edges included in the moving objects and those that belong to the edge background model are subtracted from the current image edges. The remaining edges are the ones cast by urban infrastructure. Our method is tested on a typical crossroad scene and the results show that the approach is sound and promising.
Paper 187: Neural Image Thresholding Using SIFT: A Comparative Study
The task of image thresholding mainly classifies the image data into two regions, a necessary step in many image analysis and recognition applications. Different images, however, possess different characteristics, making thresholding by one single algorithm very difficult if not impossible. Hence, to optimally binarize a single image, one must usually try more than one threshold in order to obtain maximum segmentation accuracy. This approach can be very complex and time-consuming, especially when a large number of images must be segmented in real time. Generally the challenge arises because any thresholding method may perform well for a certain image class but not for all images. In this paper, a supervised neural network is used to "dynamically" threshold images by learning the suitable threshold for each image type. The thresholds generated by the neural network can be used to binarize the images in two different ways. In the first approach, the scale-invariant feature transform (SIFT) method is used to assign a number of key points to the whole image. In the second approach, SIFT is used to assign a number of key points within a rectangle around the region of interest. The results of each test are compared with the Otsu algorithm, active shape models (ASM), and the level sets technique (LS). The neural network is trained using a set of features extracted from medical images randomly selected from a sample set and then tested using the remaining images. This process is repeated multiple times to verify the generalization ability of the network. The average segmentation accuracy is calculated by comparing every segmented image with the corresponding gold standard image.
Paper 188: Exploiting Neighbors for Faster Scanning Window Detection in Images
Detection of objects through scanning windows is a widely used and accepted method. Detectors traditionally do not make use of the information that is shared between neighboring image positions, which means that the traditional solutions are not optimal. Addressing this, we propose an efficient and computationally inexpensive approach that exploits the shared information and thus increases the speed of detection. The main idea is to predict the responses of the classifier in neighbor windows close to the ones already evaluated and to skip those positions where the prediction is confident enough. In order to predict the responses, the proposed algorithm builds a new classifier which reuses the set of image features already exploited. The results show that the proposed approach can reduce scanning time up to four times with only a minor increase in error rate. The presented examples show that it is possible to reach less than one feature computed on average per single image position. The paper presents the algorithm itself and also the results of experiments on several data sets with different types of image features.
Paper 189: Recognizing Objects in Smart Homes based on Human Interaction
We propose a system to recognize objects with a camera network in a smart home. Recognizing objects in a home environment from images is challenging, due to the variation in the appearance of objects such as chairs, as well as the clutter in the scene. Therefore, we propose to recognize objects through user interactions. A hierarchical activity analysis is first performed in the system to recognize fine-grained activities including eating, typing, cutting etc. The object-activity relationship is encoded in the knowledge base of a Markov logic network (MLN). MLN has the advantage of encoding relationships in an intuitive way with first-order logic syntax. It can also deal with both soft and hard constraints by associating weights with the formulas in the knowledge base. With activity observations, the defined MLN is grounded and turned into a dynamic Bayesian network (DBN) to infer object type probabilities. We expedite inference by decomposing the MLN into smaller separate domains that relate to the active activity. Experimental results are presented with our testbed smart home environment.
Paper 192: Anatomy-based registration of isometrically transformed surfaces using geodesic area functionals
A novel method for registration of isometrically transformed surfaces is introduced. The isometric transformation is locally decomposed into a sequence of low order transformations after manual analysis and partition of the template surface into its elementary parts. The proposed method employs geodesic moments, first, to find matching corresponding key points, and second, to generate matching regions for each of the object's parts. The local transformation is estimated using second order moments of the corresponding regions. The operation of the method is demonstrated on the TOSCA dog object.
Paper 193: Digital Image Tamper Detection Based on Multimodal fusion of Residue Features
In this paper, we propose a novel formulation involving fusion of noise and quantization residue features for detecting tampering or forgery in video sequences. We reiterate the importance of feature selection techniques in conjunction with fusion to enhance the tamper detection accuracy. We examine three different feature selection techniques, independent component analysis (ICA), Fisher linear discriminant analysis (FLD) and canonical correlation analysis (CCA), for achieving a more discriminative subspace for extracting tamper signatures from quantization and noise residue features. The evaluation of the proposed residue features, the feature selection techniques and their subsequent fusion for copy-move tampering emulated on low bandwidth Internet video sequences shows a significant improvement in tamper detection accuracy with the fusion formulation.
Paper 195: A New System for Event Detection from Video Surveillance Sequences
In this paper, we present an overview of a hybrid approach for event detection from video surveillance sequences that has been developed within the REGIMVid project. This system can be used to index and search the video sequence by the visual content. The platform provides moving object segmentation and tracking, high-level feature extraction and video event detection. We describe the architecture of the system and provide an overview of the descriptors supported to date. We then demonstrate the usefulness of the toolbox in the context of feature extraction, event learning and detection in a large collection of video surveillance data.
Paper 196: The Extraction Of Venation From Leaf Images By Evolved Vein Classifiers And Ant Colony Algorithms
Leaf venation is an important source of data for research in comparative plant biology. This paper presents a method for evolving classifiers capable of extracting the venation from leaf images. Quantitative and qualitative analysis of the classifier produced is carried out. The results show that the method is capable of the extraction of near complete primary and secondary venations with relatively little noise. For comparison, a method using ant colony algorithms is also discussed.
Paper 197: Learning to Adapt: a Method For Automatic Tuning of Algorithm Parameters
Most computer vision algorithms have parameters that must be hand-selected using expert knowledge. These parameters hinder the use of large computer vision systems in real-world applications. In this work, a method is presented for automatically and continuously tuning the parameters of algorithms in a real-time modular vision system. In the training phase, a human expert teaches the system how to adapt the algorithm parameters based on training data. During operation, the system measures features from the inputs and outputs of each module and decides how to modify the parameters. Rather than learning good parameter values in absolute terms, incremental changes are modelled based on relationships between algorithm inputs and outputs. These increments are continuously applied online so that parameters stabilise to suitable values. The method is demonstrated on a three-module people-tracking system for video surveillance.
Paper 199: A New Perceptual Edge Detector in Color Images
In this paper we propose a new perceptual edge detector based on anisotropic linear filtering and local maximization. The novelty of this approach resides in the mixing of ideas coming both from perceptual grouping and directional recursive linear filtering. We obtain new edge operators enabling very precise detection of edge points which are involved in large structures. This detector has been tested successfully on various image types presenting difficult problems for classical edge detection methods.
Paper 202: Canny Edge Detection Using Bilateral Filter on Real Hexagonal Structure
Edge detection plays an important role in image processing. This paper presents a Canny edge detection method based on bilateral filtering which achieves better performance than single Gaussian filtering. In this form of filtering, both spatial closeness and intensity similarity of pixels are considered in order to preserve important visual cues provided by edges and reduce the sharpness of transitions in intensity values as well. In addition, the edge detection method proposed in this paper operates on sampled images represented on a real hexagonal structure. Due to the compact and circular nature of the hexagonal lattice, a better quality edge map is obtained on the hexagonal structure than with common edge detection on a square structure. Experimental results using the proposed methods also show faster detection on the hexagonal structure.
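The combination of spatial closeness and intensity similarity can be illustrated with a minimal 1-D bilateral filter. This sketch works on a regularly sampled signal rather than on the hexagonal lattice used in the paper, and the parameter names `radius`, `sigma_s` and `sigma_r` are illustrative assumptions.

```python
import math


def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Smooth a 1-D signal while keeping sharp intensity transitions.

    sigma_s controls the spatial (closeness) weight,
    sigma_r controls the range (intensity similarity) weight.
    """
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((signal[j] - center) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out
```

Samples on the other side of a strong edge receive a near-zero range weight, so they contribute almost nothing to the average and the edge stays sharp, unlike with a purely Gaussian filter.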
Paper 203: Pseudo-Morphological Image Diffusion using the Counter-Harmonic Paradigm
Relationships between linear and morphological scale-spaces have been considered in various previous works. The aim of this paper is to study how to generalize the diffusion-based approaches in order to introduce nonlinear filters whose effects mimic morphological dilation and erosion. A methodology based on the counter-harmonic mean is adopted here. Details of the numerical implementation are discussed and results are provided to illustrate the behaviour of the various studied cases: isotropic, nonlinear and coherence-enhanced diffusion. We also rediscover the classical link between Gaussian scale-space and dilation/erosion scale-spaces based on quadratic structuring functions.
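The behaviour of the counter-harmonic mean can be shown with a small sketch. It uses the standard definition, the weighted sum of values raised to the power P+1 divided by the weighted sum of values raised to the power P, and assumes strictly positive input values. The function name and argument layout are illustrative, and the sketch covers only the mean itself, not the diffusion schemes studied in the paper.

```python
def counter_harmonic_mean(values, weights, order):
    """Weighted counter-harmonic mean of strictly positive values.

    order = 0 gives the weighted arithmetic mean; large positive orders
    approach the maximum (dilation-like), large negative orders approach
    the minimum (erosion-like).
    """
    num = sum(w * v ** (order + 1) for v, w in zip(values, weights))
    den = sum(w * v ** order for v, w in zip(values, weights))
    return num / den
```

For the values 1, 2 and 4 with equal weights, order 0 returns the arithmetic mean 7/3, while orders of 20 and -20 return values very close to 4 and 1 respectively.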
Paper 204: Statistical Rail Surface Classification based on 2D and 2.5D Image Analysis
We present an approach to high-resolution rail surface analysis combining 2D image texture classification and 2.5D analysis of surface disruptions. Detailed analysis of images of rail surfaces is used to observe the condition of rails and as a precaution to avoid rail breaks and further damage. Single rails are observed by a color line scan camera at a high resolution of approximately 0.2 millimeters and under special illumination in order to enable 2.5D image analysis. Gabor filter banks are used for 2D texture description and classes are modeled by Gaussian mixtures. A Bayesian classifier, which also incorporates background knowledge, is used to differentiate between surface texture classes. Classes which can be related to surface disruption are derived from the analysis of the anti-correlation properties between two color channels. Images are illuminated by two light sources mounted at different positions and operating at different wavelengths. Results for data gathered in the Vienna metro system are presented.
Paper 205: Shape And Texture Based Plant Leaf Classification
This article presents a novel method for classification of plants using their leaves. Most plant species have unique leaves which differ from each other by characteristics such as the shape, colour, texture and the margin. The method introduced in this study proposes to use two of these features: the shape and the texture. The shape-based method will extract the contour signature from every leaf and then calculate the dissimilarities between them using the Jeffrey-divergence measure. The orientations of edge gradients will be used to analyse the macro-texture of the leaf. The results of these methods will then be combined using an incremental classification algorithm.
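As an illustration of the dissimilarity measure, the following sketch computes the Jeffrey divergence in the symmetric form commonly used for histogram comparison, where each bin is compared with the mean of the two histograms. Whether the paper uses exactly this form, and how the contour signature is turned into a histogram, are assumptions here.

```python
import math


def jeffrey_divergence(h, k):
    """Symmetric Jeffrey divergence between two histograms of equal length.

    Each bin is compared with the bin-wise mean m = (h + k) / 2.
    Empty bins contribute nothing, following the convention 0 * log(0) = 0.
    """
    total = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total
```

Identical histograms give a divergence of zero, and the measure is symmetric in its two arguments, which makes it convenient for ranking leaves by contour similarity.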
Paper 207: Non-maximum Suppression Using fewer than Two Comparisons per Pixel
Non-Maximum Suppression (NMS) is the task of finding all local maxima in an image. This is often solved using gray-scale image dilation, which requires at least 6 comparisons per pixel in 2-D. We present two solutions that use fewer than 2 comparisons per pixel with little memory overhead. The first algorithm locates 1-D peaks along the image's scan-line and compares each of these peaks against its 2-D neighborhood in a spiral scan order. The second algorithm selects local maximum candidates from the maxima of non-overlapping blocks of one-fourth the neighborhood size. Both algorithms run considerably faster than current best methods in the literature when applied to feature point detection. Matlab code of the proposed algorithms is provided for evaluation purposes.
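The idea behind the first algorithm, testing for a 1-D peak along the scan-line before looking at the full 2-D neighborhood, can be sketched for a 3x3 neighborhood as follows. This is a simplified illustration rather than the authors' Matlab code: it ignores border pixels, checks the remaining neighbors in plain row order instead of a spiral order, and makes no claim about the exact comparison count.

```python
def local_maxima_3x3(image):
    """Return (row, col) of pixels strictly greater than all 8 neighbours.

    image is a list of equally long lists; border pixels are skipped.
    """
    rows, cols = len(image), len(image[0])
    peaks = []
    for r in range(1, rows - 1):
        row = image[r]
        c = 1
        while c < cols - 1:
            if row[c] > row[c - 1] and row[c] > row[c + 1]:
                # 1-D peak found: only now inspect the rows above and below.
                neighbours = image[r - 1][c - 1:c + 2] + image[r + 1][c - 1:c + 2]
                if all(row[c] > v for v in neighbours):
                    peaks.append((r, c))
                # The right neighbour is smaller, so it cannot be a peak.
                c += 2
            else:
                c += 1
    return peaks
```

Most pixels fail the cheap 1-D test and never trigger the more expensive neighborhood check, which is where the savings in comparisons come from.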
Paper 209: Augmented Reality with Human Body Interaction Based on Monocular 3D Pose Estimation
We present an augmented reality interface with markerless human body interaction. It consists of 3D motion capture of the human body and the processing of 3D human poses for augmented reality applications. A monocular camera is used to acquire the images of the user's motion for 3D pose estimation. In the proposed technique, a graphical 3D human model is first constructed. Its projection on a virtual image plane is then used to match the silhouettes obtained from the image sequence. By iteratively adjusting the 3D pose of the graphical 3D model with the physical and anatomic constraints of the human motion, the human pose and the associated 3D motion parameters can be uniquely identified. The obtained 3D pose information is then transferred to the reality processing subsystem and used to achieve the marker-free interaction in the augmented environment. Experimental results are presented using a head mounted display.
Paper 211: Face Recognition using Contourlet Transform and Multidirectional Illumination from a Computer Screen
Images of a face under arbitrary distant point light source illuminations can be used to construct its illumination cone or a linear subspace that represents the set of facial images under all possible illuminations. However, such images are difficult to acquire in everyday life due to limitations of space and light intensity. This paper presents an algorithm for face recognition using multidirectional illumination generated by close and extended light sources, such as the computer screen. The contourlet coefficients of training faces at multiple scales and orientations are calculated and projected separately to PCA subspaces and stacked to form feature vectors. These vectors are projected once again to a linear subspace and used for classification. During testing, similar features are calculated for a query face and matched with the training data to find its identity. Experiments were performed using in-house data comprising 4347 images of 106 subjects and promising results were achieved. The proposed algorithm was also tested on the extended Yale B and CMU-PIE databases for comparison of results with existing techniques.
Paper 212: Surface Reconstruction of Wear in Carpets by Using a Wavelet Edge Detector
Carpet manufacturers have wear labels assigned to their products by human experts who evaluate carpet samples subjected to accelerated wear in a test device. There is considerable industrial and academic interest in going from human to automated evaluation, which should be less cumbersome and more objective. In this paper, we present image analysis research on videos of carpet surfaces scanned with a 3D laser. The purpose is to obtain good depth images for an automated system that should have a high percentage of correct assessments for a wide variety of carpets. The innovation is the use of a wavelet edge detector to obtain a more continuously defined surface shape. The evaluation is based on how well the algorithms allow a good linear ranking and good discrimination between consecutive wear labels. The results show an improved linear ranking for most carpet types; for two carpet types the results are quite significant.
Paper 213: A GPU-Accelerated Real-Time NLMeans Algorithm for Denoising Color Video Sequences
The NLMeans filter, originally proposed by Buades et al., is a very popular filter for the removal of white Gaussian noise, due to its simplicity and excellent performance. The strength of this filter lies in exploiting the repetitive character of structures in images. However, to take full advantage of this repetitiveness, a computationally intensive search for similar candidate blocks is indispensable. In previous work, we presented a number of algorithmic acceleration techniques for the NLMeans filter for still grayscale images. In this paper, we go one step further and incorporate both temporal information and color information into the NLMeans algorithm, in order to restore video sequences. Starting from our algorithmic acceleration techniques, we investigate how the NLMeans algorithm can be easily mapped onto recent parallel computing architectures. In particular, we consider the graphical processing unit (GPU), which is available on most recent computers. Our developments lead to a high-quality denoising filter that can process DVD-resolution video sequences in real-time on a mid-range GPU.
Paper 215: Combining Geometric Edge Detectors for Feature Detection
We propose a novel framework for the analysis and modeling of discrete edge filters, based on the notion of signed rays. This framework will allow us to easily deduce the geometric and localisation properties of a family of first order filters, and use this information to design custom filter banks for specific applications. As an example, a set of angle-selective corner detectors is constructed for the detection of buildings in video sequences. This clearly illustrates the merit of the theory for solving practical recognition problems.
Paper 217: Gender Classification on Real-Life Faces
Gender recognition is one of the fundamental tasks of face image analysis. Most of the existing studies have focused on face images acquired under controlled conditions. However, real-world applications require gender classification on real-life faces, which is much more challenging due to significant appearance variations in unconstrained scenarios. In this paper, we investigate gender recognition on real-life faces using the recently built database, the Labeled Faces in the Wild (LFW). Local Binary Patterns (LBP) are employed to describe faces, and Adaboost is used to select the discriminative LBP features. We obtain a performance of 94.44% by applying a Support Vector Machine (SVM) with the boosted LBP features. The public database used in this study makes future benchmarking and evaluation possible.
Paper 218: A Template Matching And Ellipse Modeling Approach To Detecting Lane Markers
Lane detection is an important element of most driver assistance applications. A new lane detection technique that is able to withstand some of the common issues like illumination changes, surface irregularities, scattered shadows, and presence of neighboring vehicles is presented in this paper. At first, inverse perspective mapping and color space conversion are performed on the input image. Then, the images are cross-correlated with a collection of predefined templates to find candidate lane regions. These regions then undergo connected components analysis, morphological operations, and elliptical projections to approximate the positions of the lane markers. The implementation of the Kalman filter enables tracking lane markers on curved roads while RANSAC helps improve estimates by eliminating outliers. Finally, a new method for calculating errors between the detected lane markers and ground truth is presented. The developed system showed good performance when tested with real-world driving videos containing variations in illumination, road surface, and traffic conditions.
Paper 219: Trabecular Bone Anisotropy Characterization using 1D Local Binary Patterns
This paper presents a new method to characterize the texture of gray level bone radiographic images. The technique is inspired by the Local Binary Pattern descriptor, which has classically been applied to two dimensional (2D) images. Our algorithm is a derived solution for the 1D projected fields of the 2D images. The method requires a series of image preprocessing steps. A clinical study is conducted on two populations of osteoporotic and control patients. The results show that our technique discriminates the two populations better than the classical LBP method. Moreover, they show that the structural organization of bone is more anisotropic for the osteoporotic cases than for the control cases, in accordance with the natural evolution of bone tissue linked to osteoporosis.
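A 1D analogue of the Local Binary Pattern descriptor can be sketched as follows. The abstract does not give the neighborhood size, the comparison rule or the bit ordering, so the symmetric window of `half_width` samples, the `>=` comparison and the left-to-right bit assignment are all assumptions made for illustration.

```python
def lbp_1d(signal, half_width=2):
    """Compute a binary pattern code for each interior sample of a 1-D signal.

    signal is a list of numbers. Each neighbour inside the window sets one
    bit when its value is greater than or equal to the centre sample.
    """
    codes = []
    for i in range(half_width, len(signal) - half_width):
        neighbours = signal[i - half_width:i] + signal[i + 1:i + half_width + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= signal[i]:
                code |= 1 << bit
        codes.append(code)
    return codes
```

A histogram of the resulting codes over a projected profile would then serve as the texture descriptor, in the same way that code histograms are used with the 2D operator.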
Paper 220: Video Quality Analysis for Concert Video Mashup Generation
Videos recorded by the audience in a concert provide natural and lively views from different angles. However, such recordings are generally incomplete and suffer from low signal quality due to poor lighting conditions and use of hand-held cameras. It is our objective to create an enriched video stream, called a mashup, by combining high-quality segments from multiple recordings. In this paper, we describe techniques for quality measurements of video, such as blockiness, blurriness, shakiness and brightness. These measured values are merged into an overall quality metric that is applied to select high-quality segments in generating mashups. We compare our mashups, generated using the quality metric for segment selection, with manually and randomly created mashups. The results of a subjective evaluation show that the perceived qualities of our mashups and the manual mashups are comparable, while they are both significantly higher than the random mashups.
Paper 222: Long-Range Inhibition in Reaction-Diffusion Algorithms Designed for Edge Detection and Stereo Disparity Detection
The present paper demonstrates the significance of long-range inhibition in reaction-diffusion algorithms designed for edge detection and stereo disparity detection. In early visual systems, long-range inhibition plays an important role in brightness perception. The most famous illusory perception due to long-range inhibition is the Mach bands effect, which is observed in the visual systems of animals and also in the human visual system. Long-range inhibition also appears in the computer vision algorithm utilising the difference of two Gaussian filters for edge detection. Upon evidence implying an analogy between brightness perception and stereo depth perception, several psychologists have suggested that such long-range inhibition works not only in brightness perception, but also in depth perception. We previously proposed biologically motivated reaction-diffusion algorithms designed for edge detection and stereo disparity detection. Here, we show through an experimental study that long-range inhibition also plays an important role in both of the reaction-diffusion algorithms. The results of the study suggest a new way of improving the performance of the reaction-diffusion stereo algorithm.
Paper 223: Salient-SIFT for Image Retrieval
Local descriptors have been widely explored and utilized in image retrieval because of their transformation invariance. In this paper, we propose an improved set of features extracted from local descriptors for more effective and efficient image retrieval. We propose a salient region selection method to detect a human's Region Of Interest (hROI) in an image, which incorporates the Canny edge algorithm and the convex hull method into Itti's saliency model for obtaining hROIs. Our approach is a purely bottom-up process with better robustness. The salient region is used as a window to select the most distinctive features out of the Scale-Invariant Feature Transform (SIFT) features. The proposed SIFT local descriptors are termed salient-SIFT features. Experimental results show that the salient-SIFT features can characterize human perception well and achieve better image retrieval performance than the original SIFT descriptors while the computational complexity is greatly reduced.
Paper 225: Recognizing Human Actions by using Spatio-Temporal Motion Descriptors
This paper presents a novel tool for detecting human actions in stationary surveillance camera videos. In the proposed method there is no need to detect and track the human body or to detect the spatial or spatio-temporal interest points of the events. Instead our method computes single-scale spatio-temporal descriptors to characterize the action patterns. Two different descriptors are evaluated: histograms of optical flow directions and histograms of frame difference gradients. The integral video method is also presented to improve the performance of the extraction of these features. We evaluated our methods on two datasets: a public dataset containing actions of persons drinking and a new dataset containing stand up events. According to our experiments both detectors are suitable for indoor applications and provide a robust tool for practical problems such as moving background, or partial occlusion.
Paper 227: Design of a Real-Time Embedded Stereo Smart Camera
This paper describes the architecture of a new smart vision system called BiSeeMos. This smart camera is designed for stereo vision purposes and the implementation of a simple dense stereo vision algorithm. The architecture has been designed for dedicated parallel algorithms using a high-performance FPGA. This chip provides the user with useful features for vision processing such as integrated RAM blocks, embedded multipliers, phase locked loops and plenty of logic elements. In this paper, we describe our architecture and compare it with other works. A dense stereo vision algorithm has been implemented on the platform using the Census method.
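The Census method mentioned above can be illustrated in software with a minimal sketch. It assumes a 3x3 window and a matching cost given by the Hamming distance between census codes; the window size used on the BiSeeMos platform is not stated in the abstract, and this Python version says nothing about the FPGA implementation.

```python
def census_3x3(image, r, c):
    """8-bit census code of the pixel at (r, c).

    Each of the 8 neighbours contributes one bit, set when the neighbour
    is darker than the centre pixel. (r, c) must not lie on the border.
    """
    center = image[r][c]
    code = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            code = (code << 1) | (1 if image[r + dr][c + dc] < center else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")
```

In a dense stereo matcher, the disparity at a pixel would be chosen as the horizontal offset whose census code in the other image has the smallest Hamming distance. Because the code depends only on intensity ordering, it tolerates gain and offset differences between the two cameras.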
Paper 229: Optimal Trajectory Space Finding for Nonrigid Structure From Motion
The deformation in nonrigid structure from motion can be modeled either in the shape domain or in the time domain. Here, we view the deformation in the time domain, model the trajectory of each 3D point as a linear combination of trajectory bases, and present a novel method to automatically find the trajectory bases under an orthographic camera assumption. In this paper, a linear relation is explicitly derived between the 2D projected trajectories and the 3D trajectory bases. With this formulation, an approximation is formulated for finding the 3D trajectory bases, which casts the search for trajectory bases as an eigenvector problem. Using the approximated trajectory bases as a starting point, an EM-like algorithm is proposed which refines the trajectory bases and the corresponding coefficients. The proposed method demonstrates satisfactory results on both synthetic and real data.
Paper 232: Hit-or-Miss Transform in Multivariate Images
The Hit-or-Miss transform (HMT) is a well-known morphological operator for template matching in binary images. A novel approach to the HMT for multivariate images is introduced in this paper. The generic framework is a generalization of the binary case based on an $h$-supervised ordering formulation which leads to reduced orderings. In particular, in this paper we focus on the application of the HMT to target detection in high-resolution images. The visual results of the experiments show the performance of the proposed approach.
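For reference, the binary case that the paper generalizes can be sketched as follows. This shows only the classical binary HMT, not the multivariate $h$-supervised extension; the function name `hit_or_miss`, the offset-list representation of the structuring elements, and the treatment of out-of-image pixels as background are assumptions of this example.

```python
def hit_or_miss(img, fg, bg):
    """Classical binary hit-or-miss transform.

    img: 2D list of 0/1 values.
    fg:  list of (dy, dx) offsets that must all be foreground (1).
    bg:  list of (dy, dx) offsets that must all be background (0).
    Returns a 2D list with 1 where both templates fit.
    """
    h, w = len(img), len(img[0])

    def val(y, x):
        # Pixels outside the image are treated as background (assumption).
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hit = all(val(y + dy, x + dx) == 1 for dy, dx in fg)
            miss = all(val(y + dy, x + dx) == 0 for dy, dx in bg)
            out[y][x] = 1 if hit and miss else 0
    return out

# Example: detect isolated pixels (foreground center, background 4-neighbors).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
isolated = hit_or_miss(image, fg=[(0, 0)], bg=[(-1, 0), (1, 0), (0, -1), (0, 1)])
```

In the example, only the pixel at row 1, column 1 is marked, since the two pixels on the right touch each other and therefore fail the background template.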
Paper 233: Topological SLAM Using Omnidirectional Images: Merging Feature Detectors and Graph-Matching
Image feature extraction and matching is useful in many areas of robotics such as object and scene recognition, autonomous navigation, SLAM and so on. This paper describes a new approach to the problem of matching features and its application to scene recognition and topological SLAM. For that purpose we propose a prior image segmentation into regions in order to group the extracted features in a graph so that each graph defines a single region of the image. Since the images are omnidirectional, this segmentation treats the left part of the image as the continuation of the right part. The matching process takes into account both the features and the structure (graph) using the GTM algorithm. Then, using this method of comparing images, we propose an algorithm for constructing topological maps. In the experiments we test the robustness of the method and its ability to construct topological maps. We have also introduced a new hysteresis behavior in order to solve some problems found in the construction of the graph.
Paper 234: An Analysis of the Road Signs Classification Based on the Higher-Order Singular Value Decomposition of the Deformable Pattern Tensors
The paper presents a framework for classification of rigid objects in digital images. It consists of a generator of geometrically deformed prototypes and an ensemble of classifiers. The role of the former is to provide a sufficient training set for subsequent classification of deformed objects in real conditions. This is especially important when only a limited number of prototype exemplars is available. Classification is based on the Higher-Order Singular Value Decomposition of tensors composed from the sets of deformed prototypes. Construction of such deformable tensors is flexible and can be done independently for each object. They can be obtained either from a single prototype, which is then affinely deformed, or from many real exemplars, if available. The method was tested in the task of recognizing prohibition road signs. Experiments with real traffic scenes show that the method is characterized by high speed and accuracy for objects seen from different viewpoints. Implementation issues of tensor decompositions are also discussed.
Paper 236: High Definition Feature Map for GVF Snake by Using Harris Function
In image segmentation the gradient vector flow (GVF) snake model is widely used. For concave curvatures the snake model has good convergence capabilities, but poor contrast or saddle corner points may result in a loss of contour. We have introduced a new external force component and an optimal initial border that approaches the final boundary as closely as possible. We apply keypoints defined by corner functions and their corresponding scale to outline the envelope around the object. The GVF field is generated from the eigenvalues of the Harris matrix and/or the scale of the feature point. The GVF field is built on new functions that characterize edginess and cornerness in a single function. We have shown that the function $\max(0, \log[\max(\lambda_1, \lambda_2)])$ fulfills the requirements for any active contour definition in the case of difficult shapes and background conditions. This new GVF field has several advantages: smooth transitions are robustly taken into account, while sharp corners and contour scragginess can be perfectly detected.
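The feature function quoted above can be evaluated per pixel from the entries of a symmetric 2x2 Harris (structure) matrix. The sketch below assumes the matrix is given as [[a, b], [b, c]] and uses the closed-form eigenvalues of such a matrix; the function names and the choice to return 0 when the largest eigenvalue is not positive (where the logarithm is undefined) are assumptions of this example, and the smoothing used to build the matrix is not covered.

```python
import math

def harris_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + radius, mean - radius

def feature_value(a, b, c):
    """Evaluate max(0, log(max(lambda1, lambda2))) for one pixel."""
    lam1, lam2 = harris_eigenvalues(a, b, c)
    largest = max(lam1, lam2)
    if largest <= 0.0:
        return 0.0  # log undefined here; treated as zero (assumption)
    return max(0.0, math.log(largest))

# Example: a strong edge-like response versus a weak one.
strong = feature_value(math.e ** 2, 0.0, 1.0)  # largest eigenvalue is e^2
weak = feature_value(0.5, 0.0, 0.25)           # log(0.5) < 0, clipped to 0
```

Taking the larger eigenvalue means the function responds to both edges (one large eigenvalue) and corners (two large eigenvalues), which matches the abstract's description of combining edginess and cornerness.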
Paper 239: Automated Segmentation of Endoscopic Images Based on Local Shape-Adaptive Filtering and Color Descriptors
This paper presents a novel technique for automatic segmentation of wireless capsule endoscopic images. The main contribution resides in the integration of three computational blocks: 1) a local polynomial approximation algorithm which finds a locally adapted neighborhood of each pixel; 2) color texture analysis which describes each pixel by a vector of numerical attributes that reflect the characteristics of the pixel's local neighborhood; and 3) cluster analysis (k-means) for grouping pixels into homogeneous regions based on their color information. The proposed approach leads to a robust segmentation procedure which produces fine segments well matched to the image contents.
Paper 240: Speeding Up Structure From Motion on Large Scenes using Parallelizable Partitions
Structure from motion based 3D reconstruction takes a lot of time for large scenes which consist of thousands of input images. We propose a method that speeds up the reconstruction of large scenes by partitioning them into smaller subscenes, and then recombining those. The main benefit here is that each subscene can be optimized in parallel. We present a widely usable subdivision method, and show that the difference between the result after partitioning and recombination, and the state of the art structure from motion reconstruction on the entire scene, is negligible.
Paper 241: An Appearance-Based Prior for Hand Tracking
Reliable hand detection and tracking in passive 2D video still remains a challenge. Yet the consumer market for gesture-based interaction is expanding rapidly and surveillance systems that can deduce fine-grained human activities involving hand and arm postures are in high demand. In this paper, we present a hand tracking method that does not require reliable detection. We built it on top of "Flocks of Features" which combines grey-level optical flow, a "flocking" constraint, and a learned foreground color distribution. By adding probabilistic (instead of binary classified) detections based on grey-level appearance as an additional image cue, we show improved tracking performance despite rapid hand movements and posture changes. This helps overcome tracking difficulties in texture-rich and skin-colored environments, improving performance on a 10-minute collection of video clips from 75% to 86%.
Paper 242: Subjective Evaluation Of Image Quality Measures For White Noise Distorted Images
Image Quality Assessment has diverse applications. A number of image quality measures have been proposed, but none has been proved to be a true representative of human perception of image quality. We have subjectively investigated spectral distance based and human visual system based image quality measures for their effectiveness in representing human perception for images corrupted with white noise. Each of the 160 images with various degrees of white noise is subjectively evaluated by 50 human subjects, resulting in 8000 human judgments. On the basis of these evaluations, image-independent human perception values are calculated. The perception values are plotted against the spectral distance based and human visual system based image quality measures. The performance of the quality measures is determined by graphical observation and polynomial curve fitting, with the Human Visual System Absolute norm performing best.
Paper 244: Constraint Optimisation for Robust Image Matching with Inhomogeneous Photometric Variations and Affine Noise
While modelling spatially uniform or low-order polynomial contrast and offset changes is mostly a solved problem, there has been limited progress in models which could represent highly inhomogeneous photometric variations. A recent quadratic programming (QP) based matching allows for almost arbitrary photometric deviations. However, this QP-based approach is deficient in one substantial respect: it can only assume that images are aligned geometrically, as it knows nothing about geometry in general. This paper improves on the QP-based framework by extending it to include a robust rigid registration layer, thus increasing both its generality and practical utility. The proposed method shows up to a fourfold improvement in the quadratic matching score over a current state-of-the-art benchmark.
Paper 245: Evaluation of Human Detection Algorithms in Image Sequences
This paper deals with the general evaluation of human detection algorithms. We first present the algorithms implemented within the CAPTHOM project, dedicated to the development of a vision-based system for human detection and tracking in an indoor environment using a static camera. We then show how a global evaluation metric we developed for the evaluation of understanding algorithms, which takes into account both the localization and the recognition precision of each single interpretation result, can be a useful tool to guide industrial practitioners in the elaboration of suitable and optimized algorithms.
Paper 246: Noise-Robust Method for Image Segmentation
Segmentation of noisy images is one of the most challenging problems in image analysis, and any improvement of segmentation methods can strongly influence the performance of many image processing applications. In automated image segmentation, fuzzy c-means (FCM) clustering has been widely used because of its ability to model uncertainty within the data, its applicability to multi-modal data and its fairly robust behaviour. However, the standard FCM algorithm does not consider any information about the spatial image context and is highly sensitive to noise and other imaging artefacts. Considering the above-mentioned problems, we developed a new FCM-based approach for noise-robust fuzzy clustering, which we present in this paper. In this new iterative algorithm we incorporated both spatial and feature space information into the similarity measure and the membership function. We considered that spatial information depends on the relative location and features of the neighbouring pixels. The performance of the proposed algorithm is tested on a synthetic image with different noise levels and on real images. Experimental quantitative and qualitative segmentation results show that our method efficiently preserves the homogeneity of the regions and is more robust to noise than other FCM-based methods.
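For context, the standard FCM baseline that the paper improves upon alternates a membership update and a centre update. The sketch below implements only that textbook baseline on 1D intensities, without any spatial term; it is not the proposed noise-robust variant. The function name `fcm_1d`, the fixed iteration count, the fuzzifier default m = 2, and the small `eps` guarding against zero distances are assumptions of this example.

```python
def fcm_1d(data, centers, m=2.0, iters=50, eps=1e-12):
    """Standard fuzzy c-means on scalar data.

    data:    list of floats (e.g. pixel intensities).
    centers: initial cluster centres.
    Returns (centers, u), where u[k][i] is the membership of
    sample k in cluster i and each row of u sums to 1.
    """
    centers = list(centers)
    u = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)).
        u = []
        for x in data:
            dists = [abs(x - c) + eps for c in centers]
            row = [
                1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for di in dists
            ]
            u.append(row)
        # Centre update: mean of the data weighted by u^m.
        for i in range(len(centers)):
            num = sum((u[k][i] ** m) * data[k] for k in range(len(data)))
            den = sum(u[k][i] ** m for k in range(len(data)))
            centers[i] = num / den
    return centers, u

# Example: two well-separated intensity groups.
samples = [0.0, 0.1, 0.2, 9.8, 9.9, 10.0]
final_centers, memberships = fcm_1d(samples, centers=[1.0, 8.0])
```

Because each membership here depends only on a pixel's own intensity, a single noisy pixel is assigned independently of its neighbours, which is the sensitivity to noise that the abstract sets out to address.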
Paper 247: SUNAR: Surveillance Network Augmented by Retrieval
The paper deals with the Surveillance Network Augmented by Retrieval (SUNAR) system - an information retrieval based wide area (video) surveillance system being developed as free software at FIT BUT. It contains both standard and experimental techniques, which were evaluated by NIST at the AVSS 2009 Multi-Camera Tracking Challenge, where SUNAR performed comparably well.
In brief, SUNAR is composed of three basic modules - video processing, retrieval and the monitoring interface. Computer Vision Modules are based on the OpenCV Library for object tracking extended by feature extraction and network communication capability similar to MPEG-7. Information about objects and the area under surveillance is cleaned, integrated, indexed and stored in Video Retrieval Modules. They are based on the PostgreSQL database extended to be capable of similarity and spatio-temporal information retrieval, which is necessary for both non-overlapping surveillance camera system as well as information analysis and mining in a global context.
Paper 248: Fast Depth Saliency From Stereo For Region-Based Artificial Visual Attention
Depth is an important feature channel for natural vision organisms that helps in focusing attention on important locations of the viewed scene. Artificial visual attention systems require a fast estimation of depth to construct a saliency map based upon distance from the vision system. Recent studies on depth perception in biological vision indicate that disparity is computed using object detection in the brain. The proposed method exploits these studies and determines the shift that objects go through in the stereo frames using data regarding their borders. This enables efficient creation of a depth saliency map for artificial visual attention. Results of the proposed model have shown success in selecting those locations from stereo scenes that are salient for human perception in terms of depth.
Paper 249: Combined Retrieval Strategies for Images With and Without Distinct Objects
This paper presents the design of an all-season image retrieval system. The system handles images with and without distinct object(s) using different retrieval strategies. Firstly, based on the visual contrasts and spatial information of an image, a neural network is trained with the Back Propagation Through Structure (BPTS) algorithm to pre-classify the image into the distinct-object or no-distinct-object category. In the second step, an image with distinct object(s) is processed by an attention-driven retrieval strategy emphasizing distinct objects. On the other hand, an image without distinct object(s) (e.g., a scenery image) is processed by a fusing-all retrieval strategy. Improved performance can be obtained by using this combined approach.
Paper 250: New Saliency Point Detection and Evaluation Methods for Finding Structural Differences in Remote Sensing Images of Long Time-Span Samples
The paper introduces a novel methodology to find changes in remote sensing image series. Some remotely sensed areas are scanned frequently to spot relevant changes, and several repositories contain multi-temporal image samples for the same area. The proposed method finds changes between images acquired a long time apart, under very different lighting and surface conditions. The presented method is basically an exploitation of the Harris saliency function and its derivatives for finding feature points among image samples. To fit together the definition of keypoints and the active contours around them, we have introduced Harris corner detection as an outline detector instead of the simple edge functions.
We also demonstrate a new local descriptor obtained by generating local active contours. Saliency points support the boundary hull definition of objects, constructed by graph-based connectivity detection and neighborhood description. This graph-based shape descriptor works on the saliency points of the difference and in-layer features. We demonstrate the method by finding structural changes in remote sensing images.