Advanced Concepts for Intelligent Vision Systems
September 4-7, 2012
Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
Acivs 2012 Abstracts
Paper 184: Machine Vision Solutions for the Forest Industry
This invited talk summarizes computer-vision-based research and applications for the forest industry developed by the Machine Vision and Pattern Recognition Laboratory (MVPR) at Lappeenranta University of Technology (LUT) in Finland. The main focus is on the paper and board making industry. Our approach is application-oriented, based on practical industrial needs. Typically, industrial manufacturing consists of several process steps. At each step it is important to recognize the important phenomena affecting production, to measure these phenomena, and finally, to analyze these measurements for further control of the production. This presentation shows how machine vision can be used for vision-based quality management in the papermaking industry. The objective of the research is the overall management of the whole papermaking process and the quality assessment of the paper-based end product before and after printing. The general goal is production that is resource-efficient and environmentally sound, using less raw material, water, and energy.
Paper is a challenging medium to use, because its characteristics strongly affect printing quality. Thus, it is important to predict the quality of printing on paper or board, especially in the case of images. Printed materials must look good enough to a consumer: for example, an advertisement must attract positive attention, and a high-quality journal must be comfortable to read. A paper manufacturer should therefore know what level of quality it offers to a printing house. The quality should be neither too high nor too low, but just sufficient for a known purpose, i.e., the so-called wanted quality. Solving this problem leads to the need for quality assessment both before and after printing. In both cases, visual quality assessment is usually done manually or semi-automatically, either by observing manufacturing processes or by inspecting test prints. In this presentation, machine vision solutions are considered in which the quality prediction is performed by automatic image processing and analysis systems, without human interaction, in the different process steps of pulping, papermaking, and printing. The results obtained from industrial research projects consist of on-line control solutions in industrial manufacturing, off-line laboratory-level tests, and frameworks for modeling connections between human perception and physical measurements, based on the overall visual quality index of an image or on regions of interest in an image.
Paper 185: Vision Realistic Rendering
Vision-realistic rendering (VRR) is the computer generation of synthetic images to simulate a subject's vision, by incorporating the characteristics of a particular individual's entire optical system. Using measured aberration data from a Shack-Hartmann wavefront aberrometry device, VRR modifies input images to simulate the appearance of the scene for the individual patient. Each input image can be a photograph, a synthetic image created by computer, a frame from a video, or a standard Snellen acuity eye chart -- as long as there is accompanying depth information. An eye chart is very revealing, since it shows what the patient would see during an eye examination, and provides an accurate picture of his or her vision. Using wavefront aberration measurements, we determine a discrete blur function by sampling at a set of focusing distances, specified as a set of depth planes that discretize the three-dimensional space. For each depth plane, we construct an object-space blur filter. The VRR methodology comprises several steps: (1) creation of a set of depth images, (2) computation of blur filters, (3) stratification of the image, (4) blurring of each depth image, and (5) composition of the blurred depth images to form a single vision-simulated image.
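Steps (3)-(5) above can be sketched in one dimension. A simple box blur stands in for the measured object-space blur filters, and the depth-to-radius mapping and all function names are purely illustrative assumptions, not the paper's implementation:

```python
def box_blur(signal, radius):
    """1-D box blur with clamped borders (stand-in for an optical blur filter)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def vision_simulate(image, depth, blur_radius_for_depth):
    """Stratify a 1-D image into depth planes, blur each plane with its own
    filter, and composite the blurred planes into one vision-simulated image."""
    result = [0.0] * len(image)
    for d in sorted(set(depth)):
        # stratification: keep only the pixels lying on this depth plane
        layer = [v if depth[i] == d else 0.0 for i, v in enumerate(image)]
        mask = [1.0 if depth[i] == d else 0.0 for i in range(len(image))]
        blurred = box_blur(layer, blur_radius_for_depth[d])
        weight = box_blur(mask, blur_radius_for_depth[d])
        for i in range(len(image)):
            if depth[i] == d:  # composite: each pixel taken from its own plane
                result[i] = blurred[i] / weight[i] if weight[i] else 0.0
    return result
```

Normalizing by the blurred mask avoids darkening at the plane boundaries, a standard trick when blurring stratified layers.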
VRR provides images and videos of simulated vision to enable a patient's eye doctor to see the specific visual anomalies of the patient. In addition to blur, VRR could reveal to the doctor the multiple images or distortions present in the patient's vision that would not otherwise be apparent from standard visual acuity measurements. VRR could educate medical students as well as patients about the particular visual effects of certain vision disorders (such as keratoconus and monocular diplopia) by enabling them to view images and videos that are generated using the optics of various eye conditions. By measuring PRK/LASIK patients pre- and post-op, VRR could provide doctors with extensive, objective information about a patient's vision before and after surgery. Potential candidates contemplating surgery could see simulations of their predicted vision and of various possible visual anomalies that could arise from the surgery, such as glare at night. The current protocol, where patients sign a consent form that can be difficult for a layperson to understand fully, could be supplemented by the viewing of a computer-generated video of simulated vision showing the possible visual problems that could be engendered by the surgery.
Paper 186: Anomaly detection in machine perception systems
Anomaly detection in engineering systems is cast as a problem of detecting outliers to the distribution of observations representing a state of normality. We focus on anomaly detection in machine perception. We argue that in addition to outlier detection, anomaly detection in machine perception systems requires other detection mechanisms. They include incongruence detection, data quality assessment, decision confidence gauging, and model drift detection. These mechanisms are elaborated and their application illustrated on a problem of anomaly detection in a sports video interpretation system.
Paper 104: Detection of Near-duplicate Patches in Random Images using Keypoint-based Features
Detection of similar fragments in unknown images is typically based on the hypothesize-and-verify paradigm. After the keypoint correspondences are found, configuration constraints are used to identify clusters of similar and similarly transformed keypoints. This method is computationally expensive and hardly applicable to large databases. As an alternative, we propose novel affine-invariant TERM features characterizing the geometry of groups of elliptical keyregions, so that similar patches can be found by feature matching only. The paper overviews TERM features and reports experimental results confirming their high performance in image matching. A method combining visual words based on TERM descriptors with SIFT words is particularly recommended. Because of its low complexity, the proposed method can prospectively be used with large visual databases.
Paper 105: State-Driven Particle Filter for Multi-Person Tracking
Multi-person tracking is an active research topic that can be exploited in applications such as driver assistance, surveillance, multimedia and human-robot interaction. With the help of human detectors, particle filters offer a robust method able to filter noisy detections and provide temporal coherence. However, some traditional problems such as occlusions with other targets or with the scene, temporal drifting, or even the detection of lost targets, still remain unsolved, degrading the performance of tracking systems. Some authors propose to overcome these problems using heuristics that are not well explained or formalized in the papers, for instance by defining exceptions to the model updating depending on track overlapping. In this paper we propose to formalize these events by the use of a state graph, defining the current state of the track (e.g., potential, tracked, occluded or lost) and the transitions between states in an explicit way. This approach has the advantage of linking track actions, such as online model updating, to the track state, which gives flexibility to the system. It provides an explicit representation to adapt the multiple parallel trackers depending on the context, i.e., each track can make use of a specific filtering strategy, dynamic model, number of particles, etc. depending on its state. We implement this technique in a single-camera multi-person tracker and test it on public video sequences.
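The state-graph idea can be sketched as a transition table plus a per-state tracker configuration. The four states follow the abstract; the event names and the parameter values are illustrative assumptions, not taken from the paper:

```python
# transitions: (current state, event) -> next state; unknown events are ignored
TRANSITIONS = {
    ("potential", "confirmed_by_detector"): "tracked",
    ("tracked", "overlap_with_other_track"): "occluded",
    ("tracked", "no_detection"): "lost",
    ("occluded", "overlap_ended"): "tracked",
    ("lost", "redetected"): "tracked",
}

# per-state tracker configuration (illustrative values)
STATE_CONFIG = {
    "potential": {"n_particles": 50, "update_model": False},
    "tracked": {"n_particles": 100, "update_model": True},
    "occluded": {"n_particles": 200, "update_model": False},  # freeze appearance
    "lost": {"n_particles": 300, "update_model": False},      # widen the search
}

class Track:
    """One track whose filtering strategy is selected by its current state."""
    def __init__(self):
        self.state = "potential"

    def on_event(self, event):
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return STATE_CONFIG[self.state]
```

Linking model updating to the state (here via `update_model`) is what prevents an occluded track from corrupting its appearance model.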
Paper 106: Cross-Channel Co-Occurrence Matrices for Robust Characterization of Surface Disruptions in 2.5D Rail Image Analysis
We present a new robust approach to the detection of rail surface disruptions in high-resolution images by means of 2.5D image analysis. The detection results are used to determine the condition of rails as a precaution to avoid breaks and further damage. Images of rails are taken with color line scan cameras at a high resolution of about 0.2 millimeters under specific illumination to enable 2.5D image analysis. Pixel locations fulfilling the anti-correlation property between two color channels are detected and integrated over regions of general background deviations using so-called cross-channel co-occurrence matrices, a novel variant of co-occurrence matrices introduced as part of this work. Consequently, the detection of rail surface disruptions is achieved with high precision, while the unintentional elimination of valid detections in the course of false and irrelevant detection removal is reduced. In this regard, the new approach is more robust than previous methods.
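The anti-correlation property the method builds on can be illustrated minimally: a pixel is flagged when its two channels deviate from their means in opposite directions. The cross-channel co-occurrence matrices that integrate these detections over regions are the paper's contribution and are not reproduced in this sketch:

```python
def anti_correlated_pixels(ch_a, ch_b):
    """Flag pixels whose deviations from the per-channel means have opposite
    signs -- a crude stand-in for the anti-correlation test described above."""
    mean_a = sum(ch_a) / len(ch_a)
    mean_b = sum(ch_b) / len(ch_b)
    return [(a - mean_a) * (b - mean_b) < 0 for a, b in zip(ch_a, ch_b)]
```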
Paper 107: Quality Assurance for Document Image Collections in Digital Preservation
Maintenance of digital image libraries requires frequent assessment of image quality so that preservation measures can be engaged if necessary. We present an approach to image-based quality assurance for digital image collections based on local descriptor matching. We use spatially distinctive local keypoints of contrast-enhanced images and robust symmetric descriptor matching to calculate affine transformations for image registration. The structural similarity of the aligned images is used for quality assessment. The results show that our approach can efficiently assess the quality of digitized documents, including images of blank paper.
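The quality-assessment step can be sketched with a global structural similarity (SSIM) score over two already-registered grayscale images, flattened to lists of 0-255 values. The constants are the standard SSIM defaults for 8-bit data; the paper's actual pipeline (keypoint matching, affine registration) is assumed to have run first:

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global structural similarity of two equally sized grayscale images,
    given as flat lists of intensities in [0, 255]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))
```

A score near 1 indicates the digitized copy structurally matches its reference; production systems compute SSIM over local windows and average, which this global sketch omits.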
Paper 108: DSP Embedded Smart Surveillance Sensor with Robust SWAD-based Tracker
Smart video analytics algorithms can be embedded within surveillance sensors for fast in-camera processing. This paper presents a DSP embedded video analytics system for object and people tracking, using a PTZ camera. The tracking algorithm is based on adaptive template matching and employs a novel Sum of Weighted Absolute Differences (SWAD) measure. The video analytics is implemented on the DSP board DM6437 EVM and automatically controls the PTZ camera to keep the target at the center of the field of view. The EVM is connected to the network and the tracking algorithm can be remotely activated, so that the PTZ camera enhanced with the DSP embedded video analytics becomes a smart surveillance sensor. The system runs in real-time and simulation results demonstrate that the described SWAD outperforms other template matching measures in terms of efficiency and accuracy.
Paper 109: System Identification: 3D Measurement using Structured Light System
The problem of 3D reconstruction from 2D captured images is solved using a set of concentric circular light patterns. Once the number of light sources and cameras, their locations and orientations, and the sampling density (the number of circular patterns) are determined, we propose a novel approach that represents the reconstruction problem as system identification. Akin to system identification using the relationship between input and output, we identify the reconstruction system by choosing/defining input and output signals appropriately, in order to develop an efficient 3D functional camera system. In one algorithm, the input and the output are defined as the projected circular patterns and the 2D captured image (overlaid with the deformed circular patterns), respectively. In another, the 3D target and the captured 2D image are defined as the input and the output, respectively, leading to a problem of input estimation by demodulating an output (received) signal. The former approach identifies the system from the ratio of output to input and is akin to modulation-demodulation theory; the latter identifies the reconstruction system by estimating the input signal. This paper proposes this approach to the identification of the reconstruction system, and also substantiates the algorithms by showing results obtained with an inexpensive and simple experimental setup.
Paper 110: Particle Swarm Optimization with Soft Search Space Partitioning for Video-Based Markerless Pose Tracking
This paper proposes a new algorithm called soft partitioning particle swarm optimization (SPPSO), which performs video-based markerless human pose tracking by optimizing a fitness function in a 31-dimensional search space. The fitness function is based on foreground segmentation and edges. SPPSO divides the optimization into two stages that exploit the hierarchical structure of the model. The first stage only optimizes the most important parameters, whereas the second is a global optimization which also refines the estimates from the first stage. Experiments with the publicly available Lee walk dataset showed that SPPSO performs better than the annealed particle filter at a frame rate of 20 fps, and equally well at 60 fps. The better performance at the lower frame rate is attributed to the explicit exploitation of the hierarchical model structure.
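The two-stage idea, optimizing only the most important parameters first and then refining all of them globally, can be sketched with a minimal particle swarm optimizer on a toy 4-dimensional quadratic fitness function. All parameter values and the fitness itself are illustrative assumptions, not those of SPPSO or its 31-dimensional pose space:

```python
import random

def pso(fitness, free_dims, init, iters=60, swarm=20, seed=0):
    """Minimal PSO: only dimensions in free_dims are varied, the rest stay
    fixed at their values in init (the 'soft partition' of the search space)."""
    rng = random.Random(seed)
    dim = len(init)
    parts = [init[:]] + [
        [init[d] + (rng.uniform(-1, 1) if d in free_dims else 0.0)
         for d in range(dim)] for _ in range(swarm - 1)]
    vels = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in parts]
    pbest_f = [fitness(p) for p in parts]
    g = min(range(swarm), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i, p in enumerate(parts):
            for d in free_dims:
                r1, r2 = rng.random(), rng.random()
                vels[i][d] = (0.7 * vels[i][d]
                              + 1.4 * r1 * (pbest[i][d] - p[d])
                              + 1.4 * r2 * (gbest[d] - p[d]))
                p[d] += vels[i][d]
            f = fitness(p)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = p[:], f
                if f < gbest_f:
                    gbest, gbest_f = p[:], f
    return gbest

# stage 1: coarse search over the "important" dimensions only;
# stage 2: global refinement of all dimensions from the stage-1 estimate
target = [1.0, -2.0, 0.5, 3.0]  # toy ground-truth pose
fit = lambda x: sum((a - b) ** 2 for a, b in zip(x, target))
stage1 = pso(fit, free_dims=[0, 1], init=[0.0] * 4)
stage2 = pso(fit, free_dims=[0, 1, 2, 3], init=stage1)
```

Because the stage-1 result seeds the stage-2 swarm, the second stage can only improve on the first, mirroring how SPPSO's global stage refines the hierarchical estimate.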
Paper 113: Improving Histogram of Oriented Gradients with image segmentation: application to human detection
In this paper we improve the histogram of oriented gradients (HOG), a core descriptor of state-of-the-art object detection, by the use of higher-level information coming from image segmentation. The idea is to re-weight the descriptor while computing it, without increasing its size. The benefits of the proposal are two-fold: (i) to improve the performance of the detector by enriching the descriptor information, and (ii) to take advantage of the information from image segmentation, which is in fact likely to be used in other stages of the detection system, such as candidate generation or refinement.
We test our technique on the INRIA person dataset, which was originally developed to test HOG, embedding it in a human detection system. We evaluate the well-known mean-shift segmentation method (from smaller to larger super-pixels) and different methods to re-weight the original descriptor (constant, or region-dependent on luminance, color, or texture). We achieve a performance improvement of 4.47% in detection rate by using differences of color between contour pixel neighborhoods as the re-weighting function.
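Re-weighting a HOG cell without changing the descriptor size can be sketched as scaling each pixel's gradient-magnitude vote before it enters the orientation histogram. The per-pixel weights stand in for the segmentation-derived weighting functions evaluated in the paper; the data layout is an assumption:

```python
import math

def weighted_hog_cell(magnitudes, orientations, weights, n_bins=9):
    """Orientation histogram of one HOG cell where each pixel's gradient
    magnitude is re-weighted before voting; orientations are unsigned,
    in radians within [0, pi)."""
    hist = [0.0] * n_bins
    for m, o, w in zip(magnitudes, orientations, weights):
        b = int(o / (math.pi / n_bins)) % n_bins  # orientation bin
        hist[b] += w * m                           # re-weighted vote
    return hist
```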
Paper 114: The Mean Boundary Curve of Anatomical Objects
In this paper, we develop an algorithm to compute the median shape of a collection of planar curves, aimed at the computation of the median shape of a collection of organs. We first define the relative distortion of a pair of curves using their curvatures. Then, we derive the mean of the curves as the curve that minimises the total distortion of the collection of shapes.
Paper 115: Rectangular Decomposition of Binary Images
The contribution deals with the most important methods for the decomposition of binary images into a union of rectangles. The overview includes run-length encoding and its generalization, decompositions based on quadtrees and on the distance transform, and a theoretically optimal decomposition based on maximal matching in bipartite graphs.
We experimentally test their performance in binary image compression and in convolution calculation and compare their computation times and success rates.
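The simplest of the surveyed families, run-length-based decomposition, can be sketched by encoding each row into runs of foreground pixels and merging identical runs of consecutive rows into rectangles. This is a generic sketch of the technique, not the paper's implementation:

```python
def rle_rectangles(image):
    """Decompose a binary image (list of 0/1 rows) into rectangles:
    run-length encode each row, then grow a rectangle while the same run
    repeats on consecutive rows. Returns (x0, y0, x1, y1), ends exclusive."""
    open_rects = {}  # (x0, x1) -> y at which this rectangle started
    rects = []
    for y, row in enumerate(image + [[0] * len(image[0])]):  # sentinel row
        runs = set()
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                runs.add((start, x))
            else:
                x += 1
        # close rectangles whose run did not continue on this row
        for span in list(open_rects):
            if span not in runs:
                rects.append((span[0], open_rects.pop(span), span[1], y))
        # open rectangles for runs seen for the first time
        for span in runs:
            open_rects.setdefault(span, y)
    return rects
```

Note that a run which merely shifts or resizes between rows starts a new rectangle, which is why the generalized and graph-based methods surveyed above can produce fewer rectangles.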
Paper 116: Selective Color Image Retrieval Based on the Gaussian Mixture Model
In this paper a novel technique of color-based image retrieval is proposed. The image is represented by Gaussian mixtures of a set of histograms corresponding to the spatial locations of the color regions within the image. The proposed approach enables the user to express needs concerning the specified color arrangement of the retrieved images, in the form of colors belonging to the eleven basic color groups along with their spatial locations. The solution proposed in this paper utilizes mixture modeling of the information in each set of color channels. Experimental results show that, when specific user requirements are considered, the proposed method is efficient and more flexible than other methods dealing with image retrieval based on spatial color arrangements.
Paper 117: Saliency Filtering of SIFT detectors: application to CBIR
The recognition of object categories is one of the most challenging problems in the computer vision field. It is still an open problem, especially in content-based image retrieval (CBIR). When using analysis algorithms, a trade-off must be found between the quality of the expected results and the amount of computer resources allocated to manage the huge amount of generated data. In humans, the mechanisms of evolution have produced the visual attention system, which selects the most important information in order to reduce both cognitive load and scene understanding ambiguity. In computer science, the most powerful algorithms use local approaches such as bag-of-features or sparse local features. In this article, we evaluate the integration of one of the most recent visual attention models into one of the most efficient CBIR methods. First, we present these two algorithms and the database used to test the results. Then, we present our approach, which consists in pruning interest points in order to keep only a certain percentage of them (40% down to 10%). This filtering is guided by a saliency map provided by a visual attention system. Finally, we present our results, which clearly demonstrate that the interest points used in classical CBIR methods can be drastically pruned without seriously impacting results. We also show that the learning and training data sets must be filtered smartly to obtain such results.
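The pruning step can be sketched minimally: each interest point is scored by the saliency map at its location and only the top fraction survives. The data layout (a row-major saliency map, (x, y) keypoints) is an assumption for illustration:

```python
def prune_keypoints(keypoints, saliency_map, keep_fraction=0.25):
    """Keep only the most salient fraction of interest points: each (x, y)
    keypoint is scored by the saliency map and the top keep_fraction survive."""
    scored = sorted(keypoints,
                    key=lambda p: saliency_map[p[1]][p[0]], reverse=True)
    return scored[:max(1, int(len(scored) * keep_fraction))]
```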
Paper 118: Estimation and Prediction of the Vehicle's Motion based on Visual Odometry and Kalman Filter
The movement of the vehicle is useful information for different applications, such as driver assistance systems or autonomous vehicles. This information can be obtained by different methods, for instance by using a GPS or by means of visual odometry. However, there are situations where both methods do not work correctly. For example, there are areas in urban environments where the GPS signal is not available, such as tunnels or streets with high buildings. On the other hand, computer vision algorithms are affected by outdoor environments, the main source of difficulty being variations in lighting conditions. A method to estimate and predict the movement of the vehicle based on visual odometry and a Kalman filter is explained in this paper. The Kalman filter allows both filtering and prediction of vehicle motion, using the results of the visual odometry estimation.
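The filtering-and-prediction step can be sketched with a 1-D constant-velocity Kalman filter fed with position measurements, standing in for integrated visual-odometry estimates. The noise parameters are illustrative assumptions, not the paper's tuning:

```python
class KalmanCV:
    """1-D constant-velocity Kalman filter: state = (position, velocity);
    the measurement is a noisy position (H = [1, 0])."""
    def __init__(self, q=1e-3, r=0.25):
        self.x = [0.0, 0.0]                # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise

    def predict(self, dt=1.0):
        x, v = self.x
        self.x = [x + dt * v, v]           # F = [[1, dt], [0, 1]]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        self.P = [[p00 + dt * (p10 + p01) + dt * dt * p11 + self.q,
                   p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]                   # predicted position

    def update(self, z):
        s = self.P[0][0] + self.r          # innovation covariance
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s   # Kalman gain
        y = z - self.x[0]                  # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p00, p01 = self.P[0]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [self.P[1][0] - k1 * p00, self.P[1][1] - k1 * p01]]
```

Calling `predict` without a subsequent `update` is exactly the situation described above: when odometry fails (e.g. bad lighting), the filter still extrapolates the vehicle's motion.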
Paper 120: A Supervised Learning Framework for Automatic Prostate Segmentation in Trans Rectal Ultrasound Images
Heterogeneous intensity distribution inside the prostate gland, significant variations in prostate shape and size, inter-dataset contrast variations, and imaging artifacts like shadow regions and speckle in Trans Rectal Ultrasound (TRUS) images challenge computer-aided automatic or semi-automatic segmentation of the prostate. In this paper, we propose a supervised learning scheme based on random forests for automatic initialization and propagation of a statistical shape and appearance model. The parametric representation of the statistical model of shape and appearance is derived from principal component analysis (PCA) of the probability distribution inside the prostate and PCA of the contour landmarks obtained from the training images. Unlike traditional statistical models of shape and intensity priors, the appearance model in this paper is derived from the posterior probabilities obtained from random forest classification. This probabilistic information is then used for the initialization and propagation of the statistical model. The proposed method achieves a mean Dice Similarity Coefficient (DSC) value of 0.96+/-0.01, with a mean segmentation time of 0.67+/-0.02 seconds, when validated with 24 images from 6 datasets with considerable shape, size, and intensity variations, in a leave-one-patient-out validation framework. The model achieves statistically significant improvements (t-test p-value < 0.0001) in mean DSC and mean absolute distance (MAD) values compared to traditional statistical models of shape and intensity priors.
Paper 123: Hardware Implementation of a Configurable Motion Estimator for Adjusting the Video Coding Performances
Despite the diversity of video compression standards, motion estimation remains a key process used in most of them. Moreover, the required coding performances (bit-rate, PSNR, image spatial resolution, etc.) obviously depend on the application, the environment, and the communication network. The motion estimation can therefore be adapted to fit these performances. Meanwhile, real-time encoding is required in many applications. To reach this goal, we propose in this paper a hardware implementation of the motion estimator which enables the integer motion search algorithms to be modified and the fractional search and variable block size to be selected and adjusted. Hence this novel architecture, especially designed for FPGA targets, offers high-speed processing for a configuration which supports variable-size blocks and quarter-pel refinement, as described in H.264.
Paper 127: Hand Posture Classification by Means of a New Contour Signature
This paper deals with hand posture recognition. Thanks to an adequate setup, we acquire a database of hand photographs. We propose a novel contour signature, obtained by transforming the image content into several signals. The proposed signature is invariant to translation, rotation, and scaling, and can be used for posture classification purposes. We generate this signature from photographs of hands: experiments show that the proposed signature provides good recognition results compared to Hu moments and Fourier descriptors.
Paper 128: 3D Parallel Thinning Algorithms Based on Isthmuses
Thinning is a widely used technique to obtain skeleton-like shape features (i.e., centerlines and medial surfaces) from digital binary objects. Conventional thinning algorithms preserve endpoints to provide important geometric information relative to the object to be represented. An alternative strategy preserves isthmuses (i.e., a generalization of curve/surface interior points). In this paper we present ten 3D parallel isthmus-based thinning algorithm variants that are derived from sufficient conditions for topology-preserving reductions.
Keywords: Shape analysis, Feature extraction, Skeletons, Thinning algorithms, Topology preservation.
Paper 129: Kernel Similarity based AAMs for face recognition
Illumination and facial pose conditions have an explicit effect on the performance of face recognition systems, caused by the complicated non-linear variation between feature points and views. In this paper, we present Kernel Similarity based Active Appearance Models (KSAAMs), in which a kernel method replaces the Principal Component Analysis (PCA) used for feature extraction in Active Appearance Models. The major advantage of the proposed approach lies in a more efficient search for non-linearly varying parameters under complex face illumination and pose variation conditions. As a consequence, images illuminated from different directions and images with variable poses can easily be synthesized by changing the parameters found by the KSAAMs. The experimental results show that the proposed method provides higher accuracy than classical Active Appearance Models for face alignment in a point-to-point error sense.
Paper 130: The Sampling Pattern Cube - a Representation and Evaluation Tool for Optical Capturing Systems
Knowledge about how the light field is sampled through a camera system gives the information required to investigate interesting camera parameters. We introduce a simple and handy model to look into the sampling behavior of a camera system. We have applied this model to single-lens systems as well as plenoptic cameras, and have investigated how camera parameters of interest are interpreted in the proposed model-based representation. The model also enables us to compare capturing systems and to investigate how variations in an optical capturing system affect its sampling behavior.
Paper 132: GPU Optimization of Convolution for Large 3-D Real Images
In this paper, we propose a method for computing the convolution of large 3-D real images. The convolution is performed in the frequency domain using the convolution theorem. Due to the properties of real signals, the algorithm can be optimized so that both the time and the memory consumption are halved compared to complex signals of the same size. The convolution is decomposed in the frequency domain using the decimation-in-frequency (DIF) algorithm. The algorithm is accelerated on graphics hardware by means of the CUDA parallel computing model, achieving up to a 10x speedup with a single GPU over an optimized implementation on a quad-core CPU.
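The real-signal property the optimization exploits can be illustrated with a toy 1-D DFT: the spectrum of a real signal is Hermitian-symmetric, so roughly half of its coefficients determine the rest and need not be stored or multiplied. This pure-Python sketch also demonstrates the convolution theorem itself; the paper applies both ideas to 3-D images on a GPU:

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]

def circular_convolve(a, b):
    """Convolution theorem: pointwise product in the frequency domain."""
    A, B = dft(a), dft(b)
    return [c.real for c in idft([x * y for x, y in zip(A, B)])]

# Hermitian symmetry of a real signal's spectrum: X[k] == conj(X[n-k]),
# so only about half the coefficients must be computed, stored, multiplied.
X = dft([1.0, 2.0, 3.0, 4.0])
assert all(abs(X[k] - X[-k % 4].conjugate()) < 1e-9 for k in range(4))
```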
Paper 133: Approximate Regularization for Structural Optical Flow Estimation
We address the problem of maximum a posteriori (MAP) estimation of optical flow with a geometric prior from gray-value images. We estimate simultaneously the optical flow and the corresponding surface -- the structural optical flow (SOF) -- subject to three types of constraints: intensity constancy, geometric, and smoothness constraints. Our smoothness constraints restrict the unknowns to locally coincide with a set of finitely parameterized admissible functions. The geometric constraints locally enforce consistency between the optical flow and the corresponding surface. Our theory amounts to a discrete generalization of regularization defined in terms of partial derivatives. The point-wise regularizers are efficiently implemented with linear run-time complexity in the number of discretization points. We demonstrate the applicability of our method by example computations of SOF from photographs of human faces.
Paper 134: A New Level-Set Based Algorithm for Bimodal Depth Segmentation
In this paper, a new algorithm for bimodal depth segmentation is presented. The method separates the background and the planar objects of arbitrary shapes lying at a certain height above the background, using the information from a stereo image pair (more exactly, the background and the objects may lie on two distinct general planes). The problem is solved as a problem of minimising a functional. A new functional is proposed for this purpose that is based on evaluating the mismatches between the images, which contrasts with the usual approaches that evaluate the matches. We explain the motivation for such an approach. The minimisation is carried out by making use of the Euler-Lagrange equation and the level-set function. The experiments show promising results on noisy synthetic images as well as on real-life images. An example of the practical application of the method is also presented.
Paper 135: Water Region Detection Supporting Ship Traffic Identification in Port Surveillance
In this paper, we present a robust and accurate water region detection technique developed to support ship identification. Due to the varying appearance of the water body and the frequent intrusion of ships, a region-based recognition is proposed. We segment the image into perceptually meaningful segments and find all water segments using a sampling-based Support Vector Machine (SVM). The algorithm is tested on 6 different port surveillance sequences and achieves a pixel classification recall of 97.5% and a precision of 96.4%. We also apply our water region detection to support the task of multiple ship detection. Combined with our cabin detector, it successfully removes 74.6% of the false detections generated in the cabin detection process. A slight decrease of 5% in the recall value is compensated by a significant improvement of 15% in precision.
Paper 136: Semi-Variational Registration of Range Images by Non-Rigid Deformations
We present a semi-variational approach for accurate registration of a set of range images. For each range image we estimate a transformation composed of a similarity and a free-form deformation in order to obtain a smoothly stitched surface. The resulting three-dimensional model has no jumps or sharp transitions in the place of stitching. We use the presented approach for accurate human head reconstruction from a set of facets subsequently captured from different views and computed independently. A joint energy for both types of transformations is formulated, which involves several regularization constraints defined according to a specification of the resulting surface. A strategy for reweighting the impact of correspondences is presented to improve stability and convergence of the approach. We demonstrate the applicability of our method on several representative examples.
Paper 138: Real-time Dance Pattern Recognition Invariant to Anthropometric and Temporal Differences
We present a cascaded real-time system that recognizes dance patterns from 3D motion capture data. In a first step, the body trajectory, relative to the motion capture sensor, is matched. In a second step, an angular representation of the skeleton is proposed to make the system invariant to anthropometric differences relative to the body trajectory. Coping with non-uniform speed variations and amplitude discrepancies between dance patterns is achieved via a sequence similarity measure based on Dynamic Time Warping (DTW). A similarity threshold for recognition is automatically determined. Using only one good motion exemplar (baseline) per dance pattern, the recognition system is able to find a matching candidate pattern in a continuous stream of data, without prior segmentation. Experiments show the proposed algorithm reaches a good trade-off between simplicity, speed and recognition rate. An average recognition rate of 86.8% is obtained in real-time.
Paper 139: Overlapping Local Phase Feature (OLPF) for Robust Face Recognition in Surveillance
As a non-invasive biometric method, face recognition in surveillance is a very challenging problem because of concurrent conditions such as variable illumination, uncontrolled pose and movement, and the low resolution of the subject. In this paper, we present a robust human face recognition system for surveillance. Unlike traditional recognition systems, which detect the face region directly, we use a Cascade Head-Shoulder Detector (CHSD) and a trained human body model to find the face region in an image. To recognize the face, an efficient feature, the Overlapping Local Phase Feature (OLPF), is proposed, which is robust to pose and blurring without adversely affecting discrimination performance. To describe the variations of faces, an Adaptive Gaussian Mixture Model (AGMM) is proposed, which can describe the distributions of the face images. Since the AGMM does not need the topology of the face, the proposed method is resistant to face detection errors caused by wrong or missing alignment. Experimental results demonstrate the robustness of our method on public datasets as well as on real data from a surveillance camera.
Paper 141: Hand Posture Recognition with Multiview Descriptors
Preservation of asepsis in operating rooms is essential for limiting the contamination of patients by hospital-acquired infections. Strict rules hinder surgeons from interacting directly with any sterile equipment, requiring the intermediary of an assistant or a nurse. Such indirect control may prove clumsy and slow down the performed surgery. Gesture-based human-computer interfaces are a promising alternative to assistants and could in the future help surgeons take direct control over sterile equipment without jeopardizing asepsis.
This paper presents the experiments we conducted on hand posture feature selection and the results obtained. State-of-the-art description methods from four different categories (i.e. local, semi-local, global and geometric description approaches) have been selected to this end. Their recognition rates when combined with a linear Support Vector Machine classifier are compared while attempting to recognize hand postures from an ad-hoc database. For each descriptor, we study the effect of removing the background to simulate a segmentation step and the importance of correctly framing the hand in the picture. The obtained results show that all descriptors benefit to various extents from the segmentation step. Geometric approaches perform best, followed closely by Dalal et al.'s Histogram of Oriented Gradients.
Paper 142: Classifying Plant Leaves From Their Margins Using Dynamic Time Warping
Most plant species have unique leaves which differ from each other by characteristics such as shape, colour, texture and margin. Details of the leaf margin are an important feature in comparative plant biology, although they have been largely overlooked in automated methods of classification. This paper presents a new method for classifying plants by species using only the leaf margins, achieved by means of the dynamic time warping (DTW) algorithm. A margin signature is extracted and the leaf's insertion point and apex are located. Using these as start points, the signatures are then compared using a version of the DTW algorithm. A classification accuracy of over 90% is attained on a dataset of 100 different species.
Paper 143: Utilizing The Hungarian Algorithm For Improved Classification Of High-Dimension Probability Density Functions In An Image Recognition Problem
A method is presented for the classification of images described using high-dimensional probability density functions (pdfs). A pdf is described by a set of n points sampled from its distribution. These points represent feature vectors calculated from windows sampled from an image. A mapping is found, using the Hungarian algorithm, between the set of points describing a class, and the set for a pdf to be classified, such that the distance that points must be moved to change one set into the other is minimized. The method uses these mappings to create a classifier that can model the variation within each class. The method is applied to the problem of classifying plants based on images of their leaves, and is found to outperform several existing methods.
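The minimum-cost mapping between the two point sets can be illustrated with the following sketch. For clarity it brute-forces the assignment over permutations on a tiny example; the Hungarian algorithm used in the paper solves the same objective in polynomial time. The point coordinates and Euclidean cost are assumptions for illustration only.

```python
import numpy as np
from itertools import permutations

def min_cost_matching(A, B):
    """Minimum total-distance one-to-one mapping between two equal-size
    point sets. Brute force over permutations for illustration only;
    the Hungarian algorithm solves this in O(n^3)."""
    n = len(A)
    # Pairwise Euclidean distances between every point of A and B.
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return best, sum(cost[i, best[i]] for i in range(n))

# Two toy "pdf sample" sets: the optimal mapping crosses over, moving
# each point only a short distance.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[1.1, 0.0], [0.1, 0.0]])
perm, total = min_cost_matching(A, B)
print(perm, round(total, 2))  # (1, 0) 0.2
```

The total movement cost returned here is the quantity the classifier in the paper uses as its distance between a class model and a pdf to be classified.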
Paper 145: Gradual Iris Code Construction from Close-up Eye Video
This work deals with dynamic iris biometry from video, which is gaining increasing interest for its flexibility in the framework of biometric portals. We propose several improvements to "real-time" dynamic iris biometry in order to gradually build a high-quality iris code by selecting on-the-fly the best iris images as they appear during acquisition. In particular, tracking is performed using an optimally-tuned Kalman filter, i.e. a Kalman filter with state and observation matrices specifically learned to follow the movement of a pupil. Experiments on four videos acquired with an IR-sensitive low-cost webcam show reduced computation time with a slight but significant gain in accuracy compared to the classical Kalman tracker.
The second main contribution is to combine the iris codes of the images within the video stream that provide the "best quality" iris texture. The resulting fuzzy iris codes clearly exhibit areas of high confidence and areas of low confidence due to eyelashes and eyelids; the latter introduce imprecision in detecting the iris and pupil. Such uncertainty can be further exploited for identification.
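A minimal constant-velocity Kalman tracker, of the kind the "classical Kalman tracker" baseline refers to, can be sketched as follows. All matrices here are generic textbook choices, not the learned state and observation matrices of the paper.

```python
import numpy as np

# Generic 1-D constant-velocity model (hypothetical parameters; the
# paper instead learns its state/observation matrices from pupil data).
F = np.array([[1.0, 1.0],    # state transition: position += velocity
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])   # we observe position only
Q = np.eye(2) * 1e-3         # process noise covariance
R = np.array([[0.1]])        # measurement noise covariance

x = np.array([0.0, 0.0])     # state: [position, velocity]
P = np.eye(2)                # state covariance

def kalman_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with measurement z
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

for z in [1.0, 2.0, 3.0, 4.0]:         # a pupil moving at unit speed
    x, P = kalman_step(x, P, np.array([z]))
print(round(float(x[0]), 2))           # close to the last measurement, 4.0
```

After a few measurements the filter locks onto both the position and the velocity of the target, which is what makes the tracker robust to brief occlusions such as blinks.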
Paper 147: Simultaneous Segmentation and Filtering via Reduced Graph Cuts
Recently, optimization with graph cuts has become very attractive, but it generally remains limited to small-scale problems due to the large memory requirement of graphs, even when restricted to binary variables. Unlike previous heuristics, which generally fail to fully capture details, a band-based method was proposed for reducing these graphs in image segmentation. This method provides small graphs while preserving thin structures, but does not offer low memory usage when the amount of regularization is large. This is typically the case when images are corrupted by impulsive noise. In this paper, we overcome this situation by embedding a new parameter in this method to both further reduce the graphs and filter the segmentation. This parameter avoids any post-processing steps, appears to be generally less sensitive to noise variations, and offers good robustness against noise. We also provide an empirical way to automatically tune this parameter and illustrate its behavior for segmenting grayscale and color images.
Paper 150: Depth from Vergence and Active Calibration for Humanoid Robots
In human vision, many cues are used to perceive depth. For nearby tasks involving eye-hand coordination, depth from vergence is a strong cue. In our research on humanoid robots we study binocular robotic eyes that can pan and tilt and perceive depth from stereo, as well as depth from vergence by fixating both eyes on a nearby object. In this paper, we report on a convergent robot vision set-up. Firstly, we describe the mathematical model for a convergent vision system. Secondly, we introduce an algorithm to estimate the depth of an object under focus. Thirdly, as the centers of rotation of the eye motors do not align with the centers of the image planes, we develop an active calibration algorithm to overcome this problem. Finally, we examine the factors that have an impact on the depth error. The results of experiments and tests show the good performance of our system and provide insight into depth from vergence.
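The geometric idea behind depth from vergence can be shown with a toy triangulation. This is a simplified symmetric-fixation sketch, not the paper's model, which additionally handles misaligned rotation centers; the baseline and angles below are assumptions.

```python
import math

def depth_from_vergence(baseline, left_angle, right_angle):
    """Toy depth from vergence: two eyes a baseline apart fixate the
    same point, each rotated inward from straight-ahead by the given
    angle (radians). For a symmetric fixation the fixated point lies at
    depth (baseline/2) / tan(half-angle) on the midline."""
    half = 0.5 * (left_angle + right_angle)  # average inward rotation
    return (baseline / 2.0) / math.tan(half)

# Eyes 6 cm apart fixating a point 30 cm straight ahead: each eye must
# rotate inward by atan((b/2)/z), and triangulation recovers z exactly.
b, z = 0.06, 0.30
angle = math.atan((b / 2) / z)
print(round(depth_from_vergence(b, angle, angle), 3))  # 0.3
```

The same formula also shows why the cue weakens with distance: tan(half) shrinks toward zero, so small angular errors produce large depth errors, one of the factors the paper examines.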
Paper 151: Modified Bilateral Filter for the Restoration of Noisy Color Images
In this paper a novel technique for noise removal in color images is presented. The proposed filter design is a modification of the bilateral denoising scheme, which considers the similarity of color pixels and their spatial distance. However, instead of a direct calculation of the dissimilarity measure, the cost of a connection through a digital path joining the central pixel of the filtering window and its neighbors is determined. The filter output, as in the standard bilateral filter, is calculated as a weighted average of the pixels in the neighborhood of the center of the filtering window, where the weights are functions of the minimal connection costs. Experimental results prove that the new denoising method yields significantly better results than the bilateral filter for color images contaminated by strong mixed Gaussian and impulsive noise.
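The standard bilateral scheme being modified here can be sketched in one dimension. This sketch shows only the baseline filter (spatial weight times intensity-range weight); the paper's contribution replaces the direct range term with minimal digital-path connection costs, which is not reproduced. Radius and sigmas are arbitrary assumptions.

```python
import numpy as np

def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.2):
    """Toy 1-D bilateral filter: each output sample is a weighted
    average of its neighbors, with weights combining spatial closeness
    and intensity (range) similarity."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        idx = np.arange(lo, hi)
        w_spatial = np.exp(-((idx - i) ** 2) / (2 * sigma_s ** 2))
        w_range = np.exp(-((signal[idx] - signal[i]) ** 2)
                         / (2 * sigma_r ** 2))
        w = w_spatial * w_range
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out

# The step edge survives while small fluctuations are smoothed out:
noisy = [0.0, 0.05, 0.0, 1.0, 0.95, 1.0]
print(np.round(bilateral_1d(noisy), 2))
```

The range weight is what keeps samples on the far side of the edge from contaminating the average; it is exactly this term that fails under strong impulsive noise, motivating the path-cost replacement.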
Paper 161: Multi-View Gait Fusion for Large Scale Human Identification in Surveillance Videos
In this paper we propose a novel multi-view feature fusion of gait biometric information in surveillance videos for large-scale human identification. The experimental evaluation on low-resolution surveillance video images from a publicly available database showed that the combined LDA-MLP technique is a powerful method for capturing identity-specific information from walking gait patterns. The multi-view fusion at feature level allows the complementarities of multiple camera views in surveillance scenarios to be exploited to improve identity recognition performance.
Paper 164: Detection of HF First-Order Sea Clutter and Its Splitting Peaks with Image Feature: Results in Strong Current Shear Environment
A strong current shear environment results in twisted and split sea clutter along the range dimension of the range-Doppler spectral map. A sea clutter detection method based on image features is proposed. Using 2-D image features in the range-Doppler spectrum, the trend of first-order sea echoes is extracted as indicative information by a multi-scale filter. Detection rules for both single and splitting first-order sea echoes are given based on characteristic knowledge combining the indicative information with global characteristics such as amplitude, symmetry and continuity. Compared with classical algorithms, the proposed method can detect and locate the first-order sea echo in the HF band more accurately, especially in environments with target/clutter smearing. Experiments with real data from a strong current shear environment verify the validity of the algorithm.
Paper 165: Object recognition using Radon transform based RST parameter estimation
In this paper, we propose a practical parameter recovery approach for similarity geometric transformations using only the Radon transform and its extended version on [0,2pi]. The derived objective function is exploited as a similarity measure to build an object recognition system. Comparison results with common and powerful shape descriptors testify to the effectiveness of the proposed method in recognizing binary images under RST transformations, distortion, occlusion or noise.
Paper 171: Information-Gain View Planning for Free-Form Object Reconstruction with a 3D ToF Camera
Active view planning for gathering data from an unexplored 3D complex scenario is a hard and still open problem in the computer vision community. In this paper, we present a general task-oriented approach based on an information-gain maximization that easily deals with such a problem. Our approach consists of ranking a given set of possible actions, based on their task-related gains, and then executing the best-ranked action to move the required sensor.
An example of how our approach behaves is demonstrated by applying it over 3D raw data for real-time volume modelling of complex-shaped objects. Our setting includes a calibrated 3D time-of-flight (ToF) camera mounted on a 7 degrees of freedom (DoF) robotic arm. Noise in the sensor data acquisition, which is too often ignored, is here explicitly taken into account by computing an uncertainty matrix for each point, and refining this matrix each time the point is seen again. Results show that, by always choosing the most informative view, a complete model of a 3D free-form object is acquired and also that our method achieves a good compromise between speed and precision.
Paper 174: Correction, Stitching and Blur Estimation of micro-Graphs Obtained at High Speed
Micro-structures of a surface are considered effective in identifying damage mechanisms. Industry uses computer vision as a contactless tool to automatically detect misalignment of components. In scientific investigations, however, micro-structures obtained online at high speed have to be analyzed. In this work, the change detection of a specimen rotating at high speed is studied online using image processing techniques on micrographs, which provides a clear insight into the dimensional changes. The specimen under study is made from a polymer composite which is in contact with a steel wheel and rotates at high speed. Blur, as a measure of the dimensional change of the polymer composite, can be identified from the change in focus. The micro-structure images were dark and spanned only a very small region of the surface due to the high-speed image acquisition, short shutter time and magnification of the microscope. Thus, preprocessing procedures such as image enhancement, stitching and registration are performed. Then 15 blur estimation methods are applied to the stitched images. The results of three methods show a good correlation with the dimensional change measured by a stylus instrument.
Paper 175: Recovering Projective Transformations between Binary Shapes
Binary image registration has been addressed by many authors recently; however, most of the proposed approaches are restricted to affine transformations. In this paper, a novel approach is proposed to estimate the parameters of a general projective transformation (also called a homography) that aligns two shapes. Recovering such projective transformations is a fundamental problem in computer vision with various applications. While classical approaches rely on established point correspondences, the proposed solution does not need any feature extraction; it works only with the coordinates of the foreground pixels. The two-step method first estimates the perspective distortion independently of the affine part of the transformation, which is then recovered in the second step. Experiments on synthetic as well as real images show that the proposed method is less sensitive to the strength of the deformation than other solutions. The efficiency of the method has also been demonstrated on the traffic sign matching problem.
Paper 176: Active Visual-based Detection and Tracking of Moving Objects from Clustering and Classification methods
This paper describes a method for the detection, tracking and identification of mobile objects observed from a mobile camera, typically a camera embedded on a robot. A global architecture is presented, using only vision, in order to simultaneously solve several problems: camera (or vehicle) Localization, environment Mapping, and the Detection and Tracking of Moving Objects. The goal is to build a convenient description of a dynamic scene from vision: What is static? What is dynamic? Where is the robot? How do other mobile objects move? We propose to combine two approaches: first, a clustering method detects static points, to be used by the SLAM algorithm, and dynamic ones, used to segment and estimate the status of mobile objects; second, a classification approach identifies objects of known classes in image regions. These two approaches are combined in an active method based on a Motion Grid in order to actively select where to look for mobile objects. The overall approach is evaluated with real data acquired indoors and outdoors from a camera embedded on a mobile robot.
Paper 177: Improving Image Acquisition: A Fish-Inspired Solution
In this paper, we study the rendering of images with a new mosaic/color filter array (CFA) called the Burtoni mosaic. This mosaic is derived from the retina of the African cichlid fish Astatotilapia burtoni. To evaluate the effect of the Burtoni mosaic on the quality of the rendered images, we use two quality measures in the Fourier domain: the resolution error and the aliasing error. Our model uses no demosaicing algorithm, which makes it independent of such algorithms. We also use 11 semantic sets of color images in order to highlight the image classes that are well suited to the Burtoni mosaic in the process of image acquisition. We have compared the Burtoni mosaic with the Bayer CFA and with an optimal CFA proposed by Hao et al. Experiments have shown that the Burtoni mosaic gives the best performance for images from 9 of the semantic sets: the high-frequency, aerial, indoor, face, aquatic, bright, dark, step and line classes.
Paper 178: Evaluating the Effects of MJPEG Compression on Motion Tracking in Metro Railway Surveillance
Video content analytics is being increasingly employed for the security surveillance of mass-transit systems. The growing number of cameras, the presence of legacy networks, and the limited bandwidth of wireless links are some of the issues which highlight the importance of evaluating the performance of motion tracking under different levels of video compression. In this paper, we report the results of such an evaluation, considering false-negative and false-positive metrics applied to videos captured from cameras installed in a real metro-railway environment. The evaluation methodology is based on the manual generation of the Ground Truth on selected videos at increasing levels of MJPEG compression, and on its comparison with the Algorithm Result automatically generated by the Motion Tracker. The computation of the reference performance metrics is automated by a tool developed in Matlab. Results are discussed with respect to the main causes of false detections, and hints are provided for further industrial applications.
Paper 180: Annotating Images with Suggestions - User Study of a Tagging System
This paper explores the concept of image-wise tagging. It introduces a web-based user interface for image annotation, and a novel method for modeling dependencies of tags using Restricted Boltzmann Machines which is able to suggest probable tags for an image based on previously assigned tags. According to our user study, our tag suggestion methods improve both user experience and annotation speed. Our results demonstrate that large datasets with semantic labels (such as in TRECVID Semantic Indexing) can be annotated much more efficiently with the proposed approach than with current class-domain-wise methods, and produce higher quality data.
Paper 181: Classification of Hyperspectral Data over Urban Areas based on Extended Morphological Profile with Partial Reconstruction
Extended morphological profiles with reconstruction are widely used in the classification of very high resolution hyperspectral data from urban areas. However, morphological profiles constructed with morphological openings and closings by reconstruction can lead to undesirable effects: objects expected to disappear at a certain scale remain present. In this paper, we apply extended morphological profiles with partial reconstruction (EMPP) to the classification of high-resolution hyperspectral images from urban areas. We first use feature extraction to reduce the dimensionality of the hyperspectral data as well as the redundancy within the bands, then construct EMPPs on features extracted by PCA, independent component analysis and kernel PCA. Experimental results on a real urban hyperspectral image demonstrate that the proposed EMPP built on kernel principal components achieves the best results, particularly for small training sample sizes.
Paper 182: 3D Shape Recovery from Focus Using LULU Operators
Extracting the shape of an object is one of the important tasks in many vision applications. One of the difficult challenges in 3D shape extraction is the roughness of object surfaces. Shape from focus (SFF) is a shape recovery method that reconstructs the shape of an object from a sequence of images taken from the same viewpoint but with different focal lengths. This paper proposes the use of LULU operators as a preprocessing step to improve the signal-to-noise ratio in the estimation of 3D shape from focus. LULU operators are morphological filters valued for their structure-preserving properties. The proposed technique is tested on simulated and real images separately, as well as in combination with traditional SFF methods such as sum-modified Laplacian (SML) and gray-level variance (GLV), in the presence of impulse noise at different noise levels. The quantitative and qualitative experimental results show that the proposed technique is more accurate in focus value extraction and shape recovery in the presence of noise.
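The sum-modified-Laplacian (SML) focus measure mentioned as a baseline can be sketched as follows. This is a minimal illustrative version (no local windowing or thresholding, step size fixed); the test images are arbitrary assumptions.

```python
import numpy as np

def sml(image, step=1):
    """Toy sum-modified-Laplacian focus measure: sums
    |2I(x,y) - I(x-s,y) - I(x+s,y)| + |2I(x,y) - I(x,y-s) - I(x,y+s)|
    over the image. Higher values indicate sharper focus."""
    img = np.asarray(image, dtype=float)
    ml = np.zeros_like(img)
    ml[step:-step, :] += np.abs(2 * img[step:-step, :]
                                - img[:-2 * step, :] - img[2 * step:, :])
    ml[:, step:-step] += np.abs(2 * img[:, step:-step]
                                - img[:, :-2 * step] - img[:, 2 * step:])
    return ml.sum()

# A sharp checkerboard scores higher than a defocused (uniform) patch,
# which is how SFF picks the best-focused frame per pixel region.
sharp = np.indices((8, 8)).sum(axis=0) % 2 * 1.0
blurred = np.full((8, 8), 0.5)
print(sml(sharp) > sml(blurred))  # True
```

Because the measure is driven by second derivatives, impulse noise inflates it badly, which is precisely the weakness the LULU prefiltering step is meant to address.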
Paper 183: Entropy based Supervised Merging for Visual Categorization
The Bag of visual Words (BoW) model is widely regarded as the standard representation of the visual information present in images and is broadly used for retrieval and concept detection in videos. The generation of the visual vocabulary in the BoW framework generally includes a quantization step that clusters the image features into a limited number of visual words. This quantization, achieved through unsupervised clustering, does not take any advantage of the relationship between features coming from images belonging to similar concept(s), thus enlarging the semantic gap. We present a new dictionary construction technique that improves the BoW representation by increasing its discriminative power. Our solution is based on a two-step quantization: we start with k-means clustering, followed by bottom-up supervised clustering using the features' label information. Results on the TRECVID 2007 data show improvements with the proposed construction of the BoW. We also give upper bounds on the improvement over the baseline for the retrieval rate of each concept using the best supervised merging criterion.
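The first, unsupervised quantization step of the BoW pipeline can be sketched as follows. Only nearest-word assignment and histogram building are shown; the paper's entropy-based supervised merging of words is not reproduced, and the toy vocabulary and descriptors are assumptions.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize local descriptors against a visual vocabulary (e.g. the
    k-means centroids) and build the normalized BoW histogram."""
    # Distance from every descriptor to every visual word.
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :],
                       axis=2)
    words = d.argmin(axis=1)  # assign each descriptor its nearest word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocab = np.array([[0.0, 0.0], [1.0, 1.0]])   # toy 2-word vocabulary
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]])
print(bow_histogram(desc, vocab))            # one third vs. two thirds
```

The supervised second step would then merge words whose label distributions are similar, shrinking the vocabulary while keeping (or improving) its discriminative power.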