Advanced Concepts for Intelligent Vision Systems
Oct. 28-31 2013
City Park Hotel, Poznan, Poland
Acivs 2013 Abstracts
Paper 217: Hierarchical analysis of hyperspectral images
After decades of use of multispectral remote sensing, most of the major space agencies now have new programs to launch hyperspectral sensors, recording the reflectance information of each point on the ground in hundreds of narrow and contiguous spectral bands. The spectral information is instrumental for the accurate analysis of the physical components present in a scene. But every rose has its thorn: most of the traditional signal and image processing algorithms fail when confronted with such high-dimensional data (each pixel is represented by a vector with several hundred dimensions).
In this talk, we will start with a general presentation of the challenges and opportunities offered by hyperspectral imaging systems in a number of applications.
We will then explore these issues with a hierarchical approach, briefly illustrating the problems of spectral unmixing and super-resolution, then moving on to pixel-wise classification (purely spectral classification and then including contextual features). Finally, we will focus on the extension to hyperspectral data of a very powerful image analysis tool: the Binary Partition Tree (BPT). It provides a generic hierarchical representation of images and involves the following two steps:
- construction of the tree: one starts from the pixel level and merges pixels/regions progressively until the top of the hierarchy (the whole image is considered as one single region) is reached. To proceed, one needs to define a model to represent the regions (for instance the average spectrum, but this is not a good idea) and one also needs to define a similarity measure between neighbouring regions to decide which ones should be merged first (for instance the Euclidean distance between the models of the two regions, but this is not a good idea either). This step (construction of the tree) is very much related to the data.
- pruning of the tree: this is very much related to the considered application. The pruning of the tree leads to one segmentation. The resulting segmentation might not be any of the results obtained during the iterative construction of the tree. This is where this representation outperforms the standard approaches. But one may also perform classification, or object detection (assuming an object of interest will appear somewhere as one node of the tree, the game is to define a suitable criterion, related to the application, to find this node).
Results are presented on various hyperspectral images.
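The construction step described above can be sketched in a few lines. This is only a toy illustration under strong assumptions: pixels are scalars arranged in a 1-D row, the region model is the mean value, and the similarity is the absolute difference of means. These are exactly the naive choices the abstract cautions against, used here purely for brevity. The function name `build_bpt` is hypothetical, and the pruning step is not shown.

```python
import heapq


def build_bpt(values):
    """Build a binary partition tree over a 1-D row of scalar pixels.

    Leaves are numbered 0..n-1; each merge creates a new node id.
    Returns (parent, children): parent[node] is the node it was merged into,
    children[node] is the pair of nodes that were merged to create it.
    """
    n = len(values)
    # Region model: (sum, count). The mean is the naive model (assumption).
    model = {i: (float(v), 1) for i, v in enumerate(values)}
    neighbours = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
    alive = set(range(n))
    parent, children = {}, {}

    def dist(a, b):
        # Naive similarity: absolute difference between region means.
        (sa, ca), (sb, cb) = model[a], model[b]
        return abs(sa / ca - sb / cb)

    heap = [(dist(i, i + 1), i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)
    next_id = n
    while len(alive) > 1:
        _, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive:
            continue  # stale entry: one of the regions was already merged
        new = next_id
        next_id += 1
        model[new] = (model[a][0] + model[b][0], model[a][1] + model[b][1])
        children[new] = (a, b)
        parent[a] = parent[b] = new
        alive -= {a, b}
        nbrs = (neighbours.pop(a) | neighbours.pop(b)) - {a, b}
        neighbours[new] = nbrs
        for m in nbrs:
            # Rewire each neighbour to the merged region and queue the new pair.
            neighbours[m] -= {a, b}
            neighbours[m].add(new)
            heapq.heappush(heap, (dist(new, m), new, m))
        alive.add(new)
    return parent, children
```

For a row of n pixels the tree has 2n - 1 nodes, and the last node created is the root (the whole row as one region). A pruning step would then select a set of nodes according to an application-specific criterion.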
Paper 218: Hierarchical Compositional Representations for Computer Vision
Paper 219: Free-Viewpoint Video
The term "free-viewpoint video" refers to representations of a visual scene that allow a viewer to navigate virtually through a scene. Such a representation mostly consists of a number of views and depth maps, and may be used to synthesize virtual views for arbitrary points on a trajectory of navigation. Therefore, a viewer is able to see the scene from an arbitrarily chosen viewpoint, as if using a virtual camera. The views might be monoscopic, but stereoscopic views provide more realistic navigation through a scene. Significant recent interest in free-viewpoint video is stimulated by potential applications that include not only interactive movies and television coverage of sports, in particular of boxing, sumo, judo or wrestling, but also interactive theatre and circus performances, courses and user manuals. In particular, "free-viewpoint television" is already a subject of extensive research. In this lecture, we will consider acquisition of multiview video and the corresponding preprocessing that includes system calibration, geometrical rectification and color correction. Further, we will proceed to production of depth information for multiview video. After this step, we get the "multiview plus depth" (MVD) representation that needs to be transmitted. The emerging MVD compression technologies will be briefly reported together with the corresponding international standardization projects. Finally, virtual view synthesis will be discussed. The "free-viewpoint video" technology is also closely related to the technology of autostereoscopic displays, which are used to present 3D video to many viewers who do not need to wear special glasses. Many different views are displayed, possibly between 30 and 170, mostly synthesized in a computer connected to the display. Issues of 3D virtual-view-based quality assessment and 3D visual attention also arise.
Free-viewpoint video is a "hot" research topic, and the lecture will be illustrated with recent research results.
Paper 107: IMM-based Tracking and Latency Control with off-the-shelf IP PTZ Camera
Networked Pan-Tilt-Zoom (PTZ) cameras seem to be replacing static ones in video surveillance, as they are easier to deploy, offer a larger field of view and can take high-resolution pictures of targets thanks to their zoom. However, current algorithms combining tracking and camera control do not take into account the command execution latency and motion delays of off-the-shelf cameras. In this paper, we suggest a new motion control strategy that manages inherent delays by an Interacting Multiple Models prediction of the target motion and an online evaluation of this prediction.
Paper 108: Evaluation of Traffic Sign Recognition Methods Trained on Synthetically Generated Data
Most of today’s machine learning techniques require large manually labeled datasets. This problem can be solved by using synthetic images. Our main contribution is to evaluate methods of traffic sign recognition trained on synthetically generated data and to show that the results are comparable with those of classifiers trained on a real dataset. To get a representative synthetic dataset, we model different sign image variations such as intra-class variability, imprecise localization, blur, lighting, and viewpoint changes. We also present a new method for traffic sign segmentation, based on a nearest neighbor search in the large set of synthetically generated samples, which improves current traffic sign recognition algorithms.
Paper 110: Real-time Face Pose Estimation in Challenging Environments
A novel low-computation discriminative feature representation is introduced for face pose estimation in video. The contribution of this work is a new approach that supports automatic face pose estimation with no need for manual initialization and handles different challenging problems without affecting the computational complexity (~58 milliseconds per frame). We have applied the Local Binary Patterns Histogram Sequence (LBPHS) to Gaussian and Gabor feature pictures to encode salient micro-patterns of multi-view face pose. Relying on the LBPHS face representation, an SVM classifier was used to estimate face pose. Two series of experiments were performed to show that our proposed approach, being simple and highly automated, can accurately and effectively estimate face pose. Additionally, experiments on face images with diverse resolutions show that LBPHS features are effective on low-resolution images, which is a critical challenge in real-world applications where only low-resolution frames are available.
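As a rough illustration of the building block behind LBPHS, the sketch below computes the histogram of basic 3x3 local binary pattern codes for a single grey-level image given as a list of rows. It is a minimal sketch only: the Gaussian and Gabor feature pictures, the histogram sequence over image regions and the SVM stage of the abstract are not reproduced, and the function name `lbp_histogram` is hypothetical.

```python
def lbp_histogram(img):
    """Return the 256-bin histogram of basic 3x3 LBP codes of a 2-D grey image.

    Border pixels are skipped because they lack a full 3x3 neighbourhood.
    """
    # Neighbour offsets (row, column), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

In a histogram-sequence representation, such histograms would typically be computed per image block and concatenated into one feature vector before classification.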
Paper 111: Planar Segmentation by Time-of-Flight Cameras
This article presents a new feature for detecting planarity in range images based on the Time-of-Flight (ToF) principle. We derive homogeneous linear conditions for Time-of-Flight images of 3 or 4 points to be the projection of collinear or coplanar points respectively.
The crucial part in our equations is played by the D/d-ratios of the ToF measurement D(u,v) and the "internal radial distance" (IRD) d(u,v) for each pixel (u,v). The knowledge of the IRD d may be a consequence of an accurate lateral calibration of the camera, but this IRD can also be directly obtained. Consequently, the proposed planarity condition holds in principle for uncalibrated cameras.
The elegance, efficiency and simplicity of our coplanarity constraint are illustrated in experiments on planarity tests and the segmentation of planar regions.
Paper 114: Partial Near-duplicate Detection in Random Images by a Combination of Detectors
Detection of partial near-duplicates (e.g. similar objects) in random images continues to be a challenging problem. In particular, scalability of existing methods is limited because keypoint correspondences have to be confirmed by configuration analysis for groups of matched keypoints. We propose a novel approach where pairs of images containing partial near-duplicates are retrieved if ANY number of keypoint matches is found between both images (keypoint descriptions are augmented by some geometric characteristics of keypoint neighborhoods). However, two keypoint detectors (Harris-Affine and Hessian-Affine) are independently applied, and only results confirmed by both detectors are eventually accepted. Additionally, relative locations of keypoint correspondences retrieved by both detectors are analyzed and (if needed) outlines of the partial near-duplicates can be extracted using a keypoint-based co-segmentation algorithm. Altogether, the approach has a very low complexity (i.e. it is scalable to large databases) and provides satisfactory performance. Most importantly, precision is very high, while recall (determined primarily by the selected keypoint description and matching approaches) remains at an acceptable level.
Paper 115: A New Color Image Database TID2013: Innovations and Results
A new database of distorted color images called TID2013 is designed and described. In contrast to its predecessor, TID2008, this database contains images with five levels of distortion instead of the four used earlier and a larger number of distortion types (24 instead of 17). The need for these modifications is motivated and the new types of distortions are briefly considered. Information on experiments already carried out in five countries with the purpose of obtaining the mean opinion score (MOS) is presented. Preliminary results of these experiments are given and discussed. Several popular metrics are considered, and Spearman rank order correlation coefficients between these metrics and MOS are presented and discussed. The obtained results are analyzed, and distortion types that are difficult for existing metrics to assess are noted.
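For readers unfamiliar with the evaluation criterion mentioned above, the following sketch computes the Spearman rank order correlation coefficient between a quality metric and MOS values as the Pearson correlation of their ranks, with tied values receiving their average rank. The function names are hypothetical and the numbers in the usage note are made up for illustration; they are not taken from TID2013.

```python
def ranks(xs):
    """Return 1-based ranks of xs, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(a, b):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5
```

For example, `spearman([31.2, 28.4, 35.0, 25.1], [4.1, 3.6, 5.2, 2.9])` compares four illustrative metric values with four illustrative MOS values; because the coefficient depends only on ranks, any monotonic relation between metric and MOS yields a value of 1.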
Paper 116: Automatic Monitoring of Pig Activity Using Image Analysis
The purpose of this study is to investigate the feasibility and validity of an automated image processing method to detect the activity status of pigs. Top-view video images were captured for forty piglets, housed ten per pen. Each pen was monitored by a top-view CCD camera. The image analysis protocol to automatically quantify activity consisted of several steps. First, in order to localise the pigs, ellipse fitting algorithms were employed. Subsequently, activity was calculated by subtracting image background and comparing binarised images. To validate the results, they were compared to labelled behavioural data ('active' versus 'inactive'). This is the first study to show that activity status of pigs in a group can be determined using image analysis with an accuracy of 89.8 %. Since activity status is known to be associated with issues such as lameness, careful monitoring can give an accurate indication of the health and welfare of pigs.
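The last two processing steps named above (background subtraction and comparison of binarised images) might look roughly as follows. This is a sketch under assumptions not stated in the abstract: frames are grey-level lists of rows, the threshold value is arbitrary, and activity is defined here as the number of pixels that changed between two consecutive binary masks divided by the number of foreground pixels. Ellipse fitting is not shown, and the function names are hypothetical.

```python
def binarise(frame, background, threshold=30):
    """Return a 0/1 mask marking pixels that differ from the background."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def activity_index(prev_mask, curr_mask):
    """Changed pixels between two masks, relative to the current foreground area.

    This definition is an assumption made for illustration only.
    """
    changed = sum(p != c
                  for prow, crow in zip(prev_mask, curr_mask)
                  for p, c in zip(prow, crow))
    occupied = sum(map(sum, curr_mask)) or 1  # avoid division by zero
    return changed / occupied
```

An 'active' versus 'inactive' label could then be obtained by thresholding the index, with the threshold chosen against labelled behavioural data.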
Paper 117: The Objective Evaluation of Image Object Segmentation Quality
In this paper, a novel objective quality metric is proposed for the segmentation of individual objects in images. We analyze four types of segmentation errors, and verify experimentally that, besides quantity, area and contour, the distortion of object content is another useful segmentation quality index. Our metric evaluates the similarity between the ideal result and the segmentation result by measuring these distortions. The metric has been tested on our subjectively rated image segmentation database and has demonstrated good performance in matching subjective ratings.
Paper 118: Distance Estimation with a Two or Three Aperture SLR Digital Camera
When a camera is modified by placing two or more displaced apertures with color filters within the imaging system, it is possible to estimate the distances of objects from the camera and to create 3-d images. In this paper, we develop the key equations necessary to estimate the distance of an object and discuss the feasibility of such a system for distance estimation in applications such as robot vision, human computer interfaces, intelligent visual surveillance, 3-d image acquisition, and intelligent driver assistance systems. In particular, we discuss how accurately these distances may be estimated and describe how distance estimation may be performed in real-time using an appropriately modified video camera.
Paper 120: A Mobile Imaging System for Medical Diagnostics
Microscopy for medical diagnostics requires expensive equipment as well as highly trained experts to operate it and interpret the observed images. We present a new, easy-to-use, mobile diagnostic system consisting of a direct imaging microlens array and a mobile computing platform for diagnosing parasites in clinical samples. First, the captured microlens images are reconstructed using a light field rendering method. Then, OpenCL-accelerated classification utilizing local binary pattern features is performed. A speedup factor of 4.6 was achieved for the mobile computing platform CPU (AMD C-50) compared with the GPU (AMD 6250). The results show that a relatively inexpensive system can be used for automatically detecting eggs of the Schistosoma parasite. Furthermore, the system can also be used to diagnose other parasites and thin-layer microarray samples containing stained tumor cells.
Paper 123: Training with Corrupted Labels to Reinforce a Probably Correct Teamsport Player Detector
While the analysis of foreground silhouettes has become a key component of modern approaches to multi-view people detection, it remains subject to errors when dealing with a single viewpoint. Besides, several works have demonstrated the benefit of exploiting classifiers to detect objects or people in images, based on local texture statistics. In this paper, we train a classifier to differentiate false and true positives among the detections computed from a foreground mask analysis. This is done in a sport analysis context where people's deformations are significant, which makes it important to adapt the classifier to the case at hand, so as to take the team colors and the background appearance into account. To circumvent the manual annotation burden incurred by repeating the training for each event, we propose to train the classifier based on the foreground detector decisions. Since the detector is not perfect, we face a training set whose labels might be corrupted. We investigate a set of classifier design strategies, and demonstrate the effectiveness of the approach in reliably detecting sport players from a single view.
Paper 124: Learning and Propagation of Dominant Colors for Fast Video Segmentation
Color segmentation is an essential problem in image processing. While most of the recent works focus on the segmentation of individual images, we propose to use the temporal color redundancy to segment arbitrary videos. In an initial phase, a k-medoids clustering is applied to histogram peaks observed on a few frames to learn the dominant colors composing the recorded scene. In a second phase, these dominant colors are used as reference colors to speed up a color-based segmentation process and are updated on-the-fly when the scene changes. Our evaluation first shows that the properties of k-medoids clustering make it well suited to learning the dominant colors. Then, the efficiency and the effectiveness of the proposed method are demonstrated and compared to standard segmentation benchmarks. This assessment reveals that our approach is more than 250 times faster than the conventional mean-shift segmentation, while preserving the segmentation accuracy.
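The learning phase can be illustrated with a small k-medoids routine. The sketch below uses a simple alternating scheme (assign each point to its nearest medoid, then pick the member of each cluster that minimises the total distance to the others) with a naive deterministic initialisation. The input points stand in for colour histogram peaks; how the peaks are extracted, and the exact k-medoids variant used in the paper, are not given in the abstract, so these are assumptions. The function name `k_medoids` is hypothetical.

```python
def k_medoids(points, k, iters=20):
    """Return k medoids of `points` (tuples) using alternating refinement."""

    def d(p, q):
        # Euclidean distance in colour space.
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    medoids = list(points[:k])  # naive initialisation (assumption)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: d(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new medoid is the member with the smallest
        # summed distance to the rest of its cluster.
        new = [min(c, key=lambda m: sum(d(m, p) for p in c)) if c else medoids[i]
               for i, c in enumerate(clusters)]
        if new == medoids:
            break
        medoids = new
    return medoids
```

Unlike k-means centroids, each medoid is one of the observed colours, so a dominant colour learned this way always corresponds to a colour actually present in the scene.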
Paper 125: Acquisition of Agronomic Images with Sufficient Quality by Automatic Exposure Time Control and Histogram Matching
Agronomic images in Precision Agriculture are most often used for crop line detection and weed identification; both are key issues because specific treatments or guidance require high accuracy. Agricultural images are captured in outdoor scenarios, always under uncontrolled illumination. The CCD-based cameras acquiring these images need specific control to obtain images of sufficient quality for greenness identification, from which the crop lines and weeds are to be extracted. This paper proposes a procedure to achieve images of sufficient quality by controlling the exposure time based on image histogram analysis, complemented by histogram matching. The performance of the proposed procedure is verified on test images.
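The histogram matching stage can be sketched with the standard cumulative-distribution mapping shown below. It assumes flattened lists of integer grey levels and a reference image of acceptable quality; the exposure time control and the way the reference is chosen in the paper are not shown. The function names are hypothetical.

```python
def cdf(pixels, levels=256):
    """Cumulative distribution of integer grey levels in a flat pixel list."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total, running, out = len(pixels), 0, []
    for h in hist:
        running += h
        out.append(running / total)
    return out


def match_histogram(source, reference, levels=256):
    """Remap source grey levels so their distribution follows the reference."""
    src_cdf, ref_cdf = cdf(source, levels), cdf(reference, levels)
    lut, j = [], 0
    for s in src_cdf:
        # Both CDFs are non-decreasing, so the pointer j only moves forward.
        while j < levels - 1 and ref_cdf[j] < s:
            j += 1
        lut.append(j)
    return [lut[p] for p in source]
```

For a colour image, one option would be to apply the mapping per channel, although whether the paper does so is not stated.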
Paper 127: An Efficient Normal-Error Iterative Algorithm for Line Triangulation
In this paper, we address the problem of line triangulation, which is to find the position of a line in space given its three projections taken with cameras with known camera matrices. Because of measurement error in line extraction, the problem becomes difficult, so that it is necessary to estimate a 3D line that optimally fits the measured lines. In this work, the normal errors of the measured lines are introduced to describe the measurement error and, based on their statistical properties, a new geometric distance optimality criterion is constructed. Furthermore, a simple iterative algorithm is proposed to obtain a suboptimal solution of the optimality criterion, which ensures that the solution satisfies the trifocal tensor constraint. Experiments show that our iterative algorithm can achieve an estimation accuracy comparable with that of the Gold Standard algorithm, while its computational load is substantially reduced.
Paper 129: A Resource Allocation Framework for Adaptive Selection of Point Matching Strategies
This report investigates how to track an object based on the matching of points between pairs of consecutive video frames. The approach is especially relevant to support object tracking in close-view video shots, as for example encountered in the context of the Pan-Tilt-Zoom (PTZ) camera autotracking problem. In contrast to many earlier related works, we consider that the matching metric of a point should be adapted to the signal observed in its spatial neighborhood, and introduce a cost-benefit framework to control this adaptation with respect to the global target displacement estimation objective. Hence, the proposed framework explicitly handles the trade-off between the point-level matching metric complexity and the contribution brought by this metric to solving the target tracking problem. As a consequence, and in contrast with the common assumption that only specific points of interest should be investigated, our framework does not make any a priori assumption about the points that should be considered or ignored by the tracking process. Instead, it states that any point might help in the target displacement estimation, provided that the matching metric is well adapted. Measuring the contribution reliability of a point as the probability that it leads to a correct matching decision, we are able to define a global successful target matching criterion. It is then possible to minimize the probability of incorrect matching over the set of possible (point, metric) combinations and to find the optimal aggregation strategy. Our preliminary results demonstrate both the effectiveness and the efficiency of our approach.
Paper 130: Perspective Multiscale Detection of Vehicles for Real-time Forward Collision Avoidance Systems
This paper presents a single-camera vehicle detection technique for forward collision warning systems, suitable for integration in embedded platforms. It combines the robustness of detectors based on classification methods with an innovative perspective multi-scale procedure to scan the images that dramatically reduces the computational cost associated with robust detectors. In our experiments we compare different classifier implementations in search of a trade-off between the real-time constraint of embedded platforms and the high detection rates required by safety applications.
Paper 131: Robust Multi-Camera People Tracking Using Maximum Likelihood Estimation
This paper presents a new method to track multiple persons reliably using a network of smart cameras. The task of tracking multiple persons is very challenging due to the targets' non-rigid nature, occlusions and environmental changes. Our proposed method estimates the positions of persons in each smart camera using maximum likelihood estimation, and all estimates are merged in a fusion center to generate the final estimates. The performance of our proposed method is evaluated on indoor video sequences in which persons are often occluded by other persons and/or furniture. The results show that our method performs well, with a total average tracking error as low as 10.2 cm. We also compared the performance of our system to a state-of-the-art tracking system and found that our method outperforms it in terms of both total average tracking error and total number of object losses.
Paper 135: Efficient Low Complexity SVC Video Transrater with Spatial Scalability
In this paper we propose a new H.264 SVC transrating architecture for spatially scalable SVC compressed video streams. The algorithm has low complexity; it applies to spatially scalable pre-encoded video streams and allows fine bit rate granularity while keeping the highest spatial resolution. Simulation results demonstrate that transcoded bit streams produce satisfactory picture quality even at bit rate reductions of up to 66%. The comparison with MGS compressed video streams shows that the proposed transrating algorithm offers satisfactory performance compared to MGS when the bit rate reduction remains limited. Moreover, quality scalability is obtained thanks to our algorithm even if the SVC compressed video bitstream has not been processed using MGS scalability right from the start.
Paper 136: A Perception-Based Interpretation of the Kernel-based Object Tracking
This paper investigates the advantages of using simple rules of human perception in object tracking. Specifically, human visual perception (HVP) will be used in the definition of both target features and the similarity metric to be used for detecting the target in subsequent frames. Luminance and contrast will play a crucial role in the definition of target features, whereas recent advances in the relations between some classical concepts of information theory and the way the human eye codes image information will be used in the definition of the similarity metric. The use of HVP rules in a well-known object tracking algorithm allows us to increase its efficacy in following the target and to considerably reduce the computational cost of the whole tracking process. Some tests also show the stability and robustness of the perception-based object tracking algorithm in the presence of other moving elements or target occlusion for a few subsequent frames.
Paper 141: A Novel Graph Based Clustering Technique for Hybrid Segmentation of Multi-Spectral Remotely Sensed Images
This paper proposes a novel unsupervised graph-based clustering method for the purpose of hybrid segmentation of multi-spectral satellite images. In a hybrid image segmentation framework, the source image is initially (over)segmented while preserving the fine image details. A region merging strategy then has to be adopted for further refinement. Here, a mean-shift (MS) based technique has been considered for initially segmenting the source image, as it performs edge-preserving smoothing beforehand and hence eliminates noise. The objects found after this step are merged together in a low-level image feature space using the proposed graph-based clustering algorithm. A graph topology combining k-nearest-neighbor (KNN) and minimum spanning tree has been considered, on which the proposed iterative algorithm has been applied to eliminate the edges which span different clusters. It results in a set of connected components where each component represents a separate cluster. Comparison with two other hybrid segmentation techniques establishes the comparable accuracy of the proposed framework.
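The idea of removing graph edges that span different clusters and reading clusters off the connected components can be illustrated with the classical minimum-spanning-tree clustering below. Note that this is a swapped-in simplification, not the paper's algorithm: it uses a complete graph instead of the KNN-based topology, and a single fixed length threshold (`cut`, a hypothetical parameter) instead of the proposed iterative edge elimination.

```python
def mst_clusters(features, cut):
    """Cluster feature vectors by cutting MST edges longer than `cut`.

    Returns one integer label per feature vector, numbered in order of
    first appearance. Equivalent to single-linkage clustering.
    """
    n = len(features)

    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(features[i], features[j])) ** 0.5

    # Complete graph for brevity (assumption); a KNN graph would be sparser.
    edges = sorted((dist(i, j), i, j) for i in range(n) for j in range(i + 1, n))
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    # Kruskal's algorithm stopped at the cut length: the edges accepted here
    # are minimum-spanning-tree edges no longer than `cut`.
    for w, i, j in edges:
        if w > cut:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            root[ri] = rj

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

In a hybrid segmentation setting, each feature vector would describe one region of the initial over-segmentation, and regions sharing a label would be merged.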
Paper 142: Flexible Multi-modal Graph-Based Segmentation
This paper aims at improving the well-known local variance segmentation method by adding extra signal modi and specific processing steps. As a key contribution, we extend the uni-modal segmentation method to perform multi-modal analysis, such that any number of available signal modi can be incorporated in a very flexible way. We have found that the use of a combined weight of luminance and depth values improves the segmentation score by 6.8% for a large and challenging multi-modal dataset. Furthermore, we have developed an improved uni-modal texture-segmentation algorithm. This improvement relies on a careful choice of the color space and additional pre- and post-processing steps, by which we have increased the segmentation score on a challenging texture dataset by 2.1%. This gain is mainly preserved when using a different dataset with worse lighting conditions and different scene types.
Paper 145: Object Recognition and Modeling using SIFT Features
In this paper we present a technique for object recognition and modelling based on the matching of local image features. Given a complete set of views of an object, the goal of our technique is to recognize the same object in an image of a cluttered environment containing the object and to estimate its pose. The method is based on visual modeling of objects from a multi-view representation of the object to be recognized. The first step consists of creating an object model by selecting a subset of the available views, using SIFT descriptors to evaluate image similarity and relevance. The selected views are then taken as the model of the object, and we show that they can effectively be used to visually represent the main aspects of the object. Recognition is performed by comparing the image containing an object in a generic position with the views selected as object models. Once an object has been recognized, the pose can be estimated by searching the complete set of views of the object. Experimental results are very encouraging, using both a private dataset we acquired in our lab and a publicly available dataset.
Paper 147: Extended GrabCut for 3D and RGB-D Point Clouds
GrabCut is a renowned algorithm for image segmentation. It iteratively exploits the combinatorial minimization of an energy function, as introduced in graph-cut methods, to achieve background/foreground classification with less user interaction. In this paper, we propose to extend GrabCut to carry out segmentation on RGB-D point clouds, based on both appearance and geometrical criteria. It is shown that a hybrid GrabCut method combining RGB and D information is more efficient than GrabCut based only on RGB or D images.
Paper 148: VTApi: an Efficient Framework for Computer Vision Data Management and Analytics
VTApi is an open source application programming interface designed to fulfill the needs of specific distributed computer vision data and metadata management and analytic systems and to unify and accelerate their development. It is oriented towards processing and efficient management of image and video data and related metadata for their retrieval, analysis and mining, with special emphasis on their spatio-temporal nature in real-world conditions. VTApi is a free extensible framework based on progressive and scalable open source software such as OpenCV for high-performance computer vision and data mining and PostgreSQL for efficient data management, indexing and retrieval, extended by similarity search and integrated with geography/spatio-temporal data manipulation.
Paper 149: Incremental Principal Component Analysis-based Sparse Representation for Face Pose Classification
This paper proposes an Adaptive Sparse Representation pose Classification (ASRC) algorithm to deal with face pose estimation in cases of occlusion, bad illumination and low resolution. The proposed approach classifies different poses, the appearance of face images from the same pose being modelled by an online eigenspace which is built via Incremental Principal Component Analysis. Then the combination of the eigenspaces of all pose classes is used as an over-complete dictionary for sparse representation and classification. However, a large number of training images may lead to an extremely large dictionary, which will slow down the classification procedure. To avoid this situation, we devise a conditional update method that updates the training eigenspace only with the misclassified face images. Experimental results show that the proposed method is very robust when the illumination conditions change very dynamically and image resolutions are quite poor.
Paper 150: Recognizing Conversational Interaction based on 3D Human Pose
In this paper, we examine whether 3D pose features can be used to both learn and recognize different conversational interactions. For example, can we distinguish from these cues whether two people are most likely discussing work or a holiday experience? We believe this to be the first work devoted to this subject and show that this task is indeed possible with a promising degree of accuracy using features derived from poses extracted from a Kinect sensor. Both generative and discriminative methods are investigated, and a supervised learning approach is employed to classify the testing sequences into seven different conversational scenarios.
Paper 154: An Enhanced Weighted Median Filter for Noise Reduction in SAR Interferograms
In this paper, we describe a new filtering method based on the weighted median filter and the Lopez and Fabregas noise reduction algorithm operating in the wavelet domain. It is developed for the reduction of the impulse phase noise in synthetic aperture radar interferograms (InSAR). Our contribution to the classic weighted median filter consists of using the InSAR coherence map to generate the weights. While the developed approach prioritizes the high-coherence areas to compute the median filter outputs, the computation of the weights depends on the coherence values within the window used. The developed algorithm is then tested on a simulated data set as well as a set of Radarsat-2 raw data acquired over the region of Mahdia in Tunisia. The results are validated by computing the unwrapped phase of the filtered interferogram using the SNAPHU algorithm.
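The core operation, a median in which high-coherence samples count more, can be sketched as follows for the samples of one filter window. The sketch makes two assumptions that go beyond the abstract: the coherence values are used directly as weights (the paper derives its weights from the coherence within the window in a way not detailed here), and phase wrapping is ignored. The function name `weighted_median` is hypothetical.

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]  # only reached through floating-point rounding
```

With phase samples `[0.1, 0.2, 3.0, 3.1, 3.2]` and coherences `[0.9, 0.9, 0.1, 0.1, 0.1]`, the plain median would pick a low-coherence sample (3.0), whereas the weighted median stays with the two reliable samples and picks 0.2.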
Paper 156: Spherical Center-Surround for Video Saliency Detection using Sparse Sampling
This paper presents a technique for the detection of eminent (salient) regions in an image sequence. The method is inspired by biological studies on human visual attention systems and is grounded in the well-known center-surround theory, which hypothesizes that an item (center) is dissimilar to its surroundings. A spherical representation is proposed to estimate the amount of salience. It enables the method to integrate the computation of temporal and spatial contrast features. Efficient computation of the proposed representation is made possible by sparsely sampling the surround, which results in an efficient spatiotemporal comparison. The method is evaluated against recent benchmark methods and is shown to outperform all of them.
Paper 157: Efficient Detection and Tracking of Road Signs based on Vehicle Motion and Stereo Vision
Road signs provide drivers with important information about the road and traffic for safe driving. These signs include not only common traffic signs but also information about unexpected obstacles and road construction. Accurate detection and identification of road signs is one of the research topics in the vehicle vision area. In this paper we propose a stereo vision technique to automatically detect and track road signs in a video sequence acquired from a stereo vision camera mounted on a vehicle. First, color information is used to initially detect road sign candidates. Second, a Support Vector Machine (SVM) is used to select true signs from the candidates. Once a road sign is detected in a video frame, it is tracked from the next frame until it disappears. The 2-D position of the detected sign in the next frame is predicted from the motion of the vehicle. Here, the vehicle motion means the 3-D Euclidean motion acquired by using a stereo matching method. Finally, the predicted 2-D position of the sign is corrected by matching a scaled sign template in the regions near the predicted position. Experimental results show that the proposed method can detect and track road signs successfully. Error comparisons with two different detection and tracking methods are shown.
Paper 158: A Key-Pose Similarity Algorithm for Motion Data Retrieval
Analysis of human motion data is an important task in many research fields such as sports, medicine, security, and computer animation. In order to fully exploit motion databases for further processing, effective and efficient retrieval methods are needed. However, such a task is difficult, primarily due to the complex spatio-temporal variances of individual human motions and the rapidly increasing volume of motion data. In this paper, we propose a universal content-based subsequence retrieval algorithm for indexing and searching motion data. The algorithm is able to examine database motions and locate all their sub-motions that are similar to a query motion example. We illustrate the usability of the algorithm by indexing motion features in the form of joint-angle rotations extracted from a real-life 68-minute human motion database. We analyse the time complexity of the algorithm and evaluate retrieval effectiveness by comparing the search results against user-defined ground truth. The algorithm is also incorporated in an online web application that facilitates query definition and visualization of search results.
Paper 162: Semantic Approach in Image Change Detection
Change detection is a major issue in various domains, especially in remote sensing. Indeed, a plethora of geospatial images is available and can be used to update geographical databases. In this paper, we propose a classification-based method to detect changes between a database and a more recent image. It is based on both an efficient selection of training points and a hierarchical decision process. This makes it possible to take into account the intrinsic heterogeneity of the objects and themes composing a database while limiting false detection rates. The reliability of the designed framework is first assessed on simulated data, and the framework is then successfully applied to very high resolution satellite images and two land-cover databases.
Paper 166: Optimizing Contextual-based Optimum-Forest Classification through Swarm Intelligence
Several works have sought to improve classification performance. However, a considerable number of them do not consider contextual information in the learning process, which may help the classification step by providing additional information about the relation between a sample and its neighbourhood. Recently, a hybrid approach combining the Optimum-Path Forest classifier and Markov Random Fields (OPF-MRF) was proposed to provide contextual information for this classifier. However, the contextual information was restricted to a spatial/temporal-dependent parameter, which was chosen empirically in that work. We propose here an improvement of OPF-MRF by modelling the problem of finding this parameter as a swarm-based optimization task, which is carried out by Particle Swarm Optimization and Harmony Search. Experiments were conducted on the classification of Magnetic Resonance Images of the brain, and the proposed approach appeared to find results close to those obtained by an exhaustive search for this parameter, but much faster.
Paper 168: Moving Object Detection System in Aerial Video Surveillance
Moving object detection in aerial video, in which the camera is moving, is a complicated task. In this paper, we present a system to solve this problem using the scale-invariant feature transform (SIFT) and a Kalman filter. Moving objects are detected by a feature point tracking method based on SIFT extraction and matching. In order to increase the precision of detection, pre-processing steps such as video stabilization and Canny edge detection are added to the surveillance system. Experimental results indicate that the suggested method achieves a high detection ratio.
Paper 169: Visual Data Encryption for Privacy Enhancement in Surveillance Systems
In this paper a methodology for employing reversible visual encryption of data is proposed. The developed algorithms focus on privacy enhancement in distributed surveillance architectures. First, the motivation for the study and a short review of existing methods of privacy enhancement are presented. The algorithmic background and the system architecture, along with a solution for the anonymization of sensitive regions of interest, are then described. The efficiency of the developed encryption approach is analysed with respect to the visual stream resolution and the number of protected objects. Experimental procedures related to stream processing on a single core, a single node and multiple nodes of a supercomputer platform are also provided. The obtained results are presented and discussed. Moreover, possible future improvements of the methodology are suggested.
Paper 171: The Divide and Segment Method for Parallel Image Segmentation
Remote sensing images with large spatial dimensions are common. In addition, they include a diversity of spectral channels, which increases the volume of information. To obtain valuable information from remote sensing data, computers need larger amounts of memory and more efficient processing techniques. The first process in image analysis is segmentation, which identifies regions in images. Therefore, segmentation algorithms must deal with large amounts of data. Even with current computational power, certain image sizes may exceed memory limits, which calls for different solutions. One way to overcome such limits is to employ the well-known divide and conquer strategy, splitting the image into chunks and segmenting each one individually. However, this raises the problem of merging neighboring chunks while keeping the regions that cross chunk borders homogeneous. In this work, we propose an alternative way of dividing the image into chunks, by defining noncrisp borders between them. The noncrisp borders are computed with Dijkstra's algorithm, which is employed to find the shortest path between detected edges in the images. By applying our method, we avoid the postprocessing of neighboring regions and therefore speed up the final segmentation.
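The idea of a border that follows image content rather than a straight cut can be sketched as a shortest-path search. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a hypothetical non-negative cost grid that is low where an edge was detected, and uses Dijkstra's algorithm to find the cheapest 4-connected path from the top row to the bottom row. The function name and the cost convention are illustrative only.

```python
import heapq

def shortest_border(cost):
    """Return (total_cost, path) of the cheapest top-to-bottom path.

    Assumption: cost[r][c] >= 0 and is low where an image edge was detected.
    Any cell of the top row may start the path; it ends at the first
    bottom-row cell that Dijkstra's algorithm settles.
    """
    rows, cols = len(cost), len(cost[0])
    # Best known distance and predecessor for every cell reached so far.
    best = {(0, c): cost[0][c] for c in range(cols)}
    prev = {node: None for node in best}
    heap = [(d, node) for node, d in best.items()]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > best[node]:
            continue  # stale heap entry, a cheaper route was found later
        r, c = node
        if r == rows - 1:
            # Walk the predecessor chain back to the top row.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return d, path[::-1]
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < best.get(nxt, float("inf")):
                    best[nxt] = nd
                    prev[nxt] = node
                    heapq.heappush(heap, (nd, nxt))
    raise ValueError("no path found")
```

A path found this way could serve as the dividing line between two chunks, so that the cut runs along detected edges instead of through homogeneous regions.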
Paper 173: Unsupervised Segmentation for Transmission Imaging of Carbon Black
During the last few years, the development of nanomaterials has increased in many fields of science (biology, materials, medicine…) as a means of controlling physical-chemical properties. Among these materials, carbon black is the oldest one and is widely used as a reinforcement filler in rubber products. Nevertheless, the interaction between nanoparticles and the polymer matrix is poorly understood. In other words, the characteristics of carbon black aggregates are usually obtained by poorly official indirect analyses. This article presents an image processing chain that allows subsequent characterization of the carbon black aggregates. A database of several hundred carbon black images has been collected using transmission electron microscopy. A significant selection of images has been manually annotated by experts to provide ground truth. Using supervised evaluation criteria, a comparative study is performed with state-of-the-art carbon black segmentation algorithms, highlighting the good performance of the proposed algorithm.
Paper 176: Small Target Detection Improvement in Hyperspectral Image
Target detection is an important issue in the hyperspectral image (HSI) processing field. However, current spectral-identification-based target detection algorithms are sensitive to noise, and most denoising algorithms cannot preserve small targets. It is therefore necessary to design a robust detection algorithm that can preserve small targets. This paper utilizes the recently proposed multidimensional wavelet packet transform with multiway Wiener filter (MWPT-MWF) to improve the target detection efficiency for HSI with small targets in noisy environments. The performance of our method is exemplified using simulated and real-world HSI.
Paper 177: Fast Road Network Extraction from Remotely Sensed Images
This paper addresses the problem of fast, unsupervised road network extraction from remotely sensed images. We develop an approach that employs a fixed-grid, localized Radon transform to extract a redundant set of line segment candidates. The road network structure is then extracted by introducing interactions between neighbouring segments in addition to a data-fit term, based on the Bhattacharyya distance. The final configuration is obtained using simulated annealing via a Markov chain Monte Carlo iterative procedure. The experiments demonstrate a fast and accurate road network extraction on high resolution optical images of semi-urbanized zones, which is further supported by comparisons with several benchmark techniques.
Paper 178: Computational Methods for Selective Acquisition of Depth Measurements: an Experimental Evaluation
Acquisition of depth and texture with vision sensors finds numerous applications in object modeling, man-machine interfaces, and robot navigation. One challenge resulting from richly textured 3D datasets resides in the acquisition, management and processing of the large amount of data generated, which often prevents autonomous systems from making full use of the available information to make educated decisions. Most subsampling solutions that reduce the size of a dataset remain independent of the content of the model and therefore do not optimize the balance between the richness of the measurements and their compression. This paper experimentally evaluates the performance achieved with two computational methods that selectively drive the acquisition of depth measurements over regions of a scene characterized by a higher density of 3D features, while capitalizing on the knowledge readily available in previously acquired data. Both techniques automatically establish which subsets of measurements contribute most to the representation of the scene, and prioritize their acquisition. The algorithms are validated on datasets acquired from two different RGB-D sensors.
Paper 179: Tracking of a Handheld Ultrasonic Sensor for Corrosion Control on Pipe Segment Surfaces
The article describes a combination of optical 3-D measurement and ultrasonic testing for area-wide acquisition of wall thickness on piping segments. Its purpose is to automatically match ultrasonic measurements to their 3-D positions on the examined pipe segment. The sensor, which is normally occluded by the hand, is equipped with a cap carrying a number of LEDs. A model-based approach is presented to track these LEDs and to visualise the measured wall thickness on a three-dimensional surface. To this end, the model of the cap is fitted to the segmented image data, so that the 3-D position of the ultrasonic sensor can be derived from the computed model orientation. Our approach works robustly and can form the basis for further applications in real industrial environments.
Paper 180: High Precision Restoration Method for Non-Uniformly Warped Images
This paper proposes a high-accuracy image restoration technique that recovers a high-quality image of a static scene from a video sequence degraded by atmospheric turbulence. The approach contains two major steps. In the first step, we employ a coarse-to-fine optical flow estimation technique to register all the frames of the video to a reference frame and determine the shift maps. In the second step, we use an iterative First Register Then Average And Subtract (iFRTAAS) method to correct the geometric distortions of the reference frame. We present a performance comparison between our proposed method and an existing statistical method in terms of restoration accuracy. Simulation experiments show that our proposed method provides higher accuracy with a substantial gain in processing time.
Paper 181: Automatic User-specific Avatar Parametrisation and Emotion Mapping
Automatic user performance-driven avatar animation is still a challenging task in computer vision due to unknown facial feature shifting during shown expressions. In this paper an automatic approach for user-specific 3D model generation and on-line expression classification based on a-priori knowledge as well as image and video analysis methods is presented. User-specific avatar generation from low-cost acquisition devices is implemented via intelligent combination of facial proportions knowledge and image analysis algorithms. Subsequently the user-specific avatar model builds up the basis for correct on-line expression classification with the help of off-line calculated facial feature shifting values. Consequently with this approach partial occlusions of a presented emotion do not hamper expression identification based on the symmetrical structure of human faces. Thus, a markerless, automatic and easy to use performance-driven avatar animation approach is presented.
Paper 182: Noise Robustness Analysis of Point Cloud Descriptors
In this paper, we investigate the effect of noise on 3D point cloud descriptors. Various types of point cloud descriptors have been introduced in recent years thanks to advances in computing power, which make processing point cloud data more feasible. Most of these descriptors describe the orientation difference between pairs of 3D points in the object and represent these differences in a histogram. Earlier studies dealt with the performance of different point cloud descriptors; however, no study has discussed the effect of noise on the descriptors' performance. This paper presents a performance comparison of nine different local and global descriptors under 10 levels of Gaussian and impulse noise added to the point cloud data. The study showed that 3D descriptors are more sensitive to Gaussian noise than to impulse noise. Descriptors based on surface normals are sensitive to Gaussian noise but robust to impulse noise, whereas descriptors based on the accumulation of points in a spherical grid are more robust to Gaussian noise but sensitive to impulse noise. Among the global descriptors, the Viewpoint Feature Histogram (VFH) descriptor gives a good compromise between accuracy, stability and computational complexity under both Gaussian and impulse noise. The SHOT (Signature of Histograms of OrienTations) descriptor is the best among the local descriptors and performs well under both Gaussian and impulse noise.
Paper 183: Painting Scene Recognition Using Homogenous Shapes
This paper addresses the problem of semantic analysis of paintings by automatic detection of the represented scene type. The solution is an initial effort to fill the gap, already noted in the literature, between low-level computational analysis and the high-level, semantics-dependent human analysis of paintings. Inspired by the way humans perceive art, we first decompose the image into homogeneous regions, followed by a region merging step, in order to obtain a description of the painting by extracting perceptual features of the dominant objects within the scene. These features are used in a classification process that discriminates among 5 possible scene types on a database of 500 paintings.
Paper 184: Upper-Body Pose Estimation Using Geodesic Distances and Skin-Color
We propose a real-time capable method for human pose estimation from depth and color images that does not need any pre-trained pose classifiers. The pose estimation focuses on the upper body, as it is the relevant part for a subsequent gesture and posture recognition and therefore the basis for a real human-machine-interaction. Using a graph-based representation of the 3D point cloud, we compute geodesic distances between body parts. The geodesic distances are independent of pose and allow the robust determination of anatomical landmarks which serve as input to a skeleton fitting process using inverse kinematics. In case of degenerated graphs, landmarks are tracked locally with a meanshift algorithm based on skin color probability.
Paper 185: A New Approach for Hand Augmentation based on Patch Modelling
In this paper, a novel patch-based modelling approach is proposed to augment 3D models over the non-planar structure of the hand. We present a robust real-time framework which first segments the hand using skin color information. To offer subjects a flexible interface (i.e., no restriction to a covered arm), the segmented skin region is refined with a distance descriptor, which yields the palm of the hand. Next, we measure the geometrical components of the hand's structure and compute the hand's curvature to detect the fingertips. The palm centroid and the detected fingertips contribute to the patch-based modelling, for which an optimal path is derived for every detected fingertip. Further, we compute the patches using the path points for every two neighboring detected fingers, and the camera poses are then estimated for these patches. These different camera poses are finally combined to generate a single camera pose, and 3D models are then augmented on the non-planar hand geometry. The experimental results show that our proposed approach is capable of detecting the hand, palm centroid and fingertips, as well as integrating the generated patches for augmented reality applications in real situations, which demonstrates its applicability and usability in the domain of Human Computer Interaction.
Paper 189: Modelling Line and Edge Features Using Higher-Order Riesz Transforms
The 2D complex Riesz transform is an extension of the Hilbert transform to images. It can be used to model local image structure as a superposition of sinusoids, and to construct 2D steerable wavelets. In this paper we propose to model local image structure as the superposition of a 2D steerable wavelet at multiple amplitudes and orientations. These parameters are estimated by applying recent developments in super resolution theory. Using 2D steerable wavelets corresponding to line or edge segments then allows for the underlying structure of image features such as junctions and edges to be determined.
Paper 190: Restoration of Blurred Binary Images Using Discrete Tomography
Enhancement of degraded images of binary shapes is an important task in many image processing applications, e.g. to provide appropriate image quality for optical character recognition. Although many image restoration methods can be found in the literature, most of them are developed for grayscale images. In this paper we propose a novel binary image restoration algorithm. As a first step, it restores the projections of the shape using 1-dimensional deconvolution, then reconstructs the image from these projections using a discrete tomography technique. The method does not require any parameter setting or prior knowledge like an estimation of the signal-to-noise ratio. Numerical experiments on a synthetic dataset show that the proposed algorithm is robust to the level of the noise. The efficiency of the method has also been demonstrated on real out-of-focus alphanumeric images.
Paper 191: Adaptive Two Phase Sparse Representation Classifier for Face Recognition
The Sparse Representation Classifier (SRC) has proved to be a powerful classifier that is increasingly used by the computer vision and signal processing communities. On the other hand, it is computationally very expensive since it is based on an L1 minimization. Thus, it is not useful for scenarios demanding a rapid decision or classification. For this reason, researchers have addressed other coding schemes that can make the whole classifier very efficient without sacrificing the accuracy of the originally proposed SRC. Recently, two-phase coding schemes based on classic Regularized Least Squares were proposed. These two-phase strategies can use different schemes for selecting the examples that should be handed over to the next coding phase. However, all of them use a fixed and predefined number of selected examples, making the performance of the final classifier very dependent on this ad hoc choice. This paper introduces three strategies for adaptive size selection associated with the Two Phase Test Sample Sparse Representation classifier. Experiments conducted on three face datasets show that the introduced scheme can outperform the classic two-phase strategies. Although the experiments were conducted on face datasets, the proposed schemes can be useful for a broad spectrum of pattern recognition problems.
Paper 194: Real-time Depth Map Based People Counting
People counting is an important task in video surveillance applications. It can provide statistical information for shopping centers and other public buildings, or knowledge of the current number of people in a building in case of an emergency. This paper describes a real-time people counting system based on a vertically mounted Kinect depth sensor. The processing pipeline of the system includes depth map improvement, a novel approach to head segmentation, and continuous tracking of head segments. The head segmentation is based on an adaptation of the region-growing segmentation approach with thresholding. The tracking of segments combines minimum-weight bipartite graph matching with prediction of object movement to compensate for segmentation inaccuracy. Results of an evaluation carried out on datasets from a shopping center (more than 23 hours of recordings) show that the system can handle almost all real-world situations with high accuracy.
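The matching step can be illustrated with a small sketch. It is not the system's implementation: it assumes that each track supplies a predicted 2-D head position, that the edge weight is the Euclidean distance to a detection, and it replaces a proper assignment solver (such as the Hungarian algorithm) with a brute-force search that is only practical for a handful of heads. The function name is hypothetical.

```python
from itertools import permutations
from math import dist

def match_heads(predicted, detected):
    """Assign every predicted track position to a distinct detection.

    Assumptions: len(predicted) <= len(detected), positions are (x, y)
    tuples, and the cost of a pairing is the Euclidean distance.
    Returns (pairs, total_cost) with pairs as (track_index, detection_index).
    """
    best_cost, best_assign = float("inf"), ()
    # Try every way of giving each track its own detection.
    for perm in permutations(range(len(detected)), len(predicted)):
        total = sum(dist(predicted[i], detected[j]) for i, j in enumerate(perm))
        if total < best_cost:
            best_cost, best_assign = total, perm
    return list(enumerate(best_assign)), best_cost

# Two tracks, three detections; the far-away detection stays unmatched.
pairs, cost = match_heads([(0, 0), (10, 0)], [(9, 1), (1, 0), (50, 50)])
```

In this example the first track is paired with detection 1 and the second with detection 0; the unmatched detection could start a new track.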
Paper 195: An indoor RGB-D Dataset for Evaluation of Robot Navigation Algorithms
The paper presents an RGB-D dataset for the development and evaluation of mobile robot navigation systems. The dataset was registered using a WiFiBot robot equipped with a Kinect sensor. Unlike the presently available datasets, the environment was specifically designed for registration with the Kinect sensor. Moreover, it was ensured that the registered data is synchronized with the ground truth position of the robot. The presented dataset will be made publicly available for research purposes.
Paper 197: GPU-accelerated Human Motion Tracking using Particle Filter combined with PSO
This paper discusses how to combine a particle filter (PF) with particle swarm optimization (PSO) to achieve better tracking. Owing to multi-swarm based mode seeking, the algorithm is capable of maintaining multimodal probability distributions, and its tracking accuracy is far better than that of PF or PSO alone. We propose a parallel resampling scheme for particle filtering on the GPU. We show the efficiency of the parallel PF-PSO algorithm on 3D model based human motion tracking. The 3D model is rasterized in parallel, with a single thread processing one column of the image. This level of parallelism allows us to utilize the GPU resources efficiently and to track the full human body at 15 frames per second. The GPU achieves an average speedup of 7.5 over the CPU. For a marker-less motion capture system consisting of four calibrated and synchronized cameras, the computations were conducted on four CPU cores and four GTX GPUs on two cards.
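For readers unfamiliar with the resampling step of a particle filter, the following is a minimal sequential sketch of systematic resampling, one common scheme. It is an assumption made for illustration: the abstract does not say which resampling rule the parallel GPU scheme is built on, and this code makes no attempt at parallelism.

```python
import random
from itertools import accumulate

def systematic_resample(weights, u=None):
    """Return the indices of the particles kept by systematic resampling.

    weights: non-negative particle weights, not all zero.
    u: offset in [0, 1); drawn at random when omitted (fixed in tests).
    """
    n = len(weights)
    total = sum(weights)
    # Cumulative distribution of the normalised weights.
    cdf = list(accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against rounding so the scan below always stops
    if u is None:
        u = random.random()
    indices, j = [], 0
    for i in range(n):
        pos = (i + u) / n  # n evenly spaced probes sharing one random offset
        while cdf[j] < pos:
            j += 1
        indices.append(j)
    return indices
```

Particles with large weights are selected several times and particles with negligible weights are dropped, which keeps the particle set focused on likely poses.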
Paper 198: A Modification of Diffusion Distance for Clustering and Image Segmentation
Measuring distances is an important problem in many image segmentation algorithms. The distance should tell whether two image points belong to a single image segment or to two different ones. The simplest approach is to use the Euclidean distance. However, measuring distances along the image manifold seems to take better account of the facts that are important for segmentation. The geodesic distance, i.e. the shortest path in the corresponding graph, or the k shortest paths, can be regarded as the simplest way of measuring distances along the manifold. At first glance, one would expect the resistance and diffusion distances to have even better properties, since all the paths along the manifold are taken into account. Surprisingly, this is often not true. We show that a high number of paths is not beneficial for measuring distances in image segmentation. On the basis of an analysis of the problems of the diffusion distance, we introduce a modification in which, in essence, the number of paths is restricted to a chosen number. We demonstrate the positive properties of this new metric.
Paper 199: Tree Symbols Detection for Green Space Estimation
Geodetic base maps are very detailed sources of information. However, such maps are created for specialists and are incomprehensible to non-professionals. An example of information that can be useful to citizens is the change in urban green spaces. Such spaces, valuable to a local community, can be destroyed by developers or a local government. Therefore, monitoring green areas is an important task that can be carried out on the basis of maps from Geodetic Documentation Centres. Unfortunately, the most popular form of digital documentation is a bitmap.
This work presents a feasibility study of green area estimation from scanned maps. The solution is based on symbol detection. Two kinds of symbols (coniferous and deciduous trees) are recognised by the following algorithm. Dots at the centres of symbols are detected and their neighbourhoods are extracted. Specific features are calculated as input for neural networks that detect tree symbols. The accuracy of the detection is 90 percent, which is good enough to estimate green areas.
Paper 200: Magnitude Type Preserving Similarity Measure for Complex Wavelet Based Image Registration
Most of the similarity measures currently used for image registration aim to model the relation between the intensities of corresponding pixels. This is true even for some similarity measures that are not defined directly on the intensities of the images to be registered, but on a transformed version of those images. A potential problem with this approach is that it relies on intensity values, whose pattern of variation is in most cases too difficult to predict. A way to circumvent this problem is to define a similarity measure on the domain of the magnitudes of complex wavelet coefficients, as the magnitudes are less affected by noise than the intensities. This robustness to noise makes it possible to predict a certain behavior of the corresponding magnitudes, namely that they will preserve their type. This means that large (small) magnitudes in the complex wavelet transform of one image will correspond to large (small) magnitudes in the complex wavelet transform of the other image. Starting from this constancy in the behavior of complex wavelet magnitudes, we propose a new similarity measure that has sub-pixel accuracy, is robust to noise and is faster than the most closely related known similarity measure.
Paper 201: Semantic Concept Detection Using Dense Codeword Motion
When detecting semantic concepts in video, much of the existing research in content-based classification uses keyframe information only. In particular, the combination of local features such as SIFT with the Bag of Words model is very popular with TRECVID participants. The few existing motion and spatiotemporal descriptors are computationally heavy and become impractical when applied to large datasets such as TRECVID. In this paper, we propose a way to efficiently combine positional motion obtained from optic flow in the keyframe with information given by the Dense SIFT Bag of Words feature. The features we propose work by spatially binning motion vectors belonging to the same codeword into separate histograms describing movement direction (left, right, vertical, zero, etc.). Classifiers are mapped using the homogeneous kernel map technique for approximating the chi-squared kernel and then trained efficiently using a linear SVM. By using a simple linear fusion technique we can improve the Mean Average Precision of the Bag of Words DSIFT classifier on the TRECVID 2010 Semantic Indexing benchmark from 0.0924 to 0.0972, which is confirmed to be a statistically significant increase based on standardized TRECVID randomization tests.
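A simplified sketch of the per-codeword direction histogram is given below. It is illustrative only: the direction labels follow the examples in the abstract, but the threshold `eps`, the tie-breaking rules and the omission of the spatial binning grid are assumptions, and the function names are hypothetical.

```python
from collections import Counter

def direction(dx, dy, eps=0.5):
    """Quantise a motion vector into one coarse direction label.

    eps is an assumed threshold (in pixels) below which motion counts as zero.
    """
    if abs(dx) < eps and abs(dy) < eps:
        return "zero"
    if abs(dy) > abs(dx):
        return "vertical"
    return "right" if dx > 0 else "left"

def codeword_motion_histogram(samples):
    """Count motion directions separately for each visual codeword.

    samples: iterable of (codeword_id, dx, dy), one per dense SIFT location,
    where (dx, dy) is the optic-flow vector at that location.
    """
    hist = Counter()
    for codeword, dx, dy in samples:
        hist[(codeword, direction(dx, dy))] += 1
    return hist

hist = codeword_motion_histogram([
    (3, 2.0, 0.1), (3, 1.5, -0.2), (3, 0.0, 0.0),
    (7, -4.0, 1.0), (7, 0.3, 5.0),
])
```

Here codeword 3 is counted twice as moving right and once as static, while codeword 7 has one left and one vertical vote. A full feature would keep one such histogram per spatial cell.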
Paper 203: Hierarchical Layered Mean Shift Methods
Segmentation is one of many image processing tasks. We focus on the mean-shift segmentation method. Our goal is to improve its speed and reduce the over-segmentation problem that occurs with small spatial bandwidths. We propose a new mean-shift method called Hierarchical Layered Mean Shift. It uses a hierarchical preprocessing stage and stacks the hierarchical segmentation outputs together to minimise the over-segmentation problem.
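As background, the basic mean-shift iteration that such methods build on can be sketched in one dimension. This is plain mode seeking with a flat kernel, given as an assumption-laden illustration; it does not implement the hierarchical or layered stages proposed in the paper, and the parameter names are hypothetical.

```python
def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=100):
    """Move `start` to a nearby density mode of 1-D `points`.

    A flat kernel is assumed: every point within `bandwidth` of the current
    estimate contributes equally to the next estimate.
    """
    x = start
    for _ in range(max_iter):
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:
            return x  # nothing nearby, the estimate cannot move
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            return new_x  # converged
        x = new_x
    return x

samples = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
low_mode = mean_shift_mode(samples, start=0.5, bandwidth=1.0)
high_mode = mean_shift_mode(samples, start=4.0, bandwidth=1.0)
```

With these samples the two starting points converge to the two clusters, near 1.0 and 5.0. A smaller bandwidth separates more modes, which is the source of the over-segmentation mentioned above.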
Paper 206: Hidden Markov Models for Modeling Occurrence Order of Facial Temporal Dynamics
The analysis of facial expression temporal dynamics is of great importance for many real-world applications. Furthermore, due to the variability among individuals and different contexts, the dynamic relationships among facial features are stochastic. Systematically capturing such temporal dependencies among facial features and incorporating them into the facial expression recognition process is especially important for the interpretation and understanding of facial behaviors. The base system in this paper uses Hidden Markov Models (HMMs) and a new set of features derived from geometrical distances between detected and automatically tracked facial points. We propose here to transform the numerical representation, which takes the form of multiple time series, into a symbolic representation in order to reduce dimensionality, extract the most pertinent information and provide a representation that is meaningful to humans. Experiments show that new and interesting results have been obtained with the proposed approach.
Paper 207: Performance Evaluation of Video Analytics for Surveillance on-board Trains
Real-time video-surveillance systems are nowadays widespread in several applications, including public transportation. In those applications, automatic video content analytics (VCA) is increasingly being adopted to support human operators in control rooms. However, VCA is only effective when its performance is good enough to reduce the number of false positive alarms below acceptability thresholds while still detecting events of interest. In this paper, we report the results of the evaluation of a VCA system installed on a rail transit vehicle. Compared with fixed installations, on-board ones feature specific constraints on camera installation, obstacles, environment, etc. Several VCA performance evaluation metrics have been considered, both frame-based and object-based, computed by a tool developed in Matlab. We compared the results obtained using a commercial VCA system with those produced by an open-source one, showing the higher performance of the former in all test conditions.
Paper 208: Human Motion Capture using Data Fusion of Multiple Skeleton Data
The joint advent of affordable color and depth sensors and super-real-time skeleton detection has produced a surge of research on human motion capture. These technologies provide an important key to communication between man and machine. However, they were designed for willing, closed-loop interaction, which tolerates approximations and mandates a particular sensor setup. In this paper, we present a multiple-sensor approach designed to improve the robustness and precision of human joint positioning, based on delayed logic and filtering of the skeletons detected by each sensor.
Paper 209: Person Detection with a Computation Time Weighted AdaBoost and Heterogeneous Pool of Features
In this paper, a boosted cascade person detection framework with a heterogeneous pool of features is presented. The boosted cascade construction and feature selection are carried out using a modified AdaBoost that takes the computation time of features into consideration. The final detector achieves a low Miss Rate of 0.06 at 0.001 False Positives Per Window on the INRIA public dataset while achieving an average speed-up of 1.8x over the classical variant.
Paper 210: Globally Segmentation using Active Contours and Belief Functions
We study active contour (AC) based global segmentation of vector-valued images, incorporating both statistical and evidential knowledge. The proposed method combines Belief Functions (BFs) and probabilistic functions in the same framework. In this formulation, all features derived from the vector-valued image are integrated into inside/outside descriptors that drive the AC-based segmentation process. The imprecision caused by weak contrast and noise between the inside and outside descriptors derived from the multiple channels is controlled by the BFs, which act as weighting parameters. We demonstrate the performance of our segmentation algorithm on some challenging color biomedical images.
Paper 215: Minimum Memory Vectorisation of Wavelet Lifting
With the discrete wavelet transform now in widespread use, the need for an efficient implementation is becoming increasingly important. This work presents a novel approach to the discrete wavelet transform through a new computational scheme for wavelet lifting. The presented approach is compared with two others. The results are obtained on a general purpose processor with a 4-fold SIMD instruction set (such as Intel x86-64 processors). Using the frequently exploited CDF 9/7 wavelet, the achieved speedup is about 3 times compared to a naive implementation.
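For orientation, a plain scalar version of one level of CDF 9/7 lifting is sketched below. It is a reference-style illustration, not the vectorised minimum-memory scheme of the paper. The lifting constants are the commonly cited values for this wavelet; the clamped boundary handling, the scaling convention and the even-length input are assumptions of this sketch.

```python
# Commonly cited CDF 9/7 lifting constants (predict, update, predict, update, scale).
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.05298011854, 0.8829110762, 0.4435068522
ZETA = 1.149604398

def _predict(s, d, k):
    # Odd samples are corrected from their even neighbours (clamped at the end).
    n = len(s)
    return [d[i] + k * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]

def _update(s, d, k):
    # Even samples are corrected from their odd neighbours (clamped at the start).
    return [s[i] + k * (d[max(i - 1, 0)] + d[i]) for i in range(len(s))]

def cdf97_forward(x):
    """One transform level on an even-length signal; returns (approx, detail)."""
    s, d = list(x[0::2]), list(x[1::2])
    d = _predict(s, d, ALPHA)
    s = _update(s, d, BETA)
    d = _predict(s, d, GAMMA)
    s = _update(s, d, DELTA)
    return [v * ZETA for v in s], [v / ZETA for v in d]

def cdf97_inverse(s, d):
    """Undo cdf97_forward by running the lifting steps backwards."""
    s = [v / ZETA for v in s]
    d = [v * ZETA for v in d]
    s = _update(s, d, -DELTA)
    d = _predict(s, d, -GAMMA)
    s = _update(s, d, -BETA)
    d = _predict(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d  # interleave even and odd samples again
    return x

approx, detail = cdf97_forward([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
restored = cdf97_inverse(approx, detail)
```

Because every lifting step is undone by the same step with the opposite sign, `restored` matches the input up to floating-point rounding. Each step depends on the output of the previous one, which is what makes an efficient SIMD formulation non-trivial.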