Acivs 2016

Advanced Concepts for Intelligent Vision Systems

Oct. 24-27, 2016

Patria Palace Hotel, Lecce, Italy

LNCS

Acivs 2016 Abstracts

Regular papers

Paper 105: Gradients versus Grey Values for Sparse Image Reconstruction and Inpainting-Based Compression

Author(s): Markus Schneider, Pascal Peter, Sebastian Hoffmann, Joachim Weickert, Enric Meinhardt-Llopis

Interpolation methods that rely on partial differential equations can reconstruct images with high quality from a few prescribed pixels. A whole class of compression codecs exploits this concept to store images in terms of a sparse grey value representation. Recently, Brinkmann et al. (2015) have suggested an alternative approach: They propose to store gradient data instead of grey values. However, this idea has not been evaluated and its potential remains unknown. In our paper, we compare gradient and grey value data for homogeneous diffusion inpainting w.r.t. two different aspects: First, we evaluate the reconstruction quality, given a comparable amount of data of both kinds. Second, we assess how well these sparse representations can be stored in compression applications. To this end, we establish a framework for optimising and encoding the known data. It allows a fair comparison of both the grey value and the gradient approach. Our evaluation shows that gradient-based reconstructions avoid visually distracting singularities involved in the reconstructions from grey values, thus improving the visual fidelity. Surprisingly, this advantage does not carry over to compression due to the high sensitivity to quantisation.
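
The homogeneous diffusion inpainting referred to above can be sketched as a Laplace solver that keeps the prescribed pixels fixed. The following is a minimal NumPy sketch, not the authors' codec; the Jacobi iteration, mean initialisation and reflecting boundaries are illustrative choices:

```python
import numpy as np

def diffusion_inpaint(known, mask, n_iter=2000):
    """Reconstruct an image from sparse pixels by homogeneous diffusion.

    known : 2D array holding grey values at the prescribed pixels
    mask  : boolean array, True where the value is prescribed
    Solves the Laplace equation on the unknown pixels (Jacobi iterations,
    replicated boundaries) while keeping the known pixels fixed.
    """
    u = np.where(mask, known, known[mask].mean())  # init unknowns with mean
    for _ in range(n_iter):
        # average of the four neighbours (edge values replicated)
        p = np.pad(u, 1, mode='edge')
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        u = np.where(mask, known, avg)
    return u
```

For example, fixing the left column of a small image to 0 and the right column to 1 makes the interior converge to a linear ramp, the expected harmonic solution.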

Paper 107: Global Bilateral Symmetry Detection Using Multiscale Mirror Histograms

Author(s): Mohamed Elsayed Elawady, Cecile Barat, Christophe Ducottet, Philippe Colantoni

In recent years, there has been renewed interest in bilateral symmetry detection in images. The task consists in detecting the main bilateral symmetry axis inside artificial or natural images. State-of-the-art methods combine feature point detection, pairwise comparison and voting in a Hough-like space. In spite of their good performance, they fail to give reliable results on challenging real-world and artistic images. In this paper, we propose a novel symmetry detection method using multi-scale edge features combined with local orientation histograms. An experimental evaluation is conducted on public datasets plus a new aesthetic-oriented dataset. The results show that our approach outperforms competing methods.

Paper 112: Neural Network Boundary Detection for 3D Vessel Segmentation

Author(s): Robert Palmer, Xianghua Xie

Conventionally, hand-crafted features are used to train machine learning algorithms; however, choosing useful features is not a trivial task, as they are very much data-dependent. Given raw image intensities as inputs, supervised neural networks (NNs) essentially learn useful features by adjusting the weights of their nodes using the back-propagation algorithm. In this paper we investigate the performance of NN architectures for the purpose of boundary detection, before integrating a chosen architecture in a data-driven deformable modelling framework for full segmentation. Boundary detection performed well, with boundary sensitivity of > 88% and specificity of > 85% for highly obscured and diffused lymphatic vessel walls. In addition, the vast majority of all boundary-classified pixels were in the immediate vicinity of the ground truth boundary. When integrated into a 3D deformable modelling framework, it produced an area overlap with the ground truth of > 98%, and both point-to-mesh and Hausdorff distance errors were lower than those of other approaches. In summary, NNs have been shown to be suitable for boundary detection in deformable modelling, where object boundaries are obscured, diffused and low in contrast.

Paper 115: A Simple Human Activity Recognition Technique using DCT

Author(s): Aziz Khelalef, Fakhreddine Ababsa, Nabil Benoudjit

In this paper, we present a simple new human activity recognition method using the discrete cosine transform (DCT). The scheme uses the DCT coefficients extracted from silhouettes as descriptors (features) and performs frame-by-frame recognition, which makes it simple and suitable for real-time applications. We carried out several tests using a radial basis function (RBF) neural network for classification; a comparative study against state-of-the-art methods shows that our technique is faster, simpler and more accurate than discrete-transform-based techniques and other methods proposed in the literature.
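
The DCT-coefficient descriptor described above can be illustrated with a minimal NumPy sketch. The orthonormal DCT-II and the number of retained low-frequency coefficients are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]          # frequency index
    m = np.arange(n)[None, :]          # sample index
    c = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0] *= 1 / np.sqrt(2)
    return c * np.sqrt(2 / n)

def dct_features(silhouette, keep=8):
    """2D DCT of a (binary) silhouette image; return the keep x keep
    low-frequency coefficients, flattened, as the frame descriptor."""
    h, w = silhouette.shape
    coeffs = dct_matrix(h) @ silhouette @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()
```

Keeping only the top-left coefficient block concentrates the silhouette's energy into a short, frame-rate-friendly feature vector, which is what makes per-frame classification cheap.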

Paper 116: Hand Gesture Recognition using Infrared Imagery Provided by Leap Motion Controller

Author(s): Tomás Mantecón, Carlos R. del-Blanco, Fernando Jaureguizar, Narciso García

Hand gestures are one of the main alternatives for Human-Computer Interaction. For this reason, a hand gesture recognition system using near-infrared imagery acquired by a Leap Motion sensor is proposed. The recognition system directly characterizes the hand gesture by computing a global image descriptor, called Depth Spatiograms of Quantized Patterns, without any hand segmentation stage. To deal with the high dimensionality of the image descriptor, a Compressive Sensing framework is applied, obtaining a manageable image feature vector that almost preserves the original information. Finally, the resulting reduced image descriptors are analyzed by a set of Support Vector Machines to identify the performed gesture independently of the precise hand location in the image. Promising results have been achieved using a new hand-based near-infrared database.

Paper 119: Horizon line detection from fisheye images using color local image region descriptors and Bhattacharyya coefficient-based distance

Author(s): Youssef El merabet, Yassine Ruichek, Saman Ghaffarian, Zineb Samir, Tarik Boujiha, Raja Touahni, Rochdi Messoussi

Several solutions have emerged in recent years to compensate for the lack of performance of GNSS (Global Navigation Satellite Systems) when operating in constrained environments (dense urban areas). Characterizing the environment of reception of GNSS signals using a fisheye camera oriented to the sky is one of these relevant solutions. The idea consists in determining LOS (Line-Of-Sight) satellites and NLOS (Non-Line-Of-Sight) satellites by classifying the content of acquired images into two regions (sky and not-sky). In this paper, aiming to make this approach more effective, we propose a region-based image classification technique based on a Bhattacharyya coefficient-based distance and local image region descriptors. The proposed procedure is composed of four major steps: (i) a simplification step that consists in simplifying the acquired image with an appropriate couple of colorimetric invariant and exponential transform; (ii) the second step consists in segmenting the simplified image into different regions of interest using the Statistical Region Merging segmentation method; (iii) in the third step, the segmented regions are characterized with a number of local color image region descriptors; (iv) the fourth step applies the supervised MSRC (Maximal Similarity Based Region Classification) method, using the Bhattacharyya coefficient-based distance, to classify the characterized regions into sky and non-sky regions. Experimental results prove the robustness and performance of the proposed procedure with the proposed group of color local image region descriptors.
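
The Bhattacharyya coefficient-based distance used in step (iv) has a standard closed form for two normalised histograms. A minimal sketch; the Hellinger-style form sqrt(1 - BC) is one common variant and an assumption here:

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Distance between two histograms, derived from the Bhattacharyya
    coefficient BC = sum(sqrt(p * q)); 0 for identical distributions,
    1 for distributions with disjoint support."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    p = p / p.sum(); q = q / q.sum()          # normalise to probabilities
    bc = np.sum(np.sqrt(p * q))
    return np.sqrt(max(0.0, 1.0 - bc))        # clamp guards rounding error
```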

Paper 120: Joint Segmentation Of Myocardium On Multi State Spect Images

Author(s): Marc Filippi, Michel Desvignes, Anastasia Bozok, Gilles Barone-Rochette, Daniel Fagret, Laurent Riou, Catherine Ghezzi

This paper presents a level set segmentation of the myocardium, endocardium and epicardium surfaces of the heart from 2D SPECT rest and stress perfusion images of the same patient, in order to compute a heterogeneity index. Cardiac SPECT images have low resolution, a low signal-to-noise ratio and a lack of anatomical information, so accurate segmentation is difficult. The proposed method adds joint constraints of shape, parallelism and intensity in a level-set framework to simultaneously extract the myocardium from rest and stress images. Results are compared to classical level-set segmentation.

Paper 121: Parallel Hough space image generation method for real time lane detection

Author(s): Hee-Soo Kim, Seung-Hae Beak, Soon-Yong Park

This paper proposes a new parallelization method to generate Hough space images for real-time lane detection, using the NVIDIA Jetson TK1 board. The computation cost of the Standard Hough Transform is relatively high due to its large number of unnecessary operations. Therefore, in this paper, we introduce an enhanced Hough image generation method that reduces computation time for real-time lane detection by eliminating the unnecessary operations present in the standard method. We implemented our proposed method on both CPU and GPU based platforms and compared the processing speeds with the standard method. The experimental results indicate that the proposed method runs 10 times faster than the existing method on the CPU platform, and 60 times faster on the GPU platform.
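
The Standard Hough Transform that the paper accelerates can be sketched as follows. This is a plain NumPy reference implementation of the baseline, not the proposed parallelization; the angular resolution of one degree is an illustrative choice:

```python
import numpy as np

def hough_lines(edges, n_theta=180, rho_res=1.0):
    """Standard Hough transform: each edge pixel votes for all (rho, theta)
    lines passing through it. edges is a boolean edge map; returns the
    accumulator together with the rho and theta axes."""
    h, w = edges.shape
    thetas = np.deg2rad(np.arange(n_theta))                # 0..179 degrees
    diag = int(np.ceil(np.hypot(h, w) / rho_res))          # max |rho| in bins
    rhos = np.arange(-diag, diag + 1) * rho_res
    acc = np.zeros((len(rhos), n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for t, theta in enumerate(thetas):                     # vectorised per angle
        r = np.round((xs * np.cos(theta) + ys * np.sin(theta)) / rho_res)
        np.add.at(acc[:, t], r.astype(int) + diag, 1)      # accumulate votes
    return acc, rhos, thetas
```

A vertical line x = c produces a sharp peak at theta = 0, rho = c, which is the property lane detectors exploit.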

Paper 122: A Novel Decentralised System Architecture for Multi-Camera Target Tracking

Author(s): Gaetano Di Caterina, Trushali Doshi, John Soraghan, Lykourgos Petropoulakis

Target tracking in a multi-camera system is an active and challenging research area that in many systems requires video synchronisation and knowledge of the camera set-up and layout. In this paper a highly flexible, modular and decentralised system architecture is presented for multi-camera target tracking with relaxed synchronisation constraints among camera views. Moreover, the system does not rely on positional information to handle camera hand-off events. As a practical application, the system itself can, at any time, automatically select the best target view available, to implicitly resolve occlusions. Further, to validate the proposed architecture, an extension of the colour-based IMS-SWAD tracker to a multi-camera environment is used. The experimental results show that the tracker can successfully track a chosen target in multiple views, in both indoor and outdoor environments, with non-overlapping and overlapping camera views.

Paper 123: Intramolecular FRET efficiency measures for time-lapse fluorescence microscopy images

Author(s): Mark Holden

Here we investigate quantitative measures of Foerster resonance energy transfer (FRET) efficiency that can be used to quantify protein-protein interactions using fluorescence microscopy images of living cells. We adopt a joint intensity space approach and develop a parametric shot noise model to estimate the fractional uncertainty of FRET efficiency on a per pixel basis. We evaluate our metrics rigorously by simulating photon emission events corresponding to typical conditions and demonstrate advantages of our metrics over the conventional ratiometric one. In particular, our measure is linear, normalised and has greater tolerance to low SNR characteristic of FRET fluorescence microscopy images.

Paper 126: Predicting Image Aesthetics with Deep Learning

Author(s): Simone Bianco, Luigi Celona, Paolo Napoletano, Raimondo Schettini

In this paper we investigate the use of a deep Convolutional Neural Network (CNN) to predict image aesthetics. To this end we fine-tune a canonical CNN architecture, originally trained to classify objects and scenes, by casting the image aesthetic prediction as a regression problem. We also investigate whether image aesthetic is a global or local attribute, and the role played by bottom-up and top-down salient regions in the prediction of the global image aesthetic. Experimental results on the canonical Aesthetic Visual Analysis (AVA) dataset show the robustness of the proposed solution, which outperforms the best solution in the state of the art by almost 17% in terms of Mean Residual Sum of Squares Error (MRSSE).

Paper 127: Automatic Image Splicing Detection Based On Noise Density Analysis In Raw Images

Author(s): Thibault Julliand, Vincent Nozick, Hugues Talbot

Image splicing is a common manipulation which consists in copying part of an image into a second image. In this paper, we exploit the variation in noise characteristics in spliced images, caused by the difference in camera and lighting conditions during image acquisition. The proposed method automatically gives a probability of alteration for any area of the image, using a local analysis of noise density. We consider both Gaussian and Poisson noise components to model the noise in the image. The efficiency and robustness of our method are demonstrated on a large set of images generated with an automated splicing procedure.

Paper 129: Breast Shape Parametrization through Planar Projections

Author(s): Giovanni Gallo, Dario Allegra, Yaser Gholizade Atani, Filippo Milotta, Filippo Stanco, Giuseppe Catanuto

In recent years, 3D scanning has replaced the low-tech approach of acquiring direct anthropometric measurements. These new methodologies provide a detailed digital model of the body and allow analysis of more complex information such as volume, shape, curvature, and so on. The possibility to acquire the shape of soft tissues, such as the female human breast, has attracted the interest of breast surgery specialists. The main aim of this work is to propose an innovative strategy to automatically analyze 3D breast shapes in order to describe them within a quantitative, well-defined framework. In particular we propose a scanning procedure for a proper acquisition of breast surfaces using the handheld scanner Structure Sensor, as well as a framework to process 3D digital data to extract the shape information. The proposed method consists of two main parts: firstly, the acquired digital 3D surfaces are projected into a 2D space and a set of 17 geometrical landmarks is extracted; then, by exploiting Thin Plate Splines and Principal Component Analysis, the original data are summarised and the breast shape is described by a small set of numerical parameters.

Paper 130: Decreasing Time Consumption of Microscopy Image Segmentation through Parallel Processing on the GPU

Author(s): Joris Roels, Jonas De Vylder, Yvan Saeys, Bart Goossens, Wilfried Philips

The computational performance of graphical processing units (GPUs) has improved significantly. Speedup factors of more than 50x compared to single-threaded CPU execution are not uncommon due to parallel processing. This makes their use for high-throughput microscopy image analysis very appealing. Unfortunately, GPU programming is not straightforward and requires considerable programming skill and effort. Additionally, the attainable speedup factor is hard to predict, since it depends on the type of algorithm, the input data and the way in which the algorithm is implemented. In this paper, we identify the characteristic algorithm- and data-dependent properties that significantly relate to the achievable GPU speedup. We find that the overall GPU speedup depends on three major factors: 1) the coarse-grained parallelism of the algorithm, 2) the size of the data and 3) the computation/memory transfer ratio. This is illustrated on two types of well-known segmentation methods that are extensively used in microscopy image analysis: SLIC superpixels and high-level geometric active contours. In particular, we find that the geometric active contour segmentation algorithm we used is very suitable for parallel processing, resulting in acceleration factors of 50x for 0.1 megapixel images and 100x for 10 megapixel images.

Paper 132: Coral reef fish detection and recognition in underwater videos by supervised machine learning: Comparison between Deep Learning and HOG+SVM methods

Author(s): Sebastien Villon, Marc Chaumont, Gérard Subsol, David Mouillot, Sébastien Villéger, Thomas Claverie

In this paper, we present two supervised image processing methods to detect and recognize coral reef fishes in underwater HD videos. The first method relies on a traditional two-step approach: the extraction of HOG features, followed by an SVM classifier. The second method is based on Deep Learning. We compare the results of the two methods on real data and discuss their strengths and weaknesses.
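
The first pipeline (HOG features into an SVM) can be illustrated with a stripped-down orientation-histogram descriptor. This sketch omits HOG's block normalisation; the cell size and bin count are illustrative assumptions, and the resulting vector would then be fed to an SVM classifier:

```python
import numpy as np

def hog_like_descriptor(img, cell=8, n_bins=9):
    """Minimal HOG-style descriptor: per-cell histogram of unsigned gradient
    orientations, weighted by gradient magnitude (no block normalisation)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):             # tile the image in cells
        for j in range(0, w - cell + 1, cell):
            b = bins[i:i+cell, j:j+cell].ravel()
            m = mag[i:i+cell, j:j+cell].ravel()
            feats.append(np.bincount(b, weights=m, minlength=n_bins))
    return np.concatenate(feats)
```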

Paper 133: A Real-time Eye Gesture Recognition System Based on Fuzzy Inference System for Mobile Devices Monitoring

Author(s): Hanene Elleuch, Ali Wali, Anis Samet, Adel M. Alimi

In this paper, we propose a new system for mobile human-computer interaction based on eye gestures. This system aims to control and command mobile devices through the eyes, in order to provide an intuitive communication with these devices and a flexible usage in all contexts in which a user may be situated. The system operates on real-time streaming video captured by the front-facing camera, without needing any additional equipment. The algorithm first detects the user's face and then their eyes. The eye gesture recognition is based on a fuzzy inference system. We deployed this algorithm on an Android-based tablet and asked 8 volunteers to test it. The obtained results are promising and competitive.

Paper 134: Spatially Varying Weighting Function-based Global and Local Statistical Active Contours. Application to X-ray Images

Author(s): Aicha Baya Goumeidane, Nafaa Nacereddine

Image segmentation is a crucial task in the image processing field. This paper presents a new region-based active contour which handles global as well as local information, both based on the pixel intensities. The trade-off between these two sources of information is achieved by a spatially varying function computed for each contour node location. Preliminary results of applying this method to computed tomography and X-ray images show outstanding and efficient object extraction.

Paper 135: Vegetation segmentation in cornfield images using bag of words

Author(s): Yerania Campos, Erik Rodner, Joachim Denzler, Humberto Sossa, Gonzalo Pajares

We provide an alternative methodology for vegetation segmentation in cornfield images. The process includes two main steps, which constitute the main contribution of this approach: a) a low-level segmentation and b) a class label assignment using a Bag of Words (BoW) representation in conjunction with a supervised learning framework. The experimental results show our proposal is adequate to extract green plants in images of maize fields. The classification accuracy is 95.3 percent, which is comparable to values in the current literature.

Paper 138: Fast Traffic Sign Recognition Using Color Segmentation and Deep Convolutional Networks

Author(s): Ali Youssef, Dario Albani, Daniele Nardi, Domenico Daniele Bloisi

The use of Computer Vision techniques for the automatic recognition of road signs is fundamental for the development of intelligent vehicles and advanced driver assistance systems. In this paper, we describe a procedure based on color segmentation, Histogram of Oriented Gradients (HOG), and Convolutional Neural Networks (CNN) for detecting and classifying road signs. Detection is speeded up by a pre-processing step to reduce the search space, while classification is carried out by using a Deep Learning technique. A quantitative evaluation of the proposed approach has been conducted on the well-known German Traffic Sign data set and on the novel Data set of Italian Traffic Signs (DITS), which is publicly available and contains challenging sequences captured in adverse weather conditions and in an urban scenario at night-time. Experimental results demonstrate the effectiveness of the proposed approach in terms of both classification accuracy and computational speed.

Paper 141: The Orlando Project: a 28 nm FDSOI Low Memory Embedded Neural Network ASIC

Author(s): Giuseppe Desoli, Valeria Tomaselli, Emanuele Plebani, Giulio Urlini, Danilo Pau, Viviana D'Alto, Tommaso Majo, Fabio De Ambroggi, Thomas Boesch, Surinder-pal Singh, Elio Guidetti, Nitin Chawla

The recent success of neural networks in various computer vision tasks opens the possibility to add visual intelligence to mobile and wearable devices; however, the stringent power requirements are unsuitable for networks run on embedded CPUs or GPUs. To address such challenges, STMicroelectronics developed the Orlando Project, a new low-power architecture for convolutional neural network acceleration suited for wearable devices. An important contribution to the energy usage is the storage and access of the neural network parameters. In this paper, we show that with adequate model compression schemes based on weight quantization and pruning, a whole AlexNet network can fit in the local memory of an embedded processor, thus avoiding additional system complexity and energy usage, with no or low impact on the accuracy of the network. Moreover, the compression methods work well across different tasks, e.g. image classification and object detection.

Paper 142: Factor Analysis of Dynamic Sequence with Spatial Prior for 2D Cardiac Spect Sequences Analysis

Author(s): Marc Filippi, Michel Desvignes, Eric Moisan, Catherine Ghezzi, Pascale Perret, Daniel Fagret

Unmixing is often a necessary step in analyzing 2D SPECT image sequences. However, factor analysis of dynamic sequences (FADS), the commonly used method for unmixing SPECT sequences, suffers from a non-uniqueness issue. Optimization-based methods were developed to overcome this issue. These methods are effective but need improvement when the mixing is significant or the SNR very low. In this paper, a new objective function using soft spatial prior knowledge is developed. Comparison with previous methods, efficiency and robustness to the choice of priors are illustrated with tests on a synthetic dataset. Results on 2D SPECT sequences with high levels of noise are also presented and compared.

Paper 145: Soccer Player Detection with Only Color Features Selected Using Informed Haar-like Features

Author(s): Ryusuke Miyamoto, Takuro Oki

Player detection, one of the practical applications of human detection, is important for tactical analysis, sports science, and video broadcasting. For human detection, filtered channel features show better accuracy than methods based on deep learning. Considering these results, we constructed a detector for soccer players with a good balance between accuracy and computational speed, using only color features to train a strong classifier. Experimental results on the PETS2003 dataset show that the proposed method achieves about a 1.28% miss rate at 0.1 FPPI, which is extremely good accuracy.

Paper 150: Person Re-identification in frontal gait sequences via Histogram of Optic flow Energy Image

Author(s): Athira Nambiar, Jacinto C. Nascimento, Alexandre Bernardino, Jose Santos-Victor

In this work, we propose a novel methodology for re-identifying people in frontal video sequences, based on a spatio-temporal representation of gait built on optic flow features, which we call Histogram Of Flow Energy Image (HOFEI). Optic flow based methods do not require silhouette computation, thus avoiding image segmentation issues and enabling online re-identification (Re-ID) tasks. Few works have addressed Re-ID with optic flow features in frontal gait. Here, we conduct an extensive study on the CASIA dataset, as well as an application in a realistic surveillance scenario: the HDA Person dataset. Results show, for the first time, the feasibility of gait re-identification in frontal sequences, without the need for image segmentation.

Paper 153: A Bayesian approach to linear unmixing in the presence of highly mixed spectra

Author(s): Bruno Figliuzzi, Santiago Velasco-Forero, Michel Bilodeau, Jesus Angulo

In this article, we present a Bayesian algorithm for endmember extraction and abundance estimation in situations where prior information is available for the abundances. The algorithm is considered within the framework of the linear mixing model. The novelty of this work lies in the introduction of bound parameters which allow us to introduce prior information on the abundances. The estimation of these bound parameters is performed using a simulated annealing algorithm. The algorithm is illustrated by simulations conducted on synthetic AVIRIS spectra and on the SAMSON dataset.

Paper 155: Key frames extraction based on local features for efficient video summarization

Author(s): Hana Gharbi, Mohamed Massaoudi, Sahbi Bahroun, Ezzeddine Zagrouba

Key frames are the most representative images of a video. They are used in different areas of video processing, such as indexing, retrieval and summarization. In this paper we propose a novel approach for key frame extraction based on local feature description, which is used to summarize the salient visual content of videos. First, we generate a set of candidate keyframes. Then we detect interest points in all these candidate frames. After that, we compute the repeatability between them and store the repeatability values in a matrix. Finally, we model the repeatability matrix as a directed graph, and the selection of keyframes is inspired by the A* shortest-path algorithm. Experiments on challenging videos show the efficiency of the proposed method: it is able to prevent redundancy among the extracted key frames while maintaining minimum requirements in terms of memory space.

Paper 158: A simple evaluation procedure for range camera measurement quality

Author(s): Boris Bogaerts, Rudi Penne, Seppe Sels, Bart Ribbens, Steve Vanlanduit

Range cameras suffer from both systematic and random errors. We present a procedure to evaluate both types of error separately in one test. To quantify the systematic errors, we use an industrial robot to provide a ground truth motion of the range sensor. We present an error metric that compares this ground truth motion with the motion calculated from the range data of the sensor. The only item present in the scene is a white plane that we move to different positions during the experiment. This plane is used to compute the range sensor motion for the purpose of systematic error measurement, as well as to quantify the random error of the range sensor. As opposed to other range camera evaluation experiments, this method does not require any extrinsic system calibration, high-quality ground truth test scene or complicated test objects. Finally, we performed the experiment for three common Time-of-Flight (TOF) cameras (Kinect One, Mesa SR4500 and IFM 03D303) and compared their performance.

Paper 159: Accordion Representation based Multi-scale Covariance Descriptor for Multi-shot Person Re-identification

Author(s): Bassem Hadjkacem, Walid Ayedi, Mohamed Abid

Multi-shot person re-identification is a major challenge because of the large variations in a human's appearance caused by different types of noise such as occlusion, viewpoint and illumination variations. In this paper, we present the accordion representation based multi-scale covariance descriptor, called AR-MSCOV, which first converts an image sequence containing a walking human into one image using the accordion representation. To better exploit the spatial and temporal correlation of the video sequence and to deal with the different types of noise, it applies a quadtree decomposition and extracts multi-scale appearance features such as color, gradient and Gabor features in a single pass. The AR-MSCOV descriptor merges the static regions and captures the moving regions of interest. Therefore, it implicitly encodes the described human gait as a behavioral biometric, together with the appearance features, through the accordion representation, to reliably identify any person in motion. We evaluated the AR-MSCOV descriptor on the PRID 2011 multi-shot dataset and demonstrated good performance in comparison with the current state-of-the-art.

Paper 165: Jensen Shannon divergence as reduced reference measure for image denoising

Author(s): Vittoria Bruni, Domenico Vitulano

This paper focuses on the use of the Jensen-Shannon divergence for guiding denoising. In particular, it aims at detecting those image regions where noise is masked; denoising is then inhibited where it is useless from the visual point of view. To this aim, a reduced-reference version of the Jensen-Shannon divergence is introduced and used for determining a denoising map. The latter separates those image pixels that need to be denoised from those that have to be left unaltered. Experimental results show that the proposed method improves the denoising performance of some simple and conventional denoisers, in terms of both peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). In addition, it can help reduce the computational effort of some high-performing denoisers, while preserving the visual quality of the denoised images.
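
The (full-reference) Jensen-Shannon divergence that the reduced-reference measure is built from has a standard definition for two histograms; a minimal sketch, where the smoothing constant `eps` is an implementation assumption:

```python
import numpy as np

def jensen_shannon(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two histograms: the average KL
    divergence of each distribution to their midpoint. Symmetric, always
    non-negative, and bounded by log(2)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p /= p.sum(); q /= q.sum()                 # normalise to probabilities
    m = 0.5 * (p + q)                          # midpoint distribution
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```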

Paper 167: Visual Localization using Sequence Matching Based on Multi-feature Combination

Author(s): Yongliang Qiao, Cindy Cappelle, Yassine Ruichek

Visual localization in changing environments is one of the most challenging topics in the computer vision and robotics community. The difficulty of this task is related to the strong appearance changes that occur in scenes due to the presence of dynamic objects, weather or season changes. In this paper, we propose a new method which operates by matching query image sequences to an image database acquired previously (video acquired while the vehicle was traveling the environment). In order to improve matching accuracy, a multi-feature representation is constructed by combining the global GIST descriptor and the local LBP descriptor to represent an image sequence. Then, similarity measurement according to the Chi-square distance is used for effective sequence matching. For experimental evaluation, we studied the relationship between image sequence length and sequence matching performance. To show its effectiveness, the proposed method is tested and evaluated in outdoor environments across four seasons. The results show improved precision-recall performance against the state-of-the-art SeqSLAM algorithm.
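
The Chi-square distance used here for comparing histogram-like descriptors such as GIST and LBP has a standard form; a minimal sketch, where the epsilon guard against empty bins is an implementation assumption:

```python
import numpy as np

def chi_square_distance(x, y, eps=1e-10):
    """Chi-square distance between two histogram-like descriptors:
    0.5 * sum((x - y)^2 / (x + y)). Zero iff the descriptors are equal."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))
```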

Paper 170: Towards Automated Drone Surveillance in Railways: State-of-the-Art and Future Directions

Author(s): Francesco Flammini, Riccardo Naddei, Concetta Pragliola, Giovanni Smarra

The usage of UAVs (Unmanned Aerial Vehicles), widely known as ‘drones’, is being increasingly investigated in a variety of surveillance scenarios. Being an emerging technology, several challenges still need to be tackled in order to make drones suitable for real applications with strict performance, dependability and privacy requirements. In particular, the monitoring of transit infrastructures represents one critical domain in which drones could be of huge help to reduce costs and possibly increase the granularity of surveillance. Furthermore, drones pave the way to the implementation of smart-sensing functionalities expanding current capabilities in railway monitoring, to support automation, safety of operations, prognostics and even forensic analyses. In this paper we provide a survey of current drone technology and its possible applications to automated railway surveillance, taking into account technical issues and environmental constraints. An ongoing experimentation with intelligent drone video is also addressed, highlighting current results and future perspectives.

Paper 176: Combining Stacked Denoising Autoencoders and Random Forests for Face Detection

Author(s): Jingjing Deng, Xianghua Xie, Michael Edwards

Detecting faces in the wild is a challenging problem due to large visual variations introduced by uncontrolled facial expressions, head pose, illumination and so on. Employing strong classifiers and designing more discriminative visual features are two main approaches to overcoming such difficulties. Notably, Deep Neural Network (DNN) based methods have been found to outperform most traditional detectors in a multitude of studies, employing deep network structures and complex training procedures. In this work, we propose a novel method that uses stacked denoising autoencoders (SdA) for feature extraction and random forests (RF) for object-background classification in a classical cascading framework. This architecture allows much simpler neural network structures, resulting in efficient training and detection. The proposed face detector was evaluated on two publicly available datasets and produced promising results.

Paper 177: Multimodal Registration of PET/MR Brain Images based on Adaptive Mutual Information

Author(s): Abir Baazaoui, Mouna Berrabah, Walid Barhoumi, Ezzeddine Zagrouba

Multimodal image registration remains a challenging task in medical image analysis, notably for PET/MR images, since their combination provides superior sensitivity and specificity, which improves the diagnosis quality. Mutual information (MI) is the most commonly used multimodal image registration measure. Inasmuch as the traditional MI, based on Shannon entropy, does not integrate spatial information such as edges and corners, an adaptation of MI is proposed in this work. The two main contributions are the incorporation of spatial information through the curvelet transform and the avoidance of the binning problem using a Gaussian probability density function. The objective behind this adaptation is to overcome the sensitivity to intensity permutations or pixel-to-pixel intensity transformations and to simultaneously handle positive and negative intensity correlations. Experiments performed on PET/MR image datasets demonstrated the effectiveness of the proposed method for PET/MR image registration and showed its superiority over state-of-the-art methods.
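As a point of reference for the adaptation above, the classical Shannon MI that the paper starts from can be sketched with a plain joint-histogram estimate (the paper's curvelet-based spatial term and Gaussian density smoothing are not reproduced here; the function name is illustrative):

```python
import math
from collections import Counter

def mutual_information(img_a, img_b, bins=8):
    """Classic histogram-based Shannon MI between two equally sized images.

    img_a, img_b: flat lists of intensities in [0, 255]."""
    assert len(img_a) == len(img_b)
    n = len(img_a)
    q = 256 // bins  # bin width
    joint = Counter((a // q, b // q) for a, b in zip(img_a, img_b))
    pa = Counter(a // q for a in img_a)
    pb = Counter(b // q for b in img_b)
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        mi += p_ij * math.log(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi
```

For identical images this reduces to the marginal entropy, and for an image paired with a constant one it is zero, which is the behaviour a registration criterion exploits.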

Paper 182: Aerial detection in maritime scenarios using convolutional neural networks

Author(s): Gonçalo Cruz, Alexandre Bernardino

This paper presents a method to detect boats in a maritime surveillance scenario using a small aircraft. The method relies on Convolutional Neural Networks (CNNs) to perform robust detections even in the presence of distractors like wave crests and sun glare. The CNNs are pre-trained on large-scale public datasets and then fine-tuned with domain-specific images acquired in the maritime surveillance scenario. We study two variations of the method, one being faster and the other more robust. The network's training procedure is described and the detection performance is evaluated on two different video sequences from UAV flights over the Atlantic Ocean. The results are reported as precision-recall curves and computation times, and are compared. We show experimentally that, as in many other domains of application, CNNs outperform non-deep-learning methods in maritime surveillance scenarios as well.

Paper 186: R3P: Real-time RGB-D Registration Pipeline

Author(s): Hani Javan Hemmat, Egor Bondarev, Peter de With

Applications based on colored 3-D data sequences suffer from a lack of efficient algorithms for transformation estimation and keypoint extraction to perform accurate registration and sensor localization, either in the 2-D or 3-D domain. Therefore, we propose a real-time RGB-D registration pipeline, named R3P, presented in processing layers. In this paper, we present an evaluation of several algorithm combinations for each layer, to optimize the registration and sensor localization for specific applications. The resulting dynamic reconfigurability of R3P makes it suitable as a front-end system for any SLAM reconstruction algorithm. Evaluation results on several public datasets reveal that R3P delivers real-time registration at 59 fps and high accuracy, with a relative pose error (for a time span of 40 frames) of 0.5 degree for rotation and 8 mm for translation. The heterogeneous datasets and implementations are publicly available under an open-source license.

Paper 187: Vector Quantization Enhancement for Computer Vision Tasks

Author(s): Remi Trichet, Noel O'Connor

This paper augments the Bag-of-Words scheme in several respects: we incorporate a category label into the clustering process, build classifier-tailored codebooks, and weight codewords according to their probability of occurrence. A size-adaptive feature clustering algorithm is also proposed as an alternative to k-means. Experiments on the PASCAL VOC 2007 challenge validate the approach for classical hard assignment as well as VLAD encoding.
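The hard-assignment baseline the paper builds on can be sketched as follows: each local descriptor votes for its nearest codeword, and the image is represented by the normalized count vector (a plain sketch; the paper's classifier-tailored codebooks and codeword weighting are not shown):

```python
def bow_histogram(descriptors, codebook):
    """Hard-assignment Bag-of-Words: each descriptor votes for its
    nearest codeword; the image becomes the normalized count vector."""
    def sq_dist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(d, codebook[k]))
        hist[nearest] += 1
    total = sum(hist) or 1  # avoid division by zero for empty images
    return [h / total for h in hist]
```

VLAD replaces the counts with accumulated residuals `d - codebook[nearest]`, which is why the two encodings share the same assignment step.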

Paper 188: Learning Approaches for Parking Lots Classification

Author(s): Daniele Di Mauro, Sebastiano Battiato, Giuseppe Patané, Marco Leotta, Daniele Maio, Giovanni Farinella

The paper addresses the problem of empty vs. non-empty parking lot classification from images acquired by public cameras, through a comparison between a classic supervised learning method and a semi-supervised one. Both approaches are based on the convolutional neural network paradigm. Experimental results point out that the supervised method outperforms the semi-supervised approach even when only a few samples are used for training.

Paper 191: Video event detection based on non-stationary Bayesian networks

Author(s): Christophe Gonzales, Rim Romdhane, Séverine Dubuisson

In this paper, we propose an approach for detecting events online in video sequences. It requires no prior knowledge, the events being defined as spatio-temporal breaks. For this purpose, we propose to combine non-stationary dynamic Bayesian networks (nsDBN), to model the scene, with a particle filter (PF), to track objects in the sequence. In this framework, an event corresponds to a significant difference between a new particle set provided by the PF and the sampled density encoded by the nsDBN. Whenever an event is detected, the particle set is exploited to learn a new nsDBN representing the scene. Unfortunately, nsDBNs are designed for discrete random variables while particles are instantiations of continuous ones. We therefore propose to discretize them using a new discretization method well suited for nsDBNs. Our approach has been tested on real video sequences and allowed us to detect two different events (forbidden stop and fight).

Paper 193: Optimized Connected Components Labeling with Pixel Prediction

Author(s): Costantino Grana, Lorenzo Baraldi, Federico Bolelli

In this paper we propose a new paradigm for connected components labeling, which employs a general approach to minimize the number of memory accesses, by exploiting the information provided by already seen pixels, removing the need to check them again. The scan phase of our proposed algorithm is ruled by a forest of decision trees connected into a single graph. Every tree derives from a reduction of the complete optimal decision tree. Experimental results demonstrated that on low density images our method is slightly faster than the fastest conventional labeling algorithms.
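For orientation, a conventional two-pass labeling algorithm with union-find, the kind of baseline the proposed prediction-based scan improves upon, might look like this (a plain sketch, not the authors' decision-tree-driven scan):

```python
def label_components(img):
    """Two-pass 4-connectivity connected components labeling with
    union-find; img is a list of rows of 0/1 values."""
    h, w = len(img), len(img[0])
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    labels = [[0] * w for _ in range(h)]
    next_label = 1
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up and left:
                labels[y][x] = left
                ra, rb = find(up), find(left)
                if ra != rb:           # record the equivalence
                    parent[rb] = ra
            elif up or left:
                labels[y][x] = up or left
            else:
                parent[next_label] = next_label
                labels[y][x] = next_label
                next_label += 1
    # second pass: resolve label equivalences
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```

The paper's contribution targets the first pass: the checks of `up` and `left` are exactly the memory accesses that pixel prediction avoids repeating.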

Paper 195: Hierarchical Fast Mean-Shift Segmentation in Depth Images

Author(s): Milan Šurkala, Radovan Fusek, Michael Holuša, Eduard Sojka

Head position and head pose detection systems have become very popular recently, especially with the rise of 3D cameras like Microsoft Kinect and Intel RealSense. The goal is to recognize and segment a head in depth data. The systems can also detect the direction in which the head is pointing, and we use these data to improve the gaze-direction detection system and provide useful information that allows detectors to work properly. We present a Hierarchical Fast Blurring Mean Shift algorithm that is able to extract these data in real time from 3D images produced by the above-mentioned cameras. We also present some modifications for effective reduction of the mean-shift dataset during the computation that allow us to increase the precision of the method. We use a hierarchical approach to reduce the dataset throughout the computation and to improve the speed.

Paper 196: Robust Color Watermarking Method Based On Clifford Transform

Author(s): Maroua Affes, Malek Sellami, Faouzi Ghorbel

In this paper, we propose a new watermarking scheme resistant to geometric attacks and JPEG compression. The method uses the Fourier Clifford Transform and Harris interest points. First, we detect all circular Harris interest regions. Then, using the Delaunay-tessellation-based triangle matching method, we define robust interest regions. Finally, the watermark is embedded into the Clifford coefficients of the robust interest regions.

Paper 197: Action-02MCF: A robust space-time Correlation Filter for Action Recognition in clutter and adverse lighting conditions

Author(s): Anwaar Ulhaq, Xiaoxia Yin, Yanchun Zhang, Iqbal Gondal

Human actions are spatio-temporal visual events, and recognizing human actions under varying conditions is still a challenging computer vision problem. In this paper, we introduce a robust feature-based space-time correlation filter, called Action-02MCF (0: 'zero-aliasing', 2M: 'Maximum Margin'), for recognizing human actions in video sequences. This filter combines (i) the sparsity of a spatio-temporal feature space, (ii) the generalization of the maximum margin criterion, (iii) the enhanced aliasing-free localization performance of correlation filtering, and (iv) the rich context of maximally stable space-time interest points into a single classifier. Its rich multi-objective function provides robustness, generalization and recognition as a single package. Action-02MCF can simultaneously localize and classify actions of interest even in clutter and adverse imaging conditions. We evaluate the performance of the proposed filter on challenging human action datasets. Experimental results verify the performance potential of our action filter compared to other correlation-filtering-based action recognition approaches.

Paper 198: An Image Quality Metric With Reference For Multiply Distorted Image

Author(s): Aladine Chetouani

In this paper, we propose a global framework to estimate the quality of multiply degraded images with reference (Full Reference approach). Our method is based on feature fusion using a Support Vector Regression (SVR) model. The selected features are quality indexes obtained by comparing the reference image and its degraded version. Some of these features are based on the Human Visual System (HVS), while others are based on structural information or mutual information. The proposed method has been evaluated on the LIVE Multiply Distorted Image Quality Database, composed of 450 degraded images. The obtained results are compared to 12 recent image quality metrics.

Paper 200: 3D Planar RGB-D SLAM System

Author(s): Hakim ElChaoui ElGhor, David Roussel, Fakhreddine Ababsa, El-Houssine Bouyakhf

Applications such as Simultaneous Localisation and Mapping (SLAM) can greatly benefit from RGB-D sensor data to produce 3D maps of the environment as well as sensor's trajectory estimation. However, the resulting 3D points map can be cumbersome, and since indoor environments are mainly composed of planar surfaces, the idea is to use planes as building blocks for a SLAM process. This paper describes an RGB-D SLAM system benefiting from planes segmentation to generate lightweight 3D plane-based maps. Our goal is to produce reduced 3D maps composed solely of planes sections that can be used on platforms with limited memory and computation resources. We present the introduction of planar regions in a regular RGB-D SLAM system and evaluate the benefits regarding both resulting map and estimated camera trajectory.

Paper 203: Towards a generic m-svm parameters estimation using overlapping swarm intelligence for handwritten characters recognition

Author(s): Marwa Amara, Kamel Zidi, Khaled Ghedira

The support vector machine (SVM) is a statistical classification approach which has been successfully applied to various types of pattern recognition problems, and has proved to be a good tool for multi-class machine learning issues. However, it has remained largely unexplored for Arabic recognition, and its performance depends strongly on the appropriate choice of parameters. Hence, a particle swarm optimization (PSO) technique is employed to tune the SVM parameters. The proposed SVM-PSO model is used to solve the Arabic character recognition problem. The selected models are compared in terms of testing time and accuracy.

This study employs support vector machines for recognition on the Isolated Farsi/Arabic Character Database (IFHCDB). Experimental results show that PSO can be a good alternative for predicting SVM parameters.
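A minimal, generic PSO loop of the kind used to tune SVM hyper-parameters can be sketched as follows; the objective here is a hypothetical stand-in for cross-validation error (with an assumed optimum at C=10, gamma=0.1), not the actual SVM training of the paper:

```python
import random

def pso_tune(objective, bounds, n_particles=12, iters=40, seed=1):
    """Minimal particle swarm optimization: returns (best position, best loss).
    bounds: list of (lo, hi) per dimension; objective maps position -> loss."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive + social terms (standard PSO update)
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# hypothetical validation-error surface standing in for SVM cross-validation
err = lambda p: (p[0] - 10) ** 2 + 100 * (p[1] - 0.1) ** 2
best, _ = pso_tune(err, [(0.1, 100.0), (0.001, 1.0)])
```

In a real pipeline `err` would train an SVM with parameters `(C, gamma)` on a training fold and return the validation error.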

Paper 204: Human Action Recognition Based on Temporal Pyramid of Key Poses Using RGB-D Sensors

Author(s): Enea Cippitelli, Ennio Gambi, Susanna Spinsante, Francisco Florez-Revuelta

Human action recognition is a hot research topic in computer vision, mainly due to the high number of related applications, such as surveillance, human-computer interaction, or assisted living. Low-cost RGB-D sensors have been extensively used in this field. They can provide skeleton joints, which represent a compact and effective representation of the human posture. This work proposes an algorithm for human action recognition where the features are computed from skeleton joints. A sequence of skeleton features is represented as a set of key poses, from which histograms are extracted. The temporal structure of the sequence is kept using a temporal pyramid of key poses. Finally, a multi-class SVM performs the classification task. The algorithm optimization through evolutionary computation makes it possible to reach results comparable to the state-of-the-art on the MSR Action3D dataset.
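The temporal pyramid of key poses can be illustrated with a small sketch: given a per-frame assignment to key poses, histograms are computed over progressively finer temporal segments and concatenated (segment counts, the power-of-two split and the normalization are assumptions, not the paper's exact settings):

```python
def temporal_pyramid(pose_ids, n_poses, levels=3):
    """Temporal pyramid of key poses: at level l the sequence is split
    into 2**l equal segments and a key-pose histogram is built per
    segment; the concatenation keeps coarse-to-fine temporal order."""
    feature = []
    n = len(pose_ids)
    for level in range(levels):
        segs = 2 ** level
        for s in range(segs):
            lo, hi = s * n // segs, (s + 1) * n // segs
            hist = [0.0] * n_poses
            for p in pose_ids[lo:hi]:
                hist[p] += 1
            total = sum(hist) or 1  # guard against empty segments
            feature.extend(h / total for h in hist)
    return feature
```

A plain histogram over the whole sequence would lose the order of poses; the pyramid keeps "pose 0 then pose 1" distinguishable from "pose 1 then pose 0", which is what the multi-class SVM then exploits.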

Paper 205: Multi-layer Dictionary Learning for Image Classification

Author(s): Stefen Chan Wai Tim, Michèle Rombaut, Denis Pellerin

This paper presents a multi-layer dictionary learning method for classification tasks. The goal of the proposed multi-layer framework is to use the supervised dictionary learning approach locally on raw images in order to learn local features. This method starts by building a sparse representation at the patch-level and relies on a hierarchy of learned dictionaries to output a global sparse representation for the whole image. It relies on a succession of sparse coding and pooling steps in order to find an efficient representation of the data for classification. This method has been tested on a classification task with good results.

Paper 206: Intelligent Vision System for ASD Diagnosis and Assessment

Author(s): Marco Leo, Marco Del Coco, Pierluigi Calcagni, Pier Luigi Mazzeo, Paolo Spagnolo, Cosimo Distante

ASD diagnosis and assessment make use of medical protocols validated by the scientific community, which is still reluctant to accept new protocols introducing invasive technologies, such as robots or wearable devices, whose influence on the therapy has not been deeply investigated. This work attempts to undertake the difficult challenge of embedding a technological level into the standardized ASD protocol known as the Autism Diagnostic Observation Schedule (ADOS-2). An intelligent video system is introduced to compute, in an objective and automatic way, the evaluation scores for some of the tasks involved in the protocol. It makes use of a hidden RGB-D device for scene acquisition, whose data feed a cascade of algorithmic steps by which people and objects are detected and temporally tracked; the extracted information is then exploited by fitting a spatial and temporal model described by means of an ontology approach. The ontology metadata are finally processed to find a mapping between them and the behavioral tasks described in the protocol.

Paper 208: Visual Target Detection and Tracking in UAV EO/IR Videos by Moving Background Subtraction

Author(s): Francesco Tufano, Cesario Vincenzo Angelino, Luca Cicala

In recent years, CIRA has designed many versions of on-board payload management software for Unmanned Aerial Vehicles, to be used in ISTAR (Intelligence, Surveillance, Target Acquisition and Reconnaissance) missions. A typically required function in these software suites is the detection and tracking of moving ground vehicles. In this work, we propose a detection and tracking approach for moving objects that is suitable when the background is static in the real world but appears to be affected by global motion in the image plane. Each object is described as a set of SURF points enhanced with a related appearance model. Experiments on real-world video sequences confirm the effectiveness of the proposed approach.

Paper 211: A Multiphase Level Set Method on Graphs for Hyperspectral Image Segmentation

Author(s): Kaouther Tabia, Xavier Desquesnes, Yves Lucas, Sylvie Treuillet

In this paper, we propose a new multiphase level set method for hyperspectral image segmentation. Our approach generalizes the partial differential equation based front-propagation concept, initially introduced for gray-level image segmentation, to images with a large number of wavelength bands. Experimental results demonstrate the effectiveness of our method.

Paper 213: A Mobile Application for Leaf Detection in Complex Background using Saliency Maps

Author(s): Lorenzo Putzu, Cecilia Di Ruberto, Gianni Fenu

Plants are fundamental for human beings, so it is very important to catalogue and preserve all plant species. Identifying an unknown plant species is not a simple task. Leaf analysis is one of the approaches used for plant species identification. This task can also be completed automatically by image processing techniques able to analyse leaf images and provide a classification based on prior information. Many methods have been proposed in the literature to complete the whole cataloguing task, providing excellent classification results. Nevertheless, many of the proposed methods work only on images acquired in controlled lighting conditions and with a uniform background. In this work we propose a mobile application for leaf analysis and the automatic identification of plant species. The application is mainly devoted to the identification and segmentation steps, resolving the main issues created by uncontrolled lighting conditions with very accurate results.

Paper 218: Content-based mammogram retrieval using mixed kernel PCA and curvelet transform

Author(s): Sami Dhahbi, Walid Barhoumi, Ezzeddine Zagrouba

Content-based image retrieval (CBIR) has recently emerged as a promising method to assist radiologists in diagnosing mammographic masses by displaying pathologically similar cases. In this paper, a CBIR system using the curvelet transform and kernel principal component analysis (KPCA) is proposed. Thanks to its improved direction and edge representation abilities, the curvelet transform first provides desirable mammographic features. Once the region of interest (ROI) is curvelet transformed, KPCA is applied and the first components are used as descriptors. Bearing in mind that neighbor points are the most important, but faraway points may still contain useful information for mammogram retrieval, we propose a new mixed kernel that overcomes the shortcoming of Gaussian kernels and emphasizes neighbor points without neglecting faraway ones. The proposed mixed kernel is a mixture of two Gaussian kernels with high and low sigma values. Experiments performed on a large dataset of mammograms showed the superiority of the proposed kernel over single Gaussian kernels.
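A sketch of such a mixed kernel, assuming an equal-weight mixture (the actual mixing weight and sigma values would be tuned, and the paper's exact formulation may differ):

```python
import math

def mixed_kernel(x, y, sigma_low=0.5, sigma_high=5.0, alpha=0.5):
    """Mixture of two Gaussian kernels: the low-sigma term emphasizes
    neighbor points, while the high-sigma term keeps faraway points
    from being neglected entirely (alpha balances the two)."""
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    k_low = math.exp(-d2 / (2 * sigma_low ** 2))
    k_high = math.exp(-d2 / (2 * sigma_high ** 2))
    return alpha * k_low + (1 - alpha) * k_high
```

For nearby points both terms are close to 1; for distant points the low-sigma Gaussian vanishes almost immediately while the high-sigma term still carries a usable similarity signal.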

Paper 219: Combination of RGB-D Features for Head and Upper Body Orientation Classification

Author(s): Laurent Fitte-Duval, Alhayat Ali Mekonnen, Frédéric Lerasle

In Human-Robot Interaction (HRI), the intention of a person to interact with another agent (robot or human) can be inferred from his/her head and upper body orientation. Furthermore, additional information on the person's overall intention and motion direction can be determined with the knowledge of both orientations. This work presents an exhaustive evaluation of various combinations of RGB and depth image features with different classifiers. These evaluations intend to highlight the best feature representation for the body part orientation to classify, i.e., the person's head or upper body. Our experiments demonstrate that high classification performance can be achieved by combining only three families of RGB and depth features and using a multi-class SVM classifier.

Paper 221: A parametric algorithm for skyline extraction

Author(s): Mehdi Ayadi, Loreta Suta, Mihaela Scuturici, Serge Miguet, Chokri Ben Amar

This paper is dedicated to the problem of automatic skyline extraction in digital images. The study is motivated by the need, expressed by urbanists, to describe in terms of geometrical features the global shape created by man-made buildings in urban areas. Skyline extraction has been widely studied for navigation of Unmanned Aerial Vehicles (drones) or for geolocalization, both in natural and urban contexts. In most of these studies, the skyline is defined by the limit between sky and ground objects, and can thus be reduced to the sky segmentation problem in images. In our context, we need a more generic definition of the skyline, which makes its extraction more complex and even variable. The skyline can be extracted for different depths, depending on the interest of the user (far horizon, intermediate buildings, near constructions, ...), and thus requires human interaction.

The main steps of our method are as follows: we use a Canny filter to extract edges and allow the user to interact with the filter's parameters. With a high sensitivity, all the edges will be detected, whereas with lower values, only the most contrasted contours will be kept by the filter. From the obtained edge map, an upper envelope is extracted, which is a disconnected approximation of the skyline. A graph is then constructed and a shortest path algorithm is used to link discontinuities. Our approach has been tested on several public-domain urban and natural databases, and has proven to give better results than previously published methods.
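The envelope-plus-linking idea can be sketched with a small dynamic-programming variant of the shortest-path step (the paper's graph construction and Canny interaction are simplified away; `jump_cost` is an illustrative parameter):

```python
def skyline_from_edges(edge_map, jump_cost=1.0):
    """Upper-envelope + shortest-path sketch: for each column the
    candidate skyline row is the topmost edge pixel (upper envelope);
    a DP pass then picks one row per column minimizing distance to the
    envelope plus vertical jumps, which bridges columns with no edges."""
    h, w = len(edge_map), len(edge_map[0])
    # upper envelope: topmost edge per column (None if the column is empty)
    env = []
    for x in range(w):
        rows = [y for y in range(h) if edge_map[y][x]]
        env.append(min(rows) if rows else None)
    # forward DP over columns: cost[x][y] = best cost of a path ending at (x, y)
    cost = [[0.0 if env[0] is None else abs(y - env[0]) for y in range(h)]]
    for x in range(1, w):
        prev = cost[-1]
        col = []
        for y in range(h):
            data = 0.0 if env[x] is None else abs(y - env[x])
            col.append(data + min(prev[py] + jump_cost * abs(y - py)
                                  for py in range(h)))
        cost.append(col)
    # backtrack the cheapest path right-to-left
    path = [min(range(h), key=lambda y: cost[-1][y])]
    for x in range(w - 2, -1, -1):
        y_next = path[-1]
        path.append(min(range(h),
                        key=lambda y: cost[x][y] + jump_cost * abs(y - y_next)))
    return path[::-1]
```

Columns where the Canny filter found no edge get a flat data term, so the path simply interpolates across the gap, which is the role the shortest-path linking plays in the paper.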

Paper 224: Quaternion linear color edge-glowing filter using genetic algorithm

Author(s): Shagufta Yasmin, Stephen Sangwine

This paper presents a quaternion linear color edge-glowing filter, based on a zooming technique using a genetic algorithm (GA) and quaternion (hypercomplex) convolution, to create a mask of the proposed filter. The zooming technique helps to produce the glowing color edges in all directions, with only one mask, and the GA helps to find the coefficients of the filter mask. This was a challenge with previous mathematical frameworks. The proposed filter employs linear color vector filtering operations on color images. This converts the areas of smoothly-varying colors into black and generates glowing color edges in regions where color (but not intensity) edges occur in the image. The filter has been tested on different types of color images; the experimental results show that the proposed filter is a great advance towards the development of linear color vector image filtering. The computation time for the GA is about 1 hour, which is very reasonable. The novelty of this filter is that one mask is enough for producing glowing color edges in all directions.

Paper 225: Scalable Vision System for Mouse Homecage Ethology

Author(s): Ghadi Salem, Jonathan Krynitsky, Brett Kirkland, Eugene Lin, Aaron Chan, Simeon Anfinrud, Sarah Anderson, Marcial Garmendia-Cedillos, Rhamy Belayachi, Juan Alonso-Cruz, Joshua Yu, Anthony Iano-Fletcher, George Dold, Tom Talbot, Alexxai Kravitz, James Mitchell, Guanhang Wu, John Dennis, Monson Hayes, Kristin Branson, Thomas Pohida

In recent years, researchers and laboratory support companies have recognized the utility of automated profiling of laboratory mouse activity and behavior in the home-cage. Video-based systems have emerged as a viable solution for non-invasive mouse monitoring. Wider use of vision systems for ethology studies requires the development of scalable hardware seamlessly integrated with vivarium ventilated racks. Compact hardware combined with automated video analysis would greatly impact animal science and animal-based research. Automated vision systems, free of bias and intensive labor, can accurately assess rodent activity (e.g., well-being) and behavior 24-7 during research studies within primary home-cages. Scalable compact hardware designs impose constraints, such as use of fisheye lenses, placing greater burden (e.g., distorted image) on downstream video analysis algorithms. We present novel methods for analysis of video acquired through such specialized hardware. Our algorithms estimate the 3D pose of mouse from monocular images. We present a thorough examination of the algorithm training parameters' influence on system accuracy. Overall, the methods presented offer novel approaches for accurate activity and behavior estimation practical for large-scale use of vision systems in animal facilities.

Paper 226: Spatio-Temporal Features Learning with 3DPyraNet

Author(s): Ihsan Ullah, Alfredo Petrosino

A discriminative approach based on the 3DPyraNet model for spatio-temporal feature learning is proposed. In combination with a linear SVM classifier, our model outperforms state-of-the-art methods on two datasets (KTH, Weizmann) and shows results comparable with the current best methods on a third dataset (YUPENN). The features are compact, achieving 94.08%, 99.13%, and 94.67% accuracy on KTH, Weizmann, and YUPENN, respectively. The proposed model appears more suitable for spatio-temporal feature learning compared to traditional feature learning techniques; moreover, the number of parameters is far lower than in other 3DConvNets.

Paper 228: Automatic segmentation of tv news into stories using visual and temporal information

Author(s): Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia

In this paper we propose a new method for automatic storyboard segmentation of TV news using image retrieval techniques and content manipulation. Our framework performs shot boundary detection, global key-frame representation, and image re-ranking based on neighborhood relations and the temporal variance of image locations, in order to construct a unimodal cluster for anchor-person detection and differentiation. Finally, anchor shots are used to form video scenes. The entire technique is unsupervised, being able to learn semantic models and extract natural patterns from the current video data. The experimental evaluation performed on a dataset of 50 videos, totaling more than 30 hours, demonstrates the pertinence of the proposed method, with gains in recall and precision rates of more than 5-7% when compared with state-of-the-art techniques.

Paper 233: Wavelet neural network initialization using LTS for DNA Sequence Classification

Author(s): Abdesselem Dakhli, Wajdi Bellil, Chokri Ben Amar

In this paper, we present a new approach for DNA sequence classification based on the Wavelet Neural Network (WNN) and the k-means algorithm. The satisfying performance of a WNN depends on an appropriate determination of its structure. Our approach uses the Least Trimmed Squares (LTS) method and the Gradient Algorithm (GA) to solve the architecture and learning of the WNN. The initialization of the network is handled by LTS, which selects the candidate wavelets from the Multi Library of Wavelet Neural Networks (MLWNN): LTS finds the regressors that provide the most significant contribution to reducing the approximation error, so the selected wavelets efficiently enhance the robustness of the initialization. The GA is then implemented to train the network and adjust its parameters. The performance of the WNN is investigated by detecting simulated and real signals in white noise. The LTS model is compared to three initialization algorithms: Residual Based Regressor Selection (RBRS), Stepwise Regressor Selection by Orthogonalization (SRSO) and Backward Elimination of Regressors (BER). Our aim is to construct a classifier that gives highly accurate results for the DNA sequences of organisms; the classification results are compared to other classifiers.
The experimental results show that the WNN-LTS model outperforms the other classifiers in terms of both running time and clustering quality. Our system consists of three phases. The first, called transformation, is composed of three sub-steps: binary codification of the DNA sequences, Fourier transform, and power spectrum signal processing. The second is the approximation, empowered by the use of the MLWNN. Finally, the third, the classification of the DNA sequences, is realized by applying the k-means algorithm.

Paper 236: Collection of Visual Data in Climbing Experiments for Addressing the Role of Multi-Modal Exploration in Motor Learning Efficiency

Author(s): Adam Schmidt, Dominic Orth, Ludovic Seifert

Understanding how skilled performance in human endeavor is acquired through practice has benefited markedly from technologies that can track movements of the limbs, body and eyes with reference to the environment. A significant challenge within this context is to develop time-efficient methods for observing multiple levels of motor system activity throughout practice. Whilst activity can be registered using video-based systems, crossing multiple levels of analysis remains a substantive problem within the computer vision and human movement domains. The goal of this work is to develop a registration system to collect movement activity in the kinds of environments in which individuals normally seek to participate (sports and physical activities). We detail the registration system and the procedure to collect the data necessary for studying skill acquisition processes during difficult indoor climbing tasks practiced by skilled climbers. Of particular interest are the problems addressed in trajectory reconstruction when faced with the limitations of the registration process and equipment in such unconstrained setups. These include abrupt movements that violate the common assumption of smoothness of the camera trajectory; significant motion blur and rolling-shutter effects; and a highly repetitive environment consisting of many similar objects.

Paper 237: Fog Augmentation of Road Images for Performance Analysis of Traffic Sign Detection Algorithms

Author(s): Thomas Wiesemann, Xiaoyi Jiang

This paper studies the influence of fog on traffic sign detection algorithms used in intelligent driver assistance systems. Previous studies are all based on synthetic images. In this work we instead use real-life photos of different road situations for fog augmentation, to investigate the performance of five detection methods. To obtain depth information about the scene, a depth map is first estimated for every source image of the dataset. Different visibility distances are then simulated with Koschmieder's fog model and the implemented algorithms are applied to the resulting images. Among other things, the analysis of the results shows that in foggy situations the performance of an HSI-based algorithm is not always better than that of an RGB-based method.
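Koschmieder's model used for the fog simulation reduces to a per-pixel blend between the scene radiance and the sky luminance, governed by depth and visibility distance; a minimal sketch (the 3/V relation for the extinction coefficient assumes the common 5% contrast threshold convention, and the uniform sky value is an assumption):

```python
import math

def koschmieder_fog(intensity, depth_m, visibility_m, sky=255.0):
    """Koschmieder's law: L = L0*exp(-beta*d) + L_inf*(1 - exp(-beta*d)).
    beta is tied to the meteorological visibility V by beta = 3 / V."""
    beta = 3.0 / visibility_m
    t = math.exp(-beta * depth_m)  # transmission along the line of sight
    return intensity * t + sky * (1.0 - t)
```

At zero depth the pixel is unchanged; as depth grows past the visibility distance, every pixel converges to the sky value, which is exactly why detection degrades with simulated fog.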

Paper 238: Statistical Modeling based Adaptive Parameter Setting for Random Walk Segmentation

Author(s): Ang Bian, Xiaoyi Jiang

Segmentation algorithms typically require some parameters whose optimal values are not easy to find. Training methods have been proposed to tune the optimal parameter values. In this work we instead pursue the goal of adaptive parameter setting. Considering the popular random walk segmentation algorithm, it is demonstrated that the parameter used for the weighting function has a strong influence on the segmentation quality. We propose a statistical-model-based approach to automatically set this parameter, thus adapting the segmentation algorithm to the statistical properties of an image. Experimental results are presented to demonstrate the usefulness of the proposed approach.
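The weighting function in question is the Gaussian edge weight of the random walker algorithm; a one-liner makes the role of the parameter beta explicit (intensities are assumed normalized to [0, 1]):

```python
import math

def rw_edge_weight(g_i, g_j, beta):
    """Gaussian weighting function of random walker segmentation:
    w_ij = exp(-beta * (g_i - g_j)**2). Small beta makes all edges look
    alike; large beta blocks the walker at weak intensity differences,
    hence the strong influence of beta on segmentation quality."""
    return math.exp(-beta * (g_i - g_j) ** 2)
```

The adaptive scheme of the paper amounts to choosing beta from the image's own intensity statistics rather than fixing it globally.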

Paper 239: On-the-fly Architecture Design and Implementation of a Real-Time Stereovision System

Author(s): Mohamed B. M. Masmoudi, Chadlia Jerad, Rabah Attia

Stereovision is a way to reconstruct 3D information that is inspired by the basic mechanism of human eyes. When dealing with real-time stereo computation, the use of specialized hardware architectures becomes mandatory. Consequently, many works have dealt with the implementation of this process on FPGA platforms, each with a particular emphasis. This paper describes a novel architecture that optimizes the memory size used in a pipelined, pixel-clock-synchronized stereo vision system, which consequently provides the disparity map in real time. The result is a compact architecture capable of processing stereo video streams on the fly, without external memory storage for stereo pairs. The implementation is fully pipelined and covers the entire stereovision process. In addition, the hardware implementations of the Hamming distance and of the index computation were enhanced. The design is generic: the disparity window, the image size and the matching algorithm (Census or SAD) can all be selected. The hardware implementation shows better performance than previous studies.
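
The Census/Hamming pairing mentioned above works by encoding each pixel's local intensity ordering as a bit string and scoring candidate matches by the number of differing bits. A reference-level sketch of the two operations (software model only; the paper's contribution is a hardware pipeline, and the wrap-around border handling via `np.roll` is a simplifying assumption):

```python
import numpy as np

def census_transform(img, r=1):
    """Census transform: one bit per neighbour, set when the neighbour is
    darker than the centre pixel, packed into an integer code."""
    code = np.zeros(img.shape, dtype=np.uint32)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            code = (code << np.uint32(1)) | (neigh < img).astype(np.uint32)
    return code

def hamming(a, b):
    """Hamming distance between census codes: population count of XOR."""
    x = (a ^ b).astype(np.uint32)
    d = np.zeros(x.shape, dtype=np.uint32)
    while np.any(x):
        d += x & np.uint32(1)
        x >>= np.uint32(1)
    return d
```

In hardware, the XOR-and-popcount structure of the Hamming distance is exactly what makes Census matching cheap compared to arithmetic cost functions, which is presumably why the paper singles it out for optimization.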

Paper 241: Complex Image Processing Using Correlated Color Information

Author(s): Dan Popescu, Loretta Ichim, Diana Gornea, Florin Stoican

The paper presents a method for patch classification and remote image segmentation based on correlated color information. During the training phase, a supervised learning algorithm is used. In the testing phase, the classifier built a priori predicts which class an input image sample belongs to. The tests showed that the most relevant features are contrast, energy and homogeneity extracted from the co-occurrence matrix between the H and S components. Compared to grey-level matrices, the chromatic matrices improve texture classification. For the experiments, the images were acquired with the aid of an unmanned aerial vehicle and represent various types of terrain. Two case studies, flooded-area segmentation and road segmentation, have shown that the proposed method is more effective than considering the color channels separately. It is also shown that the new algorithm executes faster than a similar previously proposed one.
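
A cross-channel co-occurrence matrix pairs the quantised H value at one pixel with the quantised S value at a neighbouring pixel, and the three Haralick-style features named in the abstract fall out of it directly. A small sketch under the assumption of a single horizontal offset of one pixel (the fixed offset and function names are illustrative, not the paper's exact configuration):

```python
import numpy as np

def cooccurrence(a, b, levels):
    """Normalised co-occurrence matrix pairing quantised channel `a` at
    each pixel with quantised channel `b` at its right-hand neighbour."""
    P = np.zeros((levels, levels))
    for i, j in zip(a[:, :-1].ravel(), b[:, 1:].ravel()):
        P[i, j] += 1
    return P / P.sum()

def haralick_features(P):
    """Contrast, energy and homogeneity of a co-occurrence matrix."""
    i, j = np.indices(P.shape)
    contrast = float(np.sum(P * (i - j) ** 2))       # penalises off-diagonal mass
    energy = float(np.sum(P ** 2))                   # high for uniform textures
    homogeneity = float(np.sum(P / (1.0 + np.abs(i - j))))
    return contrast, energy, homogeneity
```

Using H against S rather than a grey channel against itself is what makes the matrix "chromatic": the features then capture how hue and saturation co-vary across a texture, not just intensity structure.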

Paper 242: Using PNU-based Techniques to Detect Alien Frames in Videos

Author(s): Giuseppe Cattaneo, Gianluca Roscigno, Andrea Bruno

In this paper we discuss the video integrity problem and, specifically, analyze whether the method proposed by Fridrich et al. can be exploited for forensic purposes. Fridrich et al. proposed a solution to identify the source camera of a given input image. The method relies on the Pixel Non-Uniformity (PNU) noise produced by the sensor and present in any digital image.

We first present a wider scenario related to video integrity. Then we focus on a particular case of video forgery where sequences of frames, recorded by a different camera (in short, alien frames), could be added to the original video.

By means of an experimental evaluation in specific real-world forensic scenarios, we analyzed the degree of accuracy this method can achieve and identified the critical conditions under which the results are not reliable enough to be used in court.

The results show that the method is robust: alien frames can be reliably detected provided that the source device (or a faithful fingerprint of it) is available. Nevertheless, the method addresses a rather limited notion of video integrity (alien frame detection), and more extensive solutions covering a wider range of application scenarios are still required.
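
The PNU approach works by extracting a noise residual from each frame, averaging residuals into a camera fingerprint, and flagging frames whose residual correlates poorly with that fingerprint. A minimal sketch of the pipeline's shape (assumptions: a box filter stands in for the wavelet denoiser of Fridrich et al., and all function names are illustrative):

```python
import numpy as np

def box3(img):
    """3x3 box filter via edge-padded shifts (numpy only)."""
    p = np.pad(img, 1, mode='edge')
    H, W = img.shape
    return sum(p[dy:dy + H, dx:dx + W]
               for dy in range(3) for dx in range(3)) / 9.0

def noise_residual(frame):
    # residual = frame minus its denoised version; the PNU pattern
    # survives denoising while scene content is largely removed
    return frame - box3(frame)

def fingerprint(frames):
    # camera fingerprint: averaging residuals over many frames
    # suppresses scene-dependent noise and keeps the sensor pattern
    return np.mean([noise_residual(f) for f in frames], axis=0)

def ncc(a, b):
    """Normalised cross-correlation used to compare a frame's residual
    against a fingerprint; alien frames score near zero."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() /
                 (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

A decision rule then thresholds `ncc(noise_residual(frame), fingerprint(...))`: frames recorded by the claimed camera score high, alien frames do not, which matches the paper's finding that the source device or its fingerprint must be available.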


The software generating these pages is © (not Ghent University), 2002-2026. All rights reserved.

The data on this page is © Acivs 2016. All rights reserved.

The server hosting this website is owned by the department of Telecommunications and Information Processing (TELIN) of Ghent University.

Problems with the website should be reported to .
This page was generated on Thursday April 09th, 2026 16:21:44.