Acivs 2017

Advanced Concepts for Intelligent Vision Systems

Venue

Sept. 18-21, 2017

The Grauwzusters Cloister, Antwerp, Belgium

Acivs 2017 Abstracts

Regular papers

Paper 107: Detection and Tracking of the Pores of the Lamina Cribrosa in Three Dimensional SD-OCT Data

Author(s): Florence Rossant, Kate Grieve, Stéphanie Zwillinger, Michel Paques

Glaucoma is one of the leading causes of blindness in the world. Although its physiopathology remains unclear, deformations of the lamina cribrosa (LC), a three-dimensional porous structure through which all the nerve fibers from the retina pass to join the brain, are very likely to play a major role. We present in this article a method for the 3D reconstruction of the pores of the LC, i.e. of the axon pathways, from three-dimensional SD-OCT data. This method is based on pore detection in one en-face plane and on pore tracking throughout the volume. To overcome difficulties due to the low signal-to-noise ratio, we model and integrate a priori knowledge about the structures to be segmented in all steps of our algorithm. The quantitative evaluation shows good results on a test set of 14 images, with 76% of the axonal paths correctly detected and an RMSE between the automatic and manual segmentations of around 2 pixels.

Paper 108: Towards Condition Analysis for Machine Vision Based Traffic Sign Inventory

Author(s): Petri Hienonen, Lasse Lensu, Markus Melander, Heikki Kälviäinen

Automatic traffic sign inventory and simultaneous condition analysis can be used to improve road maintenance processes, decrease maintenance costs, and produce up-to-date information for future intelligent driving systems. The goal of this research is to combine automatic traffic sign detection and classification with traffic sign inventory and condition analysis. This paper considers the very challenging problem of traffic sign condition analysis which is currently performed manually by experts. The manual evaluation is time-consuming, expensive, and subjective. We propose a machine vision based method to determine the condition category of each detected sign. A new dataset containing close to 400 traffic signs with condition category annotations has been specifically collected for this research since there was no suitable data available. The experimental results indicate that the average performance of the method is close to the human performance.

Paper 109: Relative Camera Pose Estimation Using Convolutional Neural Networks

Author(s): Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, Esa Rahtu

This paper presents a convolutional neural network based approach for estimating the relative pose between two cameras. The proposed network takes RGB images from both cameras as input and directly produces the relative rotation and translation as output. The system is trained in an end-to-end manner utilising transfer learning from a large scale classification dataset. The introduced approach is compared with widely used local feature based methods (SURF, ORB) and the results indicate a clear improvement over the baseline. In addition, a variant of the proposed architecture containing a spatial pyramid pooling (SPP) layer is evaluated and shown to further improve the performance.

Paper 110: Large-scale Camera Network Topology Estimation by Lighting Variation

Author(s): Michael Zhu, Anthony Dick, Anton van den Hengel

This paper proposes a scalable and robust algorithm to find connections between cameras in a large surveillance network, based solely on lighting variation. We show how to detect regions that are affected by lighting changes within each camera view, with limited data. Then, we establish the light-overlap connections and show that our algorithm can scale to hundreds of cameras while maintaining high accuracy. We demonstrate our method on a campus network of 100 real cameras and 500 simulated cameras, and evaluate its accuracy and scalability.
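To make the core idea concrete, here is a minimal editorial sketch (not the authors' algorithm): each camera is summarized by a time series of mean frame brightness, and two cameras are linked when their series correlate strongly, suggesting shared lighting.

```python
# Illustrative sketch: inferring camera connections from correlated
# per-camera brightness time series. Thresholds and data are hypothetical.
from math import sqrt

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def lighting_links(brightness, threshold=0.9):
    """brightness: dict camera_id -> list of mean frame intensities."""
    cams = sorted(brightness)
    return [(a, b) for i, a in enumerate(cams) for b in cams[i + 1:]
            if pearson(brightness[a], brightness[b]) >= threshold]

# Cameras 1 and 2 see the same light change; camera 3 does not.
series = {1: [10, 10, 80, 80, 10], 2: [12, 11, 78, 79, 13], 3: [50, 20, 50, 20, 50]}
print(lighting_links(series))  # [(1, 2)]
```

The paper's contribution lies in making this kind of pairwise comparison scale to hundreds of cameras and in localizing the lighting-affected regions; the sketch only shows the correlation step.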

Paper 111: Omnidirectional Localization in vSLAM with Uncertainty Propagation and Bayesian Regression

Author(s): David Valiente, Oscar Reinoso, Arturo Gil, Luis Payá, Mónica Ballesta

This article presents a visual localization technique based solely on the use of omnidirectional images, within the framework of mobile robotics. The proposal makes use of the epipolar constraint, adapted to the omnidirectional reference, in order to deal with matching point detection, which ultimately determines a motion transformation for localizing the robot. The principal contributions lie in the propagation of the current uncertainty to the matching. In addition, a Bayesian regression technique is implemented in order to reinforce robustness. As a result, we provide a reliable adaptive matching, which proves its stability and consistency against non-linear and dynamic effects affecting the image frame and, consequently, the final application. In particular, the search area for matching points is greatly reduced, thus aiding the search and avoiding false correspondences. The final outcome is reflected in real-data experiments, which confirm the benefit of these contributions and also test the suitability of the localization when embedded in a vSLAM application.

Paper 112: Data Augmentation for Plant Classification

Author(s): Pornntiwa Pawara, Emmanuel Okafor, Lambert Schomaker, Marco Wiering

Data augmentation plays a crucial role in increasing the number of training images, which often helps improve the classification performance of deep learning techniques for computer vision problems. In this paper, we employ the deep learning framework and determine the effects of several data-augmentation (DA) techniques for plant classification problems. For this, we use two convolutional neural network (CNN) architectures, AlexNet and GoogleNet, trained from scratch or using pre-trained weights. These CNN models are then trained and tested on both original and data-augmented image datasets for three plant classification problems: Folio, AgrilPlant, and the Swedish leaf dataset. We evaluate the utility of six individual DA techniques (rotation, blur, contrast, scaling, illumination, and projective transformation) and several combinations of these techniques, resulting in a total of 12 data-augmentation methods. The results show that the CNN methods with particular data-augmented datasets yield the highest accuracies, which also surpass previous results on the three datasets. Furthermore, the CNN models trained from scratch profit greatly from data augmentation, whereas the fine-tuned CNN models barely profit from it. Finally, we observed that data augmentation using combinations of rotation and different illuminations or different contrasts helped most in reaching high performance with the scratch CNN models.
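Three of the augmentation families named above (rotation, contrast, illumination) can be sketched in a few lines. This is an editorial illustration on a grayscale image stored as a list of rows; real pipelines would use a library such as PIL or OpenCV.

```python
# Minimal, illustrative data-augmentation operations (not the authors' code).

def rotate90(img):
    # Rotate the image 90 degrees clockwise.
    return [list(row) for row in zip(*img[::-1])]

def adjust_contrast(img, factor):
    # Scale pixel deviations from the image mean, clamped to [0, 255].
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    clamp = lambda v: max(0, min(255, round(mean + factor * (v - mean))))
    return [[clamp(p) for p in row] for row in img]

def adjust_illumination(img, offset):
    # Add a constant brightness offset, clamped to [0, 255].
    return [[max(0, min(255, p + offset)) for p in row] for row in img]

img = [[0, 100], [200, 100]]
print(rotate90(img))                 # [[200, 0], [100, 100]]
print(adjust_illumination(img, 60))  # [[60, 160], [255, 160]]
```

Each such operation multiplies the effective training-set size, which is why the scratch-trained CNNs in the paper benefit so strongly.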

Paper 114: Is a Memoryless Motion Detection Truly Relevant for Background Generation with LaBGen?

Author(s): Benjamin Laugraud, Marc Van Droogenbroeck

The stationary background generation problem consists in generating a unique image representing the stationary background of a given video sequence. The LaBGen background generation method combines a pixel-wise median filter and a patch selection mechanism based on a motion detection performed by a background subtraction algorithm. In our previous works related to LaBGen, we have shown that, surprisingly, the frame difference algorithm provides the most effective motion detection on average. Compared to other background subtraction algorithms, it detects motion between two frames without relying on additional past frames, and is therefore memoryless. In this paper, we experimentally check whether the memoryless property is truly relevant for LaBGen, and whether the effective motion detection provided by the frame difference is not an isolated case. For this purpose, we introduce LaBGen-OF, a variant of LaBGen that leverages memoryless dense optical flow algorithms for motion detection. Our experiments show that using a memoryless motion detector is an adequate choice for our background generation framework, and that LaBGen-OF outperforms LaBGen on the SBMnet dataset. We further provide an open-source C++ implementation of both methods at http://www.telecom.ulg.ac.be/labgen.
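The two ingredients described above — a memoryless frame-difference motion detector and a pixel-wise temporal median — can be sketched as follows. This is an illustrative simplification, not the LaBGen implementation.

```python
# Illustrative sketch of frame differencing and median background generation.
from statistics import median

def frame_difference(prev, curr, threshold=20):
    # Memoryless motion mask: compares only the current and previous frame.
    return [[abs(c - p) > threshold for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]

def median_background(frames):
    # Pixel-wise temporal median over the whole sequence.
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]

# A 1x3 sequence: a "moving object" (value 255) crosses a static background (50).
frames = [[[255, 50, 50]], [[50, 255, 50]], [[50, 50, 255]]]
print(median_background(frames))               # [[50, 50, 50]]
print(frame_difference(frames[0], frames[1]))  # [[True, True, False]]
```

The median alone already suppresses transient foreground; LaBGen's patch selection uses the motion mask to decide which patches feed that estimate.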

Paper 119: An Efficient Descriptor based on Radial Line Integration for Fast non Invariant Matching and Registration of Microscopy Images

Author(s): Anders Hast, Gustaf Kylberg, Ida-Maria Sintorn

Descriptors such as SURF and SIFT contain a framework for handling rotation and scale invariance, which is generally not needed when the focus is registration and stitching of microscopy images. Instead, speed and efficiency are more important factors. We propose a descriptor that performs very well against these criteria, based on the idea of radial line integration. The result is a descriptor that outperforms both SURF and SIFT in terms of speed and number of inliers, even for rather short descriptors.

Paper 122: Deep Learning on Underwater Marine Object Detection: A Survey

Author(s): Md Moniruzzaman, Syed Islam, Mohammed Bennamoun, Paul Lavery

Deep learning, also known as deep machine learning or deep structured learning based techniques, have recently achieved tremendous success in digital image processing for object detection and classification. As a result, they are rapidly gaining popularity and attention from the computer vision research community. There has been a massive increase in the collection of digital imagery for the monitoring of underwater ecosystems, including seagrass meadows. This growth in image data has driven the need for automatic detection and classification using deep neural network based classifiers. This paper systematically describes the use of deep learning for underwater imagery analysis within the recent past. The analysis approaches are categorized according to the object of detection, and the features and deep learning architectures used are highlighted. It is concluded that there is a great scope for automation in the analysis of digital seabed imagery using deep neural networks, especially for the detection and monitoring of seagrass.

Paper 123: A Two-Step Methodology for Human Pose Estimation Increasing the Accuracy and Reducing the Amount of Learning Samples Dramatically

Author(s): Samir Azrour, Sébastien Piérard, Pierre Geurts, Marc Van Droogenbroeck

In this paper, we present a two-step methodology to improve existing human pose estimation methods from a single depth image. Instead of learning the direct mapping from the depth image to the 3D pose, we first estimate the orientation of the standing person seen by the camera and then use this information to dynamically select a pose estimation model suited for this particular orientation. We evaluated our method on a public dataset of realistic depth images with precise ground truth joints location. Our experiments show that our method decreases the error of a state-of-the-art pose estimation method by 30%, or reduces the size of the needed learning set by a factor larger than 10.

Paper 126: Body Related Occupancy Maps for Human Action Recognition

Author(s): Sanne Roegiers, Gianni Allebosch, Peter Veelaert, Wilfried Philips

This paper introduces a novel spatial feature for human action recognition and analysis. The positions and orientations of body joints relative to a reference point are used to build an occupancy map of the 3D space that was occupied during the action execution. The joint data is acquired with the Microsoft Kinect v2 sensor and undergoes a pose invariant normalization process to eliminate body differences between different persons. The body related occupancy map (BROM) and its 2D views are used as feature input for a random forest classifier. The approach is tested on a self-captured database of 23 human actions for game-play. On this database a classification with an F1-score of 0.84 is achieved for the front view of the BROM from the complete skeleton.
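A hypothetical sketch of a body-related occupancy map: joint positions, already normalized to a reference point, are accumulated over an action into a coarse 3D voxel grid. Grid size and extent here are illustrative, not taken from the paper.

```python
# Illustrative 3D occupancy map built from joint trajectories (editorial sketch).

def occupancy_map(joint_trajectories, grid=4, extent=2.0):
    """joint_trajectories: list of frames, each a list of (x, y, z) joint
    positions relative to the reference point, in [-extent, extent] metres."""
    occ = [[[0] * grid for _ in range(grid)] for _ in range(grid)]
    cell = 2 * extent / grid
    for frame in joint_trajectories:
        for x, y, z in frame:
            # Map each coordinate to a voxel index, clamped to the grid.
            i = min(grid - 1, max(0, int((x + extent) / cell)))
            j = min(grid - 1, max(0, int((y + extent) / cell)))
            k = min(grid - 1, max(0, int((z + extent) / cell)))
            occ[i][j][k] += 1
    return occ

# One joint hovering near the origin for two frames, then a reach to the left.
frames = [[(0.1, 0.1, 0.1)], [(0.2, 0.2, 0.1)], [(-1.5, 0.0, 0.0)]]
occ = occupancy_map(frames)
print(occ[2][2][2])  # 2: the near-origin cell was occupied twice
```

The BROM feature additionally encodes joint orientations and is rendered as 2D views before classification; the sketch shows only the occupancy-accumulation idea.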

Paper 127: InSAR coherence-dependent Fuzzy C-Means flood mapping using Particle Swarm Optimization

Author(s): Chayma Chaabani, Riadh Abdelfattah

Owing to the day-and-night imaging capability and weather-independent acquisition of Synthetic Aperture Radar systems, continuous environmental monitoring is now possible. This study introduces a fully automated flood mapping approach using the combination of SAR and Interferometric SAR information. In order to achieve an accurate delineation of the flooding extents, we propose an enhancement of the Fuzzy C-Means approach based on Particle Swarm Optimization. Indeed, the FCM membership update of this proposed clustering approach takes advantage of the InSAR coherence spatial context information and the global optimization model of the PSO algorithm. The clustering results are presented using Envisat SAR data acquired before and after the flooding event of the Tunisian Mellegue river. To evaluate the separation and homogeneity performance of the proposed clustering approach, we analyze three fuzzy internal validity measures that involve the membership and dataset values.

Paper 128: Anomaly Detection in Crowded Scenarios using Local and Global Gaussian Mixture Models

Author(s): Adrián Tomé, Luis Salgado

This paper presents an objective comparison between two approaches for anomaly detection in surveillance scenarios. Gaussian mixture models (GMM) are used in both cases: globally, with a unique model that covers the whole scene; and locally, with one model per spatial location. Both approaches follow a "bottom-up" strategy that avoids any object tracking and relies on motion features extracted with a robust optical flow method. Furthermore, we evaluate the contribution of each feature through a statistical tool called Correlation Feature Selection in order to ensure the best performance. Evaluation is done on the UCSD dataset, concluding that the global model offers better results, outperforming similar anomaly detection approaches.
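The scoring side of such a model can be sketched briefly: each spatial cell holds a fitted 1-D Gaussian mixture over motion features, and a feature is flagged as anomalous when its log-likelihood under that cell's mixture falls below a threshold. The parameters below are hand-picked for illustration, not learned.

```python
# Illustrative GMM anomaly scoring (editorial sketch, hypothetical parameters).
from math import exp, log, pi, sqrt

def gmm_loglik(x, mixture):
    # mixture: list of (weight, mean, variance) components.
    p = sum(w * exp(-(x - m) ** 2 / (2 * v)) / sqrt(2 * pi * v)
            for w, m, v in mixture)
    return log(p)

def is_anomalous(x, mixture, threshold=-5.0):
    return gmm_loglik(x, mixture) < threshold

# A cell where normal motion magnitudes cluster around 1.0 or 2.0.
cell_model = [(0.6, 1.0, 0.1), (0.4, 2.0, 0.1)]
print(is_anomalous(1.1, cell_model))  # False: typical pedestrian speed
print(is_anomalous(6.0, cell_model))  # True: far outside both modes
```

In the local variant every cell has its own `cell_model`; the global variant replaces them with a single scene-wide mixture.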

Paper 130: Adding GLCM Texture Analysis to a Combined Watershed Transform and Graph Cut Model for Image Segmentation

Author(s): Kauê Duarte, Marco Carvalho, Paulo Martins

Texture analysis is an important step in pattern recognition, image processing and computer vision systems. This work proposes an unsupervised approach to segment digital images combining the Watershed Transform and Normalized Cut in graphs (NCut) using texture information obtained from the Gray-Level Co-occurrence Matrix (GLCM). We corroborate the improvement of image segmentation brought by texture analysis through several experiments carried out on the BSDS500 Berkeley dataset. For example, improvements of 7% and 12% were found in relation to the combined Watershed+NCut and Quadtree techniques, respectively. The overall performance of the proposed approach was measured by the F-Measure through comparisons against other important segmentation methods.
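The GLCM itself is simple to sketch: for a chosen offset it counts how often each pair of gray levels co-occurs, and texture features such as contrast are moments of that matrix. The sketch below uses a single offset (one pixel to the right) on a small quantized image; it is illustrative, not the paper's code.

```python
# Illustrative GLCM computation and the classic contrast feature.

def glcm(img, levels):
    # counts[i][j] = how often gray level i has level j to its immediate right
    counts = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    return counts

def contrast(counts):
    # Weighted by squared gray-level difference: large for abrupt transitions.
    total = sum(map(sum, counts))
    return sum((i - j) ** 2 * c for i, row in enumerate(counts)
               for j, c in enumerate(row)) / total

img = [[0, 0, 1], [0, 1, 1], [2, 2, 2]]
m = glcm(img, 3)
print(m[0][1])      # 2: "0 followed by 1" occurs twice
print(contrast(m))  # small here; grows with gray-level discontinuities
```

Real GLCM pipelines aggregate several offsets and angles and often normalize the matrix; libraries such as scikit-image provide this directly.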

Paper 131: Facial Expression Recognition using Local Region Specific Dense Optical Flow and LBP Features

Author(s): Deepak Ghimire, Sang Park, Mi Kim

Recognition of facial expressions has many applications, including human-computer interaction, human emotion analysis, personality development, cognitive science, health care, virtual reality, and image retrieval. In this paper we propose a new method for facial expression recognition using local region specific mean optical flow and local binary pattern feature descriptors with support vector machine classification. In general, facial expression recognition techniques divide the face into a regular grid (holistic representation) from which facial features are extracted. In this paper, however, we divide the face into domain-specific local regions. First, a robust optical flow is used to compute the mean optical flow in different directions for each local region, which captures both local motion statistics and their spatial location. Features are extracted only from key frames, which are detected based on the maximal mean optical flow magnitude within a sequence with respect to the neutral frame. Then, the region-specific local binary pattern is extracted from each key frame and concatenated with the mean optical flow features. The performance of the proposed facial expression recognition system has been validated on the CK+ facial expression dataset.
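The basic local binary pattern used here is a standard operator: each pixel receives an 8-bit code by thresholding its 8 neighbours against the centre value. A minimal, illustrative version (clockwise from the top-left neighbour):

```python
# Illustrative basic LBP code for one pixel (not the paper's implementation).

def lbp_code(img, y, x):
    c = img[y][x]
    # 8 neighbours, clockwise starting at the top-left.
    neighbours = [img[y - 1][x - 1], img[y - 1][x], img[y - 1][x + 1],
                  img[y][x + 1], img[y + 1][x + 1], img[y + 1][x],
                  img[y + 1][x - 1], img[y][x - 1]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code

img = [[9, 9, 9],
       [1, 5, 1],
       [1, 1, 1]]
print(lbp_code(img, 1, 1))  # 7: only the three top neighbours exceed the centre
```

A histogram of these codes over each face region becomes the texture part of the concatenated feature vector.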

Paper 134: 3D Shape from SEM Image Using Improved Fast Marching Method

Author(s): Lei Huang, Yuji Iwahori, Aili Wang, Bhuyan Manas

This paper proposes an improved fast marching method to recover 3D shape from a Scanning Electron Microscope (SEM) image as a Shape from Shading approach. First, the method uses second-order finite differences and the information of diagonal grid points to obtain a highly accurate solution. Then the method is sped up by increasing the number of neighboring points and changing the update mode to avoid sorting. Finally, the results of the proposed and previous methods are compared via simulation and a real SEM image. Experimental results show that the proposed method yields better and faster 3D shape recovery.

Paper 139: A Robust Video Watermarking for Real-time Application

Author(s): Ines Bayoudh, Saoussen Ben Jabra, Ezzeddine Zagrouba

Video watermarking has been proposed to provide content protection, authentication, and indexing. The classical compromise in watermarking is between robustness, invisibility and capacity. However, digital progress imposes new criteria, such as processing time, which has become an important criterion for assessing and selecting the optimal technique. To satisfy these requirements, a new approach based on Krawtchouk moments and the DCT (Discrete Cosine Transform) is proposed. In fact, a defined part is first selected from the original frame in order to guarantee real-time treatment even with high-definition (HD) video. Second, the Krawtchouk moment matrix is extracted from the luminance component and then transformed using the 2-D DCT, and a set of coefficients is selected to be marked. Finally, the marked frames are generated based on the marked moments. Experimental results show that the proposed technique is robust against several attacks while providing a high level of invisibility and real-time processing.

Paper 142: Fully Automated Facial Expression Recognition Using 3D Morphable Model and Mesh-Local Binary Pattern

Author(s): Hela Bejaoui, Haythem Ghazouani, Walid Barhoumi

With recent advances in artificial intelligence and pattern recognition, automatic facial expression recognition draws a great deal of interest. In this area, most works involve 2D imagery, which presents challenges related to pose, illumination variation and self-occlusion. To deal with these problems, we propose to reconstruct the face in 3D space, from only one 2D image, using the 3D Morphable Model (3DMM). Thanks to its robustness against pose and illumination variations, the 3DMM offers a high-resolution model and fast fitting functionality. Then, given the reconstructed 3D face, we extract a set of features that effectively describe shape changes and expression-related facial appearance, using the Mesh-Local Binary Pattern (mesh-LBP). The obtained results prove the effectiveness of combining 3DMM and mesh-LBP for automatic facial expression recognition from a single 2D image. Indeed, a comparative study shows that the proposed method outperforms existing state-of-the-art methods.

Paper 144: Optimal Tiling Strategy for Memory Bandwidth Reduction for CNNs

Author(s): Leonardo Cecconi, Sander Smets, Luca Benini, Marian Verhelst

Convolutional Neural Networks (CNNs) are nowadays present in many different embedded solutions. One of the biggest problems related to their execution is the memory bottleneck. In this work we propose an optimal double-buffering tiling strategy to reduce the memory bandwidth in the execution of deep CNN architectures, testing our model on one of the two cores of a Zynq(R)-7020 embedded platform. An optimal tiling setting is found for each layer of the network, minimizing the bandwidth between external memory and on-chip memory (OCM). Performance tests show an improvement in total execution time of 50% with cache disabled (34% with cache enabled), compared to a non-double-buffered implementation. Moreover, a 5x lower double-buffering bandwidth between external and on-chip memory is achieved with respect to naive tiling settings. Furthermore, it is shown that tiling settings maximizing OCM usage do not generally lead to the lowest-bandwidth scenario.

Paper 146: Visual Localization Based on Place Recognition Using Multi-feature Combination (D-λLBP++HOG)

Author(s): Yongliang Qiao, Cindy Cappelle, Tao Yang, Yassine Ruichek

This paper presents a visual localization method based on multi-feature fusion and disparity information using stereo images. We integrate disparity information into complete local binary features (λLBP) to obtain a robust global image description (D-λLBP). In order to represent the scene in depth, multi-feature fusion of D-λLBP and HOG features provides valuable information and decreases the effect of some typical problems in place recognition, such as perceptual aliasing. It improves visual recognition performance by taking advantage of depth, texture and shape information. In addition, for real-time visual localization, the locality-sensitive hashing (LSH) method is used to compress the high-dimensional multi-feature descriptors into binary vectors, which speeds up image matching. To show its effectiveness, the proposed method is tested and evaluated on real datasets acquired in outdoor environments. The obtained results show that our approach allows more effective visual localization than the state-of-the-art method FAB-MAP.
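The hashing step can be sketched with random-projection LSH: a real-valued descriptor is reduced to a binary code (one bit per hyperplane), so matching becomes a cheap Hamming distance. Real LSH draws the hyperplanes at random; they are fixed below so the example is deterministic. This is an editorial sketch, not the paper's implementation.

```python
# Illustrative random-projection LSH with fixed (normally random) hyperplanes.

def lsh_code(vec, planes):
    # Bit i records which side of hyperplane i the vector falls on.
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

planes = [[1, 0], [0, 1], [1, 1], [1, -1]]
a = lsh_code([1.0, 0.1], planes)
b = lsh_code([0.9, 0.2], planes)    # near-duplicate descriptor
c = lsh_code([-1.0, -2.0], planes)  # unrelated descriptor
print(hamming(a, b), hamming(a, c))  # 0 3: similar vectors share their code
```

Because nearby vectors fall on the same side of most hyperplanes, their codes agree on most bits, which is what makes the binarized matching fast yet meaningful.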

Paper 148: Multi-object Tracking Using Compressive Sensing Features in Markov Decision Process

Author(s): Tao Yang, Cindy Cappelle, Yassine Ruichek, Mohammed El Bagdouri

In this paper, we propose an approach which uses compressive sensing features to improve the Markov Decision Process (MDP) tracking framework. First, we design a single-object tracker which integrates compressive tracking into the Tracking-Learning-Detection (TLD) framework so that the two complement each other. Then we apply this tracker within the MDP tracking framework to improve multi-object tracking performance. A discriminative model is built for each object and updated online. With the built discriminative model, the features used for data association are also enhanced. In order to validate our method, we first test the designed single-object tracker on a common dataset. Then we use the validation set from the multiple object tracking (MOT) training dataset to analyze each part of our method. Finally, we test our approach on the MOT benchmark. The results show our approach improves the original method and compares favorably against several state-of-the-art online multi-object trackers.

Paper 149: Statistical Radial Binary Patterns (SRBP) for Bark Texture Identification

Author(s): Safia Boudra, Itheri Yahiaoui, Ali Behloul

This paper presents a plant identification method based on the texture characterization of bark images. We propose a novel statistical radial binary pattern (SRBP) descriptor to encode the between-scale texture information within large neighbourhood areas using the statistical description of the grey scale intensity distribution. The proposed descriptor can efficiently encode the macro local structure. In addition, the proposed SRBP is a computationally simple, rotation-invariant and low-dimensional descriptor. We conduct comprehensive experiments on three different bark datasets to assess the performance of our approach. The experimental results show that our method achieves high identification rates, outperforming different multi-scale LBP variants.

Paper 152: AMD Classification in Choroidal OCT using Hierarchical Texton Mining

Author(s): Dafydd Ravenscroft, Jingjing Deng, Xianghua Xie, Louise Terry, Tom Margrain, North North, Ashley Wood

In this paper, we propose a multi-step textural feature extraction and classification method, which utilizes the feature learning ability of Convolutional Neural Networks (CNN) to extract a set of low level primitive filter kernels, extracts spatial information using clustering and Local Binary Patterns (LBP) and then generalizes the discriminative power by forming a histogram based descriptor. It integrates the concept of hierarchical texton mining and data driven kernel learning into a uniform framework. The proposed method is applied to a practical medical diagnosis problem of classifying different stages of Age-Related Macular Degeneration (AMD) using a dataset comprising long-wavelength Optical Coherence Tomography (OCT) images of the choroid. The results demonstrate the feasibility of our method for classifying different AMD stages using the textural information of the choroidal region.

Paper 153: A Domain Independent Approach to Video Summarization

Author(s): Amanda Dash, Alexandra Branzan Albu

With the increase in media streaming content and consumer-level video creation, there is a high demand for automatic video summarization systems. This paper proposes a bottom-up approach for the automatic generation of dynamic video summaries. Our approach integrates motion and saliency analysis with temporal slicing to extract features from the video, and to further find candidate shots.

A shot similarity measure is proposed for constructing the dynamic summaries from candidate shots. From a practical perspective, our main contribution is the design of a video summarization system that is independent of the video domain. We show that the system performs equally well for domains at the extreme opposites of the domain spectrum, namely professionally edited videos and egocentric videos, without any prior information on the video contents.

Paper 154: Real Time Continuous Tracking of Dynamic Hand Gestures on a Mobile GPU

Author(s): Robert Prior, David Capson, Alexandra Branzan Albu

Hand gesture recognition is an expansive and evolving field. Previous work addresses methods for tracking hand gestures with specialty gaming/desktop environments in real time. The method proposed here focuses on enhancing performance for mobile GPU platforms with restricted resources by limiting memory use/transfers and by reducing the need for code branches. An encoding scheme has been designed to allow contour processing typically used for finding fingertips to occur efficiently on a GPU for non-touch, remote manipulation of on-screen images. Results show high resolution video frames can be processed in real time on a modern mobile consumer device, allowing for fine grained hand movements to be detected and tracked.

Paper 157: Analysis of Skeletal Shape Trajectories for Person Re-identification

Author(s): Amani Elaoud, Walid Barhoumi, Hassen Drira, Ezzeddine Zagrouba

In this paper, we are interested in people re-identification using skeleton information provided by a consumer RGB-D sensor. We perform the modelling and analysis of human motion by focusing on the 3D human joints given by skeletons. In fact, the motion dynamics are modeled by projecting skeleton information on the Grassmann manifold. Moreover, in order to determine the identity of a test trajectory, we compare it against a labeled trajectory database using an unsupervised similarity assessment procedure. Indeed, the main contribution of this work resides in the introduced distance, which combines temporal information with global and local geometrical information. Experiments on standard datasets show that the proposed method performs accurately even though it does not assume any prior knowledge.

Paper 159: Homography-Based Navigation System for Unmanned Aerial Vehicles

Author(s): Abdulla Al-Kaff, Arturo Escalera, José María Armingol Moreno

Advances in microelectronics have fostered the use of Unmanned Aerial Vehicles (UAVs) in many civil and academic applications that require higher levels of autonomy, making navigation systems one of the main subjects of study. This paper deals with the problem of estimating the pose of a UAV in the 3D world. A vision-based navigation system using an onboard monocular downward-looking camera is proposed. The proposed system is based on a SIFT detector and FREAK descriptor, which preserves feature-matching performance while decreasing computational time. The system has been evaluated in real flight tests and the obtained results have been compared with results from DGPS.
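One step in such a pipeline can be sketched in a hedged way: with a downward-looking camera over roughly planar ground, the inter-frame 2D translation can be estimated robustly from matched feature displacements — here with a coordinate-wise median to tolerate outlier matches. The full method estimates a homography from SIFT/FREAK matches; this illustrates only the robust motion-estimation idea.

```python
# Illustrative robust translation estimate from feature matches (editorial sketch).
from statistics import median

def estimate_translation(matches):
    """matches: list of ((x1, y1), (x2, y2)) matched feature positions."""
    dx = median(x2 - x1 for (x1, _), (x2, _) in matches)
    dy = median(y2 - y1 for (_, y1), (_, y2) in matches)
    return dx, dy

# Four good matches shifted by (5, -2) and one gross outlier.
matches = [((0, 0), (5, -2)), ((10, 3), (15, 1)),
           ((4, 8), (9, 6)), ((7, 7), (12, 5)), ((1, 1), (90, 90))]
print(estimate_translation(matches))  # (5, -2): the outlier is ignored
```

A homography generalizes this to rotation, scale and perspective; the median here plays the role that RANSAC typically plays in the full system.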

Paper 160: Vision Based LIDAR Segmentation for Scan Matching and Camera Fusion

Author(s): Gabriel Burtin, Patrick Bonnin, Florent Malartre

A vision-based algorithm brings a fast segmentation process to a 2D lidar point cloud. The extracted features allow us to set up a segment-based scan matcher. This matching is one of the steps for localization. Features also give semantic information about the environment. The detection of a corner or a door indicates a potential encounter with human beings. Aware of this "danger" area, the robot is able to adapt its speed and define areas of focus for the vision algorithms. Indeed, vision is known for its heavy computational load. The lidar gives a focus area in the image, reducing the number of pixels to be analysed.

Paper 162: Background Subtraction with Multispectral Images using Codebook Algorithm

Author(s): Rongrong Liu, Yassine Ruichek, Mohammed El Bagdouri

Detecting moving objects from a video sequence based on conventional images has reached a significant level of maturity, with some practical success. However, performance may degrade under illumination changes. To address this, the advantages of using multispectral images for background subtraction are investigated in this paper and tested over several multispectral videos using the codebook algorithm. Experimental results show that multispectral images represent a viable and better alternative to conventional images in the search for robust detection and motion analysis of moving objects.
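A heavily simplified, per-pixel codebook can be sketched as follows: each pixel keeps a list of intensity "codewords" (here just min/max ranges) learned from background frames, and at test time a value matching no codeword is foreground. The actual codebook algorithm also models color distortion and codeword staleness; this keeps only the core idea.

```python
# Illustrative per-pixel codebook background model (editorial sketch).

def train_codebook(samples, tol=10):
    words = []  # list of [lo, hi] intensity ranges
    for v in samples:
        for w in words:
            if w[0] - tol <= v <= w[1] + tol:
                w[0], w[1] = min(w[0], v), max(w[1], v)
                break
        else:
            words.append([v, v])  # no codeword matched: start a new one
    return words

def is_foreground(v, words, tol=10):
    return not any(w[0] - tol <= v <= w[1] + tol for w in words)

# A pixel that alternates between shadow (~30) and sunlight (~200).
cb = train_codebook([30, 32, 198, 200, 29, 201])
print(is_foreground(31, cb))   # False: matches the shadow codeword
print(is_foreground(120, cb))  # True: neither background mode explains it
```

The multi-modal codewords are what let the codebook tolerate the illumination changes the abstract mentions; the multispectral extension matches codewords per spectral band.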

Paper 164: Full Screen Touch Detection for the Virtual Touch Screen

Author(s): Katsuto Nakajima, Takafumi Igarashi

The authors have proposed a virtual touch screen using a projector and a camera, which provides touch buttons beside the main content on the projection surface. The camera monitors the user's touches on the buttons. To achieve a degree of operability comparable to touch panels, we present a method that enables touches to be detected at any position within the projected main content, using a depth camera to extract the user's hand and finger about to touch and an RGB camera to detect the finger's touch on the screen. We also report evaluation results on touch accuracy and processing speed.

Paper 165: Image Ridge Denoising using No-Reference Metric

Author(s): Nikolay Mamaev, Dmitry Yurin, Andrey Krylov

Image denoising methods depend on inner parameters that control filter strength, so the problem of choosing these parameters arises. Parameter optimization can be done in ridge areas, where we analyze their appearance in the difference between the original noisy image and the filtered image (the so-called method noise image). If this difference is irregular, the filtering strength can be increased; if regular components appear in the method noise, the filtering strength is too large. We use mutual information, closely connected with conditional entropy, for the analysis, and consider images corrupted with Gaussian-like noise with a small correlation radius. A ridge detection approach based on the analysis of Hessian matrix eigenvalues is used to estimate the sizes and directions of characteristic image details. Retinal images containing many ridges of different scales and directions from the DRIVE database, and general images from the TID2008 database with added controlled Gaussian noise, were used for testing with the NLM and LJNLM-LR methods.
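Hessian-based ridge detection can be sketched at a single pixel: the 2x2 Hessian is estimated with finite differences, and a large-magnitude negative eigenvalue paired with a near-zero one indicates a bright ridge (strong curvature across the ridge, none along it). This is a generic illustration, not the paper's detector.

```python
# Illustrative Hessian eigenvalue computation at one pixel (editorial sketch).
from math import sqrt

def hessian_eigenvalues(img, y, x):
    # Second derivatives by central finite differences.
    dxx = img[y][x - 1] - 2 * img[y][x] + img[y][x + 1]
    dyy = img[y - 1][x] - 2 * img[y][x] + img[y + 1][x]
    dxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[dxx, dxy], [dxy, dyy]].
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    d = sqrt(max(0.0, tr * tr / 4 - det))
    return tr / 2 - d, tr / 2 + d  # (smaller, larger) eigenvalue

# A bright vertical ridge along the middle column.
img = [[0, 10, 0],
       [0, 10, 0],
       [0, 10, 0]]
lo, hi = hessian_eigenvalues(img, 1, 1)
print(lo, hi)  # -20.0 0.0: curvature across the ridge, none along it
```

In practice the Hessian is computed at multiple Gaussian smoothing scales, so the eigenvalue pattern also yields the size and orientation of each detail, as the abstract describes.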

Paper 168: Shape Acquisition System Using a Handheld Line Laser Pointer without Markers

Author(s): Jae Hean Kim, Hyun Kang, Jin Sung Choi, Chang Joon Park

We describe a 3D shape acquisition system composed of a camera and a line laser pointer. A low-cost laser 3D scanner is proposed with a simple tool to solve the positioning problem for triangulation-based 3D reconstruction. In contrast to previous works that require a special platform or background geometry, the proposed system has the advantages of portability and easy maintenance. Moreover, since an auto-calibration method is presented, the proposed system is more convenient than previous works and has an advantage in maintaining accuracy.

Paper 170: Proposal of a Segmentation Method adapted to the Infrared Sensor

Author(s): Felix Polla, Kamal Boudjelaba, Bruno Emile, Helene Laurent

In this paper, we show how to use spatial information from infrared sensor images to obtain better segmentation results and how to use temporal information to improve object detection. The proposed method exploits the segmentation results of existing methods to improve the detection of the shape and contour of the moving object in the image. In a comparative study, we use five state-of-the-art segmentation methods, to which we add our proposal, to obtain good performance. For the very specific images obtained at the output of the infrared sensor, it appears that using a thresholding segmentation algorithm combined with our approach allows a pixel to be better characterized as background or not. The ultimate objective of this work is to exploit these results in order to classify the activity of persons present in the scene.

Paper 171: Robust Tracking in Weakly Dynamic Scenes

Author(s): Trevor Gee, Rui Gong, Patrice Delmas, Georgy Gimel'farb

Estimating the inter-frame motion of a freely moving camera is important for the reconstruction of large 3-D scenes from one or more sequences of frames. This work focuses on scenes with a mixture of dynamic and static elements and proposes an approach to improve tracking in existing 3-D reconstruction algorithms, as well as to provide a basis for new types of 3-D reconstructions that are able to construct scenes of moving objects. The main strategy adopted in this work is to group feature points within fixed-size blocks in the image and then to prune groups whose motion deviates from the dominant motions established through majority voting. Our experiments show that the proposed approach performs well in several outdoor dynamic scenes, significantly outperforming typical feature-based and direct pose estimation techniques in footage with moving elements.

Paper 175: Prostate size Inference from Abdominal Ultrasound Images with Patch Based Prior Information

Author(s): Nur Albayrak, Emrah Yildirim, Yusuf Akgul

Prostate size inference from abdominal ultrasound images is crucial for many medical applications, but it remains a challenging task due to very weak prostate borders and high image noise. This paper presents a novel method that enforces image patch prior information on multi-task deep learning, followed by a global prostate shape estimation. The patch prior information is learned by multi-task Deep Convolutional Neural Networks (DCNNs) trained on multi-scale image patches to capture both local and global image information. We produce tens of thousands of image patches for training the DCNN, which requires a large amount of training data that is usually not available for medical images. The three learned tasks for the DCNN are the distance between the patch center and the nearest contour point, the angle of the line segment between the patch center and the prostate center, and the contour curvature value at the patch center. At prostate shape inference time, the labels returned by the multi-task DCNN are used in a global shape fitting process to obtain the final prostate contours, which are then used for size inference. We performed experiments on transverse abdominal ultrasound images, which are very challenging for automatic analysis.

Paper 179: Object Tracking using Deep Convolutional Neural Networks and Visual Appearance Models

Author(s): Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia

In this paper we introduce a novel single object tracking method that extends the traditional GOTURN algorithm with a visual attention model. The proposed approach returns accurate object tracks and is able to handle sudden camera and background movement, long-term occlusions, and multiple moving objects that can evolve simultaneously in the same neighborhood. Occlusion identification is performed using image quad-tree decomposition and patch matching, based on a convolutional neural network trained offline. The object appearance model is adaptively modified over time based on both visual similarity constraints and trajectory verification tests. The experimental evaluation performed on the VOT 2016 dataset demonstrates the efficiency of our method, which returns high accuracy scores regardless of the scene dynamics or object shape.

Paper 180: An Enhanced Multi-Label Random Walk for Biomedical Image Segmentation using Statistical Seed Generation

Author(s): Ang Bian, Aaron Scherzinger, Xiaoyi Jiang

Image segmentation is one of the fundamental problems in biomedical applications and is often mandatory for quantitative analysis in the life sciences. In recent years, the amount of biomedical image data has increased significantly, rendering manual segmentation approaches impractical for large-scale studies. In many cases, the use of semi-automated techniques is convenient, as these approaches allow domain knowledge of experts to be incorporated into the segmentation process. The random walker framework is among the most popular semi-automated segmentation algorithms, as it can easily be applied to multi-label situations. However, this method usually requires manual input for each individual image and, even worse, for each disconnected object. This is problematic for segmenting multiple unconnected objects, like individual cells, or very fine anatomical structures. Here, we propose a seed generation scheme as an extension to the random walker framework. Our method needs only a few manual labels to generate a sufficient number of seeds for reliably segmenting multiple objects of interest, or even a series of images or videos from an experiment. We show that our method is robust to parameter settings and evaluate its performance on both synthetic and real-world biomedical image data.

Paper 182: Face Detection in Thermal Infrared Images: A Comparison of Algorithm- and Machine-Learning-Based Approaches

Author(s): Marcin Kopaczka, Jan Nestler, Dorit Merhof

In recent years, thermal infrared imaging has gained increasing attention in person monitoring tasks due to its numerous advantages, such as illumination invariance and its ability to monitor vital parameters directly. Many of these applications require facial region monitoring, and in this context several methods for face detection in thermal infrared images have been developed. Nearly all of these approaches make use of specific properties of facial images in the thermal infrared domain, such as local temperature maxima in the eye area or the fact that human bodies usually radiate at a higher temperature than the backgrounds used. On the other hand, a number of well-performing methods for face detection in the visual spectrum have been introduced in recent years. These approaches use state-of-the-art algorithms from machine learning and feature extraction to detect faces in photographs and videos. So far, only one of these algorithms has been successfully applied to thermal infrared images. In our work, we therefore analyze how a larger number of these algorithms can be adapted to thermal infrared images and show that many recently introduced algorithms for face detection in the visual spectrum can be trained to work in the thermal spectrum when an appropriate training database is available. Our evaluation shows that these machine-learning-based approaches outperform thermal-specific solutions in terms of detection accuracy and false positive rate. In conclusion, we show that well-performing methods introduced for face detection in the visual spectrum can also be used for face detection in thermal infrared images, making dedicated thermal-specific solutions unnecessary.

Paper 183: Bimodal Person Re-identification in Multi-camera System

Author(s): Hazar Mliki, Mariem Naffeti, Emna Fendri

This paper introduces a new method to enhance person re-identification by combining person appearance and face modalities in a multi-camera system. The use of the face modality requires a preprocessing step of face pose estimation; we therefore propose a new method for face pose estimation in a low-resolution context. The extraction of the person appearance signature is performed on automatically selected discriminant stripes. We evaluated the proposed pose estimation method, as well as the re-identification process based on appearance and face modalities, on the challenging VIPeR database. The experimental results show that combining the person appearance and face modalities leads to promising results.

Paper 188: Texturizing and Refinement of 3D City Models with Mobile Devices

Author(s): Ralf Gutbell, Hannes Kuehnel, Arjan Kuijper

Building recognition from images and video streams of mobile devices, used to texturize and refine an existing 3D city model, is an open challenge, since such models most often do not completely represent the actual buildings. We present ways to extract buildings from images that enable improvement of the existing model. The approach is based on edge detection in images to detect walls; on pure use of sensor data, by having a server render the 3D model from the current position as an overlay on the video stream; and on structure-from-motion algorithms to create point clouds and recognize a building with the support of the device's sensors. We show that we are thus able to texturize and refine an existing 3D city model.

Paper 193: A CSF-based Preprocessing Method for Image Deblurring

Author(s): Maria Carmela Basile, Vittoria Bruni, Domenico Vitulano

This paper aims at increasing the visual quality of a blurred image according to the contrast sensitivity of a human observer. The main idea is to enhance those image details which can be perceived by a human observer, without introducing annoying visible artifacts. To this aim, an adaptive wavelet decomposition is applied to the original blurry image. This decomposition splits the frequency axis into subbands whose central frequency and amplitude width are built according to the contrast sensitivity. The detail coefficients of that decomposition are then properly modified according to the just noticeable contrast in each frequency band. Preliminary experimental results show that the proposed method increases the visual quality of the blurred image without introducing visible artifacts. In addition, the contrast-sensitivity-based image is a good and recommended initial guess for iterative deblurring methods, since it allows them to significantly reduce ringing artifacts and halo effects in the final image.

Paper 195: Multi-view Pose Estimation with Flexible Mixtures-of-Parts

Author(s): Emre Dogan, Gonen Eren, Christian Wolf, Eric Lombardi, Atilla Baskurt

We propose a new method for human pose estimation which leverages information from multiple views to impose a strong prior on articulated pose. The novelty of the method concerns the types of coherence modeled. Consistency is maximized over the different views through different terms modeling classical geometric information (coherence of the resulting poses) as well as appearance information which is modeled as latent variables in the global energy function. Experiments on the HumanEva dataset show that the proposed method significantly decreases the estimation error compared to single-view results and attains a 3D PCP score of 86%.

Paper 201: Image Annotation Extension Based on User Feedback

Author(s): Abdessalem Bouzaieni, Salvatore Tabbone

In this paper, we propose a probabilistic graphical model for image annotation extension. The aim is to extend the annotations of a small subset of images to a whole dataset. Since the performance of our system depends on the quality of the learning, we integrate the user at three levels to improve annotation quality and minimize the laborious manual annotation effort. The first level concerns the learning set: we introduce an iterative loop in which annotations are automatically extended and some are corrected manually by the user. At the second level, after the annotation extension and during a retrieval step, a user can correct or add labels to some images; these images, with their new labels, are progressively introduced to the system and used to incrementally relearn the model. At the third level, we propose active learning of our model to select the most informative data, improving the quality of learning and reducing manual effort.

Paper 203: Fast Ground Detection for Range Cameras on Road Surfaces using a Three-Step Segmentation

Author(s): Izaak Van Crombrugge, Ibrahim Ben Azza, Rudi Penne, Gregory Van Barel, Steve Vanlanduit

We present in this paper a fast and simple free-floor detection method. Compared to existing methods, the proposed method can handle non-planar camera motion by means of a three-step procedure. A fast initial segmentation is followed by an intermediate floor plane estimation to adapt to the camera motion; a final segmentation is then done using the estimated plane. This allows correct segmentation even when the camera moves up and down, tilts, or rolls. Outdoor measurements of a road surface were performed with a Time-of-Flight camera mounted in front of a car. The measurements contain three types of road surface: concrete, stone, and asphalt. The proposed segmentation takes less than 1.25 ms per frame for range images with a resolution of 176 by 132 pixels, making it fit for real-time applications. The resulting accuracy is higher than the state of the art.
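The intermediate plane-estimation and final-segmentation steps can be sketched as follows. The paper's exact fitting procedure is not given, so this assumes a simple least-squares plane z = ax + by + c through candidate floor points and a hypothetical distance tolerance `tol`:

```python
import numpy as np

def fit_floor_plane(points):
    """Least-squares fit of a plane z = a*x + b*y + c through candidate
    floor points (an N x 3 array); returns the coefficients (a, b, c)."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs

def segment_floor(points, coeffs, tol=0.05):
    """Final segmentation: flag points whose vertical distance to the
    estimated plane is below `tol` as floor."""
    a, b, c = coeffs
    residual = np.abs(points[:, 2] - (a * points[:, 0] + b * points[:, 1] + c))
    return residual < tol
```

Re-estimating the plane on every frame is what lets such a scheme follow the camera as it tilts or rolls, instead of assuming a fixed ground plane.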

Paper 206: JND-guided Perceptual Pre-filtering for HEVC Compression of UHDTV Video Contents

Author(s): Eloïse Vidal, François-Xavier Coudoux, Patrick Corlay, Christine Guillemot

Recently, two new perceptual filters have been proposed as pre-processing techniques to reduce the bitrate of compressed video contents at constant visual quality. The proposed perceptual filters rely on two novel adaptive filters (called BilAWA and TBil) which combine the good properties of the bilateral and Adaptive Weighting Average (AWA) filters. Moreover, these adaptive filters are guided by a just-noticeable distortion (JND) model to adaptively control the strength of the filtering process, taking into account the properties of the human visual system. In this paper, we study the performance of these two pre-filtering algorithms in the context of Ultra High-Definition (UHD) video contents compressed by means of the state-of-the-art HEVC coding standard. Extensive psychovisual evaluation tests conducted on several UHD-TV sequences are presented in detail. Results show that applying the proposed pre-filters prior to HEVC encoding of UHD video contents leads to bitrate savings of up to 23% at the same perceived visual quality.

Paper 209: Visual versus Textual Embedding for Video Retrieval

Author(s): Danny Francis, Paul Pidou, Bernard Merialdo, Benoît Huet

This paper compares several approaches to natural language access to video databases. We present two main strategies. The first one is visual, and consists in comparing keyframes with images retrieved from Google Images. The second one is textual, and consists in generating a text-based description of the keyframes and comparing these descriptions with the query. We study the effect of several parameters and find that substantial improvement is possible by choosing the right strategy for a given topic. Finally, we investigate a method for choosing the right approach for a given topic.

Paper 210: Learning Siamese Features for Finger Spelling Recognition

Author(s): Bogdan Kwolek, Shinji Sako

This paper is devoted to finger spelling recognition on the basis of images acquired by a single color camera. Recognition is performed on the basis of learned low-dimensional embeddings, calculated both by single and by multiple siamese convolutional neural networks. We train classifiers operating on such features, as well as convolutional neural networks operating on raw images. The evaluations are performed on a freely available dataset with finger spellings of Japanese Sign Language. The best results are achieved by a classifier trained on the concatenated features of multiple siamese networks.

Paper 212: Extracting Relevant Features from Videos for a Robust Smoke Detection

Author(s): Olfa Besbes, Amel Benazza-Benyahia

In this paper, we propose a novel smoke detector based on relevant spatio-temporal features that depict the smoke's dynamic appearance. Since smoke is a dynamic texture that can also be partially transparent, its detection involves two steps. First, moving pixels are detected using an adaptive background subtraction technique. Then, spatio-temporal features, measuring color and texture changes due to smoke in the underlying scene, are exploited to robustly recognize smoke regions. The novelty consists in addressing this two-class classification task by an entropy-based combination of two complementary classifiers using appropriate color and texture features. Furthermore, a sample-based background modeling with a bag-of-visual-words representation makes the smoke detection not only discriminant but also robust against outdoor conditions. Experimental results indicate that our method exhibits good robustness under challenging conditions.

Paper 214: Cell-based Approach for 3D Reconstruction from Incomplete Silhouettes

Author(s): Maarten Slembrouck, Peter Veelaert, Dimitri Van Cauwelaert, David Van Hamme, Wilfried Philips

Shape-from-silhouettes is a widely adopted approach to compute accurate 3D reconstructions of people or objects in a multi-camera environment. However, such algorithms are traditionally very sensitive to errors in the silhouettes due to imperfect foreground-background estimation or to occluding objects appearing in front of the object of interest. We propose a novel algorithm that still provides high-quality reconstructions from incomplete silhouettes. At the core of the method is the partitioning of the reconstruction space into cells, i.e. regions with uniform camera and silhouette coverage properties. A set of rules is proposed to iteratively add cells to the reconstruction based on their potential to explain discrepancies between silhouettes in different cameras. Experimental analysis shows significantly improved F1-scores over standard leave-M-out reconstruction techniques.

Paper 215: 3D Visualization of Radioactive Sources by Combining Depth Maps Generated from a Stereo Radiation Detection Device

Author(s): Pathum Rathnayaka, Seung-Hae Baek, Soon-Yong Park

This paper proposes a simple yet efficient idea to visualize the shape and complete spatial distribution of radioactive sources. We used our previously studied gamma radiation detection device to capture stereo images of radioactive sources (using a gamma camera) and of the 2D vision scanning environment (using a vision camera). To generate a complete 3D model of the radioactive sources and to see their distribution in the scanning area, we captured stereo images at different locations by freely moving the detector. We used the well-known Semi-Global Block Matching algorithm to create both radiation and vision disparity images at each location and generated the corresponding 3D reconstruction results by calculating depth values. Color-ICP-based registration is performed on each individual reconstruction result to generate integrated 3D models of the spatial area and the radioactive sources separately. These two integrated models are then merged to visualize the complete shape of the radioactive sources and their distribution in the scanning environment. A graphical example shows that the proposed method is capable of generating neat 3D models in real time.
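The depth-from-disparity step mentioned above follows the standard stereo triangulation relation Z = f·B/d. The sketch below illustrates that conversion only; the detector's actual calibration (focal length, baseline) is not given in the abstract, so the parameter values in the usage are hypothetical:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map to metric depth via triangulation,
    Z = focal_px * baseline_m / d. Invalid (non-positive) disparities
    are mapped to infinity."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical calibration: 500 px focal length, 0.1 m baseline.
# A 10 px disparity then corresponds to a depth of 5 m.
depth_map = disparity_to_depth([[10.0, 0.0]], focal_px=500.0, baseline_m=0.1)
```

Such per-pixel depth values are what a disparity image from Semi-Global Block Matching would be turned into before any 3D reconstruction or ICP registration.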

Paper 216: Human Face Detection Improvement using Incremental Learning Based on Low Variance Directions

Author(s): Takoua Kefi, Riadh Ksantini, Mohammed Kaaniche, Adel Bouhoula

Systems that rely on face detection have gained great importance, since large-scale databases of thousands of face images are collected from several sources; the use of a high-performing face detector thus becomes a challenging problem. Different classification models have been studied and applied to face detection. However, such models involve large-scale datasets, which require huge memory and an enormous amount of training time. Therefore, in this paper, we investigate the potency of incrementally projecting data onto low variance directions. In fact, in one-class classification, the low variance directions in the training data carry crucial information for building a good model of the target class. On the other hand, incremental learning is known to be powerful when dealing with dynamic data. We performed extensive tests on human faces, and comparative experiments have been carried out to show the effectiveness and superiority of our proposed method over other face detection methods.

Paper 219: Counting Large Flocks of Birds using Videos Acquired with Hand-held Devices

Author(s): Amanda Dash, Alexandra Branzan Albu

Due to the rapidly increasing quality of cameras and processing power in smartphones, citizen scientists can play a more significant role in environmental monitoring and ecological observations. Determining the size of large bird flocks, like those observed during migration seasons, is important for monitoring the abundance of bird populations as wildlife habitats continue to shrink. This paper describes a pilot study aimed at automatically counting birds in large moving flocks, filmed using hand-held devices. Our proposed approach integrates motion analysis and segmentation methods to cluster and count birds from video data. Our main contribution is the design of a bird counting algorithm that requires no human input, and functions well for videos acquired in non-ideal conditions. Experimental evaluation is performed using ground truth of manual annotations and bird counts, and shows promising results.

Paper 224: Leaves Segmentation in 3D Point Cloud

Author(s): William Gelard, Ariane Herbulot, Michel Devy, Philippe Debaeke, Ryan McCormick, Sandra Truong, John Mullet

This article presents a 3D plant segmentation method with the emphasis put especially on leaf segmentation. The method is part of a 3D plant phenotyping project whose main objective deals with the evolution of the leaf area over time. First, a 3D point cloud of a plant is obtained with a Structure from Motion technique; then, the main parts of the plant, here the stem and the leaves, are segmented in the 3D point cloud. As the main objective is to compute the leaf area over time, the emphasis was put on the segmentation and labelling of the leaves. This article presents an original approach which starts by finding the stem in a 3D point cloud and then the leaves. Moreover, the method relies on a model of the plant as well as on agronomic rules to assign each leaf a unique label that does not change over time. The method is evaluated on two plants, sunflower and sorghum.

Paper 225: Handling Noisy Labels in Gaze-Based CBIR System

Author(s): Stéphanie Lopez, Arnaud Revel, Diane Lingrand, Frédéric Precioso

Handling noisy labels in classification is a core topic, given the number of images available online with imprecise or even inaccurate labels. In our context, the label uncertainty is obtained by a fully gaze-based labelling process, called GBIE. We apply a noisy-label-tolerant algorithm, P-SVM, which combines classification and regression processes. We have determined, among different strategies, a criterion of reliability to discriminate the most reliable labels, involved in the classification, from the most uncertain ones, involved in the regression. The classification accuracy of the P-SVM is evaluated in different learning contexts and can even compete in some cases with the baseline, i.e. a standard classification SVM trained with the true-class labels.

Paper 226: Multi-Camera Finger Tracking and 3D Trajectory Reconstruction for HCI Studies

Author(s): Vadim Lyubanenko, Toni Kuronen, Tuomas Eerola, Lasse Lensu, Heikki Kälviäinen, Jukka Häkkinen

Three-dimensional human-computer interaction has the potential to form the next generation of user interfaces and to replace the current 2D touch displays. To study and to develop such user interfaces, it is essential to be able to measure how a human behaves while interacting with them. In practice, this can be achieved by accurately measuring hand movements in 3D by using a camera-based system and computer vision. In this work, a framework for multi-camera finger movement measurements in 3D is proposed. This includes comprehensive evaluation of state-of-the-art object trackers to select the most appropriate one to track fast gestures such as pointing actions. Moreover, the needed trajectory post-processing and 3D trajectory reconstruction methods are proposed. The developed framework was successfully evaluated in the application where 3D touch screen usability is studied with 3D stimuli. The most sustainable performance was achieved by the Structuralist Cognitive model for visual Tracking tracker complemented with the LOESS smoothing.

Paper 227: Sensing Forest for Pattern Recognition

Author(s): Irina Burciu, Thomas Martinetz, Erhardt Barth

We introduce the Sensing Forest as a novel way of learning how to efficiently sense the visual world for a particular recognition task. The efficiency is evaluated in terms of the resulting recognition performance. We show how the performance depends on the number of sensing values, i.e., the depth of the trees and the size of the forest. We here simulate the sensing process by re-sampling digital images; in future applications one might use dedicated hardware to solve such recognition tasks without acquiring images. We show that our algorithm outperforms traditional Random Forests on the benchmarks MNIST and COIL-100. The basic Sensing Forest is a prototype-based Random Forest with prototypes learned with k-means clustering. Recognition performance can be further increased by using Learning Vector Quantization.

Paper 228: Sliding Window Based Micro-Expression Spotting: A Benchmark

Author(s): Khanh Tran, Xiaopeng Hong, Guoying Zhao

Micro-expressions are very rapid and involuntary facial expressions which indicate suppressed or concealed emotions and can lead to many potential applications. Recently, research in micro-expression spotting has obtained increasing attention. By investigating existing methods, we realize that evaluation standards for micro-expression spotting methods are highly desired. To address this issue, we construct a benchmark for fairer and better performance evaluation of micro-expression spotting approaches. Firstly, we propose a sliding-window-based multi-scale evaluation standard with a series of protocols. Secondly, baseline results of popular features are provided. Finally, we also raise the concern of taking advantage of machine learning techniques.
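A sliding-window spotting benchmark needs a rule for deciding when a spotted window counts as a hit on an annotated micro-expression interval. A common choice (the benchmark's exact protocol may differ) is temporal intersection-over-union between frame intervals, sketched here:

```python
def interval_iou(pred, gt):
    """Temporal IoU between two inclusive frame intervals (start, end).

    A typical spotting criterion would count `pred` as a true positive
    when interval_iou(pred, gt) >= 0.5; the 0.5 threshold is a common
    convention, not necessarily this benchmark's.
    """
    inter = min(pred[1], gt[1]) - max(pred[0], gt[0]) + 1
    if inter <= 0:
        return 0.0
    union = (pred[1] - pred[0] + 1) + (gt[1] - gt[0] + 1) - inter
    return inter / union
```

Evaluating the same predictions at several window scales, as the multi-scale standard suggests, then amounts to repeating this matching with different window lengths and aggregating the resulting precision/recall figures.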

Paper 229: Shearlet-based Region Map Guidance for Improving Hyperspectral Image Classification

Author(s): Mariem Zaouali, Sonia Bouzidi, Ezzeddine Zagrouba

The inclusion of spatial context in hyperspectral image classification has proved its efficiency. However, considering all the neighboring samples might decrease the classification accuracy, mainly if they do not belong to the same homogeneous region. To overcome this issue, a Shearlet-based Region Map Joint Sparse Representation (RM-JSR) is proposed in this paper. Our aim is to reduce the influence of neighboring pixels which do not belong to the same region as the pixel of interest. To do so, we process the image using the Shearlet Transform, keeping only the scale where the edge information is most salient and emphasized in several directions. Then, a K-means algorithm is carried out to segregate the coefficients into edge and non-edge clusters. Afterwards, the Inverse Shearlet Transform is applied to these different clusters, and only the reconstructed edge-cluster bands are fused into a single image. A threshold is applied in order to finally obtain the region map, where homogeneous regions are well delimited. Our proposed method is designed to overcome the fixed window of Simultaneous Orthogonal Matching Pursuit (SOMP), which is an implementation of the Joint Sparse Representation paradigm. Compared to other methods attempting to solve this issue, our proposed method achieves better overall classification accuracies.

Paper 230: A Novel and Accurate Local 3D Representation for Face Recognition

Author(s): Soumaya Mathlouthi, Majdi Jrib, Faarzi Ghorbel

In this paper, we introduce a novel curved 3D face representation. It is constructed on static parts of the face corresponding to the nose and the eyes. Each part is described by the level curves of the superposition of several geodesic potentials generated from multiple reference points. We propose to describe the eye region by a bipolar representation based on the superposition of two geodesic potentials generated from two reference points, and the nose by a three-polar one (three reference points). We use the BU-3DFE database of 3D faces to test the accuracy of the proposed approach. The results obtained in the sense of the Hausdorff shape distance prove the performance of the novel representation for 3D face identification. The obtained scores are comparable to state-of-the-art methods in most cases.

Paper 231: Filling Missing Parts of a 3D Mesh by Fusion of Incomplete 3D Data

Author(s): Laszlo Kormoczi, Zoltan Kato

This paper deals with the problem of fusing different (potentially partial) 3D meshes to fill in missing parts (holes) of an accurate reference 3D model using a less accurate but more complete moving 3D model. Typically, accurate 3D models can be produced by range devices (Lidar), which are often limited in the setting of viewpoints, while traditional Structure from Motion methods use 2D images, which are less restricted in viewpoints but overall produce a less accurate 3D mesh. Combining the advantages of both modalities is an appealing solution to many real-world problems. Herein we propose a novel method which detects holes in the accurate reference mesh; each hole is then filled from the less accurate 3D mesh by gradually estimating local affine transformations around the hole's boundary and propagating them into the inner part. Experimental validation is done on a large real dataset, which confirms the accuracy and reliability of the proposed algorithm.

Paper 232: Image Classification for Ground Traversability Estimation in Robotics

Author(s): R. Omar Chavez-Garcia, Jérôme Guzzi, Luca Gambardella, Alessandro Giusti

Mobile ground robots operating on uneven terrain must predict which areas of the environment they are able to pass in order to plan feasible paths. We cast traversability estimation as an image classification problem: we build a convolutional neural network that, given a square 60x60 px image representing the heightmap of a small 1.2x1.2 m patch of terrain, predicts whether the robot will be able to traverse such patch from bottom to top. The classifier is trained for a specific robot model, which may implement any locomotion type (wheeled, tracked, legged, snake-like), using simulation data on a variety of training terrains; once trained, the classifier can be quickly applied to patches extracted from unseen large heightmaps, in multiple orientations, thus building oriented traversability maps. We quantitatively validate the approach on real-elevation datasets.

Paper 235: Evaluation of Dimensionality Reduction Methods for Remote Sensing Images Using Classification and 3D Visualization

Author(s): Andreea Griparis, Daniela Faur, Mihai Datcu

Visual exploration is a natural way to understand the content of a data archive. If the data are multidimensional, dimensionality reduction is an appropriate preprocessing step before data visualization. In the literature, two types of approaches are devoted to dimensionality reduction: feature selection and feature extraction algorithms. Both techniques project a high-dimensional space into a new one with reduced dimensionality, preserving the data's inherent information. This paper aims to identify the degree of similarity between low- and high-dimensional representations of a data archive, using the optimal number of semantic classes as a criterion. This number is estimated based on rate-distortion theory and is computed both before and after dimensionality reduction. The projection onto the low-dimensional space was obtained using one feature selection and six feature extraction methods.


The software generating these pages is © (not Ghent University), 2002-2026. All rights reserved.

The data on this page is © Acivs 2017. All rights reserved.

The server hosting this website is owned by the department of Telecommunications and Information Processing (TELIN) of Ghent University.

Problems with the website should be reported to .


This page was generated on Thursday April 09th, 2026 16:29:35.