Advanced Concepts for Intelligent Vision Systems
Sept. 24-27, 2018
Espace Mendes France, Poitiers, France
Acivs 2018 Abstracts
Paper 196: Can we classify the world? Where Deep Learning Meets Remote Sensing
Deep learning has recently been gaining significant attention for the analysis of data in multiple domains. It seeks to model high-level knowledge as a hierarchy of concepts. With the exploding amount of available data, the improvement of hardware and the advances in training methodologies, such hierarchies can now contain many more processing layers than before, hence the adoption of the term "deep".
In remote sensing, recent years have witnessed a remarkable increase in the amount of available data, due to a consistent improvement in the spectral, spatial and temporal resolutions of the sensors. Moreover, there are new sources of large-scale open access imagery, governments are releasing their geographic data to the public, and collaborative platforms are producing impressive amounts of cartography. With such an overwhelming amount of information, it is of paramount importance to develop smart systems that are able to handle and analyze these data. The scalability of deep learning and its ability to gain insight from large-scale datasets, makes it particularly interesting to the remote sensing community. It is often the case, however, that the deep learning advances in other domains cannot be directly applied to remote sensing. The type of input data and the constraints of remote sensing problems require the design of specific deep learning techniques.
In this talk, I will discuss how deep learning approaches help in remote sensing image interpretation. In particular, I will focus on the most powerful architectures for semantic labeling of aerial and satellite optical images, with the final purpose of producing and updating world maps.
Paper 197: Earth Observation Big Data Intelligence: the paradigm shift
Earth Observation (EO) Data Intelligence addresses the entire value chain: data processing to extract information, information analysis to gather knowledge, and the transformation of knowledge into value. The tutorial brings a joint understanding of Artificial Intelligence (AI) and Deep Learning methods, indicating integrated optimal solutions for complex EO applications, including the choice or generation of labeled data sets and the influence of biases in validation or benchmarking. EO starts with mission intelligence, therefore focusing on the latent parameters hidden in the process of physical parameter retrieval, such as orbit, illumination, or the imaging process. In this context, specific AI-for-EO methods are addressed in the tutorial for the practical cases of multisensor data fusion and Satellite Image Time Series analytics. The methods are exemplified with actual use cases for multispectral and Synthetic Aperture Radar (SAR) images.
Paper 105: Fusing Omnidirectional Visual Data for Probability Matching Prediction
This work presents an approach to visual data fusion with omnidirectional imaging in the field of mobile robotics. An inference framework is established through Gaussian processes (GPs) and information gain metrics in order to fuse visual data between poses of the robot. This framework permits producing a probability distribution of feature-matching existence in the 3D global reference system. Designed together with a filter-based prediction scheme, this strategy allows us to propose an improved probability-oriented feature matching, since the probability distribution is projected onto the image in order to predict relevant areas where matches are more likely to appear. This approach proves to improve upon standard matching techniques, since it confers adaptability to changing visual conditions by means of the information gain and probability encodings. Consequently, the output data can feed a reliable visual localization application. Real experiments have been conducted with a publicly available dataset in order to confirm the validity and robustness of the contributions. Moreover, comparisons with a standard matching technique are also presented.
Paper 106: Analysis of Neural Codes for Near-Duplicate Detection
An important feature of digital asset management platforms and search engines is the possibility of retrieving near-duplicates of an image given by the user. Near-duplicates can be photos derived from an original photo after a certain transformation, or different photos of the same scene. In this work we analyze the two cases, using convolutional neural networks to calculate the signatures of the images, and introducing a new training set for model creation and some new datasets for performance evaluation. Results on these datasets and on standard datasets for image retrieval are presented and discussed.
Paper 107: Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
This paper benchmarks 16 combinations of popular Deep Neural Networks for 1000-category image recognition and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.
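The abstract does not give the exact form of the Figure of Merit, so the sketch below is an illustrative assumption: a linear combination of normalized accuracy, throughput and inverse power, with weights that are swept to model different application scenarios. The candidate names and numbers are hypothetical.

```python
# Hypothetical Figure of Merit: weighted sum of normalized accuracy,
# throughput and inverse power (lower power is better).  The weights
# w_a + w_t + w_p = 1 define the application scenario being swept.
def figure_of_merit(acc, fps, watts, w_a, w_t, w_p,
                    acc_max=1.0, fps_max=60.0, watts_min=1.0):
    assert abs(w_a + w_t + w_p - 1.0) < 1e-9
    return (w_a * acc / acc_max
            + w_t * fps / fps_max
            + w_p * watts_min / watts)

# hypothetical network/framework combinations: (accuracy, fps, watts)
candidates = {
    "netA/fwX": (0.76, 30.0, 4.0),
    "netB/fwY": (0.70, 55.0, 2.0),
}
# optimum selection for a throughput/power-oriented scenario
best = max(candidates,
           key=lambda k: figure_of_merit(*candidates[k], 0.2, 0.4, 0.4))
```

Sweeping the three weights over a grid and recording `best` at each point reproduces the kind of scenario map the paper reports.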
Paper 109: Person Re-identification with a Body Orientation-Specific Convolutional Neural Network
Person re-identification consists in matching images of a particular person captured in a network of cameras with non-overlapping fields of view. The challenges in this task arise from the large variations of human appearance. In particular, the same person could show very different appearances from different points of view. To address this challenge, in this paper we propose an Orientation-Specific Convolutional Neural Network (OSCNN) framework which jointly performs body orientation regression and extracts orientation-specific deep representations for person re-identification. A robust joint embedding is obtained by combining feature representations under different body orientations. We experimentally show on two public benchmarks that taking into account body orientations improves the person re-identification performance. Moreover, our approach outperforms most of the previous state-of-the-art re-identification methods on these benchmarks.
Paper 113: Matrix Descriptor of Changes (MDC): Activity Recognition Based on Skeleton
A new method called Matrix Descriptor of Changes (MDC) is introduced in this work for the description and recognition of human activity from sequences of skeletons. The primary focus is on one of the main problems in this area, the varying duration of activities; it is assumed that the beginning and the end of each activity are known. Some existing methods solve this problem with bags of features, hidden Markov models or recurrent neural networks, or by straightening the time interval through resampling so that each activity has the same number of frames. The essence of our method is creating one or more matrices of constant size. The sizes of the matrices depend on the dimension of the vector of per-frame low-level features from which each matrix is created. The matrices then characterize the activity, even though activities may have different durations. The principle of this method is tested with two types of input features: (i) 3D positions of the skeleton joints and (ii) invariant angular features of the skeleton. Each feature type is processed by MDC separately and, in the subsequent step, all the information is gathered into a feature vector used for recognition by a Support Vector Machine classifier. Experiments have shown that the results are similar to those of the state-of-the-art methods. The primary contribution of the proposed method is a new, simple descriptor for activity recognition that preserves state-of-the-art results. The method also has potential for parallel implementation and execution.
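The descriptor's key property, a constant-size matrix whatever the sequence duration, can be illustrated with a minimal sketch. The actual MDC construction in the paper is more elaborate; the stand-in below simply accumulates normalized outer products of frame-to-frame feature changes:

```python
def change_matrix(frames):
    """Accumulate frame-to-frame changes of D-dimensional per-frame
    features into a constant D x D matrix, whatever the number of
    frames.  Illustrative stand-in for the MDC construction."""
    d = len(frames[0])
    m = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(frames, frames[1:]):
        delta = [c - p for c, p in zip(cur, prev)]
        for i in range(d):
            for j in range(d):
                m[i][j] += delta[i] * delta[j]
    # normalise by the number of transitions so that activities of
    # different durations yield comparable descriptors
    n = max(len(frames) - 1, 1)
    return [[v / n for v in row] for row in m]

# a 3-frame and a 50-frame sequence give descriptors of the same size
short_m = change_matrix([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
long_m = change_matrix([[0.02 * t, 0.01 * t] for t in range(50)])
```

The fixed output size is what lets a standard SVM consume the descriptor directly, without resampling the activity to a fixed frame count.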
Paper 114: Distributed Estimation of Vector Fields
In many surveillance applications the area of interest is either wide or includes alleys or corners. Thus, the images from multiple cameras need to be combined and this fact motivates the use of distributed optimization approaches. This work proposes three distributed estimation approaches to motion field estimation from target trajectory data: (1) purely decentralized, without communication, (2) distributed estimation based on a cooperative game, and (3) distributed Alternating Direction Method of Multipliers (ADMM). Their performance in estimating different classes of motion fields is important to select the best approach for each application. Experiments using synthetic and real data show that (a) the cooperative game approach is very susceptible to changes in motion direction, and (b) the distributed ADMM approach is the most robust and reliable approach to estimate changing direction motion fields.
Paper 117: L-infinite Predictive Coding of Depth
The paper introduces a novel L-inf-constrained compression method for depth maps. The proposed method performs depth segmentation and depth prediction in each segment, encoding the resulting information as a base layer. The depth residuals are modeled using a Two-Sided Geometric Distribution, and distortion and entropy models for the quantized residuals are derived based on this distribution. A set of optimal quantizers is determined to ensure a fixed rate budget at a minimum L-inf distortion. A fixed-rate L-inf codec design performing context-based entropy coding of the quantized residuals is proposed, which is able to efficiently meet user constraints on rate or distortion. Additionally, a scalable L-inf codec extension is proposed, which enables encoding the quantized residuals in a number of enhancement layers. The experimental results show that the proposed L-inf coding approach substantially outperforms the L-inf coding extension of the state-of-the-art CALIC method.
Paper 119: Identification of Saimaa Ringed Seal Individuals using Transfer Learning
The conservation efforts of the endangered Saimaa ringed seal depend on the ability to reliably estimate the population size and to track individuals. Wildlife photo-identification has been successfully utilized in monitoring various species. Traditionally, the collected images have been analyzed by biologists. However, due to the rapid increase in the amount of image data, there is a demand for automated methods. Ringed seals have pelage patterns that are unique to each seal, enabling individual identification. In this work, two methods of Saimaa ringed seal identification based on transfer learning are proposed. The first method involves retraining an existing convolutional neural network (CNN). The second method uses a CNN trained for image classification to extract features which are then used to train a Support Vector Machine (SVM) classifier. Both approaches show over 90% identification accuracy on challenging image data, the SVM-based method being slightly better.
Paper 120: Comparison of Co-segmentation Methods for Wildlife Photo-identification
Wildlife photo-identification is a commonly used technique to track animal populations over time. Nowadays, due to large image data sets, automated photo-identification is an emerging research topic. To improve the accuracy of identification methods, it is useful to segment the animal from the background. In this paper we evaluate the suitability of co-segmentation methods for this purpose. The basic idea of co-segmentation is to detect and segment the common object in a set of images despite the different appearances of the object and different backgrounds. Such methods provide a promising approach to processing large photo-identification databases, for which manual or even semi-manual approaches are very time-consuming, by making it unnecessary to annotate images for training supervised segmentation methods. We compare existing co-segmentation methods on challenging wildlife photo-identification images and show that the best methods obtain promising results on the task.
Paper 122: 3D Object-Camera and 3D Face-Camera Pose Estimation For Quadcopter Control: Application To Remote Labs
We present the implementation of two visual pose estimation algorithms (object-camera and face-camera) and a control system for a low-cost quadcopter, for an application in a remote electronic laboratory. The objective is threefold: (i) to allow the drone to inspect instruments in the remote lab, (ii) to localize a teacher and center his face in the image for student-teacher remote communication, and (iii) to return home and land on a platform for automatic recharging of the batteries. The object-camera localization system is composed of two complementary visual approaches: (i) a visual SLAM (Simultaneous Localization And Mapping) system, and (ii) a homography-based localization system. We extend the application scenarios of the SLAM system by allowing close-range inspection of a planar electrical instrument and autonomous landing. The face-camera localization system is based on a 3D model of the face and a state-of-the-art 2D facial point detector. Experiments conducted in a remote laboratory workspace are presented. They prove the robustness of the proposed object-camera visual pose system compared to the SLAM system, and they prove the performance of the face-camera visual servoing and pose estimation system in terms of real-time performance, robustness and accuracy.
Paper 125: Directional Beams of Dense Trajectories for Dynamic Texture Recognition
An effective framework for dynamic texture recognition is introduced by exploiting local features and chaotic motions along beams of dense trajectories in which their motion points are encoded by using a new operator based on local vector patterns (LVP) in full-direction on three orthogonal planes. Furthermore, we also exploit motion information from dense trajectories to boost the discriminative power of the proposed descriptor. Experiments on various benchmarks validate the interest of our approach.
Paper 128: Enhanced Line Local Binary Patterns (EL-LBP): An Efficient Image Representation for Face Recognition
Local Binary Patterns (LBP) is one of the efficient approaches for image representation, especially in the face recognition field. The motivation of the present study is to find a compact descriptor which captures texture information and yet is robust against several visual challenges such as illumination variation, facial expressions and head pose variation. The proposed approach, called Enhanced Line Local Binary Patterns (EL-LBP), is an improvement of 1D-Local Binary Patterns (1D-LBP) that reduces the dimension of feature vectors within the 1D-LBP histogram, leading to a decreased time cost during the matching stage. Experiments using the ORL, Yale and AR datasets show that EL-LBP outperforms previous LBP methods in terms of recognition accuracy with a much lower time cost, suggesting that this new representation scheme would be more powerful in embedded vision systems where the computational cost is critical.
Paper 130: NoiseNet: Signal-dependent Noise Variance Estimation with Convolutional Neural Network
In this paper, the problem of blind estimation of uncorrelated signal-dependent noise parameters in images is formulated as a regression problem with uncertainty. It is shown that this regression task can be effectively solved by a properly trained deep convolutional neural network (CNN), called NoiseNet, comprising a regressor branch and an uncertainty quantifier branch. The former predicts the noise standard deviation (STD) for a 32x32 pixel image patch, while the latter predicts the STD of the regressor error. The NoiseNet architecture is proposed and the peculiarities of its training are discussed. Signal-dependent noise parameters are estimated by robust iterative processing of many local estimates provided by the NoiseNet. A comparative analysis on real data from the NED2012 database is carried out. Its results show that the NoiseNet provides better accuracy than existing state-of-the-art methods.
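The final parameter-fitting step can be sketched under the common Poisson-Gaussian assumption var(I) = a + b*I, which is one standard signal-dependent noise model (the abstract does not state the paper's exact model). The (patch mean, noise STD) pairs below stand in for NoiseNet's local predictions, and a plain least-squares fit replaces the paper's robust iterative procedure:

```python
def fit_signal_dependent_noise(means, stds):
    """Least-squares fit of the model  var = a + b * mean  from local
    (patch mean, predicted noise STD) estimates.  Returns (a, b)."""
    ys = [s * s for s in stds]          # regress on variance, not STD
    n = len(means)
    sx, sy = sum(means), sum(ys)
    sxx = sum(x * x for x in means)
    sxy = sum(x * y for x, y in zip(means, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# synthetic local estimates drawn from var = 4 + 0.5 * mean
means = [10.0, 20.0, 30.0, 40.0]
stds = [(4.0 + 0.5 * m) ** 0.5 for m in means]
a, b = fit_signal_dependent_noise(means, stds)
```

A robust variant would iterate this fit, discarding estimates whose residual exceeds their predicted uncertainty, which is where the uncertainty quantifier branch becomes useful.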
Paper 131: Single Sample Face Recognition by Sparse Recovery of Deep-learned LDA Features
Single Sample Per Person (SSPP) face recognition is receiving significant attention due to the challenges it opens, especially when conceived for real applications under unconstrained environments. In this paper we propose a solution combining the effectiveness of deep convolutional neural network (DCNN) feature characterization, the discriminative capability of linear discriminant analysis (LDA), and the efficacy of a sparsity-based classifier built on the k-LiMapS algorithm. Experiments on the public LFW dataset prove the method's robustness to the SSPP problem, outperforming several state-of-the-art methods.
Paper 133: Enhanced Codebook Model and Fusion for Object Detection with Multispectral Images
The codebook model is one of the popular real-time models for object detection. In our previous paper, we extended it to multispectral images. In this paper, two methods to improve on that work are proposed. On the one hand, multispectral self-adaptive parameters and new estimation criteria are exploited to enhance the codebook model. On the other hand, fusion is explored to improve the performance on multispectral images by fusing the detection results of the monochromatic bands. For the enhancement of the codebook model, a self-adaptive parameter estimation mechanism is developed based on the statistical information of the data themselves, which improves the overall performance while saving the time and effort of searching for appropriate parameters. In addition, the Spectral Information Divergence replaces the spectral distortion to evaluate the spectral similarity between two multispectral vectors. Results demonstrate that when the spectral information divergence and brightness criteria are utilized in the self-adaptive codebook method, the performance improves slightly further on average. For the fusion approach, two strategies, namely pooling and majority vote, are adopted to exploit the benefits of each spectral band and obtain better object detection performance.
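Of the two fusion strategies, the majority vote can be sketched directly; per-band binary detection masks are assumed as input (the paper's pooling strategy and exact tie-breaking are not specified in the abstract):

```python
def majority_vote(masks):
    """Fuse per-band binary detection masks (2-D lists of 0/1) by a
    pixel-wise majority vote across the spectral bands."""
    n = len(masks)
    h, w = len(masks[0]), len(masks[0][0])
    return [[1 if 2 * sum(m[y][x] for m in masks) > n else 0
             for x in range(w)] for y in range(h)]

# three single-row band masks; a pixel is kept when at least two bands agree
fused = majority_vote([[[1, 0, 1]], [[1, 1, 0]], [[0, 0, 1]]])
```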
Paper 134: An Efficient Agglomerative Algorithm Cooperating with Louvain Method for Implementing Image Segmentation
The idea of bringing the social network analysis domain into image segmentation has appealed to many authors, and research in this direction is largely in agreement. However, community detection based image segmentation often produces over-segmented results. To address this problem, we propose an efficient algorithm that agglomerates homogeneous regions by considering image histograms whose bins are built from color group properties. Our method is tested on the publicly available Berkeley Segmentation Dataset. Experimental results show that the proposed algorithm produces sizable segments and outperforms most other known image segmentation methods in terms of accuracy and comparative PRI scores.
Paper 139: Integrating UAV in IoT for RoI Classification in Remote Images
The paper presents a cheap and efficient solution for the remote processing of images taken by a team of UAVs (Unmanned Aerial Vehicles). The objective of the work was to implement an integrated system for the detection and classification of regions of interest (RoIs) in the case of flood events. The UAVs are considered objects of the internet, meaning that they are integrated in the IoT (Internet of Things) as intelligent objects. The investigated RoIs are: water, grass, forests, buildings, and roads. A multi-UAV - multi-GCS (Ground Control Station) solution is proposed. Thanks to this integration, land segmentation by image processing can be performed efficiently in real time. For RoI detection and evaluation, a multi-CNN structure is used as a multi-classifier structure. In particular, a CNN classifier is implemented for each type of RoI and all the CNNs work in parallel. The orthophotoplan obtained from the remotely acquired images is successively decomposed into adjacent images of size 6000x4000 pixels and then into overlapping patches of size 65x65 pixels for classifier learning or for testing. Finally, the images are segmented into RoIs by a multi-mask technique and the percentage of each RoI is calculated. The accuracy of segmentation and the processing time, evaluated on 10 real images, were better than in other reported cases.
Paper 141: Unsupervised Perception Model for UAVs Landing Target Detection and Recognition
Today, unmanned aerial vehicles (UAVs) play an interesting role in the so-called Industry 4.0. One of the many problems studied by companies and research groups is sensing the environment intelligently. In this context, we tackle the problem of autonomous landing and, more precisely, the robust detection and recognition of a unique landing target in an outdoor environment. The challenge is to deal with images under non-controlled light conditions affected by shadows, changes of scale, perspective, vibrations, noise and blur, among others. In this paper, we introduce a robust unsupervised model that detects and recognizes a target in a perceptually-inspired manner, using the Gestalt principles of non-accidentalness and grouping. Our model extracts the landing target contours as outliers using the RX anomaly detector and computing proximity and a similarity measure. Finally, we show the use of an error-correcting Hamming code to reduce recognition errors.
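The error-correction step can be illustrated with the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. The bit layout below (parity bits at positions 1, 2 and 4) is the textbook arrangement, not necessarily the code length or layout used in the paper:

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword
    (parity bits at 1-indexed positions 1, 2, 4)."""
    d3, d5, d6, d7 = d
    p1 = d3 ^ d5 ^ d7
    p2 = d3 ^ d6 ^ d7
    p4 = d5 ^ d6 ^ d7
    return [p1, p2, d3, p4, d5, d6, d7]

def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2,3,6,7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-indexed position of the error
    c = list(c)
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

A target ID read from the landing pattern with one mis-detected bit is thus still recognized correctly.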
Paper 146: Scanner Model Identification of Official Documents Using Noise Parameters Estimation in the Wavelet Domain
In this article, we propose a novel approach for discerning which scanner has been used to scan a particular document. Its originality relates to a signature extracted in the wavelet domain of the digitized documents, where the acquisition noise specific to a scanner is located in the first detail subbands. This signature is an estimate of the statistical noise model, which is modeled by a Generalized Gaussian distribution (GGD) whose parameters are estimated in the HH subband by maximizing the likelihood function. These parameters constitute a unique identifier for a scanner. For a given image, we propose to identify its origin by minimizing the Kullback-Leibler divergence between its signature and those of known scanners. Experiments conducted on a real scanned-image database, developed for the validation of the work presented in this paper, show that the proposed approach achieves high detection performance. A total of 1000 images were used in the experiments.
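The decision rule admits a closed form: the Kullback-Leibler divergence between two zero-mean Generalized Gaussian distributions is analytic (Do and Vetterli, 2002). A minimal sketch, with hypothetical scanner signatures of the form (scale alpha, shape beta):

```python
from math import gamma, log

def ggd_kl(a1, b1, a2, b2):
    """Closed-form KL divergence between two zero-mean Generalized
    Gaussian distributions with scales a1, a2 and shapes b1, b2
    (Do & Vetterli, 2002)."""
    return (log((b1 * a2 * gamma(1.0 / b2)) / (b2 * a1 * gamma(1.0 / b1)))
            + (a1 / a2) ** b2 * gamma((b2 + 1.0) / b1) / gamma(1.0 / b1)
            - 1.0 / b1)

def identify_scanner(signature, known):
    """Pick the known scanner whose (alpha, beta) signature minimizes
    the KL divergence to the query image's signature."""
    a, b = signature
    return min(known, key=lambda name: ggd_kl(a, b, *known[name]))
```

The divergence is zero for identical parameters and strictly positive otherwise, so nearest-signature matching is well defined.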
Paper 147: Orthogonally-Divergent Fisheye Stereo
An integral part of driver assistance technology is surround view (SV), a system which uses four fisheye (wide angle) cameras on the front, right, rear, and left sides of a vehicle to completely capture the surroundings. Inherent in SV are four wide baseline, orthogonally divergent fisheye stereo systems, from which, depth information may be extracted and used in 3D scene understanding. Traditional stereo approaches typically require fisheye distortion removal and stereo rectification for efficient correspondence matching. However, such approaches suffer from loss of data and cannot account for widely disparate appearances of objects in corresponding views. We introduce a novel method for computing depth from fisheye stereo that uses an understanding of the underlying lens models and a convolutional network to predict correspondences. We also built a synthetic database for developing and testing fisheye stereo and SV algorithms. We demonstrate the performance of our depth estimation method on this database.
Paper 148: Large Parallax Image Stitching Using an Edge-Preserving Diffeomorphic Warping Process
Image stitching is a hard task to solve in the presence of large parallax in video frames. In many cases, video frames shot with hand-held cameras have low resolution, blur and large parallax errors. Most recent works fail to align such a sequence of images accurately. The proposed method aims to accurately align image frames by employing a novel demon-based, edge-preserving diffeomorphic registration, termed "DiffeoWarps". The first stage aligns the images globally using a perspective (homography) transformation. At the second stage, an alternating minimization of a correspondence energy and a TV-regularization improves the alignment. The "diffeowarped" images are then blended to obtain good quality stitched results. We experimented on two standard datasets as well as on a dataset comprising 10 sets of images/frames collected from unconstrained videos. Both qualitative and quantitative performance analyses show the superiority of our proposed method.
Paper 152: Reconfigurable FPGA Implementation of the AVC Quantiser and Dequantiser Blocks
As image and video resolution continues to increase, compression plays a vital role in the successful transmission of video and image data over a limited-bandwidth channel. Computational complexity, as well as the utilization of resources and power, keeps increasing as we move from the H.264 codec to the H.265 codec. Optimizations in each particular block of the Advanced Video Coding (AVC) standard significantly improve the operating frequency of a hardware implementation. In this paper, we design parametrized reconfigurable quantiser and dequantiser blocks of AVC through dynamic circuit specialization, which differs from traditional FPGA reconfiguration. We implemented the design on a Zynq-SoC board, which resulted in resource consumption reductions of 14.1% and 20.6% for the quantiser and dequantiser blocks respectively, compared to non-reconfigurable versions.
Paper 153: Learning Morphological Operators for Depth Completion
Depth images generated by direct projection of LiDAR point clouds on the image plane suffer from a great level of sparsity which is difficult to interpret by classical computer vision algorithms. We propose a method for completing sparse depth images in a semantically accurate manner by training a novel morphological neural network. Our method approximates morphological operations by Contraharmonic Mean Filter layers which are easily trained in a contemporary deep learning framework. An early fusion U-Net architecture then combines dilated depth channels and RGB using multi-scale processing. Using a large scale RGBD dataset we are able to learn the optimal morphological and convolutional filter shapes that produce an accurate and fully sampled depth image at the output. Independent experimental evaluation confirms that our method outperforms classical image restoration techniques as well as current state-of-the-art neural networks. The resulting depth images preserve object boundaries and can easily be used to augment various tasks in intelligent vehicles perception systems.
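The contraharmonic mean filter at the heart of the approach has a simple limiting behaviour that makes it a trainable surrogate for morphology: for large positive orders it approaches a max filter (dilation), for large negative orders a min filter (erosion). A 1-D sketch on strictly positive values (the paper applies the 2-D version inside a neural network, with the order learned):

```python
def chm_filter(signal, q, radius=1):
    """1-D contraharmonic mean filter of order q over a sliding window.
    Large positive q approximates dilation (max), large negative q
    approximates erosion (min) -- the property the trained layers use."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        num = sum(v ** (q + 1) for v in window)
        den = sum(v ** q for v in window)
        out.append(num / den)
    return out
```

Because the output is a smooth function of the order q and of the inputs, gradients flow through it, unlike through a hard max or min.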
Paper 154: Robust Geodesic Skeleton Estimation from Body Single Depth
In this paper, we introduce a novel and robust body pose estimation method from a single depth image, which provides the skeletal configuration of the body with significant accuracy even under severe body deformations. For precise identification, we propose a novel feature descriptor based on a geodesic path over the body surface, accumulating a sequence of characters corresponding to the path vectors along body deformations, referred to as GPS (Geodesic Path Sequence). We also incorporate the length of each GPS into a joint entropy-based objective function representing both class and structural information, instead of the typical objective considering only class labels, in training the random forest classifier. Furthermore, we exploit a skeleton matching method based on the geodesic extrema of the body, which makes the method more robust to joint misidentification. The proposed solutions yield more spatially accurate predictions for the body parts and skeletal joints. Numerical and visual experiments with our generated data confirm the usefulness of the method.
Paper 155: I-HAZE: a Dehazing Benchmark with Real Hazy and Haze-free Indoor Images
Image dehazing has become an important computational imaging topic in recent years. However, due to the lack of ground truth images, the comparison of dehazing methods is neither straightforward nor objective. To overcome this issue we introduce I-HAZE, a new dataset that contains 35 image pairs of hazy and corresponding haze-free (ground-truth) indoor images. Different from most of the existing dehazing databases, the hazy images have been generated using real haze produced by a professional haze machine. To ease color calibration and improve the assessment of dehazing algorithms, each scene includes a MacBeth color checker. Moreover, since the images are captured in a controlled environment, both haze-free and hazy images are captured under the same illumination conditions. This represents an important advantage of the I-HAZE dataset, as it allows us to objectively compare existing image dehazing techniques using traditional image quality metrics such as PSNR and SSIM.
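Of the two metrics the dataset enables, PSNR reduces to a one-liner over the mean squared error between the haze-free ground truth and a dehazed result (images are assumed here as flat 8-bit pixel lists; SSIM is more involved and omitted):

```python
from math import log10

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a ground-truth image and
    a dehazed result, both given as flat lists of pixel values."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * log10(peak * peak / mse)
```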
Paper 156: Two-Camera Synchronization and Trajectory Reconstruction for a Touch Screen Usability Experiment
This paper considers the usability of stereoscopic 3D touch displays. For this purpose extensive subjective experiments were carried out and the hand movements of test subjects were recorded using a two-camera setup consisting of a high-speed camera and a standard RGB video camera with different viewing angles. This produced a large amount of video data that is very laborious to analyze manually which motivates the development of automated methods. In this paper, we propose a method for automatic video synchronization for the two cameras to enable 3D trajectory reconstruction. This together with proper finger tracking and trajectory processing techniques form a fully automated measurement framework for hand movements. We evaluated the proposed method with a large amount of hand movement videos and demonstrated its accuracy on 3D trajectory reconstruction. Finally, we computed a set of hand trajectory features from the data and show that certain features, such as the mean and maximum velocity differ statistically significantly between different target object disparity categories. With small modifications, the framework can be utilized in other similar HCI studies.
Paper 157: Optimising Data for Exemplar-based Inpainting
Optimisation of inpainting data plays an important role in inpainting-based codecs. For diffusion-based inpainting, it is well-known that a careful data selection has a substantial impact on the reconstruction quality. However, for exemplar-based inpainting, which is advantageous for highly textured images, no data optimisation strategies have been explored yet. In our paper, we propose the first data optimisation approach for exemplar-based inpainting. It densifies the known data iteratively: New data points are added by dithering the current error map. Afterwards, the data mask is further improved by nonlocal pixel exchanges. Experiments demonstrate that our method yields significant improvements for exemplar-based inpainting with sparse data.
Paper 158: Improving a Switched Vector Field Model for Pedestrian Motion Analysis
Modeling the trajectories of pedestrians is a key task in video surveillance. However, finding a suitable model to describe the trajectories is challenging, mainly because many of the candidate models have a large number of parameters to be estimated. This paper addresses this issue and provides insights on how to tackle this problem. We model the trajectories using a mixture of vector fields with a probabilistic switching mechanism that efficiently accommodates changes in the trajectory motion. Depending on the probabilistic formulation, the motion fields can have a dense or a sparse representation, which we believe influences the performance of the model. Moreover, the model has a large set of parameters that need to be estimated using the initialization-dependent EM algorithm.
To overcome these issues, an extensive study of parameter estimation is conducted, namely of: (i) the initialization, and (ii) the prior distributions that control the sparsity of the solution. The various models are evaluated on the trajectory prediction task, using a newly proposed method. Experimental results on both synthetic and real examples provide new insights and valuable information on how the parameters play an important role in the proposed framework.
Paper 159: Intrinsic Calibration of a Camera to a Line-Structured Light using a Single View of Two Spheres
This paper proposes a novel approach to calibrating the intrinsic camera parameters from a single image, which includes the silhouettes of two spheres and the two ellipses generated by the intersection between the line-structured laser light and the two spheres. The approach uses the vanishing line of a plane and its normal direction to derive orthogonal constraints on the image of the absolute conic (IAC); this plane is formed by the camera center and the two sphere centers. In addition, the pair of circular points of the light plane is calculated from the generalized eigenpairs of the intersection between the light plane and the spheres. The intrinsic parameters of the camera can then be recovered from the derived orthogonal constraint and the pair of circular points on the IAC. Furthermore, the 3D positions of the two sphere centers in the camera coordinate system can be recovered from the camera intrinsic matrix and then used to evaluate its accuracy. Experimental results on both synthetic and real data show the accuracy and feasibility of the proposed approach.
Paper 160: Multi-organ Segmentation of Chest CT Images in Radiation Oncology: Comparison of Standard and Dilated UNet
Automatic delineation of organs at risk (OAR) in computed tomography (CT) images is a crucial step for treatment planning in radiation oncology. However, manual delineation of organs is a challenging and time-consuming task subject to inter-observer variabilities. Automatic organ delineation has been relying on non-rigid registrations and atlases. Lately, however, deep learning has appeared as a strong competitor, with specific architectures dedicated to image segmentation like UNet. In this paper, we first assessed the standard UNet to delineate multiple organs in CT images. Second, we observed the effect of dilated convolutional layers in UNet, which better capture the global context of the CT images and effectively learn the anatomy, resulting in improved localization in organ delineation. We evaluated the performance of a standard UNet and a dilated UNet (with dilated convolutional layers) on four chest organs (esophagus, left lung, right lung, and spinal cord) from 29 lung image acquisitions and observed that the dilated UNet delineates the soft tissues, notably the esophagus and spinal cord, with higher accuracy than the standard UNet. We quantified the segmentation accuracy of both models by computing spatial overlap measures such as the Dice similarity coefficient, recall and precision, and the Hausdorff distance. Compared to the standard UNet, the dilated UNet yields the best Dice scores for soft organs, whereas for the lungs no significant difference in the Dice score was observed: 0.84±0.07 vs 0.71±0.10 for the esophagus, 0.99±0.01 vs 0.99±0.01 for the left lung, 0.99±0.01 vs 0.99±0.01 for the right lung, and 0.91±0.05 vs 0.88±0.04 for the spinal cord.
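The Dice similarity coefficient used here to quantify spatial overlap has a simple closed form, 2|A∩B| / (|A|+|B|). A minimal sketch on flattened binary masks (our illustration, not the authors' evaluation code):

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary segmentation masks.

    pred, truth: same-length sequences of 0/1 labels (flattened masks).
    Returns a value in [0, 1]; 1 means perfect overlap.
    """
    inter = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total
```

For example, a prediction of [1, 1, 0, 0] against a ground truth of [1, 0, 0, 0] gives 2·1/(2+1) = 2/3.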
Paper 163: Diffuse Low Grade Glioma NMR Assessment for Better Intra-operative Targeting Using Fuzzy Logic
Nowadays, billions of images are acquired each year to discover or follow brain pathologies. The main goal of our tool is to offer an enhanced view of each diffuse low grade glioma by using fuzzy logic targeting. Using this method, we try to reproduce the neuroradiologist's process to reach a better understanding of the images and a better tumor targeting. The method requires only one multi-parametric NMR acquisition (anatomical, diffusion, perfusion, spectroscopy). For diffuse low grade glioma (DLGG), it helps to deal with the uncertain bounds of the tumor and to target isolated infiltrated tumorous cells. It will help surgeons in their decision process in cases of supra-total resection. As a result, we obtain color maps showing different parts of the tumor: the main core and spread cells.
Paper 165: Contour Propagation in CT scans with Convolutional Neural Networks
Although deep convolutional neural networks (CNNs) have outperformed the state of the art in many medical image segmentation tasks, deep network architectures generally fail to exploit common-sense priors to drive the segmentation. In particular, the availability of a segmented (source) image observed in a CT slice adjacent to the slice to be segmented (the target image) has not been considered to improve the segmentation accuracy of deep models. In this paper, we investigate a CNN architecture that maps a joint input, composed of the target image and the source segmentation, to a target segmentation. We observe that our solution succeeds in taking advantage of the source segmentation when it is sufficiently close to the target segmentation, without being penalized when the source is far from the target.
Paper 166: Parallel and Distributed Local Fisher Discriminant Analysis to reduce Hyperspectral Images on Cloud Computing Architectures
Hyperspectral images are data cubes that offer very rich spectral and spatial resolutions. These images are so high-dimensional that we generally reduce them in a pre-processing step in order to process them efficiently. In this context, Local Fisher Discriminant Analysis (LFDA) is a feature extraction technique that has proved better than several commonly used dimensionality reduction techniques. However, this method suffers from memory problems and long execution times on commodity hardware. In this paper, to solve these problems, we first add an optimization step to LFDA to make it executable on commodity hardware and suitable for parallel and distributed computing; we then implement it in a parallel and distributed way using Apache Spark. We tested our implementation on Amazon Web Services (AWS) Elastic MapReduce (EMR) clusters, using hyperspectral images of different sizes, and obtained substantial performance gains, with a speedup of up to 70x.
Paper 167: An Application of Data Compression Models to Handwritten Digit Classification
In this paper, we address handwritten digit classification as a special problem of data compression modeling. The creation of the models---usually known as training---is just a process of counting. Moreover, the model associated with each class can be trained independently of all the other class models. The models can also be updated later with new examples, even if the old ones are no longer available. Under this framework, we show that it is possible to attain a classification accuracy consistently above 99.3% on the MNIST dataset, using classifiers trained in less than one hour on a common laptop.
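The counting-based, independently trainable class models described above can be illustrated with a toy compression-style classifier: each class keeps per-pixel counts, and a test image is assigned to the class whose model would encode it with the shortest ideal code length. This is only a schematic stand-in for the paper's actual compression models; all names and the Bernoulli pixel model are our assumptions:

```python
import math

class CountingClassModel:
    """Per-class pixel model where training is just counting."""

    def __init__(self, n_pixels):
        self.ones = [0] * n_pixels   # per-pixel count of 'on' pixels seen
        self.total = 0               # number of training images seen

    def train(self, image):
        """Update counts from one flat 0/1 image; order-independent."""
        for i, p in enumerate(image):
            self.ones[i] += p
        self.total += 1

    def code_length(self, image):
        """Ideal code length in bits (-log2 likelihood, Laplace-smoothed)."""
        bits = 0.0
        for i, p in enumerate(image):
            p1 = (self.ones[i] + 1) / (self.total + 2)
            bits -= math.log2(p1 if p else 1.0 - p1)
        return bits

def classify(image, models):
    """Pick the class whose model compresses the image best."""
    return min(models, key=lambda c: models[c].code_length(image))
```

Because `train` only increments counts, a class model can be updated with new examples at any time, without access to the old ones.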
Paper 168: A Wavelet Based Image Fusion Method using Local Multiscale Image Regularity
This paper presents an image fusion method which uses an image-dependent multiscale decomposition and a fusion rule based on local multiscale image activity. The latter is used to determine a proper partition of the frequency plane where the max-based fusion strategy is applied; the same image activity is used to guide the max-based fusion rule. Multiscale local image activity is computed using an estimation of the local Lipschitz regularity at different resolutions. Preliminary experimental results show that the proposed method provides satisfying fusion results in terms of detail preservation in the visible image, extraction of the important regions in the infrared image, and reduction of artifacts and noise amplification. The presented results, although preliminary, match or outperform some of the most recent and best-performing fusion methods.
Paper 169: A Global Decoding Strategy with a Reduced-reference Metric Designed for the Wireless Transmission of JPWL
A new global decoding strategy with a Reduced Reference (RR) metric is proposed to improve the Quality of Experience (QoE) in a wireless transmission context. The RR metric (FMRP) uses the magnitude and the relative phase information in the complex wavelet domain as evaluation features. It determines the number of decoder layers so as to evaluate the image in a way consistent with the Human Visual System (HVS). To evaluate the performance of the decoding strategy, we collected distorted images under realistic channel attacks and recruited volunteers for a large psychovisual test. The distorted images and the classification data of the voluntary assessors are integrated into a database that reflects a realistic wireless channel context, quite different from classic databases. Experimental studies confirm that the decoding strategy is effective and improves the QoE while ensuring the Quality of Service (QoS).
Paper 170: Clustering Based Reference Normal Pose for Improved Expression Recognition
This paper addresses automatic facial expression identification. We propose a robust method to identify the neutral face of a person from images showing various expressions. The method consists in separating images of faces by expression with a clustering method and retrieving the neutral face as the image closest to the centroid of the dominant cluster. The neutral face thus found is used in conjunction with an expression detection method. We tested the method on the Extended Cohn-Kanade database, where we identify the neutral face with 100% accuracy, and on the UNBC-McMaster Shoulder Pain database, where the use of the neutral pose leads to an increase of 10% in accuracy, bringing the method into the range of the state of the art in pain detection.
Paper 171: Effective Training of Convolutional Neural Networks for Insect Image Recognition
Insects are living beings whose utility is critical in life sciences. They enable biologists to obtain knowledge on natural landscapes (for example, on their health). Nevertheless, insect identification is time-consuming and requires an experienced workforce. To ease this task, we propose to turn it into an image-based pattern recognition problem by recognizing the insect from a photo.
In this paper, state-of-the-art deep convolutional architectures are used to tackle this problem. However, a limitation to the use of deep CNNs is the lack of data and the imbalance in class cardinalities. To deal with such limitations, transfer learning is used to apply knowledge learnt from the ImageNet-1000 recognition task to the insect image recognition task. A question arises from transfer learning: is it relevant to retrain the entire network, or is it better to leave some layers' weights unmodified? The hypothesis behind this question is that part of the network must contain generic (problem-independent) knowledge while the other part contains problem-specific knowledge.
Tests have been conducted on two different insect image datasets. VGG-16 models were adapted to be easier to train, and were trained a) from scratch and b) from ImageNet-1000 weights. A further study was led on one of the datasets, investigating the influence on performance of two parameters: 1) the amount of learning data and 2) the number of layers to be fine-tuned. It was determined that retraining only the last block of VGG-16 is sufficient. We have made the code of our experiments, as well as the script for generating an annotated insect dataset from ImageNet, publicly available.
Paper 172: Robust Feature Descriptors For Object Segmentation Using Active Shape Models
Object segmentation is still an active topic in the image processing and computer vision communities. This task is challenging not only due to difficult image conditions (e.g., poor resolution or contrast), but also due to objects whose appearance varies significantly. This paper revisits the Active Shape Model (ASM), which has become a widely used deformable model for object segmentation in images. Since the success of this model depends on its ability to locate the object, many detectors have been proposed. Here, we propose a new methodology in which the ASM search takes the form of local rectangular regions sampled around each landmark point. These regions are then correlated with variable or fixed texture templates learned over a training set. We compare the performance of the proposed approach against other detectors based on: (i) the classical ASM edge detection; (ii) the Histogram of Oriented Gradients (HOG); and (iii) the Scale-Invariant Feature Transform (SIFT). The evaluation is performed in two different applications, facial fitting and segmentation of the left ventricle (LV) in cardiac magnetic resonance (CMR) images, showing that the proposed method leads to a significant increase in accuracy and outperforms the other approaches.
Paper 174: Relocated Colour Contrast Occurrence Matrix and Adapted Similarity Measure for Colour Texture Retrieval
For metrological purposes, the distance between texture images is crucial. This work studies the texture feature and similarity measure as a pair. Starting from the Colour Contrast Occurrence Matrix (C2O) definition, we propose an adapted similarity measure improving texture retrieval. In a second step, we propose a modified version of the C2O definition that includes the texture's colour average inside a modified similarity measure. Performance in texture retrieval is assessed on four challenging datasets, the Vistex, Stex, Outex-TC13 and KTH-TIPS2b databases, against recent results from the state of the art. Results show the high efficiency of the proposed approach, based on a simple feature/similarity-measure pair, compared to more complex approaches including Convolutional Neural Networks.
Paper 175: Fast Light Field inpainting Propagation using Angular Warping and Color-guided Disparity Interpolation
This paper describes a method for fast and efficient inpainting of light fields. We first revisit disparity estimation based on smoothed structure tensors and analyze typical artefacts and their impact on the inpainting problem. We then propose an approach which is computationally fast while giving more coherent disparity in the masked region. This disparity is then used to propagate, by angular warping, the inpainted texture of one view to the entire light field. The experiments performed show the ability of our approach to yield appealing results while running considerably faster.
Paper 177: Recursive Chaining of Reversible Image-to-image Translators for Face Aging
This paper addresses the modeling and simulation of progressive changes over time, such as human face aging. By treating the age phases as a sequence of image domains, we construct a chain of transformers that map images from one age domain to the next. Leveraging recent adversarial image translation methods, our approach requires no training samples of the same individual at different ages. Here, the model must be flexible enough to translate a child face to a young adult, and all the way through adulthood to old age. We find that some transformers in the chain can be recursively applied on their own output to cover multiple phases, compressing the chain. The structure of the chain also unearths information about the underlying physical process. We demonstrate the performance of our method with precise and intuitive metrics, and compare visually with the state of the art in face aging.
Paper 178: Dealing with Topological Information within a Fully Convolutional Neural Network
A fully convolutional neural network has a receptive field of limited size and therefore cannot exploit global information, such as topological information. This paper proposes a solution to this problem, based on pre-processing with a geodesic operator. It is applied to the segmentation of histological images of pigmented reconstructed epidermis acquired via Whole Slide Imaging.
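Geodesic operators propagate information only through regions that are connected in a mask, which is how they inject global, topology-aware context that a limited receptive field misses. A minimal 1D sketch of iterated geodesic dilation (our illustration for binary signals; the paper's operator and dimensionality may differ):

```python
def geodesic_dilation(marker, mask):
    """Morphological reconstruction by iterated geodesic dilation (1D).

    Repeatedly dilates `marker` with a 3-pixel window while clamping it
    under `mask`, until stability. Values propagate only within the
    connected component of the mask that contains the marker.
    """
    cur = list(marker)
    while True:
        nxt = []
        for i in range(len(cur)):
            lo, hi = max(0, i - 1), min(len(cur), i + 2)
            nxt.append(min(max(cur[lo:hi]), mask[i]))  # dilate, then clamp
        if nxt == cur:       # stable: reconstruction finished
            return cur
        cur = nxt
```

With mask [1,1,1,0,1,1] and a seed at position 0, the marker fills only the first connected component, stopping at the gap.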
Paper 179: A Deep Learning Approach to Hair Segmentation and Color Extraction from Facial Images
In this paper, we tackle the problem of hair analysis in unconstrained images. We propose a fully convolutional, multi-task neural network to segment the image pixels into hair, face and background classes. The network also decides whether the person is bald or not. The detected hair pixels are analyzed by a color recognition module which uses color features extracted at super-pixel level and a Random Forest classifier to determine the hair tone (black, blond, brown, red or white/grey). To train and test the proposed solution, we manually segmented more than 3500 images from a publicly available dataset. The proposed framework was evaluated on three public databases. The experiments we performed, together with the hair color recognition rate of 92%, demonstrate the efficiency of the proposed solution.
Paper 181: Automatically Selecting the Best Pictures for an Individualized Child Photo Album
In this paper, we investigate the best way to automatically compose a photo album for an individual child from a large collection of photographs taken during a school year. For this, we efficiently combine state-of-the-art identification algorithms to select relevant photos with an aesthetics estimation algorithm to keep only the best images. For the identification task, we achieved 86% precision at 86% recall on a real-life dataset containing many of the specific challenges of this application. Indeed, playing children appear in non-standard poses and facial expressions, can be dressed up or have their faces painted, etc. In a top-1 sense, our system was able to correctly identify 89.2% of the faces in close-up. Apart from facial recognition, we discuss and evaluate extending the identification system with person re-identification. To select the best-looking photos from the identified child photos to fill the album, we propose an automatic assessment technique that takes into account the aesthetic photo quality as well as the emotions in the photos. Our experiments show that this measure correlates well with a manually labeled general appreciation score.
Paper 182: Face Detection in Painting using Deep Convolutional Neural Networks
The artistic style of a painting conveys important information about the painter's technique. Image processing tools, and particularly image features, can provide a rich description of this technique. In this paper, we investigate automatic face detection in the Tenebrism style, a painting style characterized by the use of extreme contrast between light and dark. We show that a convolutional neural network, along with an adapted learning base, makes it possible to detect faces with maximum accuracy in this style. This result is particularly interesting since it can be the basis of an illuminant study in the Tenebrism style.
Paper 183: Detecting and Recognizing Salient Object in Videos
Saliency detection has been an interesting research field. Some researchers consider it a segmentation problem, while others treat it differently. In this paper, we propose a novel video saliency framework that detects and recognizes the object of interest. Starting from the assumption that the spatial and temporal information of an input video frame together provide better saliency results than either alone, we propose a spatio-temporal saliency model for detecting salient objects in videos. First, spatial saliency is measured at patch level by fusing local contrasts with spatial priors to label each patch as foreground or background. Then, the newly proposed motion distinctiveness feature and a temporal gradient magnitude measure are used to obtain the temporal saliency maps. The spatial and temporal saliency maps are fused into one master saliency map. The object classification framework contains a training and a testing stage. In the training phase, we use a convolutional neural network to extract features from the proposed training set. Deep features are then fed into a Support Vector Machine classifier to produce a classification model, which is used to predict the class of the salient object. Although the framework is simple to implement and efficient to run, it achieves good performance. Experiments on two standard benchmark datasets for video saliency show that the proposed temporal cues improve saliency estimation results. Results are compared to six state-of-the-art methods on the two benchmark datasets.
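The fusion of spatial and temporal saliency maps into a master map can be sketched as follows. We show a simple average of min-max-normalized maps; the paper's exact fusion rule is not specified in the abstract, so this only illustrates the principle:

```python
def normalize(m):
    """Min-max normalize a flat saliency map to [0, 1]."""
    lo, hi = min(m), max(m)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in m]

def fuse_saliency(spatial, temporal):
    """Fuse per-pixel spatial and temporal saliency into a master map.

    Both inputs are flat lists of per-pixel (or per-patch) saliency
    scores; they are normalized to a common range before averaging so
    that neither cue dominates merely by scale.
    """
    s, t = normalize(spatial), normalize(temporal)
    return [(a + b) / 2.0 for a, b in zip(s, t)]
```

A pixel that is salient in both maps keeps a high master score; one that is salient in only one cue is attenuated.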
Paper 184: Foreground Background Segmentation in Front of Changing Footage on a Video Screen
In this paper, a robust approach for detecting foreground objects moving in front of a video screen is presented. The proposed method constructs a background model for every image shown on the screen, assuming these images are known up to an appearance transformation. This transformation is guided by a color mapping function constructed at the beginning of the sequence. The foreground object is then segmented at runtime by comparing the input from the camera with a color-mapped representation of the background image, analysing both direct color and edge feature differences. The method is tested on challenging sequences where the background screen displays photo-realistic videos. It is shown that the proposed method is able to produce accurate foreground masks, with F1-scores ranging from 85.61% to 90.74% on our dataset.
Paper 187: Person Re-identification using Group Context
The person re-identification task consists in matching person images detected from surveillance cameras with non-overlapping fields of view. Most existing approaches are based on the person's visual appearance. However, one of the main challenges, especially for a large gallery set, is that many people wear very similar clothing. Our proposed approach addresses this issue by exploiting information on the group of persons around the given individual. In this way, possible ambiguities are reduced and the discriminative power for person re-identification is enhanced, since people often walk in groups and even tend to walk alongside strangers. In this paper, we propose to use a deep convolutional neural network (CNN) to extract group feature representations that are invariant to the relative displacements of individuals within a group. We then use this group feature representation to perform group association under non-overlapping cameras. Furthermore, we propose a neural network framework to combine the group cue with the single-person feature representation to improve person re-identification performance. We experimentally show that our deep group feature representation achieves better group association performance than state-of-the-art methods and that taking group context into account improves the accuracy of individual re-identification.
Paper 188: Derivative Half Gaussian Kernels and Shock Filter
Shock filters represent an important family of nonlinear Partial Differential Equation (PDE) models for image restoration and enhancement. Commonly, the smoothed second-order derivative of the image drives the deblurring mechanism of this type of method. This paper presents the advantages of inserting information derived from oriented half Gaussian kernels into a shock filter process. Edge directions help preserve contours, whereas the gradient direction allows images to be enhanced and deblurred. For this purpose, the two edge directions are extracted by the oriented half kernels, preserving and enhancing corner points and object contours as well as small objects. The proposed approach is compared to 7 other PDE techniques, demonstrating its robustness and reliability, without creating a grainy effect around edges.
Paper 189: Fingerprint Classification using Conic Radon Transform and Convolutional Neural Networks
Fingerprint classification is a useful technique for reducing the number of comparisons in automated fingerprint identification systems, but it remains a challenging issue due to the large intra-class and small inter-class variations. In this work, we propose a novel approach to fingerprint classification based on combining the Radon transform and convolutional neural networks (CNNs). The proposed approach is based on the Conic Radon Transform (CRT), which extends the classical Radon Transform (RT) to integrate an image function f(x,y) over conic sections. The Radon technique enables the extraction of global fingerprint characteristics that are invariant to geometrical transformations such as translation and rotation. We define an expansion of the convolutional neural network's input features based on the CRT: we first apply the RT over conic sections to the source image, and then use the Radon result as input to the convolutional layers. To evaluate the performance of this approach, we conducted tests on the NIST SD4 benchmark database. The obtained results show that this approach is competitive with other related methods in terms of accuracy and computational time.
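The line-integral idea behind the classical Radon transform can be illustrated with a toy discrete version that projects an image at 0 and 90 degrees (each projection sums the image along one family of parallel lines). This is only a sketch of the classical RT for intuition; the paper's CRT generalizes the integration paths to conic sections:

```python
def radon_projections(image):
    """Discrete line-integral projections of a 2D image at 0 and 90 degrees.

    image: list of equal-length rows of pixel intensities.
    Returns (row_sums, col_sums): the projection along horizontal lines
    (0 degrees) and along vertical lines (90 degrees). A full Radon
    transform would sample many angles; the CRT replaces straight lines
    with conic sections.
    """
    rows = [sum(r) for r in image]                                   # 0 deg
    cols = [sum(r[j] for r in image) for j in range(len(image[0]))]  # 90 deg
    return rows, cols
```

For the 2x2 image [[1, 2], [3, 4]], the projections are [3, 7] and [4, 6]; such projection vectors (over many angles or conics) form the global, transformation-robust features fed to the CNN.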
Paper 190: Bayesian Vehicle Detection using Optical Remote Sensing Images
Automatic object detection is a widely investigated problem in different fields such as military and urban surveillance. The availability of Very High Resolution (VHR) optical remotely sensed data has motivated the design of new object detection methods that allow recognizing small objects like ships, buildings and vehicles. However, the challenge always remains to increase the accuracy and speed of these object detection methods, which can be difficult due to complex backgrounds. Therefore, the development of robust and flexible models that analyze remotely sensed data for vehicle detection is needed. In this paper, we propose a hierarchical Bayesian model for automatic vehicle detection. Experiments performed using real data indicate the benefit that can be drawn from our approach.