Advanced Concepts for Intelligent Vision Systems
Oct. 26-29, 2015
Museo Diocesano, Catania, Italy
Acivs 2015 Abstracts
Paper 201: Solidarity Filter for Noise Reduction of 3D Edges in Depth Images
3D applications that process depth images benefit significantly from 3D-edge extraction techniques. Intrinsic sensor noise in depth images is largely inherited by the extracted 3D edges. Conventional denoising algorithms remove some of this noise, but also weaken narrow edges, amplify noisy pixels and introduce false edges. We therefore propose a novel solidarity filter for noise removal in 3D edge images without artefacts such as false edges. The proposed filter defines neighbouring pixels with similar properties and connects them into larger segments beyond the size of a conventional filter aperture. The experimental results show that the solidarity filter outperforms the median and morphological close filters with 42% and 69% higher PSNR, respectively. In terms of the mean SSIM metric, the solidarity filter provides results that are 11% and 21% closer to the ground truth than the corresponding results obtained by the median and close filters, respectively.
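As a minimal illustration of the PSNR figure of merit quoted above, the sketch below computes PSNR between a ground-truth image and a filtered image. It is not the authors' evaluation code: the function name `psnr`, the list-of-rows image layout and the 8-bit peak value of 255 are assumptions made for the example.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images.

    Images are plain lists of rows; `peak` is the assumed maximum pixel value.
    """
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("images must have the same shape")

    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1

    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(peak ** 2 / mse)

# Tiny hypothetical example: one pixel differs from the ground truth by 10.
ground_truth = [[0, 255], [255, 0]]
filtered = [[0, 255], [255, 10]]
print(round(psnr(ground_truth, filtered), 2))
```

A higher PSNR means the filtered edge image is closer to the ground truth, which is the sense in which the filters are compared in the abstract.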
Paper 231: Quasar - Unlocking the Power of Heterogeneous Hardware
Computationally and data intensive applications, such as video processing algorithms, are traditionally developed in programming languages such as C/C++. In order to cope with more demanding requirements (e.g., real-time processing of large datasets), hardware accelerators such as GPUs have emerged to aid multi-core CPUs in computationally intensive tasks. Because these accelerators offer performance improvements for many (but often not all) operations, the programmer needs to decide which parts of the code are best developed for the accelerator and which for the CPU.
Development for heterogeneous devices comes at a cost: 1) the sophisticated programming and debugging techniques lead to a steep learning curve, 2) development and optimization often require considerable effort and time from the programmer, 3) different versions of the code often need to be written for different target platforms, 4) the resulting code may not be future-proof: it is not guaranteed to work optimally on future devices.
In this talk we present a new programming language, Quasar, which mitigates these common drawbacks. Quasar is an easy-to-learn, high-level programming language that is hardware-independent, ideal for both rapid prototyping and full deployment on heterogeneous hardware.
Paper 232: Joint Optical Designing: Enhancing Optical Design by Image Processing Consideration
The recent surge in European projects (Panorama, Copcams, Exist, Image Capture of the Future, ...) illustrates not only a high level of research activity in this area but also a strong interest from industry. Computer vision and video processing have entered a new era offering large data rates, ever increasing spatial and temporal resolution and multiple imaging modalities.
As a consequence, this also opens up new areas for optimal optical solutions. It enables new multispectral imaging systems as well as enhanced performance through the combined optimization of optical designs (if necessary including wavefront coding optical components) while maintaining low-power, compact and real-time image processing.
The talk will illustrate how these improved image processing techniques are opening a new era of joint optical design that optimally combines optical design and real-time image processing.
Paper 233: Computer Vision Applications and Their Industrial Exploitation
During the talk, the main business opportunities in the field of computer vision for the semiconductor industry will be outlined. Market analysis, R&D trends, time to market, customer requirements, algorithm complexity and architectural implementations are the key aspects to be analysed for product development. A few potential R&D activities, such as people detection and feature extraction, will be presented and their evolution towards a product will be outlined. A short overview of R&D evolution will also be given.
Paper 234: Smart Image Sensor for Advanced Use and New Applications
Multimedia applications such as video compression, image processing and face recognition now run on embedded platforms. The huge computing power needed is provided by the evolution of transistor density and by the use of specialized accelerators. These accelerators are supported by multimedia instruction sets.
Using these complex instructions can be a nightmare for the engineer because there are many ways to program them, the quality of compiler support can vary unpredictably with the compiler/platform pair and, worse, performance can be data dependent. Using libraries can be an option if such libraries exist and provide sufficient performance.
In this talk, I’ll illustrate the difficulty of generating binary code for this application domain through a practical example of code generation. Then I’ll show deGoal, a tool developed in house to resolve these problems.
Paper 236: An Adaptive Framework for Imaging Systems
Computer vision and video processing systems handle large amounts of data with varying spatial and temporal resolution and multiple imaging modalities. The current best practice is to design video processing systems with an overcapacity, which avoids underperforming in the general case, but wastes resources. In this work we present an adaptive framework for imaging systems that aims at minimizing waste of resources. Depending on properties of the processed images, the system dynamically adapts both the implementation of the processing system and properties of the underlying hardware.
Paper 237: Binary Code Generation for Multimedia Application on Embedded Platforms
Multimedia applications such as video compression, image processing and face recognition now run on embedded platforms. The huge computing power needed is provided by the evolution of transistor density and by the use of specialized accelerators. These accelerators are supported by multimedia instruction sets.
Using these complex instructions can be a nightmare for the engineer because there are many ways to program them, the quality of compiler support can vary unpredictably with the compiler/platform pair and, worse, performance can be data dependent. Using libraries can be an option if such libraries exist and provide sufficient performance.
In this talk I'll illustrate the difficulty of generating binary code for this application domain through a practical example of code generation. Then I'll show deGoal, a tool developed in house to resolve these problems.
Paper 238: Goals and Directions of the Newly Started EXIST Project
A proposal titled EXIST (‘Extended image sensing technologies’) was accepted during the first EC/ECSEL call of 2014. EXIST will investigate and develop innovative new technologies for image sensors needed in the next plus one (N+2) generation of several application domains. The image sensor research will focus on enhancing and extending the capabilities of current CMOS imaging devices. The EXIST consortium will develop innovative new technologies for image sensors:
- New design (architectures) and process technology (e.g. 3D stacking) for better pixels (lower noise, higher dynamic range, higher quantum efficiency, new functionality in the pixel) and more pixels at higher speed (higher spatial and temporal resolutions, higher bit depth), time-of-flight pixels, local (on-chip) processing, embedded CCD in CMOS Time delayed integration.
- Extended sensitivity and functionality of the pixels: extension into infrared, filters for hyperspectral and multispectral imaging, better colour filters for a wider colour gamut, and Fabry-Pérot interference cells.
- Increasing the optical, analog and data imaging pipelines to enable high frame rates, better memory management, etc.
Together with sensor related processing these image sensor and filter designs will be demonstrated in 9 different demonstrators in the following application domains: Security, Healthcare, Digital Lifestyle and Agriculture.
Paper 239: Image features for illuminant estimation and correction
Many computer vision applications for both still images and videos can make use of illuminant estimation and correction algorithms as a pre-processing step to make sure that the recorded color of the objects in the scene does not change under different illumination conditions. It can be shown that illuminant estimation is an ill-posed problem; its solution therefore lacks uniqueness and stability. To cope with this problem, common solutions usually exploit some heuristic assumptions about the statistical properties of the expected illuminants and/or of the reflectance of the objects in the scene. In this keynote I briefly review state-of-the-art methods and illustrate promising research aimed at improving single and multiple illuminant estimation by using features automatically extracted from the image.
Paper 240: Domain Adaptation for Visual Applications
Machine learning applications generally rely on a large number of hand-labelled examples. However, labelling is expensive and time consuming due to the significant amount of human effort involved. Domain adaptation addresses the problem of leveraging labelled data in one or more related domains, often referred to as source domains, when learning a classifier for unseen data in a target domain. Adaptation across domains is a challenging task for many real applications including NLP tasks, spam filtering, speech recognition and various visual applications. In this talk, after a brief overview of different types of domain adaptation methods, I will focus mainly on several visual scenarios and give a more detailed view of a few recent methods.
Paper 241: The ICAF Project : Image CApture of the Future
One of the primary objectives of the ICAF project was to achieve major advancements in image capture technology and systems to further increase automation in high-added-value production processes, state-of-the-art security systems as well as the traffic and automotive domains. Moreover, it has provided enhancements to quality of life by offering creative industries higher resolution image capture technologies and Video over Internet Protocol. ICAF has provided faster and more sensitive image sensors to allow the next generation of equipment to achieve higher accuracies and speeds. ICAF has also delivered the integrated circuits for the next generation of the CoaXPress interface standard. Because machine vision applications increasingly make use of 3D, research has been carried out on image sensor and processing architectures for applications such as automated optical inspection in electronics manufacturing. Within ICAF, we have also performed research on 3D algorithms for entertainment and media production. The broadcast market has moved from standard definition television to 1080 line interlaced high definition television with a picture. The project has developed and demonstrated technology that achieves three times the frame rates of today, with the same picture quality per frame, by using innovative noise reduction algorithms. Another important result in this respect is single lens 3D image capture at HD resolution and the algorithms for stereo view interpolation and depth map generation. Increased data rates require adequate data compression algorithms. In ICAF, research has been conducted on mapping new compression codecs such as MVC onto FPGAs for real-time operation in a 3D broadcast environment.
Paper 102: Patch-based Mathematical Morphology for Image Processing, Segmentation and Classification
In this paper, a new formulation of patch-based adaptive mathematical morphology is addressed. In contrast to classical approaches, the shape of structuring elements is not modified but adaptivity is directly integrated into the definition of a patch-based complete lattice. The manifold of patches is learned with a nonlinear bijective mapping, interpreted in the form of a learned rank transformation together with an ordering of vectors. This ordering of patches relies on three steps: dictionary learning, manifold learning and out of sample extension. The performance of the approach is illustrated with innovative examples of patch-based image processing, segmentation and texture classification.
Paper 104: On Optimal Illumination for DOVID Description using Photometric Stereo
Diffractive optically variable image devices (DOVIDs) are popular security features used to protect security documents such as banknotes, ID cards, passports, etc. Nevertheless, checking the authenticity of these security features at both the user and the forensic level remains a challenging task, requiring sophisticated hardware tools and expert knowledge. Based on a photometric acquisition setup comprising 32 illumination sources from different directions and a recently proposed descriptor capturing the illumination-dependent behavior, we investigate the information content, illumination pattern shape and clustering properties of the descriptor. We study the shape and discriminative power of reduced illumination configurations for the task of discrimination applied to DOVIDs using a sample of Euro banknotes.
Paper 106: Secure Signal Processing using Fully Homomorphic Encryption
This paper investigates the problem of performing signal processing via remote execution methods while maintaining the privacy of the data. The primary focus is a situation with two parties: a client with data or a signal that needs to be processed, and a server with computational resources. Revealing the signal unencrypted violates the privacy of the client. One solution to this problem is to process the data or signal while encrypted. Problems of this type have been attracting attention recently, particularly with the growing capabilities of cloud computing. We contribute to solving this type of problem by processing the signals in an encrypted form, using fully homomorphic encryption (FHE). Three additional contributions of this manuscript include (1) extending FHE to real numbers, (2) bounding the error related to the FHE process against the unencrypted variation of the process, and (3) increasing the practicality of FHE as a tool by using graphics processing units (GPUs). We demonstrate our contributions by applying these ideas to two classical problems: natural logarithm calculation and signal processing (brightness/contrast filter).
Paper 109: Unsupervised Salient Object Matting
In this paper, we present a new, easy-to-generate method that is capable of precisely matting salient objects in a large-scale image set in an unsupervised way. Our method extracts only the salient object without any user-specified constraints or manual thresholding of the saliency map, which are essentially required in image matting or saliency-map based segmentation, respectively. In order to provide a more balanced visual saliency as a response to both local features and global contrast, we propose a new, coupled saliency map based on a linearly combined conspicuity map. We also introduce an adaptive tri-map as a refined segmented image of the coupled saliency map for more precise object extraction. The proposed method improves the segmentation performance compared to image matting based on two existing saliency detection measures. Numerical experiments and visual comparisons on a large-scale real image set confirm the useful behavior of the proposed method.
Paper 110: On Blind Source Camera Identification
An interesting and challenging problem in digital image forensics is the identification of the device used to acquire an image. Although the source imaging device can be retrieved from the file's header (e.g., EXIF), this information can easily be tampered with. This leads to the need for blind techniques that infer the acquisition device by processing the content of a given image. Recent studies concentrate on exploiting sensor pattern noise, or on extracting a signature from a set of pictures. In this paper we compare two popular algorithms for blind camera identification. The first approach extracts a fingerprint from a training set of images by exploiting the camera sensor's defects. The second one is based on image feature extraction and assumes that images can be affected by color processing and transformations performed by the camera prior to storage. For the comparison we used two representative datasets of images acquired using consumer and mobile cameras, respectively. By considering both types of cameras, this study helps to understand whether the theories designed for classic consumer cameras maintain their performance in the mobile domain.
Paper 111: Bayesian Fusion of Back Projected Probabilities (BFBP): Co-occurrence Descriptors for Tracking in Complex Environments
Among the multitude of probabilistic tracking techniques, the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm has been one of the most popular. Though several modifications have been proposed to the original formulation of CAMSHIFT, limitations still exist. In particular the algorithm underperforms when tracking textured and patterned objects. In this paper we generalize CAMSHIFT for the purposes of tracking such objects in non-stationary backgrounds. Our extension introduces a novel object modeling technique, while retaining a probabilistic back projection stage similar to the original CAMSHIFT algorithm, but with considerably more discriminative power. The object modeling now evolves beyond a single probability distribution to a more generalized joint density function on localized color patterns. In our framework, multiple co-occurrence density functions are estimated using information from several color channel combinations and these distributions are combined using an intuitive Bayesian approach. We validate our approach on several aerial tracking scenarios and demonstrate its improved performance over the original CAMSHIFT algorithm and one of its most successful variants.
Paper 114: Single Image Visual Obstacle Avoidance for Low Power Mobile Sensing
In this paper we present a method for low computational complexity single image based obstacle detection and avoidance, with applicability on low power devices and sensors. The method is built on a novel application of single image relative focus map estimation, using localized blind deconvolution, for classifying image regions. For evaluation we use the MSRA datasets and show the method's practical usability by implementation on smartphones.
Paper 116: Analysis of HVS-Metrics’ Properties Using Color Image Database TID2013
Various full-reference (FR) image quality metrics (indices) that take into account peculiarities of the human visual system (HVS) have been proposed during the last decade. Most of them have already been tested on several image databases including TID2013, a recently proposed database of distorted color images. Metric performance is usually characterized by the rank order correlation coefficients between the considered metric and the mean opinion score (MOS). In this paper, we characterize HVS-metrics from another practically important viewpoint. We determine and analyze statistics such as the mean and standard deviation for several state-of-the-art quality metrics on classes of images with multiple or particular types of distortions. This allows setting threshold value(s) for a given metric and application.
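To make the rank order correlation mentioned above concrete, the following sketch computes a Spearman coefficient between metric values and MOS values. It is an illustration only: the function names are hypothetical, the sample numbers are invented, and no data from TID2013 is used.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(metric_scores, mos):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(mos))

# Invented scores for five distorted images (not taken from any database).
metric_scores = [0.91, 0.75, 0.60, 0.82, 0.40]
mos = [6.1, 4.8, 3.9, 5.0, 2.2]
print(spearman(metric_scores, mos))
```

Because only ranks enter the computation, the coefficient is unaffected by any monotonic rescaling of the metric, which is why it is a common choice when comparing metrics with different value ranges.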
Paper 117: Content-Fragile Commutative Watermarking-Encryption Based on Pixel Entropy
Content-fragile commutative watermarking-encryption requires that both the content-fragile image signature and the watermarking process are invariant under encryption. The pixel entropy, being dependent on first-order image statistics only, is invariant under permutations. In the present paper we embed semi-fragile signatures based on pixel entropy by using a histogram-based watermarking algorithm, which is also invariant to permutations. We also show how the problem of collisions, i.e. different images having the same signature, can be overcome in this approach, if embedder and encryptor share a common secret.
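The permutation invariance claimed above can be checked with a small sketch. It assumes that "pixel entropy" means the Shannon entropy of the grey-level histogram, and it models encryption as a keyed shuffle of pixel positions; both the function names and the key value are hypothetical, and this is not the authors' scheme.

```python
import math
import random
from collections import Counter

def pixel_entropy(pixels):
    """Shannon entropy (bits) of the grey-level histogram of a flat pixel list."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sort the counts so the floating-point sum does not depend on pixel order.
    return -sum(
        (c / total) * math.log2(c / total) for c in sorted(counts.values())
    )

def permute(pixels, key):
    """Toy permutation cipher: shuffle pixel positions with a keyed PRNG."""
    rng = random.Random(key)
    shuffled = list(pixels)
    rng.shuffle(shuffled)
    return shuffled

image = [0, 0, 1, 1, 2, 3, 3, 3]
encrypted = permute(image, key=1234)

# The histogram is unchanged by the permutation, so the entropy is identical.
print(pixel_entropy(image) == pixel_entropy(encrypted))
```

The same argument explains why a histogram-based watermark survives the permutation: both the signature and the embedding depend only on how often each grey level occurs, not on where.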
Paper 120: Edge Detection Method Based on Signal Subspace Dimension for Hyperspectral Images
One of the objectives of image processing is to detect the regions of interest (ROI) in a given application, and then to characterize and classify these regions. In HyperSpectral Images (HSI), the detection of targets in an image is of great interest for several applications. Generally, detection results are better when an ROI containing targets is selected beforehand. In this paper, we propose to select the ROI with a new edge detection method for large HSI containing objects of large and small sizes, based on tensorial modeling and an estimation of local rank variations.
Paper 121: RSD-DOG : A New Image Descriptor based on Second Order Derivatives
This paper introduces a new and powerful image patch descriptor based on second order image statistics/derivatives. Here, the image patch is treated as a 3D surface with intensity being the 3rd dimension. The considered 3D surface has a rich set of second order features/statistics such as ridges, valleys, cliffs and so on, that can be easily captured by using the difference of rotating semi Gaussian filters. The originality of this method lies in successfully combining the response of the directional filters with that of the Difference of Gaussian (DOG) approach. The obtained descriptor shows good discriminative power when dealing with variations in illumination, scale, rotation, blur, viewpoint and compression. Experiments on image matching demonstrate the advantage of the obtained descriptor when compared to its first order counterparts such as SIFT, DAISY, GLOH, GIST and LIDRIC.
Paper 123: Improving Kinect-Skeleton Estimation
Capturing human movement activities through various sensor technologies is becoming more and more important in entertainment, the film industry, the military, healthcare and sports. The Microsoft Kinect is an example of a low-cost capturing technology that enables human movement to be digitized into a 3D motion representation. However, the accuracy of this representation is often underestimated, which reduces the effectiveness of Kinect applications. In this paper, we propose advanced post-processing methods to improve the accuracy of the Kinect skeleton estimation. By evaluating these methods on real-life data, we reduce the error in the measured lengths of bones by more than a factor of two.
Paper 124: Spatiotemporal Integration of Optical Flow Vectors for Micro-Expression Detection
Micro-expressions are brief involuntary facial expressions. Detecting micro-expressions consists of finding the occurrence of micro-expressions in video sequences by locating the onset, peak and offset frames. This paper proposes an algorithm to detect micro-expressions by utilizing motion features to capture direction continuity. It computes the resultant optical flow vector for small local spatial regions and integrates these vectors in local spatiotemporal regions. It uses heuristics to filter out non-micro-expressions and to find the appropriate onset and offset times. Promising results are obtained on a challenging spontaneous micro-expression database. The main contribution of this paper is to find not only the peak but also the onset and offset frames for spotted micro-expressions, which has not been explored before.
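The two aggregation steps described above (a resultant vector per small spatial region, then integration over a local spatiotemporal region) can be sketched as follows. This is a simplified reading, not the published algorithm: the block size, the window length, the use of a plain vector sum and all function names are assumptions for illustration.

```python
def block_resultants(flow, block):
    """Sum the flow vectors inside each block x block spatial region.

    `flow` is a 2D list of (dx, dy) tuples for one frame; the result maps
    (block_row, block_col) to the resultant (sum_dx, sum_dy) of that region.
    """
    resultants = {}
    for y, row in enumerate(flow):
        for x, (dx, dy) in enumerate(row):
            key = (y // block, x // block)
            sx, sy = resultants.get(key, (0.0, 0.0))
            resultants[key] = (sx + dx, sy + dy)
    return resultants

def temporal_integration(resultants_per_frame, window):
    """Sum each region's resultant over a sliding window of consecutive frames."""
    integrated = []
    for start in range(len(resultants_per_frame) - window + 1):
        acc = {}
        for frame in resultants_per_frame[start:start + window]:
            for key, (dx, dy) in frame.items():
                ax, ay = acc.get(key, (0.0, 0.0))
                acc[key] = (ax + dx, ay + dy)
        integrated.append(acc)
    return integrated

# Two hypothetical 2x2 flow fields with consistent rightward motion.
frame_a = [[(1.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 0.0)]]
frame_b = [[(0.5, 0.0), (0.5, 0.0)], [(0.5, 0.0), (0.5, 0.0)]]
per_frame = [block_resultants(f, block=2) for f in (frame_a, frame_b)]
print(temporal_integration(per_frame, window=2))
```

Summing vectors rather than magnitudes means that motion with a consistent direction accumulates while randomly oriented noise tends to cancel, which is one way to capture the direction continuity the abstract refers to.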
Paper 126: BNRFBE Method for Blur Estimation in Document Images
The efficiency of document image processing techniques depends on image quality, which is impaired by many sources of degradation. These sources can be in the document itself or arise from the acquisition process. In this paper, we are concerned with blur degradation without any prior knowledge of the blur origin. We propose to evaluate the blur parameter at a local level on predefined zones without relying on any blur model. This parameter is linked to a fuzzy statistical analysis of the textual part of the document extracted from the initial image. The proposed measure is evaluated on the DIQA database, where the correlation between blur degree and OCR accuracy is computed. The results show that our blur estimation can help to predict OCR accuracy.
Paper 127: A PNU-based Technique to Detect Forged Regions in Digital Images
In this paper we propose a non blind passive technique for image forgery detection. Our technique is a variant of a method presented in  and it is based on the analysis of the Sensor Pattern Noise (SPN). Its main features are the ability to detect small forged regions and to run in an automatic way. Our technique works by extracting the SPN from the image under scrutiny and, then, by correlating it with the reference SPN of a target camera. The two noises are partitioned into non overlapping blocks before evaluating their correlation. Then, a set of operators is applied on the resulting Correlations Map to highlight forged regions and remove noise spikes. The result is processed using a multi level segmentation algorithm to determine which blocks should be considered forged. We analyzed the performance of our technique by using a dataset of 4,000 images.
Paper 128: Improvement of a Wavelet-Tensor Denoising Algorithm by Automatic Rank Estimation
This paper focuses on the denoising of multidimensional data by a tensor subspace-based method. In a seminal work, multiway Wiener filtering was developed to minimize the mean square error between an expected signal tensor and the estimated tensor. It was then placed in a wavelet framework.
The reliable estimation of the subspace rank for each mode and wavelet decomposition level is still pending. For the first time in this paper, we aim at estimating the subspace ranks for all modes of the tensor data by minimizing a least squares criterion. To solve this problem, we adapt particle swarm optimization. An application involving an RGB image and hyperspectral images exemplifies our method: we compare the results obtained in terms of signal to noise ratio with a slice-by-slice ForWaRD denoising.
Paper 129: Cosine-Sine Modulated Filter Banks for Motion Estimation and Correction
We present a new motion estimation algorithm that uses cosine-sine modulated filter banks to form complex modulated filter banks. The motion estimation is based on phase differences between a template and the reference image. By using a non-downsampled version of the cosine-sine modulated filter bank, our algorithm is able to shift the template image over the reference image in the transform domain by only changing the phases of the template image based on a given motion field. We also show that we can correct small non-rigid motions by directly using the phase difference between the reference and the template images in the transform domain. We also include a first application in magnetic resonance imaging, where the Fourier space is corrupted by motion and we use the phase difference method to correct small motion. This indicates the magnitude invariance for small motions.
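The idea of shifting a signal purely by changing phases in a transform domain can be illustrated with the Fourier shift theorem. The sketch below uses a plain 1D discrete Fourier transform as a stand-in; it is not a cosine-sine modulated filter bank and not the authors' method, and the function names are hypothetical.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def shift_by_phase(signal, shift):
    """Circularly shift `signal` by `shift` samples using only phase changes."""
    n = len(signal)
    spectrum = dft(signal)
    # Multiplying bin k by exp(-2*pi*i*k*shift/n) delays the signal by `shift`;
    # the magnitude of every coefficient is left untouched.
    shifted = [
        c * cmath.exp(-2j * cmath.pi * k * shift / n)
        for k, c in enumerate(spectrum)
    ]
    return [value.real for value in idft(shifted)]

template = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print([round(v, 6) for v in shift_by_phase(template, 2)])
```

Only the phases of the coefficients change under the shift while their magnitudes stay the same, which is the property that makes phase differences usable as a motion cue.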
Paper 130: Motion Compensation based on Robust Global Motion Estimation: Experiments and Applications
A robust and general method for image alignment is proposed in this paper. The industrial constraints are possibly large and irregular camera motion, possible occlusions or moving objects in the images, and some blur or motion blur. Images are taken from an Unmanned Aerial Vehicle or a long-range camera. Given this context, a similarity transformation is estimated. A hybrid algorithm is proposed, implemented in a pyramidal way and combining direct and feature-based approaches. Detailed experiments in this paper show the robustness and efficiency of the proposed algorithm. Results for some applications of this method are given, such as image stabilisation, image mosaicing and road surveillance.
Paper 131: Age and Gender Characterization through a Two Layer Clustering of Online Handwriting
Age characterization through handwriting is an important research field with several potential applications. It can, for instance, characterize the normal aging process on the one hand and, on the other, detect significant handwriting degradation possibly related to early pathological states. In this work, we propose a novel approach to characterize age and gender from online handwriting styles. Contrary to previous works on handwriting style characterization, our contribution consists of a two-layer clustering scheme. At the first layer, we perform a writer-independent clustering on handwritten words, described by global features. At the second layer, we perform a clustering that considers style variation at the previous level for each writer, to provide a measure of his/her handwriting stability across words. We investigated different clustering algorithms and their effectiveness for each layer. The handwriting style patterns inferred by our novel technique show interesting correlations between handwriting, age and gender.
Paper 132: A Predictive Model for Human Activity Recognition by Observing Actions and Context
This paper presents a novel model to estimate human activities, where a human activity is defined by a set of human actions. The proposed approach is based on the use of Recurrent Neural Networks (RNN) and Bayesian inference through the continuous monitoring of human actions and their surrounding environment. In the current work, human activities are inferred by considering not only visual analysis but also additional resources; external sources of information, such as context information, are incorporated to contribute to the activity estimation. The novelty of the proposed approach lies in the way the information is encoded, so that it can later be associated according to a predefined semantic structure. Hence, a pattern representing a given activity can be defined by a set of actions, plus contextual information or other kinds of information that could be relevant to describe the activity. Experimental results with real data are provided, showing the validity of the proposed approach.
Paper 133: A Task-Driven Eye Tracking Dataset for Visual Attention Analysis
To facilitate the research in visual attention analysis, we design and establish a new task-driven eye tracking dataset of 47 subjects. Inspired by psychological findings that human visual behavior is tightly dependent on the executed tasks, we carefully design specific tasks in accordance with the contents of 111 images covering various semantic categories, such as text, facial expression, texture, pose, and gaze. It results in a dataset of 111 fixation density maps and over 5,000 scanpaths. Moreover, we provide baseline results of thirteen state-of-the-art saliency models. Furthermore, we hold discussions on important clues on how tasks and image contents influence human visual behavior. This task-driven eye tracking dataset with the fixation density maps and scanpaths will be made publicly available.
Paper 134: Visual Localisation from Structureless Rigid Models
Visual rigid localisation algorithms can be described by their model/sensor input couple, where model and input can either be 2-D or 3-D sets of points. While Perspective-N-Point (PnP) solvers directly solve the 3-D/2-D case, to the best of our knowledge there is no localisation method to directly solve the 2-D/3-D case. This work proposes to handle the 2-D/3-D case by expressing it as two successive PnP problems which can be dealt with using classical solvers. Results suggest the overall method has comparable or better precision and robustness than state of the art PnP solvers. The approach is demonstrated on an object localisation application.
Paper 135: Full-Body Human Pose Estimation by Combining Geodesic Distances and 3D-Point Cloud Registration
In this work, we address the problem of recovering the 3D full-body human pose from depth images. A graph-based representation of the 3D point cloud data is determined which allows for the measurement of pose-independent geodesic distances on the surface of the human body. We extend previous approaches based on geodesic distances by extracting geodesic paths to multiple surface points which are obtained by adapting a 3D torso model to the point cloud data. This enables us to distinguish between the different body parts without having to make prior assumptions about their locations. Subsequently, a kinematic skeleton model is adapted. Our method does not need any pre-trained pose classifiers and can therefore estimate arbitrary poses.
Paper 137: Collaborative, Context Based Activity Control Method for Camera Networks
In this paper, a collaborative method for activity control of a network of cameras is presented. The method adjusts the activation level of all nodes in the network according to the observed scene activity, so that no vital information is missed, and the rate of communication and power consumption can be reduced. The proposed method is very flexible as an arbitrary number of activity levels can be defined, and it is easily adapted to the performed task. The method can be used either as a standalone solution, or integrated with other algorithms, due to its relatively low computational cost. The results of a preliminary small-scale test confirm its correct operation.
Paper 138: Ringing Artifact Suppression using Sparse Representation
This article addresses the problem of ringing artifact suppression. The ringing effect is caused by corruption or loss of high-frequency information; it appears as waves or oscillations near strong edges. We propose a novel method for ringing artifact suppression after Fourier cut-off filtering. It can also be used for image deringing in the case of image resampling and other applications where the frequency loss can be estimated. The method is based on the joint sparse coding approach. The proposed method preserves more small image details than the state-of-the-art algorithms based on total variation minimization, and outperforms them in terms of image quality metrics.
Paper 140: Distance-based Descriptors for Pedestrian Detection
In this paper, we propose an improvement of a detection approach that is based on the distance function. In this method, distance values are computed inside the image to describe the properties of objects. The appropriately chosen distance values are used in the feature vector that serves as input to the SVM classifier. The key challenge is to find the right way in which the distance values should be used to describe the appearance of objects effectively. The basic version of this method was proposed to solve the face detection problem. As we observed from the experiments, the method in its basic form is not suitable for pedestrian detection. Therefore, the goal of this paper is to improve this method and create a pedestrian detector that outperforms the state-of-the-art detectors. The experiments show that the proposed improvement exceeds the accuracy of the basic version by approximately 10%.
Paper 141: Improved Region-Based Kalman Filter (IRKF) for Tracking Body Joints and Evaluating Gait in Surveillance Videos
We propose an Improved Region-based Kalman Filter to estimate fine-precision body joint trajectories to facilitate gait analysis from low resolution surveillance cameras. This is important because existing pose estimation and tracking techniques obtain noisy and discrete trajectories which are insufficient for gait analysis. Our objective is to obtain a close approximation to the true sinusoidal/non-linear transition of the body joint locations between consecutive time instants. The proposed filter models the non-linear transitions along the sinusoidal trajectory, and incorporates a refining technique to determine the fine-precision estimates of the body joint location using prior information from the individual's rough pose. The proposed technique is evaluated on an outdoor low-resolution gait dataset categorized by individuals wearing a weighted vest (simulating a threat) or no weighted vest. Experimental results and comparisons with similar representative methods demonstrate the accuracy and precision of the proposed filter for fine-precision body joint tracking. With respect to analyzing gait for threat identification, the proposed scheme exhibits better accuracy than state-of-the-art discrete pose estimates.
Paper 145: Towards a Bayesian Video Denoising Method
The quality provided by image and video sensors increases steadily, and for a fixed spatial resolution the sensor noise has been gradually reduced over the years. However, modern sensors are also capable of acquiring at higher spatial resolutions which are still affected by noise, especially under low lighting conditions. The situation is even worse in video cameras, where the capture time is bounded by the frame rate. The noise in the video degrades its visual quality and hinders its analysis. In this paper we present a new video denoising method extending the non-local Bayes image denoising algorithm. The method does not require motion estimation, and yet preliminary results show that it compares favourably with the state-of-the-art methods in terms of PSNR.
Paper 148: Color Image Quality Assessment based on Gradient Similarity and Distance Transform
In this paper, a new full-reference image quality assessment (IQA) metric is proposed. It is based on a distance transform (DT) and a gradient similarity. Gradient images are sensitive to image distortions. Consequently, investigations have been carried out using the global variation of the gradient and the image skeleton for computing an overall image quality prediction. First, the color image is transformed to the YIQ space. Second, the gradient images and the DT are computed from the Y component. Third, the color distortion is computed from the I and Q components. Then, the maximum DT similarity of the reference and test images is defined. Finally, the difference between the test and reference images is derived by combining the previous metrics. The obtained results show the efficiency of the suggested measure.
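As a rough illustration of the first two steps (the conversion to YIQ and a gradient comparison on the Y component), the following minimal sketch uses Python's standard `colorsys.rgb_to_yiq`, which applies rounded YIQ coefficients. The forward-difference gradient, the similarity formula `(2ab + c) / (a² + b² + c)` and the constant `c` are assumptions chosen for illustration; the abstract does not specify the gradient operator or the exact similarity measure used by the authors.

```python
import colorsys
import math


def luminance_plane(rgb_image):
    """Return the Y plane of an image given as rows of (r, g, b) floats in [0, 1]."""
    return [[colorsys.rgb_to_yiq(r, g, b)[0] for (r, g, b) in row]
            for row in rgb_image]


def gradient_magnitude(plane):
    """Forward-difference gradient magnitude; differences past the border are 0."""
    h, w = len(plane), len(plane[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = plane[y][x + 1] - plane[y][x] if x + 1 < w else 0.0
            gy = plane[y + 1][x] - plane[y][x] if y + 1 < h else 0.0
            row.append(math.sqrt(gx * gx + gy * gy))
        out.append(row)
    return out


def gradient_similarity(ref_rgb, test_rgb, c=1e-4):
    """Per-pixel similarity in (0, 1]; 1 means identical gradient magnitudes.

    The formula and the stabilising constant c are illustrative assumptions.
    """
    g_ref = gradient_magnitude(luminance_plane(ref_rgb))
    g_test = gradient_magnitude(luminance_plane(test_rgb))
    return [[(2 * a * b + c) / (a * a + b * b + c) for a, b in zip(ra, rb)]
            for ra, rb in zip(g_ref, g_test)]
```

Comparing an image with itself yields a map of ones, while a reference edge that is missing in the test image drives the similarity at that pixel towards zero.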
Paper 149: Image Recognition in UAV Application Based on Texture Analysis
In this paper we propose a simple and efficient method of image classification for UAV monitoring applications. Taking the color distribution into consideration, two types of texture features are considered: statistical and fractal characteristics. In the learning phase, four different and efficient features were selected: energy, correlation, mean intensity and lacunarity on different color channels (R, G and B). Also, four classes of aerial images were considered (forest, buildings, grassland and flooding zone). The method of comparison, based on sub-images, averaging and the Minkowski distance, improves the performance of the texture-based classification. A set of 100 aerial images from a UAV was tested to establish the rate of correct classification.
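The comparison step mentioned above (sub-images, averaging and the Minkowski distance) can be sketched as follows. This is only an illustrative sketch: the function names, the order `p` and the idea of one prototype vector per class are assumptions, and the abstract does not say how the authors actually combine the sub-image features.

```python
def minkowski_distance(u, v, p=2):
    """Minkowski distance of order p between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def average_features(sub_image_features):
    """Average the feature vectors computed on the sub-images of one image."""
    n = len(sub_image_features)
    return [sum(column) / n for column in zip(*sub_image_features)]


def classify(feature_vector, class_prototypes, p=2):
    """Return the class whose (hypothetical) prototype vector is closest."""
    return min(class_prototypes,
               key=lambda name: minkowski_distance(
                   feature_vector, class_prototypes[name], p))
```

With `p=1` the distance is the city-block distance and with `p=2` the Euclidean distance; the order used in the paper is not given in the abstract.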
Paper 150: Edge Width Estimation for Defocus Map from a Single Image
The paper presents a new edge width estimation method based on a Gaussian edge model and unsharp mask analysis. The proposed method is accurate and robust to noise. Its effectiveness is demonstrated by applying it to the problem of defocus map estimation from a single image. A sparse defocus map is constructed using an edge detection algorithm followed by the proposed edge width estimation algorithm. Then the full defocus map is obtained by propagating the blur amount at edge locations to the entire image. Experimental results show the effectiveness of the proposed method in providing a reliable estimation of the defocus map.
Paper 151: Unified System for Visual Speech Recognition and Speaker Identification
This paper proposes a unified system for both visual speech recognition and speaker identification. The proposed system can handle image and depth data if they are available. The proposed system consists of four consecutive steps, namely, 3D face pose tracking, mouth region extraction, feature computation, and classification using the Support Vector Machine method. The system is experimentally evaluated on three public datasets, namely, MIRACL-VC1, OuluVS, and CUAVE. On the one hand, the visual speech recognition module achieves up to 96 % and 79.2 % for speaker dependent and speaker independent settings, respectively. On the other hand, speaker identification achieves a recognition rate of up to 98.9 %. Additionally, the obtained results demonstrate the importance of the depth data to resolve the subject dependency issue.
Paper 152: Human Machine Interaction via Visual Speech Spotting
In this paper, we propose an automatic visual speech spotting system adapted for RGB-D cameras and based on Hidden Markov Models (HMMs). Our system is based on two main processing blocks, namely, visual feature extraction and speech spotting and recognition. In the feature extraction step, the speaker's face pose is estimated using a 3D face model including a rectangular 3D mouth patch used to precisely extract the mouth region. Then, spatio-temporal features are computed on the extracted mouth region. In the second stage, the speech video is segmented by finding the start and end points of meaningful utterances, which are then recognized by the Viterbi algorithm. The proposed system is mainly evaluated on the MIRACL-VC1 dataset. Experimental results demonstrate that the proposed system can segment and recognize key utterances with a recognition rate of 83 % and a reliability of 81.4 %.
Paper 154: Minimizing the Impact of Signal-Dependent Noise on Hyperspectral Target Detection
A multilinear-algebra-based method for noise reduction in hyperspectral images (HSIs) is proposed to minimize the negative impact of signal-dependent noise on target detection. A parametric model, suitable for HSIs in which the photon noise is dominant compared to the electronic noise contribution, is used to describe the noise. To reduce the noise in hyperspectral images distorted by both signal-dependent (SD) and signal-independent (SI) noise, a tensorial method, which reduces noise by exploiting the different statistical properties of those two types of noise, is proposed in this paper. This method uses a parallel factor analysis (PARAFAC) decomposition to jointly remove SI and SD noise. The performance of the proposed method is assessed on simulated HSIs. The results on the real-world airborne hyperspectral image HYDICE (Hyperspectral Digital Imagery Collection Experiment) are also presented and analyzed. These experiments demonstrate the benefits of using the pre-whitening procedure in mitigating the impact of SD noise in different detection algorithms for hyperspectral images.
Paper 155: A Unified Camera Calibration from Arbitrary Parallelograms and Parallelepipeds
This paper presents a novel approach to camera calibration that can use geometric information from parallelograms and parallelepipeds simultaneously. The proposed method is a factorization-based approach that solves the problem linearly by decomposing a measurement matrix into parameters of cameras, parallelograms and parallelepipeds. Since the two kinds of geometric constraints can complement each other in general man-made environments, the proposed method can obtain more stable estimation results than previous approaches, which can use geometric constraints from either parallelograms or parallelepipeds only. The results of experiments with real outdoor images are presented to demonstrate the feasibility of the proposed method.
Paper 157: Multi-Distinctive MSER Features and their Descriptors: A Low-complexity Tool for Image Matching
The paper proposes multi-distinctive MSER features (md-MSER), which are MSER keypoints combined with a number of encompassed keypoints of another type, which should also be affine-invariant (e.g. Harris-Affine keypoints) to maintain the invariance of the proposed method. Such a bundle of keypoints is jointly represented by the corresponding number of SIFT-based descriptors which characterize both visual and spatial properties of md-MSERs. Therefore, matches between individual md-MSER features indicate both visual and configurational similarities so that true feature correspondences can be established (at least in some applications) without the verification of spatial consistencies (i.e. the computational costs of detecting contents visually similar in a wider context are significantly reduced). The paper briefly presents the principles of building and representing md-MSER features. Then, the performance of md-MSER-based algorithms is experimentally evaluated in two benchmark scenarios of image matching and retrieval. In particular, md-MSER algorithms are compared to typical alternatives based on other affine-invariant keypoints.
Paper 158: Two-Stage filtering scheme for Sparse Representation based Interest Point Matching for Person Re-identification
The objective of this paper is to study Interest Point (IP) filtering in video-based human re-identification tasks. The problem is that, with a large number of IPs describing a person, re-identification becomes a very time-consuming task and the IPs become redundant. In this context, we propose a two-stage filtering step. The first stage reduces the number of IPs to be matched and the second ignores weakly matched IPs participating in the re-identification decision. The proposed approach is based on the supervision of an SVM learned on a training dataset. Our approach is evaluated on the PRID-2011 dataset and results show that it is fast and compares favorably with the state of the art.
Paper 160: Optical Sensor Tracking and 3D-Reconstruction of Hydrogen-Induced Cracking
This paper presents an approach, which combines a stereo-camera unit and ultrasonic sensors to reconstruct hydrogen-induced crack (HIC) three-dimensional locations. The sensor probes are tracked in the images and their 3D position is triangulated. The combination with the ultrasonic measurement leads to a determination of the crack position in the material. To detect varying crack characteristics, different probes have to be used. Their measurements are combined in a common coordinate system. To evaluate the method, a milled reference block was examined and the results are compared to the ground-truth of the block model.
Paper 161: Fast and Robust Variational Optical Flow for High-Resolution Images using SLIC Superpixels
We show how pixel-based methods can be applied to a sparse image representation resulting from a superpixel segmentation. On this sparse image representation we only estimate a single motion vector per superpixel, without working on the full-resolution image. This allows the accelerated processing of high-resolution content with existing methods.
The use of superpixels in optical flow estimation was studied before, but existing methods typically estimate a dense optical flow field -- one motion vector per pixel -- using the full-resolution input, which can be slow. Our novel approach offers important speed-ups compared to dense pixel-based methods, without significant loss of accuracy.
Paper 162: Embedded System Implementation for Vehicle Around View Monitoring
Traffic safety has become a priority in recent years, and therefore, the field of intelligent transportation surveillance systems has become a major field of research. Among vehicle surveillance systems, the 360 degree around view monitor (AVM) system has recently been regarded as the direction of development. In this paper, an approach to constructing a 360 degree bird's-eye-view around view monitor system is proposed; the approach involves rectifying four fisheye cameras and stitching together the four calibrated images obtained from the cameras into one surrounding view image on a low-cost and highly portable Android embedded system. To improve the computation performance, the aforementioned procedures are combined into a single-step construction mapping using a table lookup mechanism and a multithreading technique. Through hardware implementation and experimental evaluation, the proposed AVM system performs satisfactorily, with a surrounding view video output frame rate of 12 fps and an average matching error as low as 2.89 pixels.
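The idea of folding several geometric steps into one table lookup can be sketched as follows: the source coordinates for every output pixel are computed once, and each new frame is then produced by plain indexing. The function names and the `mapping` callback are hypothetical, and a real AVM table would compose fisheye rectification, the bird's-eye-view transform and stitching, none of which is modelled in this sketch.

```python
def build_lookup_table(out_width, out_height, mapping):
    """Precompute, once, the source pixel for every output pixel.

    mapping(x, y) returns (src_x, src_y) or None when no source pixel exists.
    It stands in for the composed geometric transforms, which are assumed here.
    """
    return [[mapping(x, y) for x in range(out_width)]
            for y in range(out_height)]


def remap(frame, table, fill=0):
    """Build an output frame by lookup only; no geometry is evaluated per frame."""
    height, width = len(frame), len(frame[0])
    output = []
    for table_row in table:
        row = []
        for src in table_row:
            if src is None:
                row.append(fill)
                continue
            sx, sy = src
            inside = 0 <= sx < width and 0 <= sy < height
            row.append(frame[sy][sx] if inside else fill)
        output.append(row)
    return output


# Toy mapping (a horizontal flip of a 3-pixel-wide frame) used as a stand-in.
flip_table = build_lookup_table(3, 2, lambda x, y: (2 - x, y))
```

Because each output row depends only on the table and the input frame, rows can be processed independently, which is what makes this kind of mapping amenable to multithreading.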
Paper 164: Exploring Protected Nature through Multimodal Navigation of Multimedia Contents
We present a framework for exploring naturalistic environments in a multimodal way. The multimedia information related to the different natural scenarios can be explored by users on their home desktops through virtual tours from a web based interface, as well as from a dedicated mobile app during an on site tour of the considered natural reserves. A wearable station that lets a guide broadcast multimedia content to the users' smartphones and tablets, to better explain the naturalistic places, has been developed as part of the framework. As a pilot study, the framework has been employed in three different naturalistic reserves covering epigeal, hypogeal, and marine ecosystems.
Paper 166: Soft Biometrics by Modeling Temporal Series of Gaze Cues Extracted in the Wild
Soft biometric systems have spread in recent years, both for supporting classical biometrics and as stand-alone solutions with application scopes ranging from digital signage to human-robot interaction. In recent years, the possibility has emerged of also treating the temporal evolution of the human gaze as a soft biometric, and some recent works in the literature have explored this exciting research line by using expensive and (perhaps) unsafe devices which, moreover, require user cooperation to be calibrated. To our knowledge, the use of a low-cost, non-invasive, safe and calibration-free gaze estimator to obtain soft-biometric data has not been investigated yet. This paper fills this gap by analyzing the soft-biometric performance obtained by modeling the series of gaze estimates computed from the combination of head poses and eye pupil locations on data acquired by an off-the-shelf RGB-D device.
Paper 167: Sphere-Tree Semi-Regular Remesher
Surface meshes have become widely used since they are frequently adopted in many computer graphics applications. These meshes are often generated by isosurface representations or scanning devices. Unfortunately, these meshes are often dense and full of redundant vertices and irregular sampling. These defects make meshes unable to support multiple applications such as display, compression and transmission. To solve these problems and reduce the complexity, the mesh quality (connectivity regularity) must be improved. This improvement is called re-meshing. This paper presents a novel re-meshing approach based on Sphere-Tree construction. First, we approximate the original object with a dictionary of multi-dimensional geometric shapes (spheres) called a Sphere-Tree, which is then re-meshed. Finally, we use a refinement step to avoid artifacts and produce a new semi-regular mesh.
Paper 168: Multiple Description Coding for Multi-view Video
In this paper, a novel multiple description coding (MDC) scheme for multi-view video coding (MVC) is proposed that can deliver higher coding efficiency. This is achieved by separating one description into two subsequences and directly reusing the modes and prediction vectors computed during the encoding of one subsequence in the encoding process of the other subsequence from the same description. Such a reuse strategy is made possible by the high correlation between the two subsequences, which are generated from the input multi-view video sequence through spatial polyphase subsampling and 'cross-interleaved' sample grouping. Due to the data reuse, the number of bits required for storage and transmission is greatly reduced. Extensive simulation results obtained on the JMVC codec platform show that the proposed MDC scheme outperforms several state-of-the-art MDC methods for stereoscopic video and multi-view video.
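Spatial polyphase subsampling, mentioned above, splits a frame into interleaved sub-images that are highly correlated with each other. The sketch below shows a generic 2x2 polyphase split and its inverse; the 2x2 factor and the function names are assumptions, and the 'cross-interleaved' grouping of samples into descriptions and subsequences is specific to the paper and is not reproduced here.

```python
def polyphase_split(frame):
    """Split a frame (list of rows) into four 2x2 polyphase components.

    The result is keyed by (row_phase, col_phase); component (0, 0) holds the
    pixels at even rows and even columns, (1, 1) those at odd rows and columns.
    """
    return {(r, c): [row[c::2] for row in frame[r::2]]
            for r in (0, 1) for c in (0, 1)}


def polyphase_merge(components, height, width):
    """Re-interleave polyphase components into a full-resolution frame."""
    frame = [[None] * width for _ in range(height)]
    for (r, c), component in components.items():
        for i, row in enumerate(component):
            for j, value in enumerate(row):
                frame[r + 2 * i][c + 2 * j] = value
    return frame
```

Neighbouring pixels end up in different components, so each component is a low-resolution view of the same content, which is why coding decisions made for one can plausibly be reused for another.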
Paper 169: Cascaded Regressions of Learning Features for Face Alignment
Face alignment is a fundamental problem in computer vision: localizing the landmarks of the eyes, nose or mouth in 2D images. In this paper, our method for face alignment integrates three aspects not seen in previous approaches. First, local descriptors are learned using a Restricted Boltzmann Machine (RBM) to model the local appearance of each facial point independently. Second, a coarse-to-fine regression localizes the landmarks after the estimation of the shape configuration via global regression. Third, synthetic data is used as training data to enable our approach to work better with the profile view, and to avoid the need to increase the number of annotations for training. Our results on challenging datasets compare favorably with state-of-the-art results. The combination with the synthetic data allows our method to yield good results in profile alignment. This highlights the potential of using synthetic data for in-the-wild face alignment.
Paper 171: Adaptive Scale Selection for Multiscale Image Denoising
Adaptive transforms are required for better signal analysis and processing. A key issue in finding the optimal expansion basis for a given signal is the representation of signal information with very few elements of the basis. In this context a key role is played by the multiscale transforms that allow signal representation at different resolutions. This paper presents a method for building a multiscale transform with adaptive scale dilation factors. The aim is to promote sparsity and adaptiveness both in time and scale. To this end, interscale relationships of wavelet coefficients are used for the selection of those scales that measure significant changes in signal information. Then, a wavelet transform with variable dilation factor is defined accounting for the selected scales and the properties of coprime numbers. Preliminary experimental results in image denoising by Wiener filtering show that the adaptive multiscale transform is able to provide better reconstruction quality with a minimum number of scales and a computational effort comparable to that of the classical dyadic transform.
Paper 173: Online Face Recognition System based on Local Binary Patterns and Facial Landmark Tracking
This paper presents a system for real-time face recognition. The system learns and recognizes faces from a video on the fly and does not need a pre-trained database. The system consists of the following sub-methods: face detection and tracking, face alignment, key frame selection, face description and face matching. The system detects face tracks from a video, which are used in learning and recognition. Facial landmark tracking is utilized to detect changes in facial pose and expression in order to select key frames from a face track. Faces in key frames are represented using local binary pattern (LBP) histograms. These histograms are stored in the database. A nearest neighbor classifier is used in face matching. The system achieved a recognition rate of 98.6% in an offline test and 95.9% in an online test.
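The description and matching steps can be illustrated with a basic 8-neighbour LBP histogram and a nearest neighbor search. This is a minimal sketch under stated assumptions: the basic 3x3 LBP operator, the single whole-image histogram and the chi-square distance are common choices but are not confirmed by the abstract, and `database` is a hypothetical list of `(label, histogram)` pairs.

```python
def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic 3x3 LBP codes of a grayscale image."""
    height, width = len(gray), len(gray[0])
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1  # avoid division by zero for images under 3x3
    return [count / total for count in hist]


def chi_square(h1, h2):
    """Chi-square distance between two histograms (an assumed choice)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def nearest_neighbor(query_hist, database):
    """Return the label of the stored histogram closest to the query."""
    return min(database, key=lambda item: chi_square(query_hist, item[1]))[0]
```

An online system of the kind described would append new `(label, histogram)` pairs to `database` as key frames are selected, so that learning and recognition share the same store.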
Paper 174: Depth-based Filtration for Tracking Boost
This paper presents a novel method that uses depth information to boost the tracking performance of traditional RGB trackers for arbitrary objects (objects not known in advance) through object segmentation/separation supported by depth information. The main focus is on real-time applications where exploitation of depth sensors, nowadays affordable, is not only possible but also feasible, such as robotics or surveillance. The aim is to show that the depth information used for target segmentation significantly helps to reduce incorrect model updates caused by occlusion or drift and improves the success rate and precision of a traditional RGB tracker while remaining comparably efficient and thus possibly real-time. The paper also presents and discusses the achieved performance results.
Paper 175: Automatic Detection of Social Groups in Pedestrian Crowds
We present a novel approach for automatic detection of social groups of pedestrians in crowds. Instead of computing pairwise similarity between pedestrian trajectories, followed by clustering of similar pedestrian trajectories into groups, we cluster pedestrians into groups which have similar distributions across source and sink locations in the scene. The paper presents the proposed approach and its evaluation using different state-of-the-art datasets: experimental results demonstrate its effectiveness, achieving significant accuracy under both dichotomous and trichotomous coding schemes. Experimental results also show that our approach is less computationally expensive than the current state-of-the-art methods.
Paper 178: Fast and Low Power Consumption Outliers Removal for Motion Vector Estimation
When robust global motion estimation is needed in a pipeline, the RANSAC algorithm is the usual choice. Unfortunately, since RANSAC is an iterative method based on random analysis, it is not suitable for real-time processing. This paper presents an outlier removal algorithm which achieves a robust estimation (at least equal to RANSAC) with very low power consumption and can be employed for embedded time implementation.
Paper 179: Plane Extraction For Indoor Scene Recognition
In this paper, we present an image based plane extraction method well suited for real-time operations. Our approach exploits the assumption that the surrounding scene is mainly composed of planes oriented in known directions. Planes are detected from a single image using a voting scheme that takes into account the vanishing lines. Then, candidate planes are validated and merged using a region growing based approach to detect planes in real time inside an unknown indoor environment. Using the related plane homographies it is possible to remove the perspective distortion, enabling standard place recognition algorithms to work in a viewpoint-invariant setup. Quantitative experiments performed with real world images from a publicly available data set show the effectiveness of our approach compared with a very popular method.
Paper 182: Dictionary-Based Compact Data Representation for Very High Resolution Earth Observation Image Classification
In the context of fast growing data archives, with continuous changes in volume and diversity, information mining has proven to be a difficult, yet highly recommended task. The first and perhaps the most important part of the process is data representation for efficient and reliable image classification. This paper presents a new approach for describing the content of Earth Observation Very High Resolution images, in comparison with traditional representations based on specific features. The benefit of data compression is exploited in order to express the scene content in terms of dictionaries. The image is represented as a distribution of recurrent patterns, removing redundant information, but keeping all the explicit features, like spectral, texture and context. Further, a data domain analysis is performed using a Support Vector Machine, aiming to compare the influence of data representation on semantic scene annotation. WorldView2 data and a reference map are used for algorithm evaluation.
Paper 183: A Trust Region Optimization Method for Fast 3D Spherical Configuration in Morphing Processes
This paper addresses the problem of spherical mesh parameterization. The main contribution of this work is an effective optimization scheme for computing such a parameterization, with an algorithm that exhibits global convergence. The proposed trust region spherical parameterization (TRSP) minimizes the ratio of inverted triangles, yields an efficient spherical parameterization, and generates bijective, low-distortion mappings so that the faces have the correct orientation, thus creating a 3D spherical geometry object. Simulation results show that it is possible to achieve a considerable correspondence between the angle and area perspective distortion.
Paper 185: Image Analysis and Microscopy in Food Science: Computer Vision and Visual Inspection
Rheological properties of food products are strongly related to their microstructure. Microscopy is thus a preferred tool in food research. In food science, microscopy has long been used for visual inspection. Recently, however, quantitative analysis has become the new trend. In spite of this, only a few experts in computer vision are actively involved in image analysis projects applied to food microscopy. Microscopists tend to use simple tools, without considering whether they are appropriate for their application. As a consequence, most published work in food science journals lacks scientific rigour when it comes to analysing images. On the other hand, image analysis experts tend to undervalue microscopists' needs and opinions, which can be surprisingly different from what most people in the computer vision community might think. Drawing upon our experience, we try to highlight microscopists' perspective on image segmentation and, at the same time, show a few examples of collaborative projects that compute interesting measures for the food science community and that do not rely on segmentation accuracy.
Paper 186: ROS-based SLAM for a Gazebo-Simulated Mobile Robot in image-Based 3D Model of Indoor Environment
Nowadays robot simulators have robust physics engines, high-quality graphics, and convenient interfaces, allowing researchers to substitute physical systems with their simulation models in order to pre-estimate the performance of theoretical findings before applying them to real robots. This paper describes a Gazebo simulation approach to simultaneous localization and mapping (SLAM) based on the Robot Operating System (ROS) using a PR2 robot. The ROS-based SLAM approach applies Rao-Blackwellized particle filters and laser data to locate the PR2 robot in an unknown environment and build a map. The 3D model of a real room was obtained from camera shots and reconstructed with Autodesk 123D Catch and MeshLab software. The results demonstrate the fidelity of the simulated 3D room to the ROS-calculated map obtained from the robot laser system, and the feasibility of using ROS-based SLAM with a Gazebo-simulated mobile robot in a camera-based 3D environment. This approach will be further extended to ROS-based robotic simulations in Gazebo with a Russian anthropomorphic robot AR-601M.
Paper 188: Spatio-Temporal Object Recognition
Object recognition in video is in most cases solved by extracting keyframes from the video and then applying still image recognition methods on these keyframes only. This procedure largely ignores the temporal dimension. Nevertheless, the way an object moves may hold valuable information on its class. Therefore, in this work, we analyze the effectiveness of different motion descriptors, originally developed for action recognition, in the context of action-invariant object recognition. We conclude that a higher classification accuracy can be obtained when motion descriptors (specifically, HOG and MBH around trajectories) are used in combination with standard static descriptors extracted from keyframes. Since currently no suitable dataset for this problem exists, we introduce two new datasets and make them publicly available.
Paper 194: A Graph Based People Silhouette Segmentation using Combined Probabilities Extracted from Appearance, Shape template Prior, and Color Distributions
In this paper, we present an approach for the segmentation of people silhouettes in images. Since estimating the probability that a pixel belongs to a person or to the background is difficult in real-world images, we propose to optimally combine several such estimates. A local window classifier based on SVMs with Histograms of Oriented Gradients features estimates probabilities from pixels' appearance. A shape template prior is also computed over a set of training images. From these two probability maps, color distributions relying on color histograms and Gaussian Mixture Models are estimated and the associated probability maps are derived. All these probability maps are optimally combined into a single one with weighting coefficients determined by a genetic algorithm. This final probability map is used within a graph-cut to accurately extract the silhouette. Experimental results are provided on both the INRIA Static Person Dataset and BOSS European project and show the benefit of the approach.
Paper 195: Buffering Hierarchical Representation of Color Video Streams for Interactive Object Selection
Interactive video editing and analysis has a broad impact but it is still a very challenging task. Real-time video segmentation requires carefully defining how to represent the image content, and hierarchical models have shown their ability to provide efficient ways to access color image data. Furthermore, algorithms allowing fast construction of such representations have been introduced recently. Nevertheless, these methods are most often unable to address (potentially endless) video streams, due to memory limitations. In this paper, we propose a buffering strategy to build a hierarchical representation combining color, spatial, and temporal information from a color video stream. We illustrate its relevance in the context of interactive object selection.
Paper 199: An H.264 Sensor Aided Encoder for Aerial Video Sequences with in-the-Loop Metadata Enhancement
Unmanned Aerial Vehicles (UAVs) are often employed to capture high resolution images in order to perform image mosaicking and/or 3D reconstruction. Images are usually stored on-board or sent to the ground using still image or video data compression. Still image encoders are preferred when low frame rates are involved, because video coding systems are based on motion estimation and compensation algorithms which fail when the motion vectors are significantly long. The latter is the case of low frame rate videos, in which the overlapping between subsequent frames is very small.
In this scenario, the UAV's attitude and position metadata from the Inertial Navigation System (INS) can be employed to estimate global motion parameters without video analysis. However, a low complexity analysis can refine the motion field estimated using only the metadata.
In this work, we propose to use this refinement step in order to improve the position and attitude estimation produced by the navigation system with the aim of maximizing the encoder performance. Experiments on both simulated and real world video sequences confirm the effectiveness of the proposed approach.
Paper 200: A Minimax Framework for Gender Classification based on small-sized datasets
Gender recognition is a topic of high interest, especially in the growing field of audience measurement techniques for digital signage applications. Usually, supervised approaches are employed. They require a preliminary training phase performed on large datasets of annotated facial images that are expensive (e.g. MORPH) and that, in any case, cannot be updated to keep track of the continuous change in people's appearance due to changing fashions and styles (e.g. hairstyles or makeup). The use of small-sized (and therefore more easily updatable) datasets is thus highly desirable. Unfortunately, when few examples are used for training, gender recognition performance decreases dramatically, since state-of-the-art classifiers are unable to reliably handle the inherent data uncertainty by explicitly modeling the encountered distortions. To address this drawback, this work introduces an innovative classification scheme for gender recognition. Its core is the Minimax approach, i.e. a smart classification framework that, by including a number of existing regularized regression models, allows robust classification even when few examples are used for training. This has been shown experimentally by comparing the proposed classification scheme with state-of-the-art classifiers (SVM, kNN and Random Forests) under various pre-processing methods.
Paper 203: EFIC: Edge based Foreground Background Segmentation and Interior Classification for Dynamic Camera Viewpoints
Foreground background segmentation algorithms attempt to separate interesting or changing regions from the background in video sequences. Foreground detection is obviously more difficult when the camera viewpoint changes dynamically, such as when the camera undergoes a panning or tilting motion. In this paper, we propose an edge based foreground background estimation method, which can automatically detect and compensate for camera viewpoint changes. We will show that this method significantly outperforms state-of-the-art algorithms for the panning sequences in the ChangeDetection.NET 2014 dataset, while still performing well in the other categories.
Paper 204: What does one Image of One Ball Tell Us about the Focal Length?
We reanimate the (sometimes forgotten) Belgian Theorems, and show how the balls of Dandelin can give us elegant proofs for geometric properties of perspective ball projections. As a consequence, we derive a new formula for computing the focal length by means of one image of one ball. By means of simulations we show the sensitivity of our formula as a function of the ratio of the major axis to the minor axis of the elliptic ball image. This provides a measure of reliability and enables us to select the most appropriate ball image.
Paper 205: Towards More Natural Social Interactions of Visually Impaired Persons
We review recent computer vision techniques with reference to the specific goal of assisting the social interactions of a person affected by very severe visual impairment or by total blindness. We consider a scenario in which a sequence of images is acquired and processed by a wearable device, and we focus on the basic tasks of detecting and recognizing people and their facial expression. We review some methodologies of Visual Domain Adaptation that could be employed to adapt existing classification strategies to the specific scenario. We also consider other sources of information that could be exploited to improve the performance of the system.
Paper 207: A Mobile Application for Braille to Black Conversion
This work aims at producing inclusive technologies to help people affected by diseases. In particular, we present a pipeline to convert Braille documents, acquired with a mobile device, into classic printed text. The mobile application is intended as a support for assistants (e.g., in the education domain) and parents of blind and partially sighted persons (e.g., children and elderly) for the reading of documents written in Braille. The software has been developed and tested in collaboration with experts in the field. Experimental results confirm the effectiveness of the proposed imaging pipeline in terms of conversion accuracy, punctuation, and final page layout.
Paper 208: Tooth Segmentation Algorithm for Age Estimation
The estimation of the age of adult subjects is usually based on age-related changes in the skeleton. An interesting, recently developed non-invasive method involves tooth parameters obtainable from periapical X-rays. Specifically, this procedure estimates the age of adults through the changes in the tooth due to the apposition of secondary dentine, using pulp and tooth area as fundamental parameters. The aim of this study is to define an algorithm able to detect the boundary of the tooth of interest in order to apply an automatic procedure for age estimation. The algorithm is based on classical segmentation methods such as thresholding and shape analysis. Our early results, obtained on a small sample, are encouraging enough to proceed along this path.
Paper 209: Toward A Universal Stereo Image Quality Metric Without Reference
Stereoscopic imaging is becoming an attractive tool in image processing. However, as in 2D, such images can also be affected by several types of degradation. In this paper, we are interested in the impact of some of these degradation types on perceived quality, and we propose a new framework for a Stereoscopic Image Quality Metric without reference (SNR-IQM) based on degradation identification and feature fusion steps. Support Vector Machine (SVM) models are used. The ability of our method to predict subjective judgments has been evaluated using the 3D LIVE Image Quality Dataset and compared with some recent methods considered state of the art. The experimental results show the relevance of our work.
Paper 212: Robust Fusion of Trackers Using Online Drift Prediction
Visual object tracking is a standard function of computer vision that has been the source of numerous propositions. This diversity of approaches leads to the idea of trying to fuse them and take advantage of the strengths of each of them while controlling the noise they may introduce in some circumstances. The work presented here describes a generic framework for the combination of trackers, where fusion may occur at different levels of the processing chain. The fusion process is governed by the online detection of abnormal behavior either from specific features provided by each tracker, or from out of consensus detection.
The fusion of three trackers exploiting complementary designs and features is evaluated on 72 fusion schemes. Thorough experiments on 12 standard video sequences and on a new set of 13 videos addressing typical difficulties faced by vision systems used in the demanding context of driving assistance show that fusion greatly improves the performance of each individual tracker, and reduces the probability of drifting by a factor of two.
Paper 214: A Game Engine as a Generic Platform for Real-Time Previz-on-Set in Cinema Visual Effects
We present a complete framework designed for film production requiring live (pre) visualization. This framework is based on a well-known game engine, Unity. Game engines possess many advantages that can be directly exploited in real-time pre-visualization, where real and virtual worlds have to be mixed. In the work presented here, all the steps are performed in Unity, from acquisition to rendering. To perform real-time compositing that takes into account occlusions between real and virtual elements, and to manage physical interactions of real characters with virtual elements, we use a low resolution depth map sensor coupled to a high resolution film camera. The goal of our system is to give the film director a flexible and powerful creative tool on stage, long before post-production.
Paper 215: Time Ordering Shuffling for Improving Background Subtraction
By construction, a video is a series of ordered frames, whose order is defined at the time of the acquisition process. Background subtraction methods then take this input video and produce a series of segmentation maps expressed in terms of foreground objects and scene background. To our knowledge, this natural ordering of frames has never been questioned or challenged.
In this paper, we propose to challenge, in a prospective view, the natural ordering of video frames in the context of background subtraction, and examine alternative time orderings. The idea consists in changing the order before background subtraction is applied, by means of shuffling strategies, and re-ordering the segmentation maps afterwards. For this purpose, we propose several shuffling strategies and show that, for some background subtraction methods, results are preserved or even improved. The practical advantage of time shuffling is that it can be applied seamlessly to any existing background subtraction method.
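The shuffle, subtract, re-order pipeline described for Paper 215 can be sketched as follows. This is only an illustration under stated assumptions: a uniform random permutation stands in for the paper's shuffling strategies (which the abstract does not detail), and `running_average_subtract` is a hypothetical toy method, not one of the algorithms evaluated by the authors.

```python
import random


def shuffle_subtract_reorder(frames, subtract, seed=0):
    """Run `subtract` on a shuffled frame sequence and restore the original order.

    frames   : list of frames (any per-frame representation)
    subtract : callable taking a list of frames and returning one
               segmentation map per frame, in the same order
    """
    order = list(range(len(frames)))
    random.Random(seed).shuffle(order)  # order[k] = original index of k-th input
    shuffled_maps = subtract([frames[i] for i in order])
    maps = [None] * len(frames)
    for k, original_index in enumerate(order):
        maps[original_index] = shuffled_maps[k]  # undo the permutation
    return maps


def running_average_subtract(frames, alpha=0.5, threshold=20):
    """Toy background subtraction (hypothetical): running-average background."""
    background = list(frames[0])
    maps = []
    for frame in frames:
        maps.append([abs(p - b) > threshold for p, b in zip(frame, background)])
        background = [(1 - alpha) * b + alpha * p for p, b in zip(frame, background)]
    return maps


# Each frame is a flat list of grey levels; one pixel briefly becomes bright.
frames = [[10, 10], [10, 200], [10, 10], [10, 10]]
maps = shuffle_subtract_reorder(frames, running_average_subtract)
```

Because the wrapper only permutes the input and inverts the permutation on the output, it can be placed around any method that maps a frame list to a list of segmentation maps, which matches the "seamless" property claimed in the abstract.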
Paper 217: Head Roll Estimation using Horizontal Energy Maximization
Head pose estimation is often a necessary step for many applications using the human face, for example in human-computer interaction systems, face recognition, or face tracking. Here we present a new method to estimate face roll by maximizing a horizontal global energy. The main idea is that salient facial elements such as the base of the nose, the eyes, and the mouth have an approximately horizontal direction. According to roll orientation, several local maxima are extracted. A further validation step, using a score computed on the relative positions, sizes, and patterns of the eyes, nose, and mouth, allows one of the local maxima to be chosen. This method is evaluated on the BioID and Color Feret databases and achieves roll estimation with a mean absolute error of approximately 4 degrees.
Paper 219: Semantic Shape Models for Leaf Species Identification
We present two complementary botanically inspired leaf shape representation models for the classification of simple leaf species (leaves with one compact blade). The first representation is based on some linear measurements that characterise variations of the overall shape, while the second consists of semantic part-based segment models. These representations have two main advantages. First, they only require the extraction of two points: the base and apex, which are the key characterisation points of simple leaves. Second, the complementarity of the proposed model representations provides robustness against large leaf species variations as well as the high inter-species and low intra-class similarity that occur for some species. For the decision procedure, we use a two-stage Bayesian framework: the first stage concerns each shape model separately and the second is a combination of classification scores (posterior probabilities) obtained from each shape model. Experiments carried out on real world leaf images, the simple leaves of the Pl@ntLeaves scan images (46 species), show an increase in performance compared to previous related work.
Paper 220: A Generic Feature Selection Method for Background Subtraction Using Global Foreground Models
Over the last few years, a wide variety of background subtraction algorithms have been proposed for the detection of moving objects in videos acquired with a static camera. While much effort has been devoted to the development of robust background models, the automatic spatial selection of useful features for representing the background has been neglected. In this paper, we propose a generic and tractable feature selection method. Interesting contributions of this work are the proposal of a selection process coherent with the segmentation process and the exploitation of global foreground models in the selection strategy. Experiments conducted on the ViBe algorithm show that our feature selection technique improves the segmentation results.
Paper 223: A Comparison of Multi-Scale Local Binary Pattern Variants for Bark Image Retrieval
With the growing interest in identifying plant species and the availability of digital collections, many automated methods based on bark images have been proposed. Bark identification is often formulated as a texture analysis problem. Among numerous approaches, Local Binary Pattern (LBP) based texture description has achieved good performance. Bark structure appearance is subject to resolution variations which can be due to a number of factors (environment, age, acquisition conditions, etc.). Thus it remains a very challenging problem. In this paper, we implement and study the efficiency of different multi-scale LBP descriptors: Multi-resolution LBP (MResLBP), Multi-Block LBP (MBLBP), LBP-Filtering (LBPF), Multi-Scale LBP (MSLBP), and Pyramid based LBP (PLBP). These descriptors are compared on two bark datasets: AFF and Trunk12. The descriptors are evaluated under increasing levels of scale space. Performance is assessed using the Mean Average Precision and Recall-Precision curves. The results show that multi-scale LBP descriptors outperform the basic LBP and MResLBP. In our tests, we observe that the best results of LBPF and PLBP are obtained under low scale space levels. We also observe similar results for MSLBP and MBLBP across the six scales considered.
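For readers unfamiliar with the basic LBP operator that the multi-scale variants of Paper 223 build on, here is a minimal sketch of the single-scale, 8-neighbour, radius-1 code. The bit ordering (clockwise from the top-left neighbour) and the `>=` comparison are common conventions assumed here, not details taken from the abstract; none of the multi-scale descriptors compared in the paper is reproduced.

```python
# Neighbour offsets (dy, dx), visited clockwise from the top-left pixel.
# This ordering is an assumed convention; other orderings give permuted codes.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """Basic 8-neighbour LBP code of the interior pixel (y, x) of a 2D list."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit  # neighbour at least as bright as the centre
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist
```

A texture descriptor is then typically the histogram of these codes over an image or region; multi-scale variants change how the neighbourhood is sampled or filtered before the same thresholding step.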
Paper 227: Bootstrapping Computer Vision and Sensor Fusion for Absolute and Relative Vehicle Positioning
With the migration towards automated driving for various classes of vehicles, affordable self-positioning with at least centimeter accuracy is a goal to be achieved. Commonly used techniques such as GPS are either not accurate enough in their basic variant or accurate but too expensive. In addition, sufficient GPS coverage is in several cases not guaranteed. In this paper we propose positioning of a vehicle based on fusion of several sensor inputs. We consider inputs from improved GPS (with internet based corrections), inertial sensors and vehicle sensors fused with computer vision based positioning. For vision-based positioning, cameras are used for feature-based visual odometry to do relative positioning and for beacon-based absolute positioning. Visual features are brought into a dynamic map which allows sharing information among vehicles and allows us to deal with less robust features. This paper does not present final results, yet it is intended to share ideas that are currently being investigated and implemented.
Paper 230: Direct Image Alignment for Active Near Infrared Image Differencing
One of the difficult challenges in face recognition is dealing with the illumination variations that occur in various environments. A practical and efficient way to address harsh illumination variations is to use active image differencing in the near-infrared frequency domain. In this method, two types of image frames are taken: an illuminated frame is taken with near-infrared illumination, and an ambient frame is taken without the illumination. The difference between the face regions of the two types of frames reveals the face image illuminated only by the active illumination. Therefore the image is not affected by the ambient illumination, and illumination-robust face recognition can be achieved. However, the method assumes that there is no motion between the two frames. Faces in different locations in the two frames introduce a motion artifact. To compensate for motion between the two frames, a motion interpolation method has been proposed, but it has limitations, including an assumption that the face motion is linear. In this paper, we propose a new image alignment method that directly aligns the actively illuminated and ambient frames. The method is based on the Lucas-Kanade parametric image alignment method and involves a new definition of errors based on the properties of the two types of frames. Experimental results show that the direct method outperforms the motion interpolation method in terms of resulting face recognition rates.
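The differencing step that Paper 230 starts from can be sketched in a few lines. This illustrates only the basic idea and assumes the two frames are already aligned (the no-motion assumption the paper sets out to remove); clipping negative differences to zero is an assumption made here for the sketch, and the proposed alignment method itself is not reproduced.

```python
def active_difference(illuminated, ambient):
    """Per-pixel difference between an actively illuminated frame and an ambient frame.

    Both frames are 2D lists of grey levels with the same shape and are
    assumed to be perfectly aligned (no motion between the two captures).
    Negative differences are clipped to zero (an assumption of this sketch).
    """
    return [
        [max(lit - amb, 0) for lit, amb in zip(row_lit, row_amb)]
        for row_lit, row_amb in zip(illuminated, ambient)
    ]


# Toy 2x2 frames: ambient light contributes to both, the active source only to one.
illuminated = [[120, 200], [90, 50]]
ambient = [[100, 150], [95, 50]]
face_only = active_difference(illuminated, ambient)
```

If the face moves between the two captures, corresponding pixels no longer show the same surface point and the subtraction produces the motion artifact mentioned in the abstract, which is why an alignment step is needed before differencing.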