Publications
Google Scholar maintains an up-to-date list of my publications.
2024
- [arXiv] Differentiable Interacting Multiple Model Particle Filtering. John-Joseph Brady, Yuhui Luo, Wenwu Wang, Víctor Elvira, and Yunpeng Li. arXiv preprint arXiv:2410.00620, 2024.
We propose a sequential Monte Carlo algorithm for parameter learning when the studied model exhibits random discontinuous jumps in behaviour. To facilitate the learning of high dimensional parameter sets, such as those associated to neural networks, we adopt the emerging framework of differentiable particle filtering, wherein parameters are trained by gradient descent. We design a new differentiable interacting multiple model particle filter to be capable of learning the individual behavioural regimes and the model which controls the jumping simultaneously. In contrast to previous approaches, our algorithm allows control of the computational effort assigned per regime whilst using the probability of being in a given regime to guide sampling. Furthermore, we develop a new gradient estimator that has a lower variance than established approaches and remains fast to compute, for which we prove consistency. We establish new theoretical results of the presented algorithms and demonstrate superior numerical performance compared to the previous state-of-the-art algorithms.
- [TSP] Normalizing Flow-based Differentiable Particle Filters. Xiongjie Chen and Yunpeng Li. IEEE Transactions on Signal Processing (TSP), 2024.
Recently, there has been a surge of interest in incorporating neural networks into particle filters, e.g. differentiable particle filters, to perform joint sequential state estimation and model learning for non-linear non-Gaussian state-space models in complex environments. Existing differentiable particle filters are mostly constructed with vanilla neural networks that do not allow density estimation. As a result, they are either restricted to a bootstrap particle filtering framework or employ predefined distribution families (e.g. Gaussian distributions), limiting their performance in more complex real-world scenarios. In this paper we present a differentiable particle filtering framework that uses (conditional) normalizing flows to build its dynamic model, proposal distribution, and measurement model. This not only enables valid probability densities but also allows the proposed method to adaptively learn these modules in a flexible way, without being restricted to predefined distribution families. We derive the theoretical properties of the proposed filters and evaluate the proposed normalizing flow-based differentiable particle filters’ performance through a series of numerical experiments.
- [ECCV] Bayesian Detector Combination for Object Detection with Crowdsourced Annotations. Zhi Qin Tan, Olga Isupova, Gustavo Carneiro, Xiatian Zhu, and Yunpeng Li. In European Conference on Computer Vision (ECCV), 2024.
Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise, especially in crowdsourcing scenarios. Most prior object detection methods assume accurate annotations; a few recent works have studied object detection with noisy crowdsourced annotations, with evaluation on distinct synthetic crowdsourced datasets of varying setups under artificial assumptions. To address these algorithmic limitations and evaluation inconsistency, we first propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations, with the unique ability of automatically inferring the annotators’ label qualities. Unlike previous approaches, BDC is model-agnostic, requires no prior knowledge of the annotators’ skill level, and seamlessly integrates with existing object detection models. Due to the scarcity of real-world crowdsourced datasets, we introduce large synthetic datasets by simulating varying crowdsourcing scenarios. This allows consistent evaluation of different models at scale. Extensive experiments on both real and synthetic crowdsourced datasets show that BDC outperforms existing state-of-the-art methods, demonstrating its superiority in leveraging crowdsourced data for object detection. Our code and data are available at: https://github.com/zhiqin1998/bdc.
- [FUSION] Regime Learning for Differentiable Particle Filters. John-Joseph Brady, Yuhui Luo, Wenwu Wang, Víctor Elvira, and Yunpeng Li. In International Conference on Information Fusion (FUSION), 2024.
Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference. This paper concerns the case where the system may switch between a finite set of state-space models, i.e. regimes. No prior approaches effectively learn both the individual regimes and the switching process simultaneously. In this paper, we propose the neural network based regime learning differentiable particle filter (RLPF) to address this problem. We further design a training procedure for the RLPF and other related algorithms. We demonstrate competitive performance compared to the previous state-of-the-art algorithms on a pair of numerical experiments.
- [MIUA] H-FCBFormer: Hierarchical Fully Convolutional Branch Transformer for Occlusal Contact Segmentation with Articulating Paper. Ryan Banks, Bernat Rovira-Lastra, Jordi Martinez-Gomis, Akhilanand Chaurasia, and Yunpeng Li. In UK Conference on Medical Image Understanding and Analysis (MIUA), 2024.
Occlusal contacts are the locations at which the occluding surfaces of the maxilla and the mandible posterior teeth meet. Occlusal contact detection is a vital tool for restoring the loss of masticatory function and is a mandatory assessment in the field of dentistry, with particular importance in prosthodontics and restorative dentistry. The most common method for occlusal contact detection is articulating paper. However, this method can indicate significant medically false positive and medically false negative contact areas, leaving the identification of true occlusal indications to clinicians. To address this, we propose a multiclass Vision Transformer and Fully Convolutional Network ensemble semantic segmentation model with a combination hierarchical loss function, which we name as Hierarchical Fully Convolutional Branch Transformer (H-FCBFormer). We also propose a method of generating medically true positive semantic segmentation masks derived from expert annotated articulating paper masks and gold standard masks. The proposed model outperforms other machine learning methods evaluated at detecting medically true positive contacts and performs better than dentists in terms of accurately identifying object-wise occlusal contact areas while taking significantly less time to identify them. Code is available at https://github.com/Banksylel/H-FCBFormer.
- [SAM] Revisiting semi-supervised training objectives for differentiable particle filters. Jiaxi Li, John-Joseph Brady, Xiongjie Chen, and Yunpeng Li. In IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2024.
Differentiable particle filters combine the flexibility of neural networks with the probabilistic nature of sequential Monte Carlo methods. However, traditional approaches rely on the availability of labelled data, i.e., the ground truth latent state information, which is often difficult to obtain in real-world applications. This paper compares the effectiveness of two semi-supervised training objectives for differentiable particle filters. We present results in two simulated environments where labelled data are scarce.
- [ACML] Folded Hamiltonian Monte Carlo for Bayesian Generative Adversarial Networks. Narges Pourshahrokhi, Yunpeng Li, Samaneh Kouchaki, and Payam Barnaghi. In Asian Conference on Machine Learning (ACML), 2024.
Probabilistic modelling of Generative Adversarial Networks (GANs) within the Bayesian framework has shown success in the literature in estimating complex distributions. In this paper, we develop a Bayesian formulation for unsupervised and semi-supervised GAN learning. Specifically, we propose Folded Hamiltonian Monte Carlo (F-HMC) methods within this framework to learn the distributions over the parameters of the generators and discriminators. We show that the F-HMC efficiently approximates multi-modal and high dimensional data when combined with Bayesian GANs. Its composition improves run time and test error in generating diverse samples. Experimental results with high-dimensional synthetic multi-modal data and natural image benchmarks, including CIFAR-10, SVHN and ImageNet, show that F-HMC outperforms the state-of-the-art methods in terms of test error, run times per epoch, inception score and Frechet Inception Distance scores.
2023
- [FoDS] An overview of differentiable particle filters for data-adaptive sequential Bayesian inference. Xiongjie Chen and Yunpeng Li. Foundations of Data Science, 2023.
By approximating posterior distributions with weighted samples, particle filters (PFs) provide an efficient mechanism for solving non-linear sequential state estimation problems. While the effectiveness of particle filters has been recognised in various applications, their performance relies on the knowledge of dynamic models and measurement models, as well as the construction of effective proposal distributions. An emerging trend involves constructing components of particle filters using neural networks and optimising them by gradient descent, and such data-adaptive particle filtering approaches are often called differentiable particle filters. Due to the expressiveness of neural networks, differentiable particle filters are a promising computational tool for performing inference on sequential data in complex, high-dimensional tasks, such as vision-based robot localisation. In this paper, we review recent advances in differentiable particle filters and their applications. We place special emphasis on different design choices for key components of differentiable particle filters, including dynamic models, measurement models, proposal distributions, optimisation objectives, and differentiable resampling techniques.
- [Asilomar] Learning Differentiable Particle Filter on the Fly. Jiaxi Li, Xiongjie Chen, and Yunpeng Li. In Asilomar Conference on Signals, Systems, and Computers, 2023.
Differentiable particle filters are an emerging class of sequential Bayesian inference techniques that use neural networks to construct components in state space models. Existing approaches are mostly based on offline supervised training strategies. This leads to the delay of the model deployment and the obtained filters are susceptible to distribution shift of test-time data. In this paper, we propose an online learning framework for differentiable particle filters so that model parameters can be updated as data arrive. The technical constraint is that there is no known ground truth state information in the online inference setting. We address this by adopting an unsupervised loss to construct the online model updating procedure, which involves a sequence of filtering operations for online maximum likelihood-based parameter estimation. We empirically evaluate the effectiveness of the proposed method, and compare it with supervised learning methods in simulation settings including a multivariate linear Gaussian state-space model and a simulated object tracking experiment.
- [ICICS] DRoT: A Decentralised Root of Trust for Trusted Networks. Loganathan Parthipan, Liqun Chen, Christopher JP Newton, Yunpeng Li, Fei Liu, and 1 more author. In International Conference on Information and Communications Security (ICICS), 2023.
- [ICASSP] Particle flow Gaussian sum particle filter. Karthik Comandur, Yunpeng Li, and Santosh Nannuru. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
The particle flow Gaussian particle filter (PFGPF) uses an invertible particle flow to generate a proposal density. It approximates the predictive and posterior distributions as Gaussian densities. In this paper, we use a bank of PFGPF filters to construct a particle flow Gaussian sum particle filter (PFGSPF), which approximates the predictive and posterior distributions as Gaussian mixture models. This approximation is useful in complex estimation problems where a single Gaussian approximation is inadequate. We compare the performance of this proposed filter with the PFGPF and others in challenging numerical simulations.
- [ICASSP] Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection. Xiongjie Chen, Yunpeng Li, and Yongxin Yang. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community because it is important for deploying machine learning models in real-world applications. In this paper we propose an uncertainty quantification approach by modeling data distributions in feature spaces. We further incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble stochastic neural networks (BE-SNNs) and overcome the feature collapse problem. We compare the performance of the proposed BE-SNNs with the other state-of-the-art approaches and show that BE-SNNs yield superior performance on several OOD detection benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, Fashion-MNIST vs NotMNIST dataset, and the CIFAR10 vs SVHN dataset.
- [SSP] Differentiable bootstrap particle filters for regime-switching models. Wenhan Li, Xiongjie Chen, Wenwu Wang, Víctor Elvira, and Yunpeng Li. In IEEE Statistical Signal Processing Workshop (SSP), 2023.
Differentiable particle filters are an emerging class of particle filtering methods that use neural networks to construct and learn parametric state-space models. In real-world applications, both the state dynamics and measurements can switch between a set of candidate models. For instance, in target tracking, vehicles can idle, move through traffic, or cruise on motorways, and measurements are collected in different geographical or weather conditions. This paper proposes a new differentiable particle filter for regime-switching state-space models. The method can learn a set of unknown candidate dynamic and measurement models and track the state posteriors. We evaluate the performance of the novel algorithm in relevant models, showing its great performance compared to other competitive algorithms.
2022
- [ICLR] Augmented sliced Wasserstein distances. Xiongjie Chen, Yongxin Yang, and Yunpeng Li. In International Conference on Learning Representations (ICLR), 2022.
While theoretically appealing, the application of the Wasserstein distance to large-scale machine learning problems has been hampered by its prohibitive computational cost. The sliced Wasserstein distance and its variants improve the computational efficiency through the random projection, yet they suffer from low accuracy if the number of projections is not sufficiently large, because the majority of projections result in trivially small values. In this work, we propose a new family of distance metrics, called augmented sliced Wasserstein distances (ASWDs), constructed by first mapping samples to higher-dimensional hypersurfaces parameterized by neural networks. It is derived from a key observation that (random) linear projections of samples residing on these hypersurfaces would translate to much more flexible nonlinear projections in the original sample space, so they can capture complex structures of the data distribution. We show that the hypersurfaces can be optimized by gradient ascent efficiently. We provide the condition under which the ASWD is a valid metric and show that this can be obtained by an injective neural network architecture. Numerical results demonstrate that the ASWD significantly outperforms other Wasserstein variants for both synthetic and real-world problems.
- [ECML] Imitation learning with Sinkhorn distances. George Papagiannis and Yunpeng Li. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2022.
Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial in their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach using both the reward metric and the Sinkhorn distance metric on a number of MuJoCo experiments. For the implementation and reproducing results please refer to the following repository https://github.com/gpapagiannis/sinkhorn-imitation.
- [EUSIPCO] Conditional measurement density estimation in sequential Monte Carlo via normalizing flow. Xiongjie Chen and Yunpeng Li. In European Signal Processing Conference (EUSIPCO), 2022.
Tuning of measurement models is challenging in real-world applications of sequential Monte Carlo methods. Recent advances in differentiable particle filters have led to various efforts to learn measurement models through neural networks. But existing approaches in the differentiable particle filter framework do not admit valid probability densities in constructing measurement models, leading to incorrect quantification of the measurement uncertainty given state information. We propose to learn expressive and valid probability densities in measurement models through conditional normalizing flows, to capture the complex likelihood of measurements given states. We show that the proposed approach leads to improved estimation performance and faster training convergence in a visual tracking experiment.
- [FUSION] Particle flow Gaussian particle filter. Karthik Comandur, Yunpeng Li, and Santosh Nannuru. In International Conference on Information Fusion (FUSION), 2022.
State estimation in non-linear models is performed by tracking the posterior distribution recursively. A plethora of algorithms have been proposed for this task. Among them, the Gaussian particle filter uses a weighted set of particles to construct a Gaussian approximation to the posterior. In this paper, we propose to use invertible particle flow methods, derived under the Gaussian boundary conditions for a flow equation, to generate a proposal distribution close to the posterior. The resultant particle flow Gaussian particle filter (PFGPF) algorithm retains the asymptotic properties of Gaussian particle filters, with the potential for improved state estimation performance in high-dimensional spaces. We compare the performance of PFGPF with the particle flow filters and particle flow particle filters in two challenging numerical simulation examples.
- [BMVC] Towards unsupervised sketch-based image retrieval. Conghui Hu, Yongxin Yang, Yunpeng Li, Timothy M Hospedales, and Yi-Zhe Song. In British Machine Vision Conference (BMVC), 2022.
The practical value of existing supervised sketch-based image retrieval (SBIR) algorithms is largely limited by the requirement for intensive data collection and labeling. In this paper, we present the first attempt at unsupervised SBIR to remove the labeling cost (both category annotations and sketch-photo pairings) that is conventionally needed for training. Existing single-domain unsupervised representation learning methods perform poorly in this application, due to the unique cross-domain (sketch and photo) nature of the problem. We therefore introduce a novel framework that simultaneously performs sketch-photo domain alignment and semantic-aware representation learning. Technically this is underpinned by introducing joint distribution optimal transport (JDOT) to align data from different domains, which we extend with trainable cluster prototypes and feature memory banks to further improve scalability and efficacy. Extensive experiments show that our framework achieves excellent performance in the new unsupervised setting, and performs comparably or better than state-of-the-art in the zero-shot setting.
2021
- [NeurIPS] HumBugDB: A Large-scale Acoustic Mosquito Dataset. Ivan Kiskin, Marianne Sinka, Adam D Cobb, Waqas Rafique, Lawrence Wang, and 11 more authors. In Conference on Neural Information Processing Systems (NeurIPS), 2021.
This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight. We present 20 hours of audio recordings that we have expertly labelled and tagged precisely in time. Significantly, 18 hours of recordings contain annotations from 36 different species. Mosquitoes are well-known carriers of diseases such as malaria, dengue and yellow fever. Collecting this dataset is motivated by the need to assist applications which utilise mosquito acoustics to conduct surveys to help predict outbreaks and inform intervention policy. The task of detecting mosquitoes from the sound of their wingbeats is challenging due to the difficulty in collecting recordings from realistic scenarios. To address this, as part of the HumBug project, we conducted global experiments to record mosquitoes ranging from those bred in culture cages to mosquitoes captured in the wild. Consequently, the audio recordings vary in signal-to-noise ratio and contain a broad range of indoor and outdoor background environments from Tanzania, Thailand, Kenya, the USA and the UK. In this paper we describe in detail how we collected, labelled and curated the data. The data is provided from a PostgreSQL database, which contains important metadata such as the capture method, age, feeding status and gender of the mosquitoes. Additionally, we provide code to extract features and train Bayesian convolutional neural networks for two key tasks: the identification of mosquitoes from their corresponding background environments, and the classification of detected mosquitoes into species. Our extensive dataset is both challenging to machine learning researchers focusing on acoustic identification, and critical to entomologists, geo-spatial modellers and other domain experts to understand mosquito behaviour, model their distribution, and manage the threat they pose to humans.
- [Globecom] A survey of technologies for building trusted networks. Loganathan Parthipan, Liqun Chen, David Gérault, Yunpeng Li, Fei Liu, and 2 more authors. In IEEE Globecom Workshops (GC Wkshps), 2021.
In the current generation of networks, there has been a strong focus on security and integrity. In sixth-generation (6G) networks trust will also be an important requirement, but how do we build trust in a network? Many researchers have started to pay attention to this, but research in this field is still at an early stage. Taking our lead from the development of trusted computing for single devices we require a root of trust and a mechanism for reliably measuring and reporting on the state of the network. In this paper, we survey existing technologies that we feel can be used to achieve this. We explore trusted computing technologies that enable a single device to be trusted and suggest how they can be adapted to help build a trusted network. For reporting, we need a mechanism to immutably store measurements on the system. We consider that distributed ledger technologies could fulfil this role as they offer immutability, decentralised consensus, and transparency.
- [FUSION] Differentiable particle filters through conditional normalizing flow. Xiongjie Chen, Hao Wen, and Yunpeng Li. In International Conference on Information Fusion (FUSION), 2021.
Differentiable particle filters provide a flexible mechanism to adaptively train dynamic and measurement models by learning from observed data. However, most existing differentiable particle filters are within the bootstrap particle filtering framework and fail to incorporate the information from latest observations to construct better proposals. In this paper, we utilize conditional normalizing flows to construct proposal distributions for differentiable particle filters, enriching the distribution families that the proposal distributions can represent. In addition, normalizing flows are incorporated in the construction of the dynamic model, resulting in a more expressive dynamic model. We demonstrate the performance of the proposed conditional normalizing flow-based differentiable particle filters in a visual tracking task.
- [MEE] HumBug–an acoustic mosquito monitoring tool for use on budget smartphones. Marianne E Sinka, Davide Zilli, Yunpeng Li, Ivan Kiskin, Dickson Msaky, and 6 more authors. Methods in Ecology and Evolution, 2021.
1. Mosquito surveys are time-consuming, expensive and can provide a biased spatial sample of occurrence data, the data often representing the location of the surveys, not the occurrence of the mosquitoes. 2. We present the HumBug project, an acoustic system that can turn any Android smartphone into a mosquito sensor. Our sensor has the potential to significantly increase the quantity of mosquito occurrence data as well as access locations that are more difficult to survey by traditional means. 3. We describe our database of wild-captured mosquito flight tone audio data and outline our mosquito detection algorithms that these data train. We also present our MozzWear App, designed to work on budget smartphones, which, together with our HumBug Net (an adapted traditional bednet), facilitates data collection and allows the user to record and directly upload mosquito flight tones from any dwelling with a bednet in the field. 4. Our HumBug system has the potential to vastly increase our understanding of the distribution of mosquito species in space and time and greatly improve surveys needed to assess the success or failure of ongoing vector control measures. At a time when the WHO reports a plateauing in the decade-long decline in malaria mortality rates, this new technological solution for surveying mosquito vectors will provide a timely new resource.
2020
- [ICAIF] Index tracking with differentiable asset selection. Yu Zheng, Yunpeng Li, Qiuhua Xu, Timothy Hospedales, and Yongxin Yang. In ACM International Conference on AI in Finance, 2020.
Partial index tracking aims to replicate the performance of a given benchmark index with a small number of its constituents. It can be formulated as a sparse regression problem, but remains challenging due to several practical constraints, especially the fixed number of assets in the portfolio. In this paper, we propose a differentiable relaxation for asset selection, such that we can construct a portfolio with exactly 𝐾 assets, where the objective function can be optimised efficiently via vanilla gradient descent. Our method is backtested with S&P 500 index data from 2002 to 2020. Empirical results demonstrate that our model achieves excellent tracking performance compared with some widely used approaches.
- [NCA] Bioacoustic detection with wavelet-conditioned convolutional neural networks. Ivan Kiskin, Davide Zilli, Yunpeng Li, Marianne Sinka, Kathy Willis, and 1 more author. Neural Computing and Applications, 2020.
Many real-world time series analysis problems are characterized by low signal-to-noise ratios and compounded by scarce data. Solutions to these types of problems often rely on handcrafted features extracted in the time or frequency domain. Recent high-profile advances in deep learning have improved performance across many application domains; however, they typically rely on large data sets that may not always be available. This paper presents an application of deep learning for acoustic event detection in a challenging, data-scarce, real-world problem. We show that convolutional neural networks (CNNs), operating on wavelet transformations of audio recordings, demonstrate superior performance over conventional classifiers that utilize handcrafted features. Our key result is that wavelet transformations offer a clear benefit over the more commonly used short-time Fourier transform. Furthermore, we show that features, handcrafted for a particular dataset, do not generalize well to other datasets. Conversely, CNNs trained on generic features are able to achieve comparable results across multiple datasets, along with outperforming human labellers. We present our results on the application of both detecting the presence of mosquitoes and the classification of bird species.
2019
- [TSP] Invertible particle-flow-based sequential MCMC with extension to Gaussian mixture noise models. Yunpeng Li, Soumyasundar Pal, and Mark J. Coates. IEEE Transactions on Signal Processing (TSP), 2019.
Sequential state estimation in non-linear and non-Gaussian state spaces has a wide range of applications in statistics and signal processing. One of the most effective non-linear filtering approaches, particle filtering, suffers from weight degeneracy in high-dimensional filtering scenarios. Several avenues have been pursued to address high dimensionality. Among these, particle flow filters construct effective proposal distributions by using invertible flow to migrate particles continuously from the prior distribution to the posterior, and sequential Markov chain Monte Carlo (SMCMC) methods use a Metropolis-Hastings (MH) accept-reject approach to improve filtering performance. In this paper, we propose to combine the strengths of invertible particle flow and SMCMC by constructing a composite MH kernel within the SMCMC framework using invertible particle flow. In addition, we propose a Gaussian-mixture-model-based particle flow algorithm to construct effective MH kernels for multi-modal distributions. Simulation results show that for high-dimensional state estimation example problems, the proposed kernels significantly increase the acceptance rate with minimal additional computational overhead and improve estimation accuracy compared with state-of-the-art filtering algorithms.
2018
- [ML4D] BCCNet: Bayesian classifier combination neural network. Olga Isupova, Yunpeng Li, Danil Kuzin, Stephen J Roberts, Katherine Willis, and 1 more author. In NeurIPS Workshop on Machine Learning for the Developing World (ML4D), 2018.
Machine learning research for developing countries can demonstrate clear sustainable impact by delivering actionable and timely information to in-country government organisations (GOs) and NGOs in response to their critical information requirements. We co-create products with UK and in-country commercial, GO and NGO partners to ensure the machine learning algorithms address appropriate user needs whether for tactical decision making or evidence-based policy decisions. In one particular case, we developed and deployed a novel algorithm, BCCNet, to quickly process large quantities of unstructured data to prevent and respond to natural disasters. Crowdsourcing provides an efficient mechanism to generate labels from unstructured data to prime machine learning algorithms for large scale data analysis. However, these labels are often imperfect with qualities varying among different citizen scientists, which prohibits their direct use with many state-of-the-art machine learning techniques. We describe BCCNet, a framework that simultaneously aggregates biased and contradictory labels from the crowd and trains an automatic classifier to process new data. Our case studies, mosquito sound detection for malaria prevention and damage detection for disaster response, show the efficacy of our method in the challenging context of developing world applications.
- [DCASE] Fast mosquito acoustic detection with field cup recordings: an initial investigation. Yunpeng Li, Ivan Kiskin, Marianne Sinka, Davide Zilli, Henry Chan, and 5 more authors. In Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2018.
In terms of vectoring disease, mosquitoes are the world’s deadliest. A fast and efficient mosquito survey tool is crucial for vectored disease intervention programmes to reduce mosquito-induced deaths. Standard mosquito sampling techniques, such as human landing catches, are time consuming, expensive and can put the collectors at risk of diseases. Mosquito acoustic detection aims to provide a cost-effective automated detection tool, based on mosquitoes’ characteristic flight tones. We propose a simple, yet highly effective, classification pipeline based on the mel-frequency spectrum allied with convolutional neural networks. This detection pipeline is computationally efficient not only in detecting mosquitoes but also in classifying species. Many previous assessments of mosquito acoustic detection techniques have relied only upon lab recordings of mosquito colonies. We illustrate in this paper our proposed algorithm’s performance over an extensive dataset, consisting of cup recordings of more than 1000 individual mosquitoes from 6 species captured in field studies in Thailand.
2017
- BSPC. Microwave breast cancer detection via cost-sensitive ensemble classifiers: Phantom and patient investigation. Yunpeng Li, Emily Porter, Adam Santorelli, Milica Popović, and Mark Coates. Biomedical Signal Processing and Control (BSPC), 2017
Microwave breast screening has been proposed as a complementary modality to the current standard of X-ray mammography. In this work, we design three ensemble classification structures that fuse information from multiple sensors to detect abnormalities in the breast. A principled Neyman–Pearson approach is developed to allow control of the trade-off between false positive rate and the false negative rate. We evaluate performance using data derived from measurements of heterogeneous breast phantoms. We also use data collected in a clinical trial that monitored 12 healthy patients monthly over an eight-month period. In order to assess the efficacy of the proposed algorithms we model scans of breasts with malignant lesions by artificially adding simulated tumour responses to existing scans of healthy volunteers. Tumour responses are constructed based on measured properties of breast tissues and real breast measurements, thus the simulation model takes into account the heterogeneity of the breast tissue. The algorithms we present take advantage of breast scans from other patients or tissue-mimicking breast phantoms to learn about breast content and what constitutes a “tumour-free” and “tumour-bearing” set of measurements. We demonstrate that the ensemble selection-based algorithm, which constructs an ensemble of the most informative classifiers, significantly outperforms other detection techniques for the clinical trial data set.
- TSP. Particle filtering with invertible particle flow. Yunpeng Li and Mark Coates. IEEE Transactions on Signal Processing (TSP), 2017
A key challenge when designing particle filters in high-dimensional state spaces is the construction of a proposal distribution that is close to the posterior distribution. Recent advances in particle flow filters provide a promising avenue to avoid weight degeneracy; particles drawn from the prior distribution are migrated in the state space to the posterior distribution by solving partial differential equations. Numerous particle flow filters have been proposed based on different assumptions concerning the flow dynamics. Approximations are needed in the implementation of all of these filters; as a result, the particles do not exactly match a sample drawn from the desired posterior distribution. Past efforts to correct the discrepancies involve expensive calculations of importance weights. In this paper, we present new filters which incorporate deterministic particle flows into an encompassing particle filter framework. The valuable theoretical guarantees concerning particle filter performance still apply, but we can exploit the attractive performance of the particle flow methods. The filters we describe involve a computationally efficient weight update step, arising because the embedded particle flows we design possess an invertible mapping property. We evaluate the proposed particle flow particle filters’ performance through numerical simulations of a challenging multitarget multisensor tracking scenario and complex high-dimensional filtering examples.
- ML4Audio. Cost-sensitive detection with variational autoencoders for environmental acoustic sensing. Yunpeng Li, Ivan Kiskin, Davide Zilli, Marianne Sinka, Henry Chan, and 2 more authors. In NeurIPS Workshop on Machine Learning for Audio Signal Processing (ML4Audio), 2017
Environmental acoustic sensing involves the retrieval and processing of audio signals to better understand our surroundings. While large-scale acoustic data make manual analysis infeasible, they provide a suitable playground for machine learning approaches. Most existing machine learning techniques developed for environmental acoustic sensing do not provide flexible control of the trade-off between the false positive rate and the false negative rate. This paper presents a cost-sensitive classification paradigm, in which the hyper-parameters of classifiers and the structure of variational autoencoders are selected in a principled Neyman-Pearson framework. We examine the performance of the proposed approach using a dataset from the HumBug project which aims to detect the presence of mosquitoes using sound collected by simple embedded devices.
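As a rough illustration of the trade-off between the false positive rate and the false negative rate described in this abstract, the sketch below picks a decision threshold under a cap on the empirical false negative rate. It is not the paper's method: the function name `np_threshold`, the toy scores, and the simple order-statistic rule are assumptions made for illustration, and the rates are empirical, with no guarantee on unseen data.

```python
import math


def np_threshold(pos_scores, neg_scores, max_fnr):
    """Pick the largest threshold whose empirical false negative rate
    on pos_scores stays at or below max_fnr (hypothetical helper).

    A sample is classified as positive when its score >= threshold.
    Returns (threshold, empirical FNR, empirical FPR).
    """
    pos_sorted = sorted(pos_scores)
    n_pos = len(pos_sorted)
    # Number of positives we may allow to fall below the threshold,
    # clamped so the index stays inside the list.
    k = min(int(math.floor(max_fnr * n_pos)), n_pos - 1)
    threshold = pos_sorted[k]
    fnr = sum(s < threshold for s in pos_scores) / n_pos
    fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return threshold, fnr, fpr


if __name__ == "__main__":
    # Toy classifier scores (assumed data, higher means "more likely positive").
    positives = [0.9, 0.8, 0.7, 0.6, 0.4]
    negatives = [0.1, 0.2, 0.3, 0.5, 0.65]
    for cap in (0.0, 0.2):
        print(cap, np_threshold(positives, negatives, cap))
```

Tightening the cap on missed detections pushes the threshold down and the false positive rate up, which is the trade-off a cost-sensitive design has to control.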
- ML4D. Mosquito detection with low-cost smartphones: data acquisition for malaria research. Yunpeng Li, Davide Zilli, Henry Chan, Ivan Kiskin, Marianne Sinka, and 2 more authors. In NeurIPS Workshop on Machine Learning for the Developing World (ML4D), 2017
Mosquitoes are the only vector for malaria, causing hundreds of thousands of deaths in the developing world each year. Not only is the prevention of mosquito bites of paramount importance to the reduction of malaria transmission cases, but understanding in more forensic detail the interplay between malaria, mosquito vectors, vegetation, standing water and human populations is crucial to the deployment of more effective interventions. Typically the presence and detection of mosquitoes is quantified through insect traps and human operations. If we are to gather timely, large-scale data to improve this situation, we need to automate the process of mosquito detection and classification as much as possible. In this paper, we present a prototype mobile sensing system that acts as both a portable early warning device and an automatic acoustic data acquisition pipeline to help fuel scientific inquiry and policy. The machine learning algorithm that powers the mobile system achieves excellent off-line multi-species detection performance while remaining computationally efficient. Further, we have conducted preliminary live mosquito detection tests using low-cost mobile phones and achieved promising results. The deployment of this system for field usage in Southeast Asia and Africa is planned in the near future. In order to accelerate processing of field recordings and labelling of collected data, we employ a citizen science platform in conjunction with automated methods, the former implemented using the Zooniverse platform, allowing crowdsourcing on a grand scale.
- SPIE. Particle flow superpositional GLMB filter. Augustin-Alexandru Saucan, Yunpeng Li, and Mark Coates. In SPIE Conference on Signal Processing, Sensor/Information Fusion, and Target Recognition, 2017
In this paper we propose a Superpositional Marginalized δ-GLMB (SMδ-GLMB) filter for multi-target tracking and we provide bootstrap and particle flow particle filter implementations. Particle filter implementations of the marginalized δ-GLMB filter are computationally demanding. As a first contribution we show that for the specific case of superpositional observation models, a reduced complexity update step can be achieved by employing a superpositional change of variables. The resulting SMδ-GLMB filter can be readily implemented using the unscented Kalman filter or particle filtering methods. As a second contribution, we employ particle flow to produce a measurement-driven importance distribution that serves as a proposal in the SMδ-GLMB particle filter. In high-dimensional state systems or for highly informative observations the generic particle filter often suffers from weight degeneracy or otherwise requires a prohibitively large number of particles. Particle flow avoids particle weight degeneracy by guiding particles to regions where the posterior is significant. Numerical simulations showcase the reduced complexity and improved performance of the bootstrap SMδ-GLMB filter with respect to the bootstrap Mδ-GLMB filter. The particle flow SMδ-GLMB filter further improves the accuracy of track estimates for highly informative measurements.
- ICASSP. Particle flow SMC delta-GLMB filter. Augustin-Alexandru Saucan, Yunpeng Li, and Mark Coates. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
In this paper we derive a particle flow particle filter implementation of the δ-Generalized Labeled Multi-Bernoulli (δ-GLMB) filter. The bootstrap particle filter δ-GLMB suffers from weight degeneracy for high-dimensional state systems or low measurement noise. In order to avoid weight degeneracy, we employ particle flow to produce a measurement-driven importance distribution that serves as a proposal in the δ-GLMB particle filter. Flow-induced proposals are developed for both types of targets encountered in the δ-GLMB filter, i.e., persistent and birth targets. Numerical simulations reflect the improved performance of the proposed filter with respect to classical bootstrap implementations.
- ICASSP. Sequential MCMC with invertible particle flow. Yunpeng Li and Mark Coates. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017
Particle filters are among the most effective filtering algorithms for nonlinear and non-Gaussian models. When the state dimension is high, they are known to suffer from weight degeneracy. Sequential Markov chain Monte Carlo (SMCMC) methods have been proposed as an alternative sequential inference technique that can perform better in high dimensional state spaces. In this paper, we propose to construct a composite Metropolis-Hastings (MH) kernel within the SMCMC framework using invertible particle flow. Simulation results show that the proposed kernel significantly increases the acceptance rate and improves estimation accuracy compared with state-of-the-art filtering algorithms, in high dimensional simulation examples.
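The weight degeneracy mentioned in this abstract can be seen in a toy experiment. The sketch below is not taken from the paper: it assumes a hypothetical model with a standard Gaussian prior and an identity observation with unit-variance Gaussian noise, weights prior samples by the likelihood as a single bootstrap particle filter step would, and reports the effective sample size (ESS). The names `effective_sample_size` and `bootstrap_ess` are illustrative.

```python
import math
import random


def effective_sample_size(log_weights):
    """Return (sum w)^2 / sum(w^2), i.e. 1 / sum of squared normalised weights."""
    m = max(log_weights)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return total * total / sum(x * x for x in w)


def bootstrap_ess(dim, n_particles=500, seed=0):
    """ESS after one bootstrap weighting step on an assumed toy model:
    x ~ N(0, I), y = x + e with e ~ N(0, I)."""
    rng = random.Random(seed)
    true_state = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    y = [x + rng.gauss(0.0, 1.0) for x in true_state]
    log_weights = []
    for _ in range(n_particles):
        # Particles are drawn from the prior, ignoring the observation.
        particle = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        sq_err = sum((yi - xi) ** 2 for yi, xi in zip(y, particle))
        # Gaussian log-likelihood up to an additive constant.
        log_weights.append(-0.5 * sq_err)
    return effective_sample_size(log_weights)


if __name__ == "__main__":
    for dim in (1, 10, 50):
        print(dim, round(bootstrap_ess(dim), 1))
```

In this toy setting the ESS falls sharply as the state dimension grows, so a handful of particles carry almost all of the weight. That collapse is the motivation for alternatives to prior-based proposals in high dimensions.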