:::info
Authors:
(1) Jyotibdha Acharya (Student Member, IEEE), HealthTech NTU, Interdisciplinary Graduate Program, Nanyang Technological University, Singapore;
(2) Arindam Basu (Senior Member, IEEE), School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
:::
Table of Links
Abstract and I Introduction
II. Materials and Methods
III. Results and Discussions
IV. Conclusion and References
Abstract: The primary objective of this paper is to build classification models and strategies to identify breathing sound anomalies (wheeze, crackle) for automated diagnosis of respiratory and pulmonary diseases. In this work we propose a deep CNN-RNN model that classifies respiratory sounds based on Mel-spectrograms. We also implement a patient-specific model tuning strategy that first screens respiratory patients and then builds patient-specific classification models using limited patient data for reliable anomaly detection. Moreover, we devise a local log quantization strategy for model weights to reduce the memory footprint for deployment in memory-constrained systems such as wearable devices. The proposed hybrid CNN-RNN model achieves a score of 66.31% on four-class classification of breathing cycles for the ICBHI’17 scientific challenge respiratory sound database. When the model is re-trained with patient-specific data, it produces a score of 71.81% for leave-one-out validation. The proposed weight quantization technique achieves ≈ 4× reduction in total memory cost without loss of performance. The main contributions of the paper are as follows. Firstly, the proposed model achieves a state-of-the-art score on the ICBHI’17 dataset. Secondly, deep learning models are shown to successfully learn domain-specific knowledge when pre-trained with breathing data and produce significantly superior performance compared to generalized models. Finally, local log quantization of trained weights is shown to reduce the memory requirement significantly. This type of patient-specific re-training strategy can be very useful in developing reliable long-term automated patient monitoring systems, particularly in wearable healthcare solutions.
I. INTRODUCTION
The two most clinically significant lung sound anomalies are wheeze and crackle. Wheeze is a continuous, high-pitched adventitious sound that results from obstruction of the breathing airway. While normal breathing sounds have the majority of their energy concentrated in 80-1600 Hz [1], wheeze sounds have been shown to be present in the frequency range 100 Hz to 2 kHz. Wheeze is normally associated with patients suffering from asthma, chronic obstructive pulmonary disease (COPD), etc. Crackles are explosive and discontinuous sounds present during the inspiratory and expiratory parts of the breathing cycle, with a significantly smaller duration compared to the total breathing cycle. Crackles have been associated with obstructive airway diseases and interstitial lung diseases [2].
Auscultation has been used historically for screening and monitoring respiratory diseases. It provides a simple and non-invasive approach to detect respiratory and cardiovascular diseases based on lung sound abnormalities. However, this method suffers from two disadvantages. Firstly, a trained medical professional is required to diagnose a patient based on adventitious lung sounds, and therefore the disproportionately low number of medical practitioners relative to the overall population limits the speed at which patients are tested. Secondly, even if the patients are diagnosed by experienced professionals, there might be subjectivity in the diagnosis due to dissimilar interpretation of the respiratory sounds by different medical professionals [3].
Therefore, in the past decade several attempts were made to design algorithms and feature extraction techniques for automated detection of breathing anomalies. Some popular feature extraction techniques include spectrograms [4], Mel-Frequency Cepstral Coefficients (MFCC) [5], wavelet coefficients [6], entropy-based features [7], etc. Several machine learning (ML) algorithms have been developed in the past few years to detect breathing sound anomalies, such as logistic regression [8], Dynamic Time Warping (DTW), Gaussian mixture model (GMM) [9], random forest [4], Hidden Markov Model (HMM) [10], etc. An exploration of the existing literature reveals some conspicuous issues with these approaches. Firstly, most of the ML algorithms use manually crafted, highly complex features suited to their algorithms, and due to the absence of publicly available datasets, it was hard to compare the efficacy of the proposed feature extraction methods and algorithms [11]. Secondly, most of the strategies were developed for a binary classification problem to identify either wheeze or crackle and are therefore not suitable for multi-class classification to detect wheeze and crackle simultaneously [12]. These drawbacks make these approaches difficult to apply in real-world scenarios.
Deep learning has gained a lot of attention in recent years due to its unparalleled success in a variety of applications including clinical diagnostics and biomedical engineering [13]. A significant advantage of these deep learning paradigms is that there is no need to manually craft features from the data, since the network learns useful features and abstract representations from the data through training. Due to the wide success of convolutional neural networks (CNN) in image-related tasks, they have been extensively used in biomedical research for image classification [14], anomaly detection [15], image segmentation [16], image enhancement [17], automated report generation [18], etc. There have been multiple successful applications of deep CNNs in the diagnosis of cardiac diseases [19], neurological diseases [20], cancer [21] and ophthalmic diseases [22]. While CNNs have shown significant promise for analyzing image data, recurrent neural networks (RNN) are better suited for learning long-term dependencies in sequential and time-series data [23]. State-of-the-art systems in natural language processing (NLP) and in audio and speech processing use deep RNNs to learn sequential and temporal features [24]. Finally, hybrid CNN-RNN models have shown significant success in video analytics [25] and speech recognition [26]. These hybrid models show particular promise in cases where both spatial and temporal/sequential features need to be learned from the data.
Since deep learning came into prominence, researchers have also used it for audio-based biomedical diagnosis and anomaly detection. Some significant areas of audio-based diagnosis using deep learning include sleep apnea detection, cough sound identification, heart sound classification, etc. Amoh et al. [27] used a chest-mounted sensor to collect audio containing both speech and cough sounds and then used both CNN and RNN to identify cough sounds. Similarly, Nakano et al. [28] trained a deep neural network on tracheal sound spectrograms to detect sleep apnea. In [29], the authors train a CNN architecture to classify heart sounds into normal and abnormal classes. The non-invasive nature of audio-based diagnosis makes it an attractive choice for biomedical applications.
A major handicap in training a deep network is that it requires a significantly large dataset and considerable time and resources. While the second issue can be addressed by using dedicated deep learning accelerators (GPU, TPU, etc.), the first issue is exacerbated in medical research since medical datasets are scarce and difficult to obtain. One way to circumvent this issue is to use transfer learning. The central idea behind transfer learning is the following: a deep network trained in a domain D1 to perform task T1 can successfully use the learned data representations to perform task T2 in domain D2. The most commonly used method for transfer learning is to train a deep network on a large dataset and then re-train a small section of the network on the available data (often significantly smaller) for the specific task and specific domain. Transfer learning has been used in medical research for cancer diagnosis [30], prediction of neurological diseases [31], etc. While transfer learning traditionally refers to the transfer of knowledge between two disparate domains, in biomedical research it is also used for knowledge transfer within the same domain, where a model is trained on a larger population dataset and the knowledge is then transferred for context-specific learning on a smaller dataset [32][33]. This strategy is especially useful for biomedical applications due to the scarcity of domain-specific patient data.
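The re-training step described above can be illustrated with a toy sketch. It is not the model or training procedure of this paper: the two-unit `frozen` feature extractor, the logistic `head`, the learning rate, and the four-sample `patient_data` set are all hypothetical values chosen for illustration. The sketch only shows the mechanics of keeping pre-trained parameters fixed while a small trainable section is updated on a small dataset.

```python
import math

def features(x, frozen):
    """Frozen feature extractor: stands in for layers pre-trained on a large dataset."""
    return [math.tanh(w * x + b) for w, b in frozen]

def predict(x, frozen, head):
    """Logistic output layer (the re-trainable section) on top of the frozen features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features(x, frozen)))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(data, frozen, head):
    """Mean binary cross-entropy over (input, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(x, frozen, head)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def fine_tune(data, frozen, head, lr=0.5, epochs=200):
    """Gradient descent on the head only; `frozen` is read but never modified."""
    head = {"weights": list(head["weights"]), "bias": head["bias"]}  # work on a copy
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(head["weights"])
        grad_b = 0.0
        for x, y in data:
            f = features(x, frozen)
            err = predict(x, frozen, head) - y  # gradient of the loss w.r.t. the logit
            for i, fi in enumerate(f):
                grad_w[i] += err * fi
            grad_b += err
        head["weights"] = [w - lr * g / n for w, g in zip(head["weights"], grad_w)]
        head["bias"] -= lr * grad_b / n
    return head

# Hypothetical values: two "pre-trained" units and a tiny context-specific dataset.
frozen = [(1.0, 0.0), (2.0, -1.0)]
general_head = {"weights": [0.0, 0.0], "bias": 0.0}
patient_data = [(-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1)]

tuned_head = fine_tune(patient_data, frozen, general_head)
```

Only the head parameters change during `fine_tune`, so the number of values to fit is small relative to the full model, which is why a limited dataset can suffice in this setting.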
Finally, two primary approaches are used when employing machine learning methods for medical diagnosis. The first is generalized models, where a model is trained on a database containing data from multiple patients and tested on new patient data. Models of this type learn generalized features present across all the patients. While such models are often easier to deploy, they often suffer from inter-patient variability of features and may not produce reliable results for unseen patient data. The second approach is patient-specific models, where the models are trained on patient-specific data to produce more precise results for patient-specific diagnosis. While these models are harder to train due to the difficulty of collecting a large amount of patient-specific data, they often produce very reliable and consistent results [34]. A patient-specific model requires additional time and effort from healthcare professionals for collecting and labeling the data. However, especially for chronic diseases where long-term monitoring is essential, this additional effort is well compensated by the reduced hospitalization and reduced loss of working time for the patient that result from better continuous monitoring [35].
Since a large fraction of medical diagnosis algorithms are geared toward wearable devices and mobile platforms, the large memory and computational power requirements of deep learning methods present a considerable challenge for commercial deployment. Weight quantization [7], low-precision computation [36] and lightweight networks [37] are some of the approaches used to address this challenge. Quantizing the weights of the trained network is the most straightforward way to reduce the memory requirement for deployment. DNNs with 8- or 16-bit weights have been shown to achieve accuracy comparable to their full-precision counterparts [7]. Though linear quantization is most commonly used, log quantization has been shown to achieve similar accuracy at lower bit precision [38]. Finally, lightweight networks like MobileNet [37] reduce computational complexity and memory requirement without significant loss of accuracy by replacing traditional convolution layers with depthwise separable convolutions.
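To make the distinction between linear and log quantization concrete, the following is a minimal sketch under stated assumptions. It is a generic illustration, not the local log quantization scheme proposed in this paper (that is described in Section II). The function names, the choice of one sign bit with one code reserved for zero, and the flush-to-zero handling of very small weights are all assumptions made for this example; both functions assume `bits >= 2`.

```python
import math

def linear_quantize(weights, bits):
    """Symmetric uniform quantization: evenly spaced levels over [-max_abs, max_abs]."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0.0 for _ in weights]
    levels = 2 ** (bits - 1) - 1          # number of steps on each side of zero
    step = max_abs / levels
    return [round(w / step) * step for w in weights]

def log_quantize(weights, bits):
    """Sign plus power-of-two magnitude: each nonzero output is +/- 2**e for an integer e.

    Assumed code layout: 1 sign bit, and (bits - 1) bits giving one code for
    zero plus 2**(bits - 1) - 1 consecutive integer exponents.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0.0 for _ in weights]
    max_exp = round(math.log2(max_abs))   # exponent of the largest representable magnitude
    n_exponents = 2 ** (bits - 1) - 1
    min_exp = max_exp - n_exponents + 1   # smallest representable exponent
    quantized = []
    for w in weights:
        if w == 0:
            quantized.append(0.0)
            continue
        e = round(math.log2(abs(w)))      # nearest exponent in the log domain
        if e < min_exp:
            quantized.append(0.0)         # too small to represent: flush to zero
        else:
            quantized.append(math.copysign(2.0 ** min(e, max_exp), w))
    return quantized

def compression_ratio(full_bits, quant_bits):
    """Reduction in weight storage when each weight shrinks from full_bits to quant_bits."""
    return full_bits / quant_bits
```

For example, `log_quantize([0.5, -0.25, 0.1, 0.0], 4)` returns `[0.5, -0.25, 0.125, 0.0]`: levels are dense near zero, where trained weights tend to cluster, and sparse for large magnitudes. As simple arithmetic, storing weights in 8 bits instead of 32 gives `compression_ratio(32, 8) == 4.0` for the weight storage alone.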
In this paper we propose a hybrid CNN-RNN model to perform four-class classification of breathing sounds on the International Conference on Biomedical and Health Informatics (ICBHI’17) scientific challenge respiratory sound database [39] and then devise a screening and model tuning strategy to build patient-specific diagnosis models from limited patient data. To compare our model with more commonly used CNN architectures, we applied the same methodology to the VGGNet [40] and MobileNet [37] architectures. Finally, we propose a layerwise logarithmic quantization scheme that can reduce the memory footprint of the networks without significant loss of performance. The sections are organized as follows: Section II describes the dataset, feature extraction method, proposed deep learning model and weight quantization. Section III tabulates the results for the generalized and patient-specific models and the quantization performance. Finally, Section IV discusses the conclusions and main contributions of the paper.
:::info
This paper is available on arXiv under the ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL license.
:::
