A Deep Learning-based Dengue Mosquito Detection Method Using Faster R-CNN and Image Processing Techniques

Received: 20th February 2021; Accepted: 21st June 2021; Published: 1st July 2021

Abstract: Dengue fever, a mosquito-borne disease caused by dengue viruses, is a significant public health concern in many countries, especially in tropical and subtropical regions. In this paper, we introduce a deep learning-based model using Faster R-CNN with InceptionV2, accompanied by image processing techniques, to identify dengue mosquitoes. The performance of the proposed model is evaluated using a custom mosquito dataset of images collected from the internet under varying environments. The proposed Faster R-CNN with InceptionV2 model is compared with two other state-of-the-art models, R-FCN with ResNet 101 and SSD with MobilenetV2. False positives (FP), false negatives (FN), precision, and recall are used as performance measures to evaluate the detection accuracy of the proposed model. The experimental results demonstrate that, as a classifier, the Faster R-CNN model achieves 95.19% accuracy on the test dataset and outperforms the other state-of-the-art models, as the R-FCN and SSD models show 94.20% and 92.55% detection accuracy, respectively.


Introduction
Infectious diseases pose a serious threat to a nation, harming public health and social sustainability. Mosquito-borne diseases such as malaria, dengue, West Nile fever, and Zika fever are among the world's fastest-spreading infectious diseases [1]. Dengue is a viral infection caused by four types of virus from the Flaviviridae family (DENV-1, DENV-2, DENV-3, and DENV-4). The viruses spread through the bites of infected female Aedes aegypti and Aedes albopictus mosquitoes [1]. These mosquitoes are also vectors of the chikungunya, yellow fever, and Zika viruses [2]. The disease has been prevalent in South Asian countries, where heavy rain provides an ideal breeding ground for the mosquitoes that carry the virus during the monsoon

Literature Review
Object detection has gained significant attention from researchers in recent years because of its association with image recognition. Image processing methods, such as detection based on the Histogram of Oriented Gradients (HOG) [13] and object detection using Haar-like features [14], and machine learning-based methods, such as the support vector machine (SVM) [15], artificial neural network (ANN) [16], and background removal [17], have been commonly used in traditional image classification and detection. Compared to conventional machine learning approaches, recent deep neural network architectures based on the CNN have shown a considerable improvement in efficiency [18,19]. As the significance of mosquito control has increased, several experiments have used neural networks to identify mosquitoes from a single image [20,21]. An image-based insect classification system has been proposed in [22] using four feature extraction methods: Hu moments (Hu), Elliptic Fourier Descriptors (EFD), Radial Distance Functions (RDF), and Local Binary Patterns (LBP); however, the images require manual preprocessing, which is time intensive. Okayasu et al. [23] implemented a vision-based approach to classify mosquito species. Using image datasets of three mosquito species, they found that a deep learning approach combined with data augmentation classifies mosquito species with greater accuracy. The aim of that study is to compare traditional classification systems based on handcrafted features with CNN-based deep learning methods. However, images captured with a smartphone do not show differences in colour if the shooting environment is dark, because a smartphone camera cannot match the capture quality of a single-lens reflex camera, which eventually results in misclassification.
A vision-based automatic mosquito detection platform for closed-perimeter mosquito habitats is deployed in [24]. The module distinguishes mosquitoes from other species, such as bees and flies, by extracting morphological features and applying support vector machine-based classification. The approach rests on the idea that addressing mosquito detection in this way provides an important alternative to conventional mosquito monitoring, mapping, and sampling methods. The proposed C-SVC SVM module achieves a maximum accuracy of 85.2% in terms of the proportion of images obtained when detecting mosquitoes. Nonetheless, the work could be extended by including additional features to improve classification efficiency, by comparing alternative methods such as neural networks and genetic algorithms, and by defining distinguishable characteristics for sub-species classification of mosquitoes.
A solution for detecting the Aedes aegypti species from images has been proposed in [25]. It uses a 500x optical zoom sensor and a support vector machine, and a confusion matrix is used to demonstrate the system's precision in detecting Aedes aegypti. However, a faster program for detecting mosquito properties is still needed, for example by implementing other techniques in MATLAB as part of the framework, since processing a file can take a long time. Moreover, this approach is costly and only solves a binary classification problem. Similarly, an image processing and deep learning method for a bacteria recognition system has been introduced in [26], but the results indicated that standard-resolution image datasets for bacteria could be improved in the future. In [27], 60 mosquitoes from 7 species have been classified using the random forest algorithm, depending on the classifiers and the denoted pixel values. The verification, however, is inadequate since the number of images is only around 60. In [28], research on imaging techniques for the classification of insects is presented; however, mosquitoes are not categorized there.
A dengue detection method using cell phone vision sensors with a lightweight object recognition algorithm has been presented in [29]. It provides an efficient way to use the mobile phone's computing power to detect dengue, together with a small medical patch that changes between various shades of color. The active contouring algorithm was more efficient in defining input images, but it requires several complex sets of image operations in order to connect an image of the object. In [30], the number of confirmed dengue fever (DF) cases is predicted with three different models based on machine learning and deep learning approaches. Among the three models, the GA-RNN model provides the best performance. However, that work needs to improve the output of the LR model by reducing its shifting effect.
www.aetic.theiaer.org
Current dengue mosquito research largely covers general-purpose detection methods and does not reflect state-of-the-art, deep learning-based detection approaches. This paper analyses deep learning-based dengue mosquito detection approaches in a systematic way. Using advanced deep learning techniques, we present possible solutions and future research directions in image processing.

Proposed Methodology
In this section, we present the implementation procedure of the detection algorithm. This research aims to train a CNN object detection classifier for dengue mosquito detection. The workflow of the complete dengue mosquito detection procedure is presented in Figure 1 and consists of the following blocks: data pre-processing (data acquisition and data annotation), data processing, and training and testing the deep classifier.

Data Pre-processing
Pre-processing first requires data collected in compliance with the desired conditions. The detailed data pre-processing procedure is depicted in Figure 2 and described in the next sections.

Data Acquisition
Each input image is first identified as an image of a dengue mosquito or of another insect (for example, flies, grasshoppers, or moths). Normal mosquitoes have a slim body with long legs and fairly long antennae whose segments are very similar in shape and size. Dengue mosquitoes have longer legs than normal mosquitoes and have white stripes on their legs and bodies. Data has been obtained from video (converted into images), self-captured images, and various relevant internet resources under different conditions. As mosquitoes and other insects are constantly moving, it is difficult to obtain clear images while they are flying. A total of 241 dengue mosquito images are collected to form the dataset. Samples of the dataset are shown in Figure 3 and Figure 4. After data acquisition, the images are resized to a predefined size of 800x600 so that all images share the same dimensions. The dataset is then divided into a training set and a test set, with 85% of the images used for training and 15% for testing.
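The 85%/15% split described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `split_dataset`, the file-name pattern, the fixed random seed, and rounding the training share to the nearest whole image are all assumptions.

```python
import random


def split_dataset(filenames, train_fraction=0.85, seed=42):
    """Shuffle file names and split them into training and test lists."""
    shuffled = list(filenames)             # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 241 collected images.
images = [f"mosquito_{i:03d}.jpg" for i in range(241)]
train_files, test_files = split_dataset(images)
print(len(train_files), len(test_files))
```

Under these assumptions, 241 images yield 205 training images and 36 test images; a different rounding rule would shift one image between the two sets.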

Data Annotation
Data annotation is the process of labeling the relevant data in different formats, such as text, video, or images. After the dengue mosquito images are collected from the different environmental conditions, each selected image is annotated using the labeling software. The software draws bounding boxes around each dengue mosquito in the image and automatically saves an XML file containing the label data for that image. The dengue mosquito is marked according to its region of interest (ROI) and labeled 'dengue mosquito' in the image for accurate detection. Figure 5 shows the annotated portion of a sample image.

Data Processing
Data processing transforms the data from a given process into a more usable and preferred form by making it relevant and informative. After the pre-processing phase, data processing consists of two steps, which generate comma-separated values (CSV) files and TFRecords, as depicted in Figure 6. A CSV file is a delimited text file that uses commas to separate values. Using this format, a plain-text file is generated in tabular form for easy export and structured import of the data. A TFRecord file stores the data as a sequence of binary strings, which means that the structure of the data needs to be specified before it is written to the file. From the labeled images, we build the TFRecords that serve as input data for object detector training. In this way, a separate data file is created for storing a series of binary records.
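The first step, turning XML annotations into CSV rows, might look like the sketch below. The paper does not specify the XML schema, so a Pascal VOC-style layout is assumed here; the sample annotation, the column names, and the helper names `xml_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotation in a Pascal VOC-style layout (an assumption).
SAMPLE_XML = """<annotation>
  <filename>mosquito_001.jpg</filename>
  <size><width>800</width><height>600</height></size>
  <object>
    <name>dengue</name>
    <bndbox><xmin>120</xmin><ymin>90</ymin><xmax>430</xmax><ymax>380</ymax></bndbox>
  </object>
</annotation>"""

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_text):
    """Return one row per labeled bounding box in a single annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append([filename, width, height, obj.findtext("name")] + coords)
    return rows


def rows_to_csv(rows):
    """Serialize the rows, with a header line, as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Keeping one row per bounding box means an image with several mosquitoes simply contributes several rows, which is convenient when the rows are later grouped by file name to build the binary records.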

Process of training
Lastly, before training, we have to generate a label map and a training configuration file. The label map is a text file that tells the trainer what each object is by mapping class names to class ID numbers. We have a single label, 'dengue', so its ID number is 1. Finally, the training pipeline for object detection must be configured; this is the last step before training of the dengue detector classifier begins. The configuration determines which model is used and which parameters are chosen for training. In the configuration file, we also declare the number of evaluation steps, the paths of the TFRecord train and test data files, the number of evaluation examples, and the label-map.pbtxt path. Modern meta-architectures for object detection use CNNs. We have considered the Faster R-CNN, R-FCN, and SSD meta-architectures for dengue mosquito detection.
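As an illustration of the label map, the sketch below renders class names in the `item { id ... name ... }` text layout used by label-map files in the TensorFlow Object Detection API. That the authors used this exact layout is an assumption, and the helper name `build_label_map` is hypothetical.

```python
def build_label_map(class_names):
    """Render class names as label-map text, assigning IDs from 1 upward."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}\n")
    return "\n".join(items)


# The single 'dengue' class described in the text receives ID 1.
label_map_text = build_label_map(["dengue"])
print(label_map_text)
```

The resulting text could be saved to the label-map.pbtxt path referenced in the training configuration.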

Detected Data Evaluation
After all the processes for dengue mosquito detection are completed, images from the different environmental situations are evaluated against the expected bounding boxes. Figure 7 shows all predicted regions in the image results, where we identified dengue mosquitoes using the three selected models: Faster R-CNN, R-FCN, and SSD. The detection percentage of each observed bounding box is reported below; it varies between images and between models.
We start with the first dengue mosquito input image, followed by the output images of the three models. For Faster R-CNN, there is one detection bounding box with a detection percentage of 80%. For R-FCN, there are three detection bounding boxes with detection percentages of 82%, 71%, and 51%, respectively, while for the SSD model there are two bounding boxes and the detection percentage is 55%. For the second input image, the bounding box detection percentages are 82% for Faster R-CNN; 71%, 58%, and 50% for R-FCN; and 78% for SSD. For the third input image, they are 88% and 53% for Faster R-CNN; 63% and 69% for R-FCN; and 72%, 70%, and 55% for SSD. For the fourth input image, they are 69% and 64% for Faster R-CNN and 57% and 54% for R-FCN, with no detection by SSD. Similarly, for the last input image, they are 75% and 59% for Faster R-CNN and 91%, 75%, 65%, 62%, and 59% for R-FCN, again with no detection by SSD.  [31] model, the Faster R-CNN, R-FCN, and SSD models do not predict single bounding boxes here, which is why multiple bounding boxes appear for the three models.

Experimental Analysis
The experiment has been performed on a secondary dataset of images collected from different online resources, and Python code has been used to resize all the raw images to 800x600. Because of the limited number of online images and the difficulty of capturing images of flying insects, forming a good dataset was challenging. Each resized dengue mosquito image goes through a bounding box labelling process, and an annotation file containing the bounding boxes is created for it. The collected image dataset is then divided into a training set and a test set for performance evaluation. The training and test sets could be divided 70%-30% or 80%-20%, but because the dataset is limited, we split them 85%-15% in the experimental evaluation. We have also used images of other insects (as non-dengue) to evaluate whether or not the model accurately distinguishes a dengue mosquito. The deep learning framework TensorFlow [32] version 1.x has been used to run the training code on Google's Colab platform. We have used a laptop with an Intel Bay Trail M Quad-Core 3540 processor, up to 2.66 GHz, for training and validation of the dengue mosquito detection algorithms.
Model accuracy differs [33] depending on the combination of the meta-architecture and the feature extractor. A comparison of these combinations is therefore required to select the optimal model. By combining meta-architectures and feature extractors, we trained dengue mosquito detection models and compared each model's accuracy. The Faster R-CNN Inception V2, R-FCN ResNet 101, and SSD Mobilenet V2 models have been trained, compared, and analyzed.
In the performance evaluation, we have used the following statistical parameters: true positive (TP), true negative (TN), false positive (FP), false negative (FN), precision, and recall [34]. A true positive is an original dengue mosquito region that has been correctly identified. A true negative is a region that does not belong to the dengue mosquito region and is not detected. A false positive is a detected area that is not present in the original dengue mosquito region. A false negative is an area within the original dengue mosquito region that is not identified. To compute these, we first determine the coordinates of the original region; the correctly detected area then gives the true positives, and the other statistical parameters are evaluated in the same way. Precision represents the percentage of positive predictions that are correct, and it is calculated by
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall represents the percentage of actual dengue mosquito regions that are detected, and it is calculated by
$$\text{Recall} = \frac{TP}{TP + FN}$$
Table 1 presents the detection performance, evaluated with four parameters: FP, FN, precision, and recall. Figure 8 compares the performance of the Faster R-CNN, R-FCN, and SSD methods on the test set of this experiment. From this experiment, we can conclude that the Faster R-CNN with Inception V2 model performs better than the others, exhibiting the highest detection accuracy of 95.19%. Figure 9 presents the erroneous results of non-dengue mosquito image detection. As depicted in Figure 9(b), Faster R-CNN does not detect the image as a dengue mosquito, which is the correct result. Figures 9(c) and 9(d), however, show erroneous results for the R-FCN and SSD models, which detect the image as a dengue mosquito with detection percentages of 77% and 55%, respectively.
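The precision and recall measures used in this evaluation follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below computes both from raw counts; the counts are hypothetical values chosen for illustration and are not results from this paper, and returning 0.0 for an empty denominator is a convention assumed here.

```python
def precision(tp, fp):
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true regions that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for illustration only; not results from the paper.
tp, fp, fn = 8, 2, 1
print(f"precision = {precision(tp, fp):.2f}")
print(f"recall    = {recall(tp, fn):.2f}")
```

Neither measure uses the true negative count, which suits object detection, where the number of background regions that are correctly left undetected is not well defined.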
After comparing the output images of the three models, we observe that the SSD model shows flawed results relative to the other models. As shown in Figure 7, SSD failed to detect the mosquitoes in the last two images. Previous research in [35] stated that SSD models perform poorly on small objects. In contrast, Faster R-CNN and R-FCN produce detections for all the images, but the percentages vary from image to image because of the different number of training steps, as well as the time taken to complete the training steps, for each model. Faster R-CNN outperforms the R-FCN in terms of training steps. The R-FCN requires more time for each image to complete the training steps and detect  Table 2. In this case, we can conclude that Faster R-CNN has achieved better detection accuracy than the R-FCN and SSD models. Notably, due to processor limitations, we could not train the models for more than 10 hours; after 10 hours, the training was terminated. If the models are trained for more steps, the detection precision can be improved.

Conclusion
In this paper, we have used deep learning algorithms and image processing techniques to detect dengue mosquitoes in images obtained from various relevant sources. To identify dengue mosquitoes, we use the Faster R-CNN with InceptionV2, R-FCN with ResNet101, and SSD with MobilenetV2 models. We train these three models using the training data and apply them for real-time recognition. The experimental results show that the Faster R-CNN with InceptionV2 model identifies whether a dengue mosquito exists in an image more reliably than the other two models. In addition, this deep learning-based approach recognizes dengue mosquitoes in images precisely and accurately, distinguishing them from other mosquito species. In the future, the method will be applied in dynamic situations in real-time environments.