A Review on Physiological Signal Based Emotion Detection

Emotions are feelings that result from biochemical processes in the body and are influenced by a variety of factors, such as one's state of mind, situation, experiences, and surrounding environment. Emotions affect one's ability to think and act. People interact with each other to share their thoughts and feelings. Emotions play a vital role in medicine and can also strengthen human-computer interaction. Different techniques are used to detect emotions, based on facial features, text, speech, and physiological signals. Breathing is one physiological signal that reflects emotion: the rational belief that different breathing habits are correlated with different emotions has expanded the evidence for a connection between breathing and emotion. In this manuscript, recent investigations of emotion recognition using respiration patterns are reviewed. The aim of the survey is to summarize the latest technologies and techniques to help researchers develop a global solution for emotion detection systems. Various researchers used benchmark datasets, and a few created their own datasets for emotion recognition. It is observed that many investigators used invasive sensors to acquire respiration signals, which make subjects uncomfortable and self-conscious and thereby affect the results. The subjects involved in the reviewed studies are of the same age and race, so the results obtained in those studies cannot be applied to a diverse population. No single global solution exists.


Introduction
Emotions are an integral aspect of everyday life; they are conscious mental reactions to events and objects and are linked to different physiological activities. Emotions affect the way people think, behave, and feel around other people. Emotions are composed of five coordinated activities: mental situation assessment, clinical findings (central and autonomic nervous system responses), actions, facial expressions, and thoughts [1]. Emotion recognition plays a vital role in medicine, specifically for patients with psycho-neural disorders, learning disabilities, or autism [2]. Children with autism spectrum disorder (ASD) typically find it difficult to understand, communicate, and regulate feelings [2]. Emotion recognition can also strengthen human-machine engagement by using emotional information in conversation [2]. Different methods are used to detect emotions. The best-known techniques are based on physical signs such as facial expressions, speech, movements, and gestures, and various investigations use facial expressions, human speech, and text for emotion recognition [3][4][5]. However, recognizing emotion from physical signals is not always accurate, because people can easily hide their true feelings by altering their voice or manipulating their facial expression [6]. These complications motivate emotion recognition using physiological signals. Electrocardiogram (ECG), galvanic skin response (GSR), respiration rate, body temperature, electromyography (EMG), and blood pressure are examples of physiological signals, and different researchers use different physiological signals for emotion recognition. Physiological signals are promising for emotion understanding because they are unconscious representations that humans cannot consciously manipulate [7]. These signals can be recorded using various wearable and attachable sensors.
www.aetic.theiaer.org
The sensible assumption that different breathing activities are connected to different feelings has increased the evidence for a connection between breathing and emotion. Respiration reflects emotions [8], as different breathing patterns indicate different emotions [9]. In this manuscript, investigations that detect emotions using respiration patterns are reviewed.

Literature Review
The Augsburg dataset of physiological signals was used by [10] for emotion classification. The Augsburg dataset comprises twenty-five recordings for each of four emotions (joy, anger, sadness, and pleasure) induced using the musical induction method. Four types of physiological signals were acquired: electrocardiogram (ECG), electromyography (EMG), respiration, and skin conductivity (SC). The Ensemble Empirical Mode Decomposition (EEMD) method was used to derive time-domain, time-frequency, non-linear, and intrinsic mode function (IMF) features. In the time domain, seventy-three features were extracted from ECG, sixty-two from respiration, and twenty-one from both SC and EMG. In the time-frequency domain, one hundred ten features were extracted from EMG and eighty-four from both ECG and respiration. Five non-linear features were derived from each signal. A C4.5 decision tree (DT) was used to reduce the features to the five that had the greatest impact on classification. Performance was measured with the correct classification rate (CCR), i.e., the number of correctly classified samples divided by the total number of samples. A CCR of 88% was achieved on the selected best features.
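The CCR metric and the split criterion of a C4.5 tree can be made concrete with a small sketch. It is illustrative only and is not the code of [10]: it assumes the features have already been discretised into bins (real C4.5 also searches thresholds on continuous features), and the helper names are hypothetical. Features are ranked by the C4.5 gain ratio, the top five would be kept, and CCR is computed from predictions.

```python
import math
from collections import Counter


def correct_classification_rate(y_true, y_pred):
    """CCR = number of correctly classified samples / total number of samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_bins, labels):
    """C4.5 split criterion for one feature that has already been discretised."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_bins)  # penalises features with many bins
    return info_gain / split_info if split_info > 0 else 0.0


def top_k_features(rows, labels, k=5):
    """Rank feature columns of `rows` by gain ratio and return the best k indices."""
    n_features = len(rows[0])
    scores = [gain_ratio([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
```

With k=5 this mirrors the "best five features" step; the classifier trained on those features would then be scored with `correct_classification_rate`.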
The study [11] proposed a methodology to recognize emotions elicited by music video clips from ECG signals, respiratory signals, and their combination (cardiorespiratory). Features extracted from the heart, respiration, and their combination were classified using a Support Vector Machine (SVM). The study [11] used the DEAP (Dataset for Emotion Analysis using Physiological signals) dataset, which contains electrocardiogram (ECG), galvanic skin response (GSR), blood volume pressure, respiration rate, skin temperature, electromyography (EMG), and electrooculogram signals of thirty-two subjects (50% males and 50% females) aged 19-37 years. Only two signals, ECG and respiration rate, were used in this study. The signals were recorded at a sampling rate of 512 Hz while subjects watched forty one-minute music video clips, which they rated on a scale of 1-9 (negative/low to positive/high) for arousal, valence, and liking. Thirteen features were extracted. The cardiac features were heart rate (HR), R-R interval, low-frequency (LF) and high-frequency (HF) power, respiratory sinus arrhythmia (RSA) power, RSA frequency, and RSA amplitude. The respiratory features were breathing frequency and amplitude. The cardiorespiratory features, obtained by combining RSA and respiration, were the amplitude ratio of the RSA to the respiratory oscillation, the difference between the respiratory and RSA frequencies, the phase difference between respiration and RSA, the slope of the phase difference, and its standard deviation. Binary classification into low/high liking, low/high arousal, and positive/negative valence was performed using the ECG and respiration signals. ECG signals gave accuracies of 74%, 71%, and 72% for liking, arousal, and valence, respectively, while respiration rate gave 73% for liking, 72% for arousal, and 70% for valence.
In contrast, combining HR and respiration features for classification achieved 76% accuracy for liking and 74% for both arousal and valence.
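The cardiorespiratory features above (frequency difference, amplitude ratio, and the phase difference with its slope and SD) can be sketched with a Hilbert transform. This is a minimal illustration under stated assumptions, not the procedure of [11]: it assumes the RSA oscillation is already available as a signal sampled at the same rate as respiration (for example, a band-passed heart-rate series), uses the largest FFT peak as the oscillation frequency, and uses standard deviation as the amplitude measure. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import hilbert


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest non-DC peak of the amplitude spectrum."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return float(freqs[1:][np.argmax(spectrum[1:])])  # skip the DC bin


def instantaneous_phase(x):
    """Unwrapped phase (rad) of the analytic signal of x."""
    x = np.asarray(x, dtype=float)
    return np.unwrap(np.angle(hilbert(x - x.mean())))


def cardiorespiratory_features(resp, rsa, fs):
    """Illustrative respiration/RSA coupling features for one trial."""
    resp = np.asarray(resp, dtype=float)
    rsa = np.asarray(rsa, dtype=float)
    f_resp = dominant_frequency(resp, fs)
    f_rsa = dominant_frequency(rsa, fs)
    dphi = instantaneous_phase(resp) - instantaneous_phase(rsa)
    t = np.arange(dphi.size) / fs
    slope = np.polyfit(t, dphi, 1)[0]  # drift of the phase difference (rad/s)
    return {
        "breathing_freq_hz": f_resp,
        "breathing_amplitude": float(np.std(resp)),
        "amplitude_ratio": float(np.std(rsa) / np.std(resp)),
        "freq_difference_hz": f_resp - f_rsa,
        "phase_difference_rad": float(np.mean(dphi)),
        "phase_difference_slope": float(slope),
        "phase_difference_sd": float(np.std(dphi)),
    }
```

A stable phase difference with near-zero slope indicates that the heart-rate oscillation is locked to breathing; a drifting phase indicates weak coupling.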
[12] proposed a system that uses different physiological signals, namely ECG, skin temperature (ST), galvanic skin response (GSR), EMG, HR, respiration rate, blood oxygen level, and systolic and diastolic blood pressure (SBP and DBP), to classify the emotions anger, joy, and neutral. Three healthy male subjects aged eighteen to nineteen years took part in the data collection. A Zephyr BioPatch chest strap, worn below the pectoral muscle, was used to collect ECG and respiration signals. An E4 wristband worn by the subject collected GSR, ST, and blood volume pulse (BVP), while SBP, DBP, blood oxygen, and glucose level were measured using CONTECT Bluetooth-enabled devices. Blood glucose readings were taken twice, at the start and at the end of the experiment. Baseline signals were recorded while the subject sat relaxed; joy was then induced using movie clips and anger using cognitive techniques. The physiological signals of each subject were recorded twice in each emotion induction state. Four physiological signals (SBP, DBP, EMG, and blood oxygen level) were excluded because they contributed little to emotion classification (joy vs anger). Bandpass filtering was applied to the ECG, respiration, GSR, ST, HR, and BVP signals. Because the dataset was small, with two instances per emotion class, predictive and statistical analyses of the data were performed. Joy (happiness) was characterized by high GSR and HR and moderate respiration values, whereas anger was characterized by low GSR and ST, high respiration, and moderate HR values.
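The qualitative patterns reported in [12] amount to a rule-based classifier. The sketch below encodes them under explicit assumptions: each reading is labelled low, moderate, or high by comparing it with the relaxed baseline in units of the baseline standard deviation, and the threshold `k = 1.0` is hypothetical, not a value reported in [12].

```python
def relative_level(value, baseline_mean, baseline_sd, k=1.0):
    """Label a reading 'low', 'moderate' or 'high' relative to the relaxed baseline.

    k is a hypothetical threshold in baseline standard deviations.
    """
    if baseline_sd <= 0:
        raise ValueError("baseline_sd must be positive")
    z = (value - baseline_mean) / baseline_sd
    if z > k:
        return "high"
    if z < -k:
        return "low"
    return "moderate"


def joy_or_anger(levels):
    """Apply the qualitative joy/anger patterns to a dict of signal levels.

    Expected keys: 'gsr', 'hr', 'resp', 'st'.
    """
    if levels["gsr"] == "high" and levels["hr"] == "high" and levels["resp"] == "moderate":
        return "joy"
    if (levels["gsr"] == "low" and levels["st"] == "low"
            and levels["resp"] == "high" and levels["hr"] == "moderate"):
        return "anger"
    return "undetermined"  # pattern matches neither rule
```

With only two instances per class, rules like these can describe the data but cannot be validated statistically, which is consistent with the review's later criticism of [12].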
In the study [13], the MAHNOB-HCI dataset of physiological signals was used for emotion classification in the arousal-valence model. ECG, respiration, skin temperature, and galvanic skin response (GSR) signals were used. The MAHNOB-HCI multimodal dataset consists of twenty-four subjects and twenty movie clips. Baseline wander was removed to smooth the signals. The first and last thirty seconds of each signal were removed because they corresponded to a neutral-emotion movie clip. A Butterworth filter was applied to the GSR, respiration, and ECG signals with cutoff frequencies of 0.3 Hz, 0.45 Hz, and 0.5 Hz, respectively. Nine emotions (happiness, amusement, neutral, fear, anxiety, anger, disgust, surprise, and sadness) were recorded using six cameras at a sampling rate of 256 Hz. The emotions were classified into two classes: high and low for arousal, and positive and negative for valence. A Level Feature Fusion (LFF) algorithm was used to combine the modalities before training, and an SVM was used for classification. The SVM achieved accuracies of 64.23% for arousal and 68.75% for valence.
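The preprocessing and fusion steps can be sketched with SciPy and scikit-learn. Several details are assumptions, not taken from [13]: the filter type (the text gives only cutoffs, so a low-pass design is assumed here), the filter order of 4, zero-phase filtering, feature standardisation, and the RBF kernel. Feature-level fusion is shown in its simplest form, concatenating each modality's feature matrix before training.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def butter_lowpass(x, fs, cutoff_hz, order=4):
    """Zero-phase Butterworth low-pass filter (type and order are assumptions)."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def fuse_features(*modality_features):
    """Feature-level fusion: concatenate per-modality feature matrices column-wise.

    Each argument has shape (n_trials, n_features_of_that_modality).
    """
    return np.hstack([np.asarray(m, dtype=float) for m in modality_features])


def make_svm():
    """Standardise the fused features, then classify with an RBF-kernel SVM."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf"))
```

For example, a respiration channel sampled at 256 Hz would be filtered with `butter_lowpass(resp, 256.0, 0.45)`, features would be computed per modality, and `make_svm().fit(fuse_features(ecg_f, resp_f, gsr_f, st_f), labels)` would train one arousal or valence classifier (the variable names are placeholders).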
The MAHNOB-HCI dataset of physiological signals was used by [14] to classify emotions into valence and arousal classes. This dataset contains the physiological signals (ECG, GSR, ST, and respiration amplitude) of twenty-four volunteers whose emotions were induced by twenty movie clips. Baseline wander was removed to smooth the signals. The first and last thirty seconds of each signal were removed because they corresponded to a neutral-emotion movie clip. A Butterworth filter was applied to the GSR, ECG, and respiration signals with cutoff frequencies of 0.3 Hz, 0.7 Hz, and 1 Hz, respectively. Heart rate variability (HRV) was calculated from the ECG signal, and respiration rate from the respiration amplitude. A total of one hundred sixty-nine features were extracted from these signals. SVMs with different kernels were used to classify the signals into high and low arousal and negative and positive valence. The SVM with an RBF kernel gave the best accuracy: 68.5% for arousal and 68.75% for valence. The classification was implemented on a Raspberry Pi 3 Model B (RPi) because it interfaces easily with wearable physiological sensors via Bluetooth or Wi-Fi modules.
An emotion recognition system using ECG and respiration signals was presented by [15]. Eleven healthy subjects took part in the experiment, in which joy, sadness, anger, and pleasure were induced using movie clips. ECG and respiration signals were acquired during emotion elicitation using a smart wearable device designed by [16]. After the elicitation process, each subject filled in a questionnaire to confirm that the intended emotion had been elicited, and the signals of subjects who did not report the expected emotion were discarded. ECG and respiration signals were sampled at 250 Hz and 10 Hz, respectively.
P-waves, T-waves, and QRS complexes were measured in the ECG signal, and seventy-eight features were extracted from it, including the mean, mode, median, variance, minimum, maximum, range, and the R-R, P-P, Q-Q, S-S, T-T, P-Q, Q-S, and S-T time intervals. A low-pass filter was applied to the respiration signal, and sixty-seven features were extracted. More features carry more information but also increase computational complexity, so a Genetic Algorithm (GA) was used to select fourteen key features from the one hundred forty-five features. These fourteen features were fed into an SVM, which was evaluated with leave-one-out cross-validation and achieved accuracies of 81.82%, 63.64%, 54.55%, and 30% for joy, sadness, anger, and pleasure, respectively.
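GA-based feature selection with a leave-one-out SVM as the fitness function can be sketched as follows. This is a generic GA under stated assumptions, not the configuration of [15]: individuals are fixed-size feature subsets (k = 14 would match the study), selection keeps the better half, crossover samples k features from the union of two parents, mutation swaps one feature, and the population size, generation count, mutation rate, and linear kernel are all hypothetical choices.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC


def loocv_accuracy(X, y, features):
    """Leave-one-out accuracy of an SVM restricted to the given feature columns."""
    scores = cross_val_score(SVC(kernel="linear"), X[:, features], y, cv=LeaveOneOut())
    return float(scores.mean())


def ga_select(X, y, k, pop_size=12, generations=8, mutation_rate=0.3, seed=0):
    """Genetic search for a k-feature subset that maximises LOOCV accuracy.

    Requires pop_size >= 4 so that at least two parents survive selection.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = [np.sort(rng.choice(n_features, k, replace=False)) for _ in range(pop_size)]
    cache = {}

    def fitness(ind):
        key = tuple(int(i) for i in ind)
        if key not in cache:  # each subset is evaluated only once
            cache[key] = loocv_accuracy(X, y, list(key))
        return cache[key]

    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), 2, replace=False)
            # Crossover: draw k distinct features from the union of both parents.
            pool = np.union1d(parents[a], parents[b])
            child = rng.choice(pool, k, replace=False)
            # Mutation: swap one selected feature for a currently unselected one.
            if rng.random() < mutation_rate and k < n_features:
                outside = np.setdiff1d(np.arange(n_features), child)
                child[rng.integers(k)] = rng.choice(outside)
            children.append(np.sort(child))
        pop = parents + children

    best = max(pop, key=fitness)
    return [int(i) for i in best], fitness(best)
```

Because leave-one-out is used inside the fitness function, the reported accuracy of the selected subset is optimistically biased. A nested split would be needed for an unbiased estimate.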
An emotion recognition system based on respiration signals was proposed by [17], in which a deep learning method was used to recognize emotions. The DEAP and Augsburg datasets were used. The DEAP dataset contains the physiological signals of thirty-two subjects (50% males and 50% females) aged 19-37 years, recorded as natural reactions to forty one-minute music video clips; its respiration signals were collected at a sampling rate of 32 Hz. The Augsburg dataset comprises respiration signals of four emotions induced using the musical induction method over twenty-five days, also collected at a sampling rate of 32 Hz. The arousal and valence dimensions of Russell's circumplex model were used to classify the four emotions. A Sparse Auto-Encoder (SAE) was used to extract features from unlabeled respiration signals. The emotion recognition task was divided into two binary classification tasks, "high valence, low arousal" and "low valence, high arousal". Logistic regression (LR) was used to classify the extracted features in two tasks, one for valence and one for arousal. The SAE had two hidden layers for feature extraction, the first with 200 neurons and the second with 50 neurons, and the extracted features were used as input to the LR. To test the method, the dataset was split into 80% for training and 20% for testing. On the DEAP dataset, LR achieved accuracies of 73.06% and 80.78% for valence and arousal classification, respectively. On the Augsburg dataset, the LR model achieved 83.72% for valence and 85.89% for arousal classification.
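The objective that a sparse autoencoder minimises, and how two such layers stack into the 200-50 architecture described above, can be sketched in NumPy. This is the textbook sparse-autoencoder formulation (reconstruction error, weight decay, and a KL-divergence sparsity penalty on mean hidden activations), assumed rather than confirmed for [17]. The sparsity target `rho`, the weights `beta` and `lam`, and the 320-sample input window are hypothetical, and the gradient-based training loop is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_layer(rng, n_in, n_hidden):
    """Random encoder/decoder weights for one autoencoder layer."""
    return {
        "W_enc": rng.normal(0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b_enc": np.zeros(n_hidden),
        "W_dec": rng.normal(0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_in)),
        "b_dec": np.zeros(n_in),
    }


def kl_sparsity(rho, rho_hat, eps=1e-12):
    """Sum over hidden units of KL(rho || rho_hat) for Bernoulli activations."""
    rho_hat = np.clip(rho_hat, eps, 1 - eps)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))


def encode(params, X):
    """Hidden representation used as features for the next layer or for LR."""
    return sigmoid(X @ params["W_enc"] + params["b_enc"])


def sae_loss(params, X, rho=0.05, beta=3.0, lam=1e-4):
    """Reconstruction error + weight decay + beta * KL sparsity penalty.

    X is assumed to be scaled to [0, 1] because the decoder uses a sigmoid.
    """
    H = encode(params, X)
    X_hat = sigmoid(H @ params["W_dec"] + params["b_dec"])
    recon = 0.5 * np.mean(np.sum((X - X_hat) ** 2, axis=1))
    decay = 0.5 * lam * (np.sum(params["W_enc"] ** 2) + np.sum(params["W_dec"] ** 2))
    return recon + decay + beta * kl_sparsity(rho, H.mean(axis=0)), H
```

Layer 1 (input to 200 units) is trained first on unlabeled respiration windows. Layer 2 (200 to 50 units) is then trained on layer 1's codes, and the 50-dimensional output of `encode(l2, encode(l1, X))` is what a logistic regression would classify.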
Four types of physiological signals, namely ECG, respiration rate, blood pressure (BP), and the temperature of inhaled and exhaled air, were used for emotion classification by [18]. Suitable sensors were used to collect these signals from a single subject while emotions were induced by three-minute movie clips. The clips were played on a laptop placed one meter away, and the signals were acquired for one minute during the emotion elicitation process. Noise was removed by applying a low-pass filter to the raw ECG and respiration signals. The raw ECG and respiration signals were segmented, and the middle portion of each signal was resampled at 4 Hz for further processing. Built-in MATLAB functions were used to obtain nineteen statistical, time-domain, and frequency-domain features from the preprocessed ECG and respiration signals. A three-layer artificial neural network (ANN) with two hundred hidden-layer neurons was used for classification. Six emotions (happy, sad, fear, disgust, anger, and surprise) were classified with an average accuracy of 80%.
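The segmentation, 4 Hz resampling, and ANN configuration described for [18] can be approximated in Python. The study used MATLAB built-ins, so this is only an analogue. The kept fraction of the signal (the central half), the FFT-based resampler, the five example features, and the reading of "three layers" as input, one hidden layer of 200 neurons, and output are all assumptions.

```python
import numpy as np
from scipy.signal import resample
from sklearn.neural_network import MLPClassifier


def mid_portion(x, fraction=0.5):
    """Keep the central `fraction` of a recording (the fraction is an assumption)."""
    x = np.asarray(x, dtype=float)
    n_keep = int(round(len(x) * fraction))
    start = (len(x) - n_keep) // 2
    return x[start:start + n_keep]


def to_4hz(x, fs):
    """FFT-based resampling of x from fs to 4 Hz."""
    n_out = int(round(len(x) * 4.0 / fs))
    return resample(x, n_out)


def simple_features(x, fs=4.0):
    """Mean, SD, min, max and dominant frequency of a 4 Hz segment."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.array([x.mean(), x.std(), x.min(), x.max(),
                     freqs[1:][np.argmax(power[1:])]])


# Input layer -> one hidden layer of 200 neurons -> output layer (six emotions).
ann = MLPClassifier(hidden_layer_sizes=(200,), max_iter=2000, random_state=0)
```

Stacking `simple_features(to_4hz(mid_portion(sig), fs))` for the ECG and respiration channels of each trial gives the matrix that `ann.fit` would consume.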
The physiological signal interpretation framework Emo-CSI was presented for emotion classification in [19]. Emo-CSI uses heart rate, respiration pattern, skin humidity, and conductivity to classify emotions into displeasure, neutral, pleasure, calm, and excited. An ESEN220 pulse sensor attached to the subject's earlobe was used to acquire heart rate. Two sensors were attached to the subject's body to acquire breathing-pattern data: a chest belt using a switching technique and a waist-belt sensor using a variable-resistance technique. An SHT31-A sensor was used to collect skin temperature and humidity (SKT and SKH) data, and an Arduino Mega 2560 served as the microcontroller for these sensors. Twenty-three subjects (ten males and thirteen females) aged 20-27 years took part in the data collection. After the sensors were attached, a baseline state was recorded by showing a black screen for fifteen seconds. Emotions were then induced using pictures with matching sounds while the signals were acquired, and each subject took a self-assessment test to validate the emotion induction. The collected signals were normalized, and statistical analysis was performed to summarize them. Thirty-two features, including the average, maximum, minimum, and standard deviation (SD), were extracted from the physiological signals. Support vector machine (SVM), decision tree (DT), and artificial neural network (ANN) models were used to classify emotions along the arousal and valence dimensions, where valence comprises displeasure, neutral, and pleasure, and arousal comprises calm, medium, and excited.
The SVM outperformed the other two models, with accuracies of 55.45% for valence and 59% for arousal.
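The normalization and summary-statistics step of Emo-CSI can be sketched as follows. The source does not state which normalization was used. This sketch assumes each channel is standardised against the fifteen-second black-screen baseline, because a per-trial min-max scaling would force every maximum to 1 and every minimum to 0 and make those features uninformative. The channel names are hypothetical.

```python
import numpy as np


def baseline_normalise(x, baseline):
    """Express x in units of the baseline SD, relative to the baseline mean.

    Falls back to mean subtraction only when the baseline is constant.
    """
    b = np.asarray(baseline, dtype=float)
    sd = b.std()
    return (np.asarray(x, dtype=float) - b.mean()) / (sd if sd > 0 else 1.0)


def summary_features(channels, baselines):
    """Average, maximum, minimum and SD of each baseline-normalised channel.

    `channels` and `baselines` map a channel name (e.g. "hr", "resp_chest")
    to a 1-D array; returns a flat dict such as {"hr_mean": ..., "hr_sd": ...}.
    """
    features = {}
    for name, signal in channels.items():
        z = baseline_normalise(signal, baselines[name])
        features[f"{name}_mean"] = float(z.mean())
        features[f"{name}_max"] = float(z.max())
        features[f"{name}_min"] = float(z.min())
        features[f"{name}_sd"] = float(z.std())
    return features
```

Four statistics per channel means that the number of channels fixes the feature count. How the study's thirty-two features map onto its channels is not stated in the text.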
In the study [20], human emotions were classified using respiration signals. The study had two objectives: first, to analyze the effect of induced emotions on breathing signals, and second, to classify the emotions in two different ways, using the Fast Fourier Transform (FFT) and neural networks (NN). The investigators created their own dataset in the experimental lab of Al-Khwarizmi College of Engineering, University of Baghdad. Twenty-five males and females aged 18-25 years took part. Seven movie clips were shown to each subject: six to induce basic emotions (happiness, sadness, surprise, anger, anxiety, and disgust) and one to induce a neutral state. The clips had different lengths, with a maximum of four minutes. Each subject entered the lab alone, and an appropriate break was provided after each clip so that the subject could relax before the next emotion. A BIOPAC instrument with an airflow sensor and a mouthpiece was used to collect the breathing signals, from which the airflow rate and airflow volume features were acquired. The extracted features were classified using MATLAB and the open-source software Orange. The FFT-based approach, classified in Orange, achieved a result of 80%. Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) were also used for classification, and LR performed best with a recall of 80%.
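FFT-derived features of an airflow signal can be illustrated with a short sketch. The text does not specify which FFT features [20] used, so the band edges and the relative-band-power definition below are hypothetical choices. The breathing rate is taken from the largest non-DC spectral peak.

```python
import numpy as np


def _spectrum(x, fs):
    """Power spectrum and frequency axis of a mean-removed signal."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, power


def fft_band_powers(x, fs, bands=((0.1, 0.3), (0.3, 0.6), (0.6, 1.0))):
    """Fraction of non-DC spectral power of an airflow signal in each band.

    The band edges (Hz) are hypothetical.
    """
    freqs, power = _spectrum(x, fs)
    total = power[1:].sum()
    out = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        out.append(power[mask].sum() / total if total > 0 else 0.0)
    return np.array(out)


def breathing_rate_bpm(x, fs):
    """Breaths per minute from the largest non-DC FFT peak."""
    freqs, power = _spectrum(x, fs)
    return float(60.0 * freqs[1:][np.argmax(power[1:])])
```

These per-clip vectors are the kind of input that the SVM, RF, and LR classifiers mentioned above would be trained on.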
A system using the HRV signal and respiration rate for emotion classification was proposed by [21]. Twenty-five subjects (twelve males and thirteen females) aged 18-35 years took part in the data collection. Four emotions (joy, fear, anger, and sadness) were evoked by movie clips, with each subject watching two videos per emotion over two days. ECG and respiration signals were recorded with a BIOPAC device while the subjects watched the clips, at sampling rates of 1 kHz and 125 Hz, respectively. A wavelet-based detector was used to detect beat occurrences in the ECG signal, from which heart rate was estimated. Low-pass filtering with a cutoff frequency of 0.03 Hz was applied to obtain the HRV signal. A bandpass filter of 0.04-0.80 Hz was used to filter the breathing signals. The power spectral density (PSD) of the modulating signal m(t) was computed using the Welch periodogram, and the largest peak of the respiratory signal r(t) in the PSD was used as the respiratory frequency (Fr). The high-frequency (HF) band was redefined by calculating the spectral correlation between m(t) and r(t) with a bandwidth of 0.02 Hz; this redefined band is called the shifted and resized HF band (SCHF). SCHF readings with Fr < 0.1 Hz were excluded because of the overlap between the LF and HF bands. A synthetic modulating signal ms(t) was calculated by summing the HF and LF components. The Lilliefors test was used to evaluate the normality of the collected data, and t-tests and Wilcoxon tests were applied for statistical analysis. Features derived from the SCHF band were PHFSC, PLFnSC, RSC, DHF, amax, bmax, and rmax, and features extracted from the basic HF band were PHF, PLFnF, and Fr. The area under the receiver operating characteristic curve (AUC) was calculated to assess the ability of each feature to discriminate between emotions.
Features with an AUC greater than or equal to 0.70 were retained for further processing. Accuracy was calculated using leave-one-out cross-validation, and the PANAS-X scale was used to validate the collected dataset. The accuracies of two-class classification using HRV signals and respiration rate were 79.2% (relax vs joy), 77.8% (joy vs sadness), and 77.3% (joy vs anger). No statistical differences were found between the neutral state and the negative emotions [21].
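Two steps of [21] lend themselves to a short sketch: estimating Fr as the largest Welch-PSD peak of r(t) in the 0.04-0.80 Hz band, and keeping only features whose AUC reaches 0.70. The one-minute Welch segment length is an assumption. Counting a feature whose AUC is at most 0.30 (i.e., one that decreases with the positive class) as discriminative is also an assumption, since the source states only the 0.70 threshold. The SCHF band redefinition itself is not reproduced.

```python
import numpy as np
from scipy.signal import welch
from sklearn.metrics import roc_auc_score


def respiratory_frequency(r, fs, band=(0.04, 0.80)):
    """Fr: frequency of the largest Welch-PSD peak of r(t) inside `band`."""
    freqs, psd = welch(r, fs=fs, nperseg=min(len(r), int(60 * fs)))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[mask][np.argmax(psd[mask])])


def select_by_auc(X, y, threshold=0.70):
    """Indices of feature columns whose AUC (or 1 - AUC) is at least `threshold`.

    y holds binary labels (e.g. 0 = relax, 1 = joy).
    """
    keep = []
    for j in range(X.shape[1]):
        auc = roc_auc_score(y, X[:, j])
        if max(auc, 1 - auc) >= threshold:
            keep.append(j)
    return keep
```

With a 4 Hz respiratory signal and one-minute segments, the PSD resolution is 1/60 Hz. This is finer than the 0.02 Hz bandwidth mentioned for the SCHF band.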
A non-invasive Bio-Radar based on the Doppler effect was used by [22] to collect respiration signals for emotion classification. The Bio-Radar transmits electromagnetic waves toward the chest and receives the reflected echo, whose phase is modulated by chest motion [23]. The radar has two antennas, one acting as a transmitter and the other as a receiver, and a software-defined radio (SDR) based front-end. Nine subjects took part in the data collection, which was conducted on three different days with a gap of two days. Movie clips were used to evoke fear and happiness, and respiration signals were acquired. Since the main purpose of this investigation [22] was to validate the use of the Bio-Radar for emotion recognition, a certified BIOPAC MP160 device was also used to record respiration signals during emotion induction. The Bio-Radar signal is a low-pass signal with a bandwidth of 0.2-2 Hz: the respiratory signal bandwidth of a healthy adult is 0.2-0.4 Hz [24], 0.4-0.8 Hz in the case of hyperpnea, and 0.8-2 Hz when other random body motions are present [25]. The received signal was inspected in real time because phase adjustment was needed for several reasons: the arc position and mean value change each time the system restarts, with the distance between the radar and the subject's chest, and with the subject's motion during the experiment. The received signal was downsampled from 100 kHz to 1 kHz in real time using LabVIEW, and the BIOPAC signal was also acquired at 1 kHz. An algorithm was designed to remove DC components from the radar signal. Comparing the radar and BIOPAC signals showed that the Bio-Radar also captures sudden movements of the subject caused by emotion induction, which play an important role in the emotion recognition process.
A second-order Butterworth bandpass filter (0.05-1.5 Hz) was applied, and the signal was divided into one-minute segments for feature extraction. Twelve features were extracted, including the statistical features mean and variance, respiration rate, waveform width, time between two peaks, the power spectral density (PSD) between 0 and 1.5 Hz, and their ratio. Both binary and multiclass classification were performed with three classifiers (SVM, KNN, and RF) on the BIOPAC and Bio-Radar signals, using a dataset of 221 observations per emotion. When all extracted Bio-Radar features were used, SVM, KNN, and RF achieved accuracies of 58.1%, 58.2%, and 65.2%, respectively, which were lower than those obtained with the BIOPAC signal. Accuracy improved to 63.3%, 66.2%, and 67.1% when only a few important features were used. With these selected features, the accuracy of all three classifiers was between 60% and 70%, which shows that respiration signals acquired by the Bio-Radar can also be used for emotion recognition.
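The radar processing chain can be sketched end to end: phase recovery, band-pass filtering, one-minute segmentation, and a few per-segment features. Arctangent demodulation of the I/Q channels is a common way to recover the phase of a Doppler radar signal, but [22] does not state that it was used, so treat that step as an assumption. The sketch also assumes the I/Q data are already DC-corrected, i.e., the arc is centred at the origin. Note that SciPy's `butter` with `btype="bandpass"` and order 2 yields a fourth-order overall response.

```python
import numpy as np
from scipy.signal import butter, find_peaks, sosfiltfilt


def arctangent_demodulate(i_ch, q_ch):
    """Chest-motion phase (rad) from DC-corrected I/Q baseband samples."""
    return np.unwrap(np.arctan2(np.asarray(q_ch, float), np.asarray(i_ch, float)))


def bandpass(x, fs, lo=0.05, hi=1.5, order=2):
    """Zero-phase Butterworth band-pass between lo and hi (Hz)."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def one_minute_segments(x, fs):
    """Non-overlapping 60 s segments; a trailing partial segment is dropped."""
    n = int(60 * fs)
    return [x[k:k + n] for k in range(0, len(x) - n + 1, n)]


def segment_features(seg, fs):
    """Mean, variance, respiration rate and mean peak-to-peak time of a segment."""
    peaks, _ = find_peaks(seg, distance=max(1, int(fs / 1.5)))  # breaths below 1.5 Hz
    intervals = np.diff(peaks) / fs
    return {
        "mean": float(seg.mean()),
        "variance": float(seg.var()),
        "resp_rate_bpm": len(peaks) * 60.0 / (len(seg) / fs),
        "mean_peak_interval_s": float(intervals.mean()) if intervals.size else float("nan"),
    }
```

The same `segment_features` can be applied to the BIOPAC reference, which is how the two acquisition paths would be compared on equal terms.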
A non-invasive sensor was used by [26] to acquire the respiration rate for emotion recognition: an MLX90614DAA infrared temperature sensor with an accuracy of ±0.5 °C that operates in the 10-40 °C range. The sensor has two parts: a thermopile that detects the infrared radiation emitted by an object, and a signal-conditioning circuit that converts the thermopile output into a voltage that is measured as temperature. The sensor was placed in a mouth mask near the nostrils to sense temperature variations during inhalation and exhalation. The temperature is slightly higher during exhalation and lower during inhalation, and this cycle repeats with each breath. The sensor sends these temperatures to a microcontroller (ATMEGA328P), which computes the respiration rate by counting the temperature rises per minute (that is, the peaks in the signal). The signal pattern differed between individuals as a result of their different emotions. Respiration signals of ten healthy subjects aged 20-40 years were collected while emotions such as happiness, fear, anger, and relaxation were induced. The investigation [26] observed that the amplitude and frequency of the respiration signal were lower during relaxed or happy (positive) emotions, and that the frequency increased during anger or fear (negative) emotions. Statistical features (mean, median, minimum, maximum, range, standard deviation, median absolute deviation, mean absolute deviation, and the L1, L2, and max norms) were extracted from each positive- and negative-emotion signal. The values of these features overlapped between positive and negative emotions, so they could not be used for effective classification. Therefore, a wavelet transform with the Symlet 2 mother wavelet was applied, decomposing the signal into five levels.
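Peak-based respiration-rate counting and the statistical features listed above can be sketched as follows. It is an illustration, not the firmware of [26]. The minimum breath spacing of 1.5 s used to suppress spurious peaks is an assumption, and "median absolute" and "mean absolute" are interpreted as median and mean absolute deviation. The same `statistical_features` function could be applied to level-5 wavelet coefficients once a Symlet 2 decomposition has been computed.

```python
import numpy as np
from scipy.signal import find_peaks


def respiration_rate_bpm(temp, fs, min_breath_s=1.5):
    """Breaths per minute from exhalation peaks in a nostril-temperature signal."""
    temp = np.asarray(temp, dtype=float)
    peaks, _ = find_peaks(temp - temp.mean(), distance=max(1, int(min_breath_s * fs)))
    return len(peaks) * 60.0 / (len(temp) / fs)


def statistical_features(x):
    """Descriptive statistics and norms of a signal (or of wavelet coefficients)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    return {
        "mean": float(x.mean()), "median": float(med),
        "min": float(x.min()), "max": float(x.max()),
        "range": float(x.max() - x.min()), "std": float(x.std()),
        "median_abs_dev": float(np.median(np.abs(x - med))),
        "mean_abs_dev": float(np.mean(np.abs(x - x.mean()))),
        "l1_norm": float(np.sum(np.abs(x))),
        "l2_norm": float(np.sqrt(np.sum(x ** 2))),
        "max_norm": float(np.max(np.abs(x))),
    }
```

Because faster breathing raises the peak count and lowers the inter-peak spacing, `respiration_rate_bpm` alone already captures the frequency increase reported for negative emotions.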
The statistical features extracted at the fifth level showed variation between positive and negative emotions, so they can be used to classify the signal using clustering techniques.
An emotion recognition system using the physiological signals of gamers during gameplay was presented by [27]. A dataset of physiological signals was acquired from thirty-three Italian gamers (twenty-nine males and four females) aged eighteen to forty years, who usually play video games about four days per week. Four types of physiological signals were recorded: ECG, EMG, GSR, and respiration rate. An Arduino Due was used as the microcontroller to collect data from the sensors and send it to a computer via Bluetooth. An Arduino-compatible Olimex EKG-EMG shield was used for ECG and EMG signal collection, the GSR signal was collected by placing two electrodes on two fingertips of the left hand, and an NTC thermistor was placed under the subject's nose to measure the respiration rate. The racing games (RAGA) Project Cars and RedOut, which can be played in Virtual Reality (VR) or on a standard monitor, were selected for emotion stimulation. After two hours of gaming, each subject mapped their emotions on the arousal and valence scales. Because the physiological signals were acquired at different frequencies, they were resampled to a uniform sampling rate of 556 Hz. Signals for each game with and without VR were separated to obtain four datasets. An equiripple FIR bandpass filter with cutoff frequencies of 35 Hz and 20 Hz was applied to the ECG and EMG signals, and heart rate (HR) was obtained from the ECG signal. A moving-average filter with a one-second window was applied to the respiration signal, and a first-order Butterworth low-pass filter with a cutoff frequency of 5 Hz was applied to the GSR signal. One feature was extracted from the ECG signal, thirty-eight from respiration, sixty-two from GSR, and seventy-seven from EMG.
Irrelevant and redundant features were removed by calculating the Pearson linear correlation between features. SVM, RF, gradient boosting, and Gaussian process regression (GPR) were used for emotion prediction. Each model was validated using five-fold cross-validation, which showed GPR to be the best algorithm for emotion detection.
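Correlation-based pruning followed by a five-fold cross-validated GPR can be sketched with NumPy and scikit-learn. The 0.9 correlation threshold, the greedy keep-first order, the RBF-plus-white-noise kernel, and R^2 as the score are assumptions, not details reported by [27]. The regression target would be a self-reported arousal or valence value.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score


def drop_correlated(X, threshold=0.9):
    """Keep a feature only if |Pearson r| with every already-kept feature is below threshold.

    Constant columns give NaN correlations and are therefore dropped
    (unless they come first).
    """
    X = np.asarray(X, dtype=float)
    r = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(r[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep


def gpr_cv_score(X, y, seed=0):
    """Mean R^2 of a Gaussian process regressor over five shuffled folds."""
    gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    return float(np.mean(cross_val_score(gpr, X, y, cv=cv, scoring="r2")))
```

The pruned matrix `X[:, drop_correlated(X)]` would be scored with `gpr_cv_score` and compared with SVM, RF, and gradient-boosting regressors under the same folds.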
The Auto-Mutual Information Function (AMIF) and Cross-Mutual Information Function (CMIF) were used for emotion recognition by [28]. AMIF was applied to HRV signals, and CMIF was used to quantify the complex coupling between HRV and respiratory signals. ECG and respiration signals of twenty-five healthy subjects aged eighteen to fifty years were collected during emotion elicitation. ECG signals were sampled at 1 kHz and the respiratory signal r(t) at 125 Hz using an MP100 BIOPAC system. Four emotions (joy, fear, anger, and sadness) were induced using movie clips (two clips per emotion) over two days. A wavelet-based detector was used to obtain RR intervals from the ECG signals, and the RR time series RR(t) was resampled at 4 Hz by linear interpolation. The LF band, the HF band, and the HFSCHF band were filtered from RR(t). In HFSCHF, the HF band was redefined to be centered at the respiratory frequency (Fr), with limits calculated from the mean cross-correlation between the power spectra of HRV and respiration. A bandpass filter of 0.04-0.8 Hz was applied to r(t), which was then downsampled to 4 Hz. AMIF and CMIF are non-linear equivalents of the auto-correlation and cross-correlation functions, based on Shannon entropy. AMIF was applied to the filtered RR(t), while CMIF0 and CMIFmax values were obtained from the coupling between RR(t) and r(t). The Lilliefors test was applied to all signals to check the normality of the data, and a t-test or Wilcoxon test was applied to evaluate the differences between emotions. The results show that AMIF applied to RR(t) discriminates relax vs joy, joy vs each negative-valence condition, and among fear, sadness, and anger with an accuracy higher than 70% and an area under the receiver operating characteristic curve (AUC) of at least 0.70.
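AMIF and CMIF can be sketched with a histogram estimate of Shannon mutual information. This is a basic estimator chosen for clarity (16 bins, information in bits), not necessarily the estimator used in [28]. The helper that resamples RR intervals at 4 Hz by linear interpolation mirrors the preprocessing step described above. CMIF0 corresponds to lag zero, and CMIFmax to the maximum over lags.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram estimate of I(X;Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def amif(x, max_lag, bins=16):
    """Auto-mutual information I(x(t); x(t + tau)) for tau = 0..max_lag samples."""
    x = np.asarray(x, dtype=float)
    return np.array([mutual_information(x[: len(x) - tau], x[tau:], bins)
                     for tau in range(max_lag + 1)])


def cmif(x, y, max_lag, bins=16):
    """Cross-mutual information I(x(t); y(t + tau)) for tau = 0..max_lag samples.

    CMIF0 is cmif(...)[0]; CMIFmax is cmif(...).max().
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([mutual_information(x[: len(x) - tau], y[tau:], bins)
                     for tau in range(max_lag + 1)])


def rr_to_4hz(beat_times_s, rr_s):
    """Evenly resample an RR-interval series at 4 Hz by linear interpolation."""
    t_new = np.arange(beat_times_s[0], beat_times_s[-1], 0.25)
    return t_new, np.interp(t_new, beat_times_s, rr_s)
```

A fast decay of AMIF with lag indicates a less predictable (more complex) heart-rate series, while a high CMIF between RR(t) and r(t) indicates strong cardiorespiratory coupling.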

Discussion and Analysis
Recent investigations that detect emotions using respiration signals have been reviewed. These investigations presented different sensors combined with machine learning, which together yield emotion detection systems. The authors of studies [10,11,13,14,17] used benchmark datasets for emotion classification. The authors of study [10] used the Augsburg dataset of physiological signals, which has only twenty-five recordings for each emotion (joy, anger, pleasure, and sadness), yielding a dataset of only one hundred instances. The authors did not describe the sensors used for data collection and explored only decision trees for classification. The ratio of data used for training and testing the decision tree is not reported in the manuscript.
The DEAP dataset is used in study [11] to classify responses to forty music videos into low/high liking, low/high arousal, and positive/negative valence. The respiration signal is acquired from the ECG signal, which is not validated. Furthermore, the DEAP subjects all belong to the same age group, so the good results obtained in study [11] with specific features may not generalize to a more diverse population.
Studies [13,14] use the MAHNOB-HCI dataset to classify nine emotions on the arousal and valence scales. The physiological signals were collected with cameras, which could not be used in real time because of environmental variables such as lighting and temperature. In [13], feature fusion is used to classify emotions on the arousal and valence scales, but the ratio of training to testing data is not reported in the manuscript. The model trained in [13] is deployed on a Raspberry Pi (RPi) in [14]. In [17], both the Augsburg and DEAP datasets are used for emotion classification. The Augsburg dataset is small, with only twenty-five instances for each emotion (joy, anger, pleasure, and sadness), while the DEAP subjects are all of the same age, so the good results obtained may not generalize to a more diverse population. A comparison of the studies that used benchmark datasets is shown in Table 1.
Other authors created their own datasets using different sensors to classify emotions. In [12], four invasive sensors are used to collect physiological signals from three subjects, twice for each emotional state (anger, joy, and neutral). Sensors attached to different parts of the body make the subjects uncomfortable. Because the dataset is small, only statistical analysis is performed, and no machine learning classifier is trained or tested in [12]. Study [15] uses a wearable shirt designed in [16] to collect physiological signals. Respiration and ECG signals are obtained from only 11 subjects, resulting in a limited dataset. Neither the proportion of data used for training and testing nor the length of the videos used to evoke emotions is specified in the manuscript. The researchers in [18] used physiological signals to classify six emotions: happiness, sadness, fear, disgust, anger, and surprise. The sensors used for physiological signal acquisition are not mentioned, and the manuscript lacks demographic details about the subjects as well as the ratio of training to testing data. Study [19] uses five intrusive sensors to classify emotions and involves 23 subjects aged 20 to 27 years, so its findings may not generalize to a more diverse population. The manuscript reports neither the number of records in the dataset nor the proportion of training and testing data used to train and evaluate the classifier. The intrusiveness of the sensors also makes participants feel uneasy during the emotion elicitation process. Study [20] classifies six basic emotions and a neutral state. A Biopac sensor with a mouthpiece and an airflow sensor is used to measure the airflow rate and respiration rate. During the emotion elicitation process, the mouthpiece is inserted into the subject's mouth, which makes subjects uneasy.
While the findings of this study are encouraging, they cannot be applied to a diverse population, since all of the participants are of the same age and ethnicity. Study [21] classifies four emotions: joy, fear, anger, and sadness. The MP100 Biopac system is used to collect respiratory signals. The dataset is limited, since only two recordings per emotion were collected from each of twenty-five subjects, and it has not been used to train any machine learning classifier. The PAN-S scale is used to differentiate between the valence and arousal classes. Furthermore, the device used to record the signals is intrusive.
Study [22] validated the use of bio-radar to classify emotions through non-invasive vital-sign monitoring. The findings were positive, but the analysis included only nine subjects; the number of subjects needs to be increased to test the classifier's validity. In [26], a temperature sensor mounted in a mouth mask near the nostrils measures the respiration rate from temperature changes. Placing a sensor so close to the subject's nostrils causes discomfort. The manuscript does not describe the emotion induction procedure, only ten subjects are involved, and no machine learning classifier is trained or evaluated. Study [27] uses invasive sensors to detect gamers' emotions during gameplay. Because the prediction model is trained on data obtained from each individual player, it is tied to that specific player. The classifier's accuracy is not specified in the manuscript, and since all participants are of the same race, the findings cannot be generalized to a diverse population. In [28], four emotions are classified: joy, fear, anger, and sadness. The MP100 Biopac device is used to collect physiological data. The dataset is limited, since physiological signals from twenty-five subjects are recorded only twice for each emotion, and the acquisition system is invasive. Table 2 compares the studies in which the authors created their own datasets.
This review shows that some investigators used benchmark datasets while others created their own datasets for emotion detection. In both cases, however, the numbers of subjects and of dataset instances are low. Moreover, the subjects within each study are of the same age and ethnicity, which is why the results cannot be applied to a diverse population.
It is also observed that almost all investigators used invasive sensors, which make subjects either uncomfortable or self-conscious during data collection and thereby compromise the quality of the resulting dataset.
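To illustrate the measurement principle behind the temperature-based approach in [26] (exhaled air warms a sensor near the nostrils and inhaled air cools it), the respiration rate can be read off the dominant frequency of the temperature signal. The processing used in [26] is not described here, so this is only a sketch under assumptions: a 10 Hz sampling rate, linear detrending to remove sensor drift, a breathing band of 0.1 to 0.7 Hz (6 to 42 breaths/min), and a synthetic signal. The function name `respiration_rate_bpm` is hypothetical.

```python
import numpy as np


def respiration_rate_bpm(temp, fs, band=(0.1, 0.7)):
    """Estimate breaths per minute from the dominant spectral peak of a nostril-temperature signal."""
    t = np.arange(len(temp)) / fs
    # Remove slow sensor drift with a straight-line fit before the FFT.
    detrended = temp - np.polyval(np.polyfit(t, temp, 1), t)
    spectrum = np.abs(np.fft.rfft(detrended))
    freqs = np.fft.rfftfreq(len(temp), d=1.0 / fs)
    # Only search plausible breathing frequencies (assumed band).
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_freq = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak_freq


# Synthetic 60 s recording at 10 Hz: a 0.25 Hz breathing rhythm plus drift and noise.
fs = 10.0
t = np.arange(600) / fs
rng = np.random.default_rng(1)
temp = 34.0 + 0.5 * np.sin(2 * np.pi * 0.25 * t) + 0.005 * t + 0.05 * rng.standard_normal(t.size)
print(f"{respiration_rate_bpm(temp, fs):.1f} breaths/min")
```

A spectral peak gives one rate per window; counting individual breath cycles in the time domain would be an alternative when the rate changes quickly during emotion elicitation.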

Conclusion
Emotions are people's feelings, which result from physiological processes in the body that depend on different elements, such as state of mind, environment, perceptions, and circumstances. Emotion detection plays a vital role in different fields, including medicine and artificial intelligence, where it can improve human-computer interaction. Researchers have used various methods to detect emotions. This paper reviewed the methods, technologies, and interpretations that several researchers have used to address the problem of emotion detection using respiration. The review found that some investigators used benchmark datasets for emotion detection, while others developed their own datasets. Although the studies achieve good results, both the number of subjects and the number of dataset instances are small. Furthermore, the participants in each experiment are of the same age and race, so the findings cannot be generalized to a diverse population. It was also found that almost all of the investigators used intrusive sensors, which made subjects either uncomfortable or self-conscious during data collection and thereby compromised the quality of the datasets. The aim of this review is to summarize the most recent techniques and technologies to help researchers develop a global solution for emotion detection. In the future, a benchmark dataset containing the respiration signals of hundreds or thousands of subjects of various ages and ethnicities needs to be created. Non-invasive respiration signal acquisition systems or methods may also be useful in the future.