A new approach to children's education quarterly

A study on evaluating the effect of voice activity detection (VAD) approach on speech emotion recognition of autistic children

Document Type: Original Article

Author
Master of Computer Science, Scientific Computing, Department of Computer Science, Mazandaran University, Babolsar, Mazandaran, Iran
Abstract
Background and Aim: Autism spectrum disorder is a neurodevelopmental disorder that manifests itself in the early years of a child's development. People with autism face challenges in regulating emotions and express their emotional states in different ways. The current research presents a voice activity detection (VAD) system adapted to the voices of autistic children.

Methods: The proposed VAD system is a recurrent neural network (RNN) with long short-term memory (LSTM) cells. The data comprise recordings of 25 English-speaking autistic children performing a structured learning activity and were collected as part of the DE-ENIGMA project.
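The frame-wise scoring performed by such an LSTM-based VAD can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimension, hidden size, random weights, and the names `vad_scores` and `w_out` are all hypothetical, and a trained system would learn the weights from labelled speech/non-speech frames.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N_FEATURES, HIDDEN = 40, 16  # hypothetical: 40 acoustic features per frame, 16 LSTM units

# Randomly initialised LSTM weights (input, forget, cell, and output gates stacked).
W = rng.standard_normal((4 * HIDDEN, N_FEATURES)) * 0.1
U = rng.standard_normal((4 * HIDDEN, HIDDEN)) * 0.1
b = np.zeros(4 * HIDDEN)
w_out = rng.standard_normal(HIDDEN) * 0.1  # sigmoid read-out: speech probability

def vad_scores(frames):
    """Return one speech probability per input frame via an LSTM forward pass."""
    h = np.zeros(HIDDEN)  # hidden state
    c = np.zeros(HIDDEN)  # cell state
    probs = []
    for x in frames:
        z = W @ x + U @ h + b
        i, f, g, o = np.split(z, 4)               # gate pre-activations
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        probs.append(sigmoid(w_out @ h))          # frame-level speech probability
    return np.array(probs)

frames = rng.standard_normal((100, N_FEATURES))   # 100 frames of dummy features
scores = vad_scores(frames)
print(scores.shape)  # (100,)
```

Thresholding these per-frame probabilities yields the speech segments that are then passed on to the SER model.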

Results: Our experiments show that the pediatric VAD system performs less well than our generic VAD system trained under the same conditions, with an area under the receiver operating characteristic curve (ROC-AUC) of 0.662 and 0.850, respectively. The SER results show different performances for valence and arousal, depending on the VAD system used, with a maximum concordance correlation coefficient (CCC) of 0.263 and a minimum root mean square error (RMSE) of 0.107.
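The two regression metrics reported here have standard definitions; a minimal NumPy sketch (function names are our own) shows how they are computed from gold and predicted emotion labels:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient between two label sequences."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(((y_true - y_pred) ** 2).mean())

y = np.array([0.1, 0.4, 0.35, 0.8])  # dummy continuous emotion labels
print(round(ccc(y, y), 6))  # 1.0 for perfect agreement
print(rmse(y, y))           # 0.0
```

Unlike Pearson correlation, CCC penalises shifts in mean and scale between predictions and gold labels, which is why it is the usual headline metric for dimensional SER.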

Conclusion: Although the performance of the SER models is generally low, the pediatric VAD system can lead to slightly improved results compared with the other VAD systems and, in particular, the VAD-less baseline, which supports the hypothesized importance of pediatric VAD systems in the context under discussion.
Volume 5, Issue 4 - Serial Number 18
Spring 2024
Pages 194-206

  • Receive Date 06 April 2024
  • Revise Date 16 April 2024
  • Accept Date 17 May 2024
  • First Publish Date 17 May 2024
  • Publish Date 20 March 2024