Authors:
(1) Krist Shingjergji, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands ([email protected]);
(2) Deniz Iren, Center for Actionable Research, Open University of the Netherlands, Heerlen, The Netherlands ([email protected]);
(3) Felix Bottger, Center for Actionable Research, Open University of the Netherlands, Heerlen, The Netherlands;
(4) Corrie Urlings, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands;
(5) Roland Klemke, Educational Sciences, Open University of the Netherlands, Heerlen, The Netherlands.
Editor’s note: This is Part 5 of 6 of a study detailing the development of a gamified method of acquiring annotated facial emotion data. Read the rest below.
V. RESULTS
A. Facegame Scores and Skill Improvement
B. Survey Results and Player Feedback
In the online survey, 36 participants (22 male, 14 female) provided feedback. Their ages ranged between 25 and 55 years (M = 33.77, SD = 7.66). Fig. 4 presents the satisfaction scores provided by all participants regarding various aspects of the game design. Fig. 5 shows the feedback of the participants who received natural language prescriptions in the experiment (N = 18) regarding the prescriptions' content, design, and usefulness. The results reveal that most participants were satisfied with or neutral toward the content and design of the prescriptions, while the usefulness of the prescriptions requires further attention.
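As an aside for readers who want to reproduce this kind of summary, the sketch below computes the descriptive statistics with pandas. The file name and column names ("gender", "age", "satisfaction_*") are hypothetical, since the layout of the survey data is not published.

```python
# A minimal sketch of the survey summary statistics; the CSV layout,
# file name, and column names are illustrative assumptions.
import pandas as pd

survey = pd.read_csv("survey_responses.csv")

print(survey["gender"].value_counts())                    # e.g., 22 male, 14 female
print(survey["age"].agg(["mean", "std", "min", "max"]))   # M, SD, and range

# Distribution of Likert ratings per design aspect (cf. Fig. 4)
likert_items = [c for c in survey.columns if c.startswith("satisfaction_")]
print(survey[likert_items].apply(pd.Series.value_counts).fillna(0).astype(int))
```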
We manually grouped the answers to the two open-ended questions by semantic similarity. The question asking for participants' suggestions on the prescriptions showed that participants consider visualization, prioritization, and personalization useful. Specifically, four participants mentioned that an on-screen visualization of the part of the face they needed to change would help them follow the natural language prescriptions better. Three participants found the text too long to read in the short time span and suggested displaying only a few of the most important prescriptions instead. Two participants indicated that more personalized prescriptions would be helpful. Lastly, the 12 participants who commented on the natural language of the prescriptions stated that they found it understandable.
C. Facial Emotion Recognition with Facegame Data
The Facegame Data collected in the experiment consist of 636 images depicting the six basic emotions (86 angry, 89 disgusted, 127 fearful, 132 happy, 99 sad, and 103 surprised). The accuracy for each of the five samples of FER2013 and RAF-DB is shown in Fig. 6 and Fig. 7, respectively. The average accuracy of the model trained on instances from FER2013 is 32.80%, while that of the model trained on the combination of FER2013 and the Facegame Data is 33.20%. Similarly, the average accuracy of the model trained on instances from RAF-DB is 51.27%, while that of the model trained on the combination of RAF-DB and the Facegame Data is 51.87%. The observed low accuracies were expected given the nature of the datasets and the simplicity of our neural network architecture. The images in FER2013 and RAF-DB represent in-the-wild conditions with subtle facial expressions; the emotion recognition task is therefore more challenging and requires complex classifiers to achieve better accuracy. Moreover, our goal was to examine the quality of the data collected via Facegame in comparison with existing in-the-wild datasets, and for this purpose a simple neural network sufficed. The results show that we were able to collect labelled data that can potentially improve FER in-the-wild.
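To clarify the evaluation protocol, the sketch below trains the same small classifier once on a baseline sample and once on that sample augmented with the Facegame Data, then scores both on an identical held-out test split. The CNN architecture, hyperparameters, and pre-loaded arrays (`fer_train_x`, `facegame_x`, and so on) are illustrative assumptions; the paper specifies only that a simple neural network was used.

```python
# A minimal sketch of the comparison protocol, not the authors' exact model.
# Assumes 48x48 grayscale images (FER2013 format) already loaded as NumPy
# arrays with integer labels for the six emotions; names are hypothetical.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 6  # the six basic emotions

def build_simple_cnn():
    # A deliberately small CNN; these layers are illustrative.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(48, 48, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def test_accuracy(train_x, train_y, test_x, test_y):
    model = build_simple_cnn()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_x, train_y, epochs=20, batch_size=64, verbose=0)
    return model.evaluate(test_x, test_y, verbose=0)[1]

# Baseline: a FER2013 sample alone; augmented: the same sample plus the
# Facegame images. Both are evaluated on the same FER2013 test split.
acc_base = test_accuracy(fer_train_x, fer_train_y, fer_test_x, fer_test_y)
acc_aug = test_accuracy(np.concatenate([fer_train_x, facegame_x]),
                        np.concatenate([fer_train_y, facegame_y]),
                        fer_test_x, fer_test_y)
print(f"FER2013 only: {acc_base:.4f} | FER2013 + Facegame: {acc_aug:.4f}")
```

The same procedure would be repeated for each of the five samples and for RAF-DB, with the reported averages taken over the five runs.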
D. The Mapping of AUs to Emotion Classes and the Variability
The results of the correlation analysis between the six basic emotions and the AUs are shown in Fig. 8. For this analysis, the data from trials that scored below 1/3 were excluded. The results suggest that we were able to capture some strong correlations between certain emotional facial expressions and their signature AUs.
For instance, lips part (AU25) and jaw drop (AU26) correlate highly with both surprise and fear, while lip corner puller (AU12) correlates highly with happiness. The high occurrence of nasolabial deepener (AU11) indicates that the underlying Py-Feat model may produce a high false-positive rate for AU11, which requires further examination. The results show that we were able to define the emotion classes as distributions over multiple AUs. Such naturally occurring variety in facial expression data can potentially be used to improve FER in-the-wild.
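To make the analysis concrete, the sketch below correlates per-trial AU activations with one-hot emotion indicators after applying the 1/3 score filter described above. The input file and its column layout are assumptions for illustration; in practice the AU activations would come from a detector such as Py-Feat.

```python
# A sketch of the AU-emotion correlation analysis under stated assumptions:
# `facegame_trials.csv` (hypothetical) holds one row per trial with AU
# activation columns ("AU01" ... "AU26", e.g., as output by Py-Feat), a
# normalized "score" column, and an "emotion" label.
import pandas as pd

df = pd.read_csv("facegame_trials.csv")
df = df[df["score"] >= 1 / 3]                 # exclude low-scoring trials

au_cols = [c for c in df.columns if c.startswith("AU")]
onehot = pd.get_dummies(df["emotion"]).astype(float)  # one column per emotion

# Pearson correlation between each AU activation and each emotion indicator;
# rows are AUs, columns are emotions (cf. Fig. 8).
corr = pd.concat([df[au_cols], onehot], axis=1).corr().loc[au_cols, onehot.columns]
print(corr.round(2))  # expect, e.g., AU12 (lip corner puller) high for happy
```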