Authors:
(1) Raquel Blanco, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]);
(2) Manuel Trinidad, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(3) María José Suárez-Cabal, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]);
(4) Alejandro Calderón, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(5) Mercedes Ruiz, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(6) Javier Tuya, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]).
Editor’s note: This is part 4 of 7 of a study detailing attempts by researchers to create effective tests using gamification.
4 Results
This section presents the experiment results and answers the research questions introduced in Section 1. The data collected to carry out the analysis are available at http://dx.doi.org/10.17811/ruo_datasets.64866.
4.1 RQ1: Student engagement
RQ1: Is the engagement of the software testing students who carry out gamified activities higher than that of the students who carry them out in a non-gamified environment?
Analyzing each exercise individually shows that the results for the Number of Test Suite Executions and the Active Time are similar to those obtained in the Olympic race. In general, significant differences are found, and the mean of the experimental group is greater than that of the control group. However, in the last exercise, the mean of the control group is slightly higher for both metrics (the difference is not significant for the Number of Test Suite Executions, but it is significant for the Active Time).
Regarding the Participation Rate, there is a significant difference between the two groups in the first three exercises, with a small effect size, and the mean of the experimental group is greater than that of the control group. In the last exercise, there is no significant difference and the mean of the experimental group is slightly lower. On the other hand, no significant differences are found in any exercise for the Dropout Rate. The students in both groups started to drop out of the seminar exercises in the middle of the semester, which corresponds to exercise 3 (3% in the control group and 1% in the experimental group). In the last exercise, the dropout was slightly higher (6% in the control group and 7% in the experimental group).
Therefore, the results of the first three exercises are in line with our findings from the Olympic race: the students in the experimental group worked more on the exercises and dropped out of them less than the students in the control group. Only in the last exercise did the control group seem to be more engaged; however, no significant differences are found in three of the four metrics.
Overall, the null hypothesis is rejected in favor of the gamification experience when engagement is measured with the four metrics. Thus, the engagement of the students who perform gamified software testing activities is higher than that of the students who perform them in a non-gamified environment.
4.2 RQ2: Student performance
RQ2: Is the performance of the software testing students who carry out gamified activities higher than that of the students who carry them out in a non-gamified environment?
For both metrics, the p-value obtained for the Olympic race is smaller than α, so once again it can be assumed that there is a significant difference between the control and experimental groups, although the effect size is small. Moreover, the means of both the Effectiveness and the Effectiveness Increase are higher in the experimental group. Therefore, the students in the experimental group achieved better performance and worked harder in the test improvement activity to increase the effectiveness.
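The decision rule described above (compare the p-value with α, then look at the group means and the effect size) can be illustrated with a minimal Python sketch. It is not the authors' analysis: this section does not name the statistical test or the effect size measure, so the permutation test on the difference of means, Cohen's d, the value α = 0.05, and all function names are assumptions chosen only for illustration. Real data would come from the published dataset.

```python
import random
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance.

    Assumption: Cohen's d stands in for whatever effect size the study used.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    if pooled_var == 0:
        return 0.0
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Assumption: a permutation test stands in for the study's actual test.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        # Small tolerance guards against floating-point rounding on ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def compare_groups(experimental, control, alpha=0.05):
    """Summarize one metric: group means, p-value, decision, effect size."""
    p = permutation_p_value(experimental, control)
    return {
        "mean_experimental": mean(experimental),
        "mean_control": mean(control),
        "p_value": p,
        "significant": p < alpha,  # alpha = 0.05 is an assumed threshold
        "effect_size": cohens_d(experimental, control),
    }
```

Calling `compare_groups(experimental_scores, control_scores)` on two lists of per-student values for a metric such as Effectiveness returns the quantities discussed in the text. The null hypothesis of no difference is rejected only when `significant` is true, and the sign of `effect_size` shows which group has the higher mean.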
The analysis of each individual exercise also reveals that the experimental group performs better: the difference is significant for both metrics in exercises 2 and 3, with a small effect size, and the mean of both metrics is higher in the experimental group in all exercises. Despite the benefits of the gamification experience for both the Effectiveness and the Effectiveness Increase, we can observe a downward trend, mainly in the last two exercises.
Therefore, the null hypothesis is also rejected in favor of the gamification experience when performance is measured with both the Effectiveness and the Effectiveness Increase. Thus, the performance of the students who carried out gamified software testing activities is higher than that of the students who carried them out in a non-gamified environment.