Authors:
(1) Raquel Blanco, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]);
(2) Manuel Trinidad, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(3) María José Suárez-Cabal, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]);
(4) Alejandro Calderón, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(5) Mercedes Ruiz, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(6) Javier Tuya, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]).
Editor’s note: This is part 6 of 7 of a study detailing attempts by researchers to create effective tests using gamification. Read the rest below.
6 Threats to validity
We have identified several threats to validity in the current study, which are classified into four categories: internal, external, construct and conclusion validity.
Internal validity: The identified threats are as follows:
- Professor influence. Students may perform differently depending on the professor who teaches the classes (Micari & Pazos, 2012). To keep the professor from influencing the results, the same professor conducted all seminar sessions in both the control and experimental groups.
- Data collection and activity monitoring. To carry out the experiment, demographic data were collected and the students' game activity was monitored. However, students could be reluctant both to supply these data and to be monitored constantly. To mitigate their reluctance to participate in the gamified experience, the students were informed about the data to be collected and how it would be treated. In addition, participation was voluntary and the students accepted the tool's terms of use.
- Material to be used in class. This threat concerns the influence of the exercise domain and complexity, as well as the testing techniques to be used in the exercises, on student motivation. If the domain is not appealing enough or the program to be tested is not complex enough, students could perceive creating test suites as boring or irrelevant. Conversely, if the program is too complex for the testing techniques taught in class, students could feel that they do not yet have enough knowledge to deal with it. To mitigate this threat, each exercise dealt with a different domain, and the complexity of each exercise was in line with the concepts and testing techniques taught in the lecture classes. In addition, the complexity increased progressively from one exercise to the next as students acquired more knowledge.
- Injected defects. Some of the injected defects are more difficult to detect than others, so students may perform differently according to the difficulty of the defects they have to detect. To mitigate this threat, all students in both the experimental and control groups had to detect the same defects in each exercise.
External validity: The identified threats are as follows:
- Student knowledge and skills acquired in lectures. If the students acquire different knowledge in the lectures in the experimental and control groups before carrying out the seminar exercises, the generalization of the results can be threatened. To mitigate this threat, the knowledge all students need to carry out the seminar exercises is taught in the lecture classes, using the same materials and methods in both experimental and control groups. Besides, the same professor conducted the lecture classes in both groups.
- Student knowledge and skills before enrolling in the course. If the students in the experimental and control groups have different prior knowledge, or some of them have already been trained in testing because they work in areas related to software development or software testing, the generalization of the results can again be threatened. To mitigate this threat, the knowledge all students need is taught in the lecture classes, as stated above. In addition, the testing process and the testing techniques are addressed only in this course of the degree. Moreover, the prior knowledge of the students in both groups was similar: the percentage of students enrolled in the course more than once is quite similar (13% in the control group and 16% in the experimental group) and, based on conversations between the professor and the students, each year only a few students start working before they enroll in the course.
- Program representativeness. If the programs to be tested lack complexity, they might not be representative enough of industrial practice. To mitigate this threat, the programs are based on real-life applications.
Construct validity: The threat concerns the metrics used to answer the research questions. Using metrics that do not describe the students' engagement and performance could produce misleading results. To mitigate this threat, we used student participation and test suite effectiveness, both of which are widely accepted metrics in the literature (Fredricks et al., 2004; Papadakis et al., 2019; Ruiperez-Valiente et al., 2021).
Conclusion validity: The threat concerns the researchers' conclusions. To avoid incorrect interpretations of the results obtained, we carried out statistical analyses for each research question. We used the statistical tests and effect sizes generally used when the normality of a distribution cannot be assumed.
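As an illustration of what such an analysis can look like, the sketch below computes a Mann-Whitney U statistic and Cliff's delta for two groups. This section does not name the specific test or effect size used in the study, so both choices are assumptions made here for illustration, and the scores are made-up values, not data from the experiment.

```python
from itertools import product


def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y, ties count 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(xs, ys)
    )


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical scores (NOT taken from the study).
experimental = [72, 85, 90, 64, 78]
control = [60, 70, 55, 64, 68]

u = mann_whitney_u(experimental, control)
delta = cliffs_delta(experimental, control)
print(f"U = {u}, Cliff's delta = {delta:.2f}")
```

Neither measure relies on the scores being normally distributed, because both depend only on how the values in one group rank against those in the other. The sketch stops at the statistic; a p-value would normally come from a statistics library, for example `scipy.stats.mannwhitneyu`.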