A Replication Study On Software Testing Perception Vs Effectiveness

Table Of Links

Abstract

1 Introduction

2 Original Study: Research Questions and Methodology

3 Original Study: Validity Threats

4 Original Study: Results

5 Replicated Study: Research Questions and Methodology

6 Replicated Study: Validity Threats

7 Replicated Study: Results

8 Discussion

9 Related Work

10 Conclusions And References

5 Replicated Study: Research Questions And Methodology

We decide to further investigate the results of the original study in search of possible drivers behind misperceptions. Psychology considers that people’s perceptions can be affected by personal characteristics such as attitudes, personal interests and expectations. Therefore, we decide to examine participants’ opinions by conducting a differentiated replication of the original study [47] that extends its goal as follows:

The survey of effectiveness perception is extended to include questions on programs.
We want to find out whether participants’ perceptions might be conditioned by their opinions. More precisely: their preferences (favourite technique), their performance (the technique that they think they applied best) and technique or program complexity (the technique that they think is easiest to apply, or the simplest program to be tested).

Therefore, the replicated study reexamines RQ1 stated in the original study (this time the survey taken by participants also includes questions regarding programs), and addresses the following new research questions:

– RQ1.6: Are participants’ perceptions related to the number of defects reported by participants? We want to assess whether participants perceive as the most effective technique the one with which they reported the most defects.

– RQ2: Can participants’ opinions be used as predictors for testing effectiveness?

– RQ2.1: What are participants’ opinions about techniques and programs? We want to know if participants have different opinions about techniques or programs.

– RQ2.2: Do participants’ opinions predict their effectiveness? We want to assess if the opinions that participants have about techniques (or programs) predict which one is the most effective for them.

– RQ3: Is there a relationship between participants’ perceptions and opinions?

– RQ3.1: Is there a relationship between participants’ perceptions and opinions? We want to assess if the opinions that participants have about techniques (or programs) are related to their perceptions.

– RQ3.2: Is there a relationship between participants’ opinions? We want to assess if a certain opinion that participants have about techniques is related to their other opinions.

To answer these questions, we replicate the original study with students of the same course in the following academic year. This time we have 46 students. The changes made to the replication of the experiment are as follows:

– The questionnaire to be completed by participants at the end of the experiment is extended to include new questions. The information we want to capture with the opinion questions is:

– Participants’ performance on techniques. With this question we are referring to process conformance. The best applied technique is the technique each participant thinks (s)he applied most thoroughly. It corresponds to OT1: Which technique did you apply best?

– Participants’ preferences. We want to know the favourite technique of each participant, that is, the one (s)he felt most comfortable with when applying it. It corresponds to OT2: Which technique do you like best?

– Technique complexity. We want to know the technique with which each participant thinks process conformance was easiest to achieve. It corresponds to OT3: Which technique is the easiest to apply?

– Program testability. We want to know which program was easiest to test, that is, the program in which process conformance could be obtained most easily. It corresponds to OP1: Which is the simplest program?

Table 16 summarizes the survey questions. We have chosen these questions because we need to ask simple questions that participants can easily understand and that are, at the same time, meaningful. We do not want to overwhelm participants with complex questions that need lengthy explanations. A complex questionnaire might discourage students from submitting it.

– The program faults are changed. The original study is designed so that all techniques are effective at finding all defects injected: we choose faults detectable by all techniques so that the techniques can be compared fairly. The replicated study is designed to cover the situation in which some faults cannot be detected by all techniques. Therefore, we inject some faults that certain techniques are not effective at detecting. For example, BT cannot detect a non-implemented feature (as participants are required to generate test cases from the source code only). Likewise, EP cannot find a fault whose detection depends on the combination of two invalid equivalence classes. In the replicated study, we therefore inject into each program some faults that can be detected by BT but not by EP, and some faults that can be detected by EP but not by BT (each program is seeded with six faults). Note that the design is balanced: the number of injected faults that BT can detect but EP cannot equals the number that EP can detect but BT cannot. This change is expected to affect the effectiveness of EP and BT, which might be lower than in the original study. It should not affect the effectiveness of CR.

– We change the program application order to further study maturation issues. The order is now: cmdline, ntree, nametbl. This change should not affect the results.

– Participants run their own test cases. It could be that the misperceptions obtained in the original study are due to the fact that participants are not running their own test cases.

– There is now only one version instead of two. Faults and failures are not the goal of this study, so this helps to simplify the experiment.

Table 17 shows a summary of the changes made to the study.

Table 17 Changes Made to the Original Study
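The balance constraint on the seeded faults described above (six faults per program, with as many faults detectable only by BT as faults detectable only by EP) can be sketched as a simple check. This is a minimal illustration, not the authors' tooling: the `SeededFault` type, the fault identifiers, and the 2/2/2 split in `example_program` are all hypothetical, since the text only states that each program has six faults and that the two exclusive groups are the same size. CR is left out of the model because the text does not say which faults it can detect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeededFault:
    """Hypothetical record of one injected fault."""
    fault_id: str
    detectable_by: frozenset  # subset of {"EP", "BT"}


def count_exclusive(faults, technique, other):
    """Count faults that `technique` can detect but `other` cannot."""
    return sum(
        1
        for f in faults
        if technique in f.detectable_by and other not in f.detectable_by
    )


def is_balanced(faults):
    """True when BT-only and EP-only faults occur in equal numbers."""
    return count_exclusive(faults, "BT", "EP") == count_exclusive(faults, "EP", "BT")


# Hypothetical program with six seeded faults (the split is assumed).
example_program = [
    SeededFault("F1", frozenset({"BT"})),
    SeededFault("F2", frozenset({"BT"})),
    SeededFault("F3", frozenset({"EP"})),
    SeededFault("F4", frozenset({"EP"})),
    SeededFault("F5", frozenset({"EP", "BT"})),
    SeededFault("F6", frozenset({"EP", "BT"})),
]

if __name__ == "__main__":
    print(len(example_program), is_balanced(example_program))
```

A check of this kind makes the fairness argument explicit: neither EP nor BT is handicapped by more undetectable faults than the other.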

To measure technique effectiveness we proceed in the same way as in the original study. We do not rely on the reported failures, as participants could:

Report false positives (non-real failures).
Report the same failure more than once (although they were asked not to do so).
Miss failures corresponding to faults that have been exercised by the technique, but for some reason have not been seen.

We measure the new response variable (reported defects) by counting the number of faults/failures reported by each participant. We analyse RQ2.1 in the same manner as RQ1.1, and RQ1.6, RQ2.2, RQ3.1 and RQ3.2 like RQ1.2. Table 18 summarises the statistical tests used to answer each research question.

Table 18 Statistical Tests Used to Answer New Research Questions of the Replicated Study
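As an illustration of the reported-defects variable used for RQ1.6, the sketch below counts reports per technique for one participant and checks whether the technique perceived as most effective is also the one with the most reports. It is only a descriptive sketch under assumed data: the function names and the example counts are hypothetical, and it does not reproduce the statistical tests listed in Table 18.

```python
def techniques_with_most_reports(reported):
    """Return the set of techniques tied for the highest report count.

    `reported` maps a technique name to the number of faults/failures
    one participant reported with it (hypothetical structure).
    """
    top = max(reported.values())
    return {technique for technique, count in reported.items() if count == top}


def perception_matches_reports(perceived, reported):
    """True if the perceived-best technique has the most reports (ties count)."""
    return perceived in techniques_with_most_reports(reported)


# Hypothetical participant: report counts per technique and survey answer.
example_reports = {"EP": 3, "BT": 5, "CR": 2}
example_perceived = "BT"

if __name__ == "__main__":
    print(perception_matches_reports(example_perceived, example_reports))
```

Returning a set rather than a single technique keeps ties visible instead of resolving them arbitrarily.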

6 Replicated Study: Validity Threats

The threats to validity listed in the original study apply to this replicated study. Additionally, we have identified the following ones:

6.1 Conclusion Validity

Reliability of treatment implementation. The replicated experiment is run by the same researchers that performed the original experiment. This ensures that the two groups of participants do not implement the treatments differently.

6.2 Internal Validity

Evaluation apprehension. Using students as participants and linking their performance in the experiment to their grade in the course might explain why participants attribute the effectiveness of a technique to their own performance rather than to the weaknesses of the technique.

6.3 Construct Validity

Inadequate preoperational explanation of effect constructs. Since opinions are hard constructs to operationalize, it is possible that participants do not interpret the questions in the questionnaire the way we intended.

6.4 External Validity

Reproducibility of results. It is not clear to what extent the results obtained here are reproducible. Therefore, more replications of the study are needed.

The steps that should be followed are:

(a) Replicate the study capturing the reasons for the answers given by participants.

(b) Perform the study with practitioners with the same characteristics as the students used in this study (people with little or no experience in software testing).

(c) Explore and define what types of experience could be influencing the results (academic, professional, programming, testing, etc.).

(d) Run new studies taking into consideration increasing levels of experience.

Again, of all threats affecting the replicated study, the only one that could affect the validity of the results of this study in an industrial context is the one related to generalisation to other subject types.

:::info
Authors:

Sira Vegas
Patricia Riofrío
Esperanza Marcos
Natalia Juristo

:::

:::info
This paper is available on arXiv under the CC BY-NC-ND 4.0 license.

:::
