Table of Links
Abstract and 1 Introduction
1.1 The twincode platform
1.2 Pilot Studies
1.3 Other Gender Identities and 1.4 Structure of the Paper
2 Related Work
3 Original Study (Seville Dec, 2021) and 3.1 Participants
3.2 Experiment Execution
3.3 Factors (Independent Variables)
3.4 Response Variables (Dependent Variables)
3.5 Confounding Variables
3.6 Data Analysis
4 First Replication (Berkeley May, 2022)
4.1 Participants
4.2 Experiment Execution
4.3 Data Analysis
5 Discussion and Threats to Validity and 5.1 Operationalization of the Cause Construct — Treatment
5.2 Operationalization of the Effect Construct — Metrics
5.3 Sampling the Population — Participants
6 Conclusions and Future Work
6.1 Replication in Different Cultural Background
6.2 Using Chatbots as Partners and AI-based Utterance Coding
Datasets, Compliance with Ethical Standards, Acknowledgements, and References
A. Questionnaire #1 and #2 response items
B. Evolution of the twincode User Interface
C. User Interface of tag-a-chat
A Questionnaire #1 and #2 response items
In this section, the response items of the scales used in questionnaires #1 and #2 are enumerated. Those scales were analyzed for internal consistency using the data collected during the pilot studies, and the results of those analyses, consisting of the Pearson correlations, Cronbach’s α, and the principal components scree plot, are also reported [57], indicating whether any response items were dropped according to the obtained results.
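For readers who want to reproduce this kind of reliability analysis, the following is a minimal sketch, assuming each scale is stored as a pandas DataFrame with one 0–10 numerical column per response item; the column names and data below are illustrative, not the study data.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 0-10 responses, one column per item of a four-item scale
scale = pd.DataFrame({
    "item1": [7, 8, 6, 9, 5],
    "item2": [6, 9, 5, 8, 6],
    "item3": [7, 7, 6, 9, 4],
    "item4": [8, 9, 5, 10, 6],
})

print(scale.corr(method="pearson"))  # inter-item Pearson correlation matrix
print(cronbach_alpha(scale))         # internal consistency of the whole scale
```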
A.1 Response items for perceived productivity scale (pp)
All the items in this questionnaire section, entitled “Solo programming or pair programming?”, are 0–10 numerical response items in which 0 means “programming solo”, 5 means “the same in both cases”, and 10 means “programming in pairs”.
pp1 Regarding the programming exercises you just did, how do you think you would have been more productive, programming solo or programming with the partner assigned to you?
pp2 Regarding the programming exercises you just did, how do you think you would have achieved a better program quality, programming solo or programming with the partner assigned to you?
pp3 Regarding the programming exercises you just did, how do you think you would have developed a more reliable program, i.e., a program more likely to run without failures, programming solo or programming with the partner assigned to you?
pp4 Regarding the programming exercises you just did, how do you think you would have enjoyed more, programming solo or programming with the partner assigned to you?
As shown in Figure 12, all the items presented high pairwise Pearson correlations, Cronbach’s α was 0.83, and the scree plot confirmed the scale was unidimensional according to the Kaiser criterion. As a result, all of them were kept after the reliability analysis on the data from the pilot studies.
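The principal components check behind the scree plot can be sketched as follows, assuming a DataFrame with one column per item as in the sketch above. Under the Kaiser criterion, the scale is treated as unidimensional when exactly one eigenvalue of the inter-item correlation matrix exceeds 1.

```python
import numpy as np
import pandas as pd

def kaiser_components(items: pd.DataFrame) -> int:
    """Number of principal components with eigenvalue greater than 1 (Kaiser criterion)."""
    eigenvalues = np.linalg.eigvalsh(items.corr().to_numpy())
    return int((eigenvalues > 1.0).sum())

# For a unidimensional scale such as pp, kaiser_components(...) should return 1;
# plotting the sorted eigenvalues yields the scree plot reported in Figure 12.
```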
A.2 Response items for partner’s perceived technical competency (pptc)
All the items in this questionnaire section, entitled “My partner or me?”, are 0–10 numerical response items in which 0 means “me”, 5 means “both equally”, and 10 means “my partner”.
pptc1 During the programming exercises you just did, who do you think had more knowledge and technical skills, you or the partner assigned to you?
pptc2 During the programming exercises you just did, who do you think has been more cooperative, you or the partner assigned to you?
pptc3 During the programming exercises you just did, who do you think has had a faster pace at solving the exercises, you or the partner assigned to you?
pptc4 During the programming exercises you just did, who do you think has led more to the solutions, you or the partner assigned to you?
As shown in Figure 13, in the initial version of the scale used in the pilot studies, the pptc5 item, which asked whether the assigned partner had been condescending, presented low correlations with the rest of the items in the scale, and the scree plot indicated two factors. After removing that uncorrelated item, Cronbach’s α increased from 0.73 to 0.85 and the scree plot indicated only one factor, as shown in Figure 14.
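This kind of item-dropping decision is commonly supported by an “alpha if item dropped” check. The sketch below, assuming a DataFrame with one 0–10 column per pptc item, shows one way to compute it; it is an illustration, not the exact procedure used in the study.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

def alpha_if_item_dropped(items: pd.DataFrame) -> pd.Series:
    """Cronbach's alpha recomputed after removing each item in turn."""
    return pd.Series(
        {col: cronbach_alpha(items.drop(columns=col)) for col in items.columns}
    )

# A poorly correlated item such as pptc5 shows up as the column whose removal
# yields the largest alpha, consistent with the reported increase from 0.73 to 0.85.
```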
A.3 Response item for partner’s perceived positive and negative aspects (ppa and pna)
The only item in this questionnaire section, entitled “Describe your partner”, is a free text field in which subjects are instructed to describe the most positive and most negative aspects of the partner assigned to them in the programming exercises they just did, indicating the positive ones with a “+” sign and the negative ones with a “-” sign in front of each aspect.
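As an illustration of the “+”/“-” annotation convention, a free-text answer could be split into positive (ppa) and negative (pna) aspects with a sketch like the one below; the parsing rule and the example answer are assumptions for illustration, since the answers are coded by the researchers rather than by this script.

```python
from typing import List, Tuple

def split_aspects(answer: str) -> Tuple[List[str], List[str]]:
    """Split a free-text answer into positive ("+") and negative ("-") aspects."""
    positive, negative = [], []
    for line in answer.splitlines():
        line = line.strip()
        if line.startswith("+"):
            positive.append(line[1:].strip())
        elif line.startswith("-"):
            negative.append(line[1:].strip())
    return positive, negative

answer = "+ very communicative\n+ good ideas\n- sometimes too fast"
ppa, pna = split_aspects(answer)
print(ppa)  # ['very communicative', 'good ideas']
print(pna)  # ['sometimes too fast']
```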
A.4 Response items for compared partners’ skills (cps)
All the items in this questionnaire section, entitled “First or second partner?”, are 0–10 numerical response items in which 0 means “first partner”, 5 means “both equally”, and 10 means “second partner”.
cps1 Comparing your assigned partners in sessions 1 and 3, who do you think provided more clear and constructive feedback, your first partner or your second partner?
cps2 Comparing your assigned partners in sessions 1 and 3, who do you think was easier to communicate with, your first partner or your second partner?
cps3 Comparing your assigned partners in sessions 1 and 3, who do you think was more knowledgeable about the subject material, your first partner or your second partner?
cps4 Comparing your assigned partners in sessions 1 and 3, who do you think would be a better project partner, your first partner or your second partner?
cps5 Comparing your assigned partners in sessions 1 and 3, who do you think would be a better teaching assistant, your first partner or your second partner?
As shown in Figure 15, all the items presented high pairwise Pearson correlations, Cronbach’s α was 0.88, and the scree plot confirmed the scale was unidimensional according to the Kaiser criterion. As a result, all of them were kept after the reliability analysis on the data from the pilot studies.
B Evolution of the twincode User Interface
The twincode user interface used in the external replication at UC Berkeley is shown in Figures 16(a) and 16(b).
C User Interface of tag-a-chat
The user interface of the tag-a-chat tool used for collaboratively coding chat utterances is shown in Figure 17.
Authors:
(1) Amador Duran, I3US Institute, Universidad de Sevilla, Sevilla, Spain and SCORE Lab, Universidad de Sevilla, Sevilla, Spain ([email protected]);
(2) Pablo Fernandez, I3US Institute, Universidad de Sevilla, Sevilla, Spain and SCORE Lab, Universidad de Sevilla, Sevilla, Spain ([email protected]);
(3) Beatriz Bernardez, I3US Institute, Universidad de Sevilla, Sevilla, Spain and SCORE Lab, Universidad de Sevilla, Sevilla, Spain ([email protected]);
(4) Nathaniel Weinman, Computer Science Division, University of California, Berkeley, Berkeley, USA ([email protected]);
(5) Aslıhan Akalın, Computer Science Division, University of California, Berkeley, Berkeley, USA ([email protected]);
(6) Armando Fox, Computer Science Division, University of California, Berkeley, Berkeley, USA ([email protected]).