How Developers Use ChatGPT In GitHub Pull Requests And Issues

Table Of Links

Abstract

1 Introduction

2 Data Collection

3 RQ1: What types of software engineering inquiries do developers present to ChatGPT in the initial prompt?

4 RQ2: How do developers present their inquiries to ChatGPT in multi-turn conversations?

5 RQ3: What are the characteristics of the sharing behavior?

6 Discussions

7 Threats to Validity

8 Related Work

9 Conclusion and Future Work

References

Threats to Validity

==Internal Validity:== Open coding was the primary methodology of our study. This approach introduces subjectivity into our results, as the annotators may understand the coding schema differently. To mitigate this threat, we follow the best practices below:

(1) we conduct multiple rounds of individual coding, and we ensure the Cohen’s Kappa score reflecting the agreement between the annotators is at least substantial;

(2) all disagreements are discussed among all annotators until agreement is reached;

(3) two rounds of revisiting the entire label set are performed after individual labeling to correct any miscoded items. Moreover, there were cases whose labels were difficult to decide because they lay on the boundary between two categories. As we assigned a single label to each data item, these boundary cases received the one label that the annotators thought described them best.
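The agreement check in step (1) can be illustrated with a minimal sketch of Cohen's Kappa for two annotators. This is not the authors' tooling; the function name `cohens_kappa` and the example labels are hypothetical. The score compares observed agreement with the agreement expected by chance, and on the commonly used Landis and Koch scale, values from 0.61 to 0.80 are read as "substantial".

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labeled the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up labels for four prompts, purely for illustration.
    annotator_1 = ["bug", "bug", "review", "review"]
    annotator_2 = ["bug", "bug", "review", "bug"]
    print(cohens_kappa(annotator_1, annotator_2))
```

In practice, a library routine such as scikit-learn's `cohen_kappa_score` would usually be preferred over a hand-written version; the sketch only makes the arithmetic explicit.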

==External Validity:== Our results are derived from a specific dataset, DevGPT, in which only conversations that developers shared in GitHub pull requests and issues are considered. Furthermore, the dataset only contains data collected up to October 2023. Thus, our results may not generalize to all developer-ChatGPT conversations. Developers’ shared conversations, and how those conversations are used, might differ on other open-source platforms.

Developers’ shared conversations and their usage in general contexts, i.e., when the conversations are not posted in GitHub PRs and issues, may not follow the findings discussed in this paper. We encourage researchers to reproduce our study on a larger dataset or on data collected from other open-source communities.

==Construct Validity:== To ensure that the taxonomy presented in this study best captures the data, we develop our coding book based on previous studies and expand it to include ChatGPT-specific scenarios. We build our coding book for RQ1 on the taxonomies proposed by Beyer et al. 2020 and Hata et al. 2022. Our RQ2 coding book expands on the taxonomies proposed by Huang et al. 2018 and Qu et al. 2019.

Related Work

In this section, we first present related work on the usability of FM-powered tools (Section 8.1). As our study analyzes shared conversations in GitHub issues and pull requests, we also present existing studies on analyzing conversations related to software engineering (Section 8.2) and link sharing in SE platforms, i.e., GitHub and Stack Overflow (Section 8.3).

8.1 Understanding Developers’ Interactions with FM-powered Software Development Tools

Several studies investigated the interactions between FM-powered software development tools and developers.

These studies adopted either user studies or surveys as their methodology, with a significant emphasis on GitHub Copilot, a code completion tool powered by OpenAI’s Codex model. User studies, such as those conducted by Vaithilingam et al. 2022, Mozannar et al. 2022, Barke et al. 2023 and Ross et al. 2023, were performed to understand how developers interact with GitHub Copilot. Vaithilingam et al. 2022 reported a general preference among developers for incorporating Copilot into their daily programming tasks via a user study with 24 participants.

However, they also found that developers face challenges in understanding and debugging the code that Copilot generates. Mozannar et al. 2022 developed a taxonomy of 12 programmer activities associated with GitHub Copilot use, such as verifying suggestions and debugging. Their analysis of 21 developers’ interactions with Copilot revealed that a substantial portion of coding sessions were spent on activities like double-checking and editing Copilot’s suggestions.

Barke et al.’s 2023 user study with 20 participants reported a bimodal usage pattern among developers: they employ Copilot both to speed up familiar tasks (“acceleration mode”) and to explore solutions for new challenges (“exploration mode”). Moreover, programmers may defer thinking about suggestions until later and simply accept them when displayed. Ross et al. 2023 conducted a user study with 42 developers, followed by surveys, using their proposed programming assistant, which relies on OpenAI’s Codex API to support code-edit suggestions and a chat function. They found that 83% of participants rated the importance of the ability to ask follow-up questions as “somewhat” or “a great deal”.

On the survey front, large-scale studies by Ziegler et al. 2022 and Liang et al. 2024 explored developer perceptions of productivity and usability challenges with AI programming assistants, including GitHub Copilot. Ziegler et al. (researchers from GitHub) collected survey responses from 2,631 developers together with IDE usage data. They identified the acceptance rate of Copilot’s suggestions as a significant predictor of perceived productivity. Interestingly, acceptance rates varied across programming languages: 23.3%, 27.9%, and 28.8% of Copilot’s suggestions were accepted for TypeScript, JavaScript, and Python, respectively, and 22.2% for all other languages.

More recently, Liang et al. 2024 surveyed 410 developers with a focus on the usability challenges of AI programming assistants, including GitHub Copilot and ChatGPT. Their results show that developers are most motivated to use AI programming assistants because they help reduce keystrokes, finish programming tasks quickly, and recall syntax. They also found that the most frequent usability challenges were understanding how input leads to output code and how to control the tool’s generations.

Similar to prior studies, our study aims to explore the potential and limitations of FM-powered tools, specifically ChatGPT, in software development by analyzing the interaction between developers and ChatGPT. Unlike them, however, we explore ChatGPT’s role in supporting developers’ collaboration in coding on GitHub by analyzing developers’ shared conversations with ChatGPT in GitHub issues and pull requests. We employed a mixed-methods approach, combining qualitative and quantitative studies, rather than user studies or surveys.

8.2 Conversation Analysis in Software Engineering

In software development, conversations play a crucial role in facilitating collaboration, knowledge sharing, and problem-solving among developers. These interactions vary widely, ranging from mailing lists to issue discussions on open-source software (OSS) platforms like GitHub to technical queries on community forums such as Stack Overflow. Each type of conversation has unique characteristics and serves distinct purposes within the software development lifecycle.

Di Sorbo et al. 2015 manually annotated 100 emails taken from the Qt project development mailing list and proposed a taxonomy that categorized communications between developers into six types, including feature request, opinion asking, problem discovery, solution proposal, information seeking, and information giving. Building upon this framework, Huang et al. 2018 sampled 5,408 sentences from comments recorded in issue tracking systems of four large and popular projects hosted on GitHub.

Their manual categorization refined Di Sorbo et al.’s taxonomy by adding new categories, such as “aspect evaluation” and “meaningless (less informative)”. Besides general categorization, there are also studies that focus on specific topics from conversations. Arya et al. 2019 also investigated issue discussions, but with a focus on fine-grained information types mentioned in conversations. They identified 16 distinct types of information and compiled a labeled corpus of 4,656 sentences. Nurwidyantoro et al. 2022 annotated 1,097 issue discussions collected from three Android projects to understand human values in software development artifacts.

Their results show that value themes could be found in issue discussions (33% of the inspected issues). Beyond issues, studies have also explored conversations in pull requests and online chat rooms. Viviani et al. 2019 proposed an automated solution to locate the points in pull request discussions where developers discuss design. Shi et al. 2021 conducted a comprehensive empirical study on developers’ live chat in Gitter, exploring interaction dynamics, community structures, discussion topics, and interaction patterns. In contrast to existing research that primarily focuses on developer-to-developer communication, our study focuses on developer-to-tool interactions.

We introduce two new taxonomies: one that categorizes SE-related inquiry types and another that characterizes the role of sentences within multi-turn conversations, drawing inspiration from literature for certain categories like “information giving”. Our findings in RQ1 highlight that most shared conversations with ChatGPT in GitHub issues and pull requests revolve around seeking assistance with SE tasks, i.e., these conversations are predominantly information-seeking conversations.

Before the advent of FM-powered tools in software development, such inquiries were commonly posed on Q&A sites like Stack Overflow (SO). In the literature, analyzing the questions developers post on SO is a common practice for understanding the challenges developers face in a specific domain, such as mobile development (Rosen and Shihab, 2016), blockchain (Wan et al., 2019), and deploying deep learning applications (Zhang et al., 2019).

Yet, the studies most relevant to our research are those aimed at categorizing the general types of questions posed by developers on SO. Treude et al. 2011 were the first to investigate the question categories of SO posts. In 385 manually analyzed posts, they found 10 question categories: how-to, discrepancy, environment, error, decision help, conceptual, review, non-functional, novice, and noise. Similarly, Rosen and Shihab 2016 manually categorized 384 SO posts about the mobile operating systems Android, Apple, and Windows, assigning each to one of three main question categories: How, What, and Why.

More recently, Beyer et al. 2020 advanced this line of work by manually classifying 1,000 SO questions and proposing a new taxonomy that amalgamates all previously identified question categories. This taxonomy comprises seven high-level categories: API usage, conceptual issues, discrepancies, errors, reviews, API changes, and learning. Our taxonomy in RQ1, inspired by Beyer et al., aligns with many SE-related inquiries observed on SO but also distinguishes the unique types of inquiries directed at ChatGPT, such as those related to documentation improvement, code comprehension, data generation, and data formatting. Moreover, we discuss the role of sentences (developers’ prompts) in multi-turn conversations and developers’ sharing behavior, neither of which is covered in prior studies.

8.3 Link Sharing in GitHub and Stack Overflow

Link sharing serves as a pivotal method for knowledge sharing within developer communities, particularly on Q&A sites like Stack Overflow (SO) and social coding platforms such as GitHub. This practice, widely recognized for facilitating the exchange of insights about software development tools and libraries, underpins much of the collaborative ethos in these communities.

There has been extensive research on developers’ link-sharing behaviors on Stack Overflow. Gómez et al. 2013 observed that a considerable portion of the shared links on SO are aimed at spreading knowledge about new software development tools and libraries. Ye et al. 2017 further explored this phenomenon, examining the structure and dynamics of SO’s knowledge network through link sharing.

Their findings highlight that developers share links for a variety of reasons, with referencing information for problem-solving being the most common purpose. Baltes et al. 2020 provided an in-depth analysis of how and why documentation links are cited within SO posts by examining 759 shared links to understand their roles and the importance of context in interpreting these references.

More recent studies by Liu et al. 2021; 2022 have analyzed the characteristics of broken shared links and the patterns of repeatedly referenced links within SO posts. Beyond Stack Overflow, research has extended to the use of links within GitHub’s ecosystem, particularly in issues and pull requests. Zampetti et al. 2017 explored the extent and purposes behind developers’ references to external online resources in pull requests, indicating a strong inclination towards acquiring new knowledge or addressing specific problems.

Zhang et al. 2018 observed that developers tend to link more cross-project or cross-ecosystem issues over time. Li et al. 2018 conducted a comprehensive study to understand why developers create links within issues and pull requests on GitHub and how these links impact software development. They manually identified six types of relationships in linking behavior: dependent, duplicate, relevant, referenced, fixed, and enhanced. Hata et al. 2019 collected 9.6 million links in source code comments.

They found that over 80% of repositories contain at least one link, pointing to the prevalence of link sharing for purposes such as providing metadata or attributions. Wang et al. 2021 emphasized the critical role of shared links in code review, demonstrating how they serve as vital resources for both authors and review teams. Xiao et al. 2023 highlighted the issue of link decay in commit messages and the various purposes for which developers include links, primarily for context provision.

Our results in RQ3 align with findings from prior studies. For instance, similar to these investigations, we identified instances where shared links, specifically those encapsulating dialogues between developers and ChatGPT, were broken. In response to this issue, we observed a practice among some developers who, instead of solely sharing a link, opted to embed the entire conversation within issue comments or pull requests as a form of reference (when manually analyzing PRs and issues in RQ3).

We also found that shared links play an important role in supporting collaboration in coding. Unlike the predominant trend observed in SO, where documentation links are frequently cited, or in GitHub, where links to cross-project and cross-ecosystem software artifacts prevail, our study sheds light on the unique nature of links to ChatGPT conversations. These links represent a novel vector for knowledge sharing and collaboration.

:::info
Authors