Soft Forks, Silent License Changes, and Self-Promo: Etor Sees It All | HackerNoon

News Room
Last updated: 2025/09/23 at 7:02 AM
News Room Published 23 September 2025

Table of Links

Abstract and 1. Introduction

  2. Background and Related Work

  3. Study of Unethical Behavior in OSS

    3.1 RQ1: Types of unethical behavior

    3.2 RQ2: Affected software artifacts

  4. Methodology

    4.1 Modeling via SWRL rules

    4.2 Automatic detection of unethical behavior

  5. Evaluation

  6. Discussion and Implications

  7. Threats to Validity

  8. Conclusion and References

4.2 Automatic detection of unethical behavior

We designed Etor to automatically detect six types of unethical behavior. We excluded nine types for four reasons: (1) they involve artifacts (e.g., product names, software features) that are difficult to isolate automatically from other artifacts (namely “No opt-in or no option allowed”, “Privacy Violation”, “Naming confusion”, and “Offensive language”); (2) they require sophisticated analysis of configuration files, APIs, or source code (namely “Plagiarism”, “Depending on proprietary software”, and “Vulnerable code/API”); (3) their detection requires advanced natural language processing (namely “Closing issue/PR without explanation”, which requires automatically checking whether an explanation for closing the PR/issue exists); and (4) approaches for detecting “License incompatibility” [52, 60, 82] already exist, so we exclude it to avoid reinventing the wheel.

Figure 4: Overall architecture of Etor (GH denotes GitHub).

Overview of Etor. Figure 4 presents the overall architecture of our automatic detection tool, Etor. Etor detects unethical behavior at two levels: (1) the repository (denoted as repo), and (2) the GitHub issue/pull request (we denote an issue as issue and a pull request as PR). Given a repo or an issue/PR, and the type of unethical behavior eType to be checked, Etor applies its set of SWRL rules and outputs whether the given input violates eType. Apart from the GitHub attributes in Table 2, which can be retrieved using the GitHub API, our SWRL rule reasoner uses two additional components: (1) a license detector that checks for licenses at the repository level, and (2) a code similarity checker that identifies similar code.

Supported types. Etor supports six types of unethical behavior. We include the SWRL rules for all supported types in the supplementary material. We next describe how Etor checks each supported type.

(S1) No attribution to the author in code. Etor checks whether an issue or a PR has a Stack Overflow link representing reference code, and whether the code snippet copied from Stack Overflow cites that link. Although stakeholders may copy reference code from many resources, Etor only checks for Stack Overflow links because (1) we learned from our study and from existing work [42] that contributors are required to give credit for code snippets copied from Stack Overflow, as they are protected by the CC-BY-SA Creative Commons license, and (2) to support other online resources (e.g., GitHub links), we would need to automatically extract the original reference code (which requires parsing Web pages of different formats) and identify the appropriate license for the code snippet (which requires detecting the license for partial code and is beyond the scope of this paper). Given an issue/PR, Etor checks whether a comment b in the issue/PR posted by a stakeholder u1 contains a Stack Overflow link w (we use a regular expression to extract w). Etor reports a potential violation if: (1) u1 is not the owner of the Stack Overflow comment, (2) the code snippet from Stack Overflow is found in one of the files in the repository (F) with at least 10% similarity (copyright law permits the use of up to 10% of a work without permission [20]), and (3) w is not found in F.
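The S1 check can be sketched in a few lines of Python. This is a minimal illustration rather than Etor's actual implementation: the regular expression, the function names, and the idea of passing in a precomputed similarity score (which in Etor would come from the code similarity checker) are all assumptions made for the example.

```python
import re

# Hypothetical pattern: the paper only states that a regular expression
# is used to extract the Stack Overflow link w from a comment.
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/(?:questions|q|a)/\d+[^\s)>\]]*"
)


def extract_so_links(comment: str) -> list:
    """Return every Stack Overflow question/answer link found in a comment."""
    return SO_LINK.findall(comment)


def s1_potential_violation(
    u1_owns_so_comment: bool,
    similarity_to_f: float,
    link: str,
    file_text: str,
    threshold: float = 0.10,
) -> bool:
    """Apply the three S1 conditions to one candidate file F.

    similarity_to_f is assumed to be a score in [0, 1] produced by an
    external similarity checker; it is not computed here.
    """
    link_cited = link in file_text
    return (
        not u1_owns_so_comment            # (1) u1 did not write the SO comment
        and similarity_to_f >= threshold  # (2) snippet appears in F (>= 10%)
        and not link_cited                # (3) w is not cited in F
    )
```

For example, a comment such as "Fixed using https://stackoverflow.com/a/12345 (thanks!)" yields one link, and a file that reuses the snippet without mentioning that URL would be flagged.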

(S2) Soft forking. Given two repositories r1 and r2, Etor compares the contents of all source files in the two repositories to check whether one is a soft fork of the other (i.e., it has the same content but is not listed as an official fork). Specifically, we use AC2 [21] to detect the similarities between files. AC2 is a source code plagiarism detection tool that has been widely used by graders to detect plagiarism within a group of assignments. We select AC2 because (1) it supports many programming languages (e.g., C, C++, Java, and PHP), (2) it can be run in a local environment without a connection to remote servers, and (3) it is quite robust, as it incorporates multiple algorithms found in the literature. Etor reports a violation if: (1) it detects 100% similarity between r1 and r2, and (2) r2 is not in the fork list of r1.

(S5) No license provided in public repository. Given a repository r, Etor detects the repo-level license by checking whether it exists in: (1) the LICENSE file [22] in the main directory of r (we check only the main directory to avoid mistakenly finding an API license or package license), or (2) a README.md file with license information (we use the list of licenses provided by GitHub [23] for repo-level license detection). Etor reports a potential violation if no license is found after searching these two files.
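The two repository-level rules above (S2 and S5) can be illustrated with a short Python sketch. It makes several simplifying assumptions that are not from the paper: exact SHA-256 matching of file contents stands in for AC2 and therefore only captures the byte-identical case of "100% similarity"; repositories are passed in as hypothetical path-to-text dictionaries; and `KNOWN_LICENSES` is a tiny illustrative subset rather than GitHub's full license list.

```python
import hashlib

# Illustrative subset only; Etor uses the list of licenses provided by GitHub.
KNOWN_LICENSES = ("MIT License", "Apache License", "GNU General Public License")


def content_fingerprint(files: dict) -> frozenset:
    """Set of SHA-256 digests of file contents (file paths are ignored)."""
    return frozenset(
        hashlib.sha256(text.encode("utf-8")).hexdigest() for text in files.values()
    )


def is_soft_fork(r1_files: dict, r2_files: dict, r2_name: str, forks_of_r1: set) -> bool:
    """S2: identical content, but r2 is not in the fork list of r1."""
    identical = content_fingerprint(r1_files) == content_fingerprint(r2_files)
    return identical and r2_name not in forks_of_r1


def has_repo_license(root_files: dict, known=KNOWN_LICENSES) -> bool:
    """S5: look for a license in the main directory only.

    root_files maps top-level file names to their text.
    """
    # (1) A LICENSE file (optionally with an extension) in the main directory.
    for name in root_files:
        upper = name.upper()
        if upper == "LICENSE" or upper.startswith("LICENSE."):
            return True
    # (2) License information mentioned in README.md.
    readme = root_files.get("README.md", "").lower()
    return any(lic.lower() in readme for lic in known)
```

A repository would be reported as a potential S5 violation when `has_repo_license` returns `False`, and as an S2 violation when `is_soft_fork` returns `True`.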

(S6) Uninformed license change. We consider a license change to be uninformed if (1) it is not announced in the CHANGELOG.md or (2) it is not done via a PR. Given a repository r, Etor checks whether the repo-level license has been changed by: (1) extracting the commit list of the license file, and (2) checking whether the commit changes include license updates. If the license changes occur in more than one commit (we ignore the first commit as it is the initial license creation), Etor checks whether the changes have been announced by checking whether the CHANGELOG.md mentions license information. If license information is not found, Etor checks the PR count for the commit (pullRequestCountByCommit). If the count is less than one, Etor marks the commit as a potential violation.
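The decision procedure for S6 can be sketched as follows. This is an illustrative reading of the steps above, not Etor's code: the `LicenseCommit` class is hypothetical, its `pr_count` field is assumed to hold the value of pullRequestCountByCommit, and "mentions license information" is approximated by a case-insensitive search for the word "license".

```python
from dataclasses import dataclass


@dataclass
class LicenseCommit:
    """A commit that touched the repo-level license file (hypothetical type)."""
    sha: str
    pr_count: int  # assumed to hold pullRequestCountByCommit for this commit


def uninformed_license_changes(license_commits: list, changelog_text: str) -> list:
    """Return the SHAs of license changes flagged as potential violations.

    license_commits must be ordered oldest first.
    """
    # Ignore the first commit: it is the initial license creation.
    changes = license_commits[1:]
    if not changes:
        return []
    # Announced in CHANGELOG.md? (Approximation: any mention of "license".)
    if "license" in changelog_text.lower():
        return []
    # Otherwise flag every license change that was not made via a PR.
    return [c.sha for c in changes if c.pr_count < 1]
```

With three license commits where only the last one went through a PR and the changelog is silent about licensing, the sketch flags the middle commit only.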

(S8) Self-promotion. We consider self-promotion to be the scenario where a contributor u opens a GitHub issue/PR whose content includes links to another GitHub repository in order to promote his or her own repository. Given an issue/PR for r1 as input, Etor first (1) checks that the issue/PR includes a link L to another repository r2, and (2) identifies the stakeholder u who opened the issue/PR. Then, it reports a violation if: (1) r1 is not r2, (2) u is not a contributor of r1 (i.e., u is an outsider for r1), and (3) u is a contributor of r2. To reduce false positives, Etor also checks whether L includes specific keywords (“issues”, “pull”, “commit”, “tree”, “releases”, “blob”, and “runs”) that usually indicate that the contributor is sharing L for demonstration purposes (e.g., [DEMO]) rather than promoting a repository/library.
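A minimal Python sketch of the S8 rule is shown below. The function names and the `contributors` lookup table are hypothetical, and treating a link as non-promotional whenever one of the listed keywords appears as a path segment after `owner/name` is one possible reading of the false-positive filter, not a confirmed detail of Etor.

```python
from typing import Optional
from urllib.parse import urlparse

# Keywords listed in the paper; matching them as URL path segments is an assumption.
NON_PROMO_SEGMENTS = {"issues", "pull", "commit", "tree", "releases", "blob", "runs"}


def linked_repo(url: str) -> Optional[str]:
    """Return 'owner/name' for a GitHub link, or None if the link is ignored."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    # Links into issues, commits, files, etc. are treated as demonstrations.
    if NON_PROMO_SEGMENTS.intersection(parts[2:]):
        return None
    return (parts[0] + "/" + parts[1]).lower()


def is_self_promotion(r1: str, link: str, opener: str, contributors: dict) -> bool:
    """contributors maps lowercase 'owner/name' to a set of contributor logins."""
    r2 = linked_repo(link)
    if r2 is None or r2 == r1.lower():            # (1) r1 must differ from r2
        return False
    outsider = opener not in contributors.get(r1.lower(), set())  # (2)
    owns_r2 = opener in contributors.get(r2, set())               # (3)
    return outsider and owns_r2
```

In practice the contributor sets would be fetched through the GitHub API; they are passed in here so that the rule itself stays easy to read and test.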

Table 3: Number of issues detected and TP/FP rate

(S9) Unmaintained Android Project with Paid Service. This type checks whether an Android project offers a paid service on Google Play but has stopped actively maintaining its GitHub repository. On average, 115 APIs are updated per month [65], and 49% of app updates have at least one update within 47 days [67]. Based on this frequency of app updates, we define an unmaintained Android project to be an Android project where the latest update is released less than 0.5 year. Given a repository r as input, Etor first checks for unmaintained Android projects by examining whether (1) the latest release date (D) of r is less than 0.5 year, and (2) r is an original repository (not forked from other repositories). Then, it checks whether the app offers a paid service by (1) identifying the Google Play link l from r, and (2) searching for the phrase “in-app purchase”.
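The paid-service half of the S9 check can be sketched as below; the release-date comparison is deliberately left out. Everything here is an assumption made for illustration: the regular expression for Google Play links, the choice to search the repository's README text for the link, and the idea that the phrase “in-app purchase” is searched in the text of the linked store page.

```python
import re
from typing import Optional

# Hypothetical pattern for a Google Play app page link.
PLAY_LINK = re.compile(r"https?://play\.google\.com/store/apps/details\?id=[\w.]+")


def find_google_play_link(repo_text: str) -> Optional[str]:
    """Return the first Google Play link l found in the repository text, if any."""
    match = PLAY_LINK.search(repo_text)
    return match.group(0) if match else None


def offers_paid_service(store_page_text: str) -> bool:
    """Case-insensitive search for the phrase 'in-app purchase'."""
    return "in-app purchase" in store_page_text.lower()
```

Fetching the store page is omitted on purpose, since how Etor retrieves it is not described in this section.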

:::info
Authors:

(1) Hsu Myat Win, Southern University of Science and Technology, China;

(2) Haibo Wang, Southern University of Science and Technology, China;

(3) Shin Hwei Tan, a corresponding author from Southern University of Science and Technology, China.

:::


:::info
This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
