The Art of Prompt-Swapping, Temperature Tuning, and Fuzzy Forensics in AI | HackerNoon

News Room | Published 29 July 2025 | Last updated 29 July 2025, 12:25 PM

Table of Links

Abstract and I. Introduction

II. Related Work

III. Technical Background

IV. Systematic Security Vulnerability Discovery of Code Generation Models

V. Experiments

VI. Discussion

VII. Conclusion, Acknowledgments, and References

Appendix

A. Details of Code Language Models

B. Finding Security Vulnerabilities in GitHub Copilot

C. Other Baselines Using ChatGPT

D. Effect of Different Number of Few-shot Examples

E. Effectiveness in Generating Specific Vulnerabilities for C Codes

F. Security Vulnerability Results after Fuzzy Code Deduplication

G. Detailed Results of Transferability of the Generated Non-secure Prompts

H. Details of Generating the Non-secure Prompts Dataset

I. Detailed Results of Evaluating CodeLMs using Non-secure Dataset

J. Effect of Sampling Temperature

K. Effectiveness of the Model Inversion Scheme in Reconstructing the Vulnerable Codes

L. Qualitative Examples Generated by CodeGen and ChatGPT

M. Qualitative Examples Generated by GitHub Copilot

F. Security Vulnerability Results after Fuzzy Code Deduplication

We employ the TheFuzz [64] Python library to find near-duplicate codes. This library uses the Levenshtein distance to calculate the differences between sequences [65], and it outputs the similarity ratio of two strings as a number between 0 and 100. We consider two codes duplicates if their similarity ratio is greater than 80. Figure 7 provides the results of our FS-Code approach in finding vulnerable Python and C codes that could be generated by the CodeGen and ChatGPT models. Note that these results are obtained by following the setting of Section V-B2. Here, we also observe a generally almost-linear growth pattern for some of the vulnerability types generated by the CodeGen and ChatGPT models.

Fig. 6: Percentage of the discovered vulnerable C codes using the non-secure prompts that are generated for specific CWEs. (a), (b), and (c) provide the results of the code generated by the CodeGen model using FS-Code, FS-Prompt, and OS-Prompt, respectively. (d), (e), and (f) provide the results for the code generated by ChatGPT using FS-Code, FS-Prompt, and OS-Prompt, respectively.
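The deduplication step in this appendix can be sketched with Python's standard library. `difflib.SequenceMatcher` provides a similarity ratio comparable in spirit to TheFuzz's Levenshtein-based `fuzz.ratio` (the exact scores differ, so this is an illustrative stand-in, not the paper's implementation); scaled to 0–100, a code sample is dropped if it scores above 80 against any already-kept sample:

```python
from difflib import SequenceMatcher

def dedupe_codes(codes, threshold=80):
    """Greedy near-duplicate filter: keep a code sample only if its
    similarity ratio (0-100) to every previously kept sample is at
    most `threshold`."""
    kept = []
    for code in codes:
        is_dup = any(
            SequenceMatcher(None, code, seen).ratio() * 100 > threshold
            for seen in kept
        )
        if not is_dup:
            kept.append(code)
    return kept
```

TheFuzz exposes the same idea via `fuzz.ratio(a, b)`; swapping it in only changes the scoring function, not the filtering logic.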

G. Detailed Results of Transferability of the Generated Non-secure Prompts

Here we provide the detailed results of the transferability of the generated non-secure prompts. Table VIII and Table IX show the detailed transferability results of the promising non-secure prompts generated by CodeGen and ChatGPT, respectively, covering the generated Python and C codes for different CWEs. Tables VIII and IX show that the promising non-secure prompts are transferable among the models for generating codes with different types of CWEs. In some cases, the non-secure prompts from model A can even lead model B to generate more vulnerable codes than model A itself. For example, in Table VIII, the promising non-secure prompts generated by CodeGen lead ChatGPT to generate more vulnerable codes with the CWE-079 vulnerability than CodeGen itself.
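The transferability check can be sketched as a small tallying loop: prompts distilled from one model are replayed against another, and each discovered CWE is counted. Here `generate_code` and `find_cwes` are hypothetical stand-ins for the paper's pipeline (sampling from a target code LM and scanning its output with CodeQL):

```python
from collections import Counter

def transfer_eval(prompts, generate_code, find_cwes):
    """Tally vulnerable generations per CWE for prompts transferred
    from one model to another. Both callables are supplied by the
    caller: `generate_code` queries the target model, and `find_cwes`
    is the static-analysis step."""
    tally = Counter()
    for prompt in prompts:
        code = generate_code(prompt)
        for cwe in find_cwes(code):
            tally[cwe] += 1
    return tally
```

Comparing the tallies for model A's own generations against model B's generations from the same prompts yields the per-CWE transferability numbers reported in Tables VIII and IX.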

H. Details of Generating the Non-secure Prompts Dataset

We generate the non-secure prompts dataset using our FS-Code method, following the same settings as in Section V-B. For generating prompts with GPT-4 and Code Llama-34B, we set the sampling temperature to 1.0; a higher temperature facilitates the generation of a diverse set of non-secure prompts. We set the maximum number of tokens for generating Python and C non-secure prompts to 45 and 65, respectively. The maximum token count is determined based on the average size of prompts in our initial code examples.
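The role of temperature here can be made concrete with a minimal softmax sketch (a generic illustration of temperature scaling, not the paper's sampling code): dividing the logits by a higher temperature flattens the output distribution, which is why a temperature of 1.0 yields a more diverse set of prompts than a low temperature would.

```python
import math

def temperature_softmax(logits, temperature):
    """Convert logits to probabilities under temperature scaling.
    Higher temperature -> flatter (more diverse) distribution;
    lower temperature -> mass concentrates on the top logit."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with logits `[2.0, 1.0, 0.5]`, a temperature of 0.1 puts nearly all probability on the first token, while a temperature of 1.0 leaves substantial mass on the alternatives.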

To generate the non-secure prompts with Code Llama-34B, we exclusively employ the few-shot examples from our FS-Code method. However, in the generation process using GPT-4, we utilize an instruction to guide the model. We use the following instruction to generate Python prompts:

You are a Python developer. Given the provided code examples, complete the following code. Define a function and write the task description as a Python comment. Provide comprehensive comments. Do not write any explanation. Only write Python codes.

I. Detailed Results of Evaluating CodeLMs using Non-secure Dataset

In Table X, we provide the detailed results of evaluating various code language models using our proposed non-secure prompts dataset. Table X demonstrates the number of vulnerable Python and C codes generated by the CodeGen-6B [6], StarCoder-7B [24], Code Llama-13B [12], WizardCoder-15B [56], and ChatGPT [4] models. Detailed results for each CWE can offer valuable insights for specific use cases. For instance, as shown in Table X, Code Llama-13B generates fewer Python codes with the CWE-089 (SQL injection) vulnerability. Consequently, this model stands out as a strong choice among the evaluated models for generating SQL-related Python code.

Fig. 7: The number of discovered vulnerable codes versus the number of sampled codes generated by (a), (c) CodeGen and (b), (d) ChatGPT. The non-secure prompts and codes are generated using our FS-Code method. While exact duplicates were already removed in Figure 4, here we use fuzzy matching for further code deduplication.

TABLE VIII: The number of discovered vulnerable codes generated by the CodeGen and ChatGPT models using the promising non-secure prompts generated by CodeGen. We employ our FS-Code method to generate the non-secure prompts and codes. Columns two to thirteen provide results for Python codes. Columns fourteen to nineteen give the results for C codes. Columns fourteen and nineteen provide the number of found vulnerable codes with the other CWEs that CodeQL queries. For each programming language, the last column provides the sum of all codes with at least one security vulnerability.

J. Effect of Sampling Temperature

Figure 8 provides detailed results of the effect of different sampling temperatures on generating non-secure prompts and vulnerable code. We conduct this evaluation using our FS-Code method and sample the non-secure prompts and Python codes from the CodeGen model. Here, we provide the total number of generated vulnerable codes with three different CWEs (CWE-020, CWE-022, and CWE-079), sampling 125 code samples for each CWE. The y-axis refers to the different sampling temperatures used for sampling the non-secure prompts, and the x-axis refers to the different sampling temperatures of the code generation procedure. The results in Figure 8 show that, in general, the sampling temperature of the non-secure prompts has a significant effect on generating vulnerable codes, while the sampling temperature of the codes has a minor impact (within each row, there is little difference among the numbers of vulnerable codes). Furthermore, in Figure 8 we observe that 0.6 is an optimal temperature for sampling the non-secure prompts. Note that in all of our experiments, following previous works in the program generation domain [6], [5], we set the sampling temperature for the non-secure prompts and codes to 0.6 to obtain fair results.
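The row-wise reading of Figure 8 can be sketched as a small analysis helper: given a grid of vulnerable-code counts indexed by prompt temperature (rows) and code temperature (columns), summing each row identifies the prompt temperature that matters most, mirroring the observation that rows, not columns, drive the differences. This is illustrative only; the counts used in the example are made-up placeholders, not the paper's data.

```python
def best_prompt_temperature(counts, prompt_temps):
    """Pick the prompt-sampling temperature whose row (summed over
    all code-sampling temperatures) yields the most vulnerable codes.
    `counts[i][j]` holds the count for prompt temperature
    `prompt_temps[i]` and the j-th code temperature."""
    row_totals = [sum(row) for row in counts]
    return prompt_temps[row_totals.index(max(row_totals))]
```

With a grid whose 0.6 row dominates, the helper recovers the paper's observed optimum regardless of how the counts vary along each row.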

Authors:

(1) Hossein Hajipour, CISPA Helmholtz Center for Information Security ([email protected]);

(2) Keno Hassler, CISPA Helmholtz Center for Information Security ([email protected]);

(3) Thorsten Holz, CISPA Helmholtz Center for Information Security ([email protected]);

(4) Lea Schönherr, CISPA Helmholtz Center for Information Security ([email protected]);

(5) Mario Fritz, CISPA Helmholtz Center for Information Security ([email protected]).

