Software development and programming, once regarded as endeavors that demanded deep expertise, can now be done by anyone using natural language. A feature that used to take days or months to build can now be developed in minutes or hours, thanks to code generated by AI models. OpenAI Codex and Google BERT, for example, were trained on programming blogs, Stack Overflow questions and answers, and other public sources.
These models generate code from mathematical probabilities, and they are also known to hallucinate and present false information. Academic research claims that AI code generation is a leading source of the top 10 vulnerability classes, and that nearly 40% of generated code contains security bugs. Many of the leading players, along with newer SaaS providers, are leveraging AI to make their offerings smarter. SaaS developers, in turn, must become more knowledgeable about AI-based SaaS tools.
What Makes AI-Generated Code Unsafe?
Software safety is dictated by adherence to programming standards and by code quality. AI models, however, are trained on every bit of information available on the internet, so the quality, reliability, and security of their output can differ from code written by experienced developers. A model trained on web development examples may absorb poor data validation practices, for instance; that lack of validation becomes a security issue when the model generates code that reproduces the same poor practices.
5 Indicators That Suggest Code Contains Security Weaknesses
No matter their size (millions or billions of parameters), models are known to hallucinate and make incorrect predictions. When a typical developer reads code that an AI produces, they can miss subtle but serious security vulnerabilities. For a developer with thorough knowledge of design and development patterns, however, the flaws are a review away from being identified. Developers can leverage the following patterns to discover vulnerabilities and align with SaaS security best practices.
1. Type Inference and Input Validation Are Not Enforced
Modern frameworks and libraries rely heavily on interfaces and enums for type inference and validation, which helps guarantee that the code does its job accurately and enforces security at the input boundary. AI-generated code won't infer types unless we direct it to, and even after crafting a careful prompt, the type handling and validation enforcement may not match the use case. To locate and amend such mismatches, developers must be well aware of the domain and business requirements.
def reciprocal(user_input):
    # Insecure implementation with no type inference or validation
    result = 100 / user_input
    return result
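For contrast, here is a minimal sketch of the same function with explicit type and input validation; the exact checks and error messages are illustrative assumptions, not prescriptions:

def reciprocal(user_input):
    # Enforce the expected type before any arithmetic; bool is excluded
    # because it is a subclass of int in Python
    if not isinstance(user_input, (int, float)) or isinstance(user_input, bool):
        raise TypeError("user_input must be a number")
    # Validate the value to prevent a ZeroDivisionError
    if user_input == 0:
        raise ValueError("user_input must be non-zero")
    return 100 / user_input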
2. Non-Standard State and Context Sharing Between Classes/Objects
Programs share objects through public, protected, and private access levels. Higher-order functions and classes inherit object state by accessing public or protected variables directly to perform computations. If something is done incorrectly in the implementation or its use, security or performance bottlenecks can easily occur. SaaS developers must implement their state and context management logic deliberately and review it for correct and safe use.
class InsecureClass:
    def __init__(self, owner, balance, password):
        self.owner = owner  # Public attribute
        self._balance = balance  # Protected attribute
        self.__password = password  # Private attribute

    # Public method
    def get_balance(self):
        return self._balance

    # Protected method
    def _update_balance(self, amount):
        self._balance += amount

    # Private method
    def __validate_password(self, input_password):
        return self.__password == input_password

    # Insecure method exposing private data
    def insecure_password_exposure(self):
        return self.__password
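For contrast, a minimal sketch of safer state management: reads go through a read-only property and mutations go through a validated method (the class and method names here are illustrative):

class SaferClass:
    def __init__(self, owner, balance):
        self._owner = owner
        self._balance = balance

    @property
    def balance(self):
        # Read-only view of protected state; no direct external mutation
        return self._balance

    def deposit(self, amount):
        # State changes go through a validated method, not raw attribute access
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balance += amount

Secrets such as passwords are deliberately absent here; they belong in a dedicated authentication layer, which indicator 4 below covers.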
3. Weak Implementation of Data Handling and Sharing Techniques
Services share and receive information over the network, and secure connectivity and data handling have become crucial to the success of cloud-based systems. When reading, processing, and sharing an organization's sensitive data across distributed networks, strong protocols and security techniques must be in place to prevent data interception. Even when using AI, the SaaS developer remains responsible for implementing every single aspect of the architecture correctly in a full-fledged application.
from flask import Flask, jsonify, request

app = Flask(__name__)
users = {}  # In-memory user store: {user_id: {"email": ..., "password": ...}}

# Insecure Data Sharing
@app.route("/user/<int:user_id>", methods=["GET"])
def get_user(user_id):
    user = users.get(user_id)
    if user:
        return jsonify(user)  # All user data exposed, including secrets

# Insecure Data Handling
@app.route("/update_email", methods=["POST"])
def update_email():
    data = request.get_json()
    user_id = data.get("user_id")
    new_email = data.get("new_email")
    if user_id in users:
        users[user_id]["email"] = new_email  # No validation of new_email
    return jsonify({"message": "Email updated successfully"})
4. Inadequate Secrets and Auth Handling
In today's cyber-sensitive world, tight secrets management and robust authentication are essential. Yet AI-generated code routinely compares credentials in plaintext and leaves secrets exposed, as the snippet below does.
# Insecure authentication
@app.route("/login", methods=["POST"])
def login():
    data = request.get_json()
    email = data.get("email")
    password = data.get("password")
    for user_id, user in users.items():
        # Plaintext password comparison against stored plaintext secrets
        if user["email"] == email and user["password"] == password:
            return jsonify({"message": "Login successful", "user_id": user_id})
    return jsonify({"message": "Invalid credentials"}), 401
5. Outdated Dependencies with Deprecated Functionality Usage
AI programming assistants are steered by the libraries and frameworks built by the community and the open-source ecosystem. Developers support promising new technology by adopting these tools and creating new ones. But the data these models were trained on has a cutoff, and because the models' capabilities are frozen, so is their knowledge. As technology develops, many features become obsolete and some libraries stop being relevant to current needs. The SaaS developer is tasked with reviewing and selecting valid dependencies to ensure functionality and security.
import md5  # Outdated library, removed in Python 3

def insecure_hash_password(password):
    # Insecure password hashing using the deprecated MD5 algorithm
    return md5.new(password).hexdigest()
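A modern standard-library replacement might look like the following sketch; the iteration count is an illustrative choice, not a mandated value:

import hashlib
import os

def hash_password(password):
    # Salted PBKDF2-SHA256 from hashlib replaces the deprecated md5 module
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return f"{salt.hex()}:{digest.hex()}"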
Tips to Make AI-Generated Code Safe to Use
The advanced coding capabilities of large language models come from extensive mathematical computation, and no fancy techniques are needed to make their output compliant with security and programming standards. We can use these simple checks to make AI-generated code safe and standards-compliant:
- Code review with security and architecture teams should be a standard part of your life cycle.
- Integrate automated security testing and validation steps in version control tools.
- Include dependency and compliance checks in testing KPIs.
- Adopt Zero-Trust architecture with static and dynamic security testing tools.
- Leverage DevSecOps practices and keep shadow AI usage in check.
Handling Unsafe AI-Generated Code with a Simple GitHub Action
No matter how carefully we review and audit code, the chance of human error is always there. Relying solely on manual audits is not enough; we need predefined checks that test and validate code as soon as it enters the version control system. What better check than a GitHub Action that automatically runs security and quality checks when a PR is raised?
name: Simple Security Checks for AI generated Code

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  security-and-quality-check:
    runs-on: ubuntu-latest
    steps:
      - name: Repository checkout
        uses: actions/checkout@v3

      - name: Python setup
        uses: actions/setup-python@v4
        with:
          python-version: ">=3.9"

      - name: Dependency installation
        run: |
          python -m pip install --upgrade pip
          pip install bandit pytest

      - name: Identifying insecure libraries and patterns
        run: |
          echo "Checking for insecure patterns..."
          if grep -r "md5.new(" .; then
            echo "ERROR: Insecure MD5 hashing detected. Use hashlib.sha256 or bcrypt instead."
            exit 1
          fi
          echo "No insecure patterns detected."

      - name: Scanning for security vulnerabilities
        run: |
          echo "Running Bandit security scanner..."
          bandit -r .

      - name: Running unit tests
        run: |
          echo "Running unit tests..."
          pytest test/unit --cmdopt=local

      - name: Notifying on failure
        if: failure()
        run: |
          # Placeholder: replace with a real notification script or Slack action
          send_slack_notification "Unsafe code merge detected, fix immediately"
Conclusion
Large language models are quite useful tools for SaaS developers, generating code and information from natural-language prompts. However, they pose security risks and sometimes deliver non-performant code that doesn't suit enterprise needs. SaaS developers must be careful when using these tools and when putting AI-generated code into real-life use cases. This practical guide covered the factors that influence security posture and showed how to overcome the associated challenges.