
Detecting LLM-Generated Code

Large language models (LLMs) are a type of artificial intelligence that generates text, translates languages, writes different kinds of creative content, creates net-new artistic works, and answers questions in an informative way. They are trained on massive datasets of text and code, and can learn to produce human-quality output. But because they ingest massive troves of data and produce content that resembles it, their output can be fairly formulaic.

A few things to look for to help determine if code was written by an LLM:

  • Use of natural language: LLMs often embed natural language in their code, such as complete sentences and paragraphs, which can be a clue that an LLM wrote it.
  • Unusual code structures: LLMs may use code structures that human programmers rarely reach for.
  • Lack of comments: LLMs often leave code sparsely commented, which makes it harder to understand what the code is doing.
  • Grammatically correct but styleless code: LLMs are trained on large datasets of text, and they often produce code that is correct but lacks the style and flair a human programmer would bring.
  • Repetitive code: LLMs can generate repetitive code, as they may not identify the best way to express an idea (a rough heuristic for this and the comment signal is sketched after this list).
  • Overly complex code: LLMs can generate convoluted code, as they may not identify the simplest way to solve a problem.
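To make a couple of those signals measurable, here is a minimal sketch (the function name and the idea of scoring are mine, not from any published detector) that scores a Python file on comment density and repeated lines:

def quick_heuristics(file_path):
  """Returns rough comment-density and repetition scores for a Python file."""
  with open(file_path, "r") as f:
    lines = [line.strip() for line in f if line.strip()]

  if not lines:
    return {"comment_ratio": 0.0, "repeated_line_ratio": 0.0}

  comment_lines = sum(1 for line in lines if line.startswith("#"))
  unique_lines = len(set(lines))

  return {
      # Low comment density is one (weak) hint of machine-generated code.
      "comment_ratio": comment_lines / len(lines),
      # A high share of duplicated lines hints at repetitive output.
      "repeated_line_ratio": 1 - (unique_lines / len(lines)),
  }

Neither number proves anything on its own, but together they give you a quick way to decide which files deserve a closer look.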

If you suspect that a piece of code was written by an LLM, there are a number of tools that can help you verify that suspicion. Online tools can analyze code and flag patterns that are common in LLM output, and a code review tool can put the code in front of a human expert for an opinion. No single tool or technique can definitively identify code that was written by an LLM, but by looking for the signs listed above you can increase your chances of making the correct identification. Keep false positives in mind, too: when we add lots of sanity checking into code we inherently make it look overly complex, and yet it’s necessary.

It’s also worth mentioning that LLM-generated code tends to follow specific design patterns. For example, while running some tests for this article, it was obvious that the way Google Bard defines arguments in Python is very specific, and any Python script generated by Bard will have the same block of commented text wrapped around the code itself. So let’s get a tad bit meta and use Google Bard to write a script to detect if code was written by an LLM like Google Bard or ChatGPT:

import re

def is_llm_code(file_path):
  """
  Returns True if the file at `file_path` contains code that was written by an LLM.
  Args:
    file_path: The path to the file to check.
  Returns:
    True if the file contains code that was written by an LLM, False otherwise.
  """

  with open(file_path, "r") as f:
    code = f.read()

  # Look for patterns that are common in code written by LLMs.
  # re.MULTILINE lets ^ match the start of each line, not just the file.
  patterns = [
      re.compile(r"^import", re.MULTILINE),
      re.compile(r"^class", re.MULTILINE),
      re.compile(r"^def", re.MULTILINE),
      re.compile(r"^for", re.MULTILINE),
      re.compile(r"^while", re.MULTILINE),
      re.compile(r"^if", re.MULTILINE),
      re.compile(r"^else", re.MULTILINE),
      re.compile(r"^try", re.MULTILINE),
      re.compile(r"^except", re.MULTILINE),
      re.compile(r"^finally", re.MULTILINE),
  ]

  for pattern in patterns:
    if pattern.search(code):
      return True
  return False

if __name__ == "__main__":
  file_path = "/Users/charlesedge/cvemacoscheck.py"
  if is_llm_code(file_path):
    print("The file contains code that was written by an LLM.")
  else:
    print("The file does not contain code that was written by an LLM.")

This script checks for patterns that are common in LLM-generated code. If any of the patterns are found, is_llm_code() returns True, indicating that the code was likely written by an LLM; otherwise it returns False. It is important to note that this script is not perfect and may not accurately detect all LLM-generated code (as written, those keyword patterns will match almost any Python file). LLMs also tend to import functions from many of the same frameworks, so some additional logic could go into the script to detect the use of the following modules (a sketch of such a check appears after the list):

  • `random` module
  • `numpy` module
  • `pandas` module
  • `matplotlib` module
  • `seaborn` module
  • `scikit-learn` module
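A minimal sketch of that extra logic might look something like this (it treats any import of one of these modules as a signal, which is admittedly blunt, and the function and list names are just placeholders):

import re

# Modules commonly imported by LLM-generated machine learning and data
# science code. Note that the scikit-learn package is imported as sklearn.
LLM_COMMON_MODULES = [
    "random",
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "sklearn",
]

def uses_llm_common_modules(file_path):
  """Returns the modules from LLM_COMMON_MODULES imported by the file."""
  with open(file_path, "r") as f:
    code = f.read()

  found = []
  for module in LLM_COMMON_MODULES:
    # Match both `import numpy ...` and `from numpy import ...` lines.
    pattern = re.compile(r"^(?:import|from)\s+" + module + r"\b", re.MULTILINE)
    if pattern.search(code):
      found.append(module)
  return found

You could then combine this with is_llm_code(), for example by only flagging a file that matches both the keyword patterns and at least one of these imports.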

Modules like these are commonly used in machine learning and data science, two areas where LLMs see heavy use, so adding those patterns to your script can improve its accuracy at detecting LLM-generated code. Once code is identified as LLM-generated, though, what do we want to do with it, and where would we actually use automatically generated code in the first place? Let’s start with a few use cases:

  • Generating new features for software applications.
  • Automating tasks that are currently done manually by developers and systems administrators.
  • Creating new programming languages.
  • Writing microservices that a developer can then call to get more done with less unique code. In other words, if we break our code down into really digestible chunks, we can punt the easy stuff and spend more time on the hard stuff: getting more done faster, or augmenting human intellect.

There are a few things you should do with code that was generated by an LLM before you use it in production:

  • Test the code thoroughly: LLMs can generate code that is grammatically correct and even semantically correct. However, it is important to test the code thoroughly to make sure that it works as expected. You should test the code on a variety of inputs and outputs to make sure that it is robust and reliable.
  • Review the code: Even if the code passes all of your tests, it is still a good idea to have a human expert review the code before you use it in production. A human expert can identify any potential security vulnerabilities or bugs in the code.
  • Document the code: It is important to document the code so that you can understand how it works and how to maintain it. The documentation should include a description of the code, as well as any assumptions that were made when the code was generated.
  • Monitor the code: Once you start using the code in production, it is important to monitor it to make sure that it is working as expected. You should monitor the code for any errors or performance issues.
  • Use the code as a starting point: Even when LLM-generated code is grammatically and semantically correct, it may not be optimal. Use it as a starting point and then optimize it for your specific needs.
  • Use the code in conjunction with other tools: LLMs can generate code that is useful for a variety of purposes. However, they are not a replacement for other tools. You can use the code in conjunction with other tools, such as code editors and debuggers, to help you create high-quality software.
  • Be aware of the limitations of LLMs: LLMs are still under development, and they have some limitations. For example, they may not be able to generate code that is as efficient or as secure as code that was written by a human. You should be aware of the limitations of LLMs and use them accordingly.
  • Run code through a security scanner: This is good for all code, but particularly to make sure good practices were used with LLM-generated code.
  • Write test cases: We should write test cases for all code, but it’s especially important when an LLM wrote it (a quick example follows this list).
  • Run it through a plagiarism detector: We don’t want to accidentally step on anyone else’s intellectual property. If you can get an extra developer’s worth of time by using LLMs, then there should be plenty of time and money left over for a quick and dirty API-level integration with a plagiarism detector in your devops flow.
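For example, a couple of quick test cases for the is_llm_code() function above might look like this (a minimal sketch using pytest’s tmp_path fixture, and assuming the script was saved as detect_llm_code.py, a name made up for the example):

from detect_llm_code import is_llm_code

def test_python_keywords_are_flagged(tmp_path):
  # A file with import/def/etc. at the start of a line should trip the patterns.
  sample = tmp_path / "sample.py"
  sample.write_text("import os\n\nprint(os.getcwd())\n")
  assert is_llm_code(str(sample))

def test_plain_text_is_not_flagged(tmp_path):
  # A file with none of the keyword patterns should come back clean.
  sample = tmp_path / "notes.txt"
  sample.write_text("Meeting notes from Tuesday.\n")
  assert not is_llm_code(str(sample))

Run with pytest, those two checks at least confirm the detector behaves as documented before it lands in a devops flow.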

So if it’s not obvious, I’m in favor of using LLMs to get more done, quicker. However, as with creative content, do so with care. Plan on a Pareto rule application: maybe the LLM can be leveraged to get 80% of the code ready to go, and then we spend some of the time we saved making it safe for people. This applies even more heavily to environments that use scripts to talk to a Mac. For example, if you use a device management tool that runs scripts on devices and an insecure piece of LLM code slips in, the potential impact could be devastating: best case the device gets wiped, worst case it’s used to exfiltrate data.