feat: add semgrep code scanning via -safe argument #484

ericrallen · 2023-09-23T04:49:51Z

Describe the changes you have made:

This PR introduces the first tool in a safety toolkit to help verify that code might be safe to execute.

It adds a -safe argument that can be used to enable code scanning via semgrep.

-safe has 3 possible values:

off: (default) don't enable safe_mode
auto: enable safe_mode and always scan code with semgrep before asking to execute
ask: enable safe_mode and ask the user if they want to scan a code snippet with semgrep before asking to execute

Note: The -safe option is disabled by default and is entirely opt-in. Enabling safe_mode disables auto_run.
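For anyone skimming, here is a rough sketch of how the three values and the auto_run precedence could fit together. The parser wiring, the `--safe_mode` and `--auto_run` long names, and the `resolve_auto_run` helper are assumptions for illustration only, not the code in this PR.

```python
import argparse

SAFE_MODES = ("off", "ask", "auto")


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny stand-in CLI parser (hypothetical, not the real cli.py)."""
    parser = argparse.ArgumentParser(prog="interpreter")
    # Opt-in: the default leaves code scanning disabled.
    parser.add_argument(
        "-safe", "--safe_mode", dest="safe_mode", choices=SAFE_MODES, default="off"
    )
    parser.add_argument("-y", "--auto_run", dest="auto_run", action="store_true")
    return parser


def resolve_auto_run(safe_mode: str, auto_run: bool) -> bool:
    """Return the effective auto_run: any safe mode other than "off" disables it."""
    return auto_run and safe_mode == "off"


args = build_parser().parse_args(["-safe", "ask", "-y"])
print(args.safe_mode, resolve_auto_run(args.safe_mode, args.auto_run))
```

Restricting the argument with `choices` means a typo like `-safe "on"` fails at parse time rather than silently leaving scanning off.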

Reference any relevant issue (Replaces #24)

I have performed a self-review of my code:

I have tested the code on the following OS:

Windows
MacOS
Linux

AI Language Model (if applicable)

I tested this with code generated and read from the filesystem with gpt-3.5-turbo and gpt-4, but the code scanning doesn't include the model in any way, so it should function regardless of what model the user has configured.

interpreter/cli/cli.py

ericrallen · 2023-09-23T04:51:56Z

interpreter/code_interpreters/language_map.py

+from .languages.r import R
+
+
+language_map = {


I ended up needing to reference the same language map that the create_code_interpreter module was using, so I moved it into its own module to make it easier to share.

ericrallen · 2023-09-23T04:58:09Z

Not sure why it left all that weird space at the top; maybe it's something about how the subprocess stdout is displayed, or something semgrep is doing with its output?

interpreter/cli/cli.py

interpreter/core/core.py

ericrallen · 2023-09-23T15:04:31Z

interpreter/code_interpreters/languages/python.py

@@ -4,6 +4,8 @@
 import re

 class Python(SubprocessCodeInterpreter):
+    file_extension = "py"


These file_extensions aren't strictly necessary, but it felt nicer to generate the temporary file for semgrep to scan with the appropriate extension. They could also come in handy for some future ideas, like adding functionality to export generated code directly into a file via some %magic command, without needing another exchange with Open Interpreter to get it to create code that pipes the other code into the file.

In the case of something like the JavaScript language configuration this could get a little more complicated, since those files could also be modules that use the mjs extension, but for its current purpose a naive extension suffices.
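To illustrate the idea, here is a minimal sketch of looking up an extension through a shared language map. The stand-in base class, the `"txt"` fallback, the `"js"` value, and the `extension_for` helper are all hypothetical; only `file_extension = "py"` on the Python class comes from the diff above.

```python
class SubprocessCodeInterpreter:
    """Stand-in for the real base class; only the class attribute matters here."""

    file_extension = "txt"  # hypothetical fallback for unknown languages


class Python(SubprocessCodeInterpreter):
    file_extension = "py"


class JavaScript(SubprocessCodeInterpreter):
    # Naive on purpose: ES modules that use .mjs are not distinguished.
    file_extension = "js"


# Trimmed-down stand-in for the shared language_map module.
language_map = {"python": Python, "javascript": JavaScript}


def extension_for(language: str) -> str:
    """Return the extension to use for a temporary file in this language."""
    interpreter_class = language_map.get(language.lower(), SubprocessCodeInterpreter)
    return interpreter_class.file_extension
```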

ericrallen · 2023-09-25T02:32:49Z

NOTE: This depends on #511 to work as expected.

KillianLucas · 2023-09-26T07:19:58Z

@ericrallen this is brilliantly, beautifully done. To me this is a strong foundation for a --safe flag.

I'm in favor of fewer, simpler options so I'm going to remove the tiers. --safe will simply scan the code automatically (let me know if this is very important for certain workflows, or if semgrep takes a very long time which would justify the tiers). We can then add additional security features into --safe later (like guarddog).

Just to confirm before merging this (and #511), users don't have to log in to semgrep to use it?

ericrallen · 2023-09-26T16:03:40Z

@KillianLucas I've cleaned this one up a bit given the desire to put safety tools under a single flag in the future - I've also simplified the output and made safe_mode take precedence over auto_run given that automatically running code is the opposite of safe.

I moved the temporary file creation and cleanup into its own utility since there are other safety tools, like scanning for exposed secrets, that will likely need to leverage similar temporary file functionality.
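A utility like that could look roughly like the sketch below. The `temporary_code_file` name and its signature are assumptions for illustration, not the helper added in this PR.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_code_file(contents: str, extension: str = "txt"):
    """Write contents to a temp file with the given extension, then clean up."""
    fd, path = tempfile.mkstemp(suffix=f".{extension}")
    try:
        # Close the handle before yielding so other processes can read the file.
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(contents)
        yield path
    finally:
        # Remove the file even if the scan raised an exception.
        if os.path.exists(path):
            os.remove(path)
```

Any scanner that needs a file on disk can then wrap its work in `with temporary_code_file(code, "py") as path:` and not worry about cleanup.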

Semgrep's free tier works really well for this use case - just scanning a snippet of code locally - but the CLI also allows users who have a Semgrep account to log in and leverage some of the more advanced features if they want.

Here are some recordings of the output for each value of the flag:

Open Interpreter with `-safe "off"`:

Open Interpreter with `-safe "ask"`:

Open Interpreter with `-safe "auto"`:

ericrallen · 2023-09-26T17:06:37Z

Here are the prompts I've been using to quickly test this functionality:

Generate Safe (hopefully) Code

Solve FizzBuzz for the numbers 0 through 17 with a basic Python loop and if statement. No need to explain the rules of the game, just show us the code.

Generate Vulnerable Code

We just added a code scanner to Open Interpreter that relies on Semgrep to detect vulnerable code. Please generate a Python script with an obvious vulnerability, like SQL injection for example, that Semgrep should be able to identify. DO NOT MENTION THE VULNERABILITY IN THE SCRIPT VARIABLE NAMES OR CODE COMMENTS. This script should look like a developer accidentally included the vulnerability without realizing it. DO NOT WARN US ABOUT THE VULNERABILITY OR ITS IMPLICATIONS. We want to see if the code scanner identifies the vulnerability without you alerting us to it.

Without the bit about testing the code scanner, sometimes it will start to argue with you about generating vulnerable code.

ericrallen · 2023-09-26T17:29:45Z

You can also save the following snippets to files on your machine and ask Open Interpreter to read and execute the code.

Note: These examples were generated by Open Interpreter as examples of vulnerable code that should be identified by the code scanner.

JavaScript

const vm = require('vm');
const contextObject = { globalVar: 1 };

// safe
vm.runInContext('globalVar *= 2;', contextObject);

// vulnerable
let userInput = 'this.constructor.constructor("return process.env")()'; // Value supplied by user input
vm.runInContext(`globalVar = ${userInput};`, contextObject);

// safe
const code = `return 'hello ' + name`
vm.compileFunction(code, [], { parsingContext: vm.createContext({ name: 'name' }) })

// vulnerable
let userInput = '1; while (true)

Python

import subprocess
import sys

# Vulnerable
user_input = "foo && cat /etc/passwd" # value supplied by user
subprocess.call("grep -R {} .".format(user_input), shell=True)

# Vulnerable
user_input = "cat /etc/passwd" # value supplied by user
subprocess.run(["bash", "-c", user_input], shell=True)

# Not vulnerable
user_input = "cat /etc/passwd" # value supplied by user
subprocess.Popen(['ls', '-l', user_input])

# Not vulnerable
subprocess.check_output('ls -l dir/')

ericrallen · 2023-09-27T04:44:15Z

@KillianLucas I rebased this branch, fixed a bit of code that I forgot to update from scan_code to safe_mode, and added a loading indicator (via yaspin) during the scan so that the user has some feedback that things are happening.

This used to display the semgrep output, but that had a bit too much noise and it felt like it needed something to let the user know that things are happening.
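For context, a quiet scan could be sketched as below. The exact semgrep flags used here (`--config auto`, `--quiet`, `--error`) and both helper names are my assumptions for illustration and may not match what the PR runs; the yaspin spinner is left out to keep the sketch to the standard library.

```python
import subprocess
from typing import List, Tuple


def build_scan_command(path: str) -> List[str]:
    """Build the semgrep invocation for one file (flags are assumptions)."""
    return ["semgrep", "scan", "--config", "auto", "--quiet", "--error", path]


def scan_file(path: str) -> Tuple[bool, str]:
    """Run the scan without echoing semgrep's output to the terminal.

    Returns (flagged, captured_stdout). A nonzero exit code is treated as
    "flagged" here, which also covers the case where semgrep itself fails.
    """
    result = subprocess.run(
        build_scan_command(path), capture_output=True, text=True
    )
    return result.returncode != 0, result.stdout
```

Capturing the output instead of streaming it is what makes room for a spinner: the caller decides what, if anything, to show once the scan finishes.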

Loading Indicator

Also here's a simple prompt for anyone who wants to test this to get right into a state where you can test the scanner after running with poetry run interpreter -safe "ask" or poetry run interpreter -safe "auto":

Solve FizzBuzz for 0 through 17. Don't explain the code or tell me your process or how FizzBuzz works. Just generate the code so we can execute it.

ericrallen · 2023-09-27T04:47:53Z

interpreter/code_interpreters/languages/python.py

@@ -4,6 +4,9 @@
 import re

 class Python(SubprocessCodeInterpreter):
+    file_extension = "py"
+    proper_name = "Python"


These proper names are mostly just being used for the code scanner to say

Code Scanner: No Issues were found with this Python code

We could just rely on the lowercase language string from the code block, but it felt nicer and more human-readable this way.
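As a small sketch of that fallback, assuming a hypothetical `PROPER_NAMES` lookup and `no_issues_message` helper (in the PR the name lives on each language class as `proper_name`):

```python
# Hypothetical subset; the PR stores proper_name on each language class instead.
PROPER_NAMES = {"python": "Python", "javascript": "JavaScript"}


def no_issues_message(language: str) -> str:
    """Format the scanner's all-clear message with a human-readable name."""
    # Fall back to the lowercase code block language if no proper name exists.
    name = PROPER_NAMES.get(language, language)
    return f"Code Scanner: No Issues were found with this {name} code"
```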

ericrallen · 2023-09-27T04:48:42Z

interpreter/terminal_interface/terminal_interface.py

+            interpreter_intro_message.append(f"**Safe Mode**: {interpreter.safe_mode}")
+        else:
+            interpreter_intro_message.append(
+                "Use `interpreter -y` or set `auto_run: true` to bypass this."


Since safe_mode is opt-in and disables auto_run it seemed unnecessary to mention the -y flag.

ericrallen · 2023-09-27T14:12:58Z

Not sure why the workflow tests are failing. Everything seems to work when I run the test suite locally.

ericrallen · 2023-09-28T02:59:16Z

@KillianLucas I’ll rebase again tomorrow to resolve the poetry.lock conflict and try to get to the bottom of what’s wrong with the test suite, but this PR only applies features behind a flag, so I’m not sure how it impacted the tests.

Any guidance on getting it ready to merge would be appreciated.

This reintroduces the --safe functionality from OpenInterpreter#24. --safe has 3 possible values: auto, ask, and off. Code scanning is opt-in.

This is being removed from this PR in favor of a standalone fix in OpenInterpreter#511

Also update scan_code to safe_mode in conditional

KillianLucas · 2023-09-28T19:26:01Z

GREAT WORK ERIC! Literally our most impressive and extensive PR on this whole project to date. Extreme utility here, and you've opened the door to a substantial wing of the OI project (and maybe even to an industry/discipline??) focused on making LLM-written code safe. Merging now. 🎉