## 2. SAST vs Vulncov demo
The demo/src folder contains a dummy Flask app that checks if a user inputs the correct credentials.
If you look at the code, you'll find the following vulnerabilities marked with comments:
```python
@app.route('/login', methods=['GET'])
def login():
    username = request.args.get('username')
    password = request.args.get('password')

    # This will NEVER be triggered
    if 1==2:
        ping()

    conn = get_db_connection()
    cursor = conn.cursor()

    # Vulnerable to SQL injection
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    cursor.execute(query)
    user = cursor.fetchone()
    conn.close()

    if user:
        # Vulnerable to XSS
        return f"Welcome {username}!"
    else:
        return "Invalid credentials!", 403


#@app.route('/ping', methods=['GET'])
def ping():
    ip = request.args.get('ip')

    # Vulnerable to command injection
    command = f"ping -c 1 {ip}"
    output = os.popen(command).read()

    # Vulnerable to XSS
    return f"<pre>{output}</pre>"
```
As you can see, the `ping()` function has its route commented out, so it's inaccessible externally. It's also referenced in the `login()` function, but note that it's called only if an impossible condition (`if 1==2`) is met, making it still unreachable.
Additionally, there's a folder called `demo/tests` containing a file with two pytest unit tests. Each test verifies that the credential verification workflow works as expected.
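For reference, here is a minimal sketch of what those two tests might look like. The test names come from the output shown later in this demo, but the import path, fixture, and credentials are assumptions, so the actual file in `demo/tests` may differ:

```python
# login_test.py -- hypothetical sketch of the two pytest cases described above
import pytest

from app import app  # assumes the Flask app object lives in demo/src/app.py


@pytest.fixture
def client():
    # Flask's built-in test client lets us hit routes without a running server
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client


def test_login_success(client):
    # Valid credentials: expect the welcome message
    resp = client.get("/login", query_string={"username": "admin", "password": "admin"})
    assert resp.status_code == 200
    assert b"Welcome" in resp.data


def test_login_failure(client):
    # Invalid credentials: expect a 403
    resp = client.get("/login", query_string={"username": "admin", "password": "wrong"})
    assert resp.status_code == 403
```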
Now, let's see how the process looks when running a standalone SAST versus using `vulncov`.
If we run `semgrep` like this:
```
semgrep --config 'p/python' --json --quiet -o /tmp/semgrep_results.json demo/src/
```
And then filter the relevant lines from the JSON file:
```
cat /tmp/semgrep_results.json | jq | grep check_id

      "check_id": "python.django.security.injection.sql.sql-injection-using-db-cursor-execute.sql-injection-db-cursor-execute",
      "check_id": "python.django.security.injection.sql.sql-injection-using-db-cursor-execute.sql-injection-db-cursor-execute",
      "check_id": "python.flask.security.injection.tainted-sql-string.tainted-sql-string",
      "check_id": "python.flask.db.generic-sql-flask.generic-sql-flask",
      "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
      "check_id": "python.lang.security.dangerous-system-call.dangerous-system-call",
      "check_id": "python.flask.os.tainted-os-command-stdlib-flask-secure-if-array.tainted-os-command-stdlib-flask-secure-if-array",
      "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
      "check_id": "python.django.security.injection.raw-html-format.raw-html-format",
      "check_id": "python.flask.security.injection.raw-html-concat.raw-html-format",
```
The main issue here is that it's flagging findings like `dangerous-system-call`, which sits in the `ping()` function (dead code), as explained in the Demo section.
Now let's run `vulncov` instead. Simply run this command:

```
vulncov -p demo/tests -t demo/src/ -e django -o /tmp/vulncov.json -req demo/requirements.txt
```
Explanation:

- `-p demo/tests`: the pytest folder where the coverage will be extracted.
- `-t demo/src/`: the folder containing the target application's source code.
- `-e django`: regex to exclude rules from the output. Since we used `p/python`, it includes different frameworks like Flask and Django, as well as generic rules (`lang`). This parameter helps omit those from the output.
- `-o /tmp/vulncov.json`: the output file for storing `vulncov`'s results.
- `-req demo/requirements.txt`: the library dependencies required to run the application, since we want `vulncov` to run coverage for us.
When we inspect the output JSON, we see that the initial 10 Semgrep findings have been filtered down to just 3! 🎉
```
cat /tmp/vulncov.json | jq | grep check_id

      "check_id": "python.flask.security.injection.tainted-sql-string.tainted-sql-string",
      "check_id": "python.flask.db.generic-sql-flask.generic-sql-flask",
      "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
```
It omitted vulnerabilities from dead code, like `dangerous-system-call`, and removed redundant findings, such as one of the `directly-returned-format-string` checks and the Django references.
The output includes information on which test cases can trigger each vulnerability. Let's review an example:
```json
{
    "semgrep": {
        "fingerprint": "babc5e12b8a3765aa6b292fbc07947825755a8ef203a0ef83775983593273e5596e3ce2f25dde92ecdaf40847a576c05508c6cb50f0fd37c61cfad9c5e8f2146_0",
        "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
        "rule_category": "security",
        "vulnerability_class": [
            "Cross-Site-Scripting (XSS)"
        ],
        "impact": "MEDIUM",
        "message": "Detected Flask route directly returning a formatted string. This is subject to cross-site scripting if user input can reach the string. Consider using the template engine instead and rendering pages with 'render_template()'.",
        "lines": " return f\"Welcome {username}!\"",
        "vuln_lines": [
            43
        ]
    },
    "test_cases": [
        {
            "name": "login_test.test_login_success",
            "executed_lines": [11, 12, 24, 25, 28, 31, 32, 35, 36, 38, 39, 41, 43],
            "matched_lines": [43],
            "coverage_match_percentage": 100.0
        }
    ]
}
```
Check out the details of the Output JSON structure.
As you can see, this third item has only one test case associated with it, `login_test.test_login_success`, while the others also have `login_test.test_login_failure`. This is because line 43 contains a potential XSS vulnerability, triggered only when a login is successful:
```python
    if user:
        # Vulnerable to XSS
        return f"Welcome {username}!"
    else:
        return "Invalid credentials!", 403
```
This way, `vulncov` also provides a clue on how to trigger each Semgrep finding, as we have the corresponding test case. 🤓
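Conceptually, the correlation shown above boils down to intersecting each finding's vulnerable lines with the lines each test actually executed. The following is only an illustrative sketch of that idea, not vulncov's actual implementation:

```python
# Illustrative only: matching a finding's vuln_lines against per-test coverage data
def match_finding(vuln_lines, executed_lines):
    """Return the overlapping lines and the percentage of vuln_lines covered."""
    matched = sorted(set(vuln_lines) & set(executed_lines))
    percentage = 100.0 * len(matched) / len(vuln_lines) if vuln_lines else 0.0
    return matched, percentage


# Using the example above: line 43 is executed by test_login_success,
# so the XSS finding on that line matches with 100% coverage.
print(match_finding([43], [11, 12, 24, 25, 28, 31, 32, 35, 36, 38, 39, 41, 43]))  # ([43], 100.0)
# test_login_failure never reaches the success branch, so line 43 would not
# appear in its executed lines and the finding would not be attributed to it.
```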
To get private bug fixes powered by a local LLM, you need to install ollama and pull a model.
In this case, I will be using `codellama:latest`:

```
ollama pull codellama:latest
```
Serve the model by running:
```
ollama serve
```
Finally, in a separate terminal, run `vulncov`, passing the URL of the `ollama` instance (it runs on port 11434 by default):
```
vulncov -p demo/tests -t demo/src/ -req demo/requirements.txt -lls http://localhost:11434
```
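For context, ollama exposes an HTTP API on that port. Here is a hedged sketch of asking it for a fix through its `/api/generate` endpoint; the prompt below is purely illustrative and does not reflect how vulncov builds its prompts internally:

```python
import json
import urllib.request

# Minimal, illustrative prompt; vulncov's real prompt engineering will differ
prompt = (
    "Suggest a secure fix for this Flask snippet:\n"
    "query = f\"SELECT * FROM users WHERE username='{username}' AND password='{password}'\"\n"
    "cursor.execute(query)"
)

payload = json.dumps({
    "model": "codellama:latest",
    "prompt": prompt,
    "stream": False,  # ask for a single JSON response instead of a token stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```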
This way, the generated JSON output will now include a field called `llm_suggested_fix`. Let's see an example for SQL Injection:
Code fix:
```
query = f"SELECT * FROM users WHERE username=? AND password=?"
cursor.execute(query, (username, password))
```
Fix description:
The vulnerability is caused by manually constructing a SQL string using user input. This allows an attacker to inject malicious SQL code that can compromise the database. To fix this vulnerability, we should use parameterized queries instead of concatenating user input into the SQL string. In Python, parameterized queries are available by default in most database engines.
In the suggested fix, we replace the manual construction of the SQL string with a parameterized query. The `?` placeholder is used to represent the user input, and the `cursor.execute()` method takes care of substituting the values for the placeholders. This ensures that any malicious SQL code in the user input is properly escaped and cannot be executed.
Additionally, we can also consider using an object-relational mapper (ORM) such as SQLAlchemy which will protect our queries from SQL injection attacks.
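To illustrate that last point, here is a hedged sketch of the same lookup done with SQLAlchemy's `text()` construct and bound parameters (Core rather than the full ORM); the database path, table, and column names are assumptions mirroring the demo schema:

```python
from sqlalchemy import create_engine, text

# Assumed path to the demo's SQLite database; adjust to match demo/src
engine = create_engine("sqlite:///database.db")

def find_user(username: str, password: str):
    with engine.connect() as conn:
        # Named bind parameters: user input is never interpolated into the SQL string
        return conn.execute(
            text("SELECT * FROM users WHERE username = :username AND password = :password"),
            {"username": username, "password": password},
        ).fetchone()
```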