2. SAST vs Vulncov demo

mllamazares edited this page Oct 13, 2024 · 1 revision

Vulnerable Demo App

The demo/src folder contains a dummy Flask app that checks if a user inputs the correct credentials.

If you look at the code, you'll find the following vulnerabilities marked with comments:

@app.route('/login', methods=['GET'])
def login():
    username = request.args.get('username')
    password = request.args.get('password')
    
    # This will NEVER be triggered
    if 1==2:
        ping()

    conn = get_db_connection()
    cursor = conn.cursor()
    
    # Vulnerable to SQL injection
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    cursor.execute(query)
    
    user = cursor.fetchone()
    conn.close()
    
    if user:
        # Vulnerable to XSS
        return f"Welcome {username}!"
    else:
        return "Invalid credentials!", 403

#@app.route('/ping', methods=['GET'])
def ping():
    ip = request.args.get('ip')
    
    # Vulnerable to command injection
    command = f"ping -c 1 {ip}"
    output = os.popen(command).read()
    
    # Vulnerable to XSS
    return f"<pre>{output}</pre>"

As you can see, the route decorator of ping() is commented out, so the function is not reachable from outside. login() does reference it, but only behind an impossible condition (if 1==2), so the call never runs: ping() is dead code.

Additionally, there's a folder called demo/tests containing a file with two pytest unit tests. Each test verifies that the credential verification workflow works as expected.

Now, let’s see how the process looks when running a standalone SAST versus using vulncov.

Using a standalone SAST 🥱

If we run semgrep like this:

semgrep --config 'p/python' --json --quiet -o /tmp/semgrep_results.json demo/src/

And then filter the relevant lines from the JSON file:

cat /tmp/semgrep_results.json | jq | grep check_id
      "check_id": "python.django.security.injection.sql.sql-injection-using-db-cursor-execute.sql-injection-db-cursor-execute",
      "check_id": "python.django.security.injection.sql.sql-injection-using-db-cursor-execute.sql-injection-db-cursor-execute",
      "check_id": "python.flask.security.injection.tainted-sql-string.tainted-sql-string",
      "check_id": "python.flask.db.generic-sql-flask.generic-sql-flask",
      "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
      "check_id": "python.lang.security.dangerous-system-call.dangerous-system-call",
      "check_id": "python.flask.os.tainted-os-command-stdlib-flask-secure-if-array.tainted-os-command-stdlib-flask-secure-if-array",
      "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
      "check_id": "python.django.security.injection.raw-html-format.raw-html-format",
      "check_id": "python.flask.security.injection.raw-html-concat.raw-html-format",

The main issue here is that Semgrep also reports findings such as dangerous-system-call, which sits inside the ping() function. As explained in the Vulnerable Demo App section above, that function is dead code.
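To see where each finding lives rather than just its rule ID, the Semgrep JSON report can be read with a few lines of Python. This is a minimal sketch: it relies only on the `results`, `check_id`, `path` and `start.line` fields of Semgrep's JSON output. The helper name `list_findings`, the file name `app.py` and line 55 in the sample are made up for illustration; only line 43 comes from the demo output.

```python
import json


def list_findings(report: dict) -> list[tuple[str, int, str]]:
    """Return (path, start line, rule ID) for every Semgrep result, sorted by location."""
    rows = [
        (result["path"], result["start"]["line"], result["check_id"])
        for result in report.get("results", [])
    ]
    return sorted(rows)


# Hand-written, trimmed sample in the shape of a Semgrep report.
# The path and line 55 are hypothetical; a real run would load
# /tmp/semgrep_results.json with json.load() instead.
sample = json.loads("""
{
  "results": [
    {"check_id": "python.lang.security.dangerous-system-call.dangerous-system-call",
     "path": "app.py", "start": {"line": 55}},
    {"check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
     "path": "app.py", "start": {"line": 43}}
  ]
}
""")

for path, line, check_id in list_findings(sample):
    # Print only the last segment of the rule ID to keep the line short.
    print(f"{path}:{line}  {check_id.rsplit('.', 1)[-1]}")
```

Seeing the line numbers next to the rule IDs makes it easier to spot which findings fall inside ping().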

Using vulncov 🪄

Simply run this command:

vulncov -p demo/tests -t demo/src/ -e django -o /tmp/vulncov.json -req demo/requirements.txt

Explanation:

  • -p demo/tests: the pytest folder from which the coverage will be collected.
  • -t demo/src/: the folder containing the target application’s source code.
  • -e django: regex for the rules to exclude from the output. The p/python ruleset bundles rules for several frameworks, such as Flask and Django, as well as generic ones (lang). Since the demo is a Flask app, this parameter drops the Django rules from the output.
  • -o /tmp/vulncov.json: the output file for storing vulncov's results.
  • -req demo/requirements.txt: the library dependencies required to run the application, since we want vulncov to run coverage for us.
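The effect of an exclusion regex such as `-e django` can be sketched as follows. This assumes the pattern is searched anywhere in the rule ID (`re.search` semantics), which may not be exactly how vulncov matches it; `exclude_rules` is a hypothetical helper, and the list is a subset of the rule IDs reported by Semgrep above.

```python
import re

# A few of the rule IDs reported by Semgrep for the demo app.
check_ids = [
    "python.django.security.injection.sql.sql-injection-using-db-cursor-execute.sql-injection-db-cursor-execute",
    "python.flask.security.injection.tainted-sql-string.tainted-sql-string",
    "python.lang.security.dangerous-system-call.dangerous-system-call",
    "python.django.security.injection.raw-html-format.raw-html-format",
]


def exclude_rules(ids: list[str], pattern: str) -> list[str]:
    """Keep only the rule IDs that do NOT match the exclusion regex."""
    regex = re.compile(pattern)
    return [rule_id for rule_id in ids if not regex.search(rule_id)]


remaining = exclude_rules(check_ids, "django")
for rule_id in remaining:
    print(rule_id)
```

Note that the regex alone does not remove dangerous-system-call, since it is a generic (lang) rule. That finding is dropped later because no test ever executes the code it points to.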

Inspecting the output JSON shows that vulncov filtered the initial 10 Semgrep findings down to just 3! 🎉

cat /tmp/vulncov.json | jq | grep check_id
        "check_id": "python.flask.security.injection.tainted-sql-string.tainted-sql-string",
        "check_id": "python.flask.db.generic-sql-flask.generic-sql-flask",
        "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",

It omitted the findings located in dead code, like dangerous-system-call and the second directly-returned-format-string hit (both inside ping()), and it dropped the Django rules excluded with -e django.

The output includes information on which test cases can trigger each vulnerability. Let's review an example:

{
  "semgrep": {
    "fingerprint": "babc5e12b8a3765aa6b292fbc07947825755a8ef203a0ef83775983593273e5596e3ce2f25dde92ecdaf40847a576c05508c6cb50f0fd37c61cfad9c5e8f2146_0",
    "check_id": "python.flask.security.audit.directly-returned-format-string.directly-returned-format-string",
    "rule_category": "security",
    "vulnerability_class": [
      "Cross-Site-Scripting (XSS)"
    ],
    "impact": "MEDIUM",
    "message": "Detected Flask route directly returning a formatted string. This is subject to cross-site scripting if user input can reach the string. Consider using the template engine instead and rendering pages with 'render_template()'.",
    "lines": "        return f\"Welcome {username}!\"",
    "vuln_lines": [
      43
    ]
  },
  "test_cases": [
    {
      "name": "login_test.test_login_success",
      "executed_lines": [
        11,
        12,
        24,
        25,
        28,
        31,
        32,
        35,
        36,
        38,
        39,
        41,
        43
      ],
      "matched_lines": [
        43
      ],
      "coverage_match_percentage": 100.0
    }
  ]
}

For a description of each field, see the Output JSON structure page.

As you can see, this third item has only one test case associated with it, login_test.test_login_success, while the others also have login_test.test_login_failure. This is because line 43 contains a potential XSS vulnerability, triggered only when a login is successful:

    if user:
        # Vulnerable to XSS
        return f"Welcome {username}!"
    else:
        return "Invalid credentials!", 403

This way, vulncov also provides a clue on how to trigger each Semgrep finding, as we have the corresponding test case. 🤓
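The link between a finding and its test cases can be sketched as a simple intersection between the lines Semgrep flagged and the lines each test executed. This is only an approximation of the idea, not vulncov's actual implementation: `match_coverage` is a hypothetical helper, the percentage formula (matched lines over vulnerable lines) is an assumption inferred from the sample output, and the executed lines for test_login_failure are made up.

```python
def match_coverage(vuln_lines: list[int], executed_lines: list[int]) -> tuple[list[int], float]:
    """Return the vulnerable lines a test executed and the share of them it covered."""
    executed = set(executed_lines)
    matched = [line for line in vuln_lines if line in executed]
    percentage = 100.0 * len(matched) / len(vuln_lines) if vuln_lines else 0.0
    return matched, percentage


vuln_lines = [43]  # from the Semgrep finding above

executed_by_test = {
    # Taken from the sample output above.
    "login_test.test_login_success": [11, 12, 24, 25, 28, 31, 32, 35, 36, 38, 39, 41, 43],
    # Hypothetical: a failed login would skip line 43 and hit the else branch instead.
    "login_test.test_login_failure": [11, 12, 24, 25, 28, 31, 32, 35, 36, 38, 39, 41, 45],
}

for name, executed_lines in executed_by_test.items():
    matched, percentage = match_coverage(vuln_lines, executed_lines)
    print(name, matched, percentage)
```

Under these assumptions only test_login_success reaches line 43, which is why it is the only test case attached to the XSS finding.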

Suggesting Fixes with the AppSec Wizard 🧙‍♂️

To get private bug fixes powered by a local LLM, you need to install ollama and pull a model.

In this case, I will be using codellama:latest:

ollama pull codellama:latest

Serve the model by running:

ollama serve
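Optionally, you can check that the server is up and that the model was pulled before going further. A minimal sketch, assuming Ollama's `/api/tags` endpoint (which lists the local models) and the default port; the helper names `fetch_tags` and `model_is_available` are mine and not part of vulncov or ollama.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Ask the Ollama server for its list of local models (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


def model_is_available(tags_payload: dict, model: str) -> bool:
    """Check whether a model name appears in an /api/tags response."""
    return any(entry.get("name") == model for entry in tags_payload.get("models", []))


# Usage, with `ollama serve` running in another terminal:
#   model_is_available(fetch_tags(), "codellama:latest")
```

If the call raises a connection error, the server is not running or is listening on a different address.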

Finally, in a separate terminal, run vulncov, passing the URL of the ollama instance (it runs on port 11434 by default):

vulncov -p demo/tests -t demo/src/ -req demo/requirements.txt -lls http://localhost:11434

This way, the generated JSON output will now include a field called llm_suggested_fix. Let's see an example for SQL Injection:

Code fix:
```
query = f"SELECT * FROM users WHERE username=? AND password=?"
cursor.execute(query, (username, password))
```

Fix description:
The vulnerability is caused by manually constructing a SQL string using user input. This allows an attacker to inject malicious SQL code that can compromise the database. To fix this vulnerability, we should use parameterized queries instead of concatenating user input into the SQL string. In Python, parameterized queries are available by default in most database engines.

In the suggested fix, we replace the manual construction of the SQL string with a parameterized query. The `?` placeholder is used to represent the user input, and the `cursor.execute()` method takes care of substituting the values for the placeholders. This ensures that any malicious SQL code in the user input is properly escaped and cannot be executed.

Additionally, we can also consider using an object-relational mapper (ORM) such as SQLAlchemy which will protect our queries from SQL injection attacks.
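
To see the difference the suggested fix makes, here is a small self-contained sketch that runs the original query and the parameterized one against the same injection payload. It assumes SQLite (the `?` placeholder style matches the sqlite3 module, but the demo app's actual database setup is not shown on this page), and the table contents and function names are made up for illustration.

```python
import sqlite3

# In-memory database with a single made-up user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "s3cret"))


def login_vulnerable(username: str, password: str):
    # Same pattern as the demo app: user input is pasted into the SQL text.
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    return conn.execute(query).fetchone()


def login_parameterized(username: str, password: str):
    # Values are bound separately, so they are always treated as data.
    query = "SELECT * FROM users WHERE username=? AND password=?"
    return conn.execute(query, (username, password)).fetchone()


payload = "' OR '1'='1"
print(login_vulnerable("alice", payload))     # the row is returned without the real password
print(login_parameterized("alice", payload))  # None: the payload is just a wrong password
print(login_parameterized("alice", "s3cret")) # the row is returned for valid credentials
```

With the parameterized version, the payload never becomes part of the SQL statement, so the bypass no longer works.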