
Nosey Parker v0.16 and v0.22 support for scans with and without git history #11615

Closed · wants to merge 3 commits
12 changes: 12 additions & 0 deletions docs/content/en/changelog/changelog.md
@@ -7,6 +7,14 @@ Here are the release notes for **DefectDojo Pro (Cloud Version)**. These release …

For Open Source release notes, please see the [Releases page on GitHub](https://github.com/DefectDojo/django-DefectDojo/releases), or alternatively consult the Open Source [upgrade notes](../../open_source/upgrading/upgrading_guide).

## Jan 21, 2025: v2.42.2

- **(Classic UI)** Corrected link to Smart Upload form.
- **(CLI Tools)** Fixed an issue where the `.exe` extension was not added to Windows binaries.
- **(Findings)** `Mitigated` filter now uses datetime instead of date for filtering.
- **(OAuth)** Clarified Azure AD labels to better align with Azure's language. Default value for Azure Resource is now set. <span style="background-color:rgba(242, 86, 29, 0.5)">(Pro)</span>
- **(RBAC)** Request Review now applies RBAC properly with regard to User Groups.

## Jan 13, 2025: v2.42.1

- **(API)** Pro users can now specify the fields they want to return in a given API payload. For example, this request will only return the title, severity and description fields for each Finding. <span style="background-color:rgba(242, 86, 29, 0.5)">(Pro)</span>
```
curl -X 'GET' \
  'https://localhost/api/v2/findings/?response_fields=title,severity,description' \
  -H 'accept: application/json'
```
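The same `response_fields` filter can be exercised from Python. This is a minimal sketch: the host matches the curl example above, but the URL-building helper and its name are illustrative assumptions, not part of the DefectDojo client API.

```python
# Sketch of building a Pro `response_fields` request URL from Python.
# `findings_url` is a hypothetical helper, not a DefectDojo API.
from urllib.parse import urlencode

BASE = "https://localhost/api/v2/findings/"


def findings_url(fields):
    """Build a findings URL that asks the API to return only the given fields."""
    # urlencode percent-encodes the comma separator (%2C), which the API
    # treats the same as a literal comma.
    return BASE + "?" + urlencode({"response_fields": ",".join(fields)})


url = findings_url(["title", "severity", "description"])
print(url)
```

Sending this URL with any HTTP client (as in the curl example) returns each Finding trimmed to just those three fields.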
- **(Findings)** Excel and CSV exports now include tags.
- **(Reports)** Reports now exclude unenforced SLAs from Executive Summary to avoid confusion.
- **(Risk Acceptance)** Simple Risk Acceptances now have a 'paper trail' created - when they are added or removed, a note will be added to the Finding to log the action.
- **(Tools)** ImageTags are now included with the AWS SecurityHub and AWS Inspector parsers.

## Jan 6, 2025: v2.42.0

8 changes: 4 additions & 4 deletions docs/package-lock.json


2 changes: 1 addition & 1 deletion docs/package.json
@@ -21,7 +21,7 @@
},
"devDependencies": {
"prettier": "^3.3.3",
-    "vite": "^6.0.0"
+    "vite": "^6.0.9"
},
"engines": {
"node": ">=20.11.0"
193 changes: 133 additions & 60 deletions dojo/tools/noseyparker/parser.py
@@ -17,85 +17,158 @@ def get_label_for_scan_types(self, scan_type):

def get_description_for_scan_types(self, scan_type):
return "Nosey Parker report file can be imported in JSON Lines format (option --jsonl). " \
-            "Supports v0.16.0 of https://github.com/praetorian-inc/noseyparker"
+            "Supports v0.16.0 and v0.22.0 of https://github.com/praetorian-inc/noseyparker"
**manuel-sommer** (Contributor) commented on Jan 22, 2025:

> Hi @Himan10,
> I suggest closing this PR; please review mine instead. Reviews are welcome and I will add your suggestions. This PR also targets the `master` branch, which is not the right target.

**Himan10** (Author) replied:

> Okay, but could you give me permission to push changes to your PR? It looks like I don't have permission. I did review your changes, though; they were failing on scans where `--git-history` was set to none.

**manuel-sommer** (Contributor) replied:

> Just add a comment in the code inside the PR and submit the review (do a code review).

A Contributor replied:

> @manuel-sommer 👍 on keeping the original #11565 open and closing this one.

    def get_findings(self, file, test):
        """
        Returns findings from a JSON Lines file and uses filters
        to skip findings and determine severity
        """
-       dupes = {}
+       self.dupes = {}
        # Turn the JSONL file into usable dicts
        if file is None:
            return None
        if file.name.lower().endswith(".jsonl"):
            # Process JSON lines into dicts
            data = [json.loads(line) for line in file]

            # Check for an empty file
            if not data or len(data[0]) == 0:
                return []

            # Parse through each secret in each JSON line
            for line in data:
-               # Set rule to the current secret type (e.g. AWS S3 Bucket)
-               try:
-                   rule_name = line["rule_name"]
-                   secret = line["match_content"]
-               except Exception:
-                   msg = "Invalid Nosey Parker data, make sure to use Nosey Parker v0.16.0"
+               if line.get("rule_name") is not None and line.get("match_content") is not None:
+                   self.version_0_16_0(line, test)
+               elif line.get("rule_name") is not None and line.get("finding_id") is not None:
+                   self.version_0_22_0(line, test)
+               else:
+                   msg = "Invalid Nosey Parker data, make sure to use Nosey Parker v0.16.0 and above"
                    raise ValueError(msg)
-
-               # Set Finding details
-               for match in line["matches"]:
-                   # The following path is to account for the variability in the JSON lines output
-                   num_elements = len(match["provenance"]) - 1
-                   json_path = match["provenance"][num_elements]
-
-                   title = f"Secret(s) Found in Repository with Commit ID {json_path['commit_provenance']['commit_metadata']['commit_id']}"
-                   filepath = json_path["commit_provenance"]["blob_path"]
-                   line_num = match["location"]["source_span"]["start"]["line"]
-                   description = f"Secret found of type: {rule_name} \n" \
-                       f"SECRET starts with: '{secret[:3]}' \n" \
-                       f"Committer Name: {json_path['commit_provenance']['commit_metadata']['committer_name']} \n" \
-                       f"Committer Email: {json_path['commit_provenance']['commit_metadata']['committer_email']} \n" \
-                       f"Commit ID: {json_path['commit_provenance']['commit_metadata']['commit_id']} \n" \
-                       f"Location: {filepath} line #{line_num} \n" \
-                       f"Line #{line_num} \n"
-
-                   # Internal de-duplication
-                   key = hashlib.md5((filepath + "|" + secret + "|" + str(line_num)).encode("utf-8")).hexdigest()
-
-                   # If the secret already exists with the same filepath/secret/linenum
-                   if key in dupes:
-                       finding = dupes[key]
-                       finding.nb_occurences += 1
-                       dupes[key] = finding
-                   else:
-                       dupes[key] = True
-                       # Create Finding object
-                       finding = Finding(
-                           test=test,
-                           cwe=798,
-                           title=title,
-                           description=description,
-                           severity="High",
-                           mitigation="Reset the account/token and remove from source code. Store secrets/tokens/passwords in secret managers or secure vaults.",
-                           date=datetime.today().strftime("%Y-%m-%d"),
-                           verified=False,
-                           active=True,
-                           is_mitigated=False,
-                           file_path=filepath,
-                           line=line_num,
-                           static_finding=True,
-                           nb_occurences=1,
-                           dynamic_finding=False,
-                       )
-                       dupes[key] = finding
        else:
-           msg = "JSON lines format not recognized (.jsonl file extension). Make sure to use Nosey Parker v0.16.0"
+           msg = "JSON lines format not recognized (.jsonl file extension). Make sure to use Nosey Parker v0.16.0 and above"
            raise ValueError(msg)

-       return list(dupes.values())
+       return list(self.dupes.values())
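The dispatch above keys on which fields appear in each JSONL record: v0.16 output carries `match_content` at the top level, while v0.22 output carries `finding_id` instead. A standalone sketch of that detection logic follows; the sample records are trimmed, hypothetical Nosey Parker output, not full reports.

```python
import json


def detect_report_version(record):
    """Mirror the parser's dispatch: classify one Nosey Parker JSONL record."""
    if record.get("rule_name") is not None and record.get("match_content") is not None:
        return "0.16"
    if record.get("rule_name") is not None and record.get("finding_id") is not None:
        return "0.22"
    raise ValueError("Invalid Nosey Parker data, make sure to use Nosey Parker v0.16.0 and above")


# Trimmed, hypothetical JSONL lines for each format
v16_line = json.loads('{"rule_name": "Generic API Key", "match_content": "AKIA_example", "matches": []}')
v22_line = json.loads('{"rule_name": "Generic API Key", "finding_id": "abc123", "matches": []}')

print(detect_report_version(v16_line))
print(detect_report_version(v22_line))
```

A record with neither discriminator field raises the same `ValueError` the parser uses, which surfaces as an import error in DefectDojo.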

    def version_0_16_0(self, line, test):
        """Support Nosey Parker v0.16 reports generated with or without git history"""
        rule_name = line["rule_name"]
        for match in line["matches"]:
            # The following path is to account for the variability in the JSON lines output
            secret = match.get("snippet", {}).get("matching")
            num_elements = len(match["provenance"]) - 1
            json_path = match["provenance"][num_elements]
            line_num = match["location"]["source_span"]["start"]["line"]
            # v0.16 output previously showed only the first 3 characters (secret[:3])
            description = f"Secret found of type: {rule_name} \n" \
                f"SECRET starts with: '{secret}' \n" \
                f"Line #{line_num} \n"

            # Check whether json_path contains commit history (set by a --git-history scan)
            if not json_path.get("commit_provenance"):
                # Scanned without git history
                title = "Secret(s) Found in Repository"
                filepath = json_path["path"]
                description += f"Location: {filepath} line #{line_num} \n"
            else:
                # Scanned with git history
                title = f"Secret(s) Found in Repository with Commit ID {json_path['commit_provenance']['commit_metadata']['commit_id']}"
                filepath = json_path["commit_provenance"]["blob_path"]
                description += f"Committer Name: {json_path['commit_provenance']['commit_metadata']['committer_name']} \n" \
                    f"Committer Email: {json_path['commit_provenance']['commit_metadata']['committer_email']} \n" \
                    f"Commit ID: {json_path['commit_provenance']['commit_metadata']['commit_id']} \n" \
                    f"Location: {filepath} line #{line_num} \n" \
                    f"Line #{line_num} \n"

            # Internal de-duplication
            key = hashlib.md5((filepath + "|" + secret + "|" + str(line_num)).encode("utf-8")).hexdigest()

            # If the secret already exists with the same filepath/secret/linenum
            if key in self.dupes:
                finding = self.dupes[key]
                finding.nb_occurences += 1
                self.dupes[key] = finding
            else:
                # Create Finding object
                finding = Finding(
                    test=test,
                    cwe=798,
                    title=title,
                    description=description,
                    severity="High",
                    mitigation="Reset the account/token and remove from source code. Store secrets/tokens/passwords in secret managers or secure vaults.",
                    date=datetime.today().strftime("%Y-%m-%d"),
                    verified=False,
                    active=True,
                    is_mitigated=False,
                    file_path=filepath,
                    line=line_num,
                    static_finding=True,
                    nb_occurences=1,
                    dynamic_finding=False,
                )
                self.dupes[key] = finding
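The internal de-duplication above hashes filepath, secret, and line number, so repeated matches of the same secret collapse into one finding with an incremented `nb_occurences`. A standalone sketch of that keying scheme, with plain dicts standing in for `Finding` objects and invented file paths and secrets:

```python
import hashlib


def dedupe_key(filepath, secret, line_num):
    """Same keying scheme as the parser: md5 over filepath|secret|line."""
    return hashlib.md5((filepath + "|" + secret + "|" + str(line_num)).encode("utf-8")).hexdigest()


dupes = {}
matches = [
    ("src/config.py", "hunter2", 10),
    ("src/config.py", "hunter2", 10),  # duplicate occurrence of the same match
    ("src/other.py", "hunter2", 3),
]
for filepath, secret, line_num in matches:
    key = dedupe_key(filepath, secret, line_num)
    if key in dupes:
        # Same filepath/secret/line already seen: bump the counter
        dupes[key]["nb_occurences"] += 1
    else:
        dupes[key] = {"file_path": filepath, "line": line_num, "nb_occurences": 1}

print(len(dupes))  # two distinct findings
```

Note that `version_0_22_0` below keys on `rule_text_id` rather than the secret itself, so in v0.22 reports two different secrets of the same rule type on the same file line would also collapse.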

    def version_0_22_0(self, line, test):
        """Support Nosey Parker v0.22 reports generated with or without git history"""
        rule_name = line["rule_name"]
        rule_text_id = line["rule_text_id"]
        for match in line["matches"]:
            # The following path is to account for the variability in the JSON lines output
            secret = match.get("snippet", {}).get("matching")
            num_elements = len(match["provenance"]) - 1
            json_path = match["provenance"][num_elements]
            line_num = match["location"]["source_span"]["start"]["line"]
            description = f"Secret found of type: {rule_name} \n" \
                f"SECRET: '{secret}' \n" \
                f"Line #{line_num} \n"

            if not json_path.get("first_commit"):
                # Scanned without git history
                title = "Secret(s) Found in Repository"
                filepath = json_path["path"]
                description += f"Location: {filepath} line #{line_num} \n"
            else:
                # Scanned with git history
                title = f"Secret(s) Found in Repository with Commit ID {json_path['first_commit']['commit_metadata']['commit_id']}"
                filepath = json_path["first_commit"]["blob_path"]
                description += f"Committer Name: {json_path['first_commit']['commit_metadata']['committer_name']} \n" \
                    f"Committer Email: {json_path['first_commit']['commit_metadata']['committer_email']} \n" \
                    f"Commit ID: {json_path['first_commit']['commit_metadata']['commit_id']} \n"

            # Internal de-duplication
            key = hashlib.md5((filepath + "|" + rule_text_id + "|" + str(line_num)).encode("utf-8")).hexdigest()

            # If the secret already exists with the same filepath/rule/linenum
            if key in self.dupes:
                finding = self.dupes[key]
                finding.nb_occurences += 1
                self.dupes[key] = finding
            else:
                # Create Finding object
                finding = Finding(
                    test=test,
                    cwe=798,
                    title=title,
                    description=description,
                    severity="High",
                    mitigation="Reset the account/token and remove from source code. Store secrets/tokens/passwords in secret managers or secure vaults.",
                    date=datetime.today().strftime("%Y-%m-%d"),
                    verified=False,
                    active=True,
                    is_mitigated=False,
                    file_path=filepath,
                    line=line_num,
                    static_finding=True,
                    nb_occurences=1,
                    dynamic_finding=False,
                )
                self.dupes[key] = finding
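In the v0.22 format, a provenance entry either carries a plain `path` (filesystem scan, no `--git-history`) or a `first_commit` block (git-history scan). This sketch isolates how the branch above picks a title and file path from each shape; the provenance records are trimmed, hypothetical examples, not complete Nosey Parker output.

```python
def extract_location(json_path):
    """Mirror the v0.22 branch: pick (title, filepath) from one provenance entry."""
    if not json_path.get("first_commit"):
        # Scanned without git history: provenance is just a filesystem path
        return "Secret(s) Found in Repository", json_path["path"]
    # Scanned with git history: provenance points at a blob in a commit
    commit_id = json_path["first_commit"]["commit_metadata"]["commit_id"]
    return (
        f"Secret(s) Found in Repository with Commit ID {commit_id}",
        json_path["first_commit"]["blob_path"],
    )


# Trimmed, hypothetical provenance records for each scan mode
fs_prov = {"path": "app/settings.py"}
git_prov = {
    "first_commit": {
        "blob_path": "app/settings.py",
        "commit_metadata": {"commit_id": "deadbeef"},
    }
}

print(extract_location(fs_prov))
print(extract_location(git_prov))
```

This is the check that was failing in the earlier v0.16-only parser when `--git-history` was set to none, since that code unconditionally read `commit_provenance`.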


