Handle sensitive information being inside a list in resource_dict #2178

dbasunag · 2024-10-23T19:40:32Z

Short description:

The current hash_resource_dict() code does not provide a flexible way to hide fields such as userData for virtual machines, where the field can sit inside a list (for example, under spec.template.spec.volumes).
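
To make the problem concrete, here is a minimal sketch, assuming a trimmed VirtualMachine manifest. The manifest contents and the helper name `mask_key` are hypothetical and not taken from the PR. It shows userData sitting inside the volumes list, so masking has to descend through list elements to reach it.

```python
from typing import Any

MASK = "******"  # mask string as it appears in the review snippets of this thread

# Hypothetical, trimmed VirtualMachine manifest; only the shape matters here.
vm_dict: dict[str, Any] = {
    "kind": "VirtualMachine",
    "spec": {
        "template": {
            "spec": {
                "volumes": [
                    {"name": "rootdisk", "containerDisk": {"image": "example/image"}},
                    {"name": "cloudinit", "cloudInitNoCloud": {"userData": "password: s3cret"}},
                ]
            }
        }
    },
}


def mask_key(obj: Any, key_name: str) -> Any:
    """Return a copy of obj with every value stored under key_name masked."""
    if isinstance(obj, dict):
        return {
            key: MASK if key == key_name else mask_key(value, key_name)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        # Lists are the case the PR title is about: descend into each element.
        return [mask_key(item, key_name) for item in obj]
    return obj


masked = mask_key(vm_dict, "userData")
volumes = masked["spec"]["template"]["spec"]["volumes"]
print(volumes[1]["cloudInitNoCloud"]["userData"])
```

Because the helper builds new containers instead of mutating its input, the original `vm_dict` keeps its real value, which is convenient when the masked copy is only meant for logging.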

More details:

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for reviewer:

Bug:

Summary by CodeRabbit

New Features
- Introduced a new function to securely replace sensitive values in nested dictionaries.
- Updated key representation format for resource management.
- Added a new property to access user data in virtual machine configurations.
Bug Fixes
- Enhanced error handling for invalid inputs in the new function.
Tests
- Implemented a comprehensive test suite for the new sensitive value replacement function.
Chores
- Added a new testing environment configuration for unit tests.

redhat-qe-bot2 · 2024-10-23T19:40:40Z

Report bugs in Issues

The following are automatically added:

Add reviewers from OWNER file (in the root of the repository) under reviewers section.
Set PR size label.
New issue is created for the PR. (Closed when PR is merged/closed)
Run pre-commit if .pre-commit-config.yaml exists in the repo.

Available user actions:

To mark the PR as WIP, comment /wip on the PR; to remove the mark, comment /wip cancel.
To block merging of the PR, comment /hold; to unblock merging, comment /hold cancel.
To mark the PR as verified, comment /verified; to un-verify, comment /verified cancel.
The verified label is removed on each new commit push.
To cherry-pick a merged PR, comment /cherry-pick <target branch to cherry-pick to> on the PR.
- Multiple target branches can be cherry-picked, separated by spaces. (/cherry-pick branch1 branch2)
- Cherry-pick will be started when the PR is merged
To build and push a container image, comment /build-and-push-container on the PR (the tag will be the PR number).
- You can add extra args to the Podman build command
  - Example: /build-and-push-container --build-arg OPENSHIFT_PYTHON_WRAPPER_COMMIT=<commit_hash>
To add a label by comment, use /<label name>; to remove it, use /<label name> cancel
To assign reviewers based on the OWNERS file, use /assign-reviewers
To check whether the PR can be merged, use /check-can-merge

Supported /retest check runs

/retest tox: Retest tox
/retest python-module-install: Retest python-module-install
/retest all: Retest all

Supported labels

hold
verified
wip
lgtm

coderabbitai · 2024-10-23T19:40:41Z

Walkthrough

The changes in this pull request introduce a new function, replace_key_with_hashed_value, which recursively searches nested dictionaries to replace specified key values with a hashed representation. The hash_resource_dict method in the Resource class has been updated to utilize this new function, simplifying the process of masking sensitive information. Additionally, the keys_to_hash property has been modified in multiple classes to reflect a new keypath format. A new testing environment has been added to tox.toml, and a test suite for the new function has been implemented in tests/test_unittests.py.
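
As an illustration of what a keypath-driven masker could look like, here is a minimal sketch. The `>` separator comes from the "spec>data" examples in the change summary; the function name `mask_keypath` and the `[]` suffix for "this key holds a list" are assumptions made for this example and are not confirmed by anything in this thread.

```python
from typing import Any

MASK = "******"


def mask_keypath(obj: Any, keypath: str, mask: str = MASK) -> Any:
    """Mask the value found at a '>'-separated keypath, modifying obj in place.

    A segment ending in '[]' (an assumed convention) means the key holds a
    list and the rest of the path is applied to every element of that list.
    """
    if not isinstance(obj, dict) or not keypath:
        return obj

    head, _, rest = keypath.partition(">")
    is_list = head.endswith("[]")
    key = head[:-2] if is_list else head

    if key not in obj:
        return obj  # path does not exist in this resource; nothing to mask

    if is_list:
        if isinstance(obj[key], list) and rest:
            for item in obj[key]:
                mask_keypath(item, rest, mask)
    elif rest:
        mask_keypath(obj[key], rest, mask)
    else:
        obj[key] = mask  # last segment reached: replace the value
    return obj


# Hypothetical resources used only to exercise the sketch.
secret = {"spec": {"data": {"token": "abc"}, "template": {"data": "keep"}}}
mask_keypath(secret, "spec>data")

vm = {"spec": {"volumes": [{"name": "disk"}, {"cloudInitNoCloud": {"userData": "pw"}}]}}
mask_keypath(vm, "spec>volumes[]>cloudInitNoCloud>userData")
```

Compared with matching on a bare key name, an explicit path only touches the value at that location, so a second `data` key elsewhere in the resource is left alone.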

Changes

| File | Change Summary |
| --- | --- |
| ocp_resources/resource.py | Added method `replace_key_with_hashed_value`; updated `hash_resource_dict` method to use the new function; updated `keys_to_hash` property to change keypath format. |
| ocp_resources/sealed_secret.py | Updated `keys_to_hash` property to change return values from `["spec..data", "spec..encryptedData"]` to `["spec>data", "spec>encryptedData"]`. |
| ocp_resources/virtual_machine.py | Added new property method `keys_to_hash` returning a specific path for user data in virtual machine configuration. |
| tox.toml | Added new testing environment `validate-unittests` with dependencies and commands for running unit tests. |
| tests/test_unittests.py | Introduced a test suite for `replace_key_with_hashed_value` function, including various tests for functionality and edge cases. |

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1ed9e74 and 9a37a9b.

📒 Files selected for processing (2)

tests/test_unittests.py (1 hunks)
tox.toml (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_unittests.py
tox.toml

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

ocp_resources/virtual_machine.py (1)

179-181: LGTM! Consider adding documentation.

The implementation correctly identifies the sensitive field that needs to be masked. Consider adding a docstring to explain the purpose of this property and its security implications.
 @property
 def keys_to_hash(self):
+    """
+    Returns a list of keys containing sensitive information that should be masked.
+    
+    Returns:
+        list[str]: List of keys to be masked in the resource dictionary.
+        Currently masks 'userData' which contains sensitive boot-time configuration.
+    """
     return ["userData"]

ocp_resources/resource.py (1)

136-156: LGTM with suggestions for improvements.

The implementation correctly handles nested dictionaries and lists. However, consider these improvements:

- Add protection against circular references to prevent stack overflow
- Consider making the mask value configurable for flexibility

Here's a suggested improvement:

-def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
+def change_dict_value_to_hashed(
+    resource_dict: Dict[Any, Any],
+    key_name: str,
+    mask: str = "******",
+    _seen: set | None = None
+) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
+        mask: The value to use for masking sensitive data.
+        _seen: Internal parameter to track circular references.
 
     Returns:
         The modified dictionary.
     """
+    if _seen is None:
+        _seen = set()
+
+    # Handle circular references
+    if id(resource_dict) in _seen:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
+        _seen.add(id(resource_dict))
         for key, value in resource_dict.items():
             if key == key_name:
-                resource_dict[key] = "******"
+                resource_dict[key] = mask
             elif isinstance(value, dict):
-                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
+                resource_dict[key] = change_dict_value_to_hashed(value, key_name, mask, _seen)
             elif isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    value[key_list] = change_dict_value_to_hashed(value_list, key_name, mask, _seen)
     return resource_dict
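
For readers who want to try the suggestion, here is a runnable consolidation of the diff above, written as a sketch and not as the code that was eventually merged. It keeps the function name and the `mask` and `_seen` parameters from the suggestion; the sample `resource` dictionary is hypothetical and contains a deliberate self-reference.

```python
from typing import Any, Optional


def change_dict_value_to_hashed(
    resource_dict: Any,
    key_name: str,
    mask: str = "******",
    _seen: Optional[set] = None,
) -> Any:
    """Mask every value stored under key_name, in place, tolerating cycles."""
    if _seen is None:
        _seen = set()

    if isinstance(resource_dict, dict):
        # A dict already visited means a circular reference: stop descending.
        if id(resource_dict) in _seen:
            return resource_dict
        _seen.add(id(resource_dict))

        for key, value in resource_dict.items():
            if key == key_name:
                resource_dict[key] = mask
            elif isinstance(value, dict):
                change_dict_value_to_hashed(value, key_name, mask, _seen)
            elif isinstance(value, list):
                for index, item in enumerate(value):
                    value[index] = change_dict_value_to_hashed(item, key_name, mask, _seen)
    return resource_dict


# Hypothetical input with a cycle: resource["a"]["self"] points back at resource.
resource = {"a": {"password": "x", "items": [{"password": "y"}, "plain"]}}
resource["a"]["self"] = resource

change_dict_value_to_hashed(resource, "password")
print(resource["a"]["password"], resource["a"]["items"][0]["password"])
```

Without the `_seen` guard, the self-reference would make the recursion descend forever and end in a RecursionError. Like the suggestion it is based on, this sketch does not descend into a list nested directly inside another list.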

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0c5830d and dafe9d0.

📒 Files selected for processing (3)

ocp_resources/resource.py (2 hunks)
ocp_resources/sealed_secret.py (1 hunks)
ocp_resources/virtual_machine.py (1 hunks)

🧰 Additional context used

🔇 Additional comments (3)

ocp_resources/sealed_secret.py (1)
53-53: Verify the impact of simplified key paths.

The simplified key paths in keys_to_hash might affect how sensitive data is masked in the resource dictionary. Let's verify this change works correctly with the updated hash_resource_dict implementation.

Consider adding a docstring to the keys_to_hash property to document:

The expected format of the keys

How these keys are used in the resource dictionary masking process

Why the "spec.." prefix was removed

This will help future maintainers understand the design decision and prevent accidental reintroduction of the prefix.

✅ Verification successful

Let me gather more information about how the key paths are actually processed in the codebase.

This will help us understand:

How the key paths are processed in the hashing function

If there are tests verifying this functionality

Where the SealedSecret class is used in the codebase

Let me check one more thing to verify the test coverage and actual usage patterns.

This will help us understand:

The test coverage for this functionality

The complete context of SealedSecret class

How these specific keys are actually used in hashing

Based on the gathered information, I can now provide the final response:

Simplified key paths in keys_to_hash are correct and safe

The change from ["spec..data", "spec..encryptedData"] to ["data", "encryptedData"] is correct because:

The change_dict_value_to_hashed function performs a recursive dictionary search for exact key matches, making the "spec.." prefix unnecessary

The SealedSecret class directly accepts data and encrypted_data as top-level parameters in its constructor

The implementation aligns with the actual structure of the resource where sensitive data is stored

The change maintains security while simplifying the implementation, as the hashing mechanism in hash_resource_dict will still mask all sensitive information correctly.
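
The point about exact key matches can be checked with a small sketch. It assumes a name-only recursive search as described above; the helper `mask_keys` and the SealedSecret-shaped dictionary are hypothetical and exist only for this illustration.

```python
from typing import Any

MASK = "******"


def mask_keys(obj: Any, key_names: set) -> Any:
    """Return a copy of obj with the value of every key in key_names masked."""
    if isinstance(obj, dict):
        return {
            key: MASK if key in key_names else mask_keys(value, key_names)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [mask_keys(item, key_names) for item in obj]
    return obj


# Hypothetical resource: sensitive keys live at two different depths.
sealed_secret = {
    "kind": "SealedSecret",
    "spec": {
        "encryptedData": {"password": "AgB-example"},
        "template": {"metadata": {"name": "db-creds"}, "data": {"note": "plain"}},
    },
}

masked = mask_keys(sealed_secret, {"data", "encryptedData"})
print(masked["spec"]["encryptedData"], masked["spec"]["template"]["data"])
```

No "spec.." prefix is needed because the search never looks at where a key sits. The flip side is that every key with a listed name is masked wherever it appears, which may or may not be what a given resource wants.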
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the usage of keys_to_hash in the codebase
# and ensure it's compatible with the new implementation

# Check if there are any direct references to the old format "spec..data"
echo "Checking for old key format usage..."
rg -l "spec\.\.data|spec\.\.encryptedData" 

# Look for other implementations of keys_to_hash to ensure consistency
echo "Checking other keys_to_hash implementations..."
ast-grep --pattern 'def keys_to_hash'

# Check the Resource class implementation that uses these keys
echo "Checking Resource class implementation..."
rg -A 10 "def hash_resource_dict"
Length of output: 1254

Script:
#!/bin/bash
# Check the implementation of change_dict_value_to_hashed function
echo "Checking change_dict_value_to_hashed implementation..."
rg -A 15 "def change_dict_value_to_hashed"

# Check if there are any tests that verify this functionality
echo "Checking related tests..."
fd -e py test_sealed_secret test_resource --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the actual usage of SealedSecret class
echo "Checking SealedSecret usage..."
rg -l "class SealedSecret"
Length of output: 1542

Script:
#!/bin/bash
# Check for test files with correct path
echo "Checking test files..."
fd -e py -g "*test*" --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the complete SealedSecret class implementation
echo "Checking SealedSecret class implementation..."
rg -A 20 "class SealedSecret" ocp_resources/sealed_secret.py

# Check actual usage of hash_resource_dict with these keys
echo "Checking actual usage patterns..."
rg -B 5 -A 5 "hash_resource_dict.*data|hash_resource_dict.*encryptedData"
Length of output: 1177
ocp_resources/resource.py (2)
1198-1198: LGTM! Clean integration of the new function.

The change correctly integrates the new change_dict_value_to_hashed function while maintaining the existing behavior and safety measures (deep copy, condition checks).

136-156: Verify the impact on existing consumers.

Let's verify the impact of these changes on existing code.

Also applies to: 1198-1198

✅ Verification successful

Let me gather more information about the keys_to_hash property since the previous search didn't show its complete usage pattern.

This will help us understand:

Where and how keys_to_hash is defined and used

The usage pattern of hash_log_data which controls the hashing behavior

Any potential impact on existing implementations

Based on the gathered information, I can now provide a final response as we have a clear picture of the usage and impact:

The changes to change_dict_value_to_hashed are safe and properly integrated

The function is used within a well-defined pattern where:

It's only called from the base Resource class's hash_resource_dict method

The hashing behavior is controlled by two factors:

The hash_log_data boolean parameter (defaults to True)

Resource-specific keys_to_hash property implementations

Only specific resource types implement keys_to_hash:

ConfigMap: ["data", "binaryData"]

VirtualMachine: ["userData"]

SealedSecret: ["data", "encryptedData"]

Secret: ["data", "stringData"]

The changes maintain the existing behavior while improving the implementation, and all current consumers are properly integrated with this pattern.
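
The pattern described above (a hash_log_data switch plus a per-resource keys_to_hash property, applied to a deep copy) can be sketched as follows. The class names `BaseResource` and `SecretLike` and the helper `_mask_in_place` are hypothetical stand-ins, not the wrapper's real classes, and the body of `hash_resource_dict` is an assumption based only on the description in this review.

```python
from copy import deepcopy
from typing import Any


def _mask_in_place(obj: Any, key_name: str, mask: str = "******") -> None:
    """Mask every value stored under key_name inside obj, in place."""
    if isinstance(obj, dict):
        for key in obj:
            if key == key_name:
                obj[key] = mask
            else:
                _mask_in_place(obj[key], key_name, mask)
    elif isinstance(obj, list):
        for item in obj:
            _mask_in_place(item, key_name, mask)


class BaseResource:
    """Hypothetical stand-in for the wrapper's Resource base class."""

    def __init__(self, hash_log_data: bool = True) -> None:
        self.hash_log_data = hash_log_data

    @property
    def keys_to_hash(self) -> list:
        return []  # by default nothing is considered sensitive

    def hash_resource_dict(self, resource_dict: dict) -> dict:
        if self.hash_log_data and self.keys_to_hash:
            resource_dict = deepcopy(resource_dict)  # never touch the caller's dict
            for key_name in self.keys_to_hash:
                _mask_in_place(resource_dict, key_name)
        return resource_dict


class SecretLike(BaseResource):
    @property
    def keys_to_hash(self) -> list:
        return ["data", "stringData"]


original = {"kind": "Secret", "data": {"token": "abc"}, "stringData": {"pw": "x"}}
for_logging = SecretLike().hash_resource_dict(original)
print(for_logging)
```

Passing `hash_log_data=False`, or using a class whose `keys_to_hash` is empty, returns the dictionary untouched, so the switch and the property both have to opt in before anything is masked.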
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check for usages of hash_resource_dict and keys_to_hash property

echo "Checking for direct calls to hash_resource_dict..."
rg -A 2 "hash_resource_dict\("

echo -e "\nChecking for classes that override keys_to_hash property..."
ast-grep --pattern 'class $_ {
  $$$
  @property
  def keys_to_hash($_) {
    $$$
  }
  $$$
}'
Length of output: 1839

Script:
#!/bin/bash
# Let's search for any references to keys_to_hash in the codebase
echo "Searching for keys_to_hash references..."
rg -A 2 "keys_to_hash"

# Also check for hash_log_data since it's used in conjunction
echo -e "\nSearching for hash_log_data references..."
rg -A 2 "hash_log_data"
Length of output: 2647

ocp_resources/virtual_machine.py

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

ocp_resources/resource.py (1)

136-156: Improve function documentation and edge case handling.

The function implementation looks good, but there are a few suggestions for improvement:

Documentation could be enhanced:
- Add a note about in-place modification
- Document the behavior for edge cases
- Add example usage
Consider handling edge cases:
- None values
- Non-dict/non-list objects in nested structures
Performance optimization:
- Consider using a generator for list iteration
- Add early return for empty/None input

Here's how you could improve the implementation:

 def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
+    The function modifies the input dictionary in-place.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
 
     Returns:
         The modified dictionary.
+
+    Example:
+        >>> d = {"a": {"sensitive": "secret"}, "b": [{"sensitive": "hidden"}]}
+        >>> change_dict_value_to_hashed(d, "sensitive")
+        {'a': {'sensitive': '******'}, 'b': [{'sensitive': '******'}]}
     """
+    if not resource_dict:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
         for key, value in resource_dict.items():
             if key == key_name:
                 resource_dict[key] = "******"
-            elif isinstance(value, dict):
+            elif value is not None and isinstance(value, dict):
                 resource_dict[key] = change_dict_value_to_hashed(value, key_name)
-            elif isinstance(value, list):
+            elif value is not None and isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    if value_list is not None:
+                        value[key_list] = change_dict_value_to_hashed(value_list, key_name)
     return resource_dict

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between dafe9d0 and 462ee06.

📒 Files selected for processing (1)

ocp_resources/resource.py (2 hunks)

🧰 Additional context used

🔇 Additional comments (2)

ocp_resources/resource.py (2)

1199-1199: LGTM! Clean implementation of sensitive data handling.

The modification correctly utilizes the new change_dict_value_to_hashed function to handle sensitive information in resource dictionaries.

1199-1199: Verify handling of sensitive fields across the codebase.

Let's ensure that all sensitive fields, particularly userData in VM specs, are properly handled by the new implementation.

Run the following script to verify the usage:

ocp_resources/resource.py

ocp_resources/sealed_secret.py

ocp_resources/resource.py

Co-authored-by: Meni Yakove <[email protected]>

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)

tests/test_unittests.py (1)

6-20: Consider enhancing the fixture with more edge cases

The fixture provides a good base structure, but consider adding more variations of sensitive data scenarios:

Nested lists within lists

Multiple feature entries

Empty or null values

Special characters in sensitive data
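
The four variations listed above could be captured as test data along these lines. This is a sketch under assumptions: `mask_key` is a local stand-in for the function under test (swap in the real import when adapting it), the case contents are invented, and the test is a plain function so that it runs with or without pytest.

```python
from typing import Any

MASK = "******"


def mask_key(obj: Any, key_name: str) -> Any:
    """Stand-in for the function under test: returns a masked copy of obj."""
    if isinstance(obj, dict):
        return {k: MASK if k == key_name else mask_key(v, key_name) for k, v in obj.items()}
    if isinstance(obj, list):
        return [mask_key(item, key_name) for item in obj]
    return obj


def collect_values(obj: Any, key_name: str) -> list:
    """Gather every value stored under key_name, at any depth."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == key_name:
                found.append(value)
            else:
                found.extend(collect_values(value, key_name))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(collect_values(item, key_name))
    return found


# Invented edge cases, one per bullet in the review comment.
EDGE_CASES = {
    "nested_lists": {"spec": {"groups": [[{"userData": "a"}], [{"userData": "b"}]]}},
    "multiple_entries": {"volumes": [{"userData": "one"}, {"userData": "two"}, {"name": "plain"}]},
    "empty_and_null": {"userData": None, "other": {}, "items": [], "nested": {"userData": ""}},
    "special_chars": {"userData": "p@$$w0rd\n#cloud-config 'x' \"y\""},
}


def test_every_sensitive_value_is_masked() -> None:
    for name, case in EDGE_CASES.items():
        expected_hits = len(collect_values(case, "userData"))
        values = collect_values(mask_key(case, "userData"), "userData")
        assert len(values) == expected_hits, name
        assert all(value == MASK for value in values), name


test_every_sensitive_value_is_masked()
```

Whether a None or empty userData should be masked or left as-is is a design choice; the stand-in masks it regardless of the value.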

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f931eeb and 1ed9e74.

📒 Files selected for processing (2)

tests/test_unittests.py (1 hunks)
tox.toml (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tox.toml

🔇 Additional comments (1)

tests/test_unittests.py (1)

1-72: Consider adding security-focused test cases

Given that this function handles sensitive data, consider adding test cases that verify:

No sensitive data leakage in error messages
Handling of special characters that could be used in injection attacks
Memory cleanup after processing sensitive data
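
The first item (no leakage in error messages) lends itself to a small check. The sketch below is an assumption-heavy illustration: `mask_or_raise` is a hypothetical function invented for this example, and the error behavior of the merged function is not shown anywhere in this thread.

```python
from typing import Any

SECRET = "s3cr3t-t0ken"  # invented value that must never show up in an error


def mask_or_raise(resource: Any, key_name: str, mask: str = "******") -> dict:
    """Hypothetical masker that validates its input before masking top-level keys."""
    if not isinstance(resource, dict):
        # Report only the type of the bad input, never its content.
        raise TypeError(f"expected a dict, got {type(resource).__name__}")
    if not key_name:
        raise ValueError("key_name must be a non-empty string")
    return {key: (mask if key == key_name else value) for key, value in resource.items()}


def error_message_for(bad_input: Any) -> str:
    """Return the text of the TypeError raised for bad_input, or '' if none."""
    try:
        mask_or_raise(bad_input, "userData")
    except TypeError as exc:
        return str(exc)
    return ""


message = error_message_for(SECRET)
assert SECRET not in message
print(message)
```

The same shape of assertion can be pointed at the real function once its failure modes are known. The memory-cleanup item is much harder to verify from Python and is not attempted here.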

tests/test_unittests.py

tox.toml

myakove · 2024-11-26T14:41:33Z

And please check coderabbitai comments

dbasunag · 2024-11-27T13:54:19Z

/verified

Handle sensitive information being inside a list in resource_dict

dafe9d0

redhat-qe-bot2 added needs-rebase size/S labels Oct 23, 2024

redhat-qe-bot2 requested a review from omrirh October 23, 2024 19:40

redhat-qe-bot2 mentioned this pull request Oct 23, 2024

Handle sensitive information being inside a list in resource_dict - 2178 #2179

Closed

redhat-qe-bot2 added the branch-main label Oct 23, 2024

redhat-qe-bot2 requested review from myakove and rnetser October 23, 2024 19:40

coderabbitai bot reviewed Oct 23, 2024

ocp_resources/virtual_machine.py (outdated, resolved)

rh-bot-1 added the commented-coderabbitai[bot] label Oct 23, 2024

Merge branch 'main' into hash_cloudinit

462ee06

redhat-qe-bot2 removed commented-coderabbitai[bot] needs-rebase labels Oct 23, 2024

coderabbitai bot reviewed Oct 23, 2024

redhat-qe-bot2 added the commented-coderabbitai[bot] label Oct 23, 2024

myakove requested changes Oct 25, 2024

rh-bot-1 added changes-requested-myakove commented-rnetser commented-dbasunag commented-myakove labels Oct 25, 2024

Update ocp_resources/resource.py

6527412

Co-authored-by: Meni Yakove <[email protected]>

redhat-qe-bot1 added needs-rebase and removed commented-myakove commented-coderabbitai[bot] commented-rnetser commented-dbasunag changes-requested-myakove labels Oct 28, 2024

redhat-qe-bot1 requested a review from myakove October 28, 2024 13:35

redhat-qe-bot removed changes-requested-myakove commented-coderabbitai[bot] labels Nov 25, 2024

redhat-qe-bot requested a review from myakove November 25, 2024 19:28

rename test file and update tox

1ed9e74

coderabbitai bot reviewed Nov 25, 2024

tests/test_unittests.py (outdated, resolved)

tests/test_unittests.py (outdated, resolved)

tests/test_unittests.py (outdated, resolved)

tests/test_unittests.py (resolved)

rh-bot-1 added the commented-coderabbitai[bot] label Nov 25, 2024

github-actions bot added the Stale label Nov 26, 2024

myakove requested changes Nov 26, 2024

tox.toml (outdated, resolved)

redhat-qe-bot2 added the changes-requested-myakove label Nov 26, 2024

redhat-qe-bot2 removed commented-coderabbitai[bot] changes-requested-myakove labels Nov 26, 2024

redhat-qe-bot2 requested a review from myakove November 26, 2024 14:48

redhat-qe-bot added the commented-dbasunag label Nov 26, 2024

rh-bot-1 added the commented-coderabbitai[bot] label Nov 26, 2024

updates based on reviews

d9bfa53

dbasunag force-pushed the hash_cloudinit branch from 5efa9e6 to d9bfa53 Compare November 26, 2024 19:32

Merge branch 'main' into hash_cloudinit

9a37a9b

redhat-qe-bot2 removed commented-dbasunag commented-coderabbitai[bot] labels Nov 26, 2024

redhat-qe-bot added the commented-dbasunag label Nov 26, 2024

redhat-qe-bot1 added the commented-coderabbitai[bot] label Nov 26, 2024

github-actions bot removed the Stale label Nov 27, 2024

myakove approved these changes Nov 27, 2024

redhat-qe-bot added the approved-myakove label Nov 27, 2024

redhat-qe-bot2 added the verified label Nov 27, 2024

myakove merged commit 9d20d55 into main Nov 27, 2024
5 of 6 checks passed

myakove deleted the hash_cloudinit branch November 27, 2024 13:54

redhat-qe-bot added the can-be-merged label Nov 27, 2024
