Handle sensitive information being inside a list in resource_dict #2178
Walkthrough
The changes in this pull request introduce a new function,
Changes
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
ocp_resources/virtual_machine.py (1)
179-181: LGTM! Consider adding documentation. The implementation correctly identifies the sensitive field that needs to be masked. Consider adding a docstring to explain the purpose of this property and its security implications.
```diff
 @property
 def keys_to_hash(self):
+    """
+    Returns a list of keys containing sensitive information that should be masked.
+
+    Returns:
+        list[str]: List of keys to be masked in the resource dictionary.
+        Currently masks 'userData' which contains sensitive boot-time configuration.
+    """
     return ["userData"]
```
ocp_resources/resource.py (1)
136-156: LGTM with suggestions for improvements. The implementation correctly handles nested dictionaries and lists. However, consider these improvements:
- Add protection against circular references to prevent stack overflow
- Consider making the mask value configurable for flexibility
Here's a suggested improvement:
```diff
-def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
+def change_dict_value_to_hashed(
+    resource_dict: Dict[Any, Any],
+    key_name: str,
+    mask: str = "******",
+    _seen: set | None = None
+) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
+        mask: The value to use for masking sensitive data.
+        _seen: Internal parameter to track circular references.
 
     Returns:
         The modified dictionary.
     """
+    if _seen is None:
+        _seen = set()
+
+    # Handle circular references
+    if id(resource_dict) in _seen:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
+        _seen.add(id(resource_dict))
         for key, value in resource_dict.items():
             if key == key_name:
-                resource_dict[key] = "******"
+                resource_dict[key] = mask
             elif isinstance(value, dict):
-                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
+                resource_dict[key] = change_dict_value_to_hashed(value, key_name, mask, _seen)
             elif isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    value[key_list] = change_dict_value_to_hashed(value_list, key_name, mask, _seen)
     return resource_dict
```
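A standalone, runnable version of the suggestion above makes the circular-reference guard easy to try in isolation. It is a sketch that follows the diff's logic; `Optional[Set[int]]` stands in for `set | None` so it also runs on Python versions before 3.10, and the demo data is made up.

```python
from typing import Any, Dict, Optional, Set


def change_dict_value_to_hashed(
    resource_dict: Any,
    key_name: str,
    mask: str = "******",
    _seen: Optional[Set[int]] = None,
) -> Any:
    """Mask every value stored under key_name, skipping dicts already visited."""
    if _seen is None:
        _seen = set()
    # Re-entering a dict we have already visited means a cycle (or a shared,
    # already-masked dict): stop instead of recursing forever.
    if id(resource_dict) in _seen:
        return resource_dict
    if isinstance(resource_dict, dict):
        _seen.add(id(resource_dict))
        for key, value in resource_dict.items():
            if key == key_name:
                resource_dict[key] = mask
            elif isinstance(value, dict):
                resource_dict[key] = change_dict_value_to_hashed(value, key_name, mask, _seen)
            elif isinstance(value, list):
                for index, item in enumerate(value):
                    value[index] = change_dict_value_to_hashed(item, key_name, mask, _seen)
    return resource_dict


# A dict that contains itself: recursion without the guard would raise RecursionError.
cyclic: Dict[str, Any] = {"userData": "secret"}
cyclic["self"] = cyclic
change_dict_value_to_hashed(cyclic, "userData", mask="<redacted>")
print(cyclic["userData"])  # prints <redacted>
```

Tracking `id()` of visited dicts is enough here because the function only recurses through dicts; a dict reached a second time has already been masked in place.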
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- ocp_resources/resource.py (2 hunks)
- ocp_resources/sealed_secret.py (1 hunks)
- ocp_resources/virtual_machine.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (3)
ocp_resources/sealed_secret.py (1)
53-53: Verify the impact of simplified key paths. The simplified key paths in `keys_to_hash` might affect how sensitive data is masked in the resource dictionary. Let's verify that this change works correctly with the updated `hash_resource_dict` implementation.
Consider adding a docstring to the `keys_to_hash` property to document:
- The expected format of the keys
- How these keys are used in the resource dictionary masking process
- Why the "spec.." prefix was removed
This will help future maintainers understand the design decision and prevent accidental reintroduction of the prefix.
✅ Verification successful
Let me gather more information about how the key paths are actually processed in the codebase.
This will help us understand:
- How the key paths are processed in the hashing function
- If there are tests verifying this functionality
- Where the SealedSecret class is used in the codebase
Let me check one more thing to verify the test coverage and actual usage patterns.
This will help us understand:
- The test coverage for this functionality
- The complete context of SealedSecret class
- How these specific keys are actually used in hashing
Based on the gathered information, I can now provide the final response:
Simplified key paths in `keys_to_hash` are correct and safe
The change from `["spec..data", "spec..encryptedData"]` to `["data", "encryptedData"]` is correct because:
- The `change_dict_value_to_hashed` function performs a recursive dictionary search for exact key matches, making the "spec.." prefix unnecessary.
- The SealedSecret class directly accepts `data` and `encrypted_data` as top-level parameters in its constructor.
- The implementation aligns with the actual structure of the resource where sensitive data is stored.
The change maintains security while simplifying the implementation, as the hashing mechanism in `hash_resource_dict` will still mask all sensitive information correctly.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify the usage of keys_to_hash in the codebase
# and ensure it's compatible with the new implementation

# Check if there are any direct references to the old format "spec..data"
echo "Checking for old key format usage..."
rg -l "spec\.\.data|spec\.\.encryptedData"

# Look for other implementations of keys_to_hash to ensure consistency
echo "Checking other keys_to_hash implementations..."
ast-grep --pattern 'def keys_to_hash'

# Check the Resource class implementation that uses these keys
echo "Checking Resource class implementation..."
rg -A 10 "def hash_resource_dict"
```
Length of output: 1254
Script:
```bash
#!/bin/bash
# Check the implementation of change_dict_value_to_hashed function
echo "Checking change_dict_value_to_hashed implementation..."
rg -A 15 "def change_dict_value_to_hashed"

# Check if there are any tests that verify this functionality
echo "Checking related tests..."
fd -e py test_sealed_secret test_resource --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the actual usage of SealedSecret class
echo "Checking SealedSecret usage..."
rg -l "class SealedSecret"
```
Length of output: 1542
Script:
```bash
#!/bin/bash
# Check for test files with correct path
echo "Checking test files..."
fd -e py -g "*test*" --exec rg -l "keys_to_hash|hash_resource_dict" {}

# Check the complete SealedSecret class implementation
echo "Checking SealedSecret class implementation..."
rg -A 20 "class SealedSecret" ocp_resources/sealed_secret.py

# Check actual usage of hash_resource_dict with these keys
echo "Checking actual usage patterns..."
rg -B 5 -A 5 "hash_resource_dict.*data|hash_resource_dict.*encryptedData"
```
Length of output: 1177
ocp_resources/resource.py (2)
1198-1198: LGTM! Clean integration of the new function. The change correctly integrates the new `change_dict_value_to_hashed` function while maintaining the existing behavior and safety measures (deep copy, condition checks).
136-156: Verify the impact on existing consumers. Let's verify the impact of these changes on existing code.
Also applies to: 1198-1198
✅ Verification successful
Let me gather more information about the `keys_to_hash` property, since the previous search didn't show its complete usage pattern. This will help us understand:
- Where and how `keys_to_hash` is defined and used
- The usage pattern of `hash_log_data`, which controls the hashing behavior
- Any potential impact on existing implementations
Based on the gathered information, I can now provide a final response as we have a clear picture of the usage and impact:
The changes to `change_dict_value_to_hashed` are safe and properly integrated
The function is used within a well-defined pattern where:
- It's only called from the base `Resource` class's `hash_resource_dict` method.
- The hashing behavior is controlled by two factors:
  - The `hash_log_data` boolean parameter (defaults to True)
  - Resource-specific `keys_to_hash` property implementations
- Only specific resource types implement `keys_to_hash`:
  - ConfigMap: `["data", "binaryData"]`
  - VirtualMachine: `["userData"]`
  - SealedSecret: `["data", "encryptedData"]`
  - Secret: `["data", "stringData"]`
The changes maintain the existing behavior while improving the implementation, and all current consumers are properly integrated with this pattern.
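The integration pattern described above can be modeled in a few lines. This is a hypothetical, heavily simplified sketch: the real `Resource` class and the exact signature of `hash_resource_dict` may differ. Only the names `hash_resource_dict`, `keys_to_hash`, and `hash_log_data`, the deep copy, the condition checks, and the ConfigMap keys come from this review; the rest is assumed for illustration.

```python
import copy
from typing import Any, Dict, List


def change_dict_value_to_hashed(resource_dict: Any, key_name: str) -> Any:
    """Replace the value of every key_name key, at any depth, with "******"."""
    if isinstance(resource_dict, dict):
        for key, value in resource_dict.items():
            if key == key_name:
                resource_dict[key] = "******"
            elif isinstance(value, dict):
                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
            elif isinstance(value, list):
                for index, item in enumerate(value):
                    value[index] = change_dict_value_to_hashed(item, key_name)
    return resource_dict


class Resource:
    """Hypothetical stand-in for the wrapper's base class (not its real API)."""

    def __init__(self, hash_log_data: bool = True) -> None:
        self.hash_log_data = hash_log_data

    @property
    def keys_to_hash(self) -> List[str]:
        # Base resources have nothing to mask.
        return []

    def hash_resource_dict(self, resource_dict: Dict[str, Any]) -> Dict[str, Any]:
        # Mask only when hashing is enabled and the subclass names sensitive keys.
        if self.hash_log_data and self.keys_to_hash:
            # Work on a deep copy so the caller's dict is never modified.
            resource_dict = copy.deepcopy(resource_dict)
            for key_name in self.keys_to_hash:
                resource_dict = change_dict_value_to_hashed(resource_dict, key_name)
        return resource_dict


class ConfigMap(Resource):
    @property
    def keys_to_hash(self) -> List[str]:
        return ["data", "binaryData"]


cm = {"kind": "ConfigMap", "data": {"token": "abc"}}
masked = ConfigMap().hash_resource_dict(cm)
plain = ConfigMap(hash_log_data=False).hash_resource_dict(cm)
print(masked["data"])  # prints ******
print(cm["data"])      # prints {'token': 'abc'} (original left untouched)
```

Keeping the key list on each subclass means the base class never needs to know which fields are sensitive; adding a new resource type only requires overriding `keys_to_hash`.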
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Check for usages of hash_resource_dict and keys_to_hash property
echo "Checking for direct calls to hash_resource_dict..."
rg -A 2 "hash_resource_dict\("

echo -e "\nChecking for classes that override keys_to_hash property..."
ast-grep --pattern 'class $_ { $$$ @property def keys_to_hash($_) { $$$ } $$$ }'
```
Length of output: 1839
Script:
```bash
#!/bin/bash
# Let's search for any references to keys_to_hash in the codebase
echo "Searching for keys_to_hash references..."
rg -A 2 "keys_to_hash"

# Also check for hash_log_data since it's used in conjunction
echo -e "\nSearching for hash_log_data references..."
rg -A 2 "hash_log_data"
```
Length of output: 2647
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
ocp_resources/resource.py (1)
136-156: Improve function documentation and edge case handling. The function implementation looks good, but there are a few suggestions for improvement:
- Documentation could be enhanced:
  - Add a note about in-place modification
  - Document the behavior for edge cases
  - Add example usage
- Consider handling edge cases:
  - None values
  - Non-dict/non-list objects in nested structures
- Performance optimization:
  - Consider using a generator for list iteration
  - Add an early return for empty/None input
Here's how you could improve the implementation:
```diff
 def change_dict_value_to_hashed(resource_dict: Dict[Any, Any], key_name: str) -> Dict[Any, Any]:
     """
     Recursively search a nested dictionary for a given key and changes its value to "******" if found.
+    The function modifies the input dictionary in-place.
 
     Args:
         resource_dict: The nested dictionary to search.
         key_name: The key to find.
 
     Returns:
         The modified dictionary.
+
+    Example:
+        >>> d = {"a": {"sensitive": "secret"}, "b": [{"sensitive": "hidden"}]}
+        >>> change_dict_value_to_hashed(d, "sensitive")
+        {'a': {'sensitive': '******'}, 'b': [{'sensitive': '******'}]}
     """
+    if not resource_dict:
+        return resource_dict
+
     if isinstance(resource_dict, dict):
         for key, value in resource_dict.items():
             if key == key_name:
                 resource_dict[key] = "******"
-            elif isinstance(value, dict):
+            elif value is not None and isinstance(value, dict):
                 resource_dict[key] = change_dict_value_to_hashed(value, key_name)
-            elif isinstance(value, list):
+            elif value is not None and isinstance(value, list):
                 for key_list, value_list in enumerate(value):
-                    value[key_list] = change_dict_value_to_hashed(value_list, key_name)
+                    if value_list is not None:
+                        value[key_list] = change_dict_value_to_hashed(value_list, key_name)
     return resource_dict
```
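A quick check of the edge cases listed above helps separate the documentation changes from the behavioral ones. The sketch re-declares the function body from the unchanged lines of the diff so it runs on its own. With that original code, `None` values, scalars, and empty inputs already pass through untouched, because `isinstance(None, dict)` and `isinstance(None, list)` are both `False`; the added `value is not None` guards and the early return therefore do not change the result, while the in-place note and the docstring example add real value.

```python
from typing import Any


def change_dict_value_to_hashed(resource_dict: Any, key_name: str) -> Any:
    """Original logic: replace every key_name value, at any depth, with "******"."""
    if isinstance(resource_dict, dict):
        for key, value in resource_dict.items():
            if key == key_name:
                resource_dict[key] = "******"
            elif isinstance(value, dict):
                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
            elif isinstance(value, list):
                for index, item in enumerate(value):
                    value[index] = change_dict_value_to_hashed(item, key_name)
    return resource_dict


# None and empty inputs fall through every isinstance() check unchanged.
assert change_dict_value_to_hashed(None, "userData") is None
assert change_dict_value_to_hashed({}, "userData") == {}

# None values and non-dict list items are left as they are.
mixed = {"a": None, "b": [None, 3, "text", {"userData": "secret"}]}
change_dict_value_to_hashed(mixed, "userData")
print(mixed)  # prints {'a': None, 'b': [None, 3, 'text', {'userData': '******'}]}
```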
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- ocp_resources/resource.py (2 hunks)
🧰 Additional context used
🔇 Additional comments (2)
ocp_resources/resource.py (2)
1199-1199: LGTM! Clean implementation of sensitive data handling. The modification correctly utilizes the new `change_dict_value_to_hashed` function to handle sensitive information in resource dictionaries.
1199-1199: Verify handling of sensitive fields across the codebase. Let's ensure that all sensitive fields, particularly `userData` in VM specs, are properly handled by the new implementation. Run the following script to verify the usage:
Co-authored-by: Meni Yakove <[email protected]>
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (1)
tests/test_unittests.py (1)
6-20: Consider enhancing the fixture with more edge cases. The fixture provides a good base structure, but consider adding more variations of sensitive data scenarios:
- Nested lists within lists
- Multiple feature entries
- Empty or null values
- Special characters in sensitive data
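The extra fixture variations can be sketched as plain data plus assertions, so the snippet runs without pytest; turning `FIXTURE` into a `@pytest.fixture` would be straightforward. The structure and values are hypothetical, and the function re-declares the logic quoted in the review diffs. One case stands out: with that logic, a dict inside a list that is itself inside a list is never searched, because list items are only recursed into when they are dicts. The last line shows `userData` surviving there, which is exactly the kind of gap a "nested lists within lists" fixture would expose.

```python
import copy
from typing import Any, Dict


def change_dict_value_to_hashed(resource_dict: Any, key_name: str) -> Any:
    """Logic as quoted in the review: exact key match, recursing into dicts and list items."""
    if isinstance(resource_dict, dict):
        for key, value in resource_dict.items():
            if key == key_name:
                resource_dict[key] = "******"
            elif isinstance(value, dict):
                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
            elif isinstance(value, list):
                # A list item that is itself a list is returned unchanged, not searched.
                for index, item in enumerate(value):
                    value[index] = change_dict_value_to_hashed(item, key_name)
    return resource_dict


# Hypothetical richer fixture data covering the suggested cases.
FIXTURE: Dict[str, Any] = {
    "spec": {
        "features": [  # multiple feature entries
            {"name": "one", "userData": "p@ss\"w0rd';--"},  # special characters
            {"name": "two", "userData": ""},                 # empty value
            {"name": "three", "userData": None},             # null value
        ],
        "matrix": [[{"userData": "nested-in-list-of-lists"}]],  # list within a list
    }
}

data = change_dict_value_to_hashed(copy.deepcopy(FIXTURE), "userData")
print([feature["userData"] for feature in data["spec"]["features"]])  # prints ['******', '******', '******']
print(data["spec"]["matrix"][0][0]["userData"])  # prints nested-in-list-of-lists (not masked)
```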
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
- tests/test_unittests.py (1 hunks)
- tox.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tox.toml
🔇 Additional comments (1)
tests/test_unittests.py (1)
1-72: Consider adding security-focused test cases
Given that this function handles sensitive data, consider adding test cases that verify:
- No sensitive data leakage in error messages
- Handling of special characters that could be used in injection attacks
- Memory cleanup after processing sensitive data
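The first suggestion, no leakage, can be checked mechanically: mask the dict, render it the way a log line would, and fail if the secret string appears anywhere in the output. The helper `assert_no_leak` and the Secret-like manifest below are hypothetical illustrations, not part of the project's test suite; note that the helper's own error message names only the key, never the secret value.

```python
import json
from typing import Any


def change_dict_value_to_hashed(resource_dict: Any, key_name: str) -> Any:
    """Replace the value of every key_name key, at any depth, with "******"."""
    if isinstance(resource_dict, dict):
        for key, value in resource_dict.items():
            if key == key_name:
                resource_dict[key] = "******"
            elif isinstance(value, dict):
                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
            elif isinstance(value, list):
                for index, item in enumerate(value):
                    value[index] = change_dict_value_to_hashed(item, key_name)
    return resource_dict


def assert_no_leak(resource_dict: Any, key_name: str, secret: str) -> str:
    """Hypothetical helper: mask, render as JSON and repr, and fail if the secret survives."""
    masked = change_dict_value_to_hashed(resource_dict, key_name)
    rendered = json.dumps(masked)
    if secret in rendered or secret in repr(masked):
        # Report the key only, so the failure message itself cannot leak the value.
        raise AssertionError(f"value of {key_name!r} leaked into rendered output")
    return rendered


secret = "<script>alert('x')</script>"  # injection-style characters
manifest = {"kind": "Secret", "stringData": {"password": secret}, "items": [{"stringData": secret}]}
print(assert_no_leak(manifest, "stringData", secret))
# prints {"kind": "Secret", "stringData": "******", "items": [{"stringData": "******"}]}
```

The same helper fails, without echoing the secret, when the value sits under a key that is not being masked, for example `{"notes": secret}`.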
And please check the coderabbitai comments.
5efa9e6 to d9bfa53
/verified
Short description:
The current `hash_resource_dict()` code does not provide a flexible way to hide fields such as `userData` for virtual machines, where the field can sit inside a list (for example, in `spec.template.spec.volumes`).
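To illustrate why a list-aware search is needed, the sketch below contrasts a hypothetical top-level-only lookup (shown for contrast only, not the previous implementation) with the recursive approach from the review diffs. The VM dict is illustrative: `userData` sits under a `cloudInitNoCloud` entry inside the `spec.template.spec.volumes` list, so only a search that descends into list items can reach it.

```python
from typing import Any, Dict


def mask_top_level(resource_dict: Dict[str, Any], key_name: str) -> Dict[str, Any]:
    # Hypothetical naive lookup: only inspects the first level of the dict.
    if key_name in resource_dict:
        resource_dict[key_name] = "******"
    return resource_dict


def change_dict_value_to_hashed(resource_dict: Any, key_name: str) -> Any:
    """Recursive search as in the review diffs: descends into dicts and list items."""
    if isinstance(resource_dict, dict):
        for key, value in resource_dict.items():
            if key == key_name:
                resource_dict[key] = "******"
            elif isinstance(value, dict):
                resource_dict[key] = change_dict_value_to_hashed(value, key_name)
            elif isinstance(value, list):
                for index, item in enumerate(value):
                    value[index] = change_dict_value_to_hashed(item, key_name)
    return resource_dict


def make_vm() -> Dict[str, Any]:
    # Illustrative VM dict with userData inside the volumes list.
    volumes = [
        {"name": "rootdisk", "containerDisk": {"image": "example"}},
        {"name": "cloudinit", "cloudInitNoCloud": {"userData": "password: secret"}},
    ]
    return {"spec": {"template": {"spec": {"volumes": volumes}}}}


def cloud_init_user_data(vm: Dict[str, Any]) -> Any:
    return vm["spec"]["template"]["spec"]["volumes"][1]["cloudInitNoCloud"]["userData"]


shallow = mask_top_level(make_vm(), "userData")
deep = change_dict_value_to_hashed(make_vm(), "userData")
print(cloud_init_user_data(shallow))  # prints password: secret (still exposed)
print(cloud_init_user_data(deep))     # prints ******
```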
More details:
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for reviewer:
Bug:
Summary by CodeRabbit