Fix encoding issue by removing BOM from provider output files #1302
cb110b3 to 1e1d9bd
There is one TODO related to this. Currently, only the provider JSON is included in this fix. The ScubaResults JSON used with -MergeJson is still written with a BOM, because it uses Set-Content to write out the file. Since the ScubaResults.json file is not further processed by OPA Rego, this causes no issue within ScubaGear, but any tool that processes ScubaResults must either handle the BOM correctly or will likely suffer the same issue. Either this is a TODO to change the orchestrator to write out ScubaResults with no BOM as well, or just a note that it is out of scope and should be handled by any external code processing the ScubaResults file.
There appears to be an issue with importing the new functions when modifying cached functional tests. Until that is resolved, the merge will need to wait. Functional tests run fine manually, but not in the runner. Will need to sort that out.
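For external code that reads ScubaResults.json while it is still written with a BOM, one way to tolerate both encodings is sketched below. This is only an illustration of the "handle the BOM correctly" option, not part of ScubaGear; the function name load_json_tolerating_bom is hypothetical, and Python is an assumed choice of language for the external tool.

```python
import json
from pathlib import Path


def load_json_tolerating_bom(path):
    """Parse a JSON file whether or not it starts with a UTF-8 BOM.

    Hypothetical helper for an external consumer of ScubaResults.json;
    it is not part of ScubaGear.
    """
    # The "utf-8-sig" codec drops a leading BOM if one is present and
    # otherwise decodes exactly like plain UTF-8.
    text = Path(path).read_text(encoding="utf-8-sig")
    return json.loads(text)
```

Reading with a BOM-aware codec keeps the consumer working both before and after the orchestrator is changed to write BOM-less output.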
Looks great. Testing I performed:
- I tested against 4 different tenants; it worked as expected in all cases.
- I tested naming a conditional access policy \/ \A \B Ĥ \ to see how it handled backslashes and Unicode; it also worked as expected.
- I tested with a non-default OutPath; it also worked as expected.
- Finally, I tested Invoke-ScubaCached to see if it could read BOM-less JSON; it worked as expected.
One minor comment left below. Also, the PSLinter is unhappy about Write-Host. Both things are minor enough; I would like to see them addressed, but I am approving either way.
Comments on usage of specific System.IO.File methods
I tried making these changes and it seemed to work okay, but you should do some testing. I also ran some limited benchmark timing comparisons between WriteAllText and WriteAllLines and didn't see any difference. Regardless, I think using the *AllText versions makes sense.
5c59e19 to ebb2dbe
I removed the Write-Host(s) that were left in from testing.
1cf51e8 to 0500090
Tested on a Windows system that uses \\UNC paths and the fix code works fine.
Tests of various strings and escape characters
Conducted tests with the data described below against the E5 tenant. No defects found yet.
Defender DLP Policy comment field
- Generally ugly test string: This is a non-escaping comment. αۂ a "\/Date(1698260458733)\/" custom policy from )\/scratch.\nYou "\/Date will choose \ the type of content\u0000 to protect and how you want to protect it.\" tabs\t, newlines\n, or \uD83D\uDE00 emoji 😀 for testing. Special JSON characters: { "key": "value" } and SQL injection: '; DROP TABLE users; --'
- Emoji test string: 🧬🦠🧫🧪🛸🤖🦾🦿🧍♂️🧍♀️🧑🤝🧑🦹♂️🦹♀️🦸♂️🦸♀️🧙♂️🧙♀️🧛♂️🧛♀️🧜♂️🧜♀️🧞♂️🧞♀️🧝♂️🧝♀️🧟♂️🧟♀️🦧🦣🦦🦨🦩🦫🦭🦬🦙🦘🦡🦔🦏🦈🦭🦅🦩
Exchange transport rule
- Exchange transport rule Regex: ^[a-zA-Z0-9.%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$|^[A-Z0-9.%+-]+@[A-Z0-9.-]+.[A-Z]{2,}$|^user[A-Z0-9]+$
- Exchange transport rule name: αۂ a "\/Date(1698260458733)\/" custom policy
- Exchange transport rule comment: 𝖘𝖚𝖕𝖊𝖗⚡𝖈𝖔𝖒𝖕𝖑𝖊𝖝🚀𝖓𝖆𝖒𝖊❗✨ NullChar:\0 Backspace:\b BellChar:\a ZeroWidthSpace:\u200B ByteOrderMark:\uFEFF💣 [Also pasted an HTML document into the comment which I have redacted here for brevity]
Yes, that's a good point. I went ahead and switched out the Lines methods for the Text methods, which also meant being able to remove some usage of Out-String and .Trim() to aggregate the string arrays they produced. It is a cleaner approach, and tests pass with it as well.
@tkol2022 Updates and functional testing fixes are in. So this code should be in its final form at this point.
All tests are passing, including the smoke test and nightly functional tests. Note: there were some individual failures in the functional tests, but none of them were attributable to changes in this PR.
dc82aa9 to 918b196
Tested on a Windows system running ScubaGear in a OneDrive folder and the fix code works fine.
comments
In your Get-Utf8NoBom.Tests.ps1, can you use the approach below to avoid hard coding the C$ or does this not matter? I added the $SystemDrive variable.
Yes, although slightly altered, as SYSTEMDRIVE is itself an environment variable and includes the extraneous drive letter separator ':'. So I used $env:SYSTEMDRIVE with a .Trim(':') to construct the share name, which works. Nice catch. See 0ad71e8 for the full change.
0ad71e8 to 217a972
* Move FunctionalTestUtils.ps sourcing to BeforeEach
* Remove Write-Host debug statements
* Remove debug statement
Co-authored-by: Alden Hilton <[email protected]>
Co-authored-by: David Bui <[email protected]>
8da1517 to 4294ff2
@nanda-katikaneni Smoke test passed and ready for merge.
🗣 Description
This update changes ScubaGear so that the tenant configuration captured in the provider output JSON (ProviderSettingsExport.json, by default) is encoded as UTF-8 without a byte order mark (BOM).
Updates to the code include:
💭 Motivation and context
OPA Rego (as of 0.68) does not recognize the byte order mark (BOM). As a result, parsing of the file can fail when some character sequences, such as unescaped backslashes, are present in the input. These characters are common in certain tenant configurations, such as Exchange transport rule regular expressions or Windows file paths. By removing the BOM from provider output files and encoding as UTF-8 without a BOM, Rego can properly handle the provider JSON as input without issue.
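A UTF-8 BOM is the three-byte sequence EF BB BF at the very start of a file, so whether a given provider output file carries one can be checked by reading its first three bytes. The sketch below is only an illustration of that check in Python; the helper name has_utf8_bom is hypothetical and is not part of ScubaGear or OPA.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM bytes.

    Hypothetical helper for inspecting an output file; not part of ScubaGear.
    """
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8
```

With this fix applied, a check like this would be expected to return False for the provider output JSON (ProviderSettingsExport.json, by default).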
Closes #935
Closes #990
Closes #1138
Closes #1214
Closes #1242
Closes #1299
🧪 Testing
To test this PR, first configure a test tenant such that it contains an input that triggers the existing bug. This can be done by adding or modifying an existing DLP policy description to include the string:
Create a "\/Date(1698260458733)\/" custom policy from )\/scratch. You "\/Date will choose the type of content to protect and how you want to protect it.
Run ScubaGear against the test tenant including the Defender product, at a minimum.
Invoke-Scuba -p defender
This should result in an error similar to the one shown here:
✅ Pre-approval checklist
✅ Pre-merge checklist
- PR passed smoke test check.
- Feature branch has been rebased against changes from parent branch, as needed. Use the Rebase branch button below or use this reference to rebase from the command line.
- Resolved all merge conflicts on branch.
- Notified merge coordinator that PR is ready for merge via comment mention.
✅ Post-merge checklist