-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve unhandled exceptions when analyzing folders that contain large files #2494
Conversation
…ove logic comparing file size to limit to base class, and udpated accordingly.
…leSizeInKilobytes if passed a negative number.
src/Sarif/IAnalysisContext.cs
Outdated
/// Gets or sets the maximum file size (in kilobytes) that will be analyzed. | ||
/// If not set, it will analyze all sizes. | ||
/// </summary> | ||
int FileSizeInKilobytes { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was mentioned in the other PR, it would be a breaking change for SPAM. Is that acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, if that would make things clearer and we tell people, yes.
What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another strategy could be the following:
- we rename the C# property
- we keep the exposed argument.
So, internally, we know the good name. and we tell that in the next release we will rename the argument :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not take this change now. A breaking change of this kind needs to come with a thoughtful implementation, this is new work that is slowing down the immediate change.
So, go ahead and file an issue. We have other patterns of deprecating command-line args (like --hashes is support to be obsoleted!!). We do this in a manner such as Eddy describes, retain the old and mark it obsolete and/or update the code to properly transform old command-lines to new.
So, Ed, be sure to capture your excellent thoughts in the issue that you file. :)
[Fact] | ||
public void MultithreadedAnalyzeCommandBase_TargetFileSizeTestCases() | ||
{ | ||
Random random = new Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking closer, you just need to use the TestRule.s_seed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do step through other tests that show the output that we expect for any random use. Eddy, be sure to explain this sort of thing in detail, it's important for others to understand completely (you may have done this already offline). Basically, any test that uses a random value generator MUST emit the seed used to initialize the PRNG in its output. For tests that fail only due to specific randomized values, this seed can be hard-coded on the dev box to ensure reproducibility of the problem.
By the way! I recently fixed a test bug of this kind in an MS service. I was co-developing a change and my build failed, test rerun succeeded. When I looked at history, I could clearly see this was a flaky test, but it only failed once every several hundred pipeline builds. :) The issue was a random value generator that only provoked a negative condition in a very small % of cases. Ouch! That bug had lived in this code base for over a year...
int randomMaxFileSize = random.Next(1, int.MaxValue - 1); | ||
long randomFileSize = (long)random.Next(2, int.MaxValue - 1); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would so dearly love to not see this kind of feedback in code reviews, i.e., for all style suggestions to mostly be resolved by an automated code formatter. Is there not an autofix for this in a Roslyn analyzer or command-line tool? It's important that we stay clean and detail-oriented, so I appreciate all of us pointing this sort of thing out, to be clear. Just encouraging us to remove as many issues of this kind from ever occurring, wherever we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually have a step that removes these extra lines as part of buildAndTest.cmd for most of our repos.
{ | ||
long fileSize = FileSystem.GetFileSize(path) / 1024; | ||
|
||
return (maxFileSize == -1 || fileSize < maxFileSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated test cases and added some explicit follow up confirming this method separately.
@@ -96,6 +96,13 @@ public Uri TargetUri | |||
|
|||
public DefaultTraces Traces { get; set; } | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty line #Resolved
@@ -122,6 +129,11 @@ internal bool ValidateOutputFileCanBeCreated(IAnalysisContext context, string ou | |||
return succeeded; | |||
} | |||
|
|||
internal bool ValidateFileSizeInKilobytes(int fileSizeInKilobytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This factoring looks questionable to be. Your method name is as long as its implementation. :)
In general, a utility method only has value if it contains some specialized logic that you don't want to maintain in multiple places. A simple 'greater than 0' comparison doesn't seem to meet that test.
There is another subtle issue 'ValidateFileSizeInKilobytes' is suggestive, it suggests more validation is done. You encode 'kilobytes' in the name and 'file size' and yet the implementation of this helper seems to be 'VerifyIsGreaterThanZero', and that's it.
Having said all this, I don't even see a caller for this method. :) So maybe I'm making a whole lot of pointing out you left some intermediate code in the file. :)
Final point, though, just to keep beating this to death, this helper doesn't access instance data and, if it were to survive, could be static.
#Resolved
new { | ||
expectedExitReason = ExitReason.None, | ||
fileSize = (long)ulong.MinValue, | ||
maxFileSize = randomMaxFileSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do this. :) there really isn't a good use of a random value in this test pattern, in my opinion. Why is that? Because the test matrix of use values is so clear. You need all boundaries (max & min), you need zero, -1, and then a couple values within the interior range.
Using random to generate something just won't buy you much, and meantime you've opted into a more complex test pattern which, for example, leads others to require you to follow specific pattern. Don't do that unless you really think you derive value from it.
#Resolved
|
||
var options = new TestAnalyzeOptions | ||
{ | ||
Threads = 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you opting into this behavior, and enabling hashes, etc? This is a nice test, but it's doing a lot, why is that? Can't we make this a bit more focused on the matter at hand, file size validation?
When you start adding things, you increase performance costs, but you also risk introducing non-obvious breaks in tests. So, let's say we had a problem introduced by our hashing argment to --insert. This test may fail, and we'd all be confused why the file size test broken.
This is just an example. But the principle here is that we want targeted unit tests that directly focus on some test work, as far as possible
It isn't always possible to create a clean union. But for this example, I definitely don't see why we'd explicitly set any args, would prefer defaults and minimal options otherwise. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashes, etc were part of the original customer escalation. It's fairly conclusive that with and without these the repro and fix are the same. I've removed.
src/ReleaseHistory.md
Outdated
@@ -1,5 +1,9 @@ | |||
# SARIF Package Release History (SDK, Driver, Converters, and Multitool) | |||
|
|||
## Unreleased | |||
|
|||
* BUGFIX: Resolve OutofMemoryException and NullReferenceException' failures by adding `--file-size-in-kb` argument to specify a max file size to analyze or default to 1024KB. [#2494](https://github.com/microsoft/sarif-sdk/pull/2494) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Sarif/FileSystem.cs
Outdated
/// <returns> | ||
/// A long representing the size of the file in bytes. | ||
/// </returns> | ||
public long GetFileSize(string path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter, | ||
SearchOption.TopDirectoryOnly)) | ||
foreach (string file in FileSystem.DirectoryEnumerateFiles(directory, filter, SearchOption.TopDirectoryOnly) | ||
.Where(file => IsTargetWithinFileSizeLimit(file, _rootContext.FileSizeInKilobytes))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter, | ||
SearchOption.TopDirectoryOnly)) | ||
foreach (string file in FileSystem.DirectoryEnumerateFiles(directory, filter, SearchOption.TopDirectoryOnly) | ||
.Where(file => IsTargetWithinFileSizeLimit(file, _rootContext.FileSizeInKilobytes))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Gets or sets the maximum file size (in kilobytes) that will be analyzed. | ||
/// If not set, it will analyze all sizes. | ||
/// </summary> | ||
public int FileSizeInKilobytes { get; set; } = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Sarif/IAnalysisContext.cs
Outdated
/// Gets or sets the maximum file size (in kilobytes) that will be analyzed. | ||
/// If not set, it will analyze all sizes. | ||
/// </summary> | ||
int FileSizeInKilobytes { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -30,6 +30,8 @@ public class AnalyzeTestContext : IAnalysisContext | |||
|
|||
public DefaultTraces Traces { get; set; } | |||
|
|||
public int FileSizeInKilobytes { get; set; } = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Gets or sets the maximum file size (in kilobytes) that will be analyzed. | ||
/// If not set, it will analyze all sizes. | ||
/// </summary> | ||
public int FileSizeInKilobytes { get; set; } = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -9,6 +9,7 @@ | |||
|
|||
using Microsoft.CodeAnalysis.Sarif; | |||
using Microsoft.CodeAnalysis.Sarif.Driver; | |||
using Microsoft.VisualBasic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Gets or sets the maximum file size (in kilobytes) that will be analyzed. | ||
/// If not set, it will analyze all sizes. | ||
/// </summary> | ||
public int FileSizeInKilobytes { get; set; } = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this in here? See if you can delete it. I don't understand why a test file is in the production code, looks like it might be mistake. In reply to: 1176781298 In reply to: 1176781298 Refers to: src/Sarif.Multitool/AnalyzeTestContext.cs:9 in eb91a3a. [](commit_id = eb91a3a, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/ReleaseHistory.md
Outdated
@@ -1,5 +1,9 @@ | |||
# SARIF Package Release History (SDK, Driver, Converters, and Multitool) | |||
|
|||
## Unreleased | |||
|
|||
* BUGFIX: Resolve OutofMemoryException and NullReferenceException' failures by adding `--file-size-in-kb` argument to specify a max file size to analyze or default to 1024KB. [#2494](https://github.com/microsoft/sarif-sdk/pull/2494) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eddy provided the PR for historical context. Not part of this change, but this and related files should eventually be moved to test project. In reply to: 1176781298 Refers to: src/Sarif.Multitool/AnalyzeTestContext.cs:9 in eb91a3a. [](commit_id = eb91a3a, deletion_comment = False) |
MaxFileSizeInKilobytes = testCase.maxFileSize | ||
}; | ||
|
||
var command = new TestMultithreadedAnalyzeCommand(mockFileSystem.Object); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ingle threaded code paths from test.
if (exitReason != ExitReason.None) | ||
{ | ||
var exception = command.ExecutionException as ExitApplicationException<ExitReason>; | ||
exception.Should().NotBeNull(); |
Check warning
Code scanning / CodeQL
Dereferenced variable may be null
bool expectedToBeWithinLimits = testCase.maxFileSize == -1 || | ||
testCase.fileSize / 1024 < testCase.maxFileSize; |
Check warning
Code scanning / CodeQL
Useless assignment to local variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OutOfMemory
andNullReferenceException
exceptions are being thrown when analyzing folders containing large files. This change adopts a file size limit in kilobytes and defaults to 1024.Added unit testing to cover combinations of file sizes and file size limits.
This is related to #621.