-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added max file limit #16775
added max file limit #16775
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's for personal review only, but I couldn't resist, sorry...
} | ||
catch (Exception e) | ||
{ | ||
throw new IOException(e.Message, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to just use http://source.roslyn.io/Microsoft.CodeAnalysis/R/9c1e0a07e83f2ac3.html instead?
(Also, should that be changed to use an exception filter to avoid ever catching IOException in the first place)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. ya. will change it to use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we wrap in this way? My intuition is "just throw because catching non-Exception exceptions is trying to win a game of whack a mole".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We consistently distinguish between IOExceptions and anything else across the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change would argue that distinction is very questionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are diligent and are wrapping them. It's worked well so far.
Catch where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. We are diligent to the places we know about and that's the problem. There are plenty of more cases where IO occurs that we don't know about. Every miss is just a crash that we're passing onto the customer instead of fixing now.
This is why I'm always very skeptical of approaches like this. Every time a customer reports the exception we will just catch ,wrap in IOException and move on. Hence the end behavior is indistinguishable from just catching Exception in the first place. This approach just causes us more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to be careful whenever we call outside of our code base. I'm not saying it's easy. But the solution is not to just catch and ignore all exceptions everywhere. That's a nightmare - things will stop working and nobody will know why. The customer won't be able to create a small repro and we won't be able to fix anything. So yes, you won't have VS crashing. But you won't have it working either. Better approach is to catch all exceptions in a filter with NFW reported on some recovery boundaries where we can report an error and continue running with no (best effort) state corruption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave it here and do what we do in the rest of the FileUtilities - which is to turn non-IOExceptions to IOExceptions. It's consistent with the rest of the utils code.
internal static class FileTextLoaderOptions | ||
{ | ||
[ExportOption] | ||
internal static readonly Option<long> FileLengthThreshold = new Option<long>(nameof(FileTextLoaderOptions), nameof(FileLengthThreshold), defaultValue: 10 * 1024 * 1024, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per mail-thread, I don't think this is large enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
if (fileLength > workspace.Options.GetOption(FileTextLoaderOptions.FileLengthThreshold)) | ||
{ | ||
// log max file length to AI | ||
Logger.Log(FunctionId.FileTextLoader_FileLengthThresholdExceeded, KeyValueLogMessage.Create(m => m["FileLength"] = fileLength)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this path will also be hit by AdditionalDocuments
, is there something more actionable than just size that we can log here without capturing PII? File extension at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure we can do that.
@@ -610,4 +610,7 @@ | |||
<data name="The_first_word_0_must_begin_with_a_lower_case_character" xml:space="preserve"> | |||
<value>The first word, '{0}', must begin with a lower case character</value> | |||
</data> | |||
<data name="File_size_exceeds_maximum_allowed_size_0" xml:space="preserve"> | |||
<value>File size exceeds maximum allowed size: {0}</value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're past loc cutoff, so it might be better to just use a hardcoded message in RTW, and update later. Not sure though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
Logger.Log(FunctionId.FileTextLoader_FileLengthThresholdExceeded, KeyValueLogMessage.Create(m => m["FileLength"] = fileLength)); | ||
|
||
var message = string.Format(WorkspacesResources.File_size_exceeds_maximum_allowed_size_0, path); | ||
throw new IOException(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why IOExecption? Why not just throw Execption. Or InvalidOperationException. No IO is actually occuring here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I don't think we should be throwing an IOException here. I'd either change it to Try-pattern, or define a new exception type that's gonna be handled here: http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Workspace/Solution/TextDocumentState.cs,177
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pilchie how do you think. looks like FileTextLoader consider any exception happen inside FileTextLoader as IOException regardless they are generally considered IOException or not in OS standpoint.
6b21ae4
to
307bd8c
Compare
ping? |
@jaredpar @tmat @dotnet/roslyn-compiler @AlekseyTs @cston @VSadov can I get code review for changes in FileUtilities? |
// File Threshold 100MB | ||
[ExportOption] | ||
internal static readonly Option<long> FileLengthThreshold = new Option<long>(nameof(FileTextLoaderOptions), nameof(FileLengthThreshold), defaultValue: 100 * 1024 * 1024, | ||
storageLocations: new LocalUserProfileStorageLocation(@"Roslyn\Internal\Performance\Text\FileLengthThreshold")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of "Internal" here seemed wrong given users may need to change it. Can't tell if this is an established pattern though. If it is then okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user doesn't change it through code, but through regkey, so code itself is internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. I'm referring to the use of Internal in the reg key name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, that one. ya, we put all reg key which is not public through tool->options, there.
@heejaechang It's not clear to me that this has the telemetry of hitting the issue from my review. Is that just the "Log" statements? |
FileUtilities LGTM. |
@jac009 Yes. Log is the one that send telemetry to AI. since this code is in lower layer than VS layer, I can't use VS Telemetry API directly. |
} | ||
} | ||
|
||
private static string TryGetFileExtension(string path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I believe we already have a utility function to do this: PathUtilities.GetExtension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, okay.
👍 |
@Pilchie should I ask matt for bar check for RTW? |
@natidea just FYI, looks like we will bring this for RTW. |
Logger.Log(FunctionId.FileTextLoader_FileLengthThresholdExceeded, KeyValueLogMessage.Create(m => | ||
{ | ||
m["FileLength"] = fileLength; | ||
m["Ext"] = PathUtilities.GetExtension(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this get associated with some other project type information? My worry is there's some customer out there with some large auto-generated file (think LINQ, EF, etc.) and we'll see here it's a C# file but not know further details. Although I'm not sure what else could be logged here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasonmalinowski non-fatal watson then?
@@ -610,4 +610,7 @@ | |||
<data name="The_first_word_0_must_begin_with_a_lower_case_character" xml:space="preserve"> | |||
<value>The first word, '{0}', must begin with a lower case character</value> | |||
</data> | |||
<data name="File_size_exceeds_maximum_allowed_size_0" xml:space="preserve"> | |||
<value>File size exceeds maximum allowed size: {0}</value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say what the file size was? Something like "File '{0}' size of {1} exceeds maximum allowed size of {2}"? Just so that way if a customer does hit that they have some hint what they should try getting their file under, vs. just going "uh, I need to make it smaller I guess?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. @Pilchie who is localization champ by the way?
Tagging @dotnet/roslyn-ide since this is touching workspaces. |
Yes. Please always tag @dotnet/roslyn-ide if we're making changes that touch infrastructural pieces. |
@@ -366,6 +366,29 @@ internal static DateTime GetFileTimeStamp(string fullPath) | |||
{ | |||
return File.GetLastWriteTimeUtc(fullPath); | |||
} | |||
catch (IOException) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have other PerformIO helpers that handle all these sorts of cases. Is there a reason we're not using those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used FileUtilities since that was the one already used in this file. just looked PerformIO which seems very similar to FileUtilites, not sure why there is 2 utilities which does same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened tracking issue - #16836
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks #closed.
{ | ||
// File Threshold 100MB | ||
[ExportOption] | ||
internal static readonly Option<long> FileLengthThreshold = new Option<long>(nameof(FileTextLoaderOptions), nameof(FileLengthThreshold), defaultValue: 100 * 1024 * 1024, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what FileLengthThreshold means. Can we comment this. What happens at 100MB? Do we simply ignore hte file entirely? Do we have the document, but we truncate it at 100MB? Do we have the document, but treat its contents as empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
var fileLength = FileUtilities.GetFileLength(path); | ||
if (fileLength > workspace.Options.GetOption(FileTextLoaderOptions.FileLengthThreshold)) | ||
{ | ||
// log max file length to AI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is AI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
application insight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't abbr unnecessarily.
})); | ||
|
||
var message = string.Format(WorkspacesResources.File_size_exceeds_maximum_allowed_size_0, path); | ||
throw new InvalidDataException(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this manifest higher up the stack? What is the end user experience this causes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as other failure in this file. showing in workspace failure output pane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment code to make that clear. It's also unclear to me what the user does at that point. Is the workspace unusable? Do they have any recourse on what they can do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, but i don't know what this change actually does. The code (and PR) need explanations of what's going on here and why it's the appropriate "fix" for whatever issue we're encountering.
@CyrusNajmabadi have you read description? |
Yes. THe description is extremely vague and unclear. |
@CyrusNajmabadi you should ask @Pilchie and @MattGertz why it is appropriate fix since they are the one decide to do this. |
@CyrusNajmabadi hmm.. so want me to add comment on which part? on 100MB limit or what happen it throws? |
I'm asking for the description (and code) to actually explain what is going on. Reading the code, i cannot tell what the actual behavior of tihs will be. The code is inadequately commented, and i cannot sign off that this is any sort of appropriate fix. Please:
|
@CyrusNajmabadi in description or in code? where in commit description? |
Actually, it was JoC 's request... |
Code is better than commit description, since it's right there later. |
Both are fine. but, at the very least, this needs to be in the code.
Ideally in any codepath that performs these checks. People reading the code need to understand what the checks are doing and what impact it has on the workspace. For example (as i asked earlier):
Basically, nothing has actually been explained here. I want to know precisely what the impact of this change is; what we want to accomplish; and how this PR gets us that. -- Note: this PR may be perfect (as far as the code change goes). I'm simply stating that i have no way to effectively review this change as i do not know what you're trying to accomplish, and thus i have no idea if your code change is appropriate. |
@CyrusNajmabadi sure, trying to add comment. just to want to make sure where you want. since it is just throwing, it is other part that decide what to do with the caught exception. so, if I put comment what workspace does because of this exception in the code area I changed, it is far part from where it actually happens. so, if later some body change that behavior, that person will have no idea, the comment exist here which describe workspace behavior he is changing. basically this code has nothing to do with 1,2,3. it is the one who caught the exception decide 1,2,3 which is in DocumentState I believe. |
Sorry, i cannot sign off without a complete understanding of the impact of this change. It is not sufficient ot say (effectively) "i am throwing, and it is someone else's responsibility as to what happens then". I need the actual information as to what impact this change will have. Otherwise, taking this fix would be very irresponsible as it could cause any number of bad things to happen, right when we're in escrow. As for the comment location. If you feel like there are two distant piece of code that can be impacted by this, then feel free to comment both locations. i.e. in this location that you're changing, comment on which parts of the workspace that are affected by this. And in the workspace, please comment on which parts have to deal with the potential exceptions that can be thrown here.
If there is ever a concern about some 'person' having 'no idea' about what's happening, then the onus is on us to effectively comment hte places of concern so that that doesn't happen. |
20e29bd
to
63f3a47
Compare
@CyrusNajmabadi updated comments! |
try | ||
{ | ||
// save temp file | ||
tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just Path.GetTempFileName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better: Temp.CreateDirectory()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And remove try-finally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will change to use that. I searched how other does this kind of thing in code base and did same thing.
this is added so that we can reduce chance of OOM when user added massive size text file to solution. current default max is 100MB. but user can override it through hidden reg key if they want to. if such file is added, FileTextLoader will throw InvalidDataException and Workspace will raise Workspace Failed event. in VS host case, we will show the failed event in output windows. made it to use test option service since option is mutable workspace state which affects all other tests.
e3e4b42
to
e83cd71
Compare
ping? |
{ | ||
// Validate file length is under our threshold. | ||
// Otherwise, rather than reading the content into the memory, we will throw | ||
// InvalidDataException to caller of FileTextLoader.LoadText to deal with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add this exception to the documentation of the virtual calls?
👍 Thanks for adding the better documentation! |
Approved to merge via 377245 |
Customer scenario
User has big file (over 100MB) in solution and VS crashes OOM.
Bugs this fixes:
#16796
Workarounds, if any
remove the massive file from solution or split the file into smaller chunk
Risk
since we are now limiting size of text file (100MB) we will read, if someone had a big file which is big but not big enought to make VS OOM, that file will be no longer read into VS (Roslyn).
but we provide regkey to change default threshold, so we do still provide a workaround for such user.
Performance impact
there should be no change on performance
Is this a regression from a previous update?
No
Root cause analysis:
we need to read in whole file into memory to operate, we do chunk them so that it doesn't go to the LOH, but still we read whole thing in memory to operate rather than reading in or map portion of a file to operate. but naturally that sets limit on how much big file we can support, especially since devenv is in 32bit process. this fix explicitly set limit so that we reduce chance of going OOM.
since we at least chunk, we won't OOM due to lack of giant continuous memory block.
How was the bug found?
customer report.