Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added max file limit #16775

Merged
merged 2 commits into from
Jan 31, 2017
Merged

added max file limit #16775

merged 2 commits into from
Jan 31, 2017

Conversation

heejaechang
Copy link
Contributor

@heejaechang heejaechang commented Jan 26, 2017

Customer scenario

User has big file (over 100MB) in solution and VS crashes OOM.

Bugs this fixes:

#16796

Workarounds, if any

remove the massive file from solution or split the file into smaller chunk

Risk

since we are now limiting size of text file (100MB) we will read, if someone had a big file which is big but not big enought to make VS OOM, that file will be no longer read into VS (Roslyn).

but we provide regkey to change default threshold, so we do still provide a workaround for such user.

Performance impact

there should be no change on performance

Is this a regression from a previous update?

No

Root cause analysis:

we need to read in whole file into memory to operate, we do chunk them so that it doesn't go to the LOH, but still we read whole thing in memory to operate rather than reading in or map portion of a file to operate. but naturally that sets limit on how much big file we can support, especially since devenv is in 32bit process. this fix explicitly set limit so that we reduce chance of going OOM.

since we at least chunk, we won't OOM due to lack of giant continuous memory block.

How was the bug found?

customer report.

@heejaechang heejaechang added PR For Personal Review Only The PR doesn’t require anyone other than the developer to review it. and removed cla-already-signed labels Jan 26, 2017
Copy link
Member

@Pilchie Pilchie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's for personal review only, but I couldn't resist, sorry...

}
catch (Exception e)
{
throw new IOException(e.Message, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to just use http://source.roslyn.io/Microsoft.CodeAnalysis/R/9c1e0a07e83f2ac3.html instead?

(Also, should that be changed to use an exception filter to avoid ever catching IOException in the first place)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. ya. will change it to use that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we wrap in this way? My intuition is "just throw because catching non-Exception exceptions is trying to win a game of whack a mole".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We consistently distinguish between IOExceptions and anything else across the code base.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change would argue that distinction is very questionable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are diligent and are wrapping them. It's worked well so far.

Catch where?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. We are diligent to the places we know about and that's the problem. There are plenty of more cases where IO occurs that we don't know about. Every miss is just a crash that we're passing onto the customer instead of fixing now.

This is why I'm always very skeptical of approaches like this. Every time a customer reports the exception we will just catch ,wrap in IOException and move on. Hence the end behavior is indistinguishable from just catching Exception in the first place. This approach just causes us more work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to be careful whenever we call outside of our code base. I'm not saying it's easy. But the solution is not to just catch and ignore all exceptions everywhere. That's a nightmare - things will stop working and nobody will know why. The customer won't be able to create a small repro and we won't be able to fix anything. So yes, you won't have VS crashing. But you won't have it working either. Better approach is to catch all exceptions in a filter with NFW reported on some recovery boundaries where we can report an error and continue running with no (best effort) state corruption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaredpar @tmat so what is the conclusion? I am thinking not changing FileUtilities and take file length thing in to FileTextLoader itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave it here and do what we do in the rest of the FileUtilities - which is to turn non-IOExceptions to IOExceptions. It's consistent with the rest of the utils code.

internal static class FileTextLoaderOptions
{
[ExportOption]
internal static readonly Option<long> FileLengthThreshold = new Option<long>(nameof(FileTextLoaderOptions), nameof(FileLengthThreshold), defaultValue: 10 * 1024 * 1024,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per mail-thread, I don't think this is large enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

if (fileLength > workspace.Options.GetOption(FileTextLoaderOptions.FileLengthThreshold))
{
// log max file length to AI
Logger.Log(FunctionId.FileTextLoader_FileLengthThresholdExceeded, KeyValueLogMessage.Create(m => m["FileLength"] = fileLength));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this path will also be hit by AdditionalDocuments, is there something more actionable than just size that we can log here without capturing PII? File extension at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure we can do that.

@@ -610,4 +610,7 @@
<data name="The_first_word_0_must_begin_with_a_lower_case_character" xml:space="preserve">
<value>The first word, '{0}', must begin with a lower case character</value>
</data>
<data name="File_size_exceeds_maximum_allowed_size_0" xml:space="preserve">
<value>File size exceeds maximum allowed size: {0}</value>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're past loc cutoff, so it might be better to just use a hardcoded message in RTW, and update later. Not sure though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

Logger.Log(FunctionId.FileTextLoader_FileLengthThresholdExceeded, KeyValueLogMessage.Create(m => m["FileLength"] = fileLength));

var message = string.Format(WorkspacesResources.File_size_exceeds_maximum_allowed_size_0, path);
throw new IOException(message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why IOExecption? Why not just throw Execption. Or InvalidOperationException. No IO is actually occuring here.

Copy link
Member

@tmat tmat Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I don't think we should be throwing an IOException here. I'd either change it to Try-pattern, or define a new exception type that's gonna be handled here: http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Workspace/Solution/TextDocumentState.cs,177

Copy link
Contributor Author

@heejaechang heejaechang Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pilchie how do you think. looks like FileTextLoader consider any exception happen inside FileTextLoader as IOException regardless they are generally considered IOException or not in OS standpoint.

@heejaechang heejaechang force-pushed the filesizelimit branch 2 times, most recently from 6b21ae4 to 307bd8c Compare January 27, 2017 19:03
@heejaechang heejaechang removed the PR For Personal Review Only The PR doesn’t require anyone other than the developer to review it. label Jan 27, 2017
@heejaechang
Copy link
Contributor Author

ping?

@heejaechang
Copy link
Contributor Author

@jaredpar @tmat @dotnet/roslyn-compiler @AlekseyTs @cston @VSadov can I get code review for changes in FileUtilities?

// File Threshold 100MB
[ExportOption]
internal static readonly Option<long> FileLengthThreshold = new Option<long>(nameof(FileTextLoaderOptions), nameof(FileLengthThreshold), defaultValue: 100 * 1024 * 1024,
storageLocations: new LocalUserProfileStorageLocation(@"Roslyn\Internal\Performance\Text\FileLengthThreshold"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of "Internal" here seemed wrong given users may need to change it. Can't tell if this is an established pattern though. If it is then okay.

Copy link
Contributor Author

@heejaechang heejaechang Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user doesn't change it through code, but through regkey, so code itself is internal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I'm referring to the use of Internal in the reg key name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, that one. ya, we put all reg key which is not public through tool->options, there.

@jac009
Copy link

jac009 commented Jan 27, 2017

@heejaechang It's not clear to me that this has the telemetry of hitting the issue from my review. Is that just the "Log" statements?

@AlekseyTs
Copy link
Contributor

FileUtilities LGTM.

@heejaechang
Copy link
Contributor Author

heejaechang commented Jan 27, 2017

@jac009 Yes. Log is the one that send telemetry to AI. since this code is in lower layer than VS layer, I can't use VS Telemetry API directly.

}
}

private static string TryGetFileExtension(string path)
Copy link
Member

@tmat tmat Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I believe we already have a utility function to do this: PathUtilities.GetExtension

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, okay.

@tmat
Copy link
Member

tmat commented Jan 27, 2017

👍

@heejaechang
Copy link
Contributor Author

@Pilchie should I ask matt for bar check for RTW?

@heejaechang
Copy link
Contributor Author

@natidea just FYI, looks like we will bring this for RTW.

Logger.Log(FunctionId.FileTextLoader_FileLengthThresholdExceeded, KeyValueLogMessage.Create(m =>
{
m["FileLength"] = fileLength;
m["Ext"] = PathUtilities.GetExtension(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this get associated with some other project type information? My worry is there's some customer out there with some large auto-generated file (think LINQ, EF, etc.) and we'll see here it's a C# file but not know further details. Although I'm not sure what else could be logged here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasonmalinowski non-fatal watson then?

@@ -610,4 +610,7 @@
<data name="The_first_word_0_must_begin_with_a_lower_case_character" xml:space="preserve">
<value>The first word, '{0}', must begin with a lower case character</value>
</data>
<data name="File_size_exceeds_maximum_allowed_size_0" xml:space="preserve">
<value>File size exceeds maximum allowed size: {0}</value>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say what the file size was? Something like "File '{0}' size of {1} exceeds maximum allowed size of {2}"? Just so that way if a customer does hit that they have some hint what they should try getting their file under, vs. just going "uh, I need to make it smaller I guess?"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. @Pilchie who is localization champ by the way?

@jasonmalinowski
Copy link
Member

Tagging @dotnet/roslyn-ide since this is touching workspaces.

@CyrusNajmabadi
Copy link
Member

Yes. Please always tag @dotnet/roslyn-ide if we're making changes that touch infrastructural pieces.

@@ -366,6 +366,29 @@ internal static DateTime GetFileTimeStamp(string fullPath)
{
return File.GetLastWriteTimeUtc(fullPath);
}
catch (IOException)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have other PerformIO helpers that handle all these sorts of cases. Is there a reason we're not using those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used FileUtilities since that was the one already used in this file. just looked PerformIO which seems very similar to FileUtilites, not sure why there is 2 utilities which does same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened tracking issue - #16836

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks #closed.

{
// File Threshold 100MB
[ExportOption]
internal static readonly Option<long> FileLengthThreshold = new Option<long>(nameof(FileTextLoaderOptions), nameof(FileLengthThreshold), defaultValue: 100 * 1024 * 1024,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what FileLengthThreshold means. Can we comment this. What happens at 100MB? Do we simply ignore hte file entirely? Do we have the document, but we truncate it at 100MB? Do we have the document, but treat its contents as empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

var fileLength = FileUtilities.GetFileLength(path);
if (fileLength > workspace.Options.GetOption(FileTextLoaderOptions.FileLengthThreshold))
{
// log max file length to AI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is AI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

application insight

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't abbr unnecessarily.

}));

var message = string.Format(WorkspacesResources.File_size_exceeds_maximum_allowed_size_0, path);
throw new InvalidDataException(message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this manifest higher up the stack? What is the end user experience this causes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as other failure in this file. showing in workspace failure output pane.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment code to make that clear. It's also unclear to me what the user does at that point. Is the workspace unusable? Do they have any recourse on what they can do?

Copy link
Member

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but i don't know what this change actually does. The code (and PR) need explanations of what's going on here and why it's the appropriate "fix" for whatever issue we're encountering.

@heejaechang
Copy link
Contributor Author

@CyrusNajmabadi have you read description?

@CyrusNajmabadi
Copy link
Member

Yes. THe description is extremely vague and unclear.

@heejaechang
Copy link
Contributor Author

@CyrusNajmabadi you should ask @Pilchie and @MattGertz why it is appropriate fix since they are the one decide to do this.

@heejaechang
Copy link
Contributor Author

@CyrusNajmabadi hmm.. so want me to add comment on which part? on 100MB limit or what happen it throws?

@CyrusNajmabadi
Copy link
Member

I'm asking for the description (and code) to actually explain what is going on. Reading the code, i cannot tell what the actual behavior of tihs will be. The code is inadequately commented, and i cannot sign off that this is any sort of appropriate fix.

Please:

  1. State what behavior we have now.
  2. State what behavior we want to have.
  3. Explain how this change produces the behavior we want.

@heejaechang
Copy link
Contributor Author

@CyrusNajmabadi in description or in code? where in commit description?

@MattGertz
Copy link
Contributor

Actually, it was JoC 's request...

@Pilchie
Copy link
Member

Pilchie commented Jan 30, 2017

Code is better than commit description, since it's right there later.

@CyrusNajmabadi
Copy link
Member

in description or in code

Both are fine. but, at the very least, this needs to be in the code.

where

Ideally in any codepath that performs these checks. People reading the code need to understand what the checks are doing and what impact it has on the workspace. For example (as i asked earlier):

  1. Is the workspace usable after this?
  2. What impact does this have in the document in the workspace? i.e. is the document in the workspace? Is it there with truncated text? Is it there with empty text? etc. etc.
  3. What options does the user have at this point? Do they need to close the project? Do they need to close VS? etc. etc.

Basically, nothing has actually been explained here. I want to know precisely what the impact of this change is; what we want to accomplish; and how this PR gets us that.

--

Note: this PR may be perfect (as far as the code change goes). I'm simply stating that i have no way to effectively review this change as i do not know what you're trying to accomplish, and thus i have no idea if your code change is appropriate.

@heejaechang
Copy link
Contributor Author

@CyrusNajmabadi sure, trying to add comment. just to want to make sure where you want. since it is just throwing, it is other part that decide what to do with the caught exception. so, if I put comment what workspace does because of this exception in the code area I changed, it is far part from where it actually happens. so, if later some body change that behavior, that person will have no idea, the comment exist here which describe workspace behavior he is changing.

basically this code has nothing to do with 1,2,3. it is the one who caught the exception decide 1,2,3 which is in DocumentState I believe.

@CyrusNajmabadi
Copy link
Member

Sorry, i cannot sign off without a complete understanding of the impact of this change. It is not sufficient ot say (effectively) "i am throwing, and it is someone else's responsibility as to what happens then". I need the actual information as to what impact this change will have. Otherwise, taking this fix would be very irresponsible as it could cause any number of bad things to happen, right when we're in escrow.

As for the comment location. If you feel like there are two distant piece of code that can be impacted by this, then feel free to comment both locations. i.e. in this location that you're changing, comment on which parts of the workspace that are affected by this. And in the workspace, please comment on which parts have to deal with the potential exceptions that can be thrown here.

that person will have no idea

If there is ever a concern about some 'person' having 'no idea' about what's happening, then the onus is on us to effectively comment hte places of concern so that that doesn't happen.

@Pilchie Pilchie added Area-IDE Tenet-Reliability Customer telemetry indicates that the product is failing in a crash/hang/dataloss manner. labels Jan 30, 2017
@Pilchie Pilchie added this to the 2.0 (RTM) milestone Jan 30, 2017
@heejaechang
Copy link
Contributor Author

@CyrusNajmabadi updated comments!

try
{
// save temp file
tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just Path.GetTempFileName.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better: Temp.CreateDirectory()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And remove try-finally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will change to use that. I searched how other does this kind of thing in code base and did same thing.

this is added so that we can reduce chance of OOM when user added massive size text file to solution.

current default max is 100MB. but user can override it through hidden reg key if they want to.

if such file is added, FileTextLoader will throw InvalidDataException and Workspace will raise Workspace Failed event.

in VS host case, we will show the failed event in output windows.

made it to use test option service since option is mutable workspace state which affects all other tests.
@heejaechang
Copy link
Contributor Author

ping?

{
// Validate file length is under our threshold.
// Otherwise, rather than reading the content into the memory, we will throw
// InvalidDataException to caller of FileTextLoader.LoadText to deal with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add this exception to the documentation of the virtual calls?

@CyrusNajmabadi
Copy link
Member

👍 Thanks for adding the better documentation!

@natidea
Copy link
Contributor

natidea commented Jan 31, 2017

Approved to merge via 377245

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Approved to merge Area-IDE cla-already-signed Needs Shiproom Approval Tenet-Reliability Customer telemetry indicates that the product is failing in a crash/hang/dataloss manner.
Projects
None yet
Development

Successfully merging this pull request may close these issues.