Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Windows AccessViolationException in FileSystemWatcher when monitoring network share for changes #42419

Merged
merged 3 commits into from
Sep 18, 2020

Conversation

carlossanlop
Copy link
Member

This issue was reported for 3.1 twice:

When monitoring a network share with FileSystemWatcher, if there are network issues that could cause the data to arrive malformed, then after casting the byte array to a FILE_NOTIFY_INFORMATION, attempts to access the struct fields can throw an AV.

The issue was also reported internally by one of the customers. Our CSS partner was able to reproduce it locally (I couldn't).

I provided him a private binary of System.IO.FileSystem.Watcher.dll with the changes in this fix, and he confirmed the AV no longer happens.

We need this change to be backported to 5.0 and 3.1.

Here is a small repro code, which was executed along with clumsy, which can alter the network quality.

Code
using System;
using System.IO;

namespace FileWatcherConsole
{
  class Program
  {
      static void Main()
      {
          try
          {
              var w = new Watcher();
              w.InitializeFilewatcher();
          }
          catch (Exception ex)
          {
              Console.WriteLine(ex.Message);
              Console.WriteLine(ex.StackTrace);
          }
      }
  }

  class Watcher
  {
      const string WatchPath = @"\\remote\location\goes\here";
      FileSystemWatcher _watcher = new FileSystemWatcher();

      public void InitializeFilewatcher()
      {
          _watcher = new FileSystemWatcher(WatchPath);
          _watcher.InternalBufferSize = 65536;
          _watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName;
          _watcher.Changed += new FileSystemEventHandler(FileChanged);
          _watcher.Created += new FileSystemEventHandler(FileCreated);
          _watcher.Renamed += new RenamedEventHandler(FileRenamed);
          _watcher.Deleted += new FileSystemEventHandler(FileDeleted);
          _watcher.Error += Watcher_Error;
          _watcher.EnableRaisingEvents = true;
      }

      private void FileRenamed(object sender, RenamedEventArgs e) => Console.WriteLine($"Renamed: {e.OldName} -> {e.Name}");

      private void FileCreated(object sender, FileSystemEventArgs e) => Console.WriteLine($"Created: {e.Name} - {e.FullPath}");

      private void FileChanged(object sender, FileSystemEventArgs e) => Console.WriteLine($"Changed: {e.Name} - {e.FullPath}");

      private void FileDeleted(object sender, FileSystemEventArgs e) => Console.WriteLine($"Deleted: {e.Name} - {e.FullPath}");

      private void Watcher_Error(object sender, ErrorEventArgs e)
      {
          Exception ex = e.GetException();
          Console.WriteLine(ex.Message);
          Console.WriteLine(ex.StackTrace);
      }
  }
}

@carlossanlop
Copy link
Member Author

/backport to release/5.0-rc2

@github-actions
Copy link
Contributor

Started backporting to release/5.0-rc2: https://github.com/dotnet/runtime/actions/runs/260302385

Copy link
Member Author

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas @ericstj @danmosemsft I would like to point out that we sent a private DLL to the customer with the changes up to this point, and they got back to us last night confirming that the issue is gone with the fix.

I address almost all feedback I got, but I would like to emphasize that we are actively trying to backport this change to 5.0 and 3.1, so I would like to minimize the number of changes as much as we can.

The FileSystemWatcher unit tests passed locally.

I'm working on getting these suggestions addressed in the 5.0 and 3.1 backports as well.

cc @jeffhandley @tarekgh

{
Debug.Assert(buffer != null);
Debug.Assert(buffer.Length > 0);

numBytes = Math.Min(numBytes, (uint)buffer.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect wrong numBytes to be actually passed in, or are we just doing this defensively?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is done defensively. it was suggested by @ericstj

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add assert and comment to make it clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defensively in case the number of bytes provided by Windows does not match the length in the buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had the assert and removed it to add the check :-) I think it is fine to return back the assert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to return back the assert.

This whole PR's premise is we can't trust the data format being reported by Windows. We can keep the assert, but we need to have the check to protect against potential AV at runtime.

From the same token, we could be asserting in the break statements added below, since those also represent cases which shouldn't happen. I believe @jkotas suggested the same here: https://github.com/dotnet/runtime/pull/42419/files#r491093990

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought back the assert and I updated the backport PRs to ensure the CI finishes on time before 4pm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could be asserting in the break statements added below

The suggestion makes sense to me. Can I add it those asserts in a separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. Also, please add comments like: "This is defensive check in case the number of bytes provided by Windows is wrong. We have not seen this situation in practice.".

@jkotas
Copy link
Member

jkotas commented Sep 18, 2020

we are actively trying to backport this change to 5.0 and 3.1, so I would like to minimize the number of changes as much as we can.

It is not unusual to do a servicing change that has minimal changes and that is coded in less than ideal way; and do a different cleaner change for the tip.

@carlossanlop
Copy link
Member Author

It is not unusual to do a servicing change that has minimal changes and that is coded in less than ideal way; and do a different cleaner change for the tip.

I'd be happy to update the backport PRs with the suggestions provided here, if there's enough time after merging this change.

@tarekgh
Copy link
Member

tarekgh commented Sep 18, 2020

I'd be happy to update the backport PRs with the suggestions provided here, if there's enough time after merging this change.

I think the changes here is reasonable for the service release too. I am not seeing high risk with it. I am not sure if there is any more strong feedback need to be addressed here.

@carlossanlop
Copy link
Member Author

/backport to release/5.0-rc2

@github-actions
Copy link
Contributor

Started backporting to release/5.0-rc2: https://github.com/dotnet/runtime/actions/runs/261654138

@carlossanlop
Copy link
Member Author

/backport to release/5.0-rc2

@github-actions
Copy link
Contributor

Started backporting to release/5.0-rc2: https://github.com/dotnet/runtime/actions/runs/261682407

Copy link
Member

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits, hold-off addressing my comments to avoid restarting CI.

// if we are watching files over the network with a poor connection.

// Verify the info object is within the expected bounds
if (info < b || (((byte*)info) + sizeof(Interop.Kernel32.FILE_NOTIFY_INFORMATION)) > pBufferLimit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (info < b || (((byte*)info) + sizeof(Interop.Kernel32.FILE_NOTIFY_INFORMATION)) > pBufferLimit)
if (info < b || ((byte*)info) + sizeof(Interop.Kernel32.FILE_NOTIFY_INFORMATION) > pBufferLimit)

}

// Verify the file path is within the bounds
byte* pFileEnd = ((byte*)&(info->FileName)) + info->FileNameLength;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a local for info->FileName and info->FileNameLength since they are also referenced below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve readability or to improve performance? It won't help to improve performance.

Copy link
Member

@jozkee jozkee Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve readability. I was thinking that it could help both, but it may be worth for the former still.

Copy link
Member

@jkotas jkotas Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have submitted #42474 that shows how this can be rewritten using Span - to both improve readability and improve security.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas is the intention of that fix to be the backport for 3.1 and 5.0? In that case, I can close my 3 PRs and I can let you submit the backports as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to use your original fix for servicing. It is less a bit less invasive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay great. So the only one I'll close is the one for master.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, if we want to preserve consistency, I should also merge the master change and your PR can go on top of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging your PR to master sounds good to me.

@carlossanlop
Copy link
Member Author

CI failures:

  • OSX x64 release and Browser wasm Release - Unrelated to this change. Timeouts while attempting to build.
    ##[error]The job running on agent Azure Pipelines 52 ran longer than the maximum time of 120 minutes.
    

@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

w3wp.exe crash in FileSystemWatcher after 2.2 -> 3.1 migration (c0000005 Access Violation)
6 participants