Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK: Use Volatile.Read over Interlocked.Read #3384

Closed
wants to merge 1 commit into from

Conversation

CodeBlanch
Copy link
Member

Noticed working on #3349 that Volatile.Read has an optimization for 64bit systems that make it faster than Interlocked.Read. This PR updates some of the other places where Interlocked.Read was being used.

Simple benchmark:

    public class VolatileBenchmarks
    {
        private long value;

        [Benchmark]
        public long UsingInterlocked()
        {
            var v = Interlocked.Read(ref value);
            return v;
        }

        [Benchmark]
        public long UsingVolatile()
        {
            var v = Volatile.Read(ref value);
            return v;
        }
    }

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
12th Gen Intel Core i9-12900HK, 1 CPU, 20 logical and 14 physical cores
.NET SDK=6.0.400-preview.22301.10
[Host] : .NET 6.0.6 (6.0.622.26707), X64 RyuJIT
DefaultJob : .NET 6.0.6 (6.0.622.26707), X64 RyuJIT

Method Mean Error StdDev Median
UsingInterlocked 5.6288 ns 0.1398 ns 0.4012 ns 5.3640 ns
UsingVolatile 0.0021 ns 0.0041 ns 0.0038 ns 0.0000 ns

@CodeBlanch CodeBlanch requested a review from a team June 18, 2022 16:49
Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@utpilla
Copy link
Contributor

utpilla commented Jun 20, 2022

From the docs:

On a multiprocessor system, a volatile read operation is not guaranteed to obtain the latest value written to that memory location by any processor. Similarly, a volatile write operation does not guarantee that the value written would be immediately visible to other processors.

Does Interlocked.Read guarantee that? I checked the docs for Interlocked.Read but they don't mention anything about the value returned being the latest.

@utpilla
Copy link
Contributor

utpilla commented Jun 20, 2022

    public class VolatileBenchmarks
    {
        private long value;

        [Benchmark]
        public long UsingVolatile()
        {
            var v = Volatile.Read(ref value);
            return v;
        }
    }

Is the best way to test the performance here? I'm not sure if there is any possibility of reordering of memory operations here as there is only one memory read operation and moreover the benchmarks are not run on multiple threads.

@reyang
Copy link
Member

reyang commented Jun 20, 2022

From the docs:

On a multiprocessor system, a volatile read operation is not guaranteed to obtain the latest value written to that memory location by any processor. Similarly, a volatile write operation does not guarantee that the value written would be immediately visible to other processors.

Does Interlocked.Read guarantee that? I checked the docs for Interlocked.Read but they don't mention anything about the value returned being the latest.

I think volatile was seen initially in the C programming language, with the intention of handling device I/O that are backed by memory. The semantic wasn't clearly defined until much later, and different programming languages kind of inherited the spirit but have slightly different interpretations https://en.wikipedia.org/wiki/Volatile_(computer_programming). In general, there are two meanings:

  • a volatile read/write has to be backed by a real read/write operation
    e.g. if x is not declared as volatile, a smart compiler could just figure out x = x + 10; with volatile, it has to access the real variable and perform the x + 1 operation.
    for (int i = 0; i < 10; i++)
    {
      x += 1;
    }
  • the instructions before and after volatile operations cannot be moved across the memory barrier

@reyang
Copy link
Member

reyang commented Jun 20, 2022

    public class VolatileBenchmarks
    {
        private long value;

        [Benchmark]
        public long UsingVolatile()
        {
            var v = Volatile.Read(ref value);
            return v;
        }
    }

Is the best way to test the performance here? I'm not sure if there is any possibility of reordering of memory operations here as there is only one memory read operation and moreover the benchmarks are not run on multiple threads.

Interlocked provides thread level guarantee, regardless of single vs. multiple CPU/cores: https://docs.microsoft.com/dotnet/api/system.threading.interlocked

Provides atomic operations for variables that are shared by multiple threads.

@CodeBlanch
Copy link
Member Author

I'm basing this basically off this: https://stackoverflow.com/a/72508084

  • On 32 bit, no difference between Interlocked.Read & Volatile.Read.
  • On 64 bit, Volatile.Read gives you a cheaper memory fence w/o atomicity because read is already atomic by the CPU.

@utpilla
Copy link
Contributor

utpilla commented Jun 20, 2022

  • On 64 bit, Volatile.Read gives you a cheaper memory fence w/o atomicity because read is already atomic by the CPU.

My question is mainly around "freshness" of data since the docs for volatile read clearly state that:

On a multiprocessor system, a volatile read operation is not guaranteed to obtain the latest value written to that memory location by any processor.

We would want to get the latest value written to the memory location for our scenarios, right? (otherwise, we might get some value that the processor cached)? Also, if Interlocked ensures that we would get the latest value written to that memory location by any processor, then we would want to use Interlocked, right?

@pellared
Copy link
Member

pellared commented Jun 21, 2022

Personally, I find using Volatile.Read or Thread.VolatileRead over Interlocked.Read as a dangerous (probably buggy) premature optimization.

  1. It does not guarantee to return "fresh" data, because it can use a "half-fence". From the docs: "If a read or write appears after this method in the code, the processor cannot move it before this method." It means that a Volatile.Read can be reordered before a write operation. Would it not cause bugs?
  2. It would make maintaining the codebase harder.
  3. How much time would we save in an E2E scenario?

Side note: finding issues using volatile/Volatile is hard b/c most of the processors are often "forcing" CPU cache synchronization. I remember using Intel Atom for finding race conditions a few years ago 😆

Additional reference: https://www.albahari.com/threading/part4.aspx#_Nonblocking_Synchronization

@CodeBlanch
Copy link
Member Author

On a multiprocessor system, a volatile read operation is not guaranteed to obtain the latest value written to that memory location by any processor.

We would want to get the latest value written to the memory location for our scenarios, right? (otherwise, we might get some value that the processor cached)? Also, if Interlocked ensures that we would get the latest value written to that memory location by any processor, then we would want to use Interlocked, right?

TBH, I don't know. It seems like a bug with the docs. If Volatile.Read doesn't do a volatile read, what purpose does it have? 🤣

@noahfalk Any insight you can share on this?

@noahfalk
Copy link

I assume when the docs say you aren't guaranteed to get the latest value, they mean Volatile.Read isn't going to do an IPI like FlushProcessWriteBuffers. Interlocked.Read also doesn't do an IPI so I don't think either one is giving you a stronger guarantee of freshness. Anything that did do that IPI would be very expensive and it is rare to have algorithms which require it for correctness.

Looking at the generated code the difference between them is that Volatile.Read gaurantees it won't reorder with any following read/write whereas Interlocked.Read is giving you a guarantee it won't be reordered with any read/write before or after. Which ordering guarantees you need depend on the algorithm being implemented. Both of them are also giving you an atomicity guarantee. Oddly I see that Volatile.Read docs fail to mention the atomicity guarantee and Interlocked.Read docs fail to mention the ordering guarantee.

If your goal is to be safe, Interlocked.Read is giving you a strict super-set of the guarantees from Volatile.Read, so replacing Volatile.Read -> Interlocked.Read should always be safe. However if you are trying to optimize for performance, Volatile.Read is going to be faster in situations where you only need lesser reordering guarantee. Your benchmark above shows the best case performance for Interlocked.Read because it is single threaded. If you make another benchmark where Interlocked.Read of the same location is being executed frequently from 2 or more threads I would expect that cost to go up 10-30x whereas Volatile.Read cost won't go up.

@cijothomas cijothomas closed this Jun 28, 2022
@cijothomas cijothomas reopened this Jun 28, 2022
@codecov
Copy link

codecov bot commented Jun 28, 2022

Codecov Report

Merging #3384 (6c49e37) into main (274dc44) will increase coverage by 0.42%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3384      +/-   ##
==========================================
+ Coverage   85.83%   86.25%   +0.42%     
==========================================
  Files         271      258      -13     
  Lines        9523     9269     -254     
==========================================
- Hits         8174     7995     -179     
+ Misses       1349     1274      -75     
Impacted Files Coverage Δ
...ourceInstrumentation/DiagnosticSourceSubscriber.cs 95.12% <100.00%> (ø)
src/OpenTelemetry/Metrics/MetricPoint.cs 84.89% <100.00%> (ø)
src/OpenTelemetry.Api/Context/RuntimeContext.cs 63.63% <0.00%> (-27.28%) ⬇️
...emetry.Api/Context/AsyncLocalRuntimeContextSlot.cs 75.00% <0.00%> (-25.00%) ⬇️
...c/OpenTelemetry/Logs/OpenTelemetryLoggerOptions.cs 78.57% <0.00%> (-21.43%) ⬇️
...tation/OpenTelemetryProtocolExporterEventSource.cs 75.00% <0.00%> (-10.00%) ⬇️
...elemetry/Metrics/MeterProviderBuilderExtensions.cs 63.82% <0.00%> (-7.60%) ⬇️
src/OpenTelemetry/Resources/ResourceBuilder.cs 93.75% <0.00%> (-6.25%) ⬇️
src/OpenTelemetry/BatchExportProcessor.cs 79.04% <0.00%> (-2.86%) ⬇️
.../OpenTelemetry/Metrics/MeterProviderBuilderBase.cs 87.71% <0.00%> (-1.57%) ⬇️
... and 21 more

@github-actions
Copy link
Contributor

github-actions bot commented Jul 6, 2022

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jul 6, 2022
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Issues and pull requests which have been flagged for closing due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants