LLM PR reviewer #6381
Datadog Report
Branch report: ✅ 0 Failed, 237879 Passed, 1964 Skipped, 19h 38m 3.35s Total Time
New Flaky Tests (1)
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing the following branches/commits:
Execution-time benchmarks measure the whole time it takes to execute a program and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6381) - mean (69ms) : 66, 73
. : milestone, 69,
master - mean (69ms) : 66, 72
. : milestone, 69,
section CallTarget+Inlining+NGEN
This PR (6381) - mean (986ms) : 957, 1014
. : milestone, 986,
master - mean (982ms) : 957, 1007
. : milestone, 982,
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6381) - mean (108ms) : 106, 110
. : milestone, 108,
master - mean (108ms) : 105, 110
. : milestone, 108,
section CallTarget+Inlining+NGEN
This PR (6381) - mean (684ms) : 666, 701
. : milestone, 684,
master - mean (681ms) : 666, 696
. : milestone, 681,
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6381) - mean (92ms) : 89, 95
. : milestone, 92,
master - mean (92ms) : 90, 94
. : milestone, 92,
section CallTarget+Inlining+NGEN
This PR (6381) - mean (639ms) : 625, 654
. : milestone, 639,
master - mean (634ms) : 616, 651
. : milestone, 634,
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6381) - mean (189ms) : 185, 194
. : milestone, 189,
master - mean (189ms) : 184, 193
. : milestone, 189,
section CallTarget+Inlining+NGEN
This PR (6381) - mean (1,090ms) : 1055, 1124
. : milestone, 1090,
master - mean (1,082ms) : 1051, 1114
. : milestone, 1082,
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6381) - mean (276ms) : 271, 282
. : milestone, 276,
master - mean (276ms) : 271, 281
. : milestone, 276,
section CallTarget+Inlining+NGEN
This PR (6381) - mean (874ms) : 848, 900
. : milestone, 874,
master - mean (866ms) : 835, 897
. : milestone, 866,
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6381) - mean (265ms) : 261, 269
. : milestone, 265,
master - mean (263ms) : 259, 266
. : milestone, 263,
section CallTarget+Inlining+NGEN
This PR (6381) - mean (851ms) : 819, 883
. : milestone, 851,
master - mean (848ms) : 810, 886
. : milestone, 848,
Throughput/Crank Report ⚡
Throughput results for AspNetCoreSimpleController comparing the following branches/commits:
Cases where throughput results for the PR are worse than latest master (a drop of 5% or greater) are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!
gantt
title Throughput Linux x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6381) (11.174M) : 0, 11174235
master (11.434M) : 0, 11433823
benchmarks/2.9.0 (11.033M) : 0, 11032866
section Automatic
This PR (6381) (7.347M) : 0, 7346518
master (7.329M) : 0, 7329326
benchmarks/2.9.0 (7.786M) : 0, 7785853
section Trace stats
master (7.611M) : 0, 7611113
section Manual
master (11.108M) : 0, 11107912
section Manual + Automatic
This PR (6381) (6.682M) : 0, 6681760
master (6.845M) : 0, 6844945
section DD_TRACE_ENABLED=0
master (10.329M) : 0, 10328733
gantt
title Throughput Linux arm64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6381) (9.275M) : 0, 9274926
master (9.534M) : 0, 9533585
benchmarks/2.9.0 (9.495M) : 0, 9494821
section Automatic
This PR (6381) (6.415M) : 0, 6414890
master (6.293M) : 0, 6293263
section Trace stats
master (6.541M) : 0, 6541247
section Manual
master (9.502M) : 0, 9502053
section Manual + Automatic
This PR (6381) (5.927M) : 0, 5926822
master (5.976M) : 0, 5976365
section DD_TRACE_ENABLED=0
master (8.806M) : 0, 8806055
gantt
title Throughput Windows x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6381) (9.855M) : 0, 9854550
master (9.968M) : 0, 9968345
benchmarks/2.9.0 (10.020M) : 0, 10019592
section Automatic
This PR (6381) (6.280M) : 0, 6280374
master (6.506M) : 0, 6506205
benchmarks/2.9.0 (7.255M) : 0, 7255257
section Trace stats
master (7.120M) : 0, 7119839
section Manual
master (10.011M) : 0, 10010780
section Manual + Automatic
This PR (6381) (5.945M) : 0, 5944685
master (5.923M) : 0, 5922704
section DD_TRACE_ENABLED=0
master (9.290M) : 0, 9290361
Benchmarks Report for tracer 🐌
Benchmarks for #6381 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored.
Benchmark details
Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.SpanBenchmark - Slower
Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 | 1.156 | 400.90 | 463.55 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | StartFinishSpan | net6.0 | 401ns | 0.496ns | 1.92ns | 0.00815 | 0 | 0 | 576 B |
master | StartFinishSpan | netcoreapp3.1 | 568ns | 1.8ns | 6.98ns | 0.00775 | 0 | 0 | 576 B |
master | StartFinishSpan | net472 | 604ns | 0.974ns | 3.77ns | 0.0918 | 0 | 0 | 578 B |
master | StartFinishScope | net6.0 | 477ns | 0.702ns | 2.72ns | 0.00967 | 0 | 0 | 696 B |
master | StartFinishScope | netcoreapp3.1 | 665ns | 1.03ns | 3.97ns | 0.00935 | 0 | 0 | 696 B |
master | StartFinishScope | net472 | 874ns | 1.41ns | 5.47ns | 0.104 | 0 | 0 | 658 B |
#6381 | StartFinishSpan | net6.0 | 463ns | 0.701ns | 2.71ns | 0.00804 | 0 | 0 | 576 B |
#6381 | StartFinishSpan | netcoreapp3.1 | 623ns | 1.28ns | 4.95ns | 0.00766 | 0 | 0 | 576 B |
#6381 | StartFinishSpan | net472 | 622ns | 0.997ns | 3.86ns | 0.0916 | 0 | 0 | 578 B |
#6381 | StartFinishScope | net6.0 | 484ns | 0.846ns | 3.28ns | 0.00973 | 0 | 0 | 696 B |
#6381 | StartFinishScope | netcoreapp3.1 | 717ns | 1.64ns | 6.14ns | 0.00932 | 0 | 0 | 696 B |
#6381 | StartFinishScope | net472 | 853ns | 1.46ns | 5.67ns | 0.104 | 0 | 0 | 658 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | RunOnMethodBegin | net6.0 | 646ns | 0.639ns | 2.48ns | 0.00962 | 0 | 0 | 696 B |
master | RunOnMethodBegin | netcoreapp3.1 | 905ns | 1.14ns | 4.43ns | 0.00912 | 0 | 0 | 696 B |
master | RunOnMethodBegin | net472 | 1.11μs | 2.23ns | 8.64ns | 0.104 | 0 | 0 | 658 B |
#6381 | RunOnMethodBegin | net6.0 | 623ns | 0.734ns | 2.84ns | 0.0096 | 0 | 0 | 696 B |
#6381 | RunOnMethodBegin | netcoreapp3.1 | 950ns | 1.18ns | 4.58ns | 0.00907 | 0 | 0 | 696 B |
#6381 | RunOnMethodBegin | net472 | 1.05μs | 1.87ns | 7.25ns | 0.104 | 0 | 0 | 658 B |
fix typo
LGTM, just some nits
@@ -254,6 +254,39 @@ stages:
displayName: Generate Matrices
name: generate_variables_step

- stage: generate_LLM_Report
nit: could we please put this at the bottom of the file, as it's entirely optional? 🥺
Done!
- template: steps/update-github-status-jobs.yml
  parameters:
    jobs: [generate_LLM_job]

- job: generate_LLM_job
  timeoutInMinutes: 3
  dependsOn: []
  pool:
    name: azure-windows-scale-set-3

  steps:
  - template: steps/clone-repo.yml
    parameters:
      targetShaId: $(targetShaId)
      targetBranch: $(targetBranch)
  - template: steps/install-latest-dotnet-sdk.yml

  - powershell: |
      tracer/build.ps1 LLMReport
    displayName: Generate LLM report
    name: generate_llm_step
    env:
      PullRequestNumber: $(System.PullRequest.PullRequestNumber)
      GITHUB_TOKEN: $(GITHUB_TOKEN)
      OpenAIKey: $(OPEN_AI_KEY)
This is the wrong value I think, should be OPEN_AI_KEY based on the Nuke code?
It's defined as
[Parameter("An OpenAI key", Name = "OPEN_AI_KEY")]
readonly string OpenAIKey;
It seems to be working...
tracer/build/_build/Build.GitHub.cs
Outdated
else if (string.IsNullOrEmpty(OpenAIKey))
{
    result = "Null or empty OpenAI key.";
}
This won't happen because you marked it Required()
Thanks! Done!
tracer/build/_build/Build.GitHub.cs
Outdated
if (executeLocal)
{
    File.WriteAllText("changes.txt", fullPrompt);
I think you should make these absolute paths, so the location of changes.txt is fixed and in a sensible (i.e. temporary) place. That way it will be excluded by the gitignore (same goes for LLMResult.txt).
Personally, I'd rather we didn't generate these files, and instead just print them to the console. You can always pipe them to a file if you want to anyway.
Agree. Thanks! Done!
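A minimal sketch of the "print to the console instead of writing files" suggestion above. The class name ReportOutputSketch, the WriteSection helper, and the section header format are hypothetical and are not taken from the PR.

```csharp
using System;
using System.IO;

public static class ReportOutputSketch
{
    // Hypothetical helper: writes a titled section to the given writer.
    // When no writer is passed, it falls back to Console.Out, so nothing
    // is left behind in the working directory.
    public static void WriteSection(string title, string body, TextWriter? output = null)
    {
        output ??= Console.Out;
        output.WriteLine($"===== {title} =====");
        output.WriteLine(body);
    }
}
```

With output going to the console, anyone who still wants a file can redirect the build command's output to one.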
tracer/build/_build/Build.GitHub.cs
Outdated
@@ -94,6 +99,92 @@ await client.Issue.Update(
Console.WriteLine($"PR assigned");
});

Target LLMReport => _ => _
Maybe we should rename it so that it's clear it's only a review of a PR? 🤔
I have renamed the target to LLMPRReview. Thanks!
Co-authored-by: Andrew Lock <[email protected]>
Nice! I'll try it in some of my PRs 😄
var requestContent = new
{
    model = "gpt-4o",
Is the result really tied to the model used? (For example, does it make any difference with o1 and its thinking pattern?)
Maybe in the future we can specify the OpenAI model as an argument, in case other new models get released.
Yes, I chose this model as a compromise between results and price, but adding the model as an optional argument could be a nice feature. Thanks!
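One way the optional model argument could look, as a rough sketch rather than the PR's actual code. It assumes the request body follows the OpenAI chat completions shape (a model name plus a list of role/content messages) and is serialized with System.Text.Json. The class name LlmRequestSketch, the BuildRequestJson helper, and the single user message are hypothetical; only the "gpt-4o" default comes from the diff.

```csharp
using System;
using System.Text.Json;

public static class LlmRequestSketch
{
    // The PR hard-codes this value; here it is only the fallback.
    public const string DefaultModel = "gpt-4o";

    // Hypothetical helper: builds the JSON request body.
    // A null or blank model falls back to DefaultModel.
    public static string BuildRequestJson(string prompt, string? model = null)
    {
        var requestContent = new
        {
            model = string.IsNullOrWhiteSpace(model) ? DefaultModel : model,
            messages = new[]
            {
                new { role = "user", content = prompt },
            },
        };

        return JsonSerializer.Serialize(requestContent);
    }
}
```

In the Nuke build, the model name could then come from an extra optional parameter declared next to OPEN_AI_KEY, so existing invocations keep working without it.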
Thanks for your feedback and reviews!
Summary of changes
This PR adds a stage to the pipeline that sends the PR's code changes to OpenAI for a code review and writes a report with the result. The stage is optional and only runs when the pipeline variable "generate_llm_report" is set to "true". By default, this variable is set to "false" unless we decide to generate this kind of report by default.
A report can also be generated locally by running the LLMPRReview task.
For example:
tracer\build LLMPRReview -GITHUB_TOKEN <GH_TOKEN> -OPEN_AI_KEY <OPEN_AI_KEY> -PullRequestNumber <PRNumber>
This task, when run in local mode, will generate two files:
Reason for change
It's an innovation week project.
Implementation details
Test coverage
Other details