Artifacts should only be logged in the SARIF if we have results #2431

eddynaka · 2022-01-27T23:59:33Z

Description

During analysis, we saw that we are emitting an artifact for every analyzed file, even when that file produces no result. This bloats the SARIF log when you analyze hundreds or thousands of files.

Proposal

We will only emit an artifact if we have a result.

Tests

Created one 'file' for each level/kind and checked the SARIF output. The log should contain only the artifacts referenced by the locations in the results.

eddynaka · 2022-01-28T00:01:26Z

src/Sarif/Writers/SarifLogger.cs

-                        dataToInsert: dataToInsert,
-                        encoding: encoding,
-                        hashData: hashData);
-                }


This code path is used by the single-threaded analysis. When hashing was enabled, it generated hashes for every target and stored all of them in the SARIF.

The issue is that if we analyze 1k files and produce no results, we still generate the artifacts.

As an optimization, we will only store the artifact if we have a result.

I reviewed all the versions we released and saw that every version after 1.4.2 emits the artifacts if the list of analysisTargets is not empty.

Looking at the other variables, it looks like we tried to improve this behavior by adding _persistArtifacts, but that check happens too late (after we have already added the artifact to the run).
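The intended behavior can be sketched roughly as follows. This is a minimal illustration, not the SarifLogger implementation: SketchRun, SketchResult, and AnalyzingTarget are hypothetical stand-ins, and the only assumption is that artifacts are registered from result locations instead of from the list of analysis targets.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a result; not the SARIF SDK type.
public sealed class SketchResult
{
    public List<Uri> Locations { get; } = new List<Uri>();
}

// Hypothetical stand-in for a run that registers artifacts lazily.
public sealed class SketchRun
{
    private readonly Dictionary<string, int> _artifactIndexes = new Dictionary<string, int>();

    // Stays null until the first result with a location is logged.
    public List<Uri> Artifacts { get; private set; }

    public List<SketchResult> Results { get; } = new List<SketchResult>();

    // Called once per analysis target; intentionally records no artifact.
    public void AnalyzingTarget(Uri target)
    {
    }

    // Artifacts are captured only here, one per distinct result location.
    public void Log(SketchResult result)
    {
        Results.Add(result);

        foreach (Uri location in result.Locations)
        {
            string key = location.OriginalString;
            if (_artifactIndexes.ContainsKey(key))
            {
                continue;
            }

            if (Artifacts == null)
            {
                Artifacts = new List<Uri>();
            }

            _artifactIndexes[key] = Artifacts.Count;
            Artifacts.Add(location);
        }
    }
}
```

With this shape, analyzing 1k targets that produce no results leaves Artifacts null, and the artifact count follows the distinct locations in the results.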

eddynaka · 2022-01-28T00:02:30Z

src/Sarif/Writers/SarifLogger.cs

@@ -447,12 +411,16 @@ private void CaptureArtifact(ArtifactLocation fileLocation)
                catch (ArgumentException) { } // Unrecognized encoding name
            }

+            HashData hashData = null;
+            AnalysisTargetToHashDataMap?.TryGetValue(fileLocation.Uri.OriginalString, out hashData);


AnalysisTargetToHashDataMap?.TryGetValue(fileLocation.Uri.OriginalString, out hashData);

For the single-threaded analysis, AnalysisTargetToHashDataMap already holds the hashes, so here we just reuse that information and store it in the artifacts.

The CaptureArtifact method is only used when we are logging the artifacts of a result.

If hashing isn't enabled, AnalysisTargetToHashDataMap will be null, so the null-conditional call prevents a null reference exception.
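The lookup pattern can be shown in isolation. This is a small sketch in which HashLookupSketch and FindHash are hypothetical names and a plain string stands in for HashData. The out variable is initialized before the call because the call is skipped entirely when the map is null.

```csharp
using System.Collections.Generic;

public static class HashLookupSketch
{
    // Returns the cached hash for the target, or null when hashing is
    // disabled (null map) or the target was never hashed.
    public static string FindHash(IDictionary<string, string> hashesByTarget, string target)
    {
        // Must be assigned up front: if the map is null, TryGetValue never runs.
        string hash = null;
        hashesByTarget?.TryGetValue(target, out hash);
        return hash;
    }
}
```

When the map exists but has no entry for the target, TryGetValue sets the out variable to its default value, so the method still returns null.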

eddynaka · 2022-01-28T00:04:30Z

src/Sarif.Driver/Sdk/MultithreadedAnalyzeCommandBase.cs

@@ -485,10 +492,6 @@ private async Task<bool> HashAsync()
                            _hashToFilesMap[hashData.Sha256] = paths;
                        }

-                        _run?.GetFileIndex(new ArtifactLocation { Uri = context.TargetUri },
-                                           dataToInsert: _dataToInsert,
-                                           hashData: hashData);


Instead of adding artifacts for all files, we are moving this to the place where we have an actual result.

The hash itself will already be in the context, so there is no need to calculate it again.
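As a rough sketch of that split, the hashing phase below computes the hash once and stores it on the context, and the artifact is built later from the context only when a result exists. SketchContext, HashPhase, and CreateArtifactForResult are hypothetical names for illustration, not members of MultithreadedAnalyzeCommandBase.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for the per-target analysis context.
public sealed class SketchContext
{
    public Uri TargetUri { get; set; }
    public string Sha256 { get; set; }
}

public static class DeferredArtifactSketch
{
    // Counts hash computations so the reuse is observable.
    public static int HashComputations;

    // Hashing phase: compute once per target and keep the value on the context.
    public static void HashPhase(SketchContext context, string contents)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(contents));
            context.Sha256 = BitConverter.ToString(digest).Replace("-", string.Empty);
        }

        HashComputations++;
    }

    // Result phase: build the artifact data from the context without rehashing.
    public static KeyValuePair<Uri, string> CreateArtifactForResult(SketchContext context)
    {
        return new KeyValuePair<Uri, string>(context.TargetUri, context.Sha256);
    }
}
```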

eddynaka · 2022-01-28T00:05:52Z

src/Test.UnitTests.Sarif.Driver/Sdk/AnalyzeCommandBaseTests.cs

@@ -1258,21 +1258,93 @@ private void AnalyzeScenarios(int[] scenarios)
        }

        [Fact]
-        public void AnalyzeCommandBase_MultithreadedShouldUseCacheIfFilesAreTheSame()
+        public void AnalyzeCommandBase_Multithreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled()


AnalyzeCommandBase_Multithreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled

This test guarantees that we always produce the correct number of artifacts when results are emitted. If we don't emit results, artifacts should be null.

eddynaka · 2022-01-28T00:06:18Z

src/Test.UnitTests.Sarif.Driver/Sdk/AnalyzeCommandBaseTests.cs

+        }
+
+        [Fact]
+        public void AnalyzeCommmandBase_SingleThreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled()


AnalyzeCommmandBase_SingleThreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled

This test guarantees that we always produce the correct number of artifacts when results are emitted. If we don't emit results, artifacts should be null.

eddynaka · 2022-01-28T02:35:13Z

src/Test.UnitTests.Sarif/Writers/SarifLoggerTests.cs

+                                }
+                            }
+                        }
+                    };


We require a result with a location to see the artifacts.

eddynaka · 2022-01-28T02:35:39Z

src/Test.UnitTests.Sarif/Writers/SarifLoggerTests.cs

-                        levels: new List<FailureLevel> { FailureLevel.Warning, FailureLevel.Error },
-                        kinds: new List<ResultKind> { ResultKind.Fail }))
-                    {
-                    }


No results are being emitted, so no artifacts should be logged. This test was wrong.

eddynaka · 2022-01-31T23:26:36Z

This PR was replaced by #2433.

Artifacts should only be logged in the SARIF if we have results

58c8205

eddynaka added 4 commits January 27, 2022 16:19

Updating magic variables

65b4788

Updating tests

0407059

Fixing tests

2fcf491

Updating tests

2108cdb

Optmizing logic

406a62c

eddynaka marked this pull request as ready for review January 28, 2022 02:42

eddynaka requested a review from michaelcfanning as a code owner January 28, 2022 02:42

eddynaka closed this Jan 31, 2022

eddynaka deleted the users/ednakamu/should-not-log-all-artifacts branch January 31, 2022 23:26
