Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Artifacts should only be logged in the SARIF if we have results #2431

Closed
wants to merge 6 commits into from

Conversation

eddynaka
Copy link
Collaborator

@eddynaka eddynaka commented Jan 27, 2022

Description

During some analysis, we saw that we are emitting an artifact even if we don't have a result. This is not good because it will make the SARIF gigantic in case you analyze hundreds/thousands of files.

Proposal

We will only emit an artifact if we have a result.

Tests

Created one 'file' of each level/kind and checked the SARIF output. It should only generate a list of artifacts equal to the locations you have in the results.

dataToInsert: dataToInsert,
encoding: encoding,
hashData: hashData);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path is used by the single threaded analysis. When hashes is enabled, it was going to generate all hashes and store everything in the SARIF.

The issue is that if we analyze 1k files and we do not produce any results, we would still generate the artifacts.

As an optimization, we will only store the artifact if we have a result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all versions that we released and I saw that all versions after 1.4.2 are emitting the artifacts IF the list of analysisTargets is not empty.

Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the other variables, looks like we tried to improve this behavior by adding the _persistArtifacts, but that is occurring too late (after we already added the artifact to the run)

@@ -447,12 +411,16 @@ private void CaptureArtifact(ArtifactLocation fileLocation)
catch (ArgumentException) { } // Unrecognized encoding name
}

HashData hashData = null;
AnalysisTargetToHashDataMap?.TryGetValue(fileLocation.Uri.OriginalString, out hashData);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalysisTargetToHashDataMap?.TryGetValue(fileLocation.Uri.OriginalString, out hashData);

For the single threaded analysis, the AnalysisTargetToHashDataMap will have the hashes and here, we will just use the information that we already have and store in the artifacts.

The CaptureArtifact method is only used when we are logging the artifacts of a result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If hashes isn't enabled, analysisTargetToHashDataMap will be null, so we are using the null check to guarantee and prevent a null reference exception.

@@ -485,10 +492,6 @@ private async Task<bool> HashAsync()
_hashToFilesMap[hashData.Sha256] = paths;
}

_run?.GetFileIndex(new ArtifactLocation { Uri = context.TargetUri },
dataToInsert: _dataToInsert,
hashData: hashData);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of adding artifacts for all files, we are moving this to the place when we have an actual result.

The hash itself will already be in the context, so, no need to calculate it again.

@@ -1258,21 +1258,93 @@ private void AnalyzeScenarios(int[] scenarios)
}

[Fact]
public void AnalyzeCommandBase_MultithreadedShouldUseCacheIfFilesAreTheSame()
public void AnalyzeCommandBase_Multithreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled()
Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyzeCommandBase_Multithreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled

This test will guarantee that we are always producing the correct number of artifacts if results are emitted. If we dont emit results, artifacts should be null.

}

[Fact]
public void AnalyzeCommmandBase_SingleThreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled()
Copy link
Collaborator Author

@eddynaka eddynaka Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyzeCommmandBase_SingleThreaded_ShouldOnlyLogArtifactsWhenHashesIsEnabled

This test will guarantee that we are always producing the correct number of artifacts if results are emitted. If we dont emit results, artifacts should be null.

}
}
}
};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we require a result with a location to see the artifacts.

levels: new List<FailureLevel> { FailureLevel.Warning, FailureLevel.Error },
kinds: new List<ResultKind> { ResultKind.Fail }))
{
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no results are being emitted, with that, no artifacts should be logged. This test was wrong.

@eddynaka eddynaka marked this pull request as ready for review January 28, 2022 02:42
@eddynaka
Copy link
Collaborator Author

This PR got replaced by this: #2433

@eddynaka eddynaka closed this Jan 31, 2022
@eddynaka eddynaka deleted the users/ednakamu/should-not-log-all-artifacts branch January 31, 2022 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant