Add heuristics for matching packages to ARP after installing #2044

Merged
merged 41 commits into from
Apr 8, 2022
Changes from 11 commits
Commits
76a0548
Add type for ARP correlation algorithms
Mar 22, 2022
da36d8b
Add function to compute best match
Mar 23, 2022
d560e33
Add overal structure for tests
Mar 23, 2022
c0894f4
Record ARP product code after install
Mar 23, 2022
2ac0f75
Use correlation measures in post-install
Mar 24, 2022
bcfb37e
Add test cases
Mar 24, 2022
d2f4662
Add normalized name measure (very hacky...)
Mar 24, 2022
a3e9d49
Add edit distance measure
Mar 24, 2022
5574320
Cleanup data
Mar 24, 2022
6c4ec11
Add edit distance measure to tests
Mar 24, 2022
cb2b9c9
Spelling
Mar 24, 2022
fb8da65
Merge branch 'master' into matching
Mar 28, 2022
d0f4110
PR comments, cleanup & refactor
Mar 30, 2022
435d4c3
Report false matches in tests
Mar 30, 2022
58a3f29
Use FoldCase; remove edit distance weights
Mar 30, 2022
2cf7961
Cleanup test data
Mar 30, 2022
ab55bbc
Fix crashes; add logs
Apr 1, 2022
867f156
Put whole ARP entry in context
Apr 1, 2022
c01cfbb
Cleanup test data
Apr 1, 2022
49727ba
Update test logs
Apr 1, 2022
aa5afee
Use type in context
Apr 1, 2022
e75287d
Update test data
Apr 4, 2022
d58c1ef
Allow empty
Apr 5, 2022
abc38b3
Remove unused measure
Apr 6, 2022
b647a60
Reduce reporting
Apr 6, 2022
d2cc53c
Spelling
Apr 6, 2022
0e29cc4
Add empty heuristic override for ARP snapshot tests
Apr 6, 2022
7c43ebf
Hide test
Apr 6, 2022
891e678
Rename context data
Apr 8, 2022
561e21d
Refactor per PR comments; use UTF-32 for edit distance
Apr 8, 2022
70ae168
Expand test cases
Apr 8, 2022
35683f8
Remove duplicates in data
Apr 8, 2022
1087ec5
Copy code for publisher property
Apr 8, 2022
0520643
Use Publisher property in tests
Apr 8, 2022
95af4ce
Merge branch 'master' into matching
Apr 8, 2022
ce8b259
Resolve TODOs
Apr 8, 2022
314d2f1
Report time for correlation
Apr 8, 2022
b48b697
Do a single allocation for edit distance table
Apr 8, 2022
1ccd981
Spelling
Apr 8, 2022
b3c3332
Update src/AppInstallerCLITests/Correlation.cpp
lechacon Apr 8, 2022
3f49a78
Use steady_clock
Apr 8, 2022
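
Several commits above mention an edit distance measure, a switch to UTF-32, and a single allocation for the distance table. The following is a minimal sketch of how those three ideas can fit together, not the code from this PR: `EditDistance` and `EditDistanceScore` are hypothetical names, and the normalization into a [0, 1] score is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance over UTF-32 code points, so each element is one code point
// rather than one UTF-8 byte. Illustrative only; not the PR's implementation.
std::size_t EditDistance(const std::u32string& a, const std::u32string& b)
{
    const std::size_t rows = a.size() + 1;
    const std::size_t cols = b.size() + 1;

    // One contiguous allocation for the whole table, indexed as row * cols + col.
    std::vector<std::size_t> table(rows * cols);
    auto at = [&](std::size_t i, std::size_t j) -> std::size_t& { return table[i * cols + j]; };

    // Distance from a prefix to the empty string is the prefix length.
    for (std::size_t i = 0; i < rows; ++i) { at(i, 0) = i; }
    for (std::size_t j = 0; j < cols; ++j) { at(0, j) = j; }

    for (std::size_t i = 1; i < rows; ++i)
    {
        for (std::size_t j = 1; j < cols; ++j)
        {
            const std::size_t substitution = at(i - 1, j - 1) + (a[i - 1] == b[j - 1] ? 0 : 1);
            at(i, j) = std::min({ at(i - 1, j) + 1, at(i, j - 1) + 1, substitution });
        }
    }

    return at(rows - 1, cols - 1);
}

// Hypothetical normalization of the distance into a [0, 1] score, 1 meaning identical.
double EditDistanceScore(const std::u32string& a, const std::u32string& b)
{
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) { return 1.0; }
    return 1.0 - static_cast<double>(EditDistance(a, b)) / static_cast<double>(longest);
}
```

Working on `std::u32string` keeps one table cell per code point, which avoids counting a multi-byte UTF-8 character as several edits.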
1 change: 1 addition & 0 deletions .github/actions/spelling/excludes.txt
@@ -28,6 +28,7 @@ ignore$
^Localization/
^NOTICE$
^src/AppInstallerCLICore/Commands/ExperimentalCommand\.cpp$
^src/AppInstallerCLITests/TestData/InputARPData.txt$
^src/AppInstallerCLITests/TestData/InputNames.txt$
^src/AppInstallerCLITests/TestData/InputPublishers.txt$
^src/AppInstallerCLITests/TestData/NormalizationInitialIds.txt$
7 changes: 7 additions & 0 deletions src/AppInstallerCLICore/ExecutionContextData.h
@@ -47,6 +47,7 @@ namespace AppInstaller::CLI::Execution
// On import: Sources for the imported packages
Sources,
ARPSnapshot,
ProductCodeFromARP,
Dependencies,
DependencySource,
AllowedArchitectures,
@@ -190,6 +191,12 @@ namespace AppInstaller::CLI::Execution
using value_t = std::vector<std::tuple<Utility::LocIndString, Utility::LocIndString, Utility::LocIndString>>;
};

template <>
struct DataMapping<Data::ProductCodeFromARP>
{
using value_t = Utility::LocIndString;
};

template <>
struct DataMapping<Data::Dependencies>
{
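
The `DataMapping` specializations in this hunk tie each `Data` enumerator to the type stored for it in the execution context. As a rough sketch of that pattern, assuming a much simpler context than the real one: the `Context` class, its `std::any` storage, and the use of `std::string` in place of `Utility::LocIndString` are all stand-ins invented for this example.

```cpp
#include <any>
#include <map>
#include <string>
#include <utility>

// Hypothetical, trimmed-down stand-ins for the real types.
enum class Data { ARPSnapshot, ProductCodeFromARP };

// Primary template is left undefined so that every key must declare its value type.
template <Data D>
struct DataMapping;

template <>
struct DataMapping<Data::ProductCodeFromARP>
{
    using value_t = std::string; // the PR uses Utility::LocIndString here
};

class Context
{
public:
    // The value type is looked up from the enumerator at compile time.
    template <Data D>
    void Add(typename DataMapping<D>::value_t value)
    {
        m_data[D] = std::move(value);
    }

    bool Contains(Data d) const { return m_data.count(d) != 0; }

    template <Data D>
    const typename DataMapping<D>::value_t& Get() const
    {
        return std::any_cast<const typename DataMapping<D>::value_t&>(m_data.at(D));
    }

private:
    std::map<Data, std::any> m_data;
};
```

With this shape, `context.Add<Data::ProductCodeFromARP>(...)` only compiles when the argument converts to the mapped type, and asking for an enumerator that has no specialization fails at compile time.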
243 changes: 108 additions & 135 deletions src/AppInstallerCLICore/Workflows/InstallFlow.cpp
@@ -12,6 +12,7 @@
#include "WorkflowBase.h"
#include "Workflows/DependenciesFlow.h"
#include <AppInstallerDeployment.h>
#include <winget/ARPCorrelation.h>

using namespace winrt::Windows::ApplicationModel::Store::Preview::InstallControl;
using namespace winrt::Windows::Foundation;
@@ -506,164 +507,125 @@ namespace AppInstaller::CLI::Workflow

void ReportARPChanges(Execution::Context& context) try
{
if (context.Contains(Execution::Data::ARPSnapshot))
if (!context.Contains(Execution::Data::ARPSnapshot))
{
const auto& entries = context.Get<Execution::Data::ARPSnapshot>();

// Open it again to get the (potentially) changed ARP entries
Source arpSource = context.Reporter.ExecuteWithProgress(
[](IProgressCallback& progress)
{
Repository::Source result = Repository::Source(PredefinedSource::ARP);
result.Open(progress);
return result;
}, true);

std::vector<ResultMatch> changes;
return;
}

for (auto& entry : arpSource.Search({}).Matches)
// Open the ARP source again to get the (potentially) changed ARP entries
Source arpSource = context.Reporter.ExecuteWithProgress(
[](IProgressCallback& progress)
{
auto installed = entry.Package->GetInstalledVersion();

if (installed)
{
auto entryKey = std::make_tuple(
entry.Package->GetProperty(PackageProperty::Id),
installed->GetProperty(PackageVersionProperty::Version),
installed->GetProperty(PackageVersionProperty::Channel));

auto itr = std::lower_bound(entries.begin(), entries.end(), entryKey);
if (itr == entries.end() || *itr != entryKey)
{
changes.emplace_back(std::move(entry));
}
}
}
Repository::Source result = Repository::Source(PredefinedSource::ARP);
result.Open(progress);
return result;
}, true);

// Also attempt to find the entry based on the manifest data
const auto& manifest = context.Get<Execution::Data::Manifest>();

SearchRequest nameAndPublisherRequest;
const auto& manifest = context.Get<Execution::Data::Manifest>();

// The default localization must contain the name or we cannot do this lookup
if (manifest.DefaultLocalization.Contains(Localization::PackageName))
// Try finding the package by product code in ARP.
// If we can find it now, we will be able to find it again later
// so we don't need to do anything else here.
SearchRequest productCodeSearchRequest;
std::vector<std::string> productCodes;
for (const auto& installer : manifest.Installers)
{
if (!installer.ProductCode.empty())
{
AppInstaller::Manifest::Manifest::string_t defaultName = manifest.DefaultLocalization.Get<Localization::PackageName>();
AppInstaller::Manifest::Manifest::string_t defaultPublisher;
if (manifest.DefaultLocalization.Contains(Localization::Publisher))
if (std::find(productCodes.begin(), productCodes.end(), installer.ProductCode) == productCodes.end())
{
defaultPublisher = manifest.DefaultLocalization.Get<Localization::Publisher>();
productCodeSearchRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::ProductCode, MatchType::Exact, installer.ProductCode));
productCodes.emplace_back(installer.ProductCode);
}
}
}

nameAndPublisherRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::NormalizedNameAndPublisher, MatchType::Exact, defaultName, defaultPublisher));
SearchResult arpFoundByProductCode;

for (const auto& loc : manifest.Localizations)
{
if (loc.Contains(Localization::PackageName) || loc.Contains(Localization::Publisher))
{
nameAndPublisherRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::NormalizedNameAndPublisher, MatchType::Exact,
loc.Contains(Localization::PackageName) ? loc.Get<Localization::PackageName>() : defaultName,
loc.Contains(Localization::Publisher) ? loc.Get<Localization::Publisher>() : defaultPublisher));
}
}
}
// Don't execute this search if it would just find everything
if (!productCodeSearchRequest.IsForEverything())
{
arpFoundByProductCode = arpSource.Search(productCodeSearchRequest);
}

std::vector<std::string> productCodes;
for (const auto& installer : manifest.Installers)
{
if (!installer.ProductCode.empty())
{
if (std::find(productCodes.begin(), productCodes.end(), installer.ProductCode) == productCodes.end())
{
nameAndPublisherRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::ProductCode, MatchType::Exact, installer.ProductCode));
productCodes.emplace_back(installer.ProductCode);
}
}
}
if (!arpFoundByProductCode.Matches.empty())
{
// TODO: Would we want to report changes in this case?
AICLI_LOG(CLI, Info, << "Installed package can be found in ARP by Product Code");
return;
}

SearchResult findByManifest;
// The product codes were not enough to find the package.
// We need to run some heuristics to try and match it with some ARP entry.

// Don't execute this search if it would just find everything
if (!nameAndPublisherRequest.IsForEverything())
{
findByManifest = arpSource.Search(nameAndPublisherRequest);
}
// First format the ARP data appropriately for the heuristic search
std::vector<Correlation::ARPEntry> arpEntries;

// Cross reference the changes with the search results
std::vector<std::shared_ptr<IPackage>> packagesInBoth;
size_t changedCount = 0;
const auto& arpSnapshot = context.Get<Execution::Data::ARPSnapshot>();
for (auto& entry : arpSource.Search({}).Matches)
{
auto installed = entry.Package->GetInstalledVersion();

for (const auto& change : changes)
if (installed)
{
for (const auto& byManifest : findByManifest.Matches)
// Compare with the previous snapshot to see if it changed.
auto entryKey = std::make_tuple(
entry.Package->GetProperty(PackageProperty::Id),
installed->GetProperty(PackageVersionProperty::Version),
installed->GetProperty(PackageVersionProperty::Channel));

auto itr = std::lower_bound(arpSnapshot.begin(), arpSnapshot.end(), entryKey);
bool isNewOrUpdated = (itr == arpSnapshot.end() || *itr != entryKey);
if (isNewOrUpdated)
{
if (change.Package->IsSame(byManifest.Package.get()))
{
packagesInBoth.emplace_back(change.Package);
break;
}
++changedCount;
}
}

// We now have all of the package changes; time to report them.
// The set of cases we could have for changes to ARP:
// 0 packages :: No changes were detected to ARP, which could mean that the installer
// did not write an entry. It could also be a forced reinstall.
// 1 package :: Golden path; this should be what we installed.
// 2+ packages :: We need to determine which package actually matches the one that we
// were installing.
//
// The set of cases we could have for finding packages based on the manifest:
// 0 packages :: The manifest data does not match the ARP information.
// 1 package :: Golden path; this should be what we installed.
// 2+ packages :: The data in the manifest is either too broad or we have
// a problem with our name normalization.

// Find the package that we are going to log
std::shared_ptr<IPackageVersion> toLog;

// If there is only a single common package (changed and matches), it is almost certainly the correct one.
if (packagesInBoth.size() == 1)
{
toLog = packagesInBoth[0]->GetInstalledVersion();
}
// If it wasn't changed but we still find a match, that is the best thing to report.
else if (findByManifest.Matches.size() == 1)
{
toLog = findByManifest.Matches[0].Package->GetInstalledVersion();
}
// If only a single ARP entry was changed and we found no matches, report that.
else if (findByManifest.Matches.empty() && changes.size() == 1)
{
toLog = changes[0].Package->GetInstalledVersion();
arpEntries.emplace_back(installed, isNewOrUpdated);
}
}

IPackageVersion::Metadata toLogMetadata;
if (toLog)
{
toLogMetadata = toLog->GetMetadata();
}
// Find the best match
const auto& correlationMeasure = Correlation::ARPCorrelationMeasure::GetInstance();
auto arpEntry = correlationMeasure.GetBestMatchForManifest(manifest, arpEntries)
->Entry; // TODO: Fix; this was modified as a hack for the tests...

// We can only get the source identifier from an active source
std::string sourceIdentifier;
if (context.Contains(Execution::Data::PackageVersion))
{
sourceIdentifier = context.Get<Execution::Data::PackageVersion>()->GetProperty(PackageVersionProperty::SourceIdentifier);
}
IPackageVersion::Metadata arpEntryMetadata;
if (arpEntry)
{
arpEntryMetadata = arpEntry->GetMetadata();
}

Logging::Telemetry().LogSuccessfulInstallARPChange(
Member

Did this telemetry event get moved somewhere else? It should still be logged in this function when a match is found, rather than in the helper method, which could be used for other purposes.

That might mean changing the output of the helper to return additional information, although the count fields in this event are less meaningful with different algorithms. But we could still calculate the number of changes, how many manifests were above the threshold, and how many of those were changed as the values used here, in that order.

Member Author

I had moved it down to the function doing the correlation, but now it's back here. I changed the helper to return the count of changes/matches, although that count still only considers the exact matches from the source search, as I couldn't figure out a good way to keep it consistent across the multiple "passes".

Do you have any ideas how to count the matching manifests when sometimes we use the exact matching and sometimes the confidence measures?

Member

As long as we can reason about the meaning, anything is fine. It can stay using the exact same values as before, just with a better guess. I thought we might use these values in some way to find things that weren't correlating, but it turned out to be very easy to find them 😉

So basically, these numbers are probably not important. Don't spend time trying to improve them, and if you think they are broken, we might consider just reporting 0 for all of them.
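
For reference, the three counts suggested earlier in this thread (number of changes, number of entries above the threshold, and how many of those were changed) could be gathered in one pass alongside the best match. This is only a sketch under assumptions: `Candidate`, `CorrelationSummary`, `Summarize`, and the idea of a single score per entry in [0, 1] are hypothetical and do not come from the PR.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical candidate: a score from some correlation measure plus the changed flag.
struct Candidate
{
    double score;        // assumed to be in [0, 1]
    bool isNewOrUpdated; // true when the ARP entry changed during the install
};

struct CorrelationSummary
{
    std::size_t changed = 0;               // ARP entries that changed
    std::size_t aboveThreshold = 0;        // entries scoring at or above the threshold
    std::size_t changedAboveThreshold = 0; // entries counted in both of the above
    std::optional<std::size_t> bestIndex;  // highest-scoring entry at or above the threshold, if any
};

CorrelationSummary Summarize(const std::vector<Candidate>& candidates, double threshold)
{
    CorrelationSummary result;
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        const Candidate& c = candidates[i];
        const bool matches = c.score >= threshold;

        if (c.isNewOrUpdated) { ++result.changed; }
        if (matches) { ++result.aboveThreshold; }
        if (matches && c.isNewOrUpdated) { ++result.changedAboveThreshold; }

        // Keep the first entry with the strictly highest score among the matches.
        if (matches && (!result.bestIndex || c.score > candidates[*result.bestIndex].score))
        {
            result.bestIndex = i;
        }
    }
    return result;
}
```

The three counters line up, in order, with the three count arguments passed to `LogSuccessfulInstallARPChange`.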

sourceIdentifier,
manifest.Id,
manifest.Version,
manifest.Channel,
changes.size(),
findByManifest.Matches.size(),
packagesInBoth.size(),
toLog ? static_cast<std::string>(toLog->GetProperty(PackageVersionProperty::Name)) : "",
toLog ? static_cast<std::string>(toLog->GetProperty(PackageVersionProperty::Version)) : "",
toLog ? static_cast<std::string_view>(toLogMetadata[PackageVersionMetadata::Publisher]) : "",
toLog ? static_cast<std::string_view>(toLogMetadata[PackageVersionMetadata::InstalledLocale]) : ""
);
// We can only get the source identifier from an active source
std::string sourceIdentifier;
if (context.Contains(Execution::Data::PackageVersion))
{
sourceIdentifier = context.Get<Execution::Data::PackageVersion>()->GetProperty(PackageVersionProperty::SourceIdentifier);
}

// Store the ARP entry found to match the package to record it in the tracking catalog later
if (arpEntry)
{
// We use the product code as the ID in the ARP source.
context.Add<Data::ProductCodeFromARP>(arpEntry->GetProperty(PackageVersionProperty::Id));
}

// TODO: Revisit removed checks

Logging::Telemetry().LogSuccessfulInstallARPChange(
sourceIdentifier,
manifest.Id,
manifest.Version,
manifest.Channel,
changedCount,
0, // TODO findByManifest.Matches.size(),
0, // TODO packagesInBoth.size(),
arpEntry ? static_cast<std::string>(arpEntry->GetProperty(PackageVersionProperty::Name)) : "",
arpEntry ? static_cast<std::string>(arpEntry->GetProperty(PackageVersionProperty::Version)) : "",
arpEntry ? static_cast<std::string_view>(arpEntryMetadata[PackageVersionMetadata::Publisher]) : "",
arpEntry ? static_cast<std::string_view>(arpEntryMetadata[PackageVersionMetadata::InstalledLocale]) : ""
);
}
CATCH_LOG();
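
The changed-entry check in `ReportARPChanges` looks up each current ARP entry in the earlier snapshot with `std::lower_bound`. A standalone sketch of that check is shown here, with plain `std::string` in place of `Utility::LocIndString`; `EntryKey` and `IsNewOrUpdated` are names made up for this example, and the snapshot is assumed to be sorted, since `std::lower_bound` requires that.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// (Id, Version, Channel), mirroring the tuple built in ReportARPChanges.
using EntryKey = std::tuple<std::string, std::string, std::string>;

// Returns true when `key` is absent from `snapshot`.
// Precondition (assumed): `snapshot` is sorted ascending, as std::lower_bound requires.
bool IsNewOrUpdated(const std::vector<EntryKey>& snapshot, const EntryKey& key)
{
    // lower_bound yields the first element not less than key; it is a hit only if equal.
    auto itr = std::lower_bound(snapshot.begin(), snapshot.end(), key);
    return itr == snapshot.end() || *itr != key;
}
```

Because the key includes the version, an entry whose version changed during the install is reported the same way as a brand new entry.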

@@ -677,10 +639,21 @@ namespace AppInstaller::CLI::Workflow
return;
}

auto manifest = context.Get<Data::Manifest>();

// If we have determined an ARP entry matches the installed package,
// we set its product code in the manifest we record to ensure we can
// find it in the future.
// Note that this may overwrite existing information.
if (context.Contains(Data::ProductCodeFromARP))
{
manifest.DefaultInstallerInfo.ProductCode = context.Get<Data::ProductCodeFromARP>().get();
}

auto trackingCatalog = context.Get<Data::PackageVersion>()->GetSource().GetTrackingCatalog();

trackingCatalog.RecordInstall(
context.Get<Data::Manifest>(),
manifest,
context.Get<Data::Installer>().value(),
WI_IsFlagSet(context.GetFlags(), ContextFlag::InstallerExecutionUseUpdate));
}
7 changes: 4 additions & 3 deletions src/AppInstallerCLICore/Workflows/InstallFlow.h
@@ -167,15 +167,16 @@ namespace AppInstaller::CLI::Workflow
// Outputs: ARPSnapshot
void SnapshotARPEntries(Execution::Context& context);

// Reports on the changes between the stored ARPSnapshot and the current values.
// Reports on the changes between the stored ARPSnapshot and the current values,
// and stores the product code of the ARP entry found for the package.
// Required Args: None
// Inputs: ARPSnapshot?, Manifest, PackageVersion
// Outputs: None
// Outputs: ProductCodeFromARP?
void ReportARPChanges(Execution::Context& context);

// Records the installation to the tracking catalog.
// Required Args: None
// Inputs: PackageVersion?, Manifest, Installer
// Inputs: PackageVersion?, Manifest, Installer, ProductCodeFromARP?
// Outputs: None
void RecordInstall(Execution::Context& context);
}
4 changes: 4 additions & 0 deletions src/AppInstallerCLITests/AppInstallerCLITests.vcxproj
@@ -188,6 +188,7 @@
<ClCompile Include="Command.cpp" />
<ClCompile Include="Completion.cpp" />
<ClCompile Include="CompositeSource.cpp" />
<ClCompile Include="Correlation.cpp" />
<ClCompile Include="CustomHeader.cpp" />
<ClCompile Include="Dependencies.cpp" />
<ClCompile Include="Downloader.cpp" />
@@ -542,6 +543,9 @@
<CopyFileToFolders Include="TestData\InputPublishers.txt">
<DeploymentContent>true</DeploymentContent>
</CopyFileToFolders>
<CopyFileToFolders Include="TestData\InputARPData.txt">
<DeploymentContent>true</DeploymentContent>
</CopyFileToFolders>
<CopyFileToFolders Include="TestData\NormalizationInitialIds.txt">
<DeploymentContent>true</DeploymentContent>
</CopyFileToFolders>
@@ -188,6 +188,9 @@
<ClCompile Include="PackageTrackingCatalog.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="Correlation.cpp">
<Filter>Source Files</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<None Include="PropertySheet.props" />
@@ -555,5 +558,8 @@
<CopyFileToFolders Include="TestData\Installer_Exe_DependenciesMultideclaration.yaml">
<Filter>TestData</Filter>
</CopyFileToFolders>
<CopyFileToFolders Include="TestData\InputARPData.txt">
<Filter>TestData</Filter>
</CopyFileToFolders>
</ItemGroup>
</Project>