-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve correlation by keeping arch info declared in manifest arp DisplayName entry #3100
Changes from 4 commits
fd08bb9
e9b89a2
93d88fd
754cc9c
055c6e7
6746056
f846d02
c9a0a82
dba1b71
0e6da40
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
PackageIdentifier: AppInstallerTest.TestMappingWithArchitectureX64 | ||
PackageVersion: 1.0 | ||
PackageName: TestMappingWithArchitecture | ||
PackageLocale: en-US | ||
Publisher: Microsoft | ||
License: Test | ||
ShortDescription: E2E test for mapping with architecture. | ||
Installers: | ||
- Architecture: x64 | ||
InstallerUrl: https://localhost:5001/TestKit/AppInstallerTestExeInstaller/AppInstallerTestExeInstaller.exe | ||
InstallerType: exe | ||
InstallerSha256: <EXEHASH> | ||
AppsAndFeaturesEntries: | ||
- DisplayName: "TestMappingWithArchitecture(X64)" | ||
InstallerSwitches: | ||
Custom: '/DisplayName TestMappingWithArchitecture(X64) /ProductID {0e426f01-b682-4e67-a357-52f9ecb4590d}' | ||
ManifestType: singleton | ||
ManifestVersion: 1.2.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
PackageIdentifier: AppInstallerTest.TestMappingWithArchitectureX86 | ||
PackageVersion: 1.0 | ||
PackageName: TestMappingWithArchitecture | ||
PackageLocale: en-US | ||
Publisher: Microsoft | ||
License: Test | ||
ShortDescription: E2E test for mapping with architecture. | ||
Installers: | ||
- Architecture: x86 | ||
InstallerUrl: https://localhost:5001/TestKit/AppInstallerTestExeInstaller/AppInstallerTestExeInstaller.exe | ||
InstallerType: exe | ||
InstallerSha256: <EXEHASH> | ||
AppsAndFeaturesEntries: | ||
- DisplayName: "TestMappingWithArchitecture(X86)" | ||
InstallerSwitches: | ||
Custom: '/DisplayName TestMappingWithArchitecture(X86) /ProductID {0e426f01-b682-4e67-a357-52f9ecb4590d}' | ||
ManifestType: singleton | ||
ManifestVersion: 1.2.0 |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if this is part of a larger problem where we are taking the first match rather than treating multiple matches as an issue (and thus resulting in no matches, with some logging of course). Also, some analysis that I hadn't thought is to look at how many normalized name and publisher values (or any of the system reference string cohort) map onto multiple package ids. Any time that is happening is likely cause for concern. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's 153 unique ID_Name_Map_DuplicatesOnly_2023-03-29.csv If you look at the mapping where different package id's have duplicate name AND publisher, there's 339 id's that have duplication. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,7 @@ | |
#include "CompositeSource.h" | ||
#include "Microsoft/PinningIndex.h" | ||
#include <winget/ExperimentalFeature.h> | ||
#include <winget/NameNormalization.h> | ||
|
||
using namespace AppInstaller::Repository::Microsoft; | ||
using namespace AppInstaller::Settings; | ||
|
@@ -842,7 +843,15 @@ namespace AppInstaller::Repository | |
// Data relevant to correlation for a package. | ||
struct PackageData | ||
{ | ||
// ProductCode, PackageFamilyName, etc | ||
std::set<SystemReferenceString> SystemReferenceStrings; | ||
// Normalized Publisher and Names, with NormalizationField info for creating filtered search request. | ||
std::map<SystemReferenceString, Utility::NormalizationField> NameAndPublishers; | ||
|
||
bool Empty() | ||
{ | ||
return SystemReferenceStrings.empty() && NameAndPublishers.empty(); | ||
} | ||
|
||
void AddIfNotPresent(SystemReferenceString&& srs) | ||
{ | ||
|
@@ -852,18 +861,52 @@ namespace AppInstaller::Repository | |
} | ||
} | ||
|
||
SearchRequest CreateInclusionsSearchRequest() const | ||
void AddNameAndPublisherIfNotPresent(SystemReferenceString&& srs, Utility::NormalizationField normalizationField) | ||
{ | ||
if (NameAndPublishers.find(srs) == NameAndPublishers.end()) | ||
{ | ||
NameAndPublishers.emplace(std::move(srs), normalizationField); | ||
} | ||
} | ||
|
||
// If filter is std::nullopt, all values will be included. | ||
SearchRequest CreateInclusionsSearchRequest(std::optional<Utility::NormalizationField> filter = std::nullopt) const | ||
{ | ||
SearchRequest result; | ||
for (const auto& srs : SystemReferenceStrings) | ||
{ | ||
srs.AddToFilters(result.Inclusions); | ||
} | ||
for (const auto& nameAndPublisher : NameAndPublishers) | ||
{ | ||
// We specifically use direct compare of NormalizationField here since most likely in the future, | ||
// The matching logic will try narrowing down list of strings step by step. | ||
// For example, try Name|Arch|Locale, then Name|Arch, then Name. | ||
if (filter.has_value() && nameAndPublisher.second != filter.value()) | ||
{ | ||
continue; | ||
} | ||
nameAndPublisher.first.AddToFilters(result.Inclusions); | ||
} | ||
return result; | ||
} | ||
|
||
bool PackageNamesContainNormalizationField(Utility::NormalizationField filter) | ||
{ | ||
for (const auto& nameAndPublisher : NameAndPublishers) | ||
{ | ||
if (nameAndPublisher.second == filter) | ||
{ | ||
return true;; | ||
} | ||
} | ||
|
||
return false; | ||
} | ||
}; | ||
|
||
// For a given package version, prepares the results for it. | ||
// For a given package version, prepares the system reference strings for correlation. | ||
// Excluding normalized publisher and name. | ||
PackageData GetSystemReferenceStrings(IPackageVersion* version) | ||
{ | ||
PackageData result; | ||
|
@@ -1054,14 +1097,17 @@ namespace AppInstaller::Repository | |
auto names = installedVersion->GetMultiProperty(PackageVersionMultiProperty::Name); | ||
auto publishers = installedVersion->GetMultiProperty(PackageVersionMultiProperty::Publisher); | ||
|
||
Utility::NameNormalizer normer(Utility::NormalizationVersion::Initial); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Different versions of the index will use different versions of the normalization; I don't think that this is future proof. The goal was that from the outside, normalization was known to be used, but not understood. I don't think that this level of detail is a workable solution for an arbitrary source to implement. |
||
|
||
for (size_t i = 0; i < names.size(); ++i) | ||
{ | ||
auto normalizationField = normer.NormalizeName(names[i]).GetNormalizedFields(); | ||
for (size_t j = 0; j < publishers.size(); ++j) | ||
{ | ||
data.AddIfNotPresent(SystemReferenceString{ | ||
data.AddNameAndPublisherIfNotPresent(SystemReferenceString{ | ||
PackageMatchField::NormalizedNameAndPublisher, | ||
names[i], | ||
publishers[j] }); | ||
publishers[j] }, normalizationField); | ||
} | ||
} | ||
} | ||
|
@@ -1218,11 +1264,8 @@ namespace AppInstaller::Repository | |
|
||
auto installedPackageData = result.GetSystemReferenceStrings(installedVersion.get()); | ||
|
||
// Create a search request to run against all available sources | ||
if (!installedPackageData.SystemReferenceStrings.empty()) | ||
auto tryAddAvailablePackage = [&](const SearchRequest& systemReferenceSearch) -> bool | ||
{ | ||
SearchRequest systemReferenceSearch = installedPackageData.CreateInclusionsSearchRequest(); | ||
|
||
Source trackedSource; | ||
std::shared_ptr<IPackage> trackingPackage; | ||
std::shared_ptr<IPackageVersion> trackingPackageVersion; | ||
|
@@ -1306,6 +1349,31 @@ namespace AppInstaller::Repository | |
addedAvailablePackage = true; | ||
compositePackage->AddAvailablePackage(std::move(availablePackage)); | ||
} | ||
|
||
return addedAvailablePackage; | ||
}; | ||
|
||
// Create a search request to run against all available sources | ||
if (!installedPackageData.Empty()) | ||
{ | ||
// For installed package to available package mapping, | ||
// try the search with NormalizedName with Arch first, if not found, try with all values. | ||
// This can be extended in the future for more granular search requests. | ||
std::vector<SearchRequest> candidateSearches; | ||
if (installedPackageData.PackageNamesContainNormalizationField(Utility::NormalizationField::Architecture)) | ||
{ | ||
candidateSearches.emplace_back(installedPackageData.CreateInclusionsSearchRequest(Utility::NormalizationField::Architecture)); | ||
} | ||
candidateSearches.emplace_back(installedPackageData.CreateInclusionsSearchRequest()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really understand why we need to search for both in any scenario; it seems like it could lead to the same problem, just in the less likely reverse case. If winget-pkgs only had one architecture version, and we had the other installed, we would get a mismatch. The reason to leave the non-architectured index entry is for older clients. If we find architecture, we should only search for the one with architecture. This does mean that we need to roll out the updated index before the updated client of course. This could be done by putting the search behavior behind an experimental feature for now, while leaving the index construction to include both. Once the index update is on the server, we can try it out with the experimental feature and then make it non-experimental if it is working. |
||
|
||
for (const auto& candidateSearch : candidateSearches) | ||
{ | ||
if (tryAddAvailablePackage(candidateSearch)) | ||
{ | ||
// If found available package, break. Otherwise try next search request. | ||
break; | ||
} | ||
} | ||
} | ||
|
||
// Move the installed result into the composite result | ||
|
@@ -1340,10 +1408,15 @@ namespace AppInstaller::Repository | |
|
||
// If no package was found that was already in the results, do a correlation lookup with the installed | ||
// source to create a new composite package entry if we find any packages there. | ||
if (packageData && !packageData->SystemReferenceStrings.empty()) | ||
if (packageData && !packageData->Empty()) | ||
{ | ||
// Create a search request to run against the installed source | ||
SearchRequest systemReferenceSearch = packageData->CreateInclusionsSearchRequest(); | ||
// For available package to installed package mapping, only one try is needed. | ||
// For example, if arp DisplayName contains arch, then the local packages's arp DisplayName should also include arch. | ||
SearchRequest systemReferenceSearch = | ||
packageData->PackageNamesContainNormalizationField(Utility::NormalizationField::Architecture) ? | ||
packageData->CreateInclusionsSearchRequest(Utility::NormalizationField::Architecture) : | ||
packageData->CreateInclusionsSearchRequest(); | ||
|
||
// Correlate against installed (allow exceptions out as we own the installed source) | ||
SearchResult installedCrossRef = m_installedSource.Search(systemReferenceSearch); | ||
|
@@ -1388,10 +1461,15 @@ namespace AppInstaller::Repository | |
// If no package was found that was already in the results, do a correlation lookup with the installed | ||
// source to create a new composite package entry if we find any packages there. | ||
bool foundInstalledMatch = false; | ||
if (packageData && !packageData->SystemReferenceStrings.empty()) | ||
if (packageData && !packageData->Empty()) | ||
{ | ||
// Create a search request to run against the installed source | ||
SearchRequest systemReferenceSearch = packageData->CreateInclusionsSearchRequest(); | ||
// For available package to installed package mapping, only one try is needed. | ||
// For example, if arp DisplayName contains arch, then the local packages's arp DisplayName should also include arch. | ||
SearchRequest systemReferenceSearch = | ||
packageData->PackageNamesContainNormalizationField(Utility::NormalizationField::Architecture) ? | ||
packageData->CreateInclusionsSearchRequest(Utility::NormalizationField::Architecture) : | ||
packageData->CreateInclusionsSearchRequest(); | ||
|
||
// Correlate against installed (allow exceptions out as we own the installed source) | ||
SearchResult installedCrossRef = m_installedSource.Search(systemReferenceSearch); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about for packages that just include their architecture and not the parens? Or where it isn't just the arch to string? Won't this still cause issues? I feel like where DisplayName is included in the manifest, the normalization shouldn't occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our normalization code uses a regex which should take care of different formats of arch representations. x64, (x64), 64bit etc.
The reason we still need to do normalization on DisplayName comes from various considerations:
According to our brief analysis, the arch is the main source of false correlation. And it makes sense to keep arch info specifically. We could expand to include locale, or even version if we decide to in the future. But each expansion will come with the cost of making index multiple times larger, and less correlation success if the installed version is not in the repo. Unless we redesign the index structure in future winget v2 (if winget v2 is a thing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see; thanks for the detailed explanation!