Reduce the size of the index #3666

Merged: JohnMcPMS merged 4 commits into microsoft:master from the index-reduce branch on Sep 25, 2023.

JohnMcPMS
Member

@JohnMcPMS JohnMcPMS commented Sep 25, 2023

Change

This change seeks to reduce the size of the index in two ways:

  1. Better schema design for the 1:N mapping tables
  2. Dropping some of the mapping data that is not particularly useful to keep per manifest (i.e., per version)

Better schema design

This is achieved by giving each map table no rowid and a primary key with the value column first. The table is then stored already sorted by value, so reverse lookups (from a value to the manifests that have it) are fast. Removing the rowid also drops a fair amount of data from the table itself, since the rowid accounted for roughly a third of each row.
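A minimal in-memory sketch of why value-first ordering helps, assuming a map table shaped like the SQLite DDL in the comment below. The table and column names (`tags_map`, `tag`, `manifest`) and the `ValueFirstMapTable` class are hypothetical, not the actual winget schema. SQLite stores a `WITHOUT ROWID` table as a B-tree keyed by its primary key, which a sorted `std::set` of (value, manifest) pairs approximates here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <utility>
#include <vector>

// Hypothetical SQLite schema this class models (names are illustrative):
//   CREATE TABLE tags_map (
//     tag      INTEGER NOT NULL,  -- the mapped value, listed first
//     manifest INTEGER NOT NULL,  -- the manifest that has this value
//     PRIMARY KEY (tag, manifest)
//   ) WITHOUT ROWID;
// Each row stores only the two key columns; there is no separate rowid.
using MapRow = std::pair<std::int64_t, std::int64_t>;  // (value, manifest)

class ValueFirstMapTable {
public:
    // Inserting an existing (value, manifest) pair is a no-op, much like
    // INSERT OR IGNORE against the composite primary key.
    void Insert(std::int64_t value, std::int64_t manifest) {
        rows_.emplace(value, manifest);
    }

    // Reverse lookup (value -> manifests). Rows are ordered by value first,
    // so all matches form one contiguous range reached by a single seek.
    std::vector<std::int64_t> ManifestsWithValue(std::int64_t value) const {
        std::vector<std::int64_t> manifests;
        auto it = rows_.lower_bound(MapRow{value, std::numeric_limits<std::int64_t>::min()});
        for (; it != rows_.end() && it->first == value; ++it) {
            manifests.push_back(it->second);
        }
        return manifests;
    }

    // Forward lookup (manifest -> values). Value-first order gives no help
    // here, so this model scans every row.
    std::vector<std::int64_t> ValuesForManifest(std::int64_t manifest) const {
        std::vector<std::int64_t> values;
        for (const MapRow& row : rows_) {
            if (row.second == manifest) {
                values.push_back(row.first);
            }
        }
        return values;
    }

private:
    std::set<MapRow> rows_;  // kept sorted by (value, manifest)
};
```

In this sketch the composite primary key doubles as the index for value lookups, so no separate index on the value column is needed to make the reverse lookup a range scan.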

Dropping map data

We don't actually make use of the fact that different versions of a package can have different tags (or other mapped values). We can therefore have a single manifest entry per package identifier hold all of the values while keeping the same functionality. There is a slight loss of fidelity when reading the values through the API, but this is deemed acceptable given the large data savings. I explicitly left the product codes alone, as keeping them per version does have value (even if we are not using it currently).
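The folding can be sketched as follows, assuming a hypothetical `ManifestRecord` per version. This models only the effect on the data (tags collapse to one set per package identifier, product codes stay per version); it does not show which manifest row actually carries the folded values in the real index, and `Fold`, `FoldedIndex`, and `PackagesWithTag` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-version record; field names are illustrative only.
struct ManifestRecord {
    std::string packageId;
    std::string version;
    std::set<std::string> tags;          // folded: kept once per package
    std::set<std::string> productCodes;  // not folded: kept per version
};

struct FoldedIndex {
    // package identifier -> union of the tags of all of its versions
    std::map<std::string, std::set<std::string>> tagsByPackage;
    // (package identifier, version) -> product codes of that exact version
    std::map<std::pair<std::string, std::string>, std::set<std::string>> productCodesByVersion;
};

FoldedIndex Fold(const std::vector<ManifestRecord>& manifests) {
    FoldedIndex folded;
    for (const ManifestRecord& m : manifests) {
        // A tag shared by N versions is now stored once instead of N times.
        folded.tagsByPackage[m.packageId].insert(m.tags.begin(), m.tags.end());
        // Product codes keep their per-version fidelity.
        folded.productCodesByVersion[std::make_pair(m.packageId, m.version)] = m.productCodes;
    }
    return folded;
}

// A tag search reports matching packages, so folding does not change its result.
std::set<std::string> PackagesWithTag(const FoldedIndex& folded, const std::string& tag) {
    std::set<std::string> packages;
    for (const auto& entry : folded.tagsByPackage) {
        if (entry.second.count(tag) != 0) {
            packages.insert(entry.first);
        }
    }
    return packages;
}
```

The fidelity loss shows up only if a caller asks which specific version had a given tag; in this sketch that question can no longer be answered, while the package-level search result is unchanged.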

Size comparison

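| State | Size (bytes) | Percentage of original | Delta vs. previous row (relative change) |
|---|---|---|---|
| Original | 18141184 | 100% | |
| Better schema | 12673024 | 70% | -30% |
| Map folding | 9256960 | 51% | -19% (-27%) |
| File name shortened | 8306688 | 45.8% | -5.2% (-10.3%) |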
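As a worked check on the size comparison, the percentage column is each size divided by the original size, and the delta column reads as the drop in percentage points from the previous row, with the relative change against the previous row's size in parentheses. A small sketch that recomputes both from the byte sizes (`Compare`, `SizeRow`, and `RoundTo` are illustrative helpers):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SizeRow {
    double percentOfOriginal;  // size / original size * 100
    double deltaPoints;        // change in percentOfOriginal from the previous row
    double deltaRelative;      // change from the previous row's size, in percent
};

// sizes[0] is the original index; each later entry is the next state in the table.
std::vector<SizeRow> Compare(const std::vector<std::uint64_t>& sizes) {
    std::vector<SizeRow> rows;
    if (sizes.empty()) {
        return rows;
    }
    const double original = static_cast<double>(sizes.front());
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        const double size = static_cast<double>(sizes[i]);
        SizeRow row{size / original * 100.0, 0.0, 0.0};
        if (i > 0) {
            const double previous = static_cast<double>(sizes[i - 1]);
            row.deltaPoints = row.percentOfOriginal - rows.back().percentOfOriginal;
            row.deltaRelative = (size - previous) / previous * 100.0;
        }
        rows.push_back(row);
    }
    return rows;
}

// Rounds to a fixed number of decimal places, matching how the table is written.
double RoundTo(double value, int decimals) {
    double scale = 1.0;
    for (int i = 0; i < decimals; ++i) {
        scale *= 10.0;
    }
    return std::round(value * scale) / scale;
}
```

For the "Map folding" row, for example, 9256960 / 18141184 is about 51.0% of the original, 18.8 points below the previous row's 69.9% (shown as -19%), and 9256960 / 12673024 is about a 27.0% reduction relative to the previous state.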

As a bonus, we also plan to shorten the manifest file names, but this is a service-only change.

Validation

Tests are added to verify that the various folded data behave as expected.
The regression tests should help ensure that the schema changes have no functional impact.

@JohnMcPMS JohnMcPMS requested a review from a team as a code owner September 25, 2023 00:19
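
```cpp
// The back-compat tests below assume schema major version 1; fail loudly if that changes.
// (std::exception(const char*) is an MSVC extension, not standard C++.)
Schema::Version latestVersion = SQLiteIndex::GetLatestVersion();
if (latestVersion.MajorVersion != 1)
{
    throw std::exception("You added major version 2, figure out how to deal with these tests that do back compat coverage!");
}
```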

This comment was marked as resolved.

Member Author

And most likely I'm the one who will be greeted by that message 😄

@JohnMcPMS JohnMcPMS merged commit 657d33c into microsoft:master Sep 25, 2023
@JohnMcPMS JohnMcPMS deleted the index-reduce branch September 25, 2023 18:46