Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse chunks from the same download if duplicated #683

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Roman513
Copy link

@Roman513 Roman513 commented Dec 23, 2024

Drastically decrease memory usage

Usually we do not need even 1Gb of shared memory with this patch.

Closes #17

Drastically decrease memory footprint
@derrod
Copy link
Owner

derrod commented Dec 24, 2024

I've not done a full review of this but I think at the very least it should be an option like reordering (ideally also in the config so you don't have to specify it each time).

@derrod
Copy link
Owner

derrod commented Dec 25, 2024

It should be opt-in, not opt-out, like reordering.

@Roman513
Copy link
Author

Roman513 commented Dec 25, 2024

I think at the very least it should be an option like reordering (ideally also in the config so you don't have to specify it each time).

I've created an option to disable this behavior. This feature functions more like reusing chunks during patching (which is enabled by default) than reordering. It is simply an adaptation of the same algorithm - just scroll a few lines up in the analysis function to find it.

It should be opt-in, not opt-out, like reordering.

Reordering is likely disabled by default because it is very time-consuming with questionable benefits. However, this option is not as resource-intensive. I conducted measurements on an extremely large project (~860 GB), and it took about 1.5 seconds on my machine. For normal projects, it typically takes around 0.0-0.1 seconds. In contrast, reordering calculations take about 4 minutes.

python legendary/cli.py install fb67ca3f5de74da880ab96bdc668118d
[Core] INFO: Trying to re-use existing login session...
[cli] INFO: Preparing download for "ARK DevKit" (fb67ca3f5de74da880ab96bdc668118d)...
...
[DLM] DEBUG: 666373 added files
[DLM] DEBUG: Disk space delta: 863984.23 MiB

# Without enable-reordering

[DLM] DEBUG: Final cache size requirement: 10274.0 MiB.

# With enable-reordering, 'Manifest contains too many files' block was commented out to test this

[DLM] DEBUG: Processing optimizations took 244.0 seconds.
[DLM] DEBUG: Final cache size requirement: 390.0 MiB.

# With read from files enabled, temporary added time calculation to test this

[DLM] DEBUG: Analyzing manifest for re-usable chunks in saved files...
[DLM] DEBUG: Processing read from saved files took 1.5 seconds.
[DLM] DEBUG: Final cache size requirement: 81.0 MiB.

@derrod
Copy link
Owner

derrod commented Dec 25, 2024

The entire system was designed around minismising disk I/O. I'm not going to merge this if it's on by default.

README.md Show resolved Hide resolved
@Roman513 Roman513 force-pushed the reuse-chunks branch 2 times, most recently from 7866399 to 652b78b Compare December 25, 2024 13:04
@Roman513
Copy link
Author

Roman513 commented Dec 25, 2024

The entire system was designed around minismising disk I/O. I'm not going to merge this if it's on by default.

From my perspective, this results in minimal I/O and primarily affects highly duplicated projects. The bigger issue is the enormous memory consumption. With this option, chunks remain in shared memory if we reuse different parts of the same chunk.

Additionally, I noticed that the actual memory consumption reported by the CLI during downloads is higher than the calculated values. This means that even if there seems to be enough memory based on calculations, the system could still hang. Therefore, keeping parts in memory for such edge cases seems like the wrong approach to me.

As a compromise, I've added an auto-retry mechanism for memory errors. Does this work for you?

@derrod
Copy link
Owner

derrod commented Dec 25, 2024

From my perspective, this results in minimal I/O and primarily affects highly duplicated projects. The bigger issue is the enormous memory consumption. Chunks remain in shared memory if we reuse different parts of the same chunk multiple times.

It's only an issue for like 2 games where the devs did something stupid. It's not worth compromising for. According to the Steam hardware survey 16 and 32 GB are the most common configurations, so memory isn't really an issue.

Additionally, I noticed that the actual memory consumption reported by the CLI during downloads is higher than the calculated values. This means that even if there seems to be enough memory based on calculations, the system could still hang. Therefore, keeping parts in memory for such edge cases seems like the wrong approach to me.

The size calculated is only the minimum, not a limit. If the configured cache size is larger it'll still be used to the full extent if data isn't flushed to disk quickly enough.

As a compromise, I've added an auto-retry mechanism for memory errors. Does this work for you?

No, but I do want this to be a config option. But I can do that whenever I get around to cleaning up the commits and merging the PR.

@Roman513
Copy link
Author

Roman513 commented Dec 25, 2024

It's only an issue for like 2 games where the devs did something stupid. It's not worth compromising for. According to the Steam hardware survey 16 and 32 GB are the most common configurations, so memory isn't really an issue.

And I/O is, so badly? I guess people more eager to use such enormous amount of memory for something else. If we have really 2 games like this, fallback will almost never happen because it is enough shared memory. But it is not, because it became so annoying and even Heroic devs added fallback with re-running legendary cli with auto-increasing shared memory each time if we have memory error with default shared memory amount, which leads to manifests re-downloading and much slower.

https://github.com/Heroic-Games-Launcher/HeroicGamesLauncher/blob/395ffa73dd8e96f976e48092f1ee63ce30cdabd8/src/backend/storeManagers/legendary/games.ts#L637C1-L645C4

No, but I do want this to be a config option.

Fallback will improve user experience because users mostly use legendary with frontends and even are not able to see such errors. But even CLI users could suffer. Are you sure we want users to figure out which workaround from 3 available (increasing max memory, re-using files, or reordering) they need to apply? If this one surely works fast, why it cannot do a fallback in such cases? Also, not having a fallback will put some work to frontend developers to implement workarounds on their side.

Did I convince you to a fallback, or you still have arguments against it?

If not, please provide requirements for error messages to help users with choosing workaround, for config options (per game/global/etc/naming), and for CLI options naming if you aren't good with mine.

UPD: I've added global config option.

cleaning up the commits

You mean squashing commits into the one? I can do this then we'll come to final decision, or GitHub can do this for your on merging. Just let me know. I want this to be finished and do not stuck in unmerged forever.

@Roman513
Copy link
Author

Roman513 commented Jan 2, 2025

@derrod Just a reminder. Cold you please review the last comment and let me know what is left to get this merged? BTW we got a report about another game with the same issue. Thanks!

@derrod
Copy link
Owner

derrod commented Jan 2, 2025

@derrod Just a reminder. Cold you please review the last comment and let me know what is left to get this merged? BTW we got a report about another game with the same issue. Thanks!

Just need to find some time to re-review it, last week's been busy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement on-disk cache during download
2 participants