
Test plan item for large files #30243

Closed
2 tasks done
alexdima opened this issue Jul 7, 2017 · 4 comments

Comments

alexdima (Member) commented Jul 7, 2017

Test plan item for #30180:

fyi @egamma

Note to testers: I am not really interested in various ways that you can cause an OOM crash :)

Files will be considered large if they are above 30MB or above 300K lines. Such large files will never get tokenization, wrapping, indent guides, or folding. They will also not travel to the extension host or to the web worker, so there is no diffing, quick diffing, link detection, or word-based suggestions. <-- parts of this are new

Files that are larger than 5MB (but below the large-file thresholds) do not travel to the extension host, but otherwise use the regular code path (i.e. they have tokenization, wrapping, etc.). <-- this is not new.
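The two tiers of thresholds described above can be sketched as follows. This is a minimal illustration of the behavior as stated in this issue; the constant and function names are made up here, not VS Code's actual implementation.

```typescript
// Illustrative thresholds taken from the issue text.
const LARGE_FILE_SIZE = 30 * 1024 * 1024;   // 30 MB
const LARGE_FILE_LINE_COUNT = 300 * 1000;   // 300K lines
const EXTHOST_SYNC_LIMIT = 5 * 1024 * 1024; // 5 MB

interface FileStats {
  byteSize: number;
  lineCount: number;
}

// "Large" files lose tokenization, wrapping, indent guides and folding,
// and never reach the extension host or the web worker.
function isLargeFile(f: FileStats): boolean {
  return f.byteSize > LARGE_FILE_SIZE || f.lineCount > LARGE_FILE_LINE_COUNT;
}

// Mid-sized files (over 5 MB) keep the regular editing features but are
// not synchronized to the extension host.
function syncsToExtensionHost(f: FileStats): boolean {
  return f.byteSize <= EXTHOST_SYNC_LIMIT;
}
```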

  • Please do a smoke test of what working with a large file feels like.
  • Does editing work OK (pasting, Enter, etc.)?
  • Does find-in-file work, etc.?

Notes:

  • I have successfully opened a 1.3 GB file with 1.5 million lines and a 351 MB file with 6.3 million lines, and I was able to work with them (except for saving/hot exit).
  • Our memory usage is mostly driven by file size + 2 objects allocated per line (one being the actual string of the line) + an array of pointers to those line objects.
  • Try to find limitations/steps that should work without additional memory consumption but end up leading to a crash (i.e. are there features that end up doing massive memory allocation when they should not?).
  • Is, at a minimum, the "read-only" experience OK? (the scenario is mostly log files)
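The memory model described in the notes (file size + two heap objects per line + a pointer array) can be sketched as a back-of-the-envelope estimator. The per-object overhead and pointer size below are assumed ballpark figures for illustration, not measured V8 numbers.

```typescript
// Assumed constants for illustration only.
const OBJECT_OVERHEAD = 48; // assumed bytes of header/overhead per heap object
const POINTER_SIZE = 8;     // 64-bit pointer

// Rough estimate of the line-based buffer's memory footprint:
// character data + (ModelLine object + line-string object) per line
// + one slot in the array of pointers to the line objects.
function estimateBufferBytes(fileBytes: number, lineCount: number): number {
  return fileBytes
    + lineCount * 2 * OBJECT_OVERHEAD
    + lineCount * POINTER_SIZE;
}
```

Under this model, a file with very many short lines pays proportionally much more per-line overhead than a file of the same size with few long lines, which matches the release-note remark below about the line-based representation's memory disadvantages.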

Release Notes contribution (someone needs to paste this into 1.15.md, which doesn't exist yet):

Large files are always a great way to stress-test an editor. We currently use a line-based representation of the text buffer, which has some serious advantages, but also disadvantages, especially in terms of memory consumption (e.g. a file with very many short lines).

Any file that is over 30MB or over 300K lines will be considered a large file and handled specially in certain code paths. By choosing to disable certain features for such files, e.g. tokenization, line guides, wrapping or folding, we were able to optimize memory usage, in some cases by as much as 50%.

Additionally, large files will never travel to our web worker, which computes diff information, link detection, and word-based completions. This helps reduce the memory pressure on the OS.

Some of the optimizations will impact all files, although the effects will be hard to notice with small files. We have decided to lift the hard-coded 50MB file size limit on 64-bit installations and enforce a 300MB file size limit on 32-bit installations.
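The lifted limits can be summarized in a small sketch. The function name is made up for illustration; it simply encodes the per-architecture limits stated above (no practical hard limit on 64-bit, 300MB on 32-bit).

```typescript
// Illustrative sketch of the new per-architecture file size limits.
const LIMIT_32_BIT = 300 * 1024 * 1024; // 300 MB on 32-bit installations

function maxOpenableBytes(is64Bit: boolean): number {
  // 64-bit installations no longer have a hard-coded limit;
  // represent "no limit" as the largest safe integer.
  return is64Bit ? Number.MAX_SAFE_INTEGER : LIMIT_32_BIT;
}
```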


Emergency toggles:

  • MAX_FILE_SIZE
  • USE_MINIMAL_MODEL_LINE - uses a different implementation for ModelLine that does not have properties for all of the tokenization state and has a lazy markers property.
  • USE_IDENTITY_LINES_COLLECTION - uses a different implementation for the ViewModel line mapping that prohibits wrapping and folding from working and assumes a 1:1 mapping.
  • AVOID_SLICED_STRINGS - takes the strings that make up lines and runs them through a Node.js Buffer to avoid having a (sliced string) per line.
  • Additionally, all the commits that touched something w.r.t. memory improvements contain "Improve memory usage for large files #30180" in their message, so if something horrible happens, a git bisect over those commits should uncover it.
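The AVOID_SLICED_STRINGS toggle relies on a V8 detail: a substring of a big string may be represented as a "sliced string" that keeps the entire parent string alive. A minimal sketch of the trick, assuming a Node.js environment (the function name is illustrative, not the actual VS Code code):

```typescript
// Round-tripping a line through a Buffer materializes a flat, independent
// copy of the string, so a sliced string's large parent (e.g. the whole
// file's contents) can be garbage-collected.
function flattenLine(line: string): string {
  return Buffer.from(line, 'utf8').toString('utf8');
}

// Example: splitting a huge string can yield sliced strings that retain it.
const big = 'x'.repeat(1_000_000) + '\nhello';
const lastLine = big.split('\n')[1]; // may internally reference `big`
const flat = flattenLine(lastLine);  // flat copy, no reference to `big`
```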
@alexdima alexdima added this to the July 2017 milestone Jul 7, 2017
@Bill-Stewart

"50MB on 64-bit installations" (less than "300MB file size limit of 32-bit installations")?

Chillee commented Jul 8, 2017

@Bill-Stewart There was previously a 50MB limit across all platforms. That has changed to 300MB for 32-bit Windows and no limit for all other platforms.

michelkaporin (Contributor)

Great job on that @alexandrudima! Even native Windows Notepad freezes and takes much longer to open large files than VS Code.

Overall the experience is good, although the file size limit on win32 seems optimistic in this case. We crash quite gracefully, so for me it is acceptable. However, it would be good to have more parameters that define this limit.

weinand (Contributor) commented Jul 27, 2017

I've moved the text for the release notes to v1.15.md

@sandy081 sandy081 removed their assignment Jul 27, 2017
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 17, 2017