
Out of memory, process killed #536

Closed
ianare opened this issue Nov 22, 2018 · 23 comments
ianare commented Nov 22, 2018

Machine info:

32 GB of RAM
8-core CPU: Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz

Running zola v0.5.0

The site I'm generating has a content directory with 105336 files, 105732 sub-folders. Almost all of the files are section markdown.

I understand this is probably not a very common use case, but hopefully it will be useful for optimizing Zola's memory management.

dmesg -T | grep zola

Out of memory: Kill process 29015 (zola) score 908 or sacrifice child
Killed process 29015 (zola) total-vm:30630456kB, anon-rss:30549904kB, file-rss:0kB, shmem-rss:0kB
oom_reaper: reaped process 29015 (zola), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Grafana chart over 3 unsuccessful runs:

[screenshot: screenshot_20181122_204147]

Keats commented Nov 23, 2018

100k files :o
Where does it error? Does it manage to load everything (ie do you see something like -> Creating 43 pages (0 orphan), 1 sections, and processing 0 images)?

Keats added the bug label Nov 23, 2018
caemor commented Nov 23, 2018

It seems like the hashmaps for get_taxonomy, get_page and get_section, which are made from register_tera_global_fns, are just getting too big for blogs over 10k files

For huge blog:
[heaptrack screenshot]

For giant blog (bench example with 100k files generated) until it ran out of memory:
[heaptrack screenshot]

About the tool: heaptrack was the first thing I found by searching, so I used it for this (it is also in the Arch repos). One needs to add the following to the main file (for more info see https://speice.io/2018/10/case-study-optimization.html; note that this might change the results a little bit):

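// Use the system allocator (instead of the jemalloc that Rust bundled at the time)
// so that heaptrack can intercept and track the allocations: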
use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

ianare commented Nov 23, 2018

100k files :o
Where does it error? Does it manage to load everything (ie do you see something like -> Creating 43 pages (0 orphan), 1 sections, and processing 0 images)?

No, it never got to that point; the only console output was "building site", followed by "Killed".

For what it's worth, I was able to generate the site on a machine with 128 GB of RAM.

Usage was around 39 GB on each run.

[screenshot: screenshot_20181123_173455]

Keats commented Nov 23, 2018

It seems like the hashmaps for get_taxonomy, get_page and get_section, which are made from register_tera_global_fns, are just getting too big for blogs over 10k files

Yeah, that was my guess as well, but it is tricky to fix: Keats/tera#340
I don't have any good ideas on that currently, other than writing a custom serde format that doesn't clone the way serde_json does.
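For illustration, here is a minimal sketch of the pattern being discussed (hypothetical code, not Zola's actual implementation; it only assumes serde and serde_json): each global template function is given its own fully serialized copy of the content up front, so a 100k-page site ends up held in memory several times over before rendering even starts.

use std::collections::HashMap;
use serde::Serialize;
use serde_json::{to_value, Value};

#[derive(Serialize)]
struct Page {
    title: String,
    content: String,
}

// Serialize every page into an owned JSON value, keyed by title.
fn serialize_all(pages: &[Page]) -> HashMap<String, Value> {
    pages
        .iter()
        .map(|p| (p.title.clone(), to_value(p).expect("page serializes")))
        .collect()
}

fn main() {
    let pages: Vec<Page> = (0..100_000)
        .map(|i| Page {
            title: format!("post-{}", i),
            content: "lorem ipsum ".repeat(100),
        })
        .collect();

    // One pre-serialized copy per registered function (get_page, get_section,
    // get_taxonomy, ...) multiplies the memory cost of the whole site.
    let _for_get_page = serialize_all(&pages);
    let _for_get_section = serialize_all(&pages);
    let _for_get_taxonomy = serialize_all(&pages);
}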

@ianare
What's the average size of a section?

Keats commented Nov 27, 2018

get_page and get_section are redundant now; we can have only get_content, so it should be better.
Most of the improvements are likely to be in Tera rather than Zola though.
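As a rough illustration (hypothetical types, not Zola's actual code), consolidating the lookups could mean keeping a single index keyed by relative path and serializing each piece of content only once, instead of one pre-serialized map per function:

use std::collections::HashMap;
use serde_json::Value;

// One shared index for pages and sections, used by a single get_content-style
// lookup, rather than separate serialized maps for get_page and get_section.
struct ContentIndex {
    by_path: HashMap<String, Value>,
}

impl ContentIndex {
    fn get(&self, path: &str) -> Option<&Value> {
        self.by_path.get(path)
    }
}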

ianare commented Dec 6, 2018

Sorry for the delay.

I've run some more generations and it seems that when the sections are small (~ 5 kb) and not complex the memory problem crops up.

When the sections are more complex and larger (10-15 kb), and processing times are slower, the memory usage is better (even if it still has large peaks).

It might be a suitable workaround to be able to specify the number of CPUs used during generation.

This would also be useful on machines that are both generating the site and serving it with a dedicated server like nginx at the same time, so as not to introduce lag on the site during generation. Although this would be better in a dedicated ticket... should I open one?
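As a rough sketch of what such an option could look like (assuming the parallel work goes through rayon; the --threads flag and the function below are hypothetical, not an existing Zola option):

use rayon::ThreadPoolBuilder;

// Cap the global rayon thread pool before any parallel work starts.
// `num_threads` would come from a hypothetical --threads flag or config key.
fn init_thread_pool(num_threads: Option<usize>) {
    if let Some(n) = num_threads {
        ThreadPoolBuilder::new()
            .num_threads(n)
            .build_global()
            .expect("failed to configure the rayon thread pool");
    }
    // With None, rayon keeps its default of one thread per logical CPU.
}

rayon also honours the RAYON_NUM_THREADS environment variable, which can act as a stopgap without any code changes.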

Also really wanted to thank everyone for taking the time to look into this... and apologies for not having (yet) the ability to help out in the code.

ianare commented Dec 6, 2018

I've run some more generations and it seems that when the sections are small (~ 5 kb) and not complex the memory problem crops up.

When the sections are more complex and larger (10-15 kb), and processing times are slower, the memory usage is better (even if it still has large peaks).

It might be a suitable workaround to be able to specify the number of CPUs used during generation.

Apologies, there was an error in the configuration: the reason it was OK on the 32 GB server was that there were "only" 60K files.

Keats commented Jan 27, 2019

@caemor @ianare

I've updated the next branch and it should use way less memory now; can you try again?
Running heaptrack again surfaced another source of allocations which will be fairly easy to fix, AFAIK.

Keats commented Feb 4, 2019

@caemor @ianare ping

caemor commented Feb 5, 2019

I'm gonna try it once more later today. Sorry for the late answer :-D

ianare commented Feb 7, 2019

Sorry for the delay, I'm getting an error message related to macros:

--> 4:1
|
4 | {% import "macros/image_list.html" as image_list %}
| ^---
|
= unexpected tag; expected end of input or some content

Keats commented Feb 7, 2019 via email

caemor commented Feb 7, 2019

The next branch doesn't compile for me; I get the following errors:

cargo bench bench_loading_huge_blog
.......
Compiling site v0.1.0 (/home/caemor/git/tests/zola/components/site)                                                        
error[E0599]: no method named `pages_values` found for type `std::sync::Arc<std::sync::RwLock<library::Library>>` in the current scope
  --> components/site/benches/site.rs:46:49                                                                                   
   |                                                                                                                          
46 |     b.iter(|| site.render_rss_feed(site.library.pages_values(), None).unwrap());                                         
   |                                                 ^^^^^^^^^^^^                                                             
                                                                                                                              
error[E0599]: no method named `sections_values` found for type `std::sync::Arc<std::sync::RwLock<library::Library>>` in the current scope
  --> components/site/benches/site.rs:64:32                                                                                   
   |                                                                                                                          
64 |     let section = site.library.sections_values()[0];                                                                     
   |                                ^^^^^^^^^^^^^^^                                                                           
                                                                                                                              
error[E0308]: mismatched types                                                                                                
  --> components/site/benches/site.rs:65:55                                                                                   
   |                                                                                                                          
65 |     let paginator = Paginator::from_section(&section, &site.library);                                                    
   |                                                       ^^^^^^^^^^^^^ expected struct `library::Library`, found struct `std::sync::Arc`
   |                                                                                                                          
   = note: expected type `&library::Library`                                                                                  
              found type `&std::sync::Arc<std::sync::RwLock<library::Library>>`                                               
                                                                                                                              
error: aborting due to 3 previous errors                                                                                      
                                                                                                                              
Some errors occurred: E0308, E0599.   

Keats commented Feb 7, 2019 via email

Keats commented Feb 8, 2019

Both issues should be fixed on the next branch now

caemor commented Feb 8, 2019

Looks much better than last time: approximately only 25% of last time's maximum heap usage for the huge blog. And the giant blog, which didn't build last time, also worked, with a max usage of 5.8 GB.
Great work Keats! 👍

Huge Blog (10000 pages, 0 orphans, 0 sections, 0 images): 579.5 MB at max

[heaptrack screenshot: zola_huge_blog_18858]

Giant Blog (100000 pages, 0 orphans, 0 sections, 0 images): 5.8 GB at max

[heaptrack screenshot: zola_giant_20174]

For comparison, current master with the same huge blog from above needs 2.3 GB at max

[heaptrack screenshot]

Building time also got reduced to about half its previous value (zola build):

| Type | Without heaptrack | With heaptrack |
| --- | --- | --- |
| huge on next | ~6s | ~70s |
| huge on master | ~15s | ~148s |
| giant on next | ~71s | ~711s |

Keats commented Feb 8, 2019 via email

caemor commented Feb 8, 2019

After setting the rss_limit to 100, the time got down to ~65s/~5.8s (with/without heaptrack) and ~455 MB max heap usage for the huge blog on next. Also be aware that the screenshot only shows the state at maximum heap usage.
Now the serialization of the taxonomies takes the most memory.

[heaptrack screenshot]

@Keats maybe you want to set that limit in the gen.py autogeneration in the future for the benches?

Note for myself: Steps to reproduce these heaptracks later:

pacman -S heaptrack
cd zola/components/site/benches
python gen.py
cd zola/
cargo build --release
cd zola/components/site/benches/huge_blog/
heaptrack ../../../../target/release/zola build

Now use the command line that heaptrack returns to open and analyze the recorded heaptrack (something similar to heaptrack --analyze "path_to_folder/.../heaptrack.zola.25242.zst")

Keats commented Feb 9, 2019

Now the serializing of the taxonomies takes the most memory.

I see that the toc serialization is taking lots of time because it is on the page itself rather than being added to the context of the specific page being rendered. Probably an easy win, as I don't think many people want to show the table of contents while displaying a list of pages, but I could be wrong...

Still using a bit too much memory for my taste...

Keats commented Feb 9, 2019

I moved the toc out of page and that's a bit better.
The reason the taxonomy rendering takes so much time is that taxonomies are not paginated, so it is basically serializing all the pages, which is going to take time/memory in those benches.
The blog benches are not super realistic; the huge-kb one has 10k pages as well but renders in 3.6s, for example.
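A minimal sketch of the kind of change described above (hypothetical types, not Zola's actual code): the heading tree is skipped when pages are serialized into listings, and is passed only in the context used to render that one page.

use serde::Serialize;

// The page as serialized into listings (sections, taxonomies, ...): the toc
// is skipped so thousands of listed pages don't each carry it.
#[derive(Serialize)]
struct SerializedPage {
    title: String,
    content: String,
    #[serde(skip_serializing)]
    toc: Vec<String>, // simplified stand-in for the heading tree
}

// The context used only when rendering this one page carries the toc explicitly.
#[derive(Serialize)]
struct PageContext<'a> {
    page: &'a SerializedPage,
    toc: &'a [String],
}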

Keats commented Mar 25, 2019

It should be fine with 0.6.0 (currently being built).
Re-open the issue if you still encounter problems!

Keats closed this as completed Mar 25, 2019
ianare commented Apr 10, 2019

Can confirm that memory usage is much much lower now. Great job!

sr2ds commented Mar 11, 2024

Hello guys,
@ianare, how big is your blog now?

I'm dealing with something similar to your setup: my blog now has 104k pages in markdown and one image for each post. I'm not doing any resizing or processing of the images; I'm only copying and using the original image.

I have the same problem that you had some years ago. The memory usage is high, though not as high as back then; @Keats really did a good job, thanks.

My problem is that I'm using Netlify's free tier, which has a memory limit for builds of around 6 GB. My build takes around 7.5 GB on my computer, so on Netlify the process can't finish. For now I'm doing the build locally and pushing the static files directly, but I would like to improve this behavior because my blog will keep growing and it will soon be impossible to deal with builds this way.

Some ideas about possible solutions:

Is there some way to allow an asynchronous, batched process? Something like building X posts at a time, so the build process can free memory while it works?

Another idea is to be able to do incremental builds. Maybe something like:

zola build --increase

In this mode we would need to keep the public directory versioned, but only the new files would be built and public would not be erased every time.

I can write something in Rust. I haven't started looking for a solution yet; I wanted to ask for some guidance before starting, and that's how I found this good thread.
