Out of memory, process killed #536
Comments
100k files :o
It seems like the hashmaps for get_taxonomy, get_page and get_section, which are built from register_tera_global_fns, are just getting too big for blogs over 10k files. For a giant blog (the bench example with 100k files generated), I ran the build until it ran out of memory.

About the tool: heaptrack was the first thing I found by searching, so I used it for this (plus it is in the Arch repos). One needs to add a small change to the main file so heaptrack can track the allocations (for more info see https://speice.io/2018/10/case-study-optimization.html; this might also change the result a little bit).
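The exact snippet isn't shown above; judging from the linked article, it is presumably the switch to the system allocator so that heaptrack can intercept allocations (Rust binaries defaulted to jemalloc at the time). A minimal sketch, assuming that is what was meant:

```rust
// Profiling-only change to zola's main.rs: route allocations through the
// system allocator so heaptrack can hook them instead of the bundled jemalloc.
use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;
```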
No, it never got to that point; the only console output was "building site", followed by "Killed". For what it's worth, I was able to generate the site on a machine with 128 GB of RAM. Usage was around 39 GB on each run.
Yeah, that was my guess as well, but it is tricky to fix: Keats/tera#340 @ianare
Sorry for the delay. I've run some more generations and it seems that when the sections are small (~5 KB) and not complex, the memory problem crops up. When the sections are more complex and larger (10-15 KB), and processing times are slower, the memory usage is better (even if it still has large peaks).

It might be a suitable workaround to be able to specify the number of CPUs used during generation. This would also be useful on machines that are generating and serving with a dedicated server like nginx at the same time, so as not to introduce lag on the site during generation. Although this would be better in a dedicated ticket... should I open one?

Also, I really wanted to thank everyone for taking the time to look into this... and apologies for not (yet) having the ability to help out with the code.
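As an aside for anyone needing this before a dedicated flag exists: Zola parallelises rendering with rayon, and rayon's global thread pool honours the standard RAYON_NUM_THREADS environment variable, so the CPU count can probably already be capped that way (this is an assumption about Zola's internals, not something confirmed in this thread):

```sh
# Assumed workaround: cap the rayon worker threads used during the build.
RAYON_NUM_THREADS=2 zola build
```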
Apologies, there was an error in the configuration: the reason it was OK on the 32 GB server was that there were "only" 60K files.
I'm gonna try it once more later today. Sorry for the late answer :-D |
Sorry for the delay, I'm getting an error message related to macros:
I believe I fixed that in Tera but didn't push a new version yet. Is there an empty line above? Try to remove it if there is one.
On Thu, 7 Feb 2019, 20:19, ianaré sévi wrote:

Sorry for the delay, I'm getting an error message related to macros:

```
 --> 4:1
  |
4 | {% import "macros/image_list.html" as image_list %}
  | ^---
  |
  = unexpected tag; expected end of input or some content
```
The next branch doesn't compile for me with the following errors:
Looks like I forgot to update those benches, will do later
On Thu, 7 Feb 2019, 21:23, Chris wrote:

The next branch doesn't compile for me with the following errors:

```
cargo bench bench_loading_huge_blog
.......
   Compiling site v0.1.0 (/home/caemor/git/tests/zola/components/site)
error[E0599]: no method named `pages_values` found for type `std::sync::Arc<std::sync::RwLock<library::Library>>` in the current scope
  --> components/site/benches/site.rs:46:49
   |
46 |     b.iter(|| site.render_rss_feed(site.library.pages_values(), None).unwrap());
   |                                                 ^^^^^^^^^^^^
error[E0599]: no method named `sections_values` found for type `std::sync::Arc<std::sync::RwLock<library::Library>>` in the current scope
  --> components/site/benches/site.rs:64:32
   |
64 |     let section = site.library.sections_values()[0];
   |                                ^^^^^^^^^^^^^^^
error[E0308]: mismatched types
  --> components/site/benches/site.rs:65:55
   |
65 |     let paginator = Paginator::from_section(&section, &site.library);
   |                                                        ^^^^^^^^^^^^^ expected struct `library::Library`, found struct `std::sync::Arc`
   |
   = note: expected type `&library::Library`
             found type `&std::sync::Arc<std::sync::RwLock<library::Library>>`
error: aborting due to 3 previous errors
Some errors occurred: E0308, E0599.
```
Both issues should be fixed on the next branch.
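For context, the bench errors above stem from site.library now being wrapped in Arc<RwLock<Library>>, so the benches need to go through a read guard before using the library. A minimal sketch of what the fix presumably looks like, with names taken from the error output rather than from the actual commit:

```rust
// Sketch only; the real fix on the branch may scope the locks differently.

// components/site/benches/site.rs:46: take the read lock inside the closure.
b.iter(|| {
    site.render_rss_feed(site.library.read().unwrap().pages_values(), None)
        .unwrap()
});

// components/site/benches/site.rs:64-65: hold a read guard, then let
// &RwLockReadGuard<Library> deref-coerce to &Library for Paginator::from_section.
let library = site.library.read().unwrap();
let section = library.sections_values()[0];
let paginator = Paginator::from_section(&section, &library);
```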
Looks much better than last time: approx. only 25% of last time's maximum heap usage for the huge blog. And the giant blog, which didn't build last time, also worked, with a max usage of 5.8 GB.

Huge blog (10000 pages, 0 orphans, 0 sections, 0 images): 579.5 MB at max
Giant blog (100000 pages, 0 orphans, 0 sections, 0 images): 5.8 GB at max

In comparison, current master with the same huge blog from above needs 2.3 GB at max. Building time (zola build) also got reduced to about half its previous value.
Looks like the RSS feed doesn't have a limit, so it serialises and renders all the pages. Can you try to put a realistic limit like 100 for rss_limit in the config? (A sketch of the config follows below.)
On Fri, 8 Feb 2019, 20:41, Chris wrote:

Looks much better than last time. Approx. only 25% of last time's usage for the huge blog. Great work Keats! 👍

[image: zola_huge_blog_18858] https://user-images.githubusercontent.com/11088935/52501678-32b6fd80-2be1-11e9-9ba7-205946ccfed7.png
[image: zola_giant_20174] https://user-images.githubusercontent.com/11088935/52501686-36e31b00-2be1-11e9-924d-a98df4e3c6f0.png
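For anyone reproducing this, the option being discussed lives in the site's config.toml; a minimal sketch with the value suggested above:

```toml
# config.toml: cap the number of pages serialised into the RSS feed.
rss_limit = 100
```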
After setting the rss_limit to 100, the time got down to ~65s/~5.8s (with/without heaptrack) and ~455 MB max heap usage for the huge blog on next. Also be aware that the screenshot is only showing the state at its max usage. @Keats, maybe you want to set that limit in the gen.py autogeneration for the benches in the future?

Note to myself, steps to reproduce these heaptracks later (see the sketch below): use the command line that heaptrack returns to open and analyze the recorded heaptrack (something similar to heaptrack --analyze "path_to_folder/.../heaptrack.zola.25242.zst").
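The step-by-step list itself is not reproduced above; the following is only a rough outline under the assumption that the test site comes from the benches' gen.py helper and that zola is built from the checkout (paths here are illustrative, not taken from the thread):

```sh
# Illustrative outline only; adjust paths to your own checkout and test site.
# 1. Build zola in release mode (with the allocator tweak from earlier applied
#    if heaptrack should see the allocations).
cargo build --release

# 2. Record a site build under heaptrack from inside the generated test site.
cd path/to/generated/test_site
heaptrack path/to/zola/target/release/zola build

# 3. Open the recording with the command heaptrack prints at the end, e.g.:
heaptrack --analyze heaptrack.zola.<pid>.zst
```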
I see that the toc serialization is taking lots of time because it is on … Still using a bit too much memory for my taste...

I moved the toc out of …
It should be fine with 0.6.0 (currently being built).

Can confirm that memory usage is much, much lower now. Great job!
Hello guys, I'm dealing with something similar to yours: my blog now has 104k pages in Markdown plus one image for each post, and I'm running into the same problem you had some years ago. The issue is that I'm using Netlify's free tier, which has a memory limit for the build of around 6 GB.

Some ideas about a possible solution: is there a way to let us process in async batches? Another idea would be to support incremental builds; in that mode we would need to keep the … I can write something in Rust. I haven't started looking for a solution yet; I wanted to ask for some guidance before starting, and then I found this good thread.
Machine info:
32 GB of RAM
8-core CPU: Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz
Running zola v0.5.0

The site I'm generating has a content directory with 105336 files and 105732 sub-folders. Almost all of the files are section Markdown. I understand this is probably not a very common use case, but hopefully it will be useful for you in optimizing Zola's memory management.

Grafana chart over 3 unsuccessful runs: