Performance Notes and Tips
This documentation page is a collection of information on topics related to simulation performance.
- Performance FAQ
  - My CPU is running under 100% usage while simulating. Is this normal?
  - What CPU works best with the FLIP Fluids simulator?
  - Does caching on a hard disk drive versus a solid state drive affect performance?
  - Can operating system power management settings affect performance?
  - Can minimizing Blender affect performance?
  - Is the FLIP Fluids simulator GPU accelerated?
- Simulation Optimization Tips
- Performance Score and Measuring Performance
- FLIP Fluids Benchmark
Yes, it is normal for your CPU to run under 100% usage on average. Here are some reasons why the simulator may not be running at 100% usage:
- CPU usage may be low on lower resolution simulations and is generally higher on high resolution simulations. For lower resolution or smaller simulation effects, there may just not be enough work for the CPU to do to use more of its resources.
- The simulation calculations alternate between single threaded and multithreaded calculations. Some calculations in the simulator are not able to be multithreaded efficiently and must be run on a single thread or smaller number of threads. These sections of calculations create a bottleneck which lowers average CPU usage.
- Simulation setups with small amounts of fluid in a large domain with a lot of empty space can result in lower CPU usage. It may be possible to optimize the simulation setup for performance and detail by sizing the domain to tightly fit around your fluid effect. More information in this documentation topic: How large should I make my domain object?
- The simulator may be running with too many threads enabled. Simulations can slow down from the overhead of running more threads than the simulator can use efficiently. Systems with highly threaded CPUs may benefit from simulating with fewer threads enabled for smaller simulations (more information in the Measuring Performance topic).
- The operating system power management settings can reduce performance and result in lower CPU usage (More information in Power Management topic).
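
The effect of the single threaded sections described above can be estimated with Amdahl's law. This is a standard model that the documentation does not name, so treat the following as a rough sketch rather than a description of the simulator. The `parallel_fraction` value of 0.9 is an illustrative assumption, not a measured property of FLIP Fluids.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread when only part of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def average_cpu_usage(parallel_fraction: float, threads: int) -> float:
    """Fraction of total CPU capacity kept busy under the same model."""
    return amdahl_speedup(parallel_fraction, threads) / threads


if __name__ == "__main__":
    # 0.9 is an assumed parallel fraction chosen only for illustration.
    for threads in (1, 4, 8, 16, 32):
        speedup = amdahl_speedup(0.9, threads)
        usage = average_cpu_usage(0.9, threads) * 100.0
        print(f"{threads:>2} threads: speedup {speedup:5.2f}x, CPU usage {usage:5.1f}%")
```

Under this model, even a small share of single threaded work keeps average CPU usage well below 100% at high thread counts. The model ignores thread management overhead, which can make real results worse than the estimate.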
The simulation calculations alternate between single threaded and multithreaded calculations. Single threaded performance depends on a high clock speed, in particular a high boosted single threaded clock speed. Multithreaded performance depends on both a high clock speed and the number of threads.
A good all around processor for FLIP Fluids simulation baking would be capable of running around 8 to 32 threads and have a high clock speed as well as a high boosted single threaded clock speed. High clock speed is the most important aspect of a CPU for this type of simulator in terms of performance. After around 32 to 48 threads, you may start to see diminishing returns on performance when running more threads unless you are running a high resolution simulation with large amounts of liquid.
Highly threaded CPUs perform best for high resolution simulations with large amounts of fluid. For example, large simulations run on an AMD Ryzen Threadripper 1950X (32 thread, 3.4GHz base, 4.0GHz boosted) bake at over twice the speed of an Intel Core i7-7700 (8 thread, 3.6GHz base, 4.2GHz boosted) in our benchmarks. These types of CPUs are very fast for the multithreaded calculations, but for the single threaded calculations they do not perform much quicker than any other CPU with a similar clock speed.
Highly threaded CPUs can perform very well for running multiple simulations simultaneously, which can be useful for testing different variations of a simulation setup or simulation settings. Running multiple simulations may require a system with a large amount of RAM. 64GB of RAM is recommended for running two or more large simulations at the same time.
For comparisons between different CPUs, see the FLIP Fluids Benchmark results below. The Example Scene Descriptions also detail time comparisons between CPUs.
Whether you are caching your simulations on a hard drive (HDD) or a solid state drive (SSD) can affect how quickly your simulation runs. The simulator saves simulation files to your HDD or SSD at the end of each frame, and the CPU cannot continue calculations until the files are finished writing. Generally, file write speed on an HDD is much slower than on an SSD, and for this reason it is recommended to cache to an SSD if possible.
For large simulations there will be more data to save to your HDD or SSD. The amount of data can often exceed 300 MB per frame for very large simulations. To test write performance for large files on your HDD vs SSD, you can try copying a 300 MB file. If your HDD is taking 5 seconds to copy this file vs 0.5 seconds on an SSD, this time difference can really add up over a large number of frames. 5 seconds spent on each frame for 500 frames adds up to over 40 minutes spent on just writing files!
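The arithmetic above, and a rough way to time a large write on a given drive, can be sketched as follows. The helper names `total_write_minutes` and `time_file_write` are hypothetical and not part of the addon. The timing is only an approximation, since operating system caching can still influence the result.

```python
import os
import tempfile
import time


def total_write_minutes(seconds_per_frame: float, frame_count: int) -> float:
    """Total time spent writing cache files, in minutes."""
    return seconds_per_frame * frame_count / 60.0


def time_file_write(directory: str, size_mb: int = 300) -> float:
    """Write size_mb of random data into directory and return elapsed seconds."""
    chunk = os.urandom(1024 * 1024)  # 1 MB of data, reused for every write
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the drive
        return time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary test file


if __name__ == "__main__":
    print(f"HDD at 5.0 s/frame: {total_write_minutes(5.0, 500):.1f} minutes")
    print(f"SSD at 0.5 s/frame: {total_write_minutes(0.5, 500):.1f} minutes")
```

To compare drives, call `time_file_write` with a directory on each drive, such as your planned cache location, and compare the returned times.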
Yes, it is possible that the power management settings of your operating system can affect simulation times and CPU usage. Settings that reduce CPU usage to save power or increase battery life on laptops can slow down the simulation. It is possible that an OS update can affect these settings, so this may be something to check if you begin to experience slower simulation times than normal. Other software may also affect these settings - for example, MSI Center may reset power settings upon a system restart.
See these OS help topics for how to set the power management settings on your system:
- Windows 10/11 - Change the power mode for your Windows PC
- macOS - Save energy on your Mac | Change Battery settings on a Mac laptop
- Ubuntu Linux - PowerManagement/ReducedPower
Yes, it is possible that minimizing Blender can affect performance. Depending on operating system settings such as how the OS handles background tasks, performance can decrease when minimizing the Blender application during simulation or when using other areas of Blender such as rendering. Running a simulation or render from the command line can help prevent this performance decrease (See Command Line Tools).
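As a minimal sketch of what running from the command line can look like, the following builds a Blender command that runs without opening the user interface. `--background`, `--python`, and `--render-anim` are standard Blender command line arguments. The file names and the `build_blender_command` helper are hypothetical, and the addon's own Command Line Tools may be a more convenient way to launch a bake or render.

```python
import subprocess
from typing import List, Optional


def build_blender_command(
    blend_file: str,
    script: Optional[str] = None,
    render_animation: bool = False,
    blender: str = "blender",
) -> List[str]:
    """Build a command that runs Blender in background (no UI) mode."""
    command = [blender, "--background", blend_file]
    if script is not None:
        # Run a Python script after the .blend file has loaded.
        command += ["--python", script]
    if render_animation:
        # Render all frames of the scene's animation.
        command.append("--render-anim")
    return command


if __name__ == "__main__":
    # "my_scene.blend" is a placeholder file name.
    command = build_blender_command("my_scene.blend", render_animation=True)
    print(" ".join(command))
    # To actually launch Blender, uncomment the next line:
    # subprocess.run(command, check=True)
```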
The FLIP Fluids simulator is not GPU accelerated and runs fully on the CPU.
The simulation methods and techniques used in many of our features are not suitable for GPU processing. This is due to the nature of the types of calculations that our simulator runs. Many calculations of these features are not parallelizable enough to benefit from running on a GPU. Some features would benefit from being run on the GPU, but switching between computations on the CPU and GPU can be slow and harm performance.
At the moment we do not have plans to develop a liquid simulator that runs fully on the GPU. We may visit this idea in a future development project separate from the FLIP Fluids addon, but this is not planned at the moment.
We are experimenting with accelerating some computations on the GPU as part of the FLIP Fluids 2.0 project, but this will not be a major feature of the new simulation engine (Will the FLIP Fluids 2.0 engine be GPU accelerated?).
High quality fluid simulations are known to take a long time to compute, but there are ways to optimize your simulation setup to maximize performance:
- Does your domain have a lot of empty space? Try resizing your domain to fit tightly around your fluid effect to maximize simulation detail and performance. This is the most common problem we see in simulation setups that are running slowly. See this documentation topic: How large should I make my domain object?
- Does your simulation contain a large volume of fluid? More fluid will lead to longer baking times. Reducing the volume of fluid in your domain will help speed up simulation times. Is simulating a deep pool of liquid necessary? Simulating a more shallow body of liquid is a common way to speed up your fluid simulation.
- Do your keyframe/animated objects contain a lot of geometry? Moving objects need to be re-computed every frame and substep of the simulation. If your moving objects contain a lot of geometry, this can greatly slow down simulation. Often we see animated objects that contain way more geometry detail than is necessary. A common workflow in simulation is to use simpler low detail objects for simulation and high detail objects for rendering.
- Are all of the enabled features necessary? Enabling the following features can add a lot to the simulation time:
- Mesh Generation Subdivisions - We usually recommend increasing the mesh subdivision level to 1 for a final simulation. This will take longer to generate but will result in a higher detail mesh. If you are just testing your simulation setup, set the subdivisions to 0 to speed up testing and iteration.
- Whitewater Solver - Enabling whitewater simulation can often double the base simulation time.
- Viscosity solver - There is a trick to simulate low viscosity fluids at no extra cost using the PIC/FLIP Ratio setting.
- Surface tension and sheeting solver - These features are usually only applicable to small scale fluid effects. Enabling these features in large scale fluid effects will add a lot of simulation time and the effect on the simulation will often not be very noticeable.
- For large scale simulations, such as oceans, beaches, or other slow moving bodies of water, you can often get away with a higher CFL Number in the FLIP Fluid Advanced Panel such as 10 or 15 without affecting results. This can greatly improve simulation baking time, and even double or triple the speed in high resolution simulations. However, if you have thin obstacles or very quick moving obstacles, this may affect accuracy or result in leakage. A thick obstacle, such as a ship's hull moving through the water, is a good situation for increasing the CFL number.
- Are you able to lower your domain resolution? Not all simulation effects need to be incredibly detailed. If you can get away with simulating at a lower resolution, this will help speed up baking.
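
To illustrate why a higher CFL Number can shorten baking, here is a sketch based on the general CFL condition, where the number limits how many grid cells anything may move in one substep. This is the textbook relationship and an assumption about how the setting behaves, not a description of the addon's internal substepping. The speed, frame rate, and cell size values are made up for the example.

```python
import math


def substeps_per_frame(
    max_speed: float, frame_dt: float, cell_size: float, cfl_number: float
) -> int:
    """Substeps needed so nothing moves more than cfl_number cells per substep."""
    if cell_size <= 0 or cfl_number <= 0:
        raise ValueError("cell_size and cfl_number must be positive")
    # Number of grid cells crossed during one full frame at the maximum speed.
    cells_per_frame = max_speed * frame_dt / cell_size
    return max(1, math.ceil(cells_per_frame / cfl_number))


if __name__ == "__main__":
    # Assumed values: 10 m/s fluid, 30 frames per second, 2 cm grid cells.
    for cfl in (5, 10, 15):
        count = substeps_per_frame(10.0, 1.0 / 30.0, 0.02, cfl)
        print(f"CFL {cfl:>2}: {count} substeps per frame")
```

Fewer substeps means less work per frame, but each substep moves the fluid further, which is why thin or fast moving obstacles are more likely to be missed.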
When any FLIP Fluid objects have the Export Animated Mesh option enabled, the addon will need to export this animation frame by frame. Here are some notes on why this process could be taking a long time and tips to resolve these types of issues:
- Does your object need to be exported as an animated mesh? The Export Animated Mesh option is only needed for objects that have animation more complex than keyframed location/rotation/scale or f-curve animation, such as armatures or parented objects. The addon will be able to export simple keyframed animation very quickly.
- Do your animated objects contain a lot of geometry? Excessive geometry can take more time to export. If your exported mesh geometry contains many thousands of vertices and polygons for each frame, this can really add up and take a long time to export! For most cases, 30k faces for an object is more than enough geometry for the simulator. You may want to use the high poly object for rendering and a lower poly proxy version of the object for simulation.
- Don't want to take the time to re-export when starting a new bake? If your animated object has already been exported and the motion hasn't changed, you can skip re-exporting this object by enabling the Skip animated mesh re-export option for the object. Just remember to disable this option, or force a re-export if the motion or geometry of the object has changed since the last export.
- Is playback slow in your scene? To export animated meshes, Blender needs to play back each frame in order to fully evaluate the object meshes. If you have other baked data or objects in your scene that take a long time to evaluate, disabling these objects from loading (such as by disabling modifiers) will speed up playback, and thus will speed up animated mesh export.
As the simulation runs, a performance score metric can be viewed in the Domain > Stats > Simulation Stats panel. The Performance Score value is a simple measure for how much fluid is being processed per second on your system under the current simulation setup and settings. This value can be viewed for individual frames or an average over the entire cache.
Notes and Tips:
- The score may range from small single or double-digit values to higher values in the thousands and will depend on your CPU, simulation setup, and simulation settings.
- This score can be used to get an idea for how the current simulation setup is performing on your CPU and how changes to the setup or settings affect performance.
- This value can also be useful for measuring performance at different thread counts. Running small simulations with too many threads can harm performance due to overhead of thread management. For small simulations, you may see a higher performance score and quicker simulation baking at a lower number of threads (Documentation).
  As an example, here are the performance score results on a high powered Intel i9-13900K CPU for different thread counts on a basic simulation at the default 65 resolution. As you can see, running this simulation with 4 threads maximizes the performance score while running with the maximum 32 threads results in a lower score.
  - 1 thread: 364
  - 2 threads: 590
  - 3 threads: 671
  - 4 threads: 716 (highest score)
  - 5 threads: 714
  - 6 threads: 672
  - 7 threads: 646
  - 8 threads: 608
  - 16 threads: 417
  - 32 threads: 249
- The performance score may increase for larger high resolution simulations as there is more fluid to process and can take advantage of more CPU resources.
- Enabling more addon features will often increase how long it takes to compute the fluid, and in turn will decrease the performance score.
- If the simulation setup contains a domain with a large amount of empty space, this can lower the performance score. Sizing the domain so that it fits more tightly around the fluid effect can help improve performance. Related topic: How large should I make my domain object?
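
The thread count comparison in the notes above can be automated once you have recorded a score for each thread count. The sketch below uses the Intel i9-13900K scores quoted in this section. The `best_thread_count` helper is hypothetical, and the scores still need to be collected by hand from the Simulation Stats panel after each test bake.

```python
from typing import Dict

# Performance scores by thread count, copied from the i9-13900K example.
SCORES_BY_THREADS: Dict[int, int] = {
    1: 364, 2: 590, 3: 671, 4: 716, 5: 714,
    6: 672, 7: 646, 8: 608, 16: 417, 32: 249,
}


def best_thread_count(scores_by_threads: Dict[int, int]) -> int:
    """Return the thread count with the highest performance score."""
    if not scores_by_threads:
        raise ValueError("no scores recorded")
    return max(scores_by_threads, key=scores_by_threads.get)


if __name__ == "__main__":
    best = best_thread_count(SCORES_BY_THREADS)
    ratio = SCORES_BY_THREADS[best] / SCORES_BY_THREADS[32]
    print(f"Best thread count: {best}")
    print(f"Score at {best} threads is {ratio:.2f}x the score at 32 threads")
```

The best thread count found this way applies only to the tested setup. A larger or higher resolution simulation may favor more threads.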
The FLIP Fluids addon benchmark is composed of six simulation scenes, each with a setup designed to test performance of different simulation features and aspects of your hardware such as multi-threaded CPU performance, single-threaded and boosted CPU performance, memory performance, and more.
Visit our FLIP Fluids Benchmark page for information on how to submit results.
The following two charts are based on the latest benchmarks from November 2023. While one chart shows the average performance score of various systems in direct comparison, the other shows a "normalized" version of the same benchmark scenes where performance scores were scaled vertically to the same level, without losing the performance difference.
Below are notes and the tabulated results for each benchmark scene.
The Force Fields benchmark scene tests force field computation and chaotic fluid motion. This is a balanced scene that benefits from both single-threaded and multi-threaded CPU performance.
CPU Model | Threads | Base / Max Freq. (GHz) | Rank | Performance Score | Normalized | Bake Time (HH:MM:SS) |
---|---|---|---|---|---|---|
Apple M2 Studio Ultra | 24 | 3.68 / -- | 1 | 1415 | 1000 | 00:22:20 |
Intel i9 13900K | 32 | 3.00 / 5.80 | 2 | 1205 | 851 | 00:26:14 |
Apple M2 Max | 12 | 3.49 / -- | 3 | 1114 | 787 | 00:28:23 |
Intel i9 12900K | 24 | 3.20 / 5.20 | 4 | 926 | 654 | 00:33:33 |
AMD Ryzen 9 3900X | 24 | 3.80 / 4.60 | 5 | 739 | 523 | 00:42:47 |
AMD Ryzen 9 3950X | 32 | 3.50 / 4.70 | 6 | 666 | 470 | 00:47:28 |
Apple M2 | 8 | 3.49 / -- | 7 | 561 | 397 | 00:56:21 |
Intel i7 7700 | 8 | 3.60 / 4.20 | 8 | 363 | 257 | 01:27:05 |
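
The Normalized column in these tables appears to scale each performance score so that the top result in a scene equals 1000. That reading is an assumption based on the tabulated values and not a documented formula. The following sketch reproduces the Force Fields column to within one point, where the small differences most likely come from the published scores being rounded.

```python
from typing import Dict

# Performance scores from the Force Fields table above.
FORCE_FIELDS_SCORES: Dict[str, int] = {
    "Apple M2 Studio Ultra": 1415,
    "Intel i9 13900K": 1205,
    "Apple M2 Max": 1114,
    "Intel i9 12900K": 926,
    "AMD Ryzen 9 3900X": 739,
    "AMD Ryzen 9 3950X": 666,
    "Apple M2": 561,
    "Intel i7 7700": 363,
}


def normalize_scores(scores: Dict[str, int], top: int = 1000) -> Dict[str, int]:
    """Scale scores so that the highest score maps to the value of top."""
    best = max(scores.values())
    return {cpu: round(score / best * top) for cpu, score in scores.items()}


if __name__ == "__main__":
    for cpu, value in normalize_scores(FORCE_FIELDS_SCORES).items():
        print(f"{cpu:<24} {value}")
```

Normalizing this way makes scenes with very different score ranges, such as Viscosity and Dam Break, comparable on the same scale.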
The Dam Break benchmark scene is a classic liquid simulation scenario where a block of liquid is dropped on one side of the simulation domain. This scene tests a basic liquid simulation at mostly default settings. This is a balanced scene that benefits from both single-threaded and multi-threaded CPU performance.
CPU Model | Threads | Base / Max Freq. (GHz) | Rank | Performance Score | Normalized | Bake Time (HH:MM:SS) |
---|---|---|---|---|---|---|
Apple M2 Studio Ultra | 24 | 3.68 / -- | 1 | 1608 | 1000 | 00:10:46 |
Intel i9 13900K | 32 | 3.00 / 5.80 | 2 | 1359 | 844 | 00:12:44 |
Apple M2 Max | 12 | 3.49 / -- | 3 | 1215 | 755 | 00:14:15 |
Intel i9 12900K | 24 | 3.20 / 5.20 | 4 | 1074 | 668 | 00:16:07 |
AMD Ryzen 9 3900X | 24 | 3.80 / 4.60 | 5 | 833 | 518 | 00:20:46 |
AMD Ryzen 9 3950X | 32 | 3.50 / 4.70 | 6 | 777 | 483 | 00:22:16 |
Apple M2 | 8 | 3.49 / -- | 7 | 674 | 419 | 00:25:40 |
Intel i7 7700 | 8 | 3.60 / 4.20 | 8 | 387 | 240 | 00:44:43 |
The Collapsing benchmark scene is a 714 piece fracture simulation scenario that tests simulation performance in processing a large number of obstacle objects. This scene can benefit greatly from single-threaded CPU performance, but multi-threaded performance will still play a significant role in obstacle processing.
CPU Model | Threads | Base / Max Freq. (GHz) | Rank | Performance Score | Normalized | Bake Time (HH:MM:SS) |
---|---|---|---|---|---|---|
Apple M2 Studio Ultra | 24 | 3.68 / -- | 1 | 1034 | 1000 | 00:16:20 |
Intel i9 13900K | 32 | 3.00 / 5.80 | 3 | 613 | 591 | 00:27:33 |
Apple M2 Max | 12 | 3.49 / -- | 2 | 883 | 852 | 00:19:08 |
Intel i9 12900K | 24 | 3.20 / 5.20 | 4 | 606 | 585 | 00:27:52 |
AMD Ryzen 9 3900X | 24 | 3.80 / 4.60 | 6 | 483 | 466 | 00:34:58 |
AMD Ryzen 9 3950X | 32 | 3.50 / 4.70 | 7 | 362 | 349 | 00:46:39 |
Apple M2 | 8 | 3.49 / -- | 5 | 551 | 532 | 00:30:39 |
Intel i7 7700 | 8 | 3.60 / 4.20 | 8 | 300 | 290 | 00:56:18 |
The Fluid in an Invisible Box benchmark scene tests performance for a common simulation setup: a smaller amount of fluid simulated within a much larger simulation domain. In this setup, fluid contained in a box falls and tumbles within a large domain. For maximum performance the domain should fit tightly around the fluid effect with minimal empty space (documentation), but this ideal setup is not always possible, such as in this simulation scenario. This scene tests the sparse fluid computations of the simulator, meaning optimizations where the simulator only computes over areas where fluid exists and ignores empty areas.
CPU Model | Threads | Base / Max Freq. (GHz) | Rank | Performance Score | Normalized | Bake Time (HH:MM:SS) |
---|---|---|---|---|---|---|
Apple M2 Studio Ultra | 24 | 3.68 / -- | 1 | 884 | 1000 | 01:28:33 |
Intel i9 13900K | 32 | 3.00 / 5.80 | 3 | 688 | 778 | 01:53:47 |
Apple M2 Max | 12 | 3.49 / -- | 2 | 691 | 781 | 01:53:17 |
Intel i9 12900K | 24 | 3.20 / 5.20 | 4 | 625 | 707 | 02:05:15 |
AMD Ryzen 9 3900X | 24 | 3.80 / 4.60 | 5 | 497 | 563 | 02:37:31 |
AMD Ryzen 9 3950X | 32 | 3.50 / 4.70 | 6 | 447 | 506 | 02:55:08 |
Apple M2 | 8 | 3.49 / -- | 7 | 373 | 422 | 03:29:52 |
Intel i7 7700 | 8 | 3.60 / 4.20 | 8 | 244 | 276 | 05:20:50 |
The Viscosity benchmark scene tests the viscosity solver of the simulator. Viscosity solving is largely a single-threaded process in this simulator and this scene is designed to focus on the single-threaded performance of your CPU.
CPU Model | Threads | Base / Max Freq. (GHz) | Rank | Performance Score | Normalized | Bake Time (HH:MM:SS) |
---|---|---|---|---|---|---|
Apple M2 Studio Ultra | 24 | 3.68 / -- | 1 | 23 | 1000 | 00:50:30 |
Intel i9 13900K | 32 | 3.00 / 5.80 | 2 | 20 | 869 | 00:58:05 |
Apple M2 Max | 12 | 3.49 / -- | 1 | 23 | 1000 | 00:50:30 |
Intel i9 12900K | 24 | 3.20 / 5.20 | 3 | 19 | 826 | 01:01:08 |
AMD Ryzen 9 3900X | 24 | 3.80 / 4.60 | 4 | 16 | 695 | 01:12:36 |
AMD Ryzen 9 3950X | 32 | 3.50 / 4.70 | 6 | 9 | 391 | 02:09:04 |
Apple M2 | 8 | 3.49 / -- | 3 | 19 | 826 | 01:01:08 |
Intel i7 7700 | 8 | 3.60 / 4.20 | 5 | 14 | 608 | 01:22:59 |
The Attributes A benchmark scene tests fluid surface attribute generation and color blending features of the simulator. This scene involves processing large amounts of data and is designed to focus on the multi-threaded performance of your CPU.
CPU Model | Threads | Base / Max Freq. (GHz) | Rank | Performance Score | Normalized | Bake Time (HH:MM:SS) |
---|---|---|---|---|---|---|
Apple M2 Studio Ultra | 24 | 3.68 / -- | 1 | 407 | 1000 | 00:15:32 |
Intel i9 13900K | 32 | 3.00 / 5.80 | 3 | 104 | 255 | 01:00:45 |
Apple M2 Max | 12 | 3.49 / -- | 6 | 48 | 118 | 02:11:36 |
Intel i9 12900K | 24 | 3.20 / 5.20 | 5 | 74 | 182 | 01:25:25 |
AMD Ryzen 9 3900X | 24 | 3.80 / 4.60 | 2 | 264 | 648 | 00:23:58 |
AMD Ryzen 9 3950X | 32 | 3.50 / 4.70 | 4 | 77 | 189 | 01:22:04 |
Apple M2 | 8 | 3.49 / -- | 7 | 22 | 54 | 04:47:10 |
Intel i7 7700 | 8 | 3.60 / 4.20 | 8 | 18 | 44 | 05:50:59 |