Commit 6151bf1: docs: improve multi-threaded solving
triceo committed Mar 25, 2024 (1 parent: bc9c216)
1 changed file with 140 additions and 51 deletions: docs/src/modules/ROOT/pages/enterprise-edition/enterprise-edition.adoc
It is not available in the Community Edition.

There are several ways of doing multi-threaded solving:

* *Multi bet solving*: solve 1 dataset with multiple, isolated solvers and take the best result.
** Not recommended: This is a marginal gain for a high cost of hardware resources.
** Use the xref:using-timefold-solver/benchmarking-and-tweaking.adoc#benchmarker[Benchmarker] during development to determine the most appropriate algorithm, even though that choice is only optimal on average.
** Use multi-threaded incremental solving instead.
* *Partitioned Search*: Split 1 dataset in multiple parts and solve them independently.
** Configure a <<partitionedSearch,Partitioned Search>>.
* *Multi-threaded incremental solving*: solve 1 dataset with multiple threads without sacrificing xref:constraints-and-score/performance.adoc#incrementalScoreCalculation[incremental score calculation].
** Donate a portion of your CPU cores to Timefold Solver to scale up the score calculation speed and get the same results in a fraction of the time.
** Configure <<multithreadedIncrementalSolving,multi-threaded incremental solving>>.
* *Multitenancy*: solve different datasets in parallel
** The `SolverManager` will make it even easier to set this up, in a future version.

In this section, we will focus on multi-threaded incremental solving.

image::enterprise-edition/multiThreadingStrategies.png[align="center"]

[#multithreadedIncrementalSolving]
==== Multi-threaded incremental solving

With this feature, the solver can run significantly faster, getting you the right solution earlier.
It is especially useful for large datasets, where score calculation speed is the bottleneck.

The following table shows the observed score calculation speeds (in score calculations per second)
of the Vehicle Routing Problem and the Maintenance Scheduling Problem,
as the number of threads increases:

|===
|Number of Threads |Vehicle Routing |Maintenance Scheduling

|1
|~ 22,000
|~ 6,000

|2
|~ 40,000
|~ 11,000

|4
|~ 70,000
|~ 19,000
|===

As we can see, the speed increases with the number of threads,
but the scaling is not exactly linear due to the overhead of managing communication between multiple threads.
Above 4 parallel threads,
this overhead starts to dominate, and therefore we do not recommend scaling beyond that threshold.
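To make the scaling concrete, the speedup and parallel efficiency implied by the Vehicle Routing column above can be computed with a few lines of plain Java (this is an illustration using the approximate numbers from the table, not part of any Timefold API):

```java
// Illustrates the sub-linear scaling discussed above, using the
// approximate Vehicle Routing speeds from the table.
public class ScalingEfficiency {

    public static double speedup(double baseSpeed, double parallelSpeed) {
        return parallelSpeed / baseSpeed;
    }

    // Parallel efficiency: 1.0 would be perfectly linear scaling.
    public static double efficiency(double baseSpeed, double parallelSpeed, int threads) {
        return speedup(baseSpeed, parallelSpeed) / threads;
    }

    public static void main(String[] args) {
        System.out.printf("2 threads: speedup %.2fx, efficiency %.0f%%%n",
                speedup(22_000, 40_000), 100 * efficiency(22_000, 40_000, 2));
        System.out.printf("4 threads: speedup %.2fx, efficiency %.0f%%%n",
                speedup(22_000, 70_000), 100 * efficiency(22_000, 70_000, 4));
    }
}
```

Efficiency drops from roughly 91% at 2 threads to roughly 80% at 4 threads, which is why scaling beyond 4 move threads is not recommended.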

[NOTE]
====
These numbers depend strongly on the move selector configuration,
the size of the dataset, and the performance of individual constraints.
We believe they are indicative of the speedups you can expect from this feature,
but your mileage may vary significantly.
====

===== Enabling multi-threaded incremental solving

Enable multi-threaded incremental solving by <<planningId,adding a @PlanningId annotation>>
on every planning entity class and planning value class.
Then configure a `moveThreadCount`:

[tabs]
====
Quarkus::
+
--
Add the following to your `application.properties`:
[source,properties]
----
quarkus.timefold.solver.move-thread-count=AUTO
----
--
Spring::
+
--
Add the following to your `application.properties`:
[source,properties]
----
timefold.solver.move-thread-count=AUTO
----
--
Java::
+
--
Use the `SolverConfig` class:
[source,java,options="nowrap"]
----
SolverConfig solverConfig = new SolverConfig()
...
.withMoveThreadCount("AUTO");
----
--
XML::
+
--
Add the following to your `solverConfig.xml`:
[source,xml,options="nowrap"]
----
<solver xmlns="https://timefold.ai/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="https://timefold.ai/xsd/solver https://timefold.ai/xsd/solver/solver.xsd">
  ...
<moveThreadCount>AUTO</moveThreadCount>
...
</solver>
----
--
====
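As mentioned above, every planning entity class and planning value class needs a `@PlanningId`. A minimal sketch of what that looks like on an entity (the `Visit`/`Vehicle` domain and field names are hypothetical, and the snippet assumes the Timefold Solver dependency is on the classpath):

```java
import ai.timefold.solver.core.api.domain.entity.PlanningEntity;
import ai.timefold.solver.core.api.domain.lookup.PlanningId;
import ai.timefold.solver.core.api.domain.variable.PlanningVariable;

@PlanningEntity
public class Visit {

    @PlanningId // required for multi-threaded incremental solving
    private Long id;

    @PlanningVariable
    private Vehicle vehicle;

    // getters and setters omitted for brevity
}
```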

Setting `moveThreadCount` to `AUTO` lets Timefold Solver decide how many move threads to run in parallel.
The formula it uses is based on experience and does not hog all CPU cores on a multi-core machine.
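The exact formula is an internal implementation detail and may change between versions, but a hypothetical sketch of such a heuristic (the class name and the head-room of 2 cores are assumptions for illustration, not the real implementation) could look like:

```java
// Hypothetical sketch only: NOT the real Timefold Solver AUTO formula.
public class MoveThreadCountResolver {

    /** Returns 0 to signal falling back to single-threaded solving (NONE). */
    public static int resolveAuto(int availableProcessorCount) {
        // Leave head-room: one core for the solver thread, one for the OS.
        int moveThreadCount = availableProcessorCount - 2;
        return Math.max(moveThreadCount, 0);
    }
}
```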

A `moveThreadCount` of `4` xref:integration/integration.adoc#sizingHardwareAndSoftware[saturates almost 5 CPU cores]:
the 4 move threads fill up 4 CPU cores completely
and the solver thread uses most of another CPU core.

The following ``moveThreadCount``s are supported:
* ``AUTO``: Let Timefold Solver decide how many move threads to run in parallel.
On machines or containers with only a few CPU cores, this falls back to the single-threaded code.
* Static number: The number of move threads to run in parallel.
+
[source,xml,options="nowrap"]
----
<moveThreadCount>4</moveThreadCount>
----
+
This can be `1` to enforce running the multi-threaded code with only 1 move thread
(which is less efficient than `NONE`).

[IMPORTANT]
====
In cloud environments where resource use is billed by the hour,
consider the trade-off between the cost of the extra CPU cores and the time saved.
Compute nodes with higher CPU core counts are typically more expensive to run,
so you may end up paying more for the same result,
even though the actual compute time needed is less.
====

It is counter-effective to set a `moveThreadCount`
that is higher than the number of available CPU cores,
as that will slow down the score calculation speed.
A run of the same solver configuration on 2 machines with a different number of CPUs
is still reproducible, unless the `moveThreadCount` is set to `AUTO` or a function of `availableProcessorCount`.
====

===== Advanced configuration

There are additional parameters you can supply to your `solverConfig.xml`:

[source,xml,options="nowrap"]
----
<solver xmlns="https://timefold.ai/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://timefold.ai/xsd/solver https://timefold.ai/xsd/solver/solver.xsd">
<moveThreadCount>4</moveThreadCount>
<moveThreadBufferSize>10</moveThreadBufferSize>
<threadFactoryClass>...MyAppServerThreadFactory</threadFactoryClass>
...
</solver>
----

The `moveThreadBufferSize` power tweak sets the number of moves that are selected but won't be foraged.
Setting it too low reduces performance, but so does setting it too high.
Unless you're deeply familiar with the inner workings of multi-threaded solving, don't configure this parameter.

To run in an environment that doesn't like arbitrary thread creation,
use `threadFactoryClass` to plug in a <<customThreadFactory,custom thread factory>>.


[#partitionedSearch]
==== Partitioned search

[NOTE]
====
Expand All @@ -469,7 +537,7 @@ It is not available in the Community Edition.
====

[#partitionedSearchAlgorithm]
===== Algorithm description

It is often more efficient to partition large data sets (usually above 5000 planning entities)
into smaller pieces and solve them separately.


[#partitionedSearchConfiguration]
===== Configuration

Simplest configuration:



[#partitioningASolution]
===== Partitioning a solution


[#customSolutionPartitioner]
====== Custom `SolutionPartitioner`

To use a custom `SolutionPartitioner`, configure one on the Partitioned Search phase:



[#runnablePartThreadLimit]
===== Runnable part thread limit

When running a multi-threaded solver, such as Partitioned Search, CPU power can quickly become a scarce resource,
which can cause other processes or threads to hang or freeze.
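For intuition, sizing such a limit while keeping head-room for other processes might be sketched as follows (the head-room of 2 cores is an arbitrary assumption for illustration, not a Timefold default):

```java
// Sketch: pick a thread limit that leaves CPU head-room for other processes.
public class ThreadLimitSizing {

    public static int suggestedThreadLimit(int availableProcessorCount) {
        // Keep (an arbitrarily chosen) 2 cores free; never go below 1 thread.
        return Math.max(1, availableProcessorCount - 2);
    }

    public static void main(String[] args) {
        int available = Runtime.getRuntime().availableProcessors();
        System.out.println("Suggested thread limit: " + suggestedThreadLimit(available));
    }
}
```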
the host is likely to hang or freeze,
unless there is an OS-specific policy in place to prevent Timefold Solver from hogging all the CPU processors.
====


[#customThreadFactory]
==== Custom thread factory (WildFly, GAE, ...)

The `threadFactoryClass` allows you to plug in a custom `ThreadFactory` for environments
where arbitrary thread creation should be avoided,
such as most application servers (including WildFly) or Google App Engine.

Configure the `ThreadFactory` on the solver to create the <<multithreadedIncrementalSolving,move threads>>
and the <<partitionedSearch,Partition Search threads>> with it:

[source,xml,options="nowrap"]
----
<solver xmlns="https://timefold.ai/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://timefold.ai/xsd/solver https://timefold.ai/xsd/solver/solver.xsd">
<threadFactoryClass>...MyAppServerThreadFactory</threadFactoryClass>
...
</solver>
----
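A minimal sketch of such a factory (the class name matches the placeholder above; the thread naming and daemon flag are illustrative assumptions, and on a real application server you would typically delegate to its managed thread facilities instead of calling `new Thread`):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example of a custom thread factory for the solver.
public class MyAppServerThreadFactory implements ThreadFactory {

    private final AtomicInteger threadIndex = new AtomicInteger();

    @Override
    public Thread newThread(Runnable runnable) {
        Thread thread = new Thread(runnable,
                "timefold-solver-" + threadIndex.getAndIncrement());
        thread.setDaemon(true); // don't let solver threads block JVM shutdown
        return thread;
    }
}
```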


[#automaticNodeSharing]
=== Automatic node sharing

