How to get the compiled function which can be called later? #230
-
Once I do the tuning, is there a way to get the best compiled kernel back as a function that I can call later? Essentially, I'm looking for an autotuner that returns the best compiled version. Thanks!
Replies: 3 comments
-
Hi @vesuppi! Good question, as this isn't documented all that well. For Python applications, we have `PythonKernel` from `kernel_tuner.kernelbuilder`. This example shows the simplest way to use it:

https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/python_kernel.py

The idea is that you can either directly specify which configuration you want with the `params=` option of `PythonKernel`. For example, you could use `get_best_config` from `kernel_tuner.util` and pass the result as `params`.

A probably better way is to let Kernel Tuner figure out which configuration to use, based on tuning results of `tune_kernel` that have been stored to a file. In that case, you use the `results_file=` option of `PythonKernel` and point it to a results file written using the `store_results` function in `kernel_tuner.integration`. This may seem like an additional step, but it enables you to tune once, store the results, and then run the application many times reusing the same tuning results. The kernel configuration to compile is selected based on the GPU available at run time and the specified problem size.
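To make the first approach concrete, here is a minimal sketch of what selecting the best configuration amounts to. The function name `get_best_config_sketch` and the exact shape of `results` are assumptions for illustration; the real `kernel_tuner.util.get_best_config` may differ in details, but conceptually it picks the configuration with the lowest measured objective:

```python
# Hypothetical sketch: picking the best configuration from tuning results.
# Assumed shape: each entry in `results` is a dict mapping tunable parameter
# names to values, plus the measured objective (here "time").
def get_best_config_sketch(results, objective="time"):
    """Return the configuration with the lowest measured objective."""
    best = min(results, key=lambda cfg: cfg[objective])
    # Strip the measurement so only tunable parameters remain, ready to be
    # passed as the configuration for compiling the kernel.
    return {k: v for k, v in best.items() if k != objective}

results = [
    {"block_size_x": 128, "time": 0.42},
    {"block_size_x": 256, "time": 0.31},
    {"block_size_x": 512, "time": 0.35},
]
print(get_best_config_sketch(results))  # {'block_size_x': 256}
```

The resulting dict of tunable parameters is what you would hand to `PythonKernel` via its `params=` option.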
-
I see, thank you very much for the detailed explanation! I was able to get the best config using:

results, env = tune_kernel("vector_add", kernel_string, N, (c, a, b, torch.tensor(N)), tune_params)
best_config = util.get_best_config(results, 'time')

I haven't tried `PythonKernel` yet, but I will! Another side question: if we want to tune the size of the thread block, do the parameters have to be named "block_size_x" and "block_size_y"? Thanks!
-
You can indeed use other names for the thread block dimensions. You can specify these names using the `block_size_names=` option of `tune_kernel`. There is a test in the repository that illustrates how to use this option.
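To illustrate what renaming the block-size parameters means, here is a small sketch (the helper `block_dims` is hypothetical, not part of Kernel Tuner): the tuner must know which parameter names in a configuration denote the thread block dimensions, and by default it looks for "block_size_x", "block_size_y", and "block_size_z":

```python
# Hypothetical sketch: extracting thread block dimensions from a tuning
# configuration. Kernel Tuner's default names are block_size_x/y/z; with
# custom names you tell the tuner which parameters to read instead.
def block_dims(config, block_size_names=("block_size_x", "block_size_y", "block_size_z")):
    """Return (x, y, z) thread block dimensions; missing dimensions default to 1."""
    return tuple(config.get(name, 1) for name in block_size_names)

# A configuration using custom names for the block dimensions:
config = {"tile_x": 32, "tile_y": 8}
print(block_dims(config, block_size_names=("tile_x", "tile_y", "tile_z")))  # (32, 8, 1)
```

With the default names, a configuration like `{"block_size_x": 64}` would resolve to a (64, 1, 1) thread block in the same way.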