Hi All,
With reference to #46: I've implemented a model with more than 32 parameters, and I'm trying to fit a "vector function". I'm therefore trying to modify the code so that it supports my case. Consider the function below:
void LMFitCUDA::solve_equation_system()
{
    dim3 threads(1, 1, 1);
    dim3 blocks(1, 1, 1);

    threads.x = info_.n_parameters_to_fit_*info_.n_fits_per_block_;
    blocks.x = n_fits_ / info_.n_fits_per_block_;

    cuda_modify_step_widths<<<blocks, threads>>>(
        gpu_data_.hessians_,
        gpu_data_.lambdas_,
        gpu_data_.scaling_vectors_,
        info_.n_parameters_to_fit_,
        gpu_data_.iteration_failed_,
        gpu_data_.finished_,
        info_.n_fits_per_block_);
    CUDA_CHECK_STATUS(cudaGetLastError());

    int n_parameters_pow2 = 1;
    while (n_parameters_pow2 < info_.n_parameters_to_fit_)
    {
        n_parameters_pow2 *= 2;
    }

    //set up to run the Gauss Jordan elimination
    int const n_equations = info_.n_parameters_to_fit_;
    int const n_solutions = n_fits_;

    threads.x = n_equations + 1;
    threads.y = n_equations;
    blocks.x = n_solutions;

    //set the size of the shared memory area for each block
    int const shared_size
        = sizeof(float) * ((threads.x * threads.y)
        + n_parameters_pow2 + n_parameters_pow2);

    //set up the singular_test vector
    int * singular_tests;
    CUDA_CHECK_STATUS(cudaMalloc((void**)&singular_tests, n_fits_ * sizeof(int)));

    //run the Gauss Jordan elimination
    cuda_gaussjordan<<<blocks, threads, shared_size>>>(
        gpu_data_.deltas_,
        gpu_data_.gradients_,
        gpu_data_.hessians_,
        gpu_data_.finished_,
        singular_tests,
        info_.n_parameters_to_fit_,
        n_parameters_pow2);
    CUDA_CHECK_STATUS(cudaGetLastError());

    //set up to update the lm_state_gpu_ variable with the Gauss Jordan results
    threads.x = std::min(n_fits_, 256);
    threads.y = 1;
    blocks.x = int(std::ceil(float(n_fits_) / float(threads.x)));

    //update the lm_state_gpu_ variable
    cuda_update_state_after_gaussjordan<<<blocks, threads>>>(
        n_fits_,
        singular_tests,
        gpu_data_.states_);
    CUDA_CHECK_STATUS(cudaGetLastError());

    CUDA_CHECK_STATUS(cudaFree(singular_tests));

    threads.x = info_.n_parameters_*info_.n_fits_per_block_;
    threads.y = 1;
    blocks.x = n_fits_ / info_.n_fits_per_block_;

    cuda_update_parameters<<<blocks, threads>>>(
        gpu_data_.parameters_,
        gpu_data_.prev_parameters_,
        gpu_data_.deltas_,
        info_.n_parameters_to_fit_,
        gpu_data_.parameters_to_fit_indices_,
        gpu_data_.finished_,
        info_.n_fits_per_block_);
    CUDA_CHECK_STATUS(cudaGetLastError());
}
What is the meaning of the following:
//set up to run the Gauss Jordan elimination
int const n_equations = info_.n_parameters_to_fit_;
int const n_solutions = n_fits_;
threads.x = n_equations + 1;
threads.y = n_equations;
blocks.x = n_solutions;
The link above gives a workaround for implementing a vector function. However, because of the large number of parameters, I have to modify the threads and blocks variables, and I don't want to break how they are actually used in practice. I have just one function to fit.
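To make the launch geometry in question concrete, here is a minimal host-side sketch of the arithmetic that the Gauss Jordan setup performs. It is not code from the library: the struct `GaussJordanLaunch`, the helper `gauss_jordan_launch`, and the constant `kMaxThreadsPerBlock` are hypothetical names introduced for illustration. The reading that `threads.x` covers the columns of an augmented matrix and `threads.y` its rows is an assumption based on the n_equations + 1 by n_equations shape. The 1024 threads-per-block figure is the usual CUDA limit on current hardware and is assumed here as well.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the library): reproduces the arithmetic
// used to configure the cuda_gaussjordan launch in solve_equation_system().
struct GaussJordanLaunch {
    int threads_x;             // n_equations + 1 (assumed: augmented-matrix columns)
    int threads_y;             // n_equations     (assumed: matrix rows)
    int threads_per_block;     // threads_x * threads_y
    std::size_t shared_bytes;  // dynamic shared memory requested per block
};

GaussJordanLaunch gauss_jordan_launch(int n_parameters_to_fit)
{
    // Round the parameter count up to the next power of two, as the source does.
    int n_parameters_pow2 = 1;
    while (n_parameters_pow2 < n_parameters_to_fit) {
        n_parameters_pow2 *= 2;
    }

    GaussJordanLaunch launch;
    launch.threads_x = n_parameters_to_fit + 1;
    launch.threads_y = n_parameters_to_fit;
    launch.threads_per_block = launch.threads_x * launch.threads_y;
    launch.shared_bytes = sizeof(float)
        * (static_cast<std::size_t>(launch.threads_per_block)
           + n_parameters_pow2 + n_parameters_pow2);
    return launch;
}

// Assumption: 1024 threads per block, the usual limit on current CUDA devices.
constexpr int kMaxThreadsPerBlock = 1024;

// True if one block of (n + 1) x n threads stays within the assumed limit.
bool fits_in_one_block(int n_parameters_to_fit)
{
    return gauss_jordan_launch(n_parameters_to_fit).threads_per_block
        <= kMaxThreadsPerBlock;
}
```

With this geometry each fit gets its own block (`blocks.x = n_fits_`), and a block holds n * (n + 1) threads. For 31 parameters that is 992 threads; for 32 it is 1056, which would exceed the assumed limit, so simply enlarging `threads.x` and `threads.y` is unlikely to work for larger models.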
The previous limit on the number of model parameters has been lifted by changing how the calculation of the Hessian is parallelized. The new parameter limit is 1024.