
some details on LMFitCuda::solve_equation_system() #50

Closed
lukkio88 opened this issue Mar 13, 2018 · 3 comments

@lukkio88

Hi All,
With reference to "#46" I've implemented a model with many parameters (more than 32) and I'm trying to implement a "vector function" to be fit, plus the number of parameter is more than 32. Therefore I'm trying to hack a bit the code in such a way I'd be able to support my case. With reference to function below

```cuda
void LMFitCUDA::solve_equation_system()
{
	dim3  threads(1, 1, 1);
	dim3  blocks(1, 1, 1);

	threads.x = info_.n_parameters_to_fit_*info_.n_fits_per_block_;
	blocks.x = n_fits_ / info_.n_fits_per_block_;

	cuda_modify_step_widths << < blocks, threads >> >(
		gpu_data_.hessians_,
		gpu_data_.lambdas_,
		gpu_data_.scaling_vectors_,
		info_.n_parameters_to_fit_,
		gpu_data_.iteration_failed_,
		gpu_data_.finished_,
		info_.n_fits_per_block_);
	CUDA_CHECK_STATUS(cudaGetLastError());

	int n_parameters_pow2 = 1;

	while (n_parameters_pow2 < info_.n_parameters_to_fit_)
	{
		n_parameters_pow2 *= 2;
	}

	//set up to run the Gauss Jordan elimination
	int const n_equations = info_.n_parameters_to_fit_;
	int const n_solutions = n_fits_;

	threads.x = n_equations + 1;
	threads.y = n_equations;
	blocks.x = n_solutions;

	//set the size of the shared memory area for each block
	int const shared_size
		= sizeof(float) * ((threads.x * threads.y)
			+ n_parameters_pow2 + n_parameters_pow2);

	//set up the singular_test vector
	int * singular_tests;
	CUDA_CHECK_STATUS(cudaMalloc((void**)&singular_tests, n_fits_ * sizeof(int)));

	//run the Gauss Jordan elimination
	cuda_gaussjordan << < blocks, threads, shared_size >> >(
		gpu_data_.deltas_,
		gpu_data_.gradients_,
		gpu_data_.hessians_,
		gpu_data_.finished_,
		singular_tests,
		info_.n_parameters_to_fit_,
		n_parameters_pow2);
	CUDA_CHECK_STATUS(cudaGetLastError());

	//set up to update the lm_state_gpu_ variable with the Gauss Jordan results
	threads.x = std::min(n_fits_, 256);
	threads.y = 1;
	blocks.x = int(std::ceil(float(n_fits_) / float(threads.x)));

	//update the lm_state_gpu_ variable
	cuda_update_state_after_gaussjordan << < blocks, threads >> >(
		n_fits_,
		singular_tests,
		gpu_data_.states_);
	CUDA_CHECK_STATUS(cudaGetLastError());

	CUDA_CHECK_STATUS(cudaFree(singular_tests));

	threads.x = info_.n_parameters_*info_.n_fits_per_block_;
	threads.y = 1;
	blocks.x = n_fits_ / info_.n_fits_per_block_;

	cuda_update_parameters << < blocks, threads >> >(
		gpu_data_.parameters_,
		gpu_data_.prev_parameters_,
		gpu_data_.deltas_,
		info_.n_parameters_to_fit_,
		gpu_data_.parameters_to_fit_indices_,
		gpu_data_.finished_,
		info_.n_fits_per_block_);
	CUDA_CHECK_STATUS(cudaGetLastError());
}
```

What is the meaning of the following launch configurations?

```cuda
	threads.x = info_.n_parameters_to_fit_*info_.n_fits_per_block_;
	blocks.x = n_fits_ / info_.n_fits_per_block_;

	//set up to run the Gauss Jordan elimination
	int const n_equations = info_.n_parameters_to_fit_;
	int const n_solutions = n_fits_;

	threads.x = n_equations + 1;
	threads.y = n_equations;
	blocks.x = n_solutions;

	threads.x = info_.n_parameters_*info_.n_fits_per_block_;
	threads.y = 1;
	blocks.x = n_fits_ / info_.n_fits_per_block_;
```

The issue linked above (#46) gives a workaround for implementing a vector function. However, because of the large number of parameters I have to modify the `threads` and `blocks` variables, and I don't want to break how they are actually used. I have just one function to fit.

@superchromix
Collaborator

We will have a fix for models using >32 parameters very soon.

@lukkio88
Author

Thank you.

@superchromix
Collaborator

The limitation on the number of model parameters has been removed, by changing the way in which the calculation of the Hessian is parallelized. The new parameter limit is 1024.
