Extracting StandAlone kernel #253
Comments
A little bit of work here. The only kernels present are vector add (not really QMC-specific, but the simplest kernel) and a 3D spline. Possible additional kernels
The plan is to make an officially maintained QMCPACK repository, starting with splines and updates. The idea is that the kernels are clean, zero-baggage, well documented, and accessible for performance analysis, total refactoring, use by non-experts, etc. We have much of the code, but which versions should @TApplencourt start from? I think reference CPU, CUDA, GPU offload, etc. would all be of interest; e.g., @PDoakORNL made fresh CUDA implementations in a fork of miniqmc...
I can start with @markdewing's spline if you (aka the QMCPACK community) want. If I understand correctly, this code handles {double, single} / {real, complex} data types and many more types of splines. My recommendation is to start with the bare minimum functionality (only one type, for example) and trim down the rest. This will make the porting and analysis easier.
Please take a careful look at the one in this repo (here, https://github.com/QMCPACK/miniqmc). I am not sure which branch is best, though; someone else will need to chime in. miniqmc knows how to set up various problem sizes corresponding to NiO, i.e., it is realistic.
I would start with only single-precision real. This is the "legacy CUDA" default in mainline and the one used in benchmarks.
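Following the suggestion to start from one type only (single-precision real), here is a minimal sketch of what a value-only 3D cubic B-spline kernel evaluating many orbitals at one point might look like. Everything in it is an assumption for illustration, not miniqmc's API: the struct and function names, the layout of (n+3) coefficients per dimension with the orbital index fastest, and the absence of periodic wrapping. It is templated on the scalar type so the other precisions could be re-added later, but only `float` is meant to be used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: nsplines orbitals sharing one uniform grid.
// Assumed layout: (nx+3) x (ny+3) x (nz+3) coefficients per orbital,
// with the orbital index fastest so the innermost loop is contiguous.
template <typename T>
struct UniformSpline3D {
  int nx, ny, nz;        // grid intervals per dimension
  int nsplines;          // orbitals evaluated together
  T dx, dy, dz;          // grid spacing
  std::vector<T> coefs;  // size (nx+3)*(ny+3)*(nz+3)*nsplines

  UniformSpline3D(int nx_, int ny_, int nz_, int ns, T lx, T ly, T lz)
      : nx(nx_), ny(ny_), nz(nz_), nsplines(ns),
        dx(lx / nx_), dy(ly / ny_), dz(lz / nz_),
        coefs(static_cast<std::size_t>(nx_ + 3) * (ny_ + 3) * (nz_ + 3) * ns, T(0)) {}

  // Offset of the first orbital's coefficient at grid point (ix, iy, iz).
  std::size_t index(int ix, int iy, int iz) const {
    return ((static_cast<std::size_t>(ix) * (ny + 3) + iy) * (nz + 3) + iz) * nsplines;
  }
};

// Uniform cubic B-spline weights for fractional offset t in [0, 1).
template <typename T>
void bspline_weights(T t, T w[4]) {
  const T t2 = t * t, t3 = t2 * t, s = T(1) - t;
  w[0] = s * s * s / T(6);
  w[1] = (T(3) * t3 - T(6) * t2 + T(4)) / T(6);
  w[2] = (T(-3) * t3 + T(3) * t2 + T(3) * t + T(1)) / T(6);
  w[3] = t3 / T(6);
}

// Split coordinate x into cell index i and fractional offset t.
// Assumes 0 <= x < n * delta (this sketch does no periodic wrapping).
template <typename T>
void locate(T x, T delta, int n, int& i, T& t) {
  const T u = x / delta;
  i = static_cast<int>(std::floor(u));
  if (i < 0) i = 0;
  if (i > n - 1) i = n - 1;
  t = u - static_cast<T>(i);
}

// Evaluate all orbital values at (x, y, z) into vals[0..nsplines).
template <typename T>
void evaluate_v(const UniformSpline3D<T>& s, T x, T y, T z, T* vals) {
  assert(vals != nullptr);
  int ix, iy, iz;
  T tx, ty, tz, a[4], b[4], c[4];
  locate(x, s.dx, s.nx, ix, tx);
  locate(y, s.dy, s.ny, iy, ty);
  locate(z, s.dz, s.nz, iz, tz);
  bspline_weights(tx, a);
  bspline_weights(ty, b);
  bspline_weights(tz, c);
  for (int n = 0; n < s.nsplines; ++n) vals[n] = T(0);
  // 4x4x4 stencil; each stencil point updates every orbital.
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j)
      for (int k = 0; k < 4; ++k) {
        const T w = a[i] * b[j] * c[k];
        const T* p = &s.coefs[s.index(ix + i, iy + j, iz + k)];
        for (int n = 0; n < s.nsplines; ++n) vals[n] += w * p[n];
      }
}
```

Putting the orbital index innermost means each of the 64 stencil points is a contiguous, easily vectorized multiply-add over all orbitals, which is the main reason to evaluate many splines that share one grid together.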
@markdewing does your implementation differ from the miniqmc one? I would prefer to start from yours, as it looks simpler. But if they are different, I can trim down the miniqmc one too. In any case, I will use miniqmc to generate realistic problem sizes.
I started from the miniqmc version. For correctness checking, the driver prints a couple of values from the reference implementation and a couple of values from the non-reference version, and the user has to compare them manually. This needs to be done better. The nx, ny, nz, and nspline parameters for a few NiO problem sizes are: a32-e384 is 112x66x66 with 144 splines
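One way to replace the manual comparison, sketched here as an assumption rather than anything in the existing driver: have the driver store the full reference output and the optimized output in two buffers and compare every element with a mixed absolute/relative tolerance. The function name and the default tolerances are hypothetical and would need tuning for single precision on real NiO inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical check: count elements where the optimized output differs from
// the reference by more than atol + rtol * |ref|, printing the first few.
std::size_t count_mismatches(const std::vector<float>& ref,
                             const std::vector<float>& test,
                             float rtol = 1e-4f, float atol = 1e-6f,
                             std::size_t max_report = 5) {
  assert(ref.size() == test.size());
  std::size_t bad = 0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const float diff = std::fabs(ref[i] - test[i]);
    const float tol = atol + rtol * std::fabs(ref[i]);
    if (!(diff <= tol)) {  // negated form also flags NaNs in either buffer
      if (bad < max_report)
        std::printf("mismatch at %zu: ref=%g test=%g\n", i, ref[i], test[i]);
      ++bad;
    }
  }
  return bad;
}
```

A driver could then return a nonzero exit code when `count_mismatches` is nonzero, which makes the check usable from CI or a test runner instead of by eye.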
It would be quite easy to take these
It looks like Peter's code has CPU, CUDA, and Kokkos already. Peter, are/were these all working? It might well be better for Thomas to start with these, since they look like a more comprehensive starting point.
@markdewing Those spline counts look very strange to me, but perhaps I misunderstand? a32-e384 = 32 atoms and 384 electrons, so 192 electrons per spin = 192 splines. The others should be multiples of this number. Thomas: the grid size corresponds to the primitive cell, i.e., we assume we are doing tiling for the larger cells and running bulk systems, as we do for the ECP and CORAL benchmarks.
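Because the grid is the primitive-cell grid, the size of the spline coefficient table is set by nx, ny, nz and the number of splines, which makes it easy to estimate before running. The sketch below assumes (n+3) coefficients per dimension (as for periodic cubic B-splines) and no padding of the spline count for alignment; real implementations may pad, so treat the result as a lower bound for the kernel's own table, not as QMCPACK's production memory use. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rough byte count of a multi-orbital 3D B-spline coefficient table.
// Assumes (n+3) points per dimension and an unpadded spline count.
std::size_t spline_table_bytes(std::size_t nx, std::size_t ny, std::size_t nz,
                               std::size_t nsplines, std::size_t bytes_per_value) {
  assert(bytes_per_value == 4 || bytes_per_value == 8);  // float or double
  return (nx + 3) * (ny + 3) * (nz + 3) * nsplines * bytes_per_value;
}
```

With the a32-e384 parameters quoted earlier in the thread (112x66x66, 144 splines), this gives 115 * 69 * 69 * 144 * 4 = 315,368,640 bytes (about 315 MB) in single precision, and twice that in double precision.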
Yes, but I should probably merge to the main repo again. The onecode in my current branch is the current state. Kokkos had been dropped at that point, so I don’t believe it works anymore. I started looking at extracting just the batched/blocked spline eval yesterday; I think it could be made fairly compact, especially if some of the variants are deleted/templated.
@prckent I took the numbers from QMCPACK. My understanding is that the splines are complex, and depending on the k-point, some of the values are converted to two orbitals, and some are not (in
Yes, that explains the difference. E.g., for the a32-e384 performance test we can see this on the line "NumDistinctOrbitals 144 numOrbs = 192".
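The two numbers can be reconciled with a little arithmetic, assuming (as described in the thread) that each distinct complex spline yields either one orbital or two. Let $N_1$ be the number yielding one orbital and $N_2$ the number yielding two; the 192 orbitals per spin come from the 384 electrons of a32-e384. Which k-points fall in each group is not stated here.

```latex
N_1 + N_2 = 144 \quad \text{(NumDistinctOrbitals)}, \qquad
N_1 + 2N_2 = 192 \quad \text{(numOrbs} = 384/2\text{)}
\;\Longrightarrow\; N_2 = 192 - 144 = 48, \quad N_1 = 96 .
```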
It took longer than expected[*], but with Kevin, we made some progress on extracting the inner
May I ask people in this thread for a review? I'm not sure if we initialize the input correctly. Do you know of a sanity check I can run on the output to verify we aren't doing anything stupid (the norm should be 1, or something like that)? Right now, the Hessian and gradient values look suspiciously large...
The next step is to create more robust testing, then put the
[*] I would like to be able to say that it is because I work from home and have to take care of my young child. But as far as I know, I don't have a toddler...
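One sanity check that does not require knowing the correct orbitals: the gradient returned by the kernel should match central finite differences of its own values, and the Hessian should match finite differences of its own gradient. This can catch inconsistent derivative code, for example a missing grid-spacing factor, though it cannot catch wrong input coefficients. The sketch below is generic; wrapping one orbital of the spline kernel as the `value`/`gradient`/`hessian` callables is left as an assumption. If the kernel under test is single precision, a considerably larger step than the double-precision default used here will likely be needed.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Largest |analytic - finite difference| over the gradient components,
// using central differences of the value with step h.
double max_gradient_error(const std::function<double(const Vec3&)>& value,
                          const std::function<Vec3(const Vec3&)>& gradient,
                          const Vec3& r, double h = 1e-5) {
  assert(h > 0.0);
  const Vec3 g = gradient(r);
  double worst = 0.0;
  for (int d = 0; d < 3; ++d) {
    Vec3 rp = r, rm = r;
    rp[d] += h;
    rm[d] -= h;
    const double fd = (value(rp) - value(rm)) / (2.0 * h);
    worst = std::fmax(worst, std::fabs(fd - g[d]));
  }
  return worst;
}

// One derivative higher: central differences of the gradient along
// direction d approximate row d of the (symmetric) Hessian.
double max_hessian_error(const std::function<Vec3(const Vec3&)>& gradient,
                         const std::function<Mat3(const Vec3&)>& hessian,
                         const Vec3& r, double h = 1e-5) {
  assert(h > 0.0);
  const Mat3 H = hessian(r);
  double worst = 0.0;
  for (int d = 0; d < 3; ++d) {
    Vec3 rp = r, rm = r;
    rp[d] += h;
    rm[d] -= h;
    const Vec3 gp = gradient(rp), gm = gradient(rm);
    for (int e = 0; e < 3; ++e) {
      const double fd = (gp[e] - gm[e]) / (2.0 * h);
      worst = std::fmax(worst, std::fabs(fd - H[d][e]));
    }
  }
  return worst;
}
```

Errors of order the finite-difference truncation error are expected; an error comparable to the derivative itself (e.g. off by a constant factor) points at the derivative code rather than the input.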
Hi,
I am opening this issue to discuss the possibility of extracting key miniQMC kernels into standalone files.
Having standalone kernels will help the collaboration between QMCPACK and other ECP projects/vendors.
Those kernels will be easy to install, benchmark, and port to different programming models. This will greatly facilitate the early exploration and validation of new hardware, software, and programming models.
Regards,
Thomas