Feature/nh2 #107
base: feature/ramses-optimisations
Conversation
… OMP settings because the number of final galaxies is different for different runs
Requires some changes to the source branch. Some changes alter the default behaviour of VR or the default compilation.
@@ -38,7 +38,7 @@ macro(vr_option optname optdesc status)
 endmacro()

 # Input formats
-vr_option(HDF5 "Attempt to include HDF5 support in VELOCIraptor" ON)
+vr_option(HDF5 "Attempt to include HDF5 support in VELOCIraptor" OFF)
Default compilation should not be changed.
@JinsuRhee, please update the source branch of this pull request.
 impiusemesh = false;
 impiusetree = true;
 #else
-impiusemesh = true;
+impiusemesh = false;
 impiusetree = true;
Default configuration should not be changed. Also, SWIFT will use an MPI mesh decomposition, so please revert.
@JinsuRhee, can you update the user interface so that this option can be set at runtime or when a RAMSES input is loaded?
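A minimal sketch of what such a runtime switch could look like. The struct and function names (`DecompOptions`, `parse_decomposition`) are hypothetical stand-ins, not VR's actual API; only the field names `impiusemesh`/`impiusetree` and the config key `MPI_use_tree_decomposition` come from this diff.

```cpp
// Sketch: select the MPI decomposition strategy at runtime from a config entry,
// keeping the mesh decomposition as the compile-time default.
#include <cassert>
#include <string>

struct DecompOptions {
    bool impiusemesh = true;   // mesh decomposition remains the default
    bool impiusetree = false;
};

// Parse a config entry such as "MPI_use_tree_decomposition=1".
inline void parse_decomposition(DecompOptions &opt,
                                const std::string &key, const std::string &val) {
    if (key == "MPI_use_tree_decomposition") {
        opt.impiusetree = (std::stoi(val) > 0);
        // treat the two modes as mutually exclusive in this sketch
        opt.impiusemesh = !opt.impiusetree;
    }
}
```

A RAMSES reader could then flip the option when its input is loaded, without touching the compiled-in default.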
@@ -0,0 +1 @@
+//#define JS_ADT_ON
Remove this file since it is not needed.
@JinsuRhee, please be careful not to add unneeded files in pull requests.
 impiusemesh = false;
 impiusetree = true;
 #else
-impiusemesh = true;
+impiusemesh = false;
 impiusetree = true;
@JinsuRhee, can you update the user interface so that this option can be set at runtime or when a RAMSES input is loaded?
-#define ompsplitsubsearchnum 10000000
+#define ompsplitsubsearchnum 100000000
@JinsuRhee, why the large increase in ompsplitsubsearchnum?
-#define RAMSESIDTYPE unsigned int
+#define RAMSESIDTYPE int
@JinsuRhee, is it possible that this type could differ between different runs of RAMSES? Does it warrant wrapping this in ifdefs and adding some compilation options?
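A sketch of the suggested ifdef guard. The macro name `RAMSESLONGIDS` is hypothetical (a matching CMake option would also be needed); the unsigned-int default mirrors the line this diff replaces.

```cpp
// Sketch: choose the RAMSES particle id type at compile time instead of
// editing the #define. RAMSESLONGIDS is an assumed, hypothetical option name.
#include <cassert>
#include <cstdint>

#ifdef RAMSESLONGIDS
typedef std::int64_t RAMSESIDTYPE_T;   // wide ids for very large runs
#else
typedef std::uint32_t RAMSESIDTYPE_T;  // matches the previous default, unsigned int
#endif
```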
cout<<"%123123-----"<<endl;
cout<<"%123123 MPIDomainDecompositionWithTree: Check the size of Part_mpi to include the number of baryons"<<endl;
cout<<"%123123-----"<<endl;
exit(9);
@JinsuRhee, is there a reason for having %123123?
// byteoffset+=RAMSES_fortran_read(Famr[i],header[i].npart[RAMSESGASTYPE]);
//
// //then skip the rest
// for (j=0;j<14;j++) RAMSES_fortran_skip(Famr[i]);
// if (lmin>header[i].nlevelmax) lmin=header[i].nlevelmax;
// if (lmax<header[i].nlevelmax) lmax=header[i].nlevelmax;
// //@}
// //read header info from hydro files
// //@{
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_read(Fhydro[i],header[i].nvarh);
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_read(Fhydro[i],header[i].gamma_index);
// //@}
//
// //then apparently read ngridlevels, which appears to be an array storing the number of grids at a given level
// ngridlevel=new int[header[i].nlevelmax];
// ngridfile=new int[(1+header[i].nboundary)*header[i].nlevelmax];
// RAMSES_fortran_read(Famr[i],ngridlevel);
// for (j=0;j<header[i].nlevelmax;j++) ngridfile[j]=ngridlevel[j];
// //skip some more
// RAMSES_fortran_skip(Famr[i]);
// //if nboundary>0 then need two skip twice then read ngridbound
// if(header[i].nboundary>0) {
// ngridbound=new int[header[i].nboundary*header[i].nlevelmax];
// RAMSES_fortran_skip(Famr[i]);
// RAMSES_fortran_skip(Famr[i]);
// //ngridbound is an array of some sort but I don't see what it is used for
// RAMSES_fortran_read(Famr[i],ngridbound);
// for (j=0;j<header[i].nlevelmax;j++) ngridfile[header[i].nlevelmax+j]=ngridbound[j];
// }
// //skip some more
// RAMSES_fortran_skip(Famr[i],2);
// //if odering list in info is bisection need to skip more
// if (orderingstring==string("bisection")) RAMSES_fortran_skip(Famr[i],5);
// else RAMSES_fortran_skip(Famr[i],4);
//
// for (k=0;k<header[i].nboundary+1;k++) {
// for (j=0;j<header[i].nlevelmax;j++) {
// //first read amr for positions
// chunksize=nchunk=ngridfile[k*header[i].nlevelmax+j];
// if (chunksize>0) {
// xtempchunk=new RAMSESFLOAT[3*chunksize];
// //store son value in icell
// icellchunk=new int[header[i].twotondim*chunksize];
// //skip grid index, next index and prev index.
// RAMSES_fortran_skip(Famr[i],3);
// //now read grid centre
// for (idim=0;idim<header[i].ndim;idim++) {
// RAMSES_fortran_read(Famr[i],&xtempchunk[idim*chunksize]);
// }
// //skip father index, then neighbours index
// RAMSES_fortran_skip(Famr[i],1+2*header[i].ndim);
// //read son index to determine if a cell in a specific grid is at the highest resolution and needs to be represented by a particle
// for (idim=0;idim<header[i].twotondim;idim++) {
// RAMSES_fortran_read(Famr[i],&icellchunk[idim*chunksize]);
// }
// //skip cpu map and refinement map (2^ndim*2)
// RAMSES_fortran_skip(Famr[i],2*header[i].twotondim);
// }
// RAMSES_fortran_skip(Fhydro[i]);
// //then read hydro for other variables (first is density, then velocity, then pressure, then metallicity )
// if (chunksize>0) {
// //first read velocities (for 2 cells per number of dimensions (ie: cell corners?))
// for (idim=0;idim<header[i].twotondim;idim++) {
// for (ivar=0;ivar<header[i].nvarh;ivar++) {
// for (igrid=0;igrid<chunksize;igrid++) {
// //once we have looped over all the hydro data then can start actually storing it into the particle structures
// if (ivar==header[i].nvarh-1) {
// //if cell has no internal cells or at maximum level produce a particle
// if (icellchunk[idim*chunksize+igrid]==0 || j==header[i].nlevelmax-1) {
// //first suggestion is to add some jitter to the particle positions
// double dx = pow(0.5, j);
// int ix, iy, iz;
// //below assumes three dimensions with 8 corners (? maybe cells) per grid
// iz = idim/4;
// iy = (idim - (4*iz))/2;
// ix = idim - (2*iy) - (4*iz);
// // Calculate absolute coordinates + jitter, and generate particle
// xtemp[0] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid] + (double(ix)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
// xtemp[1] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid+1*chunksize] + (double(iy)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
// xtemp[2] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid+2*chunksize] + (double(iz)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
// //determine processor this particle belongs on based on its spatial position
// ibuf=MPIGetParticlesProcessor(opt, xtemp[0],xtemp[1],xtemp[2]);
// Nbuf[ibuf]++;
// }
// }
// }
// }
// }
// delete[] xtempchunk;
// }
// }
// }
// Famr[i].close();
// }
// }
// }
@JinsuRhee, is there a reason for commenting out all of this code?
@@ -7,6 +7,7 @@

 #include "allvars.h"
 #include "proto.h"
+#include "js_perform.h"
@JinsuRhee, remove this include.
xtempall = new RAMSESFLOAT[nbodies*3];
famtempall = new int[nbodies];
@JinsuRhee, can you explain how xtempall is a viable solution? It is quite possible that there are too many particles to fit in memory, hence the previous approach of loading the data in chunks. Does it not make sense to enable this type of load only if one can construct a local xtempall allocation, and then build a tree to do the MPI decomposition?
void MPIInitialDomainDecompositionWithTree(Options &opt){
    if (ThisTask==0) {
    }
}
@JinsuRhee, why is there an empty function?
-tree3dfofomp[i] = new KDTree(&Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
+//tree3dfofomp[i] = new KDTree(&Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
+tree3dfofomp[i] = new KDTree(js_bsize, &Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
@JinsuRhee, why the KDTree API change? The API should remain the same. Please verify against NBodylib.
@@ -44,14 +44,15 @@ OMP_Domain *OpenMPBuildDomains(Options &opt, const Int_t numompregions, KDTree *
 KDTree **OpenMPBuildLocalTrees(Options &opt, const Int_t numompregions, vector<Particle> &Part, OMP_Domain *ompdomain, Double_t *period)
 {
 KDTree **tree3dfofomp = new KDTree*[numompregions];
-Int_t i;
+Int_t i, js_bsize;
@JinsuRhee, update all variables that use the js_ prefix to not use this prefix.
js_count++;
cout<<" 3DFOF Log - "<<i<<" th / "<<js_count<<" of "<<numompregions<<" / # ptcls : "<<ompdomain[i].ncount<<" / # groups : "<<ng<<" / Time [s] : "<<MyGetTime() - js_time<<endl;
@JinsuRhee, please remove this logging and also the js_time variable.
getline(Finfo,stringbuf);//Mysterious mis-reading one line
@JinsuRhee, is this true for all RAMSES formats? Might it be necessary to wrap this in ifdefs and add some compilation options to the RAMSES format? This might also apply to the additions you have added below.
-bool runompfof = (numompregions>=2 && nthreads > 1 && opt.iopenmpfof == 1);
+//bool runompfof = (numompregions>=2 && nthreads > 1 && opt.iopenmpfof == 1);
+bool runompfof = (numompregions>=1 && nthreads > 1 && opt.iopenmpfof == 1);
This does not make sense. @JinsuRhee, you can only run searches in parallel if there are at least two regions. Please fix.
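The corrected predicate can be sketched as a small helper so the requirement is testable in isolation. The function name is hypothetical; the variable names follow the diff above.

```cpp
// Sketch: an OpenMP FOF pass only makes sense with at least two regions,
// more than one thread, and the feature explicitly enabled.
#include <cassert>

inline bool run_omp_fof(int numompregions, int nthreads, int iopenmpfof) {
    return numompregions >= 2 && nthreads > 1 && iopenmpfof == 1;
}
```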
//tree = new KDTree(Part.data(),nbodies,opt.openmpfofsize,tree->TPHYS,tree->KEPAN,100);
tree = new KDTree(rdist, Part.data(),nbodies,opt.openmpfofsize,tree->TPHYS,tree->KEPAN,100);
if(opt.iverbose) cout<<ThisTask<<": Building Root Tree Done"<<endl;
if(opt.iverbose) cout<<" # of OMP Domains -> "<<tree->GetNumLeafNodes()<<endl;
@JinsuRhee, these API changes to the KDTree are not acceptable. Please update the library to ensure that it remains backward compatible.
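One common way to keep the library backward compatible is to add an overload (or a defaulted trailing parameter) rather than prepending a new first argument. The sketch below uses a stand-in class, `KDTreeSketch`, not the actual NBodylib `KDTree`; the bucket-size default of 16 is an assumed illustration.

```cpp
// Sketch: keep the old constructor signature so existing call sites compile
// unchanged, while adding an extended constructor for the new parameter.
#include <cassert>

class KDTreeSketch {
public:
    // original-style constructor: delegates with a default bucket size
    KDTreeSketch(double *data, long n) : KDTreeSketch(data, n, /*bsize=*/16) {}
    // extended constructor carrying the new bucket-size parameter
    KDTreeSketch(double *data, long n, long bsize)
        : fData(data), fN(n), fBsize(bsize) {}
    long BucketSize() const { return fBsize; }
private:
    double *fData;
    long fN;
    long fBsize;
};
```

With this pattern the new parameter can also be placed last with a default value on a single constructor, which avoids the overload entirely.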
else if (strcmp(tbuff, "MPI_use_tree_decomposition")==0)
    opt.impiusetree = (atoi(vbuff)>0);
I had a comment earlier about adding this config option. It also requires an update to allvars.cxx to ensure the metadata is written for the config option.
-#pragma omp parallel default(shared) \
+#pragma omp parallel default(shared) num_threads(1) \
There is no point in running a parallel region with num_threads(1). Please remove it.
#pragma omp parallel for \
default(shared) private(v2,Ti,mass) schedule(dynamic) \
reduction(+:totT,Efrac,nunbound) reduction(max:maxE) num_threads(nthreads)
//#pragma omp parallel for \
//default(shared) private(v2,Ti,mass) schedule(dynamic) \
//reduction(+:totT,Efrac,nunbound) reduction(max:maxE) num_threads(nthreads)
Why is this commented out?
This request includes the following...