CICE6 max_blocks set on the fly #145

Closed · minghangli-uni opened this issue Apr 21, 2024 · 13 comments · Fixed by #149
Labels: cice6 (Related to CICE6) · enhancement (New feature or request) · in progress

@minghangli-uni (Contributor)

The current max_blocks does not match numBlocksPerProc, the number of blocks assigned to each processor in the computation. Hence, errors such as ERROR: num blocks exceed max or max_blocks too small may arise. CICE computes the blocks per processor as:

nblocks_x   = (nx_global-1)/block_size_x + 1   ! total blocks in x (ceiling division)
nblocks_y   = (ny_global-1)/block_size_y + 1   ! total blocks in y (ceiling division)
numBlocksXPerProc = (nblocks_x-1)/nprocsX + 1  ! blocks per processor in x
numBlocksYPerProc = (nblocks_y-1)/nprocsY + 1  ! blocks per processor in y
numBlocksPerProc  = numBlocksXPerProc * numBlocksYPerProc
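
For concreteness, here is a minimal Python sketch of the same calculation, using // to mimic Fortran integer division; the grid, block and processor values below are illustrative (they match the gx3 test case that appears later in this thread):

# Sketch of the blocks-per-processor calculation above; // mimics
# Fortran integer division. All input values are illustrative.
nx_global, ny_global = 100, 116
block_size_x, block_size_y = 5, 10
nprocs_x, nprocs_y = 8, 2

nblocks_x = (nx_global - 1) // block_size_x + 1      # ceil(100/5)  = 20
nblocks_y = (ny_global - 1) // block_size_y + 1      # ceil(116/10) = 12
numBlocksXPerProc = (nblocks_x - 1) // nprocs_x + 1  # ceil(20/8)   = 3
numBlocksYPerProc = (nblocks_y - 1) // nprocs_y + 1  # ceil(12/2)   = 6
numBlocksPerProc = numBlocksXPerProc * numBlocksYPerProc
print(numBlocksPerProc)  # 18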

A straightforward solution (there may be better ways) to address this issue is to update max_blocks in cicecore/cicedyn/infrastructure/ice_domain.F90 so that it matches the correct number of blocks per processor.

proc_decomposition is used to find an optimal 2D processor decomposition for each processor_shape (e.g., slenderX1, slenderX2, square-ice and square-pop), so we don't need to worry about changes to the number of processors in the x and y directions across different selections of processor_shape. This modification also does not affect the selection of distribution_type (e.g., cartesian, roundrobin, sectrobin, sectcart, etc.).

! mli: import proc_decomposition
   use ice_distribution, only: distrb, proc_decomposition
! mli: processor counts from the optimal 2d decomposition
    integer (int_kind) :: &
       nprocs_x_mli, nprocs_y_mli ! number of processors in x and y
                                  ! from the optimal 2d decomposition
...
! mli: compute nprocs_x_mli and nprocs_y_mli
   call proc_decomposition(nprocs, nprocs_x_mli, nprocs_y_mli)

   if (my_task == master_task) then
     if (max_blocks < 1) then
!       max_blocks=( ((nx_global-1)/block_size_x + 1) *         &
!                    ((ny_global-1)/block_size_y + 1) - 1) / nprocs + 1
       max_blocks=((nx_global-1)/block_size_x/nprocs_x_mli+1) * &
                  ((ny_global-1)/block_size_y/nprocs_y_mli+1)
...
minghangli-uni added the bug (Something isn't working) label on Apr 21, 2024
@anton-seaice (Contributor)

Thanks for looking into this! Have you tested this?

We might be able to deprecate max_blocks? Although I guess it's nice having to set max_blocks so users confirm the value is reasonable.

I suggest making a branch in https://github.com/access-nri/cice6 and we can test this :)

anton-seaice added the enhancement (New feature or request) and cice6 (Related to CICE6) labels and removed the bug (Something isn't working) label on Apr 21, 2024
@minghangli-uni (Contributor, Author)

> Thanks for looking into this! Have you tested this?

Yes, I've tested it with slenderX2 + cartesian and with square-ice + roundrobin, and encountered no errors. But I did just find a warning message, WARNING: ice no. blocks too large: decrease max to, when using square-ice and roundrobin.

> We might be able to deprecate max_blocks? Although I guess it's nice having to set max_blocks so users confirm the value is reasonable.

I suggest that once we've fixed all errors and warnings, we revisit the decision on whether to deprecate it or not.

> I suggest making a branch in https://github.com/access-nri/cice6 and we can test this :)

Do you mean this repo? https://github.com/ACCESS-NRI/CICE

@anton-seaice (Contributor)

Yeah, that's the repo. Interesting that you still get the warning after making the change. I wonder if that is because some blocks are masked.

@anton-seaice (Contributor)

Hi @minghangli-uni

I had a look at ESCOMP/CICE@32f5d69

Should we just make nprocsX and nprocsY public ints in ice_grid? Then we could remove the calls to proc_decomposition in the create_distrb functions.

@minghangli-uni (Contributor, Author)

I don't think so, because these two values (nprocsX and nprocsY) are determined at runtime.

@anton-seaice (Contributor)

Hi Minghang

I ran some CICE standalone tests in https://github.com/ESCOMP/CICE/tree/1084d0ed78c2f3b19eb94d8f745f93325570f4f2 (the decomp_suite), and most of the tests passed, except for those using the rake distribution. I guess maybe the rake distribution tries to have more blocks than typical around the equator?

Results are in /g/data/tm70/as2285/CICE/testsuite.decomp_mxblck0/gadi_intel_decomp_gx3_4x2x25x29x5.decomp_mxblck0

Happy to get you set up to run these tests yourself, or you can fix it and then I will run them again.

Example failure:

Domain Information

  Horizontal domain: nx =    100
                     ny =    116
  No. of categories: nc =      5
  No. of ice layers: ni =      7
  No. of snow layers:ns =      1
  Processors:  total    =     16
  Processor shape       = slenderX2
  Distribution type     = rake
  Distribution weight   = latitude
  Distribution wght file= unknown
  ew_boundary_type      = cyclic
  ns_boundary_type      = open
  maskhalo_dyn          =      F
  maskhalo_remap        =      F
  maskhalo_bound        =      F
  add_mpi_barriers      =      F
  debug_blocks          =      F
  block_size_x,_y       =      5    10
  max_blocks            =     15
  Number of ghost cells =      1

 (ice_read_global) read_global           11           1  -1.36148077740934     
   1.56905100613449        916.469935499633     
 (ice_read_global) read_global           12           1  0.000000000000000E+000
   25.0000000000000        168117.000000000     
 ice_domain work_unit, max_work_unit =          399          10
 ice_domain nocn =            0        3981      261821
 ice_domain work_per_block =            0          11        1011
(proc_decomposition)  Processors (X x Y) =    8 x    2
  
  
 (abort_ice)ABORTED: 
 (abort_ice) error = 
 (create_distrb_cart)ERROR: max_blocks too small (need at least 18)

@anton-seaice (Contributor)

Also, hammering out all the corner cases in this might not be possible. There might not be appetite to merge it upstream because of this; I am not sure. We could just include the calculation in om3-utils\payu\config-checks to get max_blocks correct and set it in ice_in?

@minghangli-uni (Contributor, Author)

Hi @anton-seaice, I don't have permission to access /g/data/tm70/as2285/CICE/testsuite.decomp_mxblck0/gadi_intel_decomp_gx3_4x2x25x29x5.decomp_mxblck0

Can you set max_blocks=-1 and Distribution weight = block and try again? I manually calculated max_blocks with the parameters provided, and it came out to be 18.

nx_global = 100
ny_global = 116
block_size_x = 5
block_size_y = 10
nprocs = 16
nprocs_x = 8
nprocs_y = 2
numBlocksXPerProc = int((nx_global - 1) / block_size_x / nprocs_x) + 1
numBlocksYPerProc = int((ny_global - 1) / block_size_y / nprocs_y) + 1
numBlocksPerProc = numBlocksXPerProc * numBlocksYPerProc
print(numBlocksPerProc)  # 18

@anton-seaice (Contributor)

anton-seaice commented Apr 24, 2024

Hi @minghangli-uni

Because we are using integers, every operation will be truncated (not rounded) to an integer.

NVM: my maths is wrong. i.e.

ny_global-1 = 99
99/block_size_y = 9
9/nprocs_y = 4
4+1 = 5

where in floating point that would equal 5.95. I don't know if it makes more sense to calculate this as a real and convert to an integer at the end, or do something else (presumably add some more +1/-1).
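
A quick sanity check (not from the thread): chained Fortran-style integer division of the form (n-1)/b/p + 1 is mathematically identical to the nested ceiling ceil(ceil(n/b)/p), which is presumably why the truncation concern above turned out to be unfounded. A small Python verification:

from math import ceil

# Verify that (n-1)//b//p + 1 == ceil(ceil(n/b)/p): chained integer
# (floor) division already behaves like nested ceiling division, so no
# separate real-valued calculation should be needed.
for n in range(1, 500):
    for b in (5, 10, 25):
        for p in (2, 4, 8):
            chained = (n - 1) // b // p + 1
            nested = ceil(ceil(n / b) / p)
            assert chained == nested, (n, b, p)
print("identity holds")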

@minghangli-uni (Contributor, Author)

> Also, hammering out all the corner cases in this might not be possible. There might not be appetite to merge it upstream because of this; I am not sure. We could just include the calculation in om3-utils\payu\config-checks to get max_blocks correct and set it in ice_in?

Hi @anton-seaice
The method used to calculate max_blocks doesn't align with the one applied in the code, which seems like a bug in the source code, albeit a minor one. But I agree we can incorporate the correct calculation into config-checks, as sketched below. I'm willing to help and give it a try.
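
A minimal sketch of what such a config-check could look like; the function name and interface are hypothetical (not part of om3-utils or payu), and only the formula comes from the ice_domain.F90 patch above:

# Hypothetical config-check helper: the name and interface are
# illustrative; only the formula matches the proposed patch.
def expected_max_blocks(nx_global, ny_global,
                        block_size_x, block_size_y,
                        nprocs_x, nprocs_y):
    """max_blocks implied by the grid, block sizes and 2d decomposition."""
    blocks_x_per_proc = (nx_global - 1) // block_size_x // nprocs_x + 1
    blocks_y_per_proc = (ny_global - 1) // block_size_y // nprocs_y + 1
    return blocks_x_per_proc * blocks_y_per_proc

# Example with the gx3 test case from this thread:
assert expected_max_blocks(100, 116, 5, 10, 8, 2) == 18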

@anton-seaice (Contributor)

Sorry, the max_blocks = 15 was me not setting up the tests correctly.

I still get max_blocks too small with max_blocks = 18:

Domain Information

  Horizontal domain: nx =    100
                     ny =    116
  No. of categories: nc =      5
  No. of ice layers: ni =      7
  No. of snow layers:ns =      1
  Processors:  total    =     16
  Processor shape       = slenderX2
  Distribution type     = rake
  Distribution weight   = blockall
  Distribution wght file= unknown
  ew_boundary_type      = cyclic
  ns_boundary_type      = open
  maskhalo_dyn          =      F
  maskhalo_remap        =      F
  maskhalo_bound        =      F
  add_mpi_barriers      =      F
  debug_blocks          =      F
  block_size_x,_y       =      5    10
  max_blocks            =     18
  Number of ghost cells =      1

 (ice_read_global) read_global           11           1  -1.36148077740934     
   1.56905100613449        916.469935499633     
 (ice_read_global) read_global           12           1  0.000000000000000E+000
   25.0000000000000        168117.000000000     
 ice_domain work_unit, max_work_unit =            6          10
 ice_domain nocn =            1          50        8024
 ice_domain work_per_block =            1          10        1711
(proc_decomposition)  Processors (X x Y) =    8 x    2
 (create_distrb_rake) rake in each direction
(proc_decomposition)  Processors (X x Y) =    8 x    2
  
 (abort_ice)ABORTED: 
 (abort_ice) error = (create_distrb_rake)ERROR: max_blocks too small

The new error is here:

https://github.com/CICE-Consortium/CICE/blob/29c7bcf839bc3ce48e4d6128d6f29ba73839222e/cicecore/shared/ice_distribution.F90#L1005

@anton-seaice (Contributor)

I had a look at how the rake distribution is made; it has to iterate around a loop to optimise the distribution, so I don't think we will be able to get max_blocks perfect for that scenario. What you have done here is a much better estimate than the old one, though, and should work for the Cartesian distributions.

The main downside is proc_decomposition gets called twice, which is messy.

I added auto detection of nprocs (ACCESS-NRI/CICE@5341473) to the branch.

If that looks ok, create a patch file and make a PR in this repo first. If the review looks ok we can make a PR to main CICE too.

@minghangli-uni (Contributor, Author)

> The main downside is proc_decomposition gets called twice, which is messy.

I think it is only called once for each run, so it won't affect overall performance?

> I added auto detection of nprocs (ACCESS-NRI/CICE@5341473) to the branch.

This looks good to me, but I suggest a minor modification to the auto detection of nprocs in this branch: ACCESS-NRI/CICE@fda7c92

> create a patch file and make a PR in this repo first.

I will do it soon.
