CICE6 max_blocks set on the fly #145

Closed · minghangli-uni opened this issue Apr 21, 2024 · 13 comments · Fixed by #149
Labels: cice6 (Related to CICE6) · enhancement (New feature or request) · in progress

@minghangli-uni (Contributor)

The current max_blocks does not match numBlocksPerProc, the number of blocks assigned to each processor in the computation. Hence, errors such as ERROR: num blocks exceed max or max_blocks too small may arise. CICE computes the blocks per processor as:

nblocks_x   = (nx_global-1)/block_size_x + 1   ! total blocks in x (ceiling division)
nblocks_y   = (ny_global-1)/block_size_y + 1   ! total blocks in y (ceiling division)
numBlocksXPerProc = (nblocks_x-1)/nprocsX + 1  ! blocks per processor in x
numBlocksYPerProc = (nblocks_y-1)/nprocsY + 1  ! blocks per processor in y
numBlocksPerProc  = numBlocksXPerProc * numBlocksYPerProc
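
For concreteness, here is a minimal Python sketch of the same calculation, using // to mimic Fortran integer division; the grid, block and processor values below are illustrative (they match the gx3 test case that appears later in this thread):

# Sketch of the blocks-per-processor calculation above; // mimics
# Fortran integer division. All input values are illustrative.
nx_global, ny_global = 100, 116
block_size_x, block_size_y = 5, 10
nprocs_x, nprocs_y = 8, 2

nblocks_x = (nx_global - 1) // block_size_x + 1      # ceil(100/5)  = 20
nblocks_y = (ny_global - 1) // block_size_y + 1      # ceil(116/10) = 12
numBlocksXPerProc = (nblocks_x - 1) // nprocs_x + 1  # ceil(20/8)   = 3
numBlocksYPerProc = (nblocks_y - 1) // nprocs_y + 1  # ceil(12/2)   = 6
numBlocksPerProc = numBlocksXPerProc * numBlocksYPerProc
print(numBlocksPerProc)  # 18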

A straightforward solution (there may be better ways) to address this issue is to update max_blocks in cicecore/cicedyn/infrastructure/ice_domain.F90 so that it matches the correct number of blocks per processor.

proc_decomposition is used to find an optimal 2D processor decomposition for each processor_shape (e.g., slenderX1, slenderX2, square-ice and square-pop), so we don't need to worry about changes to the number of processors in the x and y directions across different selections of processor_shape. This modification also does not affect the selection of distribution_type (e.g., cartesian, roundrobin, sectrobin, sectcart, etc.).

! mli: import proc_decomposition
   use ice_distribution, only: distrb, proc_decomposition
! mli: processor counts from the optimal 2d decomposition
    integer (int_kind) :: &
       nprocs_x_mli, nprocs_y_mli ! number of processors in x and y
                                  ! from the optimal 2d decomposition
...
! mli: compute nprocs_x_mli and nprocs_y_mli
   call proc_decomposition(nprocs, nprocs_x_mli, nprocs_y_mli)

   if (my_task == master_task) then
     if (max_blocks < 1) then
!       max_blocks=( ((nx_global-1)/block_size_x + 1) *         &
!                    ((ny_global-1)/block_size_y + 1) - 1) / nprocs + 1
       max_blocks=((nx_global-1)/block_size_x/nprocs_x_mli+1) * &
                  ((ny_global-1)/block_size_y/nprocs_y_mli+1)
...
minghangli-uni added the bug (Something isn't working) label on Apr 21, 2024
@anton-seaice (Contributor)

Thanks for looking into this! Have you tested this?

We might be able to deprecate max_blocks? Although I guess it's nice having to set max_blocks so users confirm the value is reasonable.

I suggest making a branch in https://github.com/access-nri/cice6 and we can test this :)

anton-seaice added the enhancement (New feature or request) and cice6 (Related to CICE6) labels and removed the bug (Something isn't working) label on Apr 21, 2024
@minghangli-uni (Contributor, Author)

> Thanks for looking into this! Have you tested this?

Yes, I've tested it with slenderX2 + cartesian and with square-ice + roundrobin, and encountered no errors. But I did just find a warning message, WARNING: ice no. blocks too large: decrease max to, when using square-ice and roundrobin.

> We might be able to deprecate max_blocks? Although I guess it's nice having to set max_blocks so users confirm the value is reasonable.

I suggest that once we've fixed all errors and warnings, we revisit the decision on whether to deprecate it or not.

> I suggest making a branch in https://github.com/access-nri/cice6 and we can test this :)

Do you mean this repo? https://github.com/ACCESS-NRI/CICE

@anton-seaice (Contributor)

Yeah, that's the repo. Interesting that you still get the warning after making the change. I wonder if that is because some blocks are masked.

@anton-seaice (Contributor)

Hi @minghangli-uni

I had a look at ESCOMP/CICE@32f5d69

Should we just make nprocsX and nprocsY public ints in ice_grid? Then we could remove the calls to proc_decomposition in the create_distrb functions.

@minghangli-uni (Contributor, Author)

I don't think so, because these two values (nprocsX and nprocsY) are determined at runtime.

@anton-seaice (Contributor)

Hi Minghang

I ran some CICE standalone tests in https://github.com/ESCOMP/CICE/tree/1084d0ed78c2f3b19eb94d8f745f93325570f4f2 (the decomp_suite), and most of the tests passed, except for those using the rake distribution. I guess maybe the rake distribution tries to have more blocks than typical around the equator?

Results are in /g/data/tm70/as2285/CICE/testsuite.decomp_mxblck0/gadi_intel_decomp_gx3_4x2x25x29x5.decomp_mxblck0

Happy to get you set up to run these tests yourself, or you can fix it and then I will run them again.

Example failure:

Domain Information

  Horizontal domain: nx =    100
                     ny =    116
  No. of categories: nc =      5
  No. of ice layers: ni =      7
  No. of snow layers:ns =      1
  Processors:  total    =     16
  Processor shape       = slenderX2
  Distribution type     = rake
  Distribution weight   = latitude
  Distribution wght file= unknown
  ew_boundary_type      = cyclic
  ns_boundary_type      = open
  maskhalo_dyn          =      F
  maskhalo_remap        =      F
  maskhalo_bound        =      F
  add_mpi_barriers      =      F
  debug_blocks          =      F
  block_size_x,_y       =      5    10
  max_blocks            =     15
  Number of ghost cells =      1

 (ice_read_global) read_global           11           1  -1.36148077740934     
   1.56905100613449        916.469935499633     
 (ice_read_global) read_global           12           1  0.000000000000000E+000
   25.0000000000000        168117.000000000     
 ice_domain work_unit, max_work_unit =          399          10
 ice_domain nocn =            0        3981      261821
 ice_domain work_per_block =            0          11        1011
(proc_decomposition)  Processors (X x Y) =    8 x    2
  
  
 (abort_ice)ABORTED: 
 (abort_ice) error = 
 (create_distrb_cart)ERROR: max_blocks too small (need at least 18)

@anton-seaice (Contributor)

Also, hammering out all the corner cases in this might not be possible. There might not be appetite to merge it upstream because of this; I am not sure. We could just include the calculation in om3-utils\payu\config-checks to get max_blocks correct and set it in ice_in?

@minghangli-uni (Contributor, Author)

Hi @anton-seaice, I don't have permission to access /g/data/tm70/as2285/CICE/testsuite.decomp_mxblck0/gadi_intel_decomp_gx3_4x2x25x29x5.decomp_mxblck0

Can you set max_blocks=-1 and Distribution weight = block and try again? I manually calculated max_blocks with the parameters provided, and it came out to be 18.

nx_global = 100
ny_global = 116
block_size_x = 5
block_size_y = 10
nprocs = 16
nprocs_x = 8
nprocs_y = 2
numBlocksXPerProc = int((nx_global - 1) / block_size_x / nprocs_x) + 1
numBlocksYPerProc = int((ny_global - 1) / block_size_y / nprocs_y) + 1
numBlocksPerProc = numBlocksXPerProc * numBlocksYPerProc
print(numBlocksPerProc)  # 18

@anton-seaice (Contributor)

anton-seaice commented Apr 24, 2024

Hi @minghangli-uni

Because we are using integers, every operation will be truncated (not rounded) to an integer.

NVM: my maths is wrong. i.e.

ny_global-1 = 99
99/block_size_y = 9
9/nprocs_y = 4
4+1 = 5

where in floating point that would equal 5.95. I don't know if it makes more sense to calculate this as a real and convert to an integer at the end, or do something else (presumably add some more +1/-1).
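
A quick sanity check (not from the thread): chained Fortran-style integer division of the form (n-1)/b/p + 1 is mathematically identical to the nested ceiling ceil(ceil(n/b)/p), which is presumably why the truncation concern above turned out to be unfounded. A small Python verification:

from math import ceil

# Verify that (n-1)//b//p + 1 == ceil(ceil(n/b)/p): chained integer
# (floor) division already behaves like nested ceiling division, so no
# separate real-valued calculation should be needed.
for n in range(1, 500):
    for b in (5, 10, 25):
        for p in (2, 4, 8):
            chained = (n - 1) // b // p + 1
            nested = ceil(ceil(n / b) / p)
            assert chained == nested, (n, b, p)
print("identity holds")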

@minghangli-uni (Contributor, Author)

> Also, hammering out all the corner cases in this might not be possible. There might not be appetite to merge it upstream because of this; I am not sure. We could just include the calculation in om3-utils\payu\config-checks to get max_blocks correct and set it in ice_in?

Hi @anton-seaice
The method used to calculate max_blocks doesn't align with the one applied in the code, which seems like a bug in the source code, albeit a minor one. But I agree we can incorporate the correct calculation into config-checks, as sketched below. I'm willing to help and give it a try.
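
A minimal sketch of what such a config-check could look like; the function name and interface are hypothetical (not part of om3-utils or payu), and only the formula comes from the ice_domain.F90 patch above:

# Hypothetical config-check helper: the name and interface are
# illustrative; only the formula matches the proposed patch.
def expected_max_blocks(nx_global, ny_global,
                        block_size_x, block_size_y,
                        nprocs_x, nprocs_y):
    """max_blocks implied by the grid, block sizes and 2d decomposition."""
    blocks_x_per_proc = (nx_global - 1) // block_size_x // nprocs_x + 1
    blocks_y_per_proc = (ny_global - 1) // block_size_y // nprocs_y + 1
    return blocks_x_per_proc * blocks_y_per_proc

# Example with the gx3 test case from this thread:
assert expected_max_blocks(100, 116, 5, 10, 8, 2) == 18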

@anton-seaice (Contributor)

Sorry, the max_blocks = 15 was me not setting up the tests correctly.

I still get max_blocks too small with max_blocks = 18:

Domain Information

  Horizontal domain: nx =    100
                     ny =    116
  No. of categories: nc =      5
  No. of ice layers: ni =      7
  No. of snow layers:ns =      1
  Processors:  total    =     16
  Processor shape       = slenderX2
  Distribution type     = rake
  Distribution weight   = blockall
  Distribution wght file= unknown
  ew_boundary_type      = cyclic
  ns_boundary_type      = open
  maskhalo_dyn          =      F
  maskhalo_remap        =      F
  maskhalo_bound        =      F
  add_mpi_barriers      =      F
  debug_blocks          =      F
  block_size_x,_y       =      5    10
  max_blocks            =     18
  Number of ghost cells =      1

 (ice_read_global) read_global           11           1  -1.36148077740934     
   1.56905100613449        916.469935499633     
 (ice_read_global) read_global           12           1  0.000000000000000E+000
   25.0000000000000        168117.000000000     
 ice_domain work_unit, max_work_unit =            6          10
 ice_domain nocn =            1          50        8024
 ice_domain work_per_block =            1          10        1711
(proc_decomposition)  Processors (X x Y) =    8 x    2
 (create_distrb_rake) rake in each direction
(proc_decomposition)  Processors (X x Y) =    8 x    2
  
 (abort_ice)ABORTED: 
 (abort_ice) error = (create_distrb_rake)ERROR: max_blocks too small

The new error is here:

https://github.com/CICE-Consortium/CICE/blob/29c7bcf839bc3ce48e4d6128d6f29ba73839222e/cicecore/shared/ice_distribution.F90#L1005

@anton-seaice (Contributor)

I had a look at how the rake distribution is made; it has to iterate around a loop to optimise the distribution, so I don't think we will be able to get max_blocks perfect for that scenario. What you have done here is a much better estimate than the old one, though, and should work for the Cartesian distributions.

The main downside is proc_decomposition gets called twice, which is messy.

I added auto detection of nprocs (ACCESS-NRI/CICE@5341473) to the branch.

If that looks ok, create a patch file and make a PR in this repo first. If the review looks ok we can make a PR to main CICE too.

@minghangli-uni (Contributor, Author)

> The main downside is proc_decomposition gets called twice, which is messy.

I think it is only called once for each run, so it won't affect overall performance?

> I added auto detection of nprocs (ACCESS-NRI/CICE@5341473) to the branch.

This looks good to me, but I suggest a minor modification to the auto detection of nprocs in this branch: ACCESS-NRI/CICE@fda7c92

> create a patch file and make a PR in this repo first.

I will do it soon.
