Tracker for Code Aster MSVC support #65

Open
5 of 10 tasks
Krande opened this issue Jun 1, 2024 · 3 comments
Labels
MSVC Code Aster question Further information is requested

Comments

Krande commented Jun 1, 2024

Comment:

Here is a "live" summary of the ongoing and remaining tasks for successfully compiling and distributing Code Aster for Windows with MSVC.

As of today, the share of sequential tests passing for MSVC Code Aster is:

  • (04.06.2024) 45% passing
  • (16.06.2024) 79% passing
  • (02.08.2024) 85% passing

The MSVC version currently uses the LLVM-based Intel Fortran compiler (IFX) to compile Code Aster, and LLVM Flang/IFX to compile the various dependencies.

Code Aster

  • Need to find an alternative to symlinking DLLs. Symlinking requires administrator privileges on Windows and will not work on conda-forge (cf).
    • I have asked Mathieu about it on the official Code Aster issue https://gitlab.com/codeaster/src/-/issues/1#note_1929414100
    • To me it seems we must either write a patch that rewrites the pybind11 module definitions into separate pyd files, or see whether static linking and making copies fixes things. I don't have much experience with static linking, so this might take me some time to figure out.
    • Alternatives to symlinking Code Aster pyd files #66
    • I solved this by replacing the symlinks with code that loads the DLL and returns the pybind11 module definition.
  • Ensure changes to source files are safe with respect to memory. Some tests are currently running into memory issues.
    • Solved by enabling OpenMP in MUMPS, using MKL64 (not MKL) as recommended by the MKL link advisor, and ensuring that the Intel compiler bin path is not added to PATH (it caused conflicts). The memory increase occurred because the integer representing the number of available OpenMP threads was huge (it should be only 1).
  • Make sure we're using the appropriate lapack, liblapack package variants together with mkl.
    • We are currently using MKL together with blas/lapack/liblapack compiled against MKL. So far it's looking good.
  • Solve the clobbering issue between intel-openmp and llvm-openmp. Failures occur when llvm-openmp replaces the libiomp5md.dll from intel-openmp.
    • I solved this temporarily by re-compiling intel-fortran-rt against intel-openmp instead of llvm-openmp. The packages no longer conflict.
  • Solve the intermittent issue where OpenMP returns far too high a thread count (>1e6 threads), causing extreme spikes in memory usage.
    • I solved this by disabling OpenMP on Windows.
  • Pass majority of sequential tests (>90%)
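
The OpenMP items above involve two packages shipping the same libiomp5md.dll, plus conflicts caused by an extra directory on PATH. A quick way to check whether more than one copy is visible is sketched below. This is a minimal sketch, not part of the Code Aster build: the function name `find_dll_copies` is hypothetical, and it only inspects PATH, which is one of several places Windows searches for DLLs.

```python
import os
from pathlib import Path


def find_dll_copies(dll_name, search_path=None):
    """Return each directory on the search path that contains dll_name.

    search_path defaults to the PATH environment variable. Directories
    listed more than once are only reported once.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue
        # Normalise so the same directory written two ways counts once.
        key = os.path.normcase(os.path.abspath(entry))
        if key in seen:
            continue
        seen.add(key)
        if (Path(entry) / dll_name).is_file():
            hits.append(Path(entry))
    return hits
```

Calling `find_dll_copies("libiomp5md.dll")` in the build or test environment and getting more than one directory back would suggest that two OpenMP runtimes can shadow each other.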

Dependencies

  1. Build and compile HDF5 with Fortran enabled on cf.
    LLVM Flang needs a fix:
  2. Compile libmed with Fortran enabled (and use "long long" MED int type) once HDF5 is compiled
  3. Compile Medcoupling with updated libmed
  4. Compile Metis with 64 bit integers
    • We need to either create a separate cf package "metis-aster" or make a dedicated variant such as metis=5.1.0=*aster*
  5. Compile scotch with 64 bit integers
    • We need to either create a separate cf package "scotch-aster" or make a dedicated variant such as scotch=7.0.4=*aster*
  6. Compile mumps against scotch and metis and use 64 bit integers
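
The variant idea for Metis and Scotch relies on specs of the form name=version=build, where the build field is a glob such as `*aster*`. The sketch below illustrates how such a glob would pick the dedicated variant out of several builds. It is a simplified illustration only: the function `matches_spec` and the build strings are hypothetical, the version is compared exactly, and real conda spec matching has more rules than this.

```python
from fnmatch import fnmatchcase


def matches_spec(spec, name, version, build):
    """Simplified check of a 'name=version=build' spec against one package.

    Assumption: name and version must be equal, and only the build
    field is treated as a glob pattern.
    """
    spec_name, spec_version, spec_build = spec.split("=")
    return (
        name == spec_name
        and version == spec_version
        and fnmatchcase(build, spec_build)
    )


# Hypothetical build strings, for illustration only.
candidates = [
    ("metis", "5.1.0", "h1234567_0"),
    ("metis", "5.1.0", "aster_h1234567_0"),
]
selected = [c for c in candidates if matches_spec("metis=5.1.0=*aster*", *c)]
```

With these made-up candidates, only the build whose string contains "aster" ends up in `selected`, which is the behaviour the dedicated variant depends on.
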
@Krande Krande added the question Further information is requested label Jun 1, 2024

Krande commented Jun 13, 2024

This week I finally managed to run a proper debugger on the Fortran code. My initial attempts to make it work with CLion and the LLDB debugger were in vain, but with Visual Studio it worked straight away.

On my first attempt I found a simple NULL_POINTER bug in the Fortran code that for some reason works fine with GCC. Applying the following fix in bibfor/nonlinear/nmdoch.F90 at line 162 (in this commit) made the share of passing tests jump from 45% to 79%.

[image: screenshot of the fix in bibfor/nonlinear/nmdoch.F90]

With a good debugging tool in place I am confident that I'll be able to fix the remaining issues related to source code in Code Aster.

Then it is a matter of figuring out

  • how we can resolve the intel-openmp/llvm-openmp conflict (which will hopefully be resolved once we can compile the Fortran code with LLVM Flang and use LLVM OpenMP for all packages).
  • how to skip symlinking. With a good debugging suite up and running, I believe it will be easier to track down the issue in Alternatives to symlinking Code Aster pyd files #66
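
The summary in the first comment says the symlinks were eventually replaced with code that loads the DLL and returns the pybind11 module definition. One way this kind of aliasing can be done from Python is sketched below, assuming a single shared library exports one `PyInit_<name>` function per module name. The helper names, the module name and the library file name are hypothetical and not taken from the Code Aster sources, so the actual fix may work differently.

```python
import importlib.machinery
import importlib.util
import sys


def make_extension_spec(module_name, library_path):
    """Build an import spec that maps module_name onto a shared library.

    The extension loader looks for an init function named after the
    module (PyInit_<module_name>), so one library file can back several
    modules without needing a symlink per module.
    """
    loader = importlib.machinery.ExtensionFileLoader(module_name, library_path)
    return importlib.util.spec_from_file_location(
        module_name, library_path, loader=loader
    )


def load_extension(module_name, library_path):
    """Import module_name from library_path and register it in sys.modules."""
    spec = make_extension_spec(module_name, library_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    sys.modules[module_name] = module
    return module
```

For example, `load_extension("aster_core", "libaster.pyd")` would ask the loader for `PyInit_aster_core` inside the hypothetical `libaster.pyd`. On Windows, directories holding dependent DLLs may also need to be registered with `os.add_dll_directory` before loading.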

Krande commented Jun 13, 2024

fyi @ldallolio

ldallolio commented Jun 13, 2024 via email
