issue with mpirun after updating to Ubuntu 22 #179

Open
fermza opened this issue Nov 24, 2022 · 21 comments

Comments

@fermza

fermza commented Nov 24, 2022

Dear Stephane,

after updating two PCs from Ubuntu 18 to Ubuntu 22.04, PhyML stopped working on both. Whatever the command, it stops with this message:

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[20043,1],7]
  Exit code:    1

I tried uninstalling and reinstalling both the phyml and openmpi-bin packages, but it didn't help. I'm stuck, since I haven't found a solution yet. Any pointers or guidance you can provide would be greatly appreciated.

I noticed that the PhyML version in Ubuntu 22 is different from the one I have on other computers running older Ubuntu releases. Incidentally, the error also appears when I run "phyml --version":

pc10@pc10:~/Desktop/running$ phyml --version


. Running the analysis on 8 CPUs..
. This is PhyML version 3.3.3:3.3.20211231-1.

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[19216,1],2]
  Exit code:    1
--------------------------------------------------------------------------
@fermza
Author

fermza commented Nov 25, 2022

Update: I removed PhyML from the computer and reinstalled it from source. Running sh ./autogen.sh followed by configure --enable-phyml-mpi went fine. Then make fails with this error:

~/phyml-3.3.20220408 make
make  all-recursive
make[1]: Entering directory '/home/fer/phyml-3.3.20220408'
Making all in src
make[2]: Entering directory '/home/fer/phyml-3.3.20220408/src'


.:  Building [phyml-mpi]. Version 3.3.20220408 :.


mpicc  -I. -I..     -std=c99 -O3 -fomit-frame-pointer -funroll-loops -Wall -Winline -finline -march=native -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c
In file included from utilities.h:2543,
                 from spr.h:12,
                 from main.c:13:
mpi_boot.h:18:10: fatal error: mpi.h: No such file or directory
   18 | #include "mpi.h"
      |          ^~~~~~~
compilation terminated.
make[2]: *** [Makefile:1216: main.o] Error 1
make[2]: Leaving directory '/home/fer/phyml-3.3.20220408/src'
make[1]: *** [Makefile:365: all-recursive] Error 1
make[1]: Leaving directory '/home/fer/phyml-3.3.20220408'
make: *** [Makefile:306: all] Error 2

Not sure if it helps with the diagnosis, but this is the situation.
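
The fatal error in the build log means that the compiler invoked through mpicc could not find mpi.h on its include path. As a minimal sketch (the function name check_mpi_header is made up for illustration), the same condition can be tested before running make by asking the wrapper to preprocess a one-line program:

```shell
# Hypothetical helper: succeeds only if the given compiler wrapper can find mpi.h.
# Usage: check_mpi_header [wrapper]   (wrapper defaults to mpicc)
check_mpi_header() {
    _cc="${1:-mpicc}"
    # -x c marks stdin as C source; -E stops after preprocessing, which is
    # enough to hit "mpi.h: No such file or directory" when the header is missing.
    printf '#include "mpi.h"\n' | "$_cc" -x c -E - >/dev/null 2>&1
}
```

Calling check_mpi_header && echo "mpi.h found" || echo "mpi.h missing" gives a quick yes/no answer. On Debian and Ubuntu the header normally ships in a development package (libmpich-dev for MPICH, libopenmpi-dev for Open MPI); that is general packaging knowledge and was not confirmed in this thread.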

@stephaneguindon
Owner

Hi there. On my Linux box (Ubuntu 20.04), I have a file /usr/include/x86_64-linux-gnu/mpich/mpi.h which seems to be missing on your side. I'll try to upgrade my OS this afternoon to see if I can reproduce this issue.

@fermza
Author

fermza commented Dec 1, 2022

Hi Stephane, I have tried to find something about the missing mpi.h file, but I couldn't find any fix yet. Any luck on your end?

@stephaneguindon
Owner

stephaneguindon commented Dec 4, 2022

Yes :
sudo apt-get purge mpich
sudo apt-get install mpich
did the trick for me.

@fermza
Author

fermza commented Dec 7, 2022

Hi Stephane, that didn't work on my end (I tried it on a couple of computers with the same issue, and it didn't fix it on either).
Thanks anyway, I will try to keep looking for a solution.

@stephaneguindon
Owner

Well, that's too bad. Please keep me updated on your progress as this issue will likely impact other users...

@fermza
Author

fermza commented Dec 16, 2022

Will do! As soon as I find something, I'll let you know.

@papelypluma

Hi @fermza @stephaneguindon, I'm encountering the same problem. I tried on both Ubuntu 22 and 18, but I keep getting the same error. I was wondering if by any chance you have found a quick fix for this issue? Thanks!

@fermza
Author

fermza commented Jan 17, 2023 via email

@papelypluma

Thanks for the suggestion @fermza. I will consider that as a workaround in the meantime.

@stephaneguindon
Owner

Could you try locate mpi.h in a terminal and post the result, please?

@fermza
Author

fermza commented Jan 19, 2023

Hi @stephaneguindon, here it is:

pc10@pc10:/$ sudo find . -print | grep -w 'mpi[.]h'
[sudo] password for pc10: 
./usr/src/linux-headers-5.15.0-58/include/linux/mpi.h
./usr/src/linux-headers-5.15.0-56/include/linux/mpi.h
./usr/lib/x86_64-linux-gnu/fortran/gfortran-mod-15/openmpi/mpi.h
./usr/lib/x86_64-linux-gnu/openmpi/include/mpi.h
./usr/include/x86_64-linux-gnu/mpich/mpi.h

I got similar results running the command on other Ubuntu 22 computers at our lab.

@stephaneguindon
Owner

What about mpicc -compile_info ? It should return something like gcc -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -I/usr/include/x86_64-linux-gnu/mpich -L/usr/lib/x86_64-linux-gnu -lmpich.
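
A note on the flag itself: -compile_info is the MPICH wrapper's option. As far as I know, Open MPI's mpicc does not implement it and documents --showme:compile for the same purpose. A minimal sketch that tries both spellings (mpi_compile_flags is a made-up name, not part of either MPI distribution):

```shell
# Hypothetical helper: print the compile flags an MPI wrapper would use,
# labelled with the implementation whose option spelling worked.
# Usage: mpi_compile_flags [wrapper]   (wrapper defaults to mpicc)
mpi_compile_flags() {
    _wrapper="${1:-mpicc}"
    if _out=$("$_wrapper" -compile_info 2>/dev/null); then
        # MPICH spelling succeeded
        printf 'mpich: %s\n' "$_out"
    elif _out=$("$_wrapper" --showme:compile 2>/dev/null); then
        # Open MPI spelling succeeded
        printf 'openmpi: %s\n' "$_out"
    else
        echo "could not query $_wrapper with either option" >&2
        return 1
    fi
}
```

Whichever branch answers also tells you which MPI implementation the mpicc on your PATH belongs to.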

@fermza
Author

fermza commented Jan 23, 2023

Hi, with this I found something that might explain what's happening. I ran the command you suggested, but it didn't work:

~ mpicc -compile_info
gcc: error: unrecognized command-line option ‘-compile_info’
gcc: fatal error: no input files
compilation terminated.

Indeed, it doesn't seem to be a recognized option:

~ mpicc --help
Usage: gcc [options] file...
Options:
  -pass-exit-codes         Exit with highest error code from a phase.
  --help                   Display this information.
  --target-help            Display target specific command line options.
  --help={common|optimizers|params|target|warnings|[^]{joined|separate|undocumented}}[,...].
                           Display specific types of command line options.
  (Use '-v --help' to display command line options of sub-processes).
  --version                Display compiler version information.
  -dumpspecs               Display all of the built in spec strings.
  -dumpversion             Display the version of the compiler.
  -dumpmachine             Display the compiler's target processor.
  -print-search-dirs       Display the directories in the compiler's search path.
  -print-libgcc-file-name  Display the name of the compiler's companion library.
  -print-file-name=<lib>   Display the full path to library <lib>.
  -print-prog-name=<prog>  Display the full path to compiler component <prog>.
  -print-multiarch         Display the target's normalized GNU triplet, used as
                           a component in the library path.
  -print-multi-directory   Display the root directory for versions of libgcc.
  -print-multi-lib         Display the mapping between command line options and
                           multiple library search directories.
  -print-multi-os-directory Display the relative path to OS libraries.
  -print-sysroot           Display the target libraries directory.
  -print-sysroot-headers-suffix Display the sysroot suffix used to find headers.
  -Wa,<options>            Pass comma-separated <options> on to the assembler.
  -Wp,<options>            Pass comma-separated <options> on to the preprocessor.
  -Wl,<options>            Pass comma-separated <options> on to the linker.
  -Xassembler <arg>        Pass <arg> on to the assembler.
  -Xpreprocessor <arg>     Pass <arg> on to the preprocessor.
  -Xlinker <arg>           Pass <arg> on to the linker.
  -save-temps              Do not delete intermediate files.
  -save-temps=<arg>        Do not delete intermediate files.
  -no-canonical-prefixes   Do not canonicalize paths when building relative
                           prefixes to other gcc components.
  -pipe                    Use pipes rather than intermediate files.
  -time                    Time the execution of each subprocess.
  -specs=<file>            Override built-in specs with the contents of <file>.
  -std=<standard>          Assume that the input sources are for <standard>.
  --sysroot=<directory>    Use <directory> as the root directory for headers
                           and libraries.
  -B <directory>           Add <directory> to the compiler's search paths.
  -v                       Display the programs invoked by the compiler.
  -###                     Like -v but options quoted and commands not executed.
  -E                       Preprocess only; do not compile, assemble or link.
  -S                       Compile only; do not assemble or link.
  -c                       Compile and assemble, but do not link.
  -o <file>                Place the output into <file>.
  -pie                     Create a dynamically linked position independent
                           executable.
  -shared                  Create a shared library.
  -x <language>            Specify the language of the following input files.
                           Permissible languages include: c c++ assembler none
                           'none' means revert to the default behavior of
                           guessing the language based on the file's extension.

Options starting with -g, -f, -m, -O, -W, or --param are automatically
 passed on to the various sub-processes invoked by gcc.  In order to pass
 other options on to these processes the -W<letter> options must be used.

For bug reporting instructions, please see:
<file:///usr/share/doc/gcc-11/README.Bugs>.

In my case, this is the mpicc version:

~ mpicc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So I am now wondering whether this version, which I assume is the default installed by Ubuntu 22 on upgrade, may be the cause of this issue with PhyML. Can you tell me which version of the compiler you have installed? Maybe if I downgrade to an older mpicc version I can use PhyML again. For comparison, on an older computer (with Ubuntu 20 and a working PhyML) I have an older mpicc:

fer@fer ~/Desktop $ mpicc --version 
gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I hope this can help. Thanks!

@stephaneguindon
Owner

I'm using the same version of gcc.
My best guess at the moment is that you have both openmpi and mpich installed on your machine and that these two are conflicting. On my side, /bin/mpicc points to /usr/bin/mpicc.mpich. Could you please check that it is also the case for you?
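
One way to run that check, sketched under assumptions: mpi_flavor is a hypothetical helper, and the classification relies on file names only (opal_wrapper is the binary behind Open MPI's compiler wrappers, while MPICH's wrapper is installed as mpicc.mpich on this setup).

```shell
# Hypothetical helper: follow all symlinks from a wrapper path and guess
# which MPI implementation the final target belongs to, by file name only.
# Usage: mpi_flavor [path]   (path defaults to /bin/mpicc)
mpi_flavor() {
    _target=$(readlink -f "${1:-/bin/mpicc}") || return 1
    case "$(basename "$_target")" in
        *mpich*)                echo "mpich" ;;
        opal_wrapper|*openmpi*) echo "openmpi" ;;
        *)                      echo "unknown ($_target)" ;;
    esac
}
```

Running mpi_flavor /bin/mpicc and mpi_flavor "$(command -v mpirun)" side by side shows whether the compiler wrapper and the launcher come from the same implementation.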

@fermza
Author

fermza commented Dec 14, 2023

Hi Stephane,
it's been a while and we are still unable to run PhyML on Ubuntu 22 (the same issue keeps popping up). I have tried several solutions on several computers. However, a few days ago I read something somewhere (sorry, I do not recall where) and tried running phyml-mpi in the command rather than simply phyml (not sure why I didn't try it before). Now the error I used to get all the time:

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[19216,1],2]
  Exit code:    1
--------------------------------------------------------------------------

did not pop up. It actually seems to start running, but now something new is happening:

~/Desktop/test phyml-mpi -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check


. Running the analysis on 1 CPU..

. Command line: phyml-mpi -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check 





  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

        . Sequence filename:				 perk_2023_h08.phy
        . Data type:					 aa
        . Alphabet size:				 20
        . Sequence format:				 interleaved
        . Number of data sets:				 1
        . Nb of bootstrapped data sets:			 0
        . Compute approximate likelihood ratio test:	 yes (aBayes branch supports)
        . Model name:					 WAG
        . Proportion of invariable sites:		 0.000000
        . RAS model:					 discrete Gamma
        . Number of subst. rate catgs:			 4
        . Gamma distribution parameter:			 estimated
        . 'Middle' of each rate class:			 mean
        . Amino acid equilibrium frequencies:		 model
        . Optimise tree topology:			 yes
        . Starting tree:				 BioNJ
        . Add random input tree:			 no
        . Optimise branch lengths:			 yes
        . Minimum length of an edge:			 1e-08
        . Optimise substitution model parameters:	 yes
        . Run ID:					 none
        . Random seed:					 1702560731
        . Subtree patterns aliasing:			 no
        . Version:					 3.3.3:3.3.20211231-1
        . Byte alignment:				 1
        . AVX enabled:					 no
        . SSE enabled:					 no

  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////



. 462 patterns found (out of a total of 507 sites). 

. 72 sites without polymorphism (14.20%).


. Computing pairwise distances...

. Building BioNJ tree...

. WARNING: this analysis will use at least 131 MB of memory space...


. Score of initial tree: -34205.08
. -34202.259274 -- -36858.581928
. Edge: 187
. is_mixt_tree: 0
. Err. in file 'optimiz.c' (line 875)
. PhyML finished prematurely.

Besides the error, it also looks like it's using only one core (by default PhyML usually used half of the available threads; this is a Ryzen 7 computer with 8 cores).
I am using PhyML version 3.3.3:3.3.20211231-1. Any suggestions on how to fix these issues? I feel I'm close to being able to run PhyML locally again, but what I have found online about this new error didn't help.

@stephaneguindon
Owner

stephaneguindon commented Dec 14, 2023

Hi there. From the command line you're using here, it looks like you do not need the MPI version of PhyML (as you're not running any bootstrap analysis). I'd therefore suggest using the "standard" PhyML executable and posting the error message returned, if any.

@fermza
Author

fermza commented Dec 14, 2023

The problem is that when I run the phyml command alone I still get the error that started this whole thread (which I haven't been able to fix). Here's an example running phyml:

~/Desktop/test phyml -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check 


. Running the analysis on 6 CPUs..

. Command line: /usr/lib/phyml/bin/phyml-mpi -i perk_2023_h08.phy -d aa -m WAG -a e --no_memory_check 





  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////

        . Sequence filename:				 perk_2023_h08.phy
        . Data type:					 aa
        . Alphabet size:				 20
        . Sequence format:				 interleaved
        . Number of data sets:				 1
        . Nb of bootstrapped data sets:			 0
        . Compute approximate likelihood ratio test:	 yes (aBayes branch supports)
        . Model name:					 WAG
        . Proportion of invariable sites:		 0.000000
        . RAS model:					 discrete Gamma
        . Number of subst. rate catgs:			 4
        . Gamma distribution parameter:			 estimated
        . 'Middle' of each rate class:			 mean
        . Amino acid equilibrium frequencies:		 model
        . Optimise tree topology:			 yes
        . Starting tree:				 BioNJ
        . Add random input tree:			 no
        . Optimise branch lengths:			 yes
        . Minimum length of an edge:			 1e-08
        . Optimise substitution model parameters:	 yes
        . Run ID:					 none
        . Random seed:					 1702574801
        . Subtree patterns aliasing:			 no
        . Version:					 3.3.3:3.3.20211231-1
        . Byte alignment:				 1
        . AVX enabled:					 no
        . SSE enabled:					 no

  ////////////////////////////////////.\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.//////////////////////////////////////////



. 462 patterns found (out of a total of 507 sites). 

. 72 sites without polymorphism (14.20%).


. Computing pairwise distances...

. Building BioNJ tree...

. WARNING: this analysis will use at least 131 MB of memory space...
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------


. Score of initial tree: -34205.08--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[45724,1],5]
  Exit code:    1
--------------------------------------------------------------------------

Note the bottom part, which is the same error shown at the beginning of the thread.
Thanks for your reply!

Regards

@stephaneguindon
Owner

You probably need to talk to your sysadmin here. The command 'phyml' points to 'phyml-mpi', which is wrong. It should point to a binary called 'phyml' (instead of 'phyml-mpi').
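
To see what the phyml command actually resolves to, something along these lines may help. It is only a sketch: inspect_cmd is a hypothetical name, and it assumes the command is a file on PATH rather than a shell alias or function.

```shell
# Hypothetical helper: print the real file behind a command name and whether
# that file is a script (starts with '#!') or something else, e.g. a binary.
# Usage: inspect_cmd <command>
inspect_cmd() {
    _path=$(command -v "$1") || { echo "$1: not found in PATH" >&2; return 1; }
    _real=$(readlink -f "$_path")
    echo "resolved: $_real"
    # A script may in turn launch another executable, such as an MPI build.
    if [ "$(head -c 2 "$_real")" = "#!" ]; then
        echo "type: script"
    else
        echo "type: binary"
    fi
}
```

If inspect_cmd phyml reports a script, reading that file shows which executable it launches.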

@centrebiodiversitat

centrebiodiversitat commented Dec 21, 2023

Hi @stephaneguindon,
I hope you can help me.

I'm using

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

and I am having the same issue as @fermza.

$phyml --version                  
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified


. Running the analysis on 64 CPUs..
. This is PhyML version 3.3.3:3.3.20211231-1.

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[30668,1],11]
  Exit code:    1
--------------------------------------------------------------------------

I've been following the different steps that you've mentioned above:

sudo apt-get purge mpich
sudo apt-get install mpich

Then I checked whether /bin/mpicc points to /usr/bin/mpicc.mpich and found that it doesn't, so I changed it as follows:

$ readlink -f /bin/mpicc
/usr/bin/opal_wrapper
$ readlink -f /usr/bin/mpicc.mpich 
/usr/bin/mpicc.mpich
$ sudo ln -sf /usr/bin/mpicc.mpich /bin/mpicc
$ readlink -f /bin/mpicc
/usr/bin/mpicc.mpich

After that, I still have the same problem. PhyML works properly when the input has few sequences, but this error appears when I try a larger file.

Just in case, my mpicc version is 12.3.0

Thank you very much for your attention,

@fermza
Author

fermza commented Sep 26, 2024

Hi, I am back to mention that we finally found a way to get PhyML running again on our systems.
For unrelated reasons, we decided to switch to Debian 12. We also installed PhyML via apt there, but kept getting the same error. A colleague came up with a potential solution:

  • First, we installed PhyML via conda (conda install -c bioconda phyml). In our case, we created a dedicated environment for this.
  • Now, running phyml via mpirun, we are able to execute PhyML successfully! For example, we use mpirun -np 8 phyml -i G2_2_h08.phy -d aa -m WAG -a 4

For some reason, PhyML doesn't seem to have issues with mpirun under conda. Maybe this is helpful to someone else.
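
The two steps could be scripted roughly as follows. The environment name phyml-env is an assumption (the comment does not name it), and phyml_mpirun_cmd is a hypothetical helper that only prints the mpirun command line so it can be inspected before being run.

```shell
# One-time setup, shown as comments because it needs conda and network access.
# The environment name "phyml-env" is hypothetical.
#   conda create -n phyml-env
#   conda activate phyml-env
#   conda install -c bioconda phyml

# Hypothetical helper: build the mpirun command line without executing it.
# Usage: phyml_mpirun_cmd <nprocs> <phyml arguments...>
phyml_mpirun_cmd() {
    _np="$1"
    shift
    echo "mpirun -np $_np phyml $*"
}
```

For example, phyml_mpirun_cmd 8 -i G2_2_h08.phy -d aa -m WAG -a 4 prints the command quoted in the list, which can then be pasted into the activated environment.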

Best,
Fer
