
[bug] CPU load when idle #177

Closed
jan-janssen opened this issue Sep 12, 2023 · 7 comments


jan-janssen commented Sep 12, 2023

It works for a single core:

lmp = LammpsBase(
    cores=1,
    oversubscribe=False,
    enable_flux_backend=False,
    working_directory=".",
)

but calling:

lmp = LammpsBase(
    cores=2,
    oversubscribe=False,
    enable_flux_backend=False,
    working_directory=".",
)

results in one process with 100% CPU load.

Found by @pmrv

jan-janssen changed the title from "CPU load" to "[bug] CPU load when idle" on Sep 12, 2023
jan-janssen (Member Author) commented:

When the number of cores is increased further, the number of cores with high CPU load increases as well. So it seems the MPI broadcast https://github.com/pyiron/pylammpsmpi/blob/main/pylammpsmpi/mpi/lmpmpi.py#L488 is busy-waiting while the socket waits to receive information https://github.com/pyiron/pympipool/blob/main/pympipool/shared/communication.py#L140
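
One common way to avoid such a busy wait, shown below as a minimal sketch (lazy_bcast and poll_interval are made-up names, and this is not necessarily how pympipool addresses it), is to replace the blocking bcast with non-blocking point-to-point messages plus a probe-and-sleep loop, so the waiting ranks yield the CPU:

import time

from mpi4py import MPI


def lazy_bcast(obj, comm=MPI.COMM_WORLD, root=0, tag=0, poll_interval=0.01):
    # The root rank sends the object to every other rank with non-blocking
    # point-to-point messages; the remaining ranks sleep between probes
    # instead of spinning inside a blocking MPI.COMM_WORLD.bcast().
    if comm.rank == root:
        requests = [
            comm.isend(obj, dest=rank, tag=tag)
            for rank in range(comm.size)
            if rank != root
        ]
        for request in requests:
            request.wait()
        return obj
    while not comm.iprobe(source=root, tag=tag):
        time.sleep(poll_interval)  # yield the CPU while waiting for work
    return comm.recv(source=root, tag=tag)

The trade-off is an additional latency of up to poll_interval per task.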


jan-janssen commented Sep 12, 2023

The CPU load is related to the MPI broadcast; here is a reduced example. Script reply.py:

import sys

from mpi4py import MPI

from pympipool.shared.communication import (
    interface_connect,
    interface_send,
    interface_receive,
)
from pympipool.shared.backend import parse_arguments


def main(argument_lst=None):
    if argument_lst is None:
        argument_lst = sys.argv
    argument_dict = parse_arguments(argument_lst=argument_lst)
    if MPI.COMM_WORLD.rank == 0:
        context, socket = interface_connect(
            host=argument_dict["host"], port=argument_dict["zmqport"]
        )
    else:
        context, socket = None, None

    while True:
        if MPI.COMM_WORLD.rank == 0:
            # rank 0 receives the next task over the ZMQ socket
            input_dict = interface_receive(socket=socket)
        else:
            input_dict = None
        # the non-root ranks block in the broadcast until rank 0 has received data
        input_dict = MPI.COMM_WORLD.bcast(input_dict, root=0)
        if MPI.COMM_WORLD.rank == 0 and input_dict is not None:
            interface_send(socket=socket, result_dict={"result": input_dict})


if __name__ == "__main__":
    main(argument_lst=sys.argv)

Jupyter notebook to control the reply.py script:

import os
from pympipool import interface_bootup, interface_send, interface_receive

interface = interface_bootup(
    command_lst=["python", os.path.join(os.path.abspath("."), "reply.py")],
    cwd=None,
    cores=8,
    gpus_per_core=0,
    oversubscribe=False,
    enable_flux_backend=False,
    enable_slurm_backend=False,
    queue_adapter=None,
    queue_type=None,
    queue_adapter_kwargs=None,
)
interface.send_and_receive_dict(input_dict={"a": 1})

With this code, 8 cores remain busy. In contrast, when the reply script is replaced with:

import sys

from pympipool.shared.communication import (
    interface_connect,
    interface_send,
    interface_receive,
)
from pympipool.shared.backend import parse_arguments


def main(argument_lst=None):
    if argument_lst is None:
        argument_lst = sys.argv
    argument_dict = parse_arguments(argument_lst=argument_lst)
    context, socket = interface_connect(
        host=argument_dict["host"], port=argument_dict["zmqport"]
    )

    while True:
        input_dict = interface_receive(socket=socket)
        interface_send(socket=socket, result_dict={"result": input_dict})


if __name__ == "__main__":
    main(argument_lst=sys.argv)

and the number of cores is reduced to cores=1 in the Jupyter notebook, everything works fine.
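
A rough way to confirm which variant keeps the ranks busy, assuming psutil is installed (a diagnostic sketch only, not part of pympipool), is to watch the CPU usage of the child processes spawned by the notebook kernel:

import time

import psutil

# mpiexec and the reply.py ranks are children of the notebook kernel process
children = psutil.Process().children(recursive=True)
for child in children:
    child.cpu_percent(interval=None)  # the first call only primes the counter
time.sleep(2.0)
for child in children:
    print(child.pid, child.name(), child.cpu_percent(interval=None))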

jan-janssen transferred this issue from pyiron/pylammpsmpi on Sep 12, 2023
jan-janssen (Member Author) commented:

@pmrv I moved the issue to pympipool as it is related to the SocketInterface class. The Open MPI documentation suggests export OMPI_MCA_mpi_yield_when_idle=1, but at least for me this did not work out of the box.
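
For reference, a minimal sketch of how the parameter could be exported to the launcher when starting the worker manually with mpiexec (reply.py would still need the --host and --zmqport arguments that interface_bootup normally adds); as noted above, this did not remove the idle load here:

import os
import subprocess

# export the MCA parameter so it is inherited by mpiexec and the MPI ranks
env = dict(os.environ)
env["OMPI_MCA_mpi_yield_when_idle"] = "1"
process = subprocess.Popen(
    ["mpiexec", "-n", "2", "python", "reply.py"],
    env=env,
)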

jan-janssen (Member Author) commented:

If you are testing with Open MPI, you might have to set oversubscribe=True depending on your configuration.

jan-janssen (Member Author) commented:

> If you are testing with Open MPI, you might have to set oversubscribe=True depending on your configuration.

The debugging is simplified by #178

jan-janssen (Member Author) commented:

Maybe it is related to mpi4py/mpi4py#468

jan-janssen (Member Author) commented:

> Maybe it is related to mpi4py/mpi4py#468

This fix was added in #279
