Replace psutil with multiprocessing #1119

Merged
merged 4 commits into from
Dec 5, 2023

alecandido
Member

psutil is only used to get the CPU count

#814

In [8]: psutil.cpu_count(logical=False)
Out[8]: 64

In [9]: psutil.cpu_count(logical=True) == multiprocessing.cpu_count()
Out[9]: True
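The IPython check above suggests the swap is a drop-in replacement for the logical count. Below is a minimal before/after sketch using only the standard library. `nthreads` mirrors the attribute name used in the diff; everything else is illustrative rather than the actual Qibo code.

```python
import multiprocessing
import os

# Before (third-party dependency):
#     import psutil
#     nthreads = psutil.cpu_count(logical=True)

# After (standard library only): the number of logical CPUs in the system.
# Unlike os.cpu_count(), which returns None when the count is unknown,
# multiprocessing.cpu_count() raises NotImplementedError in that case.
nthreads = multiprocessing.cpu_count()

# Like psutil.cpu_count(logical=True), this counts every logical CPU on the
# machine, not only the ones the current process is allowed to run on.
print(f"logical CPUs: {nthreads} (os.cpu_count() = {os.cpu_count()})")
```

Neither call looks at the process CPU affinity, so this change alone should not alter behaviour on a cluster node.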

Checklist:

  • Reviewers confirm new code works as expected.
  • Tests are passing.
  • Coverage does not decrease.
  • Documentation is updated.

@scarrazza
Member

scarrazza commented Dec 4, 2023

Could you please double-check whether this works on the cluster when selecting a specific number of CPUs?

codecov bot commented Dec 4, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c1227d8) 100.00% compared to head (9ce8313) 100.00%.

@@            Coverage Diff            @@
##            master     #1119   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           65        65           
  Lines         9059      9058    -1     
=========================================
- Hits          9059      9058    -1     
Flag        Coverage Δ
unittests   100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown.


@alecandido
Member Author

> Could you please double-check whether this works on the cluster when selecting a specific number of CPUs?

For sure. Is there a specific test that you want me to run?

In any case, the atomic change has been checked (as you can see from the IPython snippet above), and I ran it on maryah.
If that works, I can't imagine any way for it to fail during an actual run.

@scarrazza
Member

psutil was introduced when we realized that the CPU count/affinity was not correct with other strategies when submitting jobs via Slurm. Therefore, the best test is to submit jobs specifying the number of CPU cores (fewer and more than the default) and check whether the output is correct.

@stavros11
Member

> psutil was introduced when we realized that the CPU count/affinity was not correct with other strategies when submitting jobs via Slurm. Therefore, the best test is to submit jobs specifying the number of CPU cores (fewer and more than the default) and check whether the output is correct.

What is the correct way to limit the number of cores in Slurm in order to test this? I have tried both srun and sbatch with different combinations of --ntasks, --cpus-per-task and --cores-per-socket, but I always get the total number of threads (128) with both psutil and multiprocessing.

@scarrazza
Member

@stavros11, indeed these options are correct, but I think the only way to set the proper CPU affinity is this.

- import psutil

- self.nthreads = psutil.cpu_count(logical=True)
+ self.nthreads = multiprocessing.cpu_count()
Member

@scarrazza scarrazza Dec 4, 2023

Instead of guessing the number of available threads, we might use tf.config.threading.get_inter_op_parallelism_threads or tf.config.threading.get_intra_op_parallelism_threads directly (please check which one is for single-operator multi-threading).

Member Author

In [2]: tf.config.get_inter_op_parallelism_threads
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 tf.config.get_inter_op_parallelism_threads

AttributeError: module 'tensorflow._api.v2.config' has no attribute 'get_inter_op_parallelism_threads'

In [3]: tf.config.get_intra_op_parallelism_threads
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 tf.config.get_intra_op_parallelism_threads

AttributeError: module 'tensorflow._api.v2.config' has no attribute 'get_intra_op_parallelism_threads'

Member Author

@alecandido alecandido Dec 4, 2023

In [10]:  tf.config.threading.get_inter_op_parallelism_threads()
Out[10]: 0

In [11]:  tf.config.threading.get_intra_op_parallelism_threads()
Out[11]: 0

And if we do not set them, by default they are 0:

> A value of 0 means the system picks an appropriate number.

Member

Exactly, so ideally you could check for 0 and, if it is 0, assume TF is using all the threads given by the psutil affinity.

Member Author

And what should I do if not zero?

  1. set to zero myself
  2. ignore it (I will assume the user chose it manually)
  3. raise an error
  4. check if it's the same as the affinity (and then do something)
  5. set to the affinity

My favorite option is 2, since it conflicts the least with user choices (definitely a case in which less is more).
But in that case I should do nothing at all (not even check whether it is zero, since I would ignore the outcome anyway).
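Option 2, combined with the check-for-zero idea above, can be sketched as a small pure function. This is illustrative only: `effective_threads` is a hypothetical helper, `configured` stands for the value one would read from `tf.config.threading.get_intra_op_parallelism_threads()`, and `available` for whatever CPU count (for example the affinity size) is considered usable.

```python
def effective_threads(configured: int, available: int) -> int:
    """Hypothetical helper: how many threads TensorFlow is expected to use.

    configured: value of a TensorFlow thread setting, where 0 (the default)
        means "the system picks an appropriate number".
    available: number of CPUs the process may use, e.g. its affinity size.
    """
    if configured == 0:
        # Not set by the user: following the suggestion above, assume
        # TensorFlow uses all the available CPUs.
        return available
    # Option 2: an explicit user choice is respected as-is.
    return configured


print(effective_threads(0, available=8))  # → 8 (default setting)
print(effective_threads(4, available=8))  # → 4 (user-chosen setting)
```

Keeping the policy free of any TensorFlow import makes it easy to test in isolation; the actual TensorFlow call would only supply `configured`.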

Member

I think option 2 is reasonable; moreover, qibo.set_threads consistently does not set threads when using the tf backend. But please double-check that this value is really not needed anywhere.

Member Author

Thanks to @stavros11, I found this interesting piece of Qibo:

def set_threads(self, nthreads):
    log.warning(
        "`set_threads` is not supported by the tensorflow "
        "backend. Please use tensorflow's thread setters: "
        "`tf.config.threading.set_inter_op_parallelism_threads` "
        "or `tf.config.threading.set_intra_op_parallelism_threads` "
        "to switch the number of threads."
    )

So I believe that qibo.set_threads is not consistently doing its job, especially for the TensorflowBackend. But I would not try to fix it in this PR.

@alecandido
Member Author

$ srun -w jubail poetry run python -c "import psutil; print(psutil.cpu_count(logical=False))"
128
$ srun -w jubail poetry run python -c "import psutil; print(psutil.cpu_count(logical=True))"
256
$ srun -w jubail poetry run python -c "import multiprocessing; print(multiprocessing.cpu_count())"
256
$ srun -w jubail --cpus-per-task 1 poetry run python -c "import psutil; print(psutil.cpu_count(logical=True))"
256
256
$ srun -w jubail --cpus-per-task 1 --ntasks 1 poetry run python -c "import psutil; print(psutil.cpu_count(logical=True))"
256

@scarrazza
Member

Indeed, the cluster uses affinity, so you should try: len(psutil.Process().cpu_affinity())

@alecandido
Member Author

alecandido commented Dec 4, 2023

> Indeed, the cluster uses affinity, so you should try: len(psutil.Process().cpu_affinity())

In [6]: import psutil

In [7]: len(psutil.Process().cpu_affinity())
Out[7]: 2

This works, but it's not what we are currently using. Here, I just wanted to drop an unneeded dependency, not to introduce a new feature.

If needed, I would open a dedicated issue.
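For reference, the affinity-based count does not strictly require psutil. Below is a sketch using only the standard library; `usable_cpu_count` is a hypothetical name. It prefers `os.process_cpu_count()` (Python 3.13+), then falls back to `os.sched_getaffinity`, which is only available on some platforms (notably Linux), and finally to the system-wide count.

```python
import os


def usable_cpu_count() -> int:
    """Hypothetical helper: CPUs the current process may actually run on.

    Intended as a stdlib counterpart of len(psutil.Process().cpu_affinity()).
    """
    # Python >= 3.13: affinity-aware count provided by the standard library
    # (it returns None when the count cannot be determined).
    process_cpu_count = getattr(os, "process_cpu_count", None)
    if process_cpu_count is not None:
        count = process_cpu_count()
        if count:
            return count
    # Platforms with sched_getaffinity (e.g. Linux): size of the affinity
    # mask of the calling process (pid 0 means "this process").
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: system-wide logical CPU count, ignoring any restriction.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical CPUs on the node:", os.cpu_count())
    print("CPUs usable by this process:", usable_cpu_count())
```

Assuming the cluster restricts jobs through affinity as described above, running this script via srun with different --cpus-per-task values should show the first number staying fixed while the second follows the allocation.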

@scarrazza
Member

You have probably spotted a bug, in the sense that:

  • the affinity is the only proper way to set the total number of available threads for OMP or thread-pool libraries;
  • the tensorflow backend is the only backend that does not use the affinity, and it guesses the total number of threads instead of getting it from tf.config.threading directly.

The .nthreads attribute is then set to 0 to avoid inheriting the default from the abstract backend (currently 1), since TensorFlow will use multiple threads by default.
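A minimal sketch of that change, assuming a base backend whose `nthreads` defaults to 1. The class names and structure here are hypothetical stand-ins, not Qibo's actual code.

```python
class AbstractBackend:
    """Hypothetical base class: serial execution unless told otherwise."""

    def __init__(self) -> None:
        self.nthreads = 1


class TensorflowBackend(AbstractBackend):
    """Hypothetical TensorFlow backend."""

    def __init__(self) -> None:
        super().__init__()
        # TensorFlow manages its own thread pools (by default it picks the
        # number of threads itself), so report 0 instead of inheriting the
        # default of 1 or guessing a CPU count.
        self.nthreads = 0


print(AbstractBackend().nthreads, TensorflowBackend().nthreads)  # → 1 0
```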
Member

@scarrazza scarrazza left a comment

LGTM, thanks.

@alecandido alecandido merged commit eea1e76 into master Dec 5, 2023
21 checks passed
@alecandido alecandido deleted the drop-psutil branch December 5, 2023 15:14