-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace psutil with multiprocessing #1119
Conversation
Could you please double-check if this works in the cluster when selecting a specific number of cpus? |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1119 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 65 65
Lines 9059 9058 -1
=========================================
- Hits 9059 9058 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
For sure. Is there a specific test that you want me to run? In any case, the atomic change has been checked (as you can see from the IPython snippet above), and I've done it in maryah. |
Psutils was introduced when we realized that the cpu count/affinity was not correct using other strategies when submitting jobs via slurm. Therefore, the best test is to send jobs by specifying the number of cpu cores (less and more than the default) and check if the output is correct. |
What is the correct way of limiting the number of cores in slurm in order to test this? I have tried both |
@stavros11, indeed these options are correct but I think the only way to set the proper cpu affinity is this. |
src/qibo/backends/tensorflow.py
Outdated
import psutil | ||
|
||
self.nthreads = psutil.cpu_count(logical=True) | ||
self.nthreads = multiprocessing.cpu_count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of guessing the number of available threads we might use tf.config.threading.get_inter_op_parallelism_threads
or tf.config.threading.get_intra_op_parallelism_threads
(please check which one is for single operator multi-threading) directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [2]: tf.config.get_inter_op_parallelism_threads
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 tf.config.get_inter_op_parallelism_threads
AttributeError: module 'tensorflow._api.v2.config' has no attribute 'get_inter_op_parallelism_threads'
In [3]: tf.config.get_intra_op_parallelism_threads
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 tf.config.get_intra_op_parallelism_threads
AttributeError: module 'tensorflow._api.v2.config' has no attribute 'get_intra_op_parallelism_threads'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe they are now scoped in tf.config.threading
:
https://www.tensorflow.org/api_docs/python/tf/config/threading/get_intra_op_parallelism_threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [10]: tf.config.threading.get_inter_op_parallelism_threads()
Out[10]: 0
In [11]: tf.config.threading.get_intra_op_parallelism_threads()
Out[11]: 0
And if we do not set them, by default they are 0
:
A value of 0 means the system picks an appropriate number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, so ideally you could check for 0, if True assume tf is using all threads given by psutil affinity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what should I do if not zero?
- set to zero myself
- ignore it (I will assume the user chose it manually)
- raise an error
- check if it's the same of the affinity (and then do something)
- set to the affinity
My favorite option is 2., since it conflicts the least with user choices (definitely a case in which less is more).
But in that case I should do nothing in general (not even check it if it's zero, since I will ignore anyhow the outcome)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think option 2 is reasonable, moreover qibo.set_threads
consistently does not set threads when using the tf backend, but please double check if this value is not really needed anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to @stavros11 I found it this interesting piece of Qibo:
qibo/src/qibo/backends/tensorflow.py
Lines 207 to 214 in c1227d8
def set_threads(self, nthreads): | |
log.warning( | |
"`set_threads` is not supported by the tensorflow " | |
"backend. Please use tensorflow's thread setters: " | |
"`tf.config.threading.set_inter_op_parallelism_threads` " | |
"or `tf.config.threading.set_intra_op_parallelism_threads` " | |
"to switch the number of threads." | |
) |
So, I believe that it qibo.set_threads
is not consistently doing its job. Especially for the TensorflowBackend
. But I would not try to fix it in this PR.
$ srun -w jubail poetry run python -c "import psutil; print(psutil.cpu_count(logical=False))"
128
$ srun -w jubail poetry run python -c "import psutil; print(psutil.cpu_count(logical=True))"
256
$ srun -w jubail poetry run python -c "import multiprocessing; print(multiprocessing.cpu_count())"
256
$ srun -w jubail --cpus-per-task 1 poetry run python -c "import psutil; print(psutil.cpu_count(logical=True))"
256
256
$ srun -w jubail --cpus-per-task 1 --ntasks 1 poetry run python -c "import psutil; print(psutil.cpu_count(logical=True))"
256 |
Indeed, the cluster uses affinity, therefore you should try: |
In [6]: import psutil
In [7]: len(psutil.Process().cpu_affinity())
Out[7]: 2 This works, but it's not what we are currently using. Here, I just wanted to clean some unneeded dependency, not to introduce a new feature. In case, I would open a dedicated issue. |
Probably you have spotted a bug, in the sense that:
|
97aac5e
to
8b7f908
Compare
The .nthreads attribute is then set to 0, to avoid the default coming from the abstract (currently 1), since TensorFlow will use multiple threads by default
e026b9b
to
9ce8313
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
psutil
is only used to get the CPU countqibo/src/qibo/backends/tensorflow.py
Line 200 in 1dd73f7
multiprocessing.cpu_count()
(which is standard library) - it would have not been possible forlogical = False
, but it is not what it's being used#814
Checklist: