Is your feature request related to a problem? Please describe.
Issue #224 reported problems with a multi-GPU setup and #234 introduced a quick fix. However, after commenting out the fix, I can no longer reproduce the earlier problems.
Describe the solution you'd like
I wonder if the quick fix that avoids applying DataParallel twice can be removed by deleting the following three lines of code in the reader and ranker nodes:
```python
# Round-trip the model through disk: saving and reloading strips the
# DataParallel wrapper so it is not applied a second time later on.
self.inferencer.model.save("tmp_model")
model = BaseAdaptiveModel.load(load_dir="tmp_model", device=device, strict=True)
shutil.rmtree("tmp_model")
```
Throughout the rest of the code, `model` should then be replaced with `self.inferencer.model` if these lines are removed.
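For reference, the effect of the disk round-trip (stripping a DataParallel wrapper) can also be achieved in memory. This is a minimal sketch for illustration only, not part of the proposal; `unwrap` is a hypothetical helper:

```python
import torch
from torch import nn

def unwrap(model: nn.Module) -> nn.Module:
    """Return the underlying module if `model` is wrapped in (Distributed)DataParallel."""
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return model.module
    return model

# Toy example: unwrapping returns the original module; plain modules pass through.
net = nn.Linear(4, 2)
assert unwrap(nn.DataParallel(net)) is net
assert unwrap(net) is net
```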
Additional context
Having apex installed on a machine with 4 GPUs and running tutorial 5 with `python -m torch.distributed.launch`, I couldn't find any difference in the logging output with or without the quick fix. It seems to run fine, but I have never used apex before, so I might have overlooked something. I also could not find a check in FARM's optimize_model() that prevents applying DataParallel there if it was already applied before (a sketch of such a guard follows below): https://github.com/deepset-ai/FARM/blob/816b4e3e65c142f8a31a63833058b75fe0419ed4/farm/modeling/optimization.py#L272
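For context, a guard of that kind could look like the following. This is a hedged sketch of what such a check might do, not FARM's actual code; `maybe_wrap_data_parallel` is a hypothetical helper:

```python
import torch
from torch import nn

def maybe_wrap_data_parallel(model: nn.Module, device: torch.device) -> nn.Module:
    """Wrap `model` in DataParallel for multi-GPU use, unless it is already wrapped."""
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return model  # already wrapped: avoid applying DataParallel twice
    if device.type == "cuda" and torch.cuda.device_count() > 1:
        return nn.DataParallel(model)
    return model
```

With such a check in place, calling optimize_model() on an already-wrapped model would be a no-op with respect to wrapping, which would make the disk round-trip in the reader and ranker nodes unnecessary.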