
The tutorial 7 program hangs before entering the run_encrypted_training function #520

Open
whcjimmy opened this issue Oct 23, 2024 · 0 comments

Hi everyone,

I'm trying to run Tutorial 7 on my computer, but the program hangs before entering the run_encrypted_training function.
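
For reference, this is roughly the part of the tutorial I'm running (paraphrased from the notebook, so the exact variable names and file paths below are placeholders); the hang seems to happen at the crypten.load_from_party calls inside run_encrypted_training:

```python
import crypten
import crypten.mpc as mpc

crypten.init()

ALICE, BOB = 0, 1

@mpc.run_multiprocess(world_size=2)
def run_encrypted_training():
    # load_from_party broadcasts the tensor loaded by the source party to the
    # other party, so both ranks have to reach this call for it to return.
    # (Paths are placeholders, not the exact ones from the tutorial.)
    x_train = crypten.load_from_party('/tmp/alice_train.pth', src=ALICE)
    y_train = crypten.load_from_party('/tmp/bob_train_labels.pth', src=BOB)
    # ... build the encrypted model and run the training loop ...

run_encrypted_training()
```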

Can anyone help me solve this issue?

Thanks in advance!

The error message:

% CUDA_VISIBLE_DEVICES= python3 Tutorial_7_Training_an_Encrypted_Neural_Network.py
2024-10-23 19:59:00.609660: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-23 19:59:00.620031: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-23 19:59:00.630573: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-23 19:59:00.633892: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-23 19:59:00.642926: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-23 19:59:01.092705: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/whcjimmy/miniconda3/lib/python3.12/site-packages/crypten/__init__.py:334: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  result = load_closure(f, **kwargs)
/home/whcjimmy/miniconda3/lib/python3.12/site-packages/crypten/nn/onnx_converter.py:176: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
  param = torch.from_numpy(numpy_helper.to_array(node))
Epoch: 0 Loss: 0.5381
/home/whcjimmy/workspace/MPCLIBS/CrypTen/tutorials/Tutorial_7_Training_an_Encrypted_Neural_Network.py:159: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  labels = torch.load('/tmp/train_labels.pth')
Process Process-2:
Traceback (most recent call last):
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 2425, in broadcast
    work.wait()
RuntimeError: [../third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:81] Timed out waiting 1800000ms for recv operation to complete

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/whcjimmy/miniconda3/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/whcjimmy/miniconda3/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/crypten/mpc/context.py", line 30, in _launch
    return_value = func(*func_args, **func_kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/whcjimmy/workspace/MPCLIBS/CrypTen/tutorials/Tutorial_7_Training_an_Encrypted_Neural_Network.py", line 168, in run_encrypted_training
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/crypten/__init__.py", line 353, in load_from_party
    result = comm.get().broadcast_obj(None, src)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/crypten/communicator/communicator.py", line 234, in logging_wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/crypten/communicator/distributed_communicator.py", line 318, in broadcast_obj
    dist.broadcast(size, src, group=group)
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 85, in wrapper
    msg_dict = _get_msg_dict(func.__name__, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/whcjimmy/miniconda3/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 51, in _get_msg_dict
    def _get_msg_dict(func_name, *args, **kwargs) -> Dict[str, Any]:
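
In case it helps: as far as I understand, the timeout happens because the other party never reaches the matching dist.broadcast, so gloo's posted recv waits until it gives up after 1800000 ms (30 minutes). A standalone sketch of that failure mode (illustration only, not taken from the tutorial; it uses a 10-second timeout so it fails quickly) would be something like:

```python
# Illustration only: two gloo ranks where rank 0 never reaches the collective
# that rank 1 is waiting on. Rank 1's recv inside dist.broadcast should then
# fail with the same kind of "Timed out waiting ... for recv operation to
# complete" error as in the traceback above.
import os
import time
from datetime import timedelta

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    # Short timeout so the hang surfaces quickly; the default is 30 minutes,
    # which matches the 1800000 ms in the traceback.
    dist.init_process_group(
        "gloo", rank=rank, world_size=world_size, timeout=timedelta(seconds=10)
    )
    t = torch.zeros(1)
    if rank == 1:
        dist.broadcast(t, src=0)  # waits to receive from rank 0
    else:
        time.sleep(60)  # rank 0 never calls the matching broadcast


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```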