[BUG] Circular dependency error with 0.13.2 #5140

Closed
jingyanwangms opened this issue Feb 15, 2024 · 3 comments

Describe the bug
Running the Hugging Face transformers run_glue.py example with DeepSpeed 0.13.2 installed fails at import time with a circular import error:
CUDA_VISIBLE_DEVICES=0 python transformers/examples/pytorch/text-classification/run_glue.py --model_name_or_path microsoft/deberta-large --task_name MRPC --max_seq_length 128 --learning_rate 3e-6 --do_train --output_dir /dev/shm --overwrite_output_dir --max_steps 200 --logging_steps 20 --per_device_train_batch_size 32 --fp16

The error goes away after downgrading to DeepSpeed 0.13.1.

/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/_functional_collectives.py:28: UserWarning: Unable to import torchdynamo util is_torchdynamo_compiling, so won't support torchdynamo correctly
  warnings.warn(
Traceback (most recent call last):
  File "transformers/examples/pytorch/text-classification/run_glue.py", line 28, in <module>
    import evaluate
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/evaluate/__init__.py", line 29, in <module>
    from .evaluation_suite import EvaluationSuite
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/evaluate/evaluation_suite/__init__.py", line 10, in <module>
    from ..evaluator import evaluator
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/evaluate/evaluator/__init__.py", line 17, in <module>
    from transformers.pipelines import SUPPORTED_TASKS as SUPPORTED_PIPELINE_TASKS
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/pipelines/__init__.py", line 47, in <module>
    from .audio_classification import AudioClassificationPipeline
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/pipelines/audio_classification.py", line 21, in <module>
    from .base import Pipeline, build_pipeline_init_args
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/pipelines/base.py", line 34, in <module>
    from ..modelcard import ModelCard
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/modelcard.py", line 48, in <module>
    from .training_args import ParallelMode
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/training_args.py", line 70, in <module>
    from accelerate.state import AcceleratorState, PartialState
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 158, in <module>
    from .fsdp_utils import load_fsdp_model, load_fsdp_optimizer, save_fsdp_model, save_fsdp_optimizer
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/utils/fsdp_utils.py", line 26, in <module>
    import torch.distributed.checkpoint as dist_cp
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/checkpoint/__init__.py", line 7, in <module>
    from .state_dict_loader import load_state_dict
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/checkpoint/state_dict_loader.py", line 10, in <module>
    from .default_planner import DefaultLoadPlanner
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/checkpoint/default_planner.py", line 14, in <module>
    from torch.distributed._tensor import DTensor
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/_tensor/__init__.py", line 353, in <module>
    import torch.distributed._tensor._dynamo_utils
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/_tensor/_dynamo_utils.py", line 1, in <module>
    from torch._dynamo import allow_in_graph
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_dynamo/__init__.py", line 2, in <module>
    from . import allowed_functions, convert_frame, eval_frame, resume_execution
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_dynamo/allowed_functions.py", line 27, in <module>
    from . import config
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_dynamo/config.py", line 58, in <module>
    torch.onnx.is_in_onnx_export: False,
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/__init__.py", line 1884, in __getattr__
    return importlib.import_module(f".{name}", __name__)
  File "/opt/conda/envs/ptca/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/onnx/__init__.py", line 58, in <module>
    from ._internal.onnxruntime import (
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/onnx/_internal/onnxruntime.py", line 35, in <module>
    import onnxruntime # type: ignore[import]
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/__init__.py", line 54, in <module>
    from onnxruntime.capi import onnxruntime_validation
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_validation.py", line 145, in <module>
    has_ortmodule, package_name, version, cuda_version = validate_build_package_info()
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_validation.py", line 140, in validate_build_package_info
    raise import_ortmodule_exception
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_validation.py", line 70, in validate_build_package_info
    from onnxruntime.training.ortmodule import ORTModule # noqa: F401
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/__init__.py", line 36, in <module>
    from .ortmodule import ORTModule # noqa: F401
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/__init__.py", line 132, in <module>
    from .ortmodule import ORTModule # noqa: E402, F401
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/ortmodule.py", line 8, in <module>
    from ._torch_module_factory import TorchModuleFactory
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_torch_module_factory.py", line 8, in <module>
    from ._torch_module_ort import TorchModuleORT
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_torch_module_ort.py", line 13, in <module>
    from ._graph_execution_manager_factory import GraphExecutionManagerFactory
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager_factory.py", line 10, in <module>
    from ._inference_manager import InferenceManager
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_inference_manager.py", line 17, in <module>
    from ._graph_execution_manager import GraphExecutionManager, _RunStateInfo
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 23, in <module>
    from onnxruntime.training.utils.hooks import configure_ort_compatible_zero_stage3
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/utils/hooks/__init__.py", line 19, in <module>
    from ._zero_offload_subscriber import ZeROOffloadSubscriber, configure_ort_compatible_zero_stage3
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/utils/hooks/_zero_offload_subscriber.py", line 137, in <module>
    import deepspeed
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/__init__.py", line 25, in <module>
    from . import ops
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/ops/__init__.py", line 6, in <module>
    from . import adam
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/ops/adam/__init__.py", line 6, in <module>
    from .cpu_adam import DeepSpeedCPUAdam
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 8, in <module>
    from deepspeed.utils import logger
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/utils/__init__.py", line 10, in <module>
    from .groups import *
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/utils/groups.py", line 28, in <module>
    from deepspeed import comm as dist
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/__init__.py", line 7, in <module>
    from .comm import *
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 31, in <module>
    from deepspeed.comm.ccl import CCLBackend
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/ccl.py", line 12, in <module>
    from .torch import TorchBackend
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 100, in <module>
    class TorchBackend(Backend):
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 125, in TorchBackend
    def get_all_gather_function(self):
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/runtime/compiler.py", line 21, in disable
    return torch.compiler.disable(func)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/compiler/__init__.py", line 95, in disable
    return torch._dynamo.disable(fn, recursive)
AttributeError: partially initialized module 'torch._dynamo' has no attribute 'disable' (most likely due to a circular import)
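
The traceback reads as a cycle: torch._dynamo is still initializing when its config pulls in torch.onnx, which imports onnxruntime, which imports deepspeed, which calls torch.compiler.disable and so reaches back into the unfinished torch._dynamo. The sketch below reproduces that failure mode with two toy modules. The names toy_dynamo and toy_backend are hypothetical stand-ins and none of this is code from torch, onnxruntime, or DeepSpeed.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def reproduce_partial_init_error() -> str:
    """Import two toy modules that form a cycle and return the error message."""
    with tempfile.TemporaryDirectory() as tmp:
        # toy_dynamo (hypothetical) imports toy_backend before defining `disable`.
        Path(tmp, "toy_dynamo.py").write_text(textwrap.dedent("""
            import toy_backend

            def disable(fn):
                return fn
        """))
        # toy_backend (hypothetical) uses toy_dynamo.disable at module level,
        # i.e. while toy_dynamo is still only partially initialized.
        Path(tmp, "toy_backend.py").write_text(textwrap.dedent("""
            import toy_dynamo

            @toy_dynamo.disable
            def get_all_gather_function():
                return None
        """))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("toy_dynamo")
        except AttributeError as exc:
            return str(exc)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("toy_dynamo", None)
            sys.modules.pop("toy_backend", None)
    return ""


if __name__ == "__main__":
    print(reproduce_partial_init_error())
```

Importing toy_backend on its own, after toy_dynamo has finished loading, works fine; the error only appears when the attribute is touched at import time from inside the cycle.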


Expected behavior
Importing deepspeed should not raise an error.


jingyanwangms added the bug and training labels Feb 15, 2024

loadams commented Feb 16, 2024

@jingyanwangms - can you share the pip list from your machine? Specifically your torch version?
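
For anyone gathering the same information, here is a minimal sketch that prints only the versions of the packages appearing in the traceback, using the standard-library importlib.metadata. The distribution names in PACKAGES are guesses based on the import paths above (for example, the training build of ONNX Runtime may be installed under a different name), so adjust them to match your environment.

```python
from importlib import metadata

# Distribution names are assumptions inferred from the traceback paths.
PACKAGES = (
    "torch",
    "deepspeed",
    "onnxruntime",
    "onnxruntime-training",
    "transformers",
    "accelerate",
    "evaluate",
)


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}")
```

Because it reads package metadata instead of importing the packages, this runs even in an environment where importing them fails with the error above.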

loadams self-assigned this Feb 16, 2024
jingyanwangms added a commit to microsoft/onnxruntime that referenced this issue Feb 22, 2024
### Description
Move import to when needed to avoid circular dependency error


### Motivation and Context
Fixes dependency error described here:
microsoft/DeepSpeed#5140

---------

Co-authored-by: Thiago Crepaldi
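
"Move import to when needed" describes the usual deferred-import pattern: instead of importing an optional dependency at module level, where it runs as a side effect of importing the package, the import is placed inside the function that uses it. The sketch below illustrates the idea only; load_optional_backend is a hypothetical helper, not the actual onnxruntime change.

```python
import importlib


def load_optional_backend(module_name: str = "deepspeed"):
    """Import an optional dependency on first use (hypothetical helper).

    Nothing is imported when the enclosing module is loaded, so importing
    that module can no longer drag the dependency into an import cycle.
    """
    try:
        # Deferred: executed only when this function is called.
        return importlib.import_module(module_name)
    except ImportError:
        # The dependency is optional; callers must handle its absence.
        return None


if __name__ == "__main__":
    backend = load_optional_backend("json")  # any importable module works here
    print(backend.__name__ if backend else "backend not available")
```

By the time such a function is called, the modules that were mid-initialization during startup (torch._dynamo in the traceback above) have finished loading, so the attribute lookup that failed at import time would be expected to succeed.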

loadams commented Feb 23, 2024

@jingyanwangms - can you share an update on the status of this? It looks like you had to move the import, but it's not clear to me why.


loadams commented Mar 11, 2024

Closing this as the issue seems stale now. Please comment if you are able to come back to this.

loadams closed this as completed Mar 11, 2024
rraminen pushed a commit to rraminen/onnxruntime that referenced this issue May 7, 2024
…oft#19579)

### Description
Move import to when needed to avoid circular dependency error


### Motivation and Context
Fixes dependency error described here:
microsoft/DeepSpeed#5140

---------

Co-authored-by: Thiago Crepaldi <[email protected]>
groenenboomj pushed a commit to ROCm/onnxruntime that referenced this issue May 9, 2024
…oft#19579) (#35)

### Description
Move import to when needed to avoid circular dependency error


### Motivation and Context
Fixes dependency error described here:
microsoft/DeepSpeed#5140

---------

Co-authored-by: jingyanwangms <[email protected]>
Co-authored-by: Thiago Crepaldi <[email protected]>