[BUG] Circular dependency error with 0.13.2 #5140
@jingyanwangms - can you share the pip list from your machine? Specifically, your torch version?
jingyanwangms added a commit to microsoft/onnxruntime that referenced this issue on Feb 22, 2024:
Description: Move the import to when it is needed, to avoid a circular dependency error. Motivation and Context: fixes the dependency error described here: microsoft/DeepSpeed#5140. Co-authored-by: Thiago Crepaldi <[email protected]>
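The fix pattern the commit describes (moving a module-level import into the function that needs it) can be sketched as below. The function name is hypothetical, and the standard-library `json` module stands in for the heavy dependency (deepspeed) so the sketch stays runnable; the point is only that the deferred import runs at call time, after any in-flight imports such as `torch._dynamo` have finished initializing.

```python
# Before (module level, can participate in an import cycle):
# import deepspeed

def configure_subscriber():
    # Deferred import: resolved on the first call, not while this module
    # itself is being imported. "json" is a stand-in for the real heavy
    # dependency; the name configure_subscriber is illustrative only.
    import json
    return json.dumps({"status": "ok"})

print(configure_subscriber())
```

The trade-off is a small one-time cost on the first call instead of at import time, which is usually acceptable when it breaks a top-level cycle.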
@jingyanwangms - can you share an update on the status of this? It looks like you had to move the import, but it's not clear to me why.
Closing this as the issue seems stale now. Please comment if you are able to come back to this.
rraminen pushed a commit to rraminen/onnxruntime that referenced this issue on May 7, 2024:
(…oft#19579) Description: Move the import to when it is needed, to avoid a circular dependency error. Motivation and Context: fixes the dependency error described here: microsoft/DeepSpeed#5140. Co-authored-by: Thiago Crepaldi <[email protected]>
groenenboomj pushed a commit to ROCm/onnxruntime that referenced this issue on May 9, 2024:
(…oft#19579) (#35) Description: Move the import to when it is needed, to avoid a circular dependency error. Motivation and Context: fixes the dependency error described here: microsoft/DeepSpeed#5140. Co-authored-by: jingyanwangms <[email protected]>; Co-authored-by: Thiago Crepaldi <[email protected]>
Describe the bug
Run the Hugging Face Transformers GLUE example:
CUDA_VISIBLE_DEVICES=0 python transformers/examples/pytorch/text-classification/run_glue.py --model_name_or_path microsoft/deberta-large --task_name MRPC --max_seq_length 128 --learning_rate 3e-6 --do_train --output_dir /dev/shm --overwrite_output_dir --max_steps 200 --logging_steps 20 --per_device_train_batch_size 32 --fp16
The error goes away after downgrading to deepspeed 0.13.1.
/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/_functional_collectives.py:28: UserWarning: Unable to import torchdynamo util is_torchdynamo_compiling, so won't support torchdynamo correctly
  warnings.warn(
Traceback (most recent call last):
  File "transformers/examples/pytorch/text-classification/run_glue.py", line 28, in <module>
    import evaluate
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/evaluate/__init__.py", line 29, in <module>
    from .evaluation_suite import EvaluationSuite
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/evaluate/evaluation_suite/__init__.py", line 10, in <module>
    from ..evaluator import evaluator
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/evaluate/evaluator/__init__.py", line 17, in <module>
    from transformers.pipelines import SUPPORTED_TASKS as SUPPORTED_PIPELINE_TASKS
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/pipelines/__init__.py", line 47, in <module>
    from .audio_classification import AudioClassificationPipeline
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/pipelines/audio_classification.py", line 21, in <module>
    from .base import Pipeline, build_pipeline_init_args
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/pipelines/base.py", line 34, in <module>
    from ..modelcard import ModelCard
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/modelcard.py", line 48, in <module>
    from .training_args import ParallelMode
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/transformers-4.38.0.dev0-py3.8.egg/transformers/training_args.py", line 70, in <module>
    from accelerate.state import AcceleratorState, PartialState
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 158, in <module>
    from .fsdp_utils import load_fsdp_model, load_fsdp_optimizer, save_fsdp_model, save_fsdp_optimizer
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/accelerate/utils/fsdp_utils.py", line 26, in <module>
    import torch.distributed.checkpoint as dist_cp
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/checkpoint/__init__.py", line 7, in <module>
    from .state_dict_loader import load_state_dict
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/checkpoint/state_dict_loader.py", line 10, in <module>
    from .default_planner import DefaultLoadPlanner
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/checkpoint/default_planner.py", line 14, in <module>
    from torch.distributed._tensor import DTensor
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/_tensor/__init__.py", line 353, in <module>
    import torch.distributed._tensor._dynamo_utils
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/distributed/_tensor/_dynamo_utils.py", line 1, in <module>
    from torch._dynamo import allow_in_graph
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_dynamo/__init__.py", line 2, in <module>
    from . import allowed_functions, convert_frame, eval_frame, resume_execution
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_dynamo/allowed_functions.py", line 27, in <module>
    from . import config
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/_dynamo/config.py", line 58, in <module>
    torch.onnx.is_in_onnx_export: False,
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/__init__.py", line 1884, in __getattr__
    return importlib.import_module(f".{name}", name)
  File "/opt/conda/envs/ptca/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/onnx/__init__.py", line 58, in <module>
    from ._internal.onnxruntime import (
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/onnx/_internal/onnxruntime.py", line 35, in <module>
    import onnxruntime  # type: ignore[import]
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/__init__.py", line 54, in <module>
    from onnxruntime.capi import onnxruntime_validation
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_validation.py", line 145, in <module>
    has_ortmodule, package_name, version, cuda_version = validate_build_package_info()
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_validation.py", line 140, in validate_build_package_info
    raise import_ortmodule_exception
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_validation.py", line 70, in validate_build_package_info
    from onnxruntime.training.ortmodule import ORTModule  # noqa: F401
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/__init__.py", line 36, in <module>
    from .ortmodule import ORTModule  # noqa: F401
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/__init__.py", line 132, in <module>
    from .ortmodule import ORTModule  # noqa: E402, F401
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/ortmodule.py", line 8, in <module>
    from ._torch_module_factory import TorchModuleFactory
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_torch_module_factory.py", line 8, in <module>
    from ._torch_module_ort import TorchModuleORT
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_torch_module_ort.py", line 13, in <module>
    from ._graph_execution_manager_factory import GraphExecutionManagerFactory
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager_factory.py", line 10, in <module>
    from ._inference_manager import InferenceManager
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_inference_manager.py", line 17, in <module>
    from ._graph_execution_manager import GraphExecutionManager, _RunStateInfo
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 23, in <module>
    from onnxruntime.training.utils.hooks import configure_ort_compatible_zero_stage3
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/utils/hooks/__init__.py", line 19, in <module>
    from ._zero_offload_subscriber import ZeROOffloadSubscriber, configure_ort_compatible_zero_stage3
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/onnxruntime/training/utils/hooks/_zero_offload_subscriber.py", line 137, in <module>
    import deepspeed
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/__init__.py", line 25, in <module>
    from . import ops
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/ops/__init__.py", line 6, in <module>
    from . import adam
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/ops/adam/__init__.py", line 6, in <module>
    from .cpu_adam import DeepSpeedCPUAdam
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 8, in <module>
    from deepspeed.utils import logger
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/utils/__init__.py", line 10, in <module>
    from .groups import *
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/utils/groups.py", line 28, in <module>
    from deepspeed import comm as dist
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/__init__.py", line 7, in <module>
    from .comm import *
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 31, in <module>
    from deepspeed.comm.ccl import CCLBackend
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/ccl.py", line 12, in <module>
    from .torch import TorchBackend
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 100, in <module>
    class TorchBackend(Backend):
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 125, in TorchBackend
    def get_all_gather_function(self):
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/deepspeed/runtime/compiler.py", line 21, in disable
    return torch.compiler.disable(func)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/torch/compiler/__init__.py", line 95, in disable
    return torch._dynamo.disable(fn, recursive)
AttributeError: partially initialized module 'torch._dynamo' has no attribute 'disable' (most likely due to a circular import)
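Stripped to its essentials, the failure mode in the traceback is one module reading an attribute from another while the latter is still mid-import. A minimal, self-contained reproduction (module names `mod_a`/`mod_b` and the `disable` function are invented for illustration; the modules are written to a temp directory so the snippet runs on its own):

```python
import os
import sys
import tempfile

# Two tiny modules with a top-level cycle: mod_a imports mod_b, and
# mod_b reaches back into mod_a before mod_a has finished executing,
# so mod_a.disable does not exist yet at that point.
d = tempfile.mkdtemp()
with open(os.path.join(d, "mod_a.py"), "w") as f:
    f.write("import mod_b\n\ndef disable(fn):\n    return fn\n")
with open(os.path.join(d, "mod_b.py"), "w") as f:
    f.write("import mod_a\nwrapped = mod_a.disable(lambda: None)\n")

sys.path.insert(0, d)
err = None
try:
    import mod_a  # triggers mod_b, which touches the half-built mod_a
except AttributeError as e:
    err = e
# On Python 3.8+ this prints an AttributeError mentioning the
# "partially initialized module", mirroring the traceback above.
print(err)
```

Breaking the cycle the same way the onnxruntime commit did (moving one of the top-level imports into the function that uses it) makes the import succeed.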
To Reproduce
Run the command shown under "Describe the bug" above.
Expected behavior
Importing deepspeed should not raise an error.