[BUG] Error occurred in Torch files #497

Open

Ext1nguisher opened this issue Jan 31, 2025 · 0 comments
Ext1nguisher commented Jan 31, 2025

I'm new to AI models, but I'm not new to Python, so I hope I can be of help.

Describe the bug
I tried to set up the model locally and followed the instructions in the README. At first, everything went smoothly, until I tried this command:

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /mypath/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200 
Then a traceback occurred:

Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 229, in launch_agent
    master_addr, master_port = _get_addr_and_port(rdzv_parameters)
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 169, in _get_addr_and_port
    master_addr, master_port = parse_rendezvous_endpoint(endpoint, default_port=-1)
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/utils.py", line 104, in parse_rendezvous_endpoint
    raise ValueError(
ValueError: The hostname of the rendezvous endpoint '/deepseek/DeepSeek-V3-Demo:29500' must be a dot-separated list of labels, an IPv4 address, or an IPv6 address.
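
For what it's worth, the failure can be reproduced outside of torchrun with the same helper that raises in the traceback. From the stack trace it looks like the string passed via --master-addr (my $ADDR variable) ends up as the host part of the rendezvous endpoint. A minimal sketch, assuming the same PyTorch install; the address 192.168.1.10 below is only a placeholder showing a form the parser accepts, not anything from my setup:

from torch.distributed.elastic.rendezvous.utils import parse_rendezvous_endpoint

# The --master-addr value becomes the host part of the rendezvous endpoint.
# A filesystem-style path is rejected, reproducing the ValueError above:
try:
    parse_rendezvous_endpoint("/deepseek/DeepSeek-V3-Demo:29500", default_port=-1)
except ValueError as err:
    print(err)

# A dot-separated hostname or an IPv4/IPv6 address parses cleanly
# (192.168.1.10 is a placeholder, not an address from this setup):
print(parse_rendezvous_endpoint("192.168.1.10:29500", default_port=-1))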

To Reproduce
Follow the instructions in the README up to this point, then run my command above instead.
Expected behavior
I am able to chat with DeepSeek-V3 locally.

Screenshots

[Screenshot attached in the original issue]

Additional context
I'm using WSL (Windows Subsystem for Linux) on Windows 11, so it's somewhat like running in a virtual machine.
I'm not sure whether this is a DeepSeek issue or not.
