Are InfiniBand and Torch Elastic system requirements? #7
Got this working on my local 3090. I'm adding the modifications here: https://github.com/abacaj/adept-inference-local-3090
This change is working for me: abacaj@4e6a503. If you don't want to run this in Docker (I didn't), you'll need to follow these steps at minimum:
Next:
Next, find the directory where the fused kernels were installed. Mine was as below (venv):
Try running the app with this command from the root project directory:
Send a JSON PUT request to this address:
Example body:
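The request step above can be sketched with `curl`. This is a minimal sketch, not the exact command from this thread (the original address and body were not preserved): the URL `http://localhost:5000/api` and the `prompts` / `tokens_to_generate` fields are assumptions taken from upstream Megatron-LM's text generation server, and `make_body` / `send_prompt` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumed endpoint: upstream Megatron-LM serves PUT /api on Flask's default
# port 5000. This fork's actual address may differ; override via API_URL.
API_URL="${API_URL:-http://localhost:5000/api}"

# Hypothetical helper: build a JSON body using upstream Megatron-LM's field names.
# The prompt is NOT JSON-escaped, so avoid quotes and backslashes in it.
make_body() {
  local prompt="$1"
  local tokens="$2"
  printf '{"prompts": ["%s"], "tokens_to_generate": %d}\n' "$prompt" "$tokens"
}

# Hypothetical helper: send the PUT request (the server must already be running).
send_prompt() {
  curl -s -X PUT "$API_URL" \
    -H 'Content-Type: application/json' \
    -d "$(make_body "$1" "$2")"
}
```

With the server up, `send_prompt "The capital of France is" 32` would send one prompt and ask for 32 new tokens, assuming the field names match this fork.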
Thanks for figuring this out for everyone @abacaj
Hi all, is there a solution for the missing `/dev/infiniband` device? I don't have permission to install the InfiniBand driver on my system.
Megatron seems to be trying to connect to InfiniBand even when `NCCL_NET=Socket` is set (`Error: network IB not found.`). `.docker_launch.sh` has `--device=/dev/infiniband`, yet the readme mentions no related hardware requirements.

Running `run_text_generation_server.py` fails with `ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set`, which again suggests the code expects to run on a specific server configuration.

Here are some other miscellaneous errors I ran into when building:
- The readme's `docker build` command errors on Windows: `docker build -f docker/Dockerfile -t 'adeptdocker' .` -> `docker build -f docker/Dockerfile -t adeptdocker .`
- `flash-attn==2.0.0.post1` fails to install and retries for 20 minutes, locking Docker into an operation that can't be aborted without rebooting: `pip install flash-attn==2.0.0.post1` -> `pip install flash-attn==2.2.1`
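For the `MASTER_ADDR` error and the InfiniBand lookup described above, one possible single-machine workaround is to supply the `env://` rendezvous variables yourself and tell NCCL not to use InfiniBand. This is a sketch that assumes one process on one GPU. `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE` and `RANK` are the variables PyTorch's `env://` initialization reads, and `NCCL_IB_DISABLE=1` is a documented NCCL setting. The port value is an arbitrary choice, and it is an assumption that this is sufficient for this repo.

```shell
#!/usr/bin/env bash
# Rendezvous variables read by torch.distributed's env:// init method.
# Assumes a single process on a single machine; 6000 is an arbitrary free port.
export MASTER_ADDR=localhost
export MASTER_PORT=6000
export WORLD_SIZE=1
export RANK=0

# Ask NCCL not to use the InfiniBand transport.
export NCCL_IB_DISABLE=1
```

Source this in the shell you launch from, then start `run_text_generation_server.py` with your usual arguments. Alternatively, launch it through `torchrun --nproc_per_node=1`, which sets the rendezvous variables itself. If you use Docker on a host without `/dev/infiniband`, you will probably also need to drop `--device=/dev/infiniband` from the launch script, because Docker refuses to map a device that does not exist.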