-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xFormers can't load C++/CUDA extensions. xFormers was built for: #10
Comments
+) This is my pip list (gnfactor) root@node100:/GNFactor# pip list absl-py 2.1.0 |
I got an error when starting gnfactor training:
my env setting is below
GPU: nvidia l40s
cuda: 11.7 ( in a docker environment)
ubuntu: 22.04
it seems that xformers is incompatible with torch 2.4.0 but " pip install torchvision --upgrade" in the installation.md occurs torch==2.4.0.
Do you know how to handle this? please help me ! :(
`
Error executing job with overrides: ['method=GNFACTOR_BC', 'rlbench.task_name=GNFACTOR_BC_20240725', 'rlbench.demo_path=/GNFactor/data/train_data', 'replay.path=/GNFactor/replay/GNFACTOR_BC_20240725', 'framework.start_seed=0', 'framework.use_wandb=False', 'method.use_wandb=False', 'framework.wandb_group=GNFACTOR_BC_20240725', 'ddp.num_devices=1', 'ddp.master_port=12345']
Traceback (most recent call last):
File "/GNFactor/GNFactor/train.py", line 100, in main
mp.spawn(run_seed_fn.run_seed,
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 282, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/GNFactor/third_party/YARR/yarr/runners/offline_train_runner.py", line 175, in start
loss = self._step(i, batch)
File "/GNFactor/third_party/YARR/yarr/runners/offline_train_runner.py", line 97, in _step
update_dict = self._agent.update(i, sampled_batch)
File "/GNFactor/GNFactor/helpers/preprocess_agent.py", line 58, in update
return self._pose_agent.update(step, replay_sample)
File "/GNFactor/GNFactor/agents/gnfactor_bc/qattention_stack_agent.py", line 47, in update
update_dict = qa.update(step, replay_sample)
File "/GNFactor/GNFactor/agents/gnfactor_bc/qattention_gnfactor_bc_agent.py", line 772, in update
rendering_loss_dict = self._q(obs,
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/GNFactor/GNFactor/agents/gnfactor_bc/qattention_gnfactor_bc_agent.py", line 227, in forward
rendering_loss_dict = self._neural_renderer(voxel_feat=voxel_grid_feature,
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward
else self._run_dd
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=self.attention_op)
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/xformers/ops/fmha/init.py", line 196, in memory_efficient_attention
return _memory_efficient_attention(
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/xformers/ops/fmha/init.py", line 294, in _memory_efficient_attention
return _memory_efficient_attention_forward(
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/xformers/ops/fmha/init.py", line 310, in _memory_efficient_attention_forward
op = _dispatch_fw(inp)
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/xformers/ops/fmha/dispatch.py", line 98, in _dispatch_fw
return _run_priority_list(
File "/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/xformers/ops/fmha/dispatch.py", line 73, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for
memory_efficient_attention_forward
with inputs:query : shape=(1, 4096, 1, 512) (torch.float32)
key : shape=(1, 4096, 1, 512) (torch.float32)
value : shape=(1, 4096, 1, 512) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
cutlassF
is not supported because:xFormers wasn't build with CUDA support
Operator wasn't built - see
python -m xformers.info
for more infoflshattF
is not supported because:xFormers wasn't build with CUDA support
dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
max(query.shape[-1] != value.shape[-1]) > 128
Operator wasn't built - see
python -m xformers.info
for more infotritonflashattF
is not supported because:xFormers wasn't build with CUDA support
dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
max(query.shape[-1] != value.shape[-1]) > 128
requires A100 GPU
smallkF
is not supported because:xFormers wasn't build with CUDA support
max(query.shape[-1] != value.shape[-1]) > 32
Operator wasn't built - see
python -m xformers.info
for more infounsupported embed per head: 512
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[2024-07-25 07:46:05,821][root][INFO] - Using env device cuda:0.
[2024-07-25 07:46:05,836][root][INFO] - Evaluating seed 0.
[2024-07-25 07:46:06,043][xformers][WARNING] - WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.0+cu118 with CUDA 1108 (you have 2.4.0+cu121)
Python 3.9.16 (you have 3.9.19)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/xformers/triton/softmax.py:30: FutureWarning:
torch.cuda.amp.custom_fwd(args...)
is deprecated. Please usetorch.amp.custom_fwd(args..., device_type='cuda')
instead.@custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/root/miniconda3/envs/gnfactor/lib/python3.9/site-packages/xformers/triton/softmax.py:87: FutureWarning:
torch.cuda.amp.custom_bwd(args...)
is deprecated. Pl File "/romaking attention of type 'vanilla-xformers' with 512 in_channels [89/1941]building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
INFO:fvcore.common.checkpoint:[Checkpointer] Loading from sd://v1-3 ...
INFO:iopath.common.file_io:URL https://huggingface.co/CompVis/stable-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt cached in /root/.torch/iopath_cache/CompVis/stable
-diffusion-v-1-3-original/resolve/main/sd-v1-3.ckpt
WARNING:fvcore.common.checkpoint:The checkpoint state_dict contains keys that are not used by the model:
model_ema.{decay, num_updates}
diffusion feature dims: [512, 512, 2560, 1920, 960, 640, 512, 512]
[NeuralRenderer] foundation model diffusion is build. loss weight: 0.01
[NeuralRenderer] rgb loss weight: 1.0
[NeuralRenderer]: True
INFO:root:# Q Params: 44 M
[rank0]:[W725 07:45:58.101337864 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the fo
rward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unu
sed parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations
to have unused parameters. (function operator())
Error executing job with overrides: ['method=GNFACTOR_BC', 'rlbench.task_name=GNFACTOR_BC_20240725', 'rlbench.demo_path=/GNFactor/data/train_data', 'replay.path=/GNFactor
/replay/GNFACTOR_BC_20240725', 'framework.start_seed=0', 'framework.use_wandb=False', 'method.use_wandb=False', 'framework.wandb_group=GNFACTOR_BC_20240725', 'ddp.num_dev
ices=1', 'ddp.master_port=12345']
Traceback (most recent call last):
File "/GNFactor/GNFactor/train.py", line 100, in main`
The text was updated successfully, but these errors were encountered: