Multi Pin Bumps across PT/AO/tune/ET #1367
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1367
Note: Links to docs will display an error until the docs builds have been completed. ❌ 5 New Failures, 1 Pending. As of commit 9579f18 with merge base 570aebc: NEW FAILURES - the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This looks accurate; according to https://download.pytorch.org/whl/nightly/torchvision/ there are only Windows builds for that day. 20241112 appears to have both Linux and Windows builds.
Initial debugging shows the test-cpu-aoti segfault is within aoti_torch_cpu_cat, which is automatically generated by https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen_aoti_c_shim.py. Digging up the generated source now.
torchchat/distributed/checkpoint.py (Outdated)
@@ -96,6 +96,7 @@ def _load_checkpoints_from_storage(
        checkpoint_path,
        map_location=builder_args.device,
        mmap=True,
+       weight_only=False,
Why does it need to be false? All LLMs should be loadable with weights_only, shouldn't they? (Also, there is no such option as weight_only (or so I hope :P ))
Suggested change:
-    weight_only=False,
+    weights_only=True,
Good catch on the typo.
As for setting it to False: I'd rather keep the behavior consistent in a pin bump PR; we can flip it in a separate PR.
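For context, a minimal sketch of the two modes under discussion (the checkpoint path is a placeholder): with weights_only=True, torch.load uses a restricted unpickler that only admits tensors and allow-listed types, so checkpoints that pickle arbitrary Python objects still need weights_only=False.

```python
import torch

checkpoint_path = "checkpoint.pth"  # placeholder path for illustration

# Restricted unpickler: only tensors and allow-listed types are deserialized.
state = torch.load(
    checkpoint_path,
    map_location="cpu",
    mmap=True,
    weights_only=True,
)

# Legacy behavior (full pickle); needed only if the checkpoint stores
# arbitrary Python objects alongside the weights.
# state = torch.load(checkpoint_path, map_location="cpu", mmap=True, weights_only=False)
```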
torchchat/export.py (Outdated)
@@ -238,7 +238,7 @@ def _to_core_aten(
    raise ValueError(
        f"Expected passed in model to be an instance of fx.GraphModule, got {type(model)}"
    )
-   core_aten_ep = export(model, example_inputs, dynamic_shapes=dynamic_shapes)
+   core_aten_ep = export_for_training(model, example_inputs, dynamic_shapes=dynamic_shapes)
Not sure what we are doing here, but shouldn't TorchChat be exporting for inference?
This was picked up from @tugsbayasgalan's PR migrating away from export(), but export_for_inference does sound more in line with what we want.
@tugsbayasgalan Can you share info on the new APIs?
Yep, the intended use for the inference IR is that the user exports to a training IR and calls run_decompositions() to lower to the inference IR. In this flow, after core_aten_ep there is a to_edge call which lowers to inference. The export team is moving the IR to a non-functional training IR, so export_for_training will exist as an alias for the official export. After we actually migrate the official export, we will replace this call with export.
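A minimal sketch of that flow with the torch.export APIs described above (the toy module and example inputs are placeholders):

```python
import torch
from torch.export import export_for_training

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x) + 1

example_inputs = (torch.randn(2, 3),)

# Export to the (non-functional) training IR first...
training_ep = export_for_training(TinyModel(), example_inputs)

# ...then lower to the inference IR by running decompositions, roughly
# what the later to_edge step does after core_aten_ep in this flow.
inference_ep = training_ep.run_decompositions()
print(inference_ep.graph_module.code)
```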
Generated source looks OK. Here's what doesn't look OK in the generated Inductor .cpp file; the problem seems to be:
@desertfire any chance the above is a quick fix for you?
Actually, we might just need pytorch/pytorch#139411.
No torchvision nightly again today. I'm guessing we could probably use torchvision from yesterday with torch from today?
I had issues with Vision nightlies requiring the corresponding PT nightly a few weeks back; I'll give it another go.
Update: yup, vision is strict; will need to wait again.
_convert_weight_to_int4pack breakage appears to be from pytorch/pytorch#139611; I guess it's now called _convert_weight_to_int4pack_for_cpu.
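If both op names need to be supported during the transition, a hedged compatibility sketch could look like the following (assuming both ops keep a (weight, inner_k_tiles) signature; this guard is illustrative, not the actual torchchat/AO fix):

```python
import torch

def pack_int4_weight(weight: torch.Tensor, inner_k_tiles: int) -> torch.Tensor:
    # Prefer the CPU-specific op split out by pytorch/pytorch#139611 when it exists.
    if weight.device.type == "cpu" and hasattr(
        torch.ops.aten, "_convert_weight_to_int4pack_for_cpu"
    ):
        return torch.ops.aten._convert_weight_to_int4pack_for_cpu(weight, inner_k_tiles)
    # Fall back to the original op (e.g. for CUDA or older nightlies).
    return torch.ops.aten._convert_weight_to_int4pack(weight, inner_k_tiles)
```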
Beat me to it; luckily AO has a fix, so we'll need a bump there too: pytorch/ao#1278
pytorch/pytorch#139411 also got reverted on pt/pt, so that's fun.
pytorch/pytorch#139411 is relanded.
The missing omp.h for aarch64 is a regression on pt/pt; we're looking into it and will pin bump to a fix when found.
@@ -9,23 +9,17 @@ on:

jobs:
  test-cuda:
-    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
What's the difference between these two?
The former is being replaced as part of the new wheel build: pytorch/pytorch#123649
Note that the changes to gguf_loader are suspicious at best and are an attempt to recover the previous behavior prior to pytorch/pytorch#139611. Depending on the remaining CI, I may disable gguf-compile temporarily if it is the remaining failure; #1404 looks into this.
Hmmm, looks like the wheels aren't fixed with the changes...
Should fix the MacOS wheel regression
The gpu-aoti errors are from byte alignment (compiled assuming 16-byte alignment, but the input is not aligned), related to pytorch/pytorch#140664; we will fix that in a later bump via pytorch/pytorch#143236.
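For context, a small illustration of the alignment issue (illustrative only, not the fix landing in pytorch#143236): the compiled artifact assumes 16-byte-aligned input storage, so a view offset into a larger buffer can trip the assertion.

```python
import torch

ALIGNMENT = 16  # byte alignment assumed by the compiled AOTI kernels

def is_aligned(t: torch.Tensor, alignment: int = ALIGNMENT) -> bool:
    return t.data_ptr() % alignment == 0

x = torch.randn(1024)
print(is_aligned(x))      # fresh allocations are typically 16-byte aligned
print(is_aligned(x[1:]))  # a 4-byte-offset view of float32 data is not
```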
Accounts for:
- PyTorch changing the weights_only default from False to True for torch.load (weights_only default flip for torch.load #1356)
- Moving from export to export_for_training (Use training IR in torchchat export #1319); should also fix the cuDNN error "RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph." (huggingface/diffusers#9704)
- _convert_weight_to_int4pack API change from "Split int4wo weight packing" (pytorch#139611), requiring an AO pin bump for "Add Int4CPULayout and update int4 woq" (ao#1278)
- Change in CUDA support in PT wheels ("Nightly builds missing from PyTorch cu121 repository since November 12, 2024", pytorch#140885)
- Lock on the llama.cpp SHA to avoid active changes (e.g. deprecating make: "make : deprecate", ggerganov/llama.cpp#10514)
- PT buildwheel migration ("[RFC] PyTorch next wheel build platform: manylinux-2.28", pytorch#123649)
- Missing OMP in nightly regression on aarch64 ("MacOS wheels are no longer built with OpenMP runtime", pytorch#142266)
Update:
Merging while accepting the errors in:
- *-gpu-aoti, aot-cuda: due to byte alignment related to "[AOTI] Fix #140546 and support AOTI package load for Intel GPU." (pytorch#140664). Fix is in progress and will be picked up in a separate pin bump: "[AOTI] Relax input alignment assertion" (pytorch#143236)
- compile-gguf: working on updating the adoption in torchchat ("Update int4pack related in torchchat gguf" #1404)