Multi Pin Bumps across PT/AO/tune/ET #1367

Jack-Khuu · 2024-11-12T19:27:23Z

Accounts for:

PyTorch changing weight_only default from False to True weights_only default flip for torch.load #1356
~~Moving from export to export_for_training Use training IR in torchchat export #1319~~
- Fixed in Bump PT pin to 20241028 #1419
~~Should also fix cuDNN error: RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph. huggingface/diffusers#9704~~
- Resolution indirect from this PR
_convert_weight_to_int4pack API change from Split int4wo weight packing pytorch#139611, requiring AO PinBump for Add Int4CPULayout and update int4 woq ao#1278
Change in CUDA support in PT wheels Nightly builds missing from PyTorch cu121 repository since November 12, 2024 pytorch#140885
~~Lock on llama.cpp SHA to avoid active changes (e.g. deprecating make make : deprecate ggerganov/llama.cpp#10514)~~
- Fixed in Bump torch pin to 20241010 #1400
PT buildwheel migration [RFC] PyTorch next wheel build platform: manylinux-2.28 pytorch#123649
Missing OMP in nightly regression on aarch64 MacOS wheels are no longer built with OpenMP runtime pytorch#142266

Update:
Merging while accepting the errors in

*-gpu-aoti, aot-cuda: Due to byte alignment related to [AOTI] Fix #140546 and support AOTI package load for Intel GPU. pytorch#140664. Fix in progress and will be picked up in separate pin bump [AOTI] Relax input alignment assertion pytorch#143236
compile-gguf: Working on updating the adoption in torchchat Update int4pack related in torchchat gguf #1404

pytorch-bot · 2024-11-12T19:27:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1367

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Pending

As of commit 9579f18 with merge base 570aebc ():

NEW FAILURES - The following jobs have failed:

pull / compile-gguf (macos-14) (gh)
RuntimeError: Error(s) in loading state_dict for TextOnlyModel:
pull / test-gpu-aoti-bfloat16 (cuda, stories15M) / linux-job (gh)
RuntimeError: run_func_( container_handle_, input_handles.data(), input_handles.size(), output_handles.data(), output_handles.size(), reinterpret_cast<AOTInductorStreamHandle>(stream_handle), proxy_executor_handle_) API call failed at /pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 107
pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job (gh)
RuntimeError: run_func_( container_handle_, input_handles.data(), input_handles.size(), output_handles.data(), output_handles.size(), reinterpret_cast<AOTInductorStreamHandle>(stream_handle), proxy_executor_handle_) API call failed at /pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 107
pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job (gh)
RuntimeError: run_func_( container_handle_, input_handles.data(), input_handles.size(), output_handles.data(), output_handles.size(), reinterpret_cast<AOTInductorStreamHandle>(stream_handle), proxy_executor_handle_) API call failed at /pytorch/torch/csrc/inductor/aoti_runner/model_container_runner.cpp, line 107
Run the aoti runner with CUDA using stories / test-runner-aot-cuda / linux-job (gh)
RuntimeError: Command docker exec -t 953098219f3d19c11f82261431afb9c650ae9cd156ce94a98ebfdfe386042f7e /exec failed with exit code 134

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok · 2024-11-12T19:48:16Z

Could not find a version that satisfies the requirement torchvision==0.20.0.dev20241111

this looks accurate; according to https://download.pytorch.org/whl/nightly/torchvision/ there are only windows builds for that day. 20241112 appears to have both linux and windows.

swolchok · 2024-11-12T23:35:21Z

initial debugging shows the test-cpu-aoti segfault is within aoti_torch_cpu_cat, which is automatically generated by https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen_aoti_c_shim.py . digging up the generated source now.

malfet · 2024-11-12T23:41:15Z

torchchat/distributed/checkpoint.py

@@ -96,6 +96,7 @@ def _load_checkpoints_from_storage(
        checkpoint_path,
        map_location=builder_args.device,
        mmap=True,
+        weight_only=False,


Why does it needs false? All LLMs should be loadable with weights_only, shouldn't they? (Also, there are no such option as weight_only (or so I hope :P ))

Suggested change

weight_only=False,

weights_only=True,

Good catch on the typo;

As for setting it to False: I'd rather keep it behavior consistent in a pin bump PR; we can flip in a separate PR

malfet · 2024-11-12T23:41:56Z

torchchat/export.py

@@ -238,7 +238,7 @@ def _to_core_aten(
            raise ValueError(
                f"Expected passed in model to be an instance of fx.GraphModule, got {type(model)}"
            )
-        core_aten_ep = export(model, example_inputs, dynamic_shapes=dynamic_shapes)
+        core_aten_ep = export_for_training(model, example_inputs, dynamic_shapes=dynamic_shapes)


Not sure what we are doing here, but shouldn't TorchChat be exporting for inference?

This was picked up from @tugsbayasgalan's PR migrating away from export(), but export_for_inference does sound more in line with what we want

@tugsbayasgalan Can you share info on the new APIs?

Yep the intended use for inference IR is that user will export to a training IR and call run_decompositions() to lower to inference IR. In this flow, after core_aten_ep, there is to_edge call which lowers to inference. Export team is moving the IR to non-functional training IR so export_for_training will exist as an alias to official export. After we actually migrate official export, we will replace this call with export.

swolchok · 2024-11-12T23:51:16Z

digging up the generated source now.

generated source looks OK. here's what doesn't look OK in the generated inductor .cpp file:

    AtenTensorHandle buf0_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
    RAIIAtenTensorHandle buf0(buf0_handle);
    AtenTensorHandle buf1_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf1_handle));
    RAIIAtenTensorHandle buf1(buf1_handle);
    cpp_fused_div_remainder_0((const uint8_t*)(self___model_tok_embeddings__buffers__weight.data_ptr()), (uint8_t*)(buf0.data_ptr()), (uint8_t*)(buf1.data_ptr()));
    // Topologically Sorted Source Nodes: [weight_unpacked], Original ATen: [aten.stack]
    static constexpr int64_t int_array_0[] = {32000LL, 144LL, 1LL};
    static constexpr int64_t int_array_1[] = {144LL, 1LL, 0LL};
    auto tmp_tensor_handle_0 = reinterpret_tensor_wrapper(buf0, 3, int_array_0, int_array_1, 0LL);
    auto tmp_tensor_handle_1 = reinterpret_tensor_wrapper(buf1, 3, int_array_0, int_array_1, 0LL);
    const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};
    AtenTensorHandle buf3_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_cat(var_array_0, 2, -1LL, &buf3_handle));

The problem seems to be const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)}; -- this is creating RAIIATenTensorHandles, whose operator ATenTensorHandle is immediately called, and then they're destroyed (which decrements the refcount), so the net effect is (I think) to create dangling ATenTensorHandles.

swolchok · 2024-11-13T00:00:43Z

@desertfire any change the above is a quick fix for you?

swolchok · 2024-11-13T00:04:08Z

actually we might just need pytorch/pytorch#139411

swolchok · 2024-11-13T15:51:12Z

no torchvision nightly again today. I'm guessing we could probably use torchvision from yesterday with torch from today?

Jack-Khuu · 2024-11-13T18:54:29Z

I had issues with Vision nightlies requiring the corresponding PT nightly few weeks back, I'll give it another go

Update: yup, vision is strict; will need to wait again

swolchok · 2024-11-14T16:37:50Z

_convert_weight_to_int4pack breakage appears to be from pytorch/pytorch#139611; I guess it's now called _convert_weight_to_int4pack_for_cpu .

Jack-Khuu · 2024-11-14T17:28:07Z

Best me to it; luckily AO has a fix so we'll need a bump there too: pytorch/ao#1278

Jack-Khuu · 2024-11-14T19:34:07Z

pytorch/pytorch#139411 Also got reverted on pt/pt so that's fun

desertfire · 2024-11-18T14:48:54Z

pytorch/pytorch#139411 Also got reverted on pt/pt so that's fun

pytorch/pytorch#139411 is relanded.

…to 11 for almalinux

Jack-Khuu · 2024-12-06T22:50:26Z

Missing omp.h for aarch is a regression on pt/pt, we're looking into it and will pinbump to a fix when found

pytorch/pytorch#142266

larryliu0820 · 2024-12-06T22:51:50Z

.github/workflows/more-tests.yml

@@ -9,23 +9,17 @@ on:

 jobs:
  test-cuda:
-    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main


What's the difference between these two?

Former is being replaced in the new wheel build: pytorch/pytorch#123649

Jack-Khuu · 2024-12-07T01:28:38Z

Note that the changes to gguf_loader are suspicious at best and an attempt to recover previous behavior prior to pytorch/pytorch#139611

Depending on the remaining CI, I may disable gguf-compile temporarily if it is the remaining failure

#1404 Looks into this

Jack-Khuu · 2024-12-10T19:20:25Z

Missing omp.h for aarch is a regression on pt/pt, we're looking into it and will pinbump to a fix when found

pytorch/pytorch#142266

Hmmm looks like the wheels aren't fixed with the changes...

Should fix the MacOS wheel regression

Jack-Khuu · 2024-12-13T23:00:15Z

gpu-aoti errors are from byte alignment (compiled as 16-byte, but input is not aligned) related to pytorch/pytorch#140664, that we will fix in a later bump pytorch/pytorch#143236

Bump PyTorch pin to 20241111

bcdfc54

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 12, 2024

bump to 1112

a976734

Jack-Khuu changed the title ~~Bump PyTorch pin to 20241111~~ Bump PyTorch pin to 20241112 Nov 12, 2024

Merge branch 'main' into pinbump1111

23b4536

malfet approved these changes Nov 12, 2024

View reviewed changes

Jack-Khuu mentioned this pull request Nov 13, 2024

Use training IR in torchchat export #1319

Closed

Jack-Khuu added 4 commits November 13, 2024 11:11

Update install_requirements.sh

6328935

Update install_requirements.sh

7aa96d7

Merge branch 'main' into pinbump1111

4a977a5

Update checkpoint.py typo

774ebb6

This was referenced Nov 15, 2024

cpp_wrapper_cpu: Ensure reinterpret_view results in RAIIAtenTensorHandle pytorch/pytorch#139411

Closed

Add Intel XPU device support to generate and serve #1361

Open

Merge branch 'main' into pinbump1111

655dc4a

Jack-Khuu mentioned this pull request Nov 16, 2024

AOTI filesize regression *.pt2 filesize is bigger than .*so #1365

Closed

Update install_requirements.sh

a6cb90c

Jack-Khuu mentioned this pull request Nov 18, 2024

Add Int4CPULayout and update int4 woq pytorch/ao#1278

Merged

Merge branch 'main' into pinbump1111

8cb415d

Jack-Khuu added 5 commits December 5, 2024 16:37

Drop PT version to 1126 (friendly vision version), update devtoolset …

6e54cba

…to 11 for almalinux

Test download toolchain instead of binutils

a05683d

Test removing devtoolset

411cf94

Remove dep on devtoolset 11 that doesnt' exist on the new machine

953a42e

Bump ET pin

6e8bfb1

Jack-Khuu changed the title ~~Multi Pin Bumps across PT/AO/tune~~ Multi Pin Bumps across PT/AO/tune/ET Dec 6, 2024

Jack-Khuu mentioned this pull request Dec 6, 2024

bump pytorch nightly version #1364

Closed

Merge branch 'main' into pinbump1111

5a80f5f

larryliu0820 reviewed Dec 6, 2024

View reviewed changes

Jack-Khuu added 3 commits December 6, 2024 15:42

Test nightly with updated vision

59e00d5

Merge branch 'main' into pinbump1111

d67eb86

Attempt to account for int4wo packing pt#139611

aae4eb3

Jack-Khuu added 3 commits December 6, 2024 18:32

Naive gguf int4wo attempt

25da485

Update install_requirements.sh to 1210

a9fa27e

Merge branch 'main' into pinbump1111

bdd2356

This was referenced Dec 10, 2024

MacOS wheels are no longer built with OpenMP runtime pytorch/pytorch#142266

Closed

Update int4pack related in torchchat gguf #1404

Merged

This was referenced Dec 12, 2024

Bump PT pin to 20241028 #1419

Merged

[KNOWN BUG] CPU AOTI Inference bug due to PT version, fix in progress #1420

Closed

Jack-Khuu added 3 commits December 13, 2024 11:30

Update install_requirements.sh to 20241213

bfe5826

Should fix the MacOS wheel regression

Merge branch 'main' into pinbump1111

02dc6a4

Update torchvision minor version to 22

dbb090f

Merge branch 'main' into pinbump1111

9579f18

Jack-Khuu merged commit bb72b09 into main Dec 14, 2024
48 of 53 checks passed

This was referenced Dec 14, 2024

[KNOWN BUG] gguf + GPU AOTI Inference bug due to PT version, fix in progress #1423

Closed

Misaligned AOTI input; potential perf gains by fixing? #1424

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi Pin Bumps across PT/AO/tune/ET #1367

Multi Pin Bumps across PT/AO/tune/ET #1367

Jack-Khuu commented Nov 12, 2024 •

edited

Loading

pytorch-bot bot commented Nov 12, 2024 •

edited

Loading

swolchok commented Nov 12, 2024

swolchok commented Nov 12, 2024

malfet Nov 12, 2024

Jack-Khuu Nov 13, 2024

malfet Nov 12, 2024

Jack-Khuu Nov 13, 2024

tugsbayasgalan Nov 13, 2024

swolchok commented Nov 12, 2024

swolchok commented Nov 13, 2024

swolchok commented Nov 13, 2024

swolchok commented Nov 13, 2024

Jack-Khuu commented Nov 13, 2024 •

edited

Loading

swolchok commented Nov 14, 2024

Jack-Khuu commented Nov 14, 2024 •

edited

Loading

Jack-Khuu commented Nov 14, 2024

desertfire commented Nov 18, 2024

Jack-Khuu commented Dec 6, 2024 •

edited

Loading

larryliu0820 Dec 6, 2024

Jack-Khuu Dec 6, 2024 •

edited

Loading

Jack-Khuu commented Dec 7, 2024 •

edited

Loading

Jack-Khuu commented Dec 10, 2024

Jack-Khuu commented Dec 13, 2024 •

edited

Loading

Multi Pin Bumps across PT/AO/tune/ET #1367

Multi Pin Bumps across PT/AO/tune/ET #1367

Conversation

Jack-Khuu commented Nov 12, 2024 • edited Loading

pytorch-bot bot commented Nov 12, 2024 • edited Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1367

❌ 5 New Failures, 1 Pending

swolchok commented Nov 12, 2024

swolchok commented Nov 12, 2024

malfet Nov 12, 2024

Choose a reason for hiding this comment

Jack-Khuu Nov 13, 2024

Choose a reason for hiding this comment

malfet Nov 12, 2024

Choose a reason for hiding this comment

Jack-Khuu Nov 13, 2024

Choose a reason for hiding this comment

tugsbayasgalan Nov 13, 2024

Choose a reason for hiding this comment

swolchok commented Nov 12, 2024

swolchok commented Nov 13, 2024

swolchok commented Nov 13, 2024

swolchok commented Nov 13, 2024

Jack-Khuu commented Nov 13, 2024 • edited Loading

swolchok commented Nov 14, 2024

Jack-Khuu commented Nov 14, 2024 • edited Loading

Jack-Khuu commented Nov 14, 2024

desertfire commented Nov 18, 2024

Jack-Khuu commented Dec 6, 2024 • edited Loading

larryliu0820 Dec 6, 2024

Choose a reason for hiding this comment

Jack-Khuu Dec 6, 2024 • edited Loading

Choose a reason for hiding this comment

Jack-Khuu commented Dec 7, 2024 • edited Loading

Jack-Khuu commented Dec 10, 2024

Jack-Khuu commented Dec 13, 2024 • edited Loading

Jack-Khuu commented Nov 12, 2024 •

edited

Loading

pytorch-bot bot commented Nov 12, 2024 •

edited

Loading

Jack-Khuu commented Nov 13, 2024 •

edited

Loading

Jack-Khuu commented Nov 14, 2024 •

edited

Loading

Jack-Khuu commented Dec 6, 2024 •

edited

Loading

Jack-Khuu Dec 6, 2024 •

edited

Loading

Jack-Khuu commented Dec 7, 2024 •

edited

Loading

Jack-Khuu commented Dec 13, 2024 •

edited

Loading