sync : ggml (conv 1d + 2d updates) #3468

ggerganov · 2023-10-04T09:51:52Z

While doing this sync, I noticed that the ./bin/test-quantize-perf example produces undefined behaviour when the Addr and UB sanitizers are enabled:

resolved:

43bbfb7

unresolved:

q4_1
  quantize_row_q_reference
    4096 values (0.02 MB)
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__functional/invoke.h:478:16: runtime error: -1.20472 is outside the range of representable values of type 'unsigned long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__functional/invoke.h:478:16 in 
      min cycles/32 vals   :      0.00
      avg cycles/32 vals   :      0.00
      float32 throughput   :      0.17 GB/s
      quantized throughput :      0.03 GB/s

  quantize_row_q
    4096 values (0.02 MB)
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__functional/invoke.h:478:16: runtime error: -1.20472 is outside the range of representable values of type 'unsigned long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__functional/invoke.h:478:16 in 
      min cycles/32 vals   :      0.00
      avg cycles/32 vals   :      0.00
      float32 throughput   :      0.17 GB/s
      quantized throughput :      0.03 GB/s

  dequantize_row_q
    4096 values (0.02 MB)
      min cycles/32 vals   :      0.00
      avg cycles/32 vals   :      0.00
      float32 throughput   :      0.30 GB/s
      quantized throughput :      0.05 GB/s

  quantize_row_q_dot
    4096 values (0.02 MB)
      min cycles/32 vals   :      0.00
      avg cycles/32 vals   :      0.00
      float32 throughput   :      0.19 GB/s
      quantized throughput :      0.03 GB/s

  vec_dot_q
    4096 values (0.02 MB)
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__functional/invoke.h:478:16: runtime error: -7.84407e+35 is outside the range of representable values of type 'unsigned long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__functional/invoke.h:478:16 in

Also, ./bin/test-quantize-perf crashes in Release on latest Mac Sonoma 14:

q2_K
  quantize_row_q_reference
    4096 values (0.02 MB)
Illegal instruction: 4

Bisecting, this starts crashing at 2db94d9, which does not make much sense. I'm wondering if something is wrong in my build environment after updating the OS

ggml-ci

ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml-ci

slaren · 2023-10-04T10:40:22Z

I tried running the tests with the sanitizers enabled, but I didn't get any errors. Other people reported issues with illegal instructions on Macs (#3279), and I think that the conclusion was that it might be a compiler bug.

ggerganov · 2023-10-04T11:18:28Z

On M2 Ultra, I have Sonoma 14 with Xcode 15 and running ./bin/test-quantize-perf on master crashes with illegal instruction: 4 in nearest_int():

$ /usr/bin/xcodebuild -version
Xcode 15.0
Build version 15A240d

$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

On M1 Pro, I also have Sonoma 14, but I haven't updated Xcode yet and it does not crash:

$ /usr/bin/xcodebuild -version
Xcode 14.3.1
Build version 14E300c

$ g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin23.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

However, both setups produce sanitizer errors in Q4_1, Q5_0 and Q5_1.

The errors in Q5_0 and Q5_1 are fixed with 2db94d9, but the others I cannot figure out what is causing them yet

slaren · 2023-10-04T11:35:05Z

I tried building with clang 15, but I still don't get any sanitizer errors. Maybe this check is something exclusive to the Apple clang version.

Looking at the errors and the code, I think that the cause might be in the lambdas such as this one:

                    auto quantize_fn = [&](void ) {
                        qfns.from_float_reference(test_data1, test_q1, size);
                        return test_q1[0];
                    };

This returns a float, but benchmark_function takes a function that returns a size_t, so there is probably a conversion happening inside std::function, which causes that error. Maybe changing the std::function argument in benchmark_function to return float would fix this.

ggerganov · 2023-10-04T12:04:37Z

Nice! That fixed all UBs

The Illegal instruction remains, but it's very suspicious - if I put a print or try to build in Debug or RelWithDebInfo, it stops crashing. Running in Release with lldb I only get this:

q2_K
  quantize_row_q_reference
    4096 values (0.02 MB)
Process 39004 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011222)
    frame #0: 0x0000000100099afc test-quantize-perf`quantize_row_q2_K_reference + 520
test-quantize-perf`quantize_row_q2_K_reference:
->  0x100099afc <+520>: .long  0x1e011222                ; unknown opcode
    0x100099b00 <+524>: ldp    s4, s26, [x9, #0x1c]
    0x100099b04 <+528>: fcmp   s18, s1
    0x100099b08 <+532>: fcsel  s1, s18, s1, mi
Target 0: (test-quantize-perf) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011222)
  * frame #0: 0x0000000100099afc test-quantize-perf`quantize_row_q2_K_reference + 520
    frame #1: 0x0000000100008260 test-quantize-perf`std::__1::__function::__func<main::$_0, std::__1::allocator<main::$_0>, float ()>::operator()() + 44
    frame #2: 0x0000000100007c8c test-quantize-perf`benchmark_function(unsigned long, unsigned long, long long, std::__1::function<float ()> const&) + 72
    frame #3: 0x0000000100007034 test-quantize-perf`main + 7252
    frame #4: 0x0000000189139058 dyld`start + 2224
(lldb)

Replacing nearest_int with roundf still crashes:

llama.cpp/k_quants.c

Line 288 in f72f8f2

int l = nearest_int(iscale*(x[i] - min));

Replacing with int l = 0; does not crash.

I guess I'll just merge as it is and blame clang for now.

…example * 'master' of github.com:ggerganov/llama.cpp: (24 commits) convert : fix Baichuan2 models by using vocab size in config.json (ggerganov#3299) readme : add project status link ggml : fix build after ggerganov#3329 llm : add Refact model (ggerganov#3329) sync : ggml (conv 1d + 2d updates, UB fixes) (ggerganov#3468) finetune : readme fix typo (ggerganov#3465) ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (ggerganov#3453) main : consistent prefix/suffix coloring (ggerganov#3425) llama : fix session saving/loading (ggerganov#3400) llama : expose model's rope_freq_scale in the API (ggerganov#3418) metal : alibi for arbitrary number of heads (ggerganov#3426) cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (ggerganov#3273) Work on the BPE tokenizer (ggerganov#3252) convert : fix vocab size when not defined in hparams (ggerganov#3421) cmake : increase minimum version for add_link_options (ggerganov#3444) CLBlast: Add broadcast support for matrix multiplication (ggerganov#3402) gguf : add BERT, MPT, and GPT-J arch info (ggerganov#3408) gguf : general usability improvements (ggerganov#3409) cmake : make CUDA flags more similar to the Makefile (ggerganov#3420) finetune : fix ggerganov#3404 (ggerganov#3437) ...

* sync : ggml (conv 1d + 2d updates) ggml-ci * ggml : fix UB in q5_0 and q5_1 quantize code ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ggml-ci * tests : fix UB in test-quantize-perf

ggerganov added 2 commits October 4, 2023 12:47

sync : ggml (conv 1d + 2d updates)

2327777

ggml-ci

ggerganov added the need feedback Testing and feedback with results are needed label Oct 4, 2023

tests : fix UB in test-quantize-perf

5b8d505

ggerganov force-pushed the sync branch from 0eabd46 to 5b8d505 Compare October 4, 2023 11:53

ggerganov merged commit f93af02 into master Oct 4, 2023
33 checks passed

ggerganov deleted the sync branch October 4, 2023 12:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync : ggml (conv 1d + 2d updates) #3468

sync : ggml (conv 1d + 2d updates) #3468

ggerganov commented Oct 4, 2023 •

edited

Loading

slaren commented Oct 4, 2023

ggerganov commented Oct 4, 2023

slaren commented Oct 4, 2023 •

edited

Loading

ggerganov commented Oct 4, 2023 •

edited

Loading

sync : ggml (conv 1d + 2d updates) #3468

sync : ggml (conv 1d + 2d updates) #3468

Conversation

ggerganov commented Oct 4, 2023 • edited Loading

slaren commented Oct 4, 2023

ggerganov commented Oct 4, 2023

slaren commented Oct 4, 2023 • edited Loading

ggerganov commented Oct 4, 2023 • edited Loading

ggerganov commented Oct 4, 2023 •

edited

Loading

slaren commented Oct 4, 2023 •

edited

Loading

ggerganov commented Oct 4, 2023 •

edited

Loading