Allow fp16 re-write of pack-lh nodes #7731

copybara-service · 2025-01-28T10:10:19Z

Allow fp16 re-write of pack-lh nodes

felix-johnny · 2025-01-28T15:26:30Z

@alankelly I tried this out. It gets further than before, but I now get the following error message:

Note (XNNPACK): Analyzing subgraph for FP16 compatibility (xnn_subgraph_rewrite_for_fp16, external/XNNPACK/src/subgraph.c:862)
Warning in XNNPACK: FP16 rewrite aborted: node #1 (Fully Connected). Invalid compute type (input=PFP32, weights=FP32, output=FP32) (xnn_subgraph_rewrite_for_fp16, external/XNNPACK/src/subgraph.c:978)
Error in XNNPACK: failed to force FP16 inference: subgraph is incompatible with FP16 operators (xnn_subgraph_optimize, external/XNNPACK/src/subgraph.c:1624)
Error in XNNPACK: failed to optimize subgraph (xnn_create_runtime_v4, external/XNNPACK/src/runtime.c:541)

If I disable SME (--define=xnn_enable_arm_sme=false --define=xnn_enable_arm_sme2=false), it goes through fine and the same F32 layer is executed as F16.

PiperOrigin-RevId: 720489462

felix-johnny · 2025-01-29T16:58:16Z

src/subgraph.c

@@ -1009,7 +1022,7 @@ bool xnn_subgraph_rewrite_for_fp16(xnn_subgraph_t subgraph)
    value->fp16_id = XNN_INVALID_VALUE_ID;
    value->fp32_id = XNN_INVALID_VALUE_ID;
    if (value->fp16_compatible) {
-      assert(value->datatype == xnn_datatype_fp32);
+      assert(value->datatype == xnn_datatype_fp32 || value->datatype == xnn_datatype_pfp32);


I guess line 1105 needs this as well.
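A minimal sketch of one way to keep both checks in sync, assuming the same relaxed condition really is needed at each site: factor the datatype test into a small helper and assert on that. Everything here is a stand-in for illustration. The `sketch_` enum only mirrors the two names seen in the diff (`xnn_datatype_fp32`, `xnn_datatype_pfp32`), its numeric values and extra members are made up, and `sketch_is_fp32_like` is a hypothetical helper, not an XNNPACK function.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for XNNPACK's datatype enum. Only the fp32 and pfp32 members
 * correspond to names in the diff above; the values and the remaining
 * members are illustrative assumptions. */
enum sketch_datatype {
  sketch_datatype_invalid = 0,
  sketch_datatype_fp32,
  sketch_datatype_fp16,
  sketch_datatype_pfp32,
};

/* Hypothetical helper: true for the datatypes that the relaxed assert in
 * the diff accepts for an fp16-compatible value (plain fp32 or pfp32). */
static bool sketch_is_fp32_like(enum sketch_datatype datatype) {
  return datatype == sketch_datatype_fp32 ||
         datatype == sketch_datatype_pfp32;
}

/* Hypothetical call site: mirrors the shape of the check in the diff,
 * where the assert only applies to values marked fp16-compatible. */
static void sketch_check_value(bool fp16_compatible,
                               enum sketch_datatype datatype) {
  if (fp16_compatible) {
    assert(sketch_is_fp32_like(datatype));
  }
}
```

With a helper like this, each assert site would read `assert(sketch_is_fp32_like(value->datatype));`, so accepting a further datatype later would mean changing one function rather than every assert.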
With that, execution went further until it hit another assert. The last few lines of the log are:

VERBOSE: Replacing 1 out of 1 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 1 partitions for subgraph 0.
Note (XNNPACK): Analyzing subgraph for FP16 compatibility (xnn_subgraph_rewrite_for_fp16, external/XNNPACK/src/subgraph.c:862)
Note (XNNPACK): XNNPACK has switched to FP16 inference mode! (xnn_subgraph_rewrite_for_fp16, external/XNNPACK/src/subgraph.c:1216)
INFO: Successfully applied the default TensorFlow Lite delegate indexed at 0.
NOTE: because a delegate has been applied, the precision of computations should be unchanged, but the exact output tensor values may have changed. If such output values are checked in your code, like in your tests etc., please consider increasing error tolerance for the check.
Note (XNNPACK): Tile size for GEMM with num_groups=1, m=4096, n=32 and mr=6, nr=16 set to [6, 32] (683 tiles) (xnn_gemm_best_tile_size, external/XNNPACK/src/microkernel-utils.c:110)
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
Assertion failed: (kernel_data == NULL), function setup_fully_connected_operator, file fully-connected.c, line 896.

copybara-service bot force-pushed the test_720489462 branch 4 times, most recently from e27aeff to 09781cf Compare January 29, 2025 08:57

Allow fp16 re-write of pack-lh nodes

e072477

PiperOrigin-RevId: 720489462

copybara-service bot force-pushed the test_720489462 branch from 09781cf to e072477 Compare January 29, 2025 09:07

felix-johnny reviewed Jan 29, 2025
