
Error when translating Keras NASNetMobile model to FlatBuffer for Vulkan backend #2764

Closed
Vooblin opened this issue Aug 4, 2020 · 6 comments

Vooblin commented Aug 4, 2020

Python code to produce the MLIR module:

import os
import tensorflow as tf
from pyiree.tf import compiler as ireec

SAVE_PATH = os.path.join(os.environ["HOME"], "saved_models")
os.makedirs(SAVE_PATH, exist_ok=True)

INPUT_SHAPE = [1, 224, 224, 3]

# Build NASNetMobile with ImageNet weights.
tf_model = tf.keras.applications.NASNetMobile(
    weights="imagenet", include_top=True, input_shape=tuple(INPUT_SHAPE[1:]))

# Wrap the model in a tf.Module and expose a predict function
# with a fixed input signature.
tf_module = tf.Module()
tf_module.m = tf_model
tf_module.predict = tf.function(
    input_signature=[tf.TensorSpec(INPUT_SHAPE, tf.float32)])(tf_model.call)

# Export as a SavedModel with debug info.
saved_model_path = "/tmp/model.sm"
save_options = tf.saved_model.SaveOptions(save_debug_info=True)
tf.saved_model.save(tf_module, saved_model_path, options=save_options)

# Import the SavedModel into the IREE compiler and write out the MLIR asm.
compiler_module = ireec.tf_load_saved_model(
    saved_model_path, exported_names=["predict"])

model_mlir_path = os.path.join(SAVE_PATH, "model.mlir")
with open(model_mlir_path, "wt") as output_file:
  output_file.write(compiler_module.to_asm())

Then run the following command:

iree-translate \
    -iree-mlir-to-vm-bytecode-module \
    --iree-hal-target-backends=vulkan-spirv \
    $HOME/saved_models/model.mlir -o $HOME/saved_models/model.vmfb &> result.txt

Selected lines of the error output:

./model.mlir:3483:13: error: cannot separate Linalg/Parallel ops into multiple kernels
     %2336 = "mhlo.batch_norm_inference"(%2335, %2097, %2096, %2095, %2094) {epsilon = 1.000000e-03 : f32, feature_index = 3 : i64} : (tensor<1x56x56x11xf32>, tensor<11xf32>, tensor<11xf32>, tensor<11xf32>, tensor<11xf32>) -> tensor<1x56x56x11xf32>
             ^
 ./model.mlir:3483:13: note: see current operation: "func"() ( {
   %0 = "iree.placeholder"() {binding = @legacy_io::@ret0, purpose = "interface buffer"} : () -> memref<1x56x56x11xf32>
   %1 = "linalg.reshape"(%0) {reassociation = [affine_map<(d0, d1, d2, d3) -> (d0, d1)>, affine_map<(d0, d1, d2, d3) -> (d2)>, affine_map<(d0, d1, d2, d3) -> (d3)>]} : (memref<1x56x56x11xf32>) -> memref<56x56x11xf32>
   %2 = "iree.placeholder"() {binding = @legacy_io::@ret1, purpose = "interface buffer"} : () -> memref<1x56x56x11xf32>
   %3 = "linalg.reshape"(%2) {reassociation = [affine_map<(d0, d1, d2, d3) -> (d0, d1)>, affine_map<(d0, d1, d2, d3) -> (d2)>, affine_map<(d0, d1, d2, d3) -> (d3)>]} : (memref<1x56x56x11xf32>) -> memref<56x56x11xf32>
   %4 = "iree.placeholder"() {binding = @legacy_io::@arg0, purpose = "interface buffer"} : () -> memref<1x56x56x11xf32>
   %5 = "iree.placeholder"() {binding = @legacy_io::@arg1, purpose = "interface buffer"} : () -> memref<11xf32>
   %6 = "iree.placeholder"() {binding = @legacy_io::@arg2, purpose = "interface buffer"} : () -> memref<11xf32>
   %7 = "iree.placeholder"() {binding = @legacy_io::@arg3, purpose = "interface buffer"} : () -> memref<11xf32>
   %8 = "iree.placeholder"() {binding = @legacy_io::@arg4, purpose = "interface buffer"} : () -> memref<11xf32>
   %9 = "iree.placeholder"() {binding = @legacy_io::@arg5, purpose = "interface buffer"} : () -> memref<1x56x56x11xf32>
   %10 = "iree.placeholder"() {binding = @legacy_io::@arg6, purpose = "interface buffer"} : () -> memref<1x56x56x11xf32>
   %11 = "linalg.reshape"(%9) {reassociation = [affine_map<(d0, d1, d2, d3) -> (d0, d1)>, affine_map<(d0, d1, d2, d3) -> (d2)>, affine_map<(d0, d1, d2, d3) -> (d3)>]} : (memref<1x56x56x11xf32>) -> memref<56x56x11xf32>
   %12 = "linalg.reshape"(%4) {reassociation = [affine_map<(d0, d1, d2, d3) -> (d0, d1)>, affine_map<(d0, d1, d2, d3) -> (d2)>, affine_map<(d0, d1, d2, d3) -> (d3)>]} : (memref<1x56x56x11xf32>) -> memref<56x56x11xf32>
   "linalg.generic"(%11, %12, %5, %6, %7, %8, %1) ( {
   ^bb0(%arg0: f32, %arg1: f32, %arg2: f32, %arg3: f32, %arg4: f32, %arg5: f32, %arg6: f32):  // no predecessors
     %14 = "std.subf"(%arg1, %arg2) : (f32, f32) -> f32
     %15 = "std.mulf"(%14, %arg3) : (f32, f32) -> f32
     %16 = "std.divf"(%15, %arg4) : (f32, f32) -> f32
     %17 = "std.addf"(%16, %arg5) : (f32, f32) -> f32
     %18 = "std.addf"(%arg0, %17) : (f32, f32) -> f32
     "linalg.yield"(%18) : (f32) -> ()
   }) {args_in = 6 : i64, args_out = 1 : i64, indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} : (memref<56x56x11xf32>, memref<56x56x11xf32>, memref<11xf32>, memref<11xf32>, memref<11xf32>, memref<11xf32>, memref<56x56x11xf32>) -> ()
   %13 = "linalg.reshape"(%10) {reassociation = [affine_map<(d0, d1, d2, d3) -> (d0, d1)>, affine_map<(d0, d1, d2, d3) -> (d2)>, affine_map<(d0, d1, d2, d3) -> (d3)>]} : (memref<1x56x56x11xf32>) -> memref<56x56x11xf32>
   "linalg.generic"(%11, %12, %5, %6, %7, %8, %13, %3) ( {
   ^bb0(%arg0: f32, %arg1: f32, %arg2: f32, %arg3: f32, %arg4: f32, %arg5: f32, %arg6: f32, %arg7: f32):  // no predecessors
     %14 = "std.subf"(%arg1, %arg2) : (f32, f32) -> f32
     %15 = "std.mulf"(%14, %arg3) : (f32, f32) -> f32
     %16 = "std.divf"(%15, %arg4) : (f32, f32) -> f32
     %17 = "std.addf"(%16, %arg5) : (f32, f32) -> f32
     %18 = "std.addf"(%arg0, %17) : (f32, f32) -> f32
     %19 = "std.addf"(%18, %arg6) : (f32, f32) -> f32
     "linalg.yield"(%19) : (f32) -> ()
   }) {args_in = 7 : i64, args_out = 1 : i64, indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} : (memref<56x56x11xf32>, memref<56x56x11xf32>, memref<11xf32>, memref<11xf32>, memref<11xf32>, memref<11xf32>, memref<56x56x11xf32>, memref<56x56x11xf32>) -> ()
   "std.return"() : () -> ()
 }) {sym_name = "predict_ex_dispatch_45", type = () -> ()} : () -> ()
 ./model.mlir:3483:13: error: failed to run translation of source executable to target executable for backend vulkan*
     %2336 = "mhlo.batch_norm_inference"(%2335, %2097, %2096, %2095, %2094) {epsilon = 1.000000e-03 : f32, feature_index = 3 : i64} : (tensor<1x56x56x11xf32>, tensor<11xf32>, tensor<11xf32>, tensor<11xf32>, tensor<11xf32>) -> tensor<1x56x56x11xf32>
             ^

hanhanW commented Aug 4, 2020

There could be several reasons why it fails. I can see a few, working from the bottom up.

One reason is that other ops are interleaved with the Linalg structured ops, so it fails in the split-dispatch-function pass:

"linalg.generic" ...
%13 = "linalg.reshape"(%10) ...  (memref<1x56x56x11xf32>) -> memref<56x56x11xf32>
"linalg.generic"

This might be addressed by #2763.

From a higher-level view, the reshape op is not fused into the later generic op in the tensor world. This is because the reshape op is collapsing dims and it is a producer of the later generic op; we can only fuse the two ops if the reshape op is expanding dims.
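For illustration, here is what collapsing vs. expanding means in NumPy terms (a minimal sketch using the shapes from the IR above, not the actual compiler transformation):

import numpy as np

x = np.zeros((1, 56, 56, 11), dtype=np.float32)

# Collapsing dims: (1, 56, 56, 11) -> (56, 56, 11), as in the linalg.reshape ops above.
collapsed = x.reshape(56, 56, 11)

# Expanding dims: (56, 56, 11) -> (1, 56, 56, 11); only this direction can be
# fused when the reshape is a producer of a later generic op.
expanded = collapsed.reshape(1, 56, 56, 11)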

However, even if we could somehow fuse those two ops, the generic ops still wouldn't be fused, because there are no dependencies (like RAW or WAR) between them; the operands they share are all inputs. Thus, I'm not sure whether this should be addressed at a higher level, like the flow dialect, because we might not expect them to be in the same dispatch function.
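To see why there is no dependence, here is a minimal NumPy sketch of the two kernels (hypothetical stand-ins for the two linalg.generic ops, not the actual IR):

import numpy as np

x = np.random.rand(56, 56, 11).astype(np.float32)
scale = np.random.rand(11).astype(np.float32)

# Both kernels only read x and scale; each writes its own output buffer,
# so there is no RAW or WAR dependence forcing them into one kernel.
out0 = x * scale        # stands in for the generic op writing ret0
out1 = x * scale + 1.0  # stands in for the generic op writing ret1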

As a short-term solution (or if they can be in the same dispatch function), I think it would be addressed by #2763. Could you attach the model.mlir file so I can test further, or patch in #2763 and see if it's resolved?


hanhanW commented Aug 5, 2020

The patch was merged, so maybe you could sync to the latest and give it a shot.


Vooblin commented Aug 5, 2020

@hanhanW Thanks for answering! I've tried the version with #2763, but it still hits the same error. Here is the Google Drive link to the model.mlir file: link

hanhanW self-assigned this Aug 5, 2020

hanhanW commented Aug 5, 2020

Thanks for the attachment, I will take a look!


hanhanW commented Aug 5, 2020

Could you rebase onto main and try again? I compiled the module with upstream/main and got no errors.

E.g.:

git fetch upstream main
git checkout upstream/main
git submodule update --init
config...
build...

Note: it seems that there are depthwise conv ops in the file, and they are not supported yet. This is WIP; see #2678. I suspect there may be a numerical issue even if it compiles.
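If it does compile, one way to check for such a numerical issue is to compare against TensorFlow on the same input (a sketch; tf_model is the Keras model from the original script, and the IREE runtime call is elided rather than invented):

import numpy as np

x = np.random.rand(1, 224, 224, 3).astype(np.float32)
tf_out = tf_model(x).numpy()  # reference result from TensorFlow

# iree_out = ...  # result of running model.vmfb through the IREE runtime
# np.testing.assert_allclose(tf_out, iree_out, rtol=1e-3, atol=1e-3)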


Vooblin commented Aug 5, 2020

Oh, sorry, I've just rebuilt it and it works, thanks!

Vooblin closed this as completed Aug 5, 2020