This guide is organized to demonstrate various design aspects of

- :ref:`OpenCL Backend Enhancements<opencl_enhancements>`
- :ref:`About OpenCLML<about_openclml>`
- :ref:`Build and Deploy<build_deploy>`



.. _opencl_enhancements:

OpenCL Backend Enhancements
---------------------------

Reasons for using textures:

- The texture processor (TP) has a dedicated L1 cache, which is read-only and stores data
  fetched from the L2 cache for texture operations (the primary reason).

- The handling of image boundaries is built-in.

- Numerous image format and data type combinations are supported, with automatic format
  conversions.

Overall, with textures, it is possible to achieve a significant performance boost
compared to OpenCL buffer-based solutions.

In general, we specify the target as ``target="opencl"`` for a regular OpenCL-based target, which generates kernels as shown below.

.. code:: c

   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
   // body..
The above OpenCL kernel definition has ``__global float*`` pointers, which are essentially OpenCL ``buffer`` objects.

When texture-based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, the generated
kernels use texture-backed OpenCL image objects, as shown below.

.. code:: c

   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
   // body..
*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
When we use *image2d_t* we read *4 elements at a time*, which helps to utilize the hardware in a more efficient way.
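
The 4-element reads correspond to packing groups of four channels into the innermost
dimension. Below is a minimal NumPy sketch of an NCHW to NCHW4c-style repacking that makes
such vectorized texture reads possible. This is an illustration of the layout idea only
(shapes and names are hypothetical), not TVM's internal implementation.

.. code:: python

   import numpy as np

   # A hypothetical NCHW activation tensor with 8 channels.
   n, c, h, w = 1, 8, 4, 4
   data = np.arange(n * c * h * w, dtype="float32").reshape(n, c, h, w)

   # Repack channels into groups of 4 (NCHW -> NCHW4c) so that the last
   # axis holds 4 consecutive channel values. A single texture read
   # (e.g. a float4 via read_imagef) can then fetch all 4 values at once.
   assert c % 4 == 0
   packed = data.reshape(n, c // 4, 4, h, w).transpose(0, 1, 3, 4, 2)

   print(packed.shape)  # (1, 2, 4, 4, 4) - the last axis is the 4-channel vector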

Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.


.. _about_openclml:

About OpenCLML
--------------
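
OpenCLML offloading works by partitioning supported operators out of the relay module
before compilation, as the Relay example later on this page does via
``clml.partition_for_clml``. A minimal sketch, assuming TVM is built with CLML support
enabled and using a tiny placeholder network instead of a real model import:

.. code:: python

   import tvm
   from tvm import relay
   from tvm.relay.op.contrib import clml

   # A tiny placeholder network standing in for a real model import.
   data = relay.var("data", shape=(1, 16, 56, 56), dtype="float32")
   weight = relay.var("weight", shape=(32, 16, 3, 3), dtype="float32")
   out = relay.nn.relu(relay.nn.conv2d(data, weight, kernel_size=(3, 3), padding=(1, 1)))
   mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
   params = {}

   # Offload CLML-supported operators; anything unsupported remains on the
   # regular OpenCL path.
   mod = clml.partition_for_clml(mod, params)
   print(mod)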

.. _build_deploy:

Build and Deploy
----------------

TVM compilation starts with importing a model from any framework into a relay module.
The relay module will then be used across the auto tuning, compilation and deployment stages.

**TVMC Interface:**

The TVMC interface can be used as shown below to import, compile and run a model.

.. code:: python

   import numpy as np
   import tvm
   from tvm.driver import tvmc
   from tvm.driver.tvmc.model import TVMCPackage

   # Convert a model from any framework to a tvm relay module.
   # tvmc.load supports models from any framework (like TensorFlow SavedModel, ONNX, TFLite ..etc) and auto detects the filetype.
   tvmc_model = tvmc.load("resnet50.h5")

   # tvmc_model consists of tvmc_model.mod, which is the relay module, and tvmc_model.params, which are the params of the module.

   # Now, the below api can be used for auto tuning the model for any target. Tuning requires an RPC setup;
   # please refer to :ref:`RPC Setup<rpc_setup>` for the same.
   tvmc.tune(
       tvmc_model,
       target="opencl -device=adreno",
       output="keras-resnet50.log",
       tuning_records="keras-resnet50-records.log",
       target_host="llvm -mtriple=aarch64-linux-gnu",
       rpc_tracker="127.0.0.1:9120",
       rpc_key="android",
       repeat=30,
       trials=1024,
       early_stopping=0,
   )

   # Compilation to produce tvm artifacts
   tvmc_package = tvmc.compile(
       tvmc_model,
       target="opencl -device=adreno",
       target_host="llvm -mtriple=aarch64-linux-gnu",
       cross="/android_ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang",
       tuning_records="keras-resnet50.log",
   )
   # tvmc_package consists of tvmc_package.lib_path, tvmc_package.graph, tvmc_package.params

   # Alternatively, we can save the compilation output as a TVMCPackage.
   # This way the compiled module can be reloaded later without compiling again.
   tvmc.compile(
       tvmc_model,
       target="opencl -device=adreno",
       target_host="llvm -mtriple=aarch64-linux-gnu",
       cross="/android_ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang",
       tuning_records="keras-resnet50.log",
       package_path="keras-resnet50.tar",
   )

   # Load the compiled package. The saved TVMCPackage is simply a tar archive
   # with mod.so, mod.json and mod.params.
   tvmc_package = TVMCPackage(package_path="keras-resnet50.tar")

   # Deploy and run the compiled model on RPC.
   # Prepare the input data dict.
   input_data = tvm.nd.array((np.random.uniform(size=(1, 229, 229, 3))).astype("float32"))
   input_dict = {"input": input_data}

   # Run on the RPC setup.
   result = tvmc.run(
       tvmc_package,
       device="cl",
       rpc_key="android",
       hostname="127.0.0.1",
       port=9120,
       inputs=input_dict,
   )

   # result is a dictionary of outputs.
Please refer to the tutorial `How To Deploy model on Adreno using TVMC <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno_tvmc.html>`_
for a step-by-step explanation of the same.

The tvmc compiled package can also be used for native deployment using the "rtvm" utility.
Please refer to `rtvm <https://github.com/apache/tvm/tree/main/apps/cpp_rtvm#readme>`_ for more details about this tool.
Also, please refer to the tvmc documentation for more details about the API interface.
**Relay Interface:**

The Relay API gives lower-level access to the TVM compiler interface.
The Relay interface follows a tvmc kind of flow, where we produce the TVM module first,
followed by auto tuning, compilation and deployment.

The below example explains the relay interface usage:

.. code:: python

   import os
   import numpy as np
   import tvm
   from tvm import relay, autotvm, rpc
   from tvm.contrib import ndk, graph_executor
   from tvm.contrib.utils import tempdir
   from tvm.relay.op.contrib import clml
   from tensorflow.keras.applications import InceptionV3
   import tensorflow as tf

   target = "opencl -device=adreno"
   target_host = "llvm -mtriple=arm64-linux-android"

   # We first need to get a handle for a model from any framework.
   # In this example we will prepare a keras InceptionV3 model.
   tf.keras.backend.clear_session()
   keras_net = InceptionV3(
       include_top=True, weights=None, input_shape=(299, 299, 3), classes=1000
   )

   input_info = {keras_net.input_names[0]: (1, 3, 299, 299)}
   input_data = {
       keras_net.input_names[0]: np.random.uniform(-1, 1, (1, 3, 299, 299)).astype("float32")
   }

   from tensorflow.keras.layers import Input
   from tensorflow.keras.models import Model

   def get_bottom_top_model(model, layer_name):
       layer = model.get_layer(layer_name)
       bottom_input = model.layers[0].input
       bottom_output = layer.output
       bottom_model = Model(bottom_input, bottom_output)
       return bottom_model

   keras_model = get_bottom_top_model(keras_net, "predictions")
   ref_output = keras_model.predict(input_data["input_1"].transpose(0, 2, 3, 1))

   # Now we have a keras_model with input "input_1" of shape (1, 3, 299, 299), output "predictions"
   # and a reference output ref_output.
   # Let's import the model and get a relay module. TVM has frontend apis for various frameworks
   # under relay.frontend; for keras model import we have the relay.frontend.from_keras api.
   mod, params = relay.frontend.from_keras(keras_model, input_info, layout="NCHW")

   # With the relay module mod and parameters params we can now go for tuning followed by compilation.
   # The below few instructions can auto tune the relay module with xgboost being the tuner algorithm.
   # The auto tuning process involves the stages of extracting the tasks, defining a tuning
   # configuration and tuning each task for the best performing kernel configuration.

   # Auto Tuning Stage 1: Extract tunable tasks
   tasks = autotvm.task.extract_from_program(
       mod["main"], target=target, target_host=target_host, params=params
   )

   # Auto Tuning Stage 2: Define tuning configuration
   tune_log = "adreno-resnet50.log"
   tmp_log_file = tune_log + ".tmp"
   measure_option = autotvm.measure_option(
       builder=autotvm.LocalBuilder(build_func=ndk.create_shared, timeout=15),  # Build the test kernel locally
       runner=autotvm.RPCRunner(  # The runner would be on a remote device.
           "android",  # RPC Key
           host="127.0.0.1",  # Tracker host
           port=9120,  # Tracker port
           number=3,  # Number of runs before averaging
           timeout=600,  # RPC Timeout
       ),
   )
   n_trial = 1024  # Number of tuning iterations before choosing the best kernel config
   early_stopping = False  # Do we apply early stopping when the loss is not minimizing

   # Auto Tuning Stage 3: Iterate through each task and call the tuner
   from tvm.autotvm.tuner import XGBTuner

   for i, tsk in enumerate(reversed(tasks)):
       prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
       tuner_obj = XGBTuner(tsk, loss_type="rank")
       tsk_trial = min(n_trial, len(tsk.config_space))
       tuner_obj.tune(
           n_trial=tsk_trial,
           early_stopping=early_stopping,
           measure_option=measure_option,
           callbacks=[
               autotvm.callback.progress_bar(tsk_trial, prefix=prefix),
               autotvm.callback.log_to_file(tmp_log_file),
           ],
       )
   # Pick the best performing kernel configurations from the overall log.
   autotvm.record.pick_best(tmp_log_file, tune_log)

   # Given we have the relay module and its best performing kernel configurations,
   # we can now go for compilation with the tuned log, or without it if auto tuning is not enabled.
   if os.path.exists(tune_log):
       with autotvm.apply_history_best(tune_log):
           with tvm.transform.PassContext(opt_level=3):
               # Enable CLML partitioning if required.
               mod = clml.partition_for_clml(mod, params)
               lib = relay.build(
                   mod, target=tvm.target.Target(target, host=target_host), params=params
               )
   else:
       with tvm.transform.PassContext(opt_level=3):
           # Enable CLML partitioning if required.
           mod = clml.partition_for_clml(mod, params)
           lib = relay.build(
               mod, target=tvm.target.Target(target, host=target_host), params=params
           )

   # Compilation results in a lib module and it has everything required to deploy on the target.
   # We can save the compiler artifacts as shown below and reload them later without the entire compilation.
   lib.export_library("mod.so", ndk.create_shared)
   with open("mod.json", "w") as fo:
       fo.write(lib.get_graph_json())
   with open("mod.params", "wb") as fo:
       fo.write(tvm.runtime.save_param_dict(params))

   # We can prepare a TVMCPackage from the above files by tar archiving them.
   # The tar archive can be used with the tvmc tool or the tvmc api interface to deploy and run.
   # The tar archive can be used with the "rtvm" tool also, for native deployment on the target device.

   # Now, let's look at deploying the compiled tvm artifact on a remote target and running it.
   tmp = tempdir()
   filename = "inceptionv3.so"
   lib.export_library(tmp.relpath(filename), ndk.create_shared)

   # Connect to the remote device through the RPC tracker.
   tracker = rpc.connect_tracker("127.0.0.1", 9120)
   remote = tracker.request("android")
   dev = remote.device(str(target), 0)
   remote.upload(tmp.relpath(filename))
   rlib = remote.load_module(filename)

   # Create a graph executor module on the remote device.
   module = graph_executor.GraphModule(rlib["default"](dev))

   # Set input
   module.set_input("input_1", input_data["input_1"])
   # Run the module
   module.run()
   # Get output
   output = module.get_output(0)
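
As noted in the comments above, the three saved artifacts can be tar archived into a
package usable by the tvmc interface or the "rtvm" tool. A minimal sketch using Python's
``tarfile`` (the archive name is arbitrary; the member names follow the mod.so, mod.json,
mod.params layout described earlier):

.. code:: python

   import tarfile

   # Bundle the compiler artifacts into a TVMCPackage-style tar archive.
   with tarfile.open("inceptionv3.tar", "w") as tar:
       tar.add("mod.so")
       tar.add("mod.json")
       tar.add("mod.params")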
Please refer to the tutorial `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
for a step-by-step explanation of the same.


.. _application_integration:

Application Integration
-----------------------

Please refer to the ``tvm_runner`` interface too, for a further simplified version of the same.
Advanced Usage:
---------------

This section details some of the advanced usage and additional information while using the Adreno™ target on TVM.

Generated Source Inspection
~~~~~~~~~~~~~~~~~~~~~~~~~~~
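
After compilation, the generated OpenCL kernel source can be dumped and inspected from the
built module. A minimal sketch, assuming ``lib`` is the output of ``relay.build`` from the
Relay example above (the index of the OpenCL module among the imported modules may vary
with the build):

.. code:: python

   # `lib` is the factory module produced by relay.build in the Relay example.
   opencl_mod = lib.get_lib().imported_modules[0]
   print(opencl_mod.type_key)      # expected: "opencl"
   print(opencl_mod.get_source())  # dump the generated OpenCL kernel source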
