
[VTA] Fix an issue in updating uop_idx in the TensorGemm module #4694

Merged
merged 1 commit into from
Jan 14, 2020

Conversation

liangfu
Member

@liangfu liangfu commented Jan 13, 2020

This PR fixes an issue in updating the uop_idx counter in the TensorGemm module.

Problem

When evaluating deploy_vision_on_vta.py with the TSIM backend, the module must be run with

m.run()

However, it throws an instance of dmlc::Error, saying

virtual_memory.cc:48: Check failed: phy_addr != 0 (0 vs. 0) : trying to get address that is nullptr

The root cause is that uop_idx increases unexpectedly.

This PR fixes the above issue and enables successful evaluation of deploy_vision_on_vta.py with TSIM backend.
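The intended counter behavior can be sketched in plain Python (a toy illustration under stated assumptions, not the actual Chisel fix in TensorGemm): uop_idx should advance only when a micro-op actually issues, and never past the number of loaded micro-ops. An unconditional per-cycle increment walks off the end of micro-op memory, which is consistent with the nullptr address lookup reported above.

```python
def run_uops(num_uops, cycles, issue_ready):
    """Simulate guarded uop_idx updates over a number of clock cycles.

    issue_ready[i] says whether a micro-op actually issues on cycle i
    (hypothetical signal name, for illustration only).
    """
    uop_idx = 0
    for cyc in range(cycles):
        # Guarded update: advance only on an issue, and never past
        # the last valid micro-op.
        if issue_ready[cyc] and uop_idx < num_uops:
            uop_idx += 1
    return uop_idx
```

With the guard in place, uop_idx stays within bounds even when the pipeline stalls or the simulation runs extra cycles; without it, the index would increase unexpectedly, as described in the PR.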

@liangfu
Member Author

liangfu commented Jan 13, 2020

@kevinyuan Can you help reproduce the results?

A hard requirement is that we run the module only once with

m.run()

instead of running it in time_evaluator.

In addition, I have a patch

--- a/vta/hardware/dpi/tsim_device.cc
+++ b/vta/hardware/dpi/tsim_device.cc
@@ -141,6 +141,8 @@ int VTADPISim() {
       tfp->dump(static_cast<vluint64_t>(trace_count * 2 + 1));
 #endif
     trace_count++;
+    if ((trace_count % 1000000) == 1)
+      fprintf(stderr, "[traced %dM cycles]\n", trace_count / 1000000);
     while (top->sim_wait) {
       top->clock = 0;
       std::this_thread::sleep_for(std::chrono::milliseconds(100));

to trace the number of cycles in TSIM.
It requires 31M cycles to run resnet18_v1 prediction.
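The patch's progress message follows a simple modulo pattern, which can be mirrored in plain Python (illustrative only; the real code lives in tsim_device.cc):

```python
import sys

def report_progress(trace_count, interval=1_000_000):
    """Same condition as the patch: fire once per `interval` cycles,
    on the first cycle past each interval boundary."""
    if trace_count % interval == 1:
        print(f"[traced {trace_count // interval}M cycles]", file=sys.stderr)
        return True
    return False
```

Note that, like the C version, the condition also fires at trace_count == 1, printing "[traced 0M cycles]" at startup.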

@kevinyuan
Contributor

Hi @liangfu,

With your recommended changes, TSIM produced the same prediction results as FSIM.

Can you explain why m.run() must be used instead of time_evaluator to generate the correct prediction result?

Furthermore, the cycle_count in "Execution statistics" is equal to 0. Is this expected, or can it be improved?

Thanks.

@liangfu
Member Author

liangfu commented Jan 14, 2020

Can you explain why m.run() must be used instead of time_evaluator to generate the correct prediction result?

time_evaluator treats the first run as a warm-up and discards its timing result. The prediction results are generated by the second run, which may use buffers contaminated by the first run.
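The contamination effect can be illustrated with a toy stand-in for a runtime module (purely illustrative, not the TVM API): if a run accumulates into a buffer instead of resetting it, the second run, whose output the measured pass reports, sees leftover state from the warm-up run.

```python
class ToyModule:
    """Toy stand-in for a module whose output buffer is accumulated
    into rather than reset between runs (hypothetical bug model)."""

    def __init__(self):
        self.out = 0

    def run(self, x):
        self.out += x  # bug: state carries over across runs
        return self.out

m = ToyModule()
warmup = m.run(5)  # first (warm-up) run
timed = m.run(5)   # second (timed) run reads the contaminated buffer
```

Here the warm-up run returns 5, but the second run returns 10: a single m.run() on a fresh module gives the correct result, while the repeated runs inside time_evaluator do not.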

Furthermore, the cycle_count in "Execution statistics" is equal to 0. Is this expected or it can be improved?

It prints correct "Execution statistics" when we use time_evaluator. The open problem is how to get correct prediction results with time_evaluator, rather than by running the module directly.

cc @tmoreau89

Contributor

@tmoreau89 tmoreau89 left a comment


Thank you @liangfu for the patch

@tmoreau89 tmoreau89 merged commit 5699637 into apache:master Jan 14, 2020
@kevinyuan
Contributor

Hi @liangfu,

Have you had a chance to figure out the root cause of "The problem is how to get correct prediction results with time_evaluator, instead of running the module directly"?

Also, I found that the following wrong code #1 also generates wrong predictions. It looks like the inference was run about 11 times, since I saw [traced 329M cycles] at the end, while the correct run shows [traced 29M cycles] at the end.

----------- wrong code #1 ------------

from tvm.contrib.debugger import debug_runtime as graph_runtime
m = graph_runtime.create(graph, lib, ctx)
m.run()

Output

...
[traced 329M cycles]

----------- wrong code #2 ------------

from tvm.contrib import graph_runtime, util, download
m = graph_runtime.create(graph, lib, ctx)
timer = m.module.time_evaluator("run", ctx, number=1, repeat=1)

Output

...
[traced 59M cycles]

----------- correct ---------

from tvm.contrib import graph_runtime, util, download
m = graph_runtime.create(graph, lib, ctx)
m.run()

Output

...
[traced 29M cycles]
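A back-of-the-envelope check of the cycle counts above (plain arithmetic; the "about 11 runs" reading is an inference, not confirmed): 329M is roughly eleven times 29M, consistent with the debug runtime executing the graph many extra times, and 59M is roughly twice 29M, consistent with time_evaluator's warm-up run plus one measured run.

```python
# Traced cycle counts from the three experiments, in millions.
wrong1, wrong2, correct = 329, 59, 29

ratio_debug = wrong1 / correct  # debug_runtime case, ~11.3
ratio_timed = wrong2 / correct  # time_evaluator case, ~2.0
```

The near-2x ratio for wrong code #2 matches the warm-up-plus-measured-run explanation given earlier in the thread.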
