gpu: jit: handle tails in zero_out
kealan-barbieri authored and karturov committed Dec 4, 2023
1 parent 2dc95a2 commit c9c0b09
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions src/gpu/jit/codegen/codegen.cpp
@@ -511,9 +511,10 @@ class ir_to_ngen_t : public ir_visitor_t {
         int grf_size = ngen::GRF::bytes(hw);
         int step = 2 * grf_size;
         for (int i = 0; i < size; i += step) {
-            int exec_size = std::min(step, size - i) / type.size();
+            step = std::min(step, size - i);
+            step = utils::rnd_down_pow2(step);
+            int exec_size = step / type.size();
             auto sub_rd_mov = rd.format(i, to_ngen(type), exec_size).reg_data();
-            ir_assert(math::is_pow2(exec_size));
             host_->emov(exec_size, sub_rd_mov, ngen::Immediate(0.0f));
         }
     }
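
The effect of the change can be sketched outside the code generator. The standalone snippet below is an illustration only, not the oneDNN implementation: `rnd_down_pow2` here is a hypothetical stand-in for `utils::rnd_down_pow2`, `zero_out_chunks` is an invented helper that records the (byte offset, exec_size) pair each `emov` would receive, and it assumes `size` is a multiple of the element size.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for utils::rnd_down_pow2:
// largest power of two that is <= x, for x >= 1.
static int rnd_down_pow2(int x) {
    int p = 1;
    while (p * 2 <= x) p *= 2;
    return p;
}

// Invented helper: lists the (byte offset, exec_size) pairs that the
// patched loop would use to zero `size` bytes.
// Assumes size is a multiple of type_size.
static std::vector<std::pair<int, int>> zero_out_chunks(
        int size, int grf_size, int type_size) {
    std::vector<std::pair<int, int>> chunks;
    int step = 2 * grf_size; // start with two registers per move
    for (int i = 0; i < size; i += step) {
        step = std::min(step, size - i); // clamp to the remaining bytes
        step = rnd_down_pow2(step);      // keep the move a power of two
        chunks.emplace_back(i, step / type_size);
    }
    return chunks;
}
```

For example, with 64-byte registers, 4-byte elements and a 96-byte buffer, the old expression `std::min(step, size - i) / type.size()` would give an exec_size of 24, which is not a power of two and would trip the removed assertion. The sketch instead splits the buffer into a 16-element move at offset 0 and an 8-element move at offset 64. Because `step` is reduced in place, the loop increment advances by the number of bytes actually written.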