Commit

gate the cast before movements in lazy (tinygrad#3452)
It made gpt2 slower (2ms -> 2.5ms on 3090, 7ms -> 8ms on M1 Max with BEAM=2).
Disabled it in the gpt2 benchmark until the full issue is understood.
chenyuxyz authored Feb 20, 2024
1 parent 0d326a4 commit 02683a8
Showing 2 changed files with 4 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/benchmark.yml
@@ -90,7 +90,7 @@ jobs:
 - name: Run GPT2 w HALF
   run: CUDA=1 JIT=1 HALF=1 python3 examples/gpt2.py --count 10 --temperature 0 --timing
 - name: Run GPT2 w HALF/BEAM
-  run: CUDA=1 JIT=1 HALF=1 BEAM=2 CACHELEVEL=0 python3 examples/gpt2.py --count 10 --temperature 0 --timing | tee gpt2_half_beam.txt
+  run: CUDA=1 JIT=1 HALF=1 BEAM=2 CACHELEVEL=0 CAST_BEFORE_VIEW=0 python3 examples/gpt2.py --count 10 --temperature 0 --timing | tee gpt2_half_beam.txt
 - uses: actions/upload-artifact@v4
   with:
     name: Speed (NVIDIA)
4 changes: 3 additions & 1 deletion tinygrad/lazy.py
@@ -65,7 +65,9 @@ def contiguous(self):

   def cast(self, dtype:DType, bitcast:bool=False):
     if self.dtype == dtype: return self
-    if dtype.itemsize <= self.dtype.itemsize and self != self.base: return self.base.cast(dtype, bitcast)._view(self.st)
+    # TODO: applying this makes gpt2 slower
+    if getenv("CAST_BEFORE_VIEW", 1) and dtype.itemsize <= self.dtype.itemsize and self != self.base:
+      return self.base.cast(dtype, bitcast)._view(self.st)
     return create_lazybuffer(self.device, ShapeTracker.from_shape(self.shape), dtype, UnaryOps.CAST, (dtype, bitcast), (self,))

   def is_unrealized_const(self): return not self.base.realized and self.base.op is LoadOps.CONST
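
The gating in `cast` can be illustrated outside tinygrad with a small, self-contained sketch. Everything here is hypothetical and for illustration only: `ToyBuffer`, its `view` and `src` fields, and the `ITEMSIZE` table are stand-ins rather than tinygrad types, and the local `getenv` is a simplified, uncached approximation of an integer environment-flag helper. The sketch only shows the control flow: with `CAST_BEFORE_VIEW` at its default of 1, a same-size or narrowing cast on a viewed buffer is pushed down to the base and the view is reapplied; with `CAST_BEFORE_VIEW=0`, the cast is applied on top of the view.

```python
import os

def getenv(key: str, default: int = 0) -> int:
    # Simplified stand-in for an env-flag helper: parse the variable as the default's type.
    return type(default)(os.getenv(key, default))

# Hypothetical item sizes in bytes, just enough for the example.
ITEMSIZE = {"float32": 4, "float16": 2}

class ToyBuffer:
    """Hypothetical lazy buffer: either a base, or a view on top of a base."""
    def __init__(self, dtype, base=None, view=None, src=None):
        self.dtype, self.view, self.src = dtype, view, src
        self.base = base if base is not None else self

    def cast(self, dtype):
        if self.dtype == dtype:
            return self
        not_wider = ITEMSIZE[dtype] <= ITEMSIZE[self.dtype]
        if getenv("CAST_BEFORE_VIEW", 1) and not_wider and self is not self.base:
            # Gate enabled: cast the base first, then reapply the same view.
            return ToyBuffer(dtype, base=self.base.cast(dtype), view=self.view)
        # Gate disabled (or not applicable): cast the viewed buffer directly.
        return ToyBuffer(dtype, src=self)

base = ToyBuffer("float32")
viewed = ToyBuffer("float32", base=base, view="reshape")

os.environ["CAST_BEFORE_VIEW"] = "1"
before = viewed.cast("float16")   # a view over a float16 base

os.environ["CAST_BEFORE_VIEW"] = "0"
after = viewed.cast("float16")    # a new float16 base whose source is the view
```

Because the flag is read inside `cast`, the benchmark can opt out per run by setting `CAST_BEFORE_VIEW=0` on the command line, while every other invocation keeps the default behavior.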