Implement register spilling #60
Notice: Example (assembled by py-videocore):

    import numpy as np

    from videocore.assembler import qpu
    from videocore.driver import Driver

    @qpu
    def dma_store(asm, preload):
        mov(ra0, uniform)  # address of `buffer`
        mov(ra1, uniform)  # address of `result`

        # r0 = per-element address inside `buffer`
        shl(r0, element_number, 2)
        iadd(r0, r0, ra0)

        if preload:  # ATTENTION: the value of `preload` changes the result
            # the value loaded here is not used
            mov(tmu0_s, r0)
            nop(sig='load tmu0')

        setup_vpm_write()
        mov(vpm, element_number)
        setup_dma_store(nrows=1)
        start_dma_store(ra0)  # store the value into the buffer
        wait_dma_store()

        mov(tmu0_s, r0)
        nop(sig='load tmu0')  # load the value that was just stored into the buffer

        setup_vpm_write()
        mov(vpm, r4)
        setup_dma_store(nrows=1)
        start_dma_store(ra1)  # store into result
        wait_dma_store()

        exit()

    with Driver() as drv:
        print('----- enable preload -----')
        buffer = drv.alloc(16, 'uint32')
        buffer[:] = 0
        result = drv.alloc(16, 'uint32')
        result[:] = 0
        print('[Before]')
        print(buffer)
        print(result)
        drv.execute(
            n_threads=1,
            program=drv.program(dma_store, True),
            uniforms=[buffer.address, result.address]
        )
        print('[After]')
        print(buffer)
        print(result)

    with Driver() as drv:
        print('----- disable preload -----')
        buffer = drv.alloc(16, 'uint32')
        buffer[:] = 0
        result = drv.alloc(16, 'uint32')
        result[:] = 0
        print('[Before]')
        print(buffer)
        print(result)
        drv.execute(
            n_threads=1,
            program=drv.program(dma_store, False),
            uniforms=[buffer.address, result.address]
        )
        print('[After]')
        print(buffer)
        print(result)
Yeah, I think this is also the problem in #30
Closed
We need to implement register spilling to be able to support more complex kernels.
If the total size of the spilled registers (times 12 QPUs!) is small enough, we could store them in the VPM and avoid accessing main memory. Access to the spilled registers would still need to be synchronized via the hardware mutex.
The hard part of implementing this is not the spilling and reloading of locals itself, but determining the minimum number of registers to spill.
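To get a feeling for the "small enough" condition, here is a rough back-of-the-envelope sketch. It assumes each spilled register occupies one full 16-lane, 32-bit vector (64 bytes) per QPU. The function names and the `vpm_budget_bytes` parameter are hypothetical and not part of VC4CL, and the VPM space actually available for spilling depends on how the VPM is configured and on what else the kernel uses it for.

```python
# Rough size estimate for spilling registers into the VPM.
# All names here are illustrative, not taken from VC4CL.

LANES_PER_QPU = 16   # a QPU register is a 16-element vector
BYTES_PER_LANE = 4   # each element is 32 bits wide
NUM_QPUS = 12        # "times 12 QPUs" from the issue text


def spill_bytes(num_spilled_registers, num_qpus=NUM_QPUS):
    """Bytes needed to hold the spilled registers of all QPUs."""
    return num_spilled_registers * LANES_PER_QPU * BYTES_PER_LANE * num_qpus


def fits_in_vpm(num_spilled_registers, vpm_budget_bytes, num_qpus=NUM_QPUS):
    """True if the spill area fits into the given (assumed) VPM budget."""
    return spill_bytes(num_spilled_registers, num_qpus) <= vpm_budget_bytes
```

With these assumptions a single spilled register already costs 768 bytes across all 12 QPUs, so the VPM budget is used up after only a handful of spilled locals.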
(see doe300/VC4CL#24)
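As an illustration of why the number of spills is the tricky part, the following sketch computes only a lower bound from live ranges: the peak number of simultaneously live locals minus the number of available registers. This is not how VC4CL does (or would do) it. The live-range representation, the function names and `num_registers` are assumptions made for the example. The true minimum can be higher, because spilled values need temporary registers when they are reloaded, and real allocation has extra constraints (e.g. the two register files).

```python
# Lower bound on the number of locals to spill, based on register pressure.
# Illustrative only: names and data layout are assumptions, not VC4CL code.

def max_register_pressure(live_ranges):
    """Peak number of simultaneously live locals.

    live_ranges maps a local's name to (start, end), where the local is
    live from instruction index `start` (inclusive) to `end` (exclusive).
    """
    events = []
    for start, end in live_ranges.values():
        events.append((start, 1))   # local becomes live
        events.append((end, -1))    # local dies
    # Tuples sort so that at the same index a death (-1) comes before a
    # birth (+1); a range ending at i does not overlap one starting at i.
    events.sort()
    pressure = peak = 0
    for _, delta in events:
        pressure += delta
        peak = max(peak, pressure)
    return peak


def spill_lower_bound(live_ranges, num_registers):
    """At least this many locals must be spilled; never negative."""
    return max(0, max_register_pressure(live_ranges) - num_registers)
```

For example, with live ranges `{'a': (0, 10), 'b': (2, 6), 'c': (4, 8), 'd': (6, 12)}` the peak pressure is 3, so with only 2 registers at least one local has to be spilled.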