np.float32 smallest subnormal set to zero after ti.init
#8357
Comments
Ah, never mind - it appears that I have a race condition going on, as indicated by running with the CPU backend. I'm completely new to GPU programming - what is the recommendation here? Should I use intermediate containers to store the results of a given block? Or is there some other pattern I should be following? Also, I'm still curious about the original warning - where could that be coming from?
Here's what I'm doing now - is this a good approach? It seems to work, although obviously there's some loss of precision with float32.

```python
def ti_volume_convolution(kernel_slab: NDArray, stimulus: NDArray):
    """Use Taichi to do a volume convolution.

    This should wrap the process of creating ti.field objects and whatnot.
    """
    # Response container.
    # response_container = ti.field(dtype=ti.f32, shape=stimulus.shape[2] - kernel_size)
    width, height, kernel_size = kernel_slab.shape
    stimulus_length = stimulus.shape[2]
    batch_count = stimulus_length - kernel_size
    # Critically, batch_count is the length of the response container.
    compute_cells = np.ascontiguousarray(
        np.zeros(shape=(kernel_size, batch_count), dtype=np.float32)
    )
    response_container = np.ascontiguousarray(np.zeros(batch_count), dtype=np.float32)
    # Should modify in place.
    _gpu_convolve(
        response_container,
        compute_cells,
        np.ascontiguousarray(kernel_slab, dtype=np.float32),
        np.ascontiguousarray(stimulus, dtype=np.float32),
        width,
        height,
        kernel_size,
    )
    return response_container


@ti.kernel
def _gpu_convolve(
    response_container: ti.types.ndarray(dtype=ti.f32, ndim=1),
    compute_cells: ti.types.ndarray(dtype=ti.f32, ndim=2),
    kernel_slab: ti.types.ndarray(dtype=ti.f32, ndim=3),
    stimulus: ti.types.ndarray(dtype=ti.f32, ndim=3),
    width: ti.int32,
    height: ti.int32,
    kernel_size: ti.int32,
):
    """Brute-force GPU convolution.

    Note that in Taichi kernels, only the outermost loops are parallelized.
    Want to parallelize:
    - computation of piecewise products
    - summation of convolved slabs
    """
    for t_idx, batch_idx in compute_cells:
        for x_idx in range(width):
            for y_idx in range(height):
                compute_cells[t_idx, batch_idx] += (
                    kernel_slab[x_idx, y_idx, t_idx]
                    * stimulus[x_idx, y_idx, t_idx + batch_idx]
                )
    for batch_idx in response_container:
        batch_total = 0.0  # float literal to match the ti.f32 values accumulated below
        for t_idx in range(kernel_size):
            batch_total += compute_cells[t_idx, batch_idx]
        response_container[batch_idx] = batch_total
```
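One way to sanity-check a kernel like the one above is a pure-NumPy reference implementation computed on the host. This is a minimal sketch (the function name `np_volume_convolution` is hypothetical), assuming the same array shapes as in the Taichi version; it computes the identical sum, just serially and in float64:

```python
import numpy as np


def np_volume_convolution(kernel_slab, stimulus):
    """Pure-NumPy reference for the GPU kernel (sketch, for validation only).

    kernel_slab: (width, height, kernel_size)
    stimulus:    (width, height, stimulus_length)
    Returns a (stimulus_length - kernel_size,) response vector.
    """
    width, height, kernel_size = kernel_slab.shape
    batch_count = stimulus.shape[2] - kernel_size
    response = np.zeros(batch_count, dtype=np.float64)
    for b in range(batch_count):
        # Elementwise product of the kernel with a sliding time window,
        # summed over all three axes; equivalent to the double loop plus
        # reduction in the Taichi kernel.
        window = stimulus[:, :, b : b + kernel_size]
        response[b] = np.sum(kernel_slab.astype(np.float64) * window)
    return response
```

Comparing this against the GPU output with `np.allclose` (with a tolerance loose enough for float32 accumulation order) can separate logic bugs from the subnormal/precision question.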
Hi Datamance,
I think you're doing a good job so far - the goal of Taichi is to make sure parallel code works efficiently without you having to worry about these details. There are some more advanced patterns like ti.simt.SharedArray, but that is not trivial, and I would only recommend it if you're seeking extreme performance.
@jim19930609 So the exact warning was
It sounds like it might be the same issue that this person was facing in another context, but I can't be sure. Here's the key part of what they wrote:
Does Taichi depend on any shared libraries that might do this, or something like it?
Interesting... That was correct: Taichi does use fast math by default, and this is likely the reason. But honestly, that sounds more like an issue with
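Given the comment above that fast math is on by default, one thing worth trying is disabling it at initialization. This is a sketch, not a confirmed fix: `fast_math` is a real `ti.init` keyword, but whether it restores subnormal (denormal) float32 behavior in this exact situation is an assumption based on the discussion here. The import is guarded so the snippet also runs where Taichi isn't installed:

```python
import numpy as np

# The smallest positive subnormal float32, 2**-149 ~= 1.4012985e-45.
tiny_subnormal = np.finfo(np.float32).smallest_subnormal

try:
    import taichi as ti

    # fast_math defaults to True; setting it False is one candidate way to
    # avoid flush-to-zero of subnormals (assumption, per the thread above).
    ti.init(arch=ti.cpu, fast_math=False)
except ImportError:
    pass  # Taichi not installed; the NumPy check below still works.

# If subnormals survive, this comparison should hold.
print(np.float32(tiny_subnormal) > np.float32(0.0))
```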
Describe the bug
The smallest subnormal of the `numpy.float32` dtype is set to zero after `ti.init`, causing computations to fail completely. Oddly, the warning disappears and the correct subnormal of `1.4012985e-45` gets printed if you `print(np.finfo(np.float32))` before initializing Taichi, but the computation still fails regardless.

EDIT: I see that one issue was how I was initializing the response container - so at least I get results now! I'm just wondering now whether the results are incongruent due to my logic or this subnormal issue...
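For context on what the warning is about, NumPy's `finfo` exposes the float32 machine parameters directly; a quick way to inspect the values involved (these are standard IEEE 754 single-precision constants):

```python
import numpy as np

info = np.finfo(np.float32)

# Smallest positive *subnormal* value: 2**-149 ~= 1.4012985e-45.
# This is the value the warning says has been "set to zero".
print(info.smallest_subnormal)

# Smallest positive *normal* value: 2**-126 ~= 1.1754944e-38.
# Flush-to-zero modes typically round anything below this to 0.
print(info.tiny)
```

NumPy emits the "smallest subnormal ... set to zero" `UserWarning` when this probe detects that the hardware/FPU state flushes subnormals, which is consistent with something (here, apparently `ti.init`) enabling a flush-to-zero mode process-wide.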
To Reproduce
Log/Screenshots
Additional comments
Output from `ti diagnose`: