Decrease need for template arguments #1035

bernhardmgruber · 2020-06-08T10:01:23Z

I gave an introduction to alpaka to a group of people from the EP-SFT group at CERN and while presenting an example there was feedback that alpaka requires specifying too many template arguments. Especially the Idx needs to be specified in many places.

Example:

    const alpaka::vec::Vec<Dim, Idx> sizeVec{size};
    auto bufHostA = alpaka::mem::buf::alloc<float, Idx>(dev, sizeVec);

Here the Idx as argument to alloc should not be necesary, since it should be carried over from the type of sizeVec or from the type of dev, which is created using an accelerator where the type of Idx is also specified.

If possible please try to derive more types from the argument lists of functions and types!

The text was updated successfully, but these errors were encountered:

bernhardmgruber · 2020-06-08T10:07:58Z

During a VC today, we also briefly discussed this topic and the idea was raised to lock the type of Idx more globally.
Alpaka could just choose a fixed type, like uint32 or size_t. CUDA e.g. for specifying the size of the grid and blocks uses int or dim3 (which is 3 unsigned int). OpenCL uses size_t.
Alternatively, alpaka could at least use a default argument for the Idx template parameter.

sbastrakov · 2020-06-08T10:35:33Z

This index type in buffer is for linear index, so can't be derived from the extent. So can't be easily removed. However, I am with you that perhaps it can somehow have a default value, and it is not most natural anyways as I also first thought it's multidimensional.

Was just documented in #1026 .

bernhardmgruber · 2020-06-08T11:06:50Z

Hmm. Then I think that many examples are missleading, because the same alias Idx is used for the accelerator, dim, buffers, workdivs etc. Examples should then probably have a second BufferIdx alias. But since this type just represents the linear buffer size, this is a good candidate for just using size_t in the interface of buffers and drop the template parameters.

SimeonEhrig · 2020-06-08T11:10:06Z

A general question, is mixing different Idx types supported and useful? If not, It could solved by a global value.

BenjaminW3 · 2020-06-08T11:13:05Z

Mixing may be supported but might not be very useful and is also not tested. So deriving it should by a viable option in this case.
Going further, we could think about adding a default or hard coding it.
Exactly the difference between CUDA (int) and others (size_t) were the reason it is configurable. This makes it possible to get the most performance without any loss due to type conversion.

sbastrakov · 2020-06-08T11:22:00Z

@bernhardmgruber I think you are right, examples are misleading in this regard. I will make a PR to fix.

bernhardmgruber · 2020-06-08T11:29:22Z

@SimeonEhrig: I am very much in favor of a global value!

@BenjaminW3: I can hardly believe that type conversions significantly contribute to the programs runtime. Furthermore, alpaka is intended to write portable code, so the same program, whatever Idx is chosen by the programmer, will need to run on CUDA or others and therefore also map to int, uint, size_t etc. depending on the backend. So there will always be conversions if there are several backends in play, unless you make the index type dependent on the backend.

I noticed a further issue with the index type: it is the same on host and device code. size_t is probably a good choice for Idx on the host side (where it is typically 64bit). It might be a bad choice for 32bit GPU device code, because the hardware might not have support for 64bit arithmetic. So the index type should maybe be different in host code and in device code. I think OpenCL does it this way, there size_t is used on the host and uint on the device.

SimeonEhrig · 2020-06-08T12:00:47Z

@bernhardmgruber A global variable would be the best approach for easy development, but maybe not for the best performance. I think we should start to think about what a global variable means for the performance (which we already did) and then, which should "split" the variable and make it more depend for certain parts of the code, e.g. the accelerator or memory buffer.

using Acc = alpaka::acc::AccCpuSerial<Dim, Idx>;
//...
// internal code
Acc::get_idx_type();

In this case, we just need to define the Idx for every accelerator one time. But I'm not enough familiar with the internals of Alpaka. So it possible that we need the Idx at some more places for technical and/or performance reasons.

sbastrakov · 2020-06-08T12:12:00Z

@SimeonEhrig I am not saying your suggestion is somehow bad or anything. However, to me it feels to somewhat contradict the portability idea. Namely, imposing thread indexing on accelerators on internal indexing of buffers. I feel these two should never be tightly related, and e.g. using thread incides as buffer indices directly is also not the best (although very commonly done).
(Edit: I feel a little like @psychocoderHPC writing that)

SimeonEhrig · 2020-06-08T12:34:29Z

@SimeonEhrig I am not saying your suggestion is somehow bad or anything. However, to me it feels to somewhat contradict the portability idea. Namely, imposing thread indexing on accelerators on internal indexing of buffers. I feel these two should never be tightly related, and e.g. using thread incides as buffer indices directly is also not the best (although very commonly done).
(Edit: I feel a little like @psychocoderHPC writing that)

Sounds, that reducing the usage of idx is really hard, without hurting the goals of portability and performance. In this case, the usage of default values could be the better solution.

j-stephan · 2020-06-08T13:16:06Z

Can't we introduce alpaka::size_t that takes care of backend differences?

bernhardmgruber mentioned this issue Jun 8, 2020

simplification of the user interface #944

Open

sbastrakov added Type:Question Type:Refactoring labels Jun 8, 2020

psychocoderHPC mentioned this issue Jul 6, 2020

fix HIP support alpaka-group/mallocMC#182

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decrease need for template arguments #1035

Decrease need for template arguments #1035

bernhardmgruber commented Jun 8, 2020

bernhardmgruber commented Jun 8, 2020

sbastrakov commented Jun 8, 2020 •

edited

Loading

bernhardmgruber commented Jun 8, 2020

SimeonEhrig commented Jun 8, 2020

BenjaminW3 commented Jun 8, 2020 •

edited

Loading

sbastrakov commented Jun 8, 2020

bernhardmgruber commented Jun 8, 2020

SimeonEhrig commented Jun 8, 2020 •

edited

Loading

sbastrakov commented Jun 8, 2020 •

edited

Loading

SimeonEhrig commented Jun 8, 2020

j-stephan commented Jun 8, 2020

Decrease need for template arguments #1035

Decrease need for template arguments #1035

Comments

bernhardmgruber commented Jun 8, 2020

bernhardmgruber commented Jun 8, 2020

sbastrakov commented Jun 8, 2020 • edited Loading

bernhardmgruber commented Jun 8, 2020

SimeonEhrig commented Jun 8, 2020

BenjaminW3 commented Jun 8, 2020 • edited Loading

sbastrakov commented Jun 8, 2020

bernhardmgruber commented Jun 8, 2020

SimeonEhrig commented Jun 8, 2020 • edited Loading

sbastrakov commented Jun 8, 2020 • edited Loading

SimeonEhrig commented Jun 8, 2020

j-stephan commented Jun 8, 2020

sbastrakov commented Jun 8, 2020 •

edited

Loading

BenjaminW3 commented Jun 8, 2020 •

edited

Loading

SimeonEhrig commented Jun 8, 2020 •

edited

Loading

sbastrakov commented Jun 8, 2020 •

edited

Loading