Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decrease need for template arguments #1035

Open
bernhardmgruber opened this issue Jun 8, 2020 · 11 comments
Open

Decrease need for template arguments #1035

bernhardmgruber opened this issue Jun 8, 2020 · 11 comments

Comments

@bernhardmgruber
Copy link
Member

I gave an introduction to alpaka to a group of people from the EP-SFT group at CERN and while presenting an example there was feedback that alpaka requires specifying too many template arguments. Especially the Idx needs to be specified in many places.

Example:

    const alpaka::vec::Vec<Dim, Idx> sizeVec{size};
    auto bufHostA = alpaka::mem::buf::alloc<float, Idx>(dev, sizeVec);

Here the Idx as argument to alloc should not be necesary, since it should be carried over from the type of sizeVec or from the type of dev, which is created using an accelerator where the type of Idx is also specified.

If possible please try to derive more types from the argument lists of functions and types!

@bernhardmgruber
Copy link
Member Author

During a VC today, we also briefly discussed this topic and the idea was raised to lock the type of Idx more globally.
Alpaka could just choose a fixed type, like uint32 or size_t. CUDA e.g. for specifying the size of the grid and blocks uses int or dim3 (which is 3 unsigned int). OpenCL uses size_t.
Alternatively, alpaka could at least use a default argument for the Idx template parameter.

@sbastrakov
Copy link
Member

sbastrakov commented Jun 8, 2020

This index type in buffer is for linear index, so can't be derived from the extent. So can't be easily removed. However, I am with you that perhaps it can somehow have a default value, and it is not most natural anyways as I also first thought it's multidimensional.

Was just documented in #1026 .

@bernhardmgruber
Copy link
Member Author

Hmm. Then I think that many examples are missleading, because the same alias Idx is used for the accelerator, dim, buffers, workdivs etc. Examples should then probably have a second BufferIdx alias. But since this type just represents the linear buffer size, this is a good candidate for just using size_t in the interface of buffers and drop the template parameters.

@SimeonEhrig
Copy link
Member

A general question, is mixing different Idx types supported and useful? If not, It could solved by a global value.

@BenjaminW3
Copy link
Member

BenjaminW3 commented Jun 8, 2020

Mixing may be supported but might not be very useful and is also not tested. So deriving it should by a viable option in this case.
Going further, we could think about adding a default or hard coding it.
Exactly the difference between CUDA (int) and others (size_t) were the reason it is configurable. This makes it possible to get the most performance without any loss due to type conversion.

@sbastrakov
Copy link
Member

@bernhardmgruber I think you are right, examples are misleading in this regard. I will make a PR to fix.

@bernhardmgruber
Copy link
Member Author

@SimeonEhrig: I am very much in favor of a global value!

@BenjaminW3: I can hardly believe that type conversions significantly contribute to the programs runtime. Furthermore, alpaka is intended to write portable code, so the same program, whatever Idx is chosen by the programmer, will need to run on CUDA or others and therefore also map to int, uint, size_t etc. depending on the backend. So there will always be conversions if there are several backends in play, unless you make the index type dependent on the backend.

I noticed a further issue with the index type: it is the same on host and device code. size_t is probably a good choice for Idx on the host side (where it is typically 64bit). It might be a bad choice for 32bit GPU device code, because the hardware might not have support for 64bit arithmetic. So the index type should maybe be different in host code and in device code. I think OpenCL does it this way, there size_t is used on the host and uint on the device.

@SimeonEhrig
Copy link
Member

SimeonEhrig commented Jun 8, 2020

@bernhardmgruber A global variable would be the best approach for easy development, but maybe not for the best performance. I think we should start to think about what a global variable means for the performance (which we already did) and then, which should "split" the variable and make it more depend for certain parts of the code, e.g. the accelerator or memory buffer.

using Acc = alpaka::acc::AccCpuSerial<Dim, Idx>;
//...
// internal code
Acc::get_idx_type();

In this case, we just need to define the Idx for every accelerator one time. But I'm not enough familiar with the internals of Alpaka. So it possible that we need the Idx at some more places for technical and/or performance reasons.

@sbastrakov
Copy link
Member

sbastrakov commented Jun 8, 2020

@SimeonEhrig I am not saying your suggestion is somehow bad or anything. However, to me it feels to somewhat contradict the portability idea. Namely, imposing thread indexing on accelerators on internal indexing of buffers. I feel these two should never be tightly related, and e.g. using thread incides as buffer indices directly is also not the best (although very commonly done).
(Edit: I feel a little like @psychocoderHPC writing that)

@SimeonEhrig
Copy link
Member

@SimeonEhrig I am not saying your suggestion is somehow bad or anything. However, to me it feels to somewhat contradict the portability idea. Namely, imposing thread indexing on accelerators on internal indexing of buffers. I feel these two should never be tightly related, and e.g. using thread incides as buffer indices directly is also not the best (although very commonly done).
(Edit: I feel a little like @psychocoderHPC writing that)

Sounds, that reducing the usage of idx is really hard, without hurting the goals of portability and performance. In this case, the usage of default values could be the better solution.

@j-stephan
Copy link
Member

Can't we introduce alpaka::size_t that takes care of backend differences?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants