Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KDB updates for latest assembler #2996

Closed
wants to merge 12 commits into from
Closed

KDB updates for latest assembler #2996

wants to merge 12 commits into from

Conversation

cderb
Copy link
Contributor

@cderb cderb commented May 24, 2024

This PR updates db_sync and KDB files to support assembler changes and changes made in #2891
Kernel argument changes to LoadProgram function replicated in db_sync test.

@junliume
Copy link
Contributor

manually verified fixes of cases in #2891 (comment)

Comment on lines +567 to +576
if(miopen::EndsWith(kern.kernel_file, ".s"))
{
compile_options +=
" -mcpu=" +
miopen::LcOptionTargetStrings{handle.GetTargetProperties()}.targetId;
}
else
{
compile_options += " -mcpu=" + handle.GetDeviceName();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(miopen::EndsWith(kern.kernel_file, ".s"))
{
compile_options +=
" -mcpu=" +
miopen::LcOptionTargetStrings{handle.GetTargetProperties()}.targetId;
}
else
{
compile_options += " -mcpu=" + handle.GetDeviceName();
}
compile_options += " -mcpu=";
if(miopen::EndsWith(kern.kernel_file, ".s"))
{
compile_options +=
miopen::LcOptionTargetStrings{handle.GetTargetProperties()}.targetId;
}
else
{
compile_options += handle.GetDeviceName();
}

[R] For me it looks a bit better.

Comment on lines +751 to +761
if(miopen::EndsWith(kern.kernel_file, ".s"))
{
compile_options +=
" -mcpu=" +
miopen::LcOptionTargetStrings{handle.GetTargetProperties()}
.targetId;
}
else
{
compile_options += " -mcpu=" + handle.GetDeviceName();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(miopen::EndsWith(kern.kernel_file, ".s"))
{
compile_options +=
" -mcpu=" +
miopen::LcOptionTargetStrings{handle.GetTargetProperties()}
.targetId;
}
else
{
compile_options += " -mcpu=" + handle.GetDeviceName();
}
compile_options += " -mcpu="
if(miopen::EndsWith(kern.kernel_file, ".s"))
{
compile_options +=
miopen::LcOptionTargetStrings{handle.GetTargetProperties()}
.targetId;
}
else
{
compile_options += handle.GetDeviceName();
}

@junliume
Copy link
Contributor

@cderb there are quite a few entries not found for gfx90a, could you help to double check? Thanks!

@cderb
Copy link
Contributor Author

cderb commented May 26, 2024

@cderb there are quite a few entries not found for gfx90a, could you help to double check? Thanks!

@junliume @JehandadKhan this is a side effect of the targetid being added to the-mcpu argument.
The kernels were generated on a machine with '-mcpu=gfx90a:xnack-'
While the ci machine appears to be using '-mcpu=gfx90a:sramecc+:xnack-'
This of course causes a miss when looking up these arguments.

@junliume
Copy link
Contributor

@atamazov can we revert this change?

targetid being added to the-mcpu argument

@atamazov
Copy link
Contributor

@junliume

@atamazov can we revert this change?

targetid being added to the-mcpu argument

I do not think so. More info here: #2309 (comment).

I am investigating what needs to be done.

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 LGTM

@atamazov
Copy link
Contributor

atamazov commented May 27, 2024

However, I highly recommend getting back to using different databases for targets with different xnack and sramecc features. Otherwise we potentially losing performance. This is already partially implemented, KDB names include target features, see #2157 (comment). How the kernels are compiled for KDB is another question...

See also #2891 (comment)

Note that we don't need to generate DBs for all possible combinations. We need to do this only for the most common combinations.

@cderb
Copy link
Contributor Author

cderb commented May 28, 2024

@atamazov can we revert this change?

targetid being added to the-mcpu argument

@junliume if the kernel arguments change is reverted, then the code changes in this PR are no longer necessary. The kdb changes would also be inaccurate. If the assembler changes are causing an issue, then I can generate the kdbs once more. Otherwise we can throw out this PR.

@junliume
Copy link
Contributor

junliume commented May 28, 2024

@junliume

@atamazov can we revert this change?

targetid being added to the-mcpu argument

I do not think so. More info here: #2309 (comment).

I am investigating what needs to be done.

@atamazov actually I have the opposite impression, since we build kernel caches using offline compiler (not hipRTC), then the code generated without sramecc or xnack should work in a more general way: e.g. offline compiler generated kernel caches without sramecc or xnack specifics, like "gfx90a" alone, should work on gfx90a regardless of how the GPU reports these detailed features.

update: discussed #2891 (comment)

@atamazov @JehandadKhan please let me know if the above impression is mistaken. Thanks!

If a GPU reports XNACK feature then the code generated without xnack will likely work incorrectly.
If a GPU does not report XNACK then the code generated with xnack may work slower than it potentially can.
If a GPU reports SRAMECC feature then the code generated without sramecc will likely work incorrectly.
If a GPU does not report SRAMECC then the code generated with sramecc may work slower than it potentially can.

@cderb
Copy link
Contributor Author

cderb commented May 28, 2024

Confirmed that dbsync will pass with no kdb changes if we revert this change:

else if(miopen::EndsWith(program_name, ".s"))
{
params += " -mcpu=" + LcOptionTargetStrings{this->GetTargetProperties()}.targetId;
}

@atamazov
Copy link
Contributor

atamazov commented May 28, 2024

#2996 (comment) is updated (clarified)

@atamazov
Copy link
Contributor

atamazov commented May 29, 2024

@junliume

since we build kernel caches using offline compiler (not hipRTC),...

The reason is different, I guess. Offline compiler supports xnack target feature at least (well, there is some space for improvement wrt sramecc). But when Tuna builds kernels for KDBs, it uses some special hackery (MIOPEN_DEVICE_ARCH) to substitute actual GPU type and other target info with anything that Tuna wants. This allows to build kernels even on GPUless machines.

then the code generated without sramecc or xnack should work in a more general way: e.g. offline compiler generated kernel caches without sramecc or xnack specifics, like "gfx90a" alone, should work on gfx90a regardless of how the GPU reports these detailed features

Well, actually, if xnack is not specified but the GPU has target feature. then compiler assumes xnack+. And this is not free from the performance POV. I do not know the actual impact for our kernels, but it seems like current system KDB has some space for improvement.

The same considerations apply to sramecc.

If we want to get the best performance for certain targets (I believe we do want this), then we need to

  • identify the exact list of targets that we need to produce system KDBs
  • update Tuna and related stuff to produce KDBs for these targets

IIRC the full list of potential targets for KDB is like this:

gfx900
gfx906:sramecc+:xnack-
gfx906:sramecc+:xnack+
-----------------------------------------------------
gfx908:sramecc+:xnack-
gfx908:sramecc+:xnack+
-----------------------------------------------------
gfx90a:sramecc-:xnack-
gfx90a:sramecc-:xnack+
gfx90a:sramecc+:xnack-
gfx90a:sramecc+:xnack+
-----------------------------------------------------
gfx940/941/942:sramecc-:xnack-
gfx940/941/942:sramecc-:xnack+
gfx940/941/942:sramecc+:xnack-
gfx940/941/942:sramecc+:xnack+
-----------------------------------------------------
gfx1030
gfx1031
gfx1100
gfx1101
gfx1102

Again, we don't need to generate KDBs for all possible combinations, only for the important ones.

/cc @junliume @cderb @JehandadKhan

@atamazov
Copy link
Contributor

@junliume I am going to open a ticket about "KDB optimization" soon.

@cderb
Copy link
Contributor Author

cderb commented May 30, 2024

Discarding this change.

@cderb cderb closed this May 30, 2024
@junliume junliume deleted the cderb/dbsync_argfix branch June 25, 2024 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants