Merge develop branch into master for upcoming 1.0.x release #54

Merged 83 commits on May 5, 2016

@whchung whchung commented May 5, 2016

Major features introduced:

  • auto detect AMD GPU architecture at cmake time (supports kaveri/carrizo/fiji now)
  • separate grid_launch header into 2 parts for better compatibility with applications depending on libstdc++
  • promote kalmar_defines.h to hc_defines.h
  • performance improvements on lane shuffling functions
  • add hardware cycle counter function
  • adopt FNV-1a hash algorithm to speed up kernel code object lookup
  • some changes / fixes in unit tests
  • atomic wrap inc/dec functions
  • overhaul builtin functions based on LLVM IR
  • fix HCC runtime for certain applications
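
For readers unfamiliar with FNV-1a, here is a minimal sketch of the 64-bit variant applied to a kernel name. It is not the HCC implementation: the function name `fnv1a_64` is hypothetical, and the PR does not say which hash width the runtime uses or exactly what bytes it hashes. The offset basis and prime are the standard published FNV constants.

```cpp
#include <cstdint>
#include <string>

// Standard 64-bit FNV parameters.
constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ULL; // 0xcbf29ce484222325
constexpr std::uint64_t kFnvPrime       = 1099511628211ULL;        // 0x100000001b3

// Hypothetical helper: hash a byte string with FNV-1a (XOR first, then multiply).
std::uint64_t fnv1a_64(const std::string& data) {
    std::uint64_t hash = kFnvOffsetBasis;
    for (unsigned char byte : data) {
        hash ^= byte;       // fold the next byte into the low bits
        hash *= kFnvPrime;  // multiplication wraps modulo 2^64
    }
    return hash;
}
```

Each byte costs one XOR and one multiply, which is much cheaper than a cryptographic digest such as MD5 when the hash is only used as a lookup key.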

With the HSAIL backend, only one unit test is failing now. @scchan will take care of it.

Failing Tests (1):
CPPAMP :: Unit/HSAIL/shfl_xor.cpp

aditya4d and others added 30 commits April 14, 2016 12:34
removed grid launch constructor to remove runtime errors
Instead of hardcoding the HSA_AMDGPU_GPU_TARGET at compile time,
autodetect it at runtime from the KFD topology.

Change-Id: I00af68084869ab4d439e70cf8816c1c8868f224d
[CMake] Autodetect HSA_AMDGPU_GPU_TARGET
Use new workitem intrinsics + range metadata, correct
some attributes on functions, and canonicalize.

Correct range metadata to be maximum theoretical workgroup size.

Change-Id: I9dedbe2dd62753858ccd0eb7841e228873a2c031
Cleanup wrapper IR functions
Compile with code that uses restrict for other purposes.
Need to move this code so we can re-enable the optimization.
this will improve hcc runtime performance when multiple kernels are used
in a program
use FNV-1a for kernel indexing instead of md5
Use lit config variables which would be initialized at cmake configuration time.
whchung and others added 28 commits May 3, 2016 22:34
Use @llvm.readcyclecounter() intrinsic, which would be lowered to
s_memtime GCN ISA.

This fixes one failing hcc unit test (HSAIL/clock.cpp).
A new unit test is introduced to check API hc::__cycle_u64()
As there is no corresponding HSAIL instruction, we make this function always return 0 for HSAIL backend
s_memrealtime keeps a constant clock frequency and is not affected by DPVS
Implement with s_memtime ISA.
Define const member function if needed
erase empty elements to prevent the map size growing
shfl implementation for LC
New ROCm KFD has solved race condition issues. Increase test threads from 2 to 8.
This fixes 2 failing unit tests:
- memcpy_symbol1
- memcpy_symbol3
The original size was 104, which was too small for different kernel code objects
with nearly identical kernel names. Change it to 512 to fix failing unit tests.
Fix unit tests which don't take into consideration that not all hc::accelerator
instances are necessarily HSA agents.
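
The buffer-size commit above (104 changed to 512) can be illustrated with a small sketch. It assumes the failure came from kernel names being truncated into a fixed-size, NUL-terminated buffer before comparison; the commit message only says the size was too small, so this is a guess at the mechanism. The helpers `truncate_name` and `names_collide` are hypothetical and not part of HCC.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keep what fits in a char buffer of `capacity` bytes,
// reserving one byte for the terminating NUL.
std::string truncate_name(const std::string& name, std::size_t capacity) {
    if (capacity == 0) return std::string();
    return name.substr(0, capacity - 1);
}

// Two distinct names collide when their truncated forms are identical.
bool names_collide(const std::string& a, const std::string& b,
                   std::size_t capacity) {
    return a != b && truncate_name(a, capacity) == truncate_name(b, capacity);
}
```

Under this assumption, two mangled names that share their first 103 characters would be indistinguishable with a 104-byte buffer but remain distinct with 512 bytes, as long as they differ within the first 511 characters.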