Releases: vortexgpgpu/vortex
Release v2.2
This release includes the following major changes and fixes:
- New vx_spawn_threads kernel launch API supporting 3D task-partitioning.
- Support for running the ../configure script without parameters to update the build directory during development.
- Support for the Zicond RISC-V extension for branchless conditional operations.
- OpenCL compiler migration from warp-level to thread-level scheduling.
- Support for OpenCL just-in-time (JIT) compilation.
- Support for 64-bit OpenCL kernels.
- Dynamic loading of driver-specific implementations in the Vortex runtime, which simplifies linking for Vortex applications.
- Updated README instructions.
- New Xilinx FPGA setup documentation.
- Enabled full logic synthesis testing using Yosys.
- Added cache support for hierarchical flush.
- Added cache support for write-back mode with configurable dirty bytes.
- RTL speed optimizations for the scoreboard and operand stages.
- Support for Ramulator 2.0 with HBM memory configuration.
- Migration to Verilator 5.0.
- Migration to LLVM 18.0.
- New Stencil3D regression test.
- Fixed Xilinx FPGA synthesis for cores with more than 256 threads.
- Updated CentOS 7.9 toolchain.
- Migration from Travis CI to GitHub CI workflow.
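The Zicond extension adds two instructions, czero.eqz and czero.nez, that conditionally zero a register. The following is a minimal Python model of their semantics, written for illustration and not taken from the Vortex sources, showing how a conditional select (rd = cond ? a : b) is built from them without a branch. The Python conditional expressions only describe each instruction's result; in hardware each one is a single branch-free operation.

```python
MASK32 = 0xFFFF_FFFF  # model 32-bit (RV32) register values


def czero_eqz(rs1: int, rs2: int) -> int:
    """czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0, else rs1."""
    return 0 if (rs2 & MASK32) == 0 else rs1 & MASK32


def czero_nez(rs1: int, rs2: int) -> int:
    """czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0, else rs1."""
    return 0 if (rs2 & MASK32) != 0 else rs1 & MASK32


def select(cond: int, a: int, b: int) -> int:
    """rd = a if cond != 0 else b, using the three-instruction Zicond idiom:
         czero.eqz t0, a, cond   # t0 = a when cond != 0, else 0
         czero.nez t1, b, cond   # t1 = b when cond == 0, else 0
         or        rd, t0, t1    # at most one of t0/t1 is nonzero
    """
    t0 = czero_eqz(a, cond)
    t1 = czero_nez(b, cond)
    return t0 | t1


if __name__ == "__main__":
    print(select(1, 10, 20))
    print(select(0, 10, 20))
```

Because neither path is skipped, a warp whose threads disagree on cond never diverges at this point, which is the main appeal of branchless conditionals on a SIMT core.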
Release v2.1
This release includes the following major changes and fixes:
- new build configuration script to isolate the sources from the build directory
- added spawn_taskgroups kernel API for running kernels that use local memory and barriers (see tests/regression/sgemm2x)
- new runtime extension for relocatable kernel binary and arguments.
- new runtime memory API additions: vx_mem_reserve, vx_mem_access, vx_mem_address
- new runtime vx_check_occupancy API
- added GPU driver option to run OpenCL tests on a local GPU (e.g., blackbox.sh --driver=gpu --app=sgemm)
- added OpenCL tests that use local memory (psum, sgemm2, sgemm3)
- added Vortex custom libc and librt libraries with control divergence instrumentation
- added memory coalescing support
- reduced pipeline stalls caused by CSR instructions
- optimized split/join hardware area overhead with new split_n and pred_n inverted-predicate instructions.
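The notes above do not describe how the memory coalescer is designed. As a general illustration of the technique only (not the Vortex RTL), the sketch below merges the per-thread addresses of one warp-wide load or store into one request per distinct cache line. The 64-byte line size and the function name are assumptions.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


def coalesce(addrs, line_size=LINE_SIZE):
    """Merge the per-thread addresses of one warp-wide load/store into
    one request per distinct cache line.

    addrs[i] is the byte address issued by thread i, or None if that
    thread is inactive. Returns (line_address, thread_ids) pairs in the
    order each line is first seen.
    """
    requests = {}  # dicts preserve insertion order (Python 3.7+)
    for tid, addr in enumerate(addrs):
        if addr is None:  # masked-off thread issues nothing
            continue
        line = addr - addr % line_size  # align down to the line boundary
        requests.setdefault(line, []).append(tid)
    return list(requests.items())


if __name__ == "__main__":
    # Unit-stride 4-byte accesses: 4 threads, 1 memory request.
    print(coalesce([0x1000, 0x1004, 0x1008, 0x100C]))
    # Line-strided accesses: 4 threads, 4 memory requests.
    print(coalesce([0x1000, 0x1040, 0x1080, 0x10C0]))
```

The thread list kept with each request is what lets the returned line be scattered back to the right threads.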
Release v2.x
Merge branch 'develop'
Release v1.x
minor update
Release v0.2.3
External Interface Refactoring for Third Party Integration
This new release includes major changes to Vortex’s external interface that simplify integration with third-party designs. These changes include (1) memory-mapped CSRs and (2) removal of the ebreak signal. To support memory-mapped CSRs, we first had to add support for non-cacheable memory so that CSR write requests from the kernel bypass the cache subsystem and go directly to memory. Individual features are described below.
New Features
- Non-Cacheable Memory
A new module, VX_nc_bypass, was added to the cache top module to detect requests to I/O memory regions (defined in the configuration file VX_config.vh) and redirect them to memory, bypassing normal caching. This was implemented by extending the cache request tag interface with an I/O bypass flag that the Load/Store Unit computes from the address range. VX_nc_bypass handles both core-to-memory request bypassing and memory-to-core response bypassing for I/O addresses.
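The bypass decision described above can be modeled in a few lines. This is a minimal behavioral sketch, not the VX_nc_bypass RTL: the I/O window bounds are hypothetical placeholders (the real ones live in VX_config.vh), and the request record and function names are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical I/O window for illustration only; the real bounds
# are defined in VX_config.vh.
IO_BASE_ADDR = 0x0000_0040
IO_END_ADDR = 0x0001_0000  # exclusive upper bound


@dataclass
class MemRequest:
    addr: int
    tag: int
    io_bypass: bool = False  # extra tag bit set by the LSU


def lsu_make_request(addr: int, tag: int) -> MemRequest:
    """LSU side: set the bypass flag when the address is in the I/O window."""
    return MemRequest(addr, tag, IO_BASE_ADDR <= addr < IO_END_ADDR)


def cache_route(req: MemRequest) -> str:
    """Cache side: flagged requests skip the cache and go to memory.

    Because the flag travels in the tag, the matching response can be
    recognized and steered back to the core around the cache as well.
    """
    return "memory" if req.io_bypass else "cache"


if __name__ == "__main__":
    for addr in (0x0000_0100, 0x8000_0000):
        print(hex(addr), "->", cache_route(lsu_make_request(addr, tag=0)))
```

Computing the flag once in the LSU keeps the cache itself free of address-range comparators; it only has to inspect one tag bit.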
- Memory Mapped CSRs
Vortex’s original external interface had CSR request/response ports that allowed the host processor to read the CSR registers. This interface was mainly used to gather performance counters. This feature removes that external interface from Vortex and instead implements performance counter support via memory-mapped I/O. Specifically, we reserved a memory region for the performance counters and added a new stage to the application exit routine that dumps the counters to that region. The host application now reads the performance counters from this dedicated memory region instead of over a dedicated I/O bus.
- Multi-Bank Memory Support
The original Vortex implementation used a single memory bank to handle all memory transactions. This feature extends the command processor (AFU) module to expose multiple memory banks to the Vortex processor. Our current FPGA targets, Intel Arria 10 and Stratix 10, support 2 and 8 memory channels, respectively.
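The notes do not say how addresses are distributed across the exposed banks. One common scheme is line-granularity interleaving, where consecutive cache lines rotate across banks so that streaming accesses keep all channels busy. The sketch below assumes that scheme and a 64-byte interleaving unit; both are illustrative assumptions, not the documented Vortex mapping.

```python
LINE_SIZE = 64  # bytes per interleaving unit (assumed for illustration)


def bank_select(addr: int, num_banks: int, line_size: int = LINE_SIZE):
    """Map a global byte address to (bank index, byte address within that bank)
    using line-granularity interleaving: consecutive lines rotate across banks."""
    line = addr // line_size
    bank = line % num_banks
    local_line = line // num_banks
    return bank, local_line * line_size + addr % line_size


if __name__ == "__main__":
    # Two channels (as on Arria 10) vs. eight channels (as on Stratix 10).
    for banks in (2, 8):
        print(banks, [bank_select(i * LINE_SIZE, banks)[0] for i in range(8)])
```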
- OpenCL Debug Printf
This feature takes advantage of the new non-cacheable memory feature to provide a debug printf interface for OpenCL applications. Most of the changes related to this feature were implemented in our POCL codebase (https://github.com/vortexgpgpu/pocl).
- Memory Fence Support
This feature adds support for the RISC-V data fence extension. The work was completed last semester in our private repository and has now been ported to the public repository.
Changes & Improvements
- Documentation
The public repository now includes a doc folder containing the current processor documentation.
- ebreak External Interface Cleanup
The Vortex public interface used to have an ebreak signal that was used in simulation to trap the exit code returned by RISC-V unit tests. This change removes the signal from the external interface and instead uses an internal debug interface to retrieve the exit code.
- New Regression Tests
  - io_addr: non-cacheable memory test
  - diverge: branch divergence test
  - fence: fence feature test
  - mstress: memory stress test
  - printf: OpenCL printf test
  - sort: parallel sort benchmark
- Test Folder Reorganization
All Vortex tests, including the OpenCL benchmarks, driver tests, and runtime tests, have been reorganized into a single location.
- Regression Test Migration to travis.com
Vortex was using travis.org for continuous integration tests, but that service was discontinued last month. This task migrates our regression tests to the new service, travis.com.
Bug Fixes
- Shared Memory Bug
Fixed a synchronization bug in the dcache/shared memory arbiter.