Merge pull request #426 from Chia-Network/develop
Version 3.1.0
- Add CUDA disk-hybrid mode with 128G of system DRAM.
- Integrate the plot checker into the CUDA plotter.
- Expose `--no-direct-io` to disable direct I/O to the output plot directory.
- Fix some related issues on Windows.
- Fix a bug where some plots overflowed slice buffers.
- Fix build issues and other trivial issues.
- Expose experimental/WIP CUDA 16G disk-hybrid mode on Linux.
- Update README with CUDA and compression information.
Showing 79 changed files with 5,724 additions and 2,148 deletions.
@@ -1 +1,2 @@
-3.0.0
+3.1.0
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
set -e
_dir=$(cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)
cd $_dir

build_dir=build-release
mkdir -p ${build_dir}
cd ${build_dir}

cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --target bladebit_cuda --config Release --clean-first -j24
@@ -0,0 +1,385 @@
#include "GpuStreams.h"
#include "GpuQueue.h"
#include "plotting/DiskBucketBuffer.h"
#include "plotting/DiskBuffer.h"

///
/// DownloadBuffer
///
void* GpuDownloadBuffer::GetDeviceBuffer()
{
    const uint32 index = self->outgoingSequence % self->bufferCount;

    CudaErrCheck( cudaEventSynchronize( self->events[index] ) );

    return self->deviceBuffer[index];
}

void* GpuDownloadBuffer::LockDeviceBuffer( cudaStream_t stream )
{
    ASSERT( self->lockSequence >= self->outgoingSequence );
    ASSERT( self->lockSequence - self->outgoingSequence < self->bufferCount );

    const uint32 index = self->lockSequence % self->bufferCount;
    self->lockSequence++;

    // Wait for the device buffer to be free to be used by kernels
    CudaErrCheck( cudaStreamWaitEvent( stream, self->events[index] ) );
    return self->deviceBuffer[index];
}

void GpuDownloadBuffer::Download( void* hostBuffer, const size_t size )
{
    Download2D( hostBuffer, size, 1, size, size );
}

void GpuDownloadBuffer::Download( void* hostBuffer, const size_t size, cudaStream_t workStream, bool directOverride )
{
    Download2D( hostBuffer, size, 1, size, size, workStream, directOverride );
}

void GpuDownloadBuffer::DownloadAndCopy( void* hostBuffer, void* finalBuffer, const size_t size, cudaStream_t workStream )
{
    Panic( "Unavailable" );
    // ASSERT( self->outgoingSequence < BBCU_BUCKET_COUNT );
    // ASSERT( hostBuffer );
    // ASSERT( workStream );
    // ASSERT( self->lockSequence > 0 );
    // ASSERT( self->outgoingSequence < self->lockSequence );
    // ASSERT( self->lockSequence - self->outgoingSequence <= self->bufferCount );

    // auto& cpy = self->copies[self->outgoingSequence];
    // cpy.self = self;
    // cpy.sequence = self->outgoingSequence;
    // cpy.copy.hostBuffer = finalBuffer;
    // cpy.copy.srcBuffer = hostBuffer;
    // cpy.copy.size = size;

    // const uint32 index = self->outgoingSequence % self->bufferCount;
    // self->outgoingSequence++;

    // void* pinnedBuffer = self->pinnedBuffer[index];
    // const void* devBuffer = self->deviceBuffer[index];

    // // Signal from the work stream when it has finished doing kernel work with the device buffer
    // CudaErrCheck( cudaEventRecord( self->readyEvents[index], workStream ) );

    // // Ensure the work stream has completed writing data to the device buffer
    // cudaStream_t stream = self->queue->_stream;

    // CudaErrCheck( cudaStreamWaitEvent( stream, self->readyEvents[index] ) );

    // // Copy
    // CudaErrCheck( cudaMemcpyAsync( hostBuffer, devBuffer, size, cudaMemcpyDeviceToHost, stream ) );

    // // Signal that the device buffer is free to be re-used
    // CudaErrCheck( cudaEventRecord( self->events[index], stream ) );

    // // Launch copy command
    // CudaErrCheck( cudaLaunchHostFunc( stream, []( void* userData ){

    //     const CopyInfo& c = *reinterpret_cast<CopyInfo*>( userData );
    //     IGpuBuffer* self = c.self;

    //     auto& cmd = self->queue->GetCommand( GpuQueue::CommandType::Copy );
    //     cmd.copy.info = &c;

    //     self->queue->SubmitCommands();

    //     // Signal the download completed
    //     self->fence.Signal( ++self->completedSequence );
    // }, &cpy ) );
}

void GpuDownloadBuffer::DownloadWithCallback( void* hostBuffer, const size_t size, GpuDownloadCallback callback, void* userData, cudaStream_t workStream, bool directOverride )
{
    Download2DWithCallback( hostBuffer, size, 1, size, size, callback, userData, workStream, directOverride );
}

void GpuDownloadBuffer::Download2D( void* hostBuffer, size_t width, size_t height, size_t dstStride, size_t srcStride, cudaStream_t workStream, bool directOverride )
{
    Download2DWithCallback( hostBuffer, width, height, dstStride, srcStride, nullptr, nullptr, workStream, directOverride );
}

void GpuDownloadBuffer::Download2DWithCallback( void* hostBuffer, size_t width, size_t height, size_t dstStride, size_t srcStride,
                                                GpuDownloadCallback callback, void* userData, cudaStream_t workStream, bool directOverride )
{
    PerformDownload2D( hostBuffer, width, height, dstStride, srcStride,
                       callback, userData,
                       workStream, directOverride );
}

void GpuDownloadBuffer::PerformDownload2D( void* hostBuffer, size_t width, size_t height, size_t dstStride, size_t srcStride,
                                           GpuDownloadCallback postCallback, void* postUserData,
                                           cudaStream_t workStream, bool directOverride )
{
    PanicIf( !(hostBuffer || self->pinnedBuffer[0] ), "" );
    ASSERT( workStream );
    ASSERT( self->lockSequence > 0 );
    ASSERT( self->outgoingSequence < self->lockSequence );
    ASSERT( self->lockSequence - self->outgoingSequence <= self->bufferCount );

    const uint32 index = self->outgoingSequence++ % self->bufferCount;

    void*       pinnedBuffer    = self->pinnedBuffer[index];
    void*       finalHostBuffer = hostBuffer;
    const void* devBuffer       = self->deviceBuffer[index];

    const bool isDirect = (directOverride || self->pinnedBuffer[0] == nullptr) && !self->diskBuffer;
    ASSERT( isDirect || self->pinnedBuffer[0] );

    const bool   isSequentialCopy = dstStride == srcStride;
    const size_t totalSize        = height * width;

    // Signal from the work stream when it has finished doing kernel work with the device buffer
    CudaErrCheck( cudaEventRecord( self->workEvent[index], workStream ) );

    // From the download stream, wait for the work stream to finish
    cudaStream_t downloadStream = self->queue->_stream;
    CudaErrCheck( cudaStreamWaitEvent( downloadStream, self->workEvent[index] ) );

    if( self->diskBuffer )
    {
        // Wait until the next disk buffer is ready for use.
        // This also signals that the pinned buffer is ready for re-use
        CallHostFunctionOnStream( downloadStream, [this](){
            self->diskBuffer->GetNextWriteBuffer();
        });

        pinnedBuffer = self->diskBuffer->PeekWriteBufferForBucket( self->outgoingSequence-1 );
    }

    if( !isDirect )
    {
        // Ensure that the pinned buffer is ready for use
        // (we signal pinned buffers are ready when using disks without events)
        if( !self->diskBuffer )
            CudaErrCheck( cudaStreamWaitEvent( downloadStream, self->pinnedEvent[index] ) );

        // Set host buffer as the pinned buffer
        hostBuffer = pinnedBuffer;
    }

    // Copy from device to host buffer
    // #NOTE: Since the pinned buffer is simply the same size (a full bucket) as the device buffer,
    //        we also always copy as 1D if we're copying to our pinned buffer.
    ASSERT( hostBuffer );
    if( isSequentialCopy || hostBuffer == pinnedBuffer )
        CudaErrCheck( cudaMemcpyAsync( hostBuffer, devBuffer, totalSize, cudaMemcpyDeviceToHost, downloadStream ) );
    else
        CudaErrCheck( cudaMemcpy2DAsync( hostBuffer, dstStride, devBuffer, srcStride, width, height, cudaMemcpyDeviceToHost, downloadStream ) );

    // Dispatch a host callback if one was set
    if( postCallback )
    {
        CallHostFunctionOnStream( downloadStream, [=](){
            (*postCallback)( finalHostBuffer, totalSize, postUserData );
        });
    }

    // Signal that the device buffer is free to be re-used
    CudaErrCheck( cudaEventRecord( self->deviceEvents[index], downloadStream ) );

    if( self->diskBuffer )
    {
        // If it's a disk-based copy, then write the pinned buffer to disk
        CallHostFunctionOnStream( downloadStream, [=]() {

            auto* diskBucketBuffer = dynamic_cast<DiskBucketBuffer*>( self->diskBuffer );
            if( diskBucketBuffer != nullptr )
                diskBucketBuffer->Submit( srcStride );
            else
                static_cast<DiskBuffer*>( self->diskBuffer )->Submit( totalSize );
        });

        // #NOTE: We don't need to signal that the pinned buffer is ready for re-use here as
        //        we do that implicitly with DiskBuffer::GetNextWriteBuffer (see above).
    }
    else if( !isDirect )
    {
        // #TODO: Do this in a different host copy stream, and signal from there.
        // #MAYBE: Perhaps use multiple host threads/streams to do host-to-host copies.
        //         For now do it on the same download stream, but we will be blocking the download stream,
        //         unless other download streams are used by other buffers.

        ASSERT( hostBuffer == pinnedBuffer );
        if( isSequentialCopy )
            CudaErrCheck( cudaMemcpyAsync( finalHostBuffer, hostBuffer, totalSize, cudaMemcpyHostToHost, downloadStream ) );
        else
            CudaErrCheck( cudaMemcpy2DAsync( finalHostBuffer, dstStride, hostBuffer, srcStride, width, height, cudaMemcpyHostToHost, downloadStream ) );

        // Signal the pinned buffer is free to be re-used
        CudaErrCheck( cudaEventRecord( self->pinnedEvent[index], downloadStream ) );
    }
}

void GpuDownloadBuffer::CallHostFunctionOnStream( cudaStream_t stream, std::function<void()> func )
{
    auto* fnCpy = new std::function<void()>( std::move( func ) );
    CudaErrCheck( cudaLaunchHostFunc( stream, []( void* userData ) {

        auto& fn = *reinterpret_cast<std::function<void()>*>( userData );
        fn();
        delete &fn;

    }, fnCpy ) );
}

void GpuDownloadBuffer::HostCallback( std::function<void()> func )
{
    CallHostFunctionOnStream( self->queue->GetStream(), func );
}

void GpuDownloadBuffer::GetDownload2DCommand( void* hostBuffer, size_t width, size_t height, size_t dstStride, size_t srcStride,
                                              uint32& outIndex, void*& outPinnedBuffer, const void*& outDevBuffer, GpuDownloadCallback callback, void* userData )
{
    ASSERT( width );
    ASSERT( height );
    ASSERT( hostBuffer );

    const uint32 index = self->outgoingSequence % self->bufferCount;

    // We need to block until the pinned buffer is available.
    if( self->outgoingSequence > self->bufferCount-1 )
        self->fence.Wait( self->outgoingSequence - self->bufferCount + 1 );

    void*       pinnedBuffer = self->pinnedBuffer[index];
    const void* devBuffer    = self->deviceBuffer[index];

    //auto& cmd = self->commands[index];
    //cmd.type = GpuQueue::CommandType::Copy2D;
    //cmd.sequenceId = self->outgoingSequence++;
    //cmd.finishedSignal = &self->fence;
    //cmd.dstBuffer = hostBuffer;
    //cmd.srcBuffer = pinnedBuffer;
    //cmd.copy2d.width = width;
    //cmd.copy2d.height = height;
    //cmd.copy2d.dstStride = dstStride;
    //cmd.copy2d.srcStride = srcStride;
    //cmd.copy2d.callback = callback;
    //cmd.copy2d.userData = userData;

    outIndex = index;
    outPinnedBuffer = pinnedBuffer;
    outDevBuffer = devBuffer;
}

void GpuDownloadBuffer::DownloadAndPackArray( void* hostBuffer, const uint32 length, size_t srcStride, const uint32* counts, const uint32 elementSize )
{
    ASSERT( length );
    ASSERT( elementSize );
    ASSERT( counts );

    uint32 totalElements = 0;
    for( uint32 i = 0; i < length; i++ )
        totalElements += counts[i];

    const size_t totalSize = (size_t)totalElements * elementSize;

    uint32 index;
    void* pinnedBuffer;
    const void* devBuffer;
    GetDownload2DCommand( hostBuffer, totalSize, 1, totalSize, totalSize, index, pinnedBuffer, devBuffer );

    srcStride *= elementSize;

    byte* dst = (byte*)pinnedBuffer;
    const byte* src = (byte*)devBuffer;

    cudaStream_t stream = self->queue->_stream;

    // Copy all buffers from device to pinned buffer
    for( uint32 i = 0; i < length; i++ )
    {
        const size_t copySize = counts[i] * (size_t)elementSize;

        // #TODO: Determine if there's a cuda (jagged) array copy
        CudaErrCheck( cudaMemcpyAsync( dst, src, copySize, cudaMemcpyDeviceToHost, stream ) );

        src += srcStride;
        dst += copySize;
    }

    // Signal that the device buffer is free
    CudaErrCheck( cudaEventRecord( self->events[index], stream ) );

    // Submit command to do the final copy from pinned to host
    CudaErrCheck( cudaLaunchHostFunc( stream, GpuQueue::CopyPendingDownloadStream, self ) );
}

void GpuDownloadBuffer::WaitForCompletion()
{
    if( self->outgoingSequence > 0 )
    {
        //const uint32 index = (self->outgoingSequence - 1) % self->bufferCount;

        //cudaEvent_t event = self->completedEvents[index];
        //const cudaError_t r = cudaEventQuery( event );

        //if( r == cudaSuccess )
        //    return;

        //if( r != cudaErrorNotReady )
        //    CudaErrCheck( r );

        //CudaErrCheck( cudaEventSynchronize( event ) );

        cudaStream_t downloadStream = self->queue->_stream;
        // this->self->fence.Reset( 0 );
        CallHostFunctionOnStream( downloadStream, [this](){
            this->self->fence.Signal( this->self->outgoingSequence );
        });
        self->fence.Wait( self->outgoingSequence );
    }
}

void GpuDownloadBuffer::WaitForCopyCompletion()
{
    if( self->outgoingSequence > 0 )
    {
        self->copyFence.Wait( self->outgoingSequence );
    }
}

void GpuDownloadBuffer::Reset()
{
    self->lockSequence = 0;
    self->outgoingSequence = 0;
    self->completedSequence = 0;
    self->copySequence = 0;
    self->fence.Reset( 0 );
    self->copyFence.Reset( 0 );
}

GpuQueue* GpuDownloadBuffer::GetQueue() const
{
    return self->queue;
}

void GpuDownloadBuffer::AssignDiskBuffer( DiskBufferBase* diskBuffer )
{
    // ASSERT( self->pinnedBuffer[0] );

    void* nullBuffers[2] = { nullptr, nullptr };
    if( self->diskBuffer )
        self->diskBuffer->AssignWriteBuffers( nullBuffers );

    self->diskBuffer = diskBuffer;
    if( self->diskBuffer )
        self->diskBuffer->AssignWriteBuffers( self->pinnedBuffer );
}

DiskBufferBase* GpuDownloadBuffer::GetDiskBuffer() const
{
    return self->diskBuffer;
}
@@ -0,0 +1,188 @@
#pragma once

#include "GpuStreams.h"
#include <functional>

class DiskQueue;

struct GpuStreamDescriptor
{
    size_t      entrySize;
    size_t      entriesPerSlice;
    uint32      sliceCount;
    uint32      sliceAlignment;
    uint32      bufferCount;
    IAllocator* deviceAllocator;
    IAllocator* pinnedAllocator;
    DiskQueue*  diskQueue;          // DiskQueue to use when disk offload mode is enabled.
    const char* diskFileName;       // File name to use when disk offload mode is enabled. The diskQueue must be set.
    bool        bucketedDiskBuffer; // If true, a DiskBucketBuffer will be used instead of a DiskBuffer.
    bool        directIO;           // If true, direct I/O will be used when using disk offload mode.
};

typedef std::function<void()> GpuCallbackDispath;

class GpuQueue
{
    friend struct IGpuBuffer;
    friend struct GpuDownloadBuffer;
    friend struct GpuUploadBuffer;

    enum class CommandType
    {
        None = 0,
        Copy,
        CopyArray,
        Callback,
    };

    struct Command
    {
        CommandType type;

        union
        {
            struct CopyInfo* copy;

            struct {
                GpuDownloadCallback callback;
                size_t copySize;
                void* dstbuffer;
                void* userData;
            } callback;
        };
    };

public:

    enum Kind
    {
        Downloader,
        Uploader
    };

    GpuQueue( Kind kind );
    virtual ~GpuQueue();

    static size_t CalculateSliceSizeFromDescriptor( const GpuStreamDescriptor& desc );
    static size_t CalculateBufferSizeFromDescriptor( const GpuStreamDescriptor& desc );

    //GpuDownloadBuffer CreateDownloadBuffer( void* dev0, void* dev1, void* pinned0, void* pinned1, size_t size = 0, bool dryRun = false );
    //GpuDownloadBuffer CreateDownloadBuffer( const size_t size, bool dryRun = false );
    GpuDownloadBuffer CreateDirectDownloadBuffer( size_t size, IAllocator& devAllocator, size_t alignment, bool dryRun = false );
    GpuDownloadBuffer CreateDownloadBuffer( size_t size, IAllocator& devAllocator, IAllocator& pinnedAllocator, size_t alignment, bool dryRun = false );
    GpuDownloadBuffer CreateDownloadBuffer( size_t size, uint32 bufferCount, IAllocator& devAllocator, IAllocator& pinnedAllocator, size_t alignment, bool dryRun = false );

    GpuDownloadBuffer CreateDownloadBuffer( const GpuStreamDescriptor& desc, bool dryRun = false );

    /// Create with descriptor and override entry size
    inline GpuDownloadBuffer CreateDownloadBuffer( const GpuStreamDescriptor& desc, size_t entrySize, bool dryRun = false )
    {
        GpuStreamDescriptor copy = desc;
        copy.entrySize = entrySize;

        return CreateDownloadBuffer( copy, dryRun );
    }

    template<typename T>
    inline GpuDownloadBuffer CreateDownloadBufferT( const GpuStreamDescriptor& desc, bool dryRun = false )
    {
        return CreateDownloadBuffer( desc, sizeof( T ), dryRun );
    }

    /// Create with descriptor and override entry size
    GpuUploadBuffer CreateUploadBuffer( const GpuStreamDescriptor& desc, bool dryRun = false );

    // inline GpuUploadBuffer CreateUploadBuffer( const GpuStreamDescriptor& desc, bool size_t entrySize, bool dryRun = false )
    // {
    //     GpuStreamDescriptor copy = desc;
    //     copy.entrySize = entrySize;

    //     return CreateUploadBuffer( copy, dryRun );
    // }

    template<typename T>
    inline GpuUploadBuffer CreateUploadBufferT( const GpuStreamDescriptor& desc, bool dryRun = false )
    {
        GpuStreamDescriptor copy = desc;
        copy.entrySize = sizeof(T);

        return CreateUploadBuffer( copy, dryRun );
        // return CreateUploadBuffer( desc, sizeof( T ), dryRun );
    }

    template<typename T>
    inline GpuDownloadBuffer CreateDirectDownloadBuffer( const size_t count, IAllocator& devAllocator, size_t alignment = alignof( T ), bool dryRun = false )
    {
        return CreateDirectDownloadBuffer( count * sizeof( T ), devAllocator, alignment, dryRun );
    }

    template<typename T>
    inline GpuDownloadBuffer CreateDownloadBufferT( const size_t count, IAllocator& devAllocator, IAllocator& pinnedAllocator, size_t alignment = alignof( T ), bool dryRun = false )
    {
        return CreateDownloadBuffer( count * sizeof( T ), devAllocator, pinnedAllocator, alignment, dryRun );
    }

    template<typename T>
    inline GpuDownloadBuffer CreateDownloadBufferT( const size_t count, uint32 bufferCount, IAllocator& devAllocator, IAllocator& pinnedAllocator, size_t alignment = alignof( T ), bool dryRun = false )
    {
        return CreateDownloadBuffer( count * sizeof( T ), bufferCount, devAllocator, pinnedAllocator, alignment, dryRun );
    }

    //GpuUploadBuffer CreateUploadBuffer( void* dev0, void* dev1, void* pinned0, void* pinned1, size_t size = 0, bool dryRun = false );
    //GpuUploadBuffer CreateUploadBuffer( const size_t size, bool dryRun = false );
    GpuUploadBuffer CreateUploadBuffer( const size_t size, IAllocator& devAllocator, IAllocator& pinnedAllocator, size_t alignment, bool dryRun = false );

    template<typename T>
    inline GpuUploadBuffer CreateUploadBufferT( const size_t count, IAllocator& devAllocator, IAllocator& pinnedAllocator, size_t alignment, bool dryRun = false )
    {
        return CreateUploadBuffer( count * sizeof( T ), devAllocator, pinnedAllocator, alignment, dryRun );
    }

    inline cudaStream_t GetStream() const { return _stream; }

protected:

    struct IGpuBuffer* CreateGpuBuffer( size_t size, IAllocator& devAllocator, IAllocator& pinnedAllocator, size_t alignment, bool dryRun );
    struct IGpuBuffer* CreateGpuBuffer( const GpuStreamDescriptor& desc, bool dryRun );

    void DispatchHostFunc( GpuCallbackDispath func, cudaStream_t stream, cudaEvent_t lockEvent, cudaEvent_t completedEvent );

    static void CopyPendingDownloadStream( void* userData );

    [[nodiscard]]
    Command& GetCommand( CommandType type );
    void SubmitCommands();

    // Copy threads
    static void QueueThreadEntryPoint( GpuQueue* self );
    void QueueThreadMain();

    void ExecuteCommand( const Command& cpy );

    bool ShouldExitQueueThread();

protected:
    cudaStream_t _stream = nullptr;
    cudaStream_t _preloadStream = nullptr;
    cudaStream_t _callbackStream = nullptr;

    Thread _queueThread;
    //Fence _bufferReadySignal;
    Semaphore _bufferReadySignal;
    Fence _bufferCopiedSignal;
    Fence _syncFence;
    SPCQueue<Command, BBCU_BUCKET_COUNT*6> _queue;
    Kind _kind;

    AutoResetSignal _waitForExitSignal;
    std::atomic<bool> _exitQueueThread = false;

    // Support multiple threads to grab commands
    std::atomic<uint64> _cmdTicketOut = 0;
    std::atomic<uint64> _cmdTicketIn = 0;
    std::atomic<uint64> _commitTicketOut = 0;
    std::atomic<uint64> _commitTicketIn = 0;
};
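For context, here is a small, hypothetical sketch of how the descriptor-based API declared above might be driven. It only uses fields and methods visible in this header; the caller-provided allocators, the disk queue, the temporary file name, the numeric values, and the `uint64` entry type are all assumptions made for illustration, not values taken from the plotter.

```cpp
// Hypothetical usage sketch for GpuStreamDescriptor / GpuQueue (not plotter code).
// The allocators, disk queue, file name, and sizes below are illustrative assumptions.
#include "GpuQueue.h"

GpuDownloadBuffer CreateExampleDownloadBuffer( GpuQueue&   queue,
                                               IAllocator& devAllocator,
                                               IAllocator& pinnedAllocator,
                                               DiskQueue*  diskQueue )   // nullptr when not offloading to disk
{
    GpuStreamDescriptor desc = {};
    desc.entrySize          = sizeof( uint64 );           // overridden again by CreateDownloadBufferT<T>
    desc.entriesPerSlice    = 1u << 16;                   // illustrative slice size
    desc.sliceCount         = 64;                         // e.g. one slice per bucket
    desc.sliceAlignment     = 4096;                       // e.g. align slices to a sector size
    desc.bufferCount        = 2;                          // double-buffered device/pinned buffers
    desc.deviceAllocator    = &devAllocator;
    desc.pinnedAllocator    = &pinnedAllocator;
    desc.diskQueue          = diskQueue;                  // required only for disk offload mode
    desc.diskFileName       = diskQueue ? "example_download.tmp" : nullptr; // hypothetical name
    desc.bucketedDiskBuffer = diskQueue != nullptr;       // use a DiskBucketBuffer when offloading
    desc.directIO           = false;                      // illustrative; disk offload without direct I/O

    // dryRun = true presumably lets callers size allocations first (assumption).
    return queue.CreateDownloadBufferT<uint64>( desc, /*dryRun*/ false );
}
```

The descriptor groups everything the earlier `GpuDownloadBuffer` code needs at runtime: the buffer count drives the `outgoingSequence % bufferCount` indexing, and the disk-related fields decide whether downloads land in pinned memory or are staged through a `DiskBuffer`/`DiskBucketBuffer`.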
@@ -0,0 +1,60 @@
# Navigate to the script's directory
$scriptPath = Split-Path -Path $MyInvocation.MyCommand.Definition -Parent
Set-Location -Path $scriptPath

# Arguments
$ver_component = $args[0]  # The user-specified component from the full version

# Read the version from the file
$version_str = (Get-Content 'VERSION' | Select-Object -First 1 | Out-String).Trim()
$bb_version_suffix = (Get-Content 'VERSION' | Select-Object -Last 1 | Out-String).Trim()
$version_header = 'src\Version.h'

if ($version_str -eq $bb_version_suffix) {
    $bb_version_suffix = ""
}

# Prepend a '-' to the suffix, if necessary
if (-Not [string]::IsNullOrEmpty($bb_version_suffix) -and $bb_version_suffix[0] -ne '-') {
    $bb_version_suffix = "-$bb_version_suffix"
}

# Parse the major, minor, and revision numbers
$bb_ver_maj, $bb_ver_min, $bb_ver_rev = $version_str -split '\.' | ForEach-Object { $_.Trim() }

# Get the Git commit hash
$bb_git_commit = $env:GITHUB_SHA
if ([string]::IsNullOrEmpty($bb_git_commit)) {
    $bb_git_commit = & git rev-parse HEAD
}

if ([string]::IsNullOrEmpty($bb_git_commit)) {
    $bb_git_commit = "unknown"
}

# Check if the user wants a specific component
if (-Not [string]::IsNullOrEmpty($ver_component)) {
    switch ($ver_component) {
        "major" {
            Write-Host -NoNewline $bb_ver_maj
        }
        "minor" {
            Write-Host -NoNewline $bb_ver_min
        }
        "revision" {
            Write-Host -NoNewline $bb_ver_rev
        }
        "suffix" {
            Write-Host -NoNewline $bb_version_suffix
        }
        "commit" {
            Write-Host -NoNewline $bb_git_commit
        }
        default {
            Write-Error "Invalid version component '$ver_component'"
            exit 1
        }
    }
    exit 0
}