Add options for growable memory and single state buffers #104

Merged
merged 7 commits into from
Nov 15, 2023
Changes from 5 commits
33 changes: 33 additions & 0 deletions protobuf/model_config.proto
@@ -1382,6 +1382,39 @@ message ModelSequenceBatching
//@@ The optional field to specify the initial state for the model.
//@@
repeated InitialState initial_state = 5;

//@@ .. cpp:var:: bool use_same_buffer_for_input_output
//@@
//@@ The optional field to use a single buffer for both input and output
//@@ state. Without this option, Triton allocates separate buffers
//@@ for input and output state,
//@@ which can be problematic if the state size is
//@@ large. This option reduces the memory usage by allocating a single
//@@ buffer. Enabling this option is recommended whenever
//@@ the input state is processed before the output state is written.
//@@ When enabled, the state
//@@ will always be updated, regardless of whether
//@@ TRITONBACKEND_StateUpdate is called
//@@ (however, TRITONBACKEND_StateUpdate should still be called for
//@@ completeness).
//@@
//@@ The default value is false.
//@@
bool use_same_buffer_for_input_output = 6;

//@@ .. cpp:var:: bool use_growable_memory
//@@
//@@ The optional field to enable an implicit state buffer to grow
//@@ without
//@@ reallocating or copying existing memory.
//@@ Additional memory will be appended to the end of the buffer and
//@@ existing data will be preserved.
//@@ This option is only available for CUDA memory and requires enabling
//@@ use_same_buffer_for_input_output.
//@@
//@@ The default value is false.
//@@
bool use_growable_memory = 7;
}

//@@ .. cpp:var:: message StrategyDirect
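
For illustration, a model configuration that opts into both new fields might look like the sketch below. It is not part of this change. The state tensor names `INPUT_STATE` and `OUTPUT_STATE`, the data type, and the dims are hypothetical placeholders, and the fragment assumes the existing `input_name`, `output_name`, `data_type`, and `dims` fields of the state message, which are not shown in this hunk.

```
# Hypothetical config.pbtxt fragment; names and shapes are placeholders.
sequence_batching {
  state [
    {
      input_name: "INPUT_STATE"
      output_name: "OUTPUT_STATE"
      data_type: TYPE_FP32
      dims: [ -1 ]
      # Share one buffer between input and output state (default: false).
      use_same_buffer_for_input_output: true
      # Let the state buffer grow in place (default: false). Per the field
      # documentation, this is only available for CUDA memory and requires
      # use_same_buffer_for_input_output to be enabled.
      use_growable_memory: true
    }
  ]
}
```

Setting `use_growable_memory` without `use_same_buffer_for_input_output` would contradict the requirement stated in the field documentation, so the two are enabled together here.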