Add options for growable memory and single state buffers #104

Merged: 7 commits, Nov 15, 2023
Changes from 2 commits
18 changes: 18 additions & 0 deletions protobuf/model_config.proto
@@ -1382,6 +1382,24 @@ message ModelSequenceBatching
//@@ The optional field to specify the initial state for the model.
//@@
repeated InitialState initial_state = 5;

//@@ .. cpp:var:: bool use_single_buffer
//@@
//@@ The optional field to use a single buffer for both input and output state.
//@@ The default value is false.
//@@
bool use_single_buffer = 6;

//@@ .. cpp:var:: bool use_growable_memory
//@@
//@@ The optional field to allow an implicit state buffer to grow or shrink
//@@ when the state size changes during a sequence. When this option is
//@@ enabled, Triton guarantees that it will use the same buffer even if
//@@ the state size changes. Currently, this option only applies to
//@@ implicit state that uses CUDA memory, and use_single_buffer must be
//@@ enabled.
Contributor:
IIUC, we force the implicit state memory used in the growable block manager to be GPU memory for now. Maybe the wording here could be clearer that, regardless of what the user requests, this will use GPU memory for now, even if they ask for other memory types.

Contributor:
Why wouldn't we error if they try different types? Just double checking - we don't silently switch preferences, correct?

Member Author:
> Why wouldn't we error if they try different types?

What the user specifies is just "preferences" (i.e., there is no guarantee that Triton will satisfy the request). The same pattern exists in TRITONBackend_OutputBuffer.

Contributor:
I guess the question is which option takes precedence: the model config or the API call to get the buffer. Will discuss offline.

//@@ The default value is false.
//@@
bool use_growable_memory = 7;
}

//@@ .. cpp:var:: message StrategyDirect
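As a usage sketch, the two new fields added by this PR would appear in a model configuration (`config.pbtxt`) roughly as follows. The tensor names, data type, and dims here are placeholders, not part of the PR:

```
sequence_batching {
  state [
    {
      input_name: "INPUT_STATE"    # placeholder name
      output_name: "OUTPUT_STATE"  # placeholder name
      data_type: TYPE_FP32
      dims: [ -1 ]
      # New fields from this PR; both default to false.
      # use_growable_memory requires use_single_buffer and currently
      # applies only to implicit state that uses CUDA memory.
      use_single_buffer: true
      use_growable_memory: true
    }
  ]
}
```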