[Feat] return hidden states #3364
This is good to see. Could you update our documentation to demonstrate the usage and add unit tests for your feature?
Thanks. I will try to get someone familiar with hidden states to help.
https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_generation_models.py You can check this; it may help.
Great! Could you also add the API to the server, not only the engine, like how we do for update_weights_from_dist? You can use the Engine API and the Server / HTTPS API.
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "llm.shutdown()"
 ]
},
{
 "cell_type": "markdown",
 "metadata": {},
Thanks. I think we should not change the docs of the offline API. Instead, we should change this:
https://docs.sglang.ai/backend/native_api.html
Also, I think the best way to do this is not to add a serving argument, but rather to make a new native API instead. Just like:
@app.post("/update_weights_from_distributed")
And this:
def update_weights_from_distributed(self, name: str, dtype, shape):
This would be much easier to use and would not need to launch a specific engine, which costs a lot of time in the docs CI.
Also, update the beginning of the native API docs: https://docs.sglang.ai/backend/native_api.html
Apart from the OpenAI compatible APIs, the SGLang Runtime also provides its native server APIs. We introduce the following APIs:
/generate (text generation model)
/get_model_info
/get_server_info
/health
/health_generate
/flush_cache
/update_weights
/encode (embedding model)
/classify (reward model)
We mainly use requests to test these APIs in the following examples. You can also use curl.
You can add a separate test file, just like test_update_weights_from_disk in the run_suite.
The test looks good.
CaptureHiddenMode.FULL
if self.model_runner.server_args.return_hidden_states
else (
    spec_info.capture_hidden_mode
    if spec_info
    else CaptureHiddenMode.NULL
)
@zhaochenyang20 What I meant is here. It seems necessary for the capture_hidden_mode to be known at engine init time. Otherwise, the decode CUDA graph will not contain the return-hidden-states logic, and this can't be changed by sampling args.
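The selection logic in the conditional expression above can be sketched as a standalone function. This is only an illustration, not the code in the PR: the `SpecInfo` dataclass and the `resolve_capture_hidden_mode` helper are hypothetical names, and the enum models only the two members (`NULL`, `FULL`) that appear in the snippet.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class CaptureHiddenMode(Enum):
    # Only the two members visible in the reviewed snippet are modeled here.
    NULL = auto()
    FULL = auto()


@dataclass
class SpecInfo:
    # Hypothetical stand-in for the speculative-decoding info object.
    capture_hidden_mode: CaptureHiddenMode


def resolve_capture_hidden_mode(
    return_hidden_states: bool, spec_info: Optional[SpecInfo]
) -> CaptureHiddenMode:
    """Mirror the precedence of the conditional expression in the diff."""
    # The server-level flag wins: hidden states are always captured in full.
    if return_hidden_states:
        return CaptureHiddenMode.FULL
    # Otherwise defer to speculative decoding, if it is active.
    # (The diff uses a truthiness check; `is not None` is used here instead.)
    if spec_info is not None:
        return spec_info.capture_hidden_mode
    # Nothing asked for hidden states.
    return CaptureHiddenMode.NULL


print(resolve_capture_hidden_mode(True, None))
print(resolve_capture_hidden_mode(False, None))
```

Because the flag is an input fixed before the mode is resolved, the result is already known when the engine is initialized, which is the property the comment above says the decode CUDA graph depends on.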
https://github.com/sgl-project/sglang/actions/runs/13235087533/job/36938449704?pr=3364 The CI timeout needs to be updated for this. @Jackmin801
Motivation
This PR adds the return_hidden_states argument to ServerArgs, which makes the results contain the last-layer hidden states in output["meta_info"]["hidden_states"]. These hidden states are useful, for example, for verifying computations (e.g. https://arxiv.org/abs/2501.16007).
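As a rough sketch of how a caller might read the field described above, the helper below pulls the hidden states out of a result dictionary. Only the `output["meta_info"]["hidden_states"]` path comes from the PR description; the `get_hidden_states` helper, the hand-written `fake_output`, and the assumed layout (one vector of floats per token) are illustrative assumptions and not taken from the PR.

```python
from typing import Any, Dict, List


def get_hidden_states(output: Dict[str, Any]) -> List[Any]:
    """Return output["meta_info"]["hidden_states"], or raise if it is absent."""
    meta_info = output.get("meta_info", {})
    if "hidden_states" not in meta_info:
        # Assumption: the key is missing when return_hidden_states is disabled.
        raise KeyError("no hidden states in output; is return_hidden_states enabled?")
    return meta_info["hidden_states"]


# Hand-written stand-in for a single generation result. A real result would
# come from the engine or server; the nested-list layout is an assumption.
fake_output = {
    "text": "hello",
    "meta_info": {"hidden_states": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]},
}

hidden = get_hidden_states(fake_output)
print(len(hidden), len(hidden[0]))
```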
Modifications
Add return_hidden_states to ServerArgs
Update capture_hidden_mode to accommodate return_hidden_states
Add return_hidden_states and hidden_states to the necessary dataclasses
Script used to test changes
Checklist