DJL not working for VLLM #2645

vishalkumardas · 2024-12-21T17:01:59Z

Description

After deploying the fine-tuned model to a SageMaker endpoint using DJL, I am getting the error "Chunk reading interrupted":

2024-12-21T16:52:30.583Z | [INFO ] PyProcess - W-165-model-stdout: INFO::Text generation completed successfully
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: INFO::JSON parsing completed successfully
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: INFO::Cleared GPU memory
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: INFO::GPU 0 Memory: Total: 21.98GB Reserved: 6.05GB Allocated: 5.99GB Free: 15.93GB
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: INFO::Inference process completed {
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: "code": 200,
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: "message": "OK",
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: "properties": {},
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: "content": {
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: "null": "type: <class 'bytearray'>"
2024-12-21T16:52:30.834Z | [INFO ] PyProcess - W-165-model-stdout: }
2024-12-21T16:52:35.750Z | [INFO ] PyProcess - W-165-model-stdout: }
2024-12-21T16:53:23.419Z | [WARN ] InferenceRequestHandler - Chunk reading interrupted
2024-12-21T16:53:23.419Z | java.lang.IllegalStateException: Read chunk timeout.
2024-12-21T16:53:23.419Z |     at ai.djl.inference.streaming.ChunkedBytesSupplier.next(ChunkedBytesSupplier.java:79) ~[api-0.30.0.jar:?]
2024-12-21T16:53:23.419Z |     at ai.djl.inference.streaming.ChunkedBytesSupplier.nextChunk(ChunkedBytesSupplier.java:93) ~[api-0.30.0.jar:?]
2024-12-21T16:53:23.419Z |     at ai.djl.serving.http.InferenceRequestHandler.sendOutput(InferenceRequestHandler.java:418) ~[serving-0.30.0.jar:?]

serving.properties file:
engine=Python
option.model_id=.
option.entryPoint=inference.py
option.tensor_parallel_degree=1
option.rolling_batch=vllm
option.device_type=auto
option.dtype=bf16
option.task=text-generation
option.max_rolling_batch_size=8
option.enable_memory_optimization=true
option.model_loading_timeout=1200

inference.py:

from djl_python import Input, Output

# Method of the handler class in inference.py; predict_fn, self.model_dict and
# self.logger are defined elsewhere in that file (not shown).
def inference(self, inputs: Input) -> Output:
    output = Output()
    try:
        # Parse input JSON
        input_data = inputs.get_as_json()
        text = input_data.get('text', '')
        system_prompt = input_data.get('system_prompt', '')
        # Perform prediction
        prediction = predict_fn(text, system_prompt, self.model_dict)
        # Add serialized JSON directly to content
        output.add_as_json(prediction)
        self.logger.info(f"Inference process completed {output}")
    except Exception as e:
        self.logger.error(f"Inference error: {str(e)}", exc_info=True)
        # Build an error response from the exception message
        output.error(str(e))
    return output
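The handler above reads two keys, `text` and `system_prompt`, from the JSON request body. A minimal client-side sketch for exercising the endpoint is below, assuming boto3 and AWS credentials are available; `build_payload`, `invoke` and the endpoint name passed to it are hypothetical, not part of DJL or this thread.

```python
import json


def build_payload(text: str, system_prompt: str = "") -> bytes:
    """Serialize a request body with the two keys the handler reads."""
    return json.dumps({"text": text, "system_prompt": system_prompt}).encode("utf-8")


def invoke(endpoint_name: str, text: str, system_prompt: str = "") -> dict:
    """Call a SageMaker real-time endpoint and decode its JSON response.

    Requires boto3 and AWS credentials. boto3 is imported lazily so that
    build_payload can be used without it.
    """
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,  # hypothetical name supplied by the caller
        ContentType="application/json",
        Accept="application/json",
        Body=build_payload(text, system_prompt),
    )
    # response["Body"] is a streaming body; read() returns the raw bytes.
    return json.loads(response["Body"].read())
```

For example, `invoke("my-invoice-endpoint", invoice_text, "Extract the invoice fields as JSON.")` would send one request; the endpoint name and prompt are placeholders.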

DongZhaoXiong · 2024-12-22T03:01:06Z

Try setting option.rolling_batch=disable.
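Applying that suggestion only requires changing one line of serving.properties. If the file is generated from a script, a minimal sketch of the edit could look like this; `set_property` is a hypothetical helper, and it assumes the plain one-entry-per-line `key=value` layout used in this thread rather than the full Java .properties syntax.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return serving.properties text with `key` set to `value`.

    Replaces the first non-comment `key=...` line in place, or appends the
    entry if the key is absent. Assumes one `key=value` entry per line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key=value entries.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        current_key = stripped.split("=", 1)[0].strip()
        if current_key == key:
            lines[i] = f"{key}={value}"
            return "\n".join(lines) + "\n"
    # Key not found: add it at the end.
    lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

For example, `set_property(props_text, "option.rolling_batch", "disable")` turns `option.rolling_batch=vllm` into `option.rolling_batch=disable` and leaves every other line unchanged.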

vishalkumardas · 2024-12-22T06:36:17Z

@DongZhaoXiong Thanks for your reply; it finally worked.
Can you also help me with optimization? Inference currently takes 7.3 seconds for a 5-page invoice.
For context, we are extracting fields from invoices. For this experiment I used a 5-page PDF document, and I want to reduce the inference time. I'm using a g5.4xlarge GPU instance.
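To check whether a change actually reduces the 7.3-second figure, it helps to time several requests rather than a single one. A minimal sketch, assuming `call` is any zero-argument function that sends one invoice to the endpoint (for instance a small wrapper around a boto3 `invoke_endpoint` call); `measure_latency` is a hypothetical helper:

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], runs: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time `call` several times and summarize wall-clock latency in seconds.

    Warm-up calls are excluded so one-off costs (e.g. lazy initialization)
    do not skew the numbers.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        call()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

Reporting the median and max alongside the mean makes occasional slow requests easier to spot when comparing configurations.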

vishalkumardas · 2024-12-30T08:57:01Z

@DongZhaoXiong @sindhuvahinis @siddvenk Today I am getting the same error again:
2024-12-30T16:53:23.419Z | [WARN ] InferenceRequestHandler - Chunk reading interrupted

Below is my serving.properties:
engine=Python
option.model_id=.
option.entryPoint=inference.py
option.tensor_parallel_degree=1
option.rolling_batch=disable
option.device_type=auto
option.dtype=fp16
option.task=text-generation
option.max_rolling_batch_size=8
option.enable_memory_optimization=true
option.model_loading_timeout=1200
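Compared with the Dec 21 configuration, this file changes only `option.rolling_batch` (vllm to disable) and `option.dtype` (bf16 to fp16). A small sketch for producing that kind of comparison automatically, assuming the same simple `key=value` format; `parse_properties` and `diff_properties` are hypothetical helpers:

```python
from typing import Dict, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple `key=value` lines, ignoring blanks and '#' comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def diff_properties(old: str, new: str) -> Dict[str, Tuple[str, str]]:
    """Map each key whose value differs to (old_value, new_value); '' marks absence."""
    a, b = parse_properties(old), parse_properties(new)
    return {
        key: (a.get(key, ""), b.get(key, ""))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

Running `diff_properties` on the two files in this thread would report exactly those two keys, which narrows down what changed between the working and failing deployments.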

vishalkumardas added the bug Something isn't working label Dec 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DJL not working for VLLM #2645

DJL not working for VLLM #2645

vishalkumardas commented Dec 21, 2024

DongZhaoXiong commented Dec 22, 2024

vishalkumardas commented Dec 22, 2024 •

edited

Loading

vishalkumardas commented Dec 30, 2024 •

edited

Loading

DJL not working for VLLM #2645

DJL not working for VLLM #2645

Comments

vishalkumardas commented Dec 21, 2024

Description

DongZhaoXiong commented Dec 22, 2024

vishalkumardas commented Dec 22, 2024 • edited Loading

vishalkumardas commented Dec 30, 2024 • edited Loading

vishalkumardas commented Dec 22, 2024 •

edited

Loading

vishalkumardas commented Dec 30, 2024 •

edited

Loading