Support LLM Inference (vLLM, OpenAI format interface) #1607
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1607      +/-   ##
==========================================
+ Coverage   80.33%   80.64%   +0.30%
==========================================
  Files          95      115      +20
  Lines        6602     8154    +1552
==========================================
+ Hits         5304     6576    +1272
- Misses       1298     1578     +280
==========================================
```

View the full report in Codecov by Sentry.
A new PR will be created next week to supplement the usage documentation.
```
@@ -42,12 +42,12 @@ class Listener(Component):
    type_id: t.ClassVar[str] = 'listener'

    def __post_init__(self):
        super().__post_init__()
        if self.identifier is None and self.model is not None:
```
Why do we need to switch this order (of the `super().__post_init__()` call)?
I added a check for `identifier` in the `__post_init__` method of `Component`. So if any subclass wants to handle a `None` identifier, it needs to fix the value of `identifier` first and then call `super().__post_init__()`.
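A minimal sketch of that pattern (class and field names are simplified here, not the exact superduperdb code):

```python
import dataclasses as dc
import typing as t


@dc.dataclass
class Component:
    identifier: t.Optional[str] = None

    def __post_init__(self):
        # The base class now validates that an identifier is set.
        if self.identifier is None:
            raise ValueError('identifier cannot be None')


@dc.dataclass
class Listener(Component):
    model: t.Optional[str] = None

    def __post_init__(self):
        # Subclasses that can derive a default identifier must do so
        # *before* delegating to the base-class check.
        if self.identifier is None and self.model is not None:
            self.identifier = f'listener/{self.model}'
        super().__post_init__()
```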
```
from superduperdb.ext.llm.openai import OpenAI
from superduperdb.ext.llm.vllm import VllmAPI, VllmModel, VllmOpenAI

__all__ = [
```
A general question: how does this fit together with the other OpenAI models which are not of the chat-completion type?
A `chat` parameter handles this. If `chat=False` is set, it will call `_prompt_generate` (`self.client.completions.create`); otherwise it will call `_chat_generate` (`self.client.chat.completions.create`).
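A hedged sketch of that dispatch using the `openai>=1.x` client (the class name and fields here are illustrative, not the PR's implementation):

```python
import dataclasses as dc
from typing import Any

import openai


@dc.dataclass
class OpenAIStyleLLM:
    model: str = 'gpt-3.5-turbo'
    chat: bool = True

    def __post_init__(self):
        # Reads OPENAI_API_KEY (or a compatible base URL) from the environment.
        self.client = openai.OpenAI()

    def _chat_generate(self, prompt: str, **kwargs: Any) -> str:
        completion = self.client.chat.completions.create(
            model=self.model,
            messages=[{'role': 'user', 'content': prompt}],
            **kwargs,
        )
        return completion.choices[0].message.content

    def _prompt_generate(self, prompt: str, **kwargs: Any) -> str:
        completion = self.client.completions.create(
            model=self.model, prompt=prompt, **kwargs
        )
        return completion.choices[0].text

    def generate(self, prompt: str, **kwargs: Any) -> str:
        # `chat` selects between the chat-completion and plain-completion endpoints.
        generate = self._chat_generate if self.chat else self._prompt_generate
        return generate(prompt, **kwargs)
```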
No, I mean we have `superduperdb.ext.openai.*`; what do we do with those models?
Oops, got it. We should continue to maintain these modules as support for OpenAI as a vendor.
```
getLogger("httpx").setLevel(WARNING)


def ensure_initialized(func):
```
Maybe we need this as a general purpose tool (not just for LLMs)?
Yes, but it is only used for LLMs for now, because in the previous design the model was instantiated first and then passed to SuperDuperDB. We can make it a general-purpose tool as part of this plan: #1604
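For reference, a general-purpose version of the idea might look roughly like this (a hypothetical sketch, not the code in this PR):

```python
import functools


def ensure_initialized(func):
    """Lazily run `self.init()` once before the decorated method executes."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, '_is_initialized', False):
            self.init()  # expensive setup, e.g. loading weights or opening a client
            self._is_initialized = True
        return func(self, *args, **kwargs)

    return wrapper
```

Applied to, say, a `predict` method, the expensive `init()` then only runs on first use rather than at construction time.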
```
    prompt_template: str = "{input}"
    prompt_func: Optional[Callable] = dc.field(default=None)
    max_batch_size: Optional[int] = 64
    inference_kwargs: dict = dc.field(default_factory=dict)
```
What is this for? Passed to native model?
Yes. Different frameworks and API providers expose different parameters on their inference/generate functions. Although they are mostly the same, I left a parameter so users can adapt to them; if they don't use it, they can ignore this parameter and the default parameters of the corresponding framework or API provider will be used.
Currently we can pass inference parameters in two ways: this is one, and the other is through `db.predict("llm", max_tokens=100, **other_inference_kwargs)`.
It will be explained in the documentation.
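A hedged usage sketch of the two ways; `inference_kwargs` appears in the diff above, but the other constructor arguments shown here are assumptions:

```python
from superduperdb.ext.llm.vllm import VllmOpenAI

# 1. Model-level defaults, applied to every generation call:
llm = VllmOpenAI(
    identifier='llm',
    inference_kwargs={'temperature': 0.2, 'max_tokens': 256},
)

# 2. Per-call parameters, passed at prediction time (they extend/override the defaults):
# db.predict('llm', 'Tell me a joke', max_tokens=100)
```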
```
        return [self._generate(prompt, **kwargs) for prompt in prompts]

    @ensure_initialized
    def _predict(self, X: Union[str, List[str]], one: bool = False, **kwargs: Any):
```
A point for later: none of the LLM implementations we have seem to support `postprocess` or `preprocess`.
```
@dc.dataclass
class BaseLLMAPI(_BaseLLM):
```
Nice!
```
        )
        return completion.choices[0].message.content

    async def _async_generate(self, semaphore, prompt: str, **kwargs) -> str:
```
Maybe now is the time to remove these `async` methods?
This asynchronous method is used in the underlying implementation for performance optimization when calling the API for batched inference. It will not be exposed to users, which is different from the scenario we discussed before. However, if we do not use it, we could provide the same optimization for batch requests in other ways; otherwise the requests would block one another.
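Roughly, the batching pattern described here (a simplified sketch, not the PR's exact code):

```python
import asyncio


async def _async_generate(semaphore: asyncio.Semaphore, prompt: str) -> str:
    # The semaphore caps concurrent API calls so a large batch doesn't overwhelm the endpoint.
    async with semaphore:
        await asyncio.sleep(0.1)  # stand-in for the real HTTP request
        return f'completion for: {prompt}'


async def batch_generate(prompts, max_concurrency: int = 8):
    semaphore = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(_async_generate(semaphore, p) for p in prompts))


# results = asyncio.run(batch_generate(['prompt 1', 'prompt 2']))
```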
Ok, gotcha.
```
@dc.dataclass
class OpenAI(BaseOpenAI):
```
Question is how to synchronize this with the other models from OpenAI, i.e. what about `ext/openai/model/*`?
In the LLM scenario, I suggest guiding users in the documentation to use the llm module. For example, if a user wants to build a RAG application, they would go to the llm directory to find the corresponding LLM interface and to the embedding directory to find the embedding interface, instead of looking under the openai module. The original OpenAI module can coexist with this.
We will still face this situation in the future, and my suggestion is to unify the underlying interfaces, such as llm and embedding, so that calls to OpenAI are reused from the `ext.openai` module.
I suggest we do this unification after completing the embedding module.
WDYT?
Ok.
```
        ]
        return await asyncio.gather(*tasks)

    def build_post_data(self, prompt: str, **kwargs: dict[str, Any]) -> dict[str, Any]:
```
Please explain this.
It combines the prompt into the request data for the vLLM API. It supports both the model's predefined parameters and the parameters passed at call time, and finally performs parameter filtering, because passing unnecessary parameters (such as the `context` introduced by SuperDuperDB, or others) would cause an error.
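In other words, something along these lines; a sketch of the filtering idea using `inspect.signature`, not the PR's exact helper:

```python
import inspect
from typing import Any, Callable, Dict


def get_kwargs(target: Callable, *kwargs_dicts: Dict[str, Any]) -> Dict[str, Any]:
    """Merge kwargs dicts (later ones win) and keep only the parameters `target`
    accepts, dropping e.g. the `context` argument that SuperDuperDB may pass along."""
    allowed = set(inspect.signature(target).parameters)
    merged: Dict[str, Any] = {}
    for kwargs in kwargs_dicts:
        merged.update(kwargs)
    return {k: v for k, v in merged.items() if k in allowed}


def build_post_data(prompt: str, defaults: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    # `accepted_params` stands in for the real vLLM API schema (e.g. SamplingParams).
    def accepted_params(n=1, temperature=1.0, max_tokens=16):
        ...

    data = get_kwargs(accepted_params, defaults, kwargs)
    data['prompt'] = prompt
    return data


# build_post_data('hello', {'temperature': 0.2}, max_tokens=100, context=['dropped'])
# -> {'temperature': 0.2, 'max_tokens': 100, 'prompt': 'hello'}
```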
```
        self.on_ray = self.on_ray or bool(self.ray_address)
        super().__post_init__()

    def init(self):
```
Maybe we should do this at the module level? I.e. the moment the user tries to import our module, we should test for the vLLM installation.
If users only need the API model, they don't need to install vLLM, so we only need to check this when `VllmModel` is used.
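A sketch of such a deferred check (the class and field names here are placeholders):

```python
class VllmModelSketch:
    model_name: str = 'facebook/opt-125m'

    def init(self):
        try:
            # Only local inference needs vLLM; API-only users never hit this path.
            from vllm import LLM
        except ImportError as e:
            raise ImportError(
                'vllm is required for local inference: pip install vllm'
            ) from e
        self.llm = LLM(model=self.model_name)
```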
Ok
```
        runtime_env = {"pip": ["vllm"]}
        if not ray.is_initialized():
            ray.init(address=self.ray_address, runtime_env=runtime_env)
```
What does this `ray.init` call do under the hood?
It connects to the remote Ray cluster; if there is none, it will start a local Ray cluster.
If we do not need to connect to a remote Ray cluster, we can set `ray_address=None`.
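For example, roughly (assuming the fields shown in the diff above):

```python
import ray

ray_address = None                 # or e.g. 'ray://head-node:10001' to join a remote cluster
runtime_env = {"pip": ["vllm"]}    # ask Ray workers to have vllm available

if not ray.is_initialized():
    # With an address this connects to the existing cluster;
    # with address=None it starts a local Ray instance instead.
    ray.init(address=ray_address, runtime_env=runtime_env)
```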
This is great. Do you think we should add a single notebook where we can try a range of LLMs in different forms:
- OpenAI
- Ray LLM
- API self-hosted LLM with vLLM
- vLLM native
I think we can write it in the documentation and then add a link to that part of the documentation in the notebook, because the variants are almost the same except for the line of code that defines the model.
But if we write a blog post, we can separate them.
Ok
```
from superduperdb.base.datalayer import Datalayer

# Disable httpx info level logging
getLogger("httpx").setLevel(WARNING)
```
Please explain.
The OpenAI package prints INFO-level logs about network requests whenever the interface is called, which is of little significance, so it is disabled here.
Ok
@jieguangzhou this was a great pull request! Thank you.
A few questions and discussion points:
- What should we do with the existing OpenAI models?
- Shall we deprecate the `async` functions now? These may become hard to maintain if we over-invest in them.
- There are a few "TODOs" in the test suite; shall we address those now?
- Is there any trivial/tiny LLM we can use in the integration tests?
Ok.
- For models present in both application-scenario modules (LLMs) and interface modules (the OpenAI module), it's advised to share a common underlying interface, like the OpenAI module. The interface modules should be kept simple and extensible, while application-scenario modules should adapt more to the specific scenario. Taking OpenAI LLMs as an example, I suggest keeping two interface implementations for now, one for the OpenAI module and one for LLMs, and turning LLMs into a separate plug-in in the future.
- On the `async` functions: by the way, we can also use multi-threading to do this (see the sketch below).
- For the TODOs in the test cases, I suggest I do that before we start supporting a large number of AI models and AI API integrations.
- I originally planned to do this and upload a very small model to our Hugging Face repository for the integration tests. However, vLLM installation requires a CUDA environment, so automated testing is not currently possible.
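A hedged sketch of the multi-threaded alternative mentioned above, where a thread pool fans out blocking per-prompt calls (`generate` here is a placeholder for the real request):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def generate(prompt: str) -> str:
    return f'completion for: {prompt}'  # placeholder for a blocking HTTP call


def batch_generate(prompts: List[str], max_workers: int = 8) -> List[str]:
    # Threads overlap the network I/O without requiring any async plumbing.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(generate, prompts))
```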
Great PR
```
            raise Exception("You must install vllm with command 'pip install ray'")

        runtime_env = {"pip": ["vllm"]}
        if not ray.is_initialized():
```
Can we somehow utilise `RayCompute` from `backends/ray/compute.py`?
The situation here is unique because vLLM inherently supports Ray. Therefore, even if SuperDuperDB doesn't use Ray as its compute backend, Ray can still be utilized here; the two are separate.
However, perhaps when SuperDuperDB uses Ray as its computation backend, it could by default use the same Ray instance, but it must be equipped with a GPU.
```
            **self.get_kwargs(SamplingParams, kwargs, self.inference_kwargs)
        )

        if self.on_ray:
```
If not `on_ray`, where does the compute happen?
The model inference will be done locally.
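To make the branch explicit, a very rough sketch (assumed structure only, not the PR's implementation):

```python
import dataclasses as dc


@dc.dataclass
class VllmDispatchSketch:
    on_ray: bool = False

    def _generate_locally(self, prompt: str) -> str:
        return f'local completion for: {prompt}'   # placeholder for the in-process vLLM engine

    def _generate_on_ray(self, prompt: str) -> str:
        return f'remote completion for: {prompt}'  # placeholder for a call dispatched to a Ray actor

    def generate(self, prompt: str) -> str:
        # Without `on_ray`, inference simply runs in the local process.
        return self._generate_on_ray(prompt) if self.on_ray else self._generate_locally(prompt)
```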
```
    assert len(results) == 2


def check_llm_as_listener_model(db, llm):
```
Is this more of a test-suite utils method?
Yes, we need to move this to the test-suite utils in the future.
#1601
#1536
Description

Related Issues

Checklist

- Ran `make unit-testing` and `make integration-testing` successfully?

Additional Notes or Comments