Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatibility between Atomic Agents and Instructor when using Gemini models #56

Open
marcortola opened this issue Jan 8, 2025 · 4 comments

Comments

@marcortola
Copy link
Contributor

When using Instructor with Gemini models, there's an incompatibility issue between Atomic Agents and Instructor's implementation. Atomic Agents always includes the model parameter in the create method call, while Instructor explicitly raises an exception when this parameter is provided for Gemini models.

This incompatibility prevents users from using Atomic Agents with Instructor when working with Gemini models, limiting the functionality and integration possibilities.

Note that while we can still use Gemini like this:

client = openai.OpenAI(api_key=gemini_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")

It would be safer in the long term to use the recommended Instructor approach as stated in its docs: https://python.useinstructor.com/integrations/google/

client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

Current Behavior

  1. Atomic Agents sets the model parameter in all cases:
response = self.client.chat.completions.create(
    messages=messages,
    model=self.model,
    response_model=response_model,
    temperature=self.temperature,
    max_tokens=self.max_tokens,
)
  1. Instructor raises an exception when the model parameter is provided with Gemini models:
    https://github.com/instructor-ai/instructor/blob/887316c8267df621b985c736a2ebba504edf2399/instructor/process_response.py#L446

Proposed Solution

Modify the base agent implementation to conditionally include the model parameter based on the client mode, without introducing any BC:

params = {
    'messages': messages,
    'response_model': response_model,
    'temperature': self.temperature,
    'max_tokens': self.max_tokens,
}

if self.client.mode != Mode.GEMINI_JSON:
    params['model'] = self.model

return self.client.chat.completions.create(**params)

Note: While this is the current behavior in Instructor, it might be worth considering a longer-term solution where Instructor handles the model parameter more gracefully (e.g., with a warning instead of an exception) for Gemini models.

I can prepare a PR if this approach is agreed upon.

@KennyVaneetvelde
Copy link
Member

KennyVaneetvelde commented Jan 8, 2025

@marcortola I was looking into this, but then decided not to make any changes since google is providing an OpenAI compatible API nowadays, which means you can just use the openAI provider, see: https://github.com/BrainBlend-AI/atomic-agents/blob/main/atomic-examples/quickstart/quickstart/4_basic_chatbot_different_providers.py

From what I tested it seemed to do what you need it to do, and I didn't want to muddy up configuration or make examples more confusing at that time, especially since with gemini you need to pass the model when executing the .run and that just didn't feel quite right to me, plus using the openai-compatible API/client seemed to work great for images as well, see this example...

That being said, if you make sure it works for images, your solution makes sense as a temporary solution, because other people will inevitably try to use from_gemini and end up either reading this or creating a ticket before realizing or me telling them it can be done with from_openai. And either way, using from_openai to use gemini is counterintuitive

For a longer term solution, I'm starting to think it might be worth abstracting the client creation to provide a better developer experience, that way internally, we could still just use the cleanest/easiest way, which would be using from_openai, while providing the developer with a more explicit way still...

Thoughts? Ideas? Pros? Cons?

@marcortola
Copy link
Contributor Author

Thanks for considering this issue and providing your perspective, @KennyVaneetvelde.

Regarding using the OpenAI adapter, while it does seem to work, I was hesitant to rely on it in production since Google has marked it as experimental. They also explicitly recommend calling the Gemini API directly if you aren't already using the OpenAI libraries. I haven't done any latency tests yet, but that was my main reasoning for not going with the from_openai approach initially.

I agree that passing the model to the run method feels weird and not quite right. It's not an ideal developer experience. I also think that adding vendor-specific code to the Atomic Agents core codebase with my proposed solution would break the OCP and could introduce maintenance difficulties down the line as we support more vendors (or if Instructor adds breaking changes). While it could work as a temporary workaround, it's not ideal for the long term.

Abstracting client creation is an interesting idea. While it solves a slightly different problem, I can see how it would enable us to implement new clients without changing the Atomic Agents user-facing code. The Atomic Agents client creation API could act as a stable contract. As an example, I'm currently using this rudimentary client abstraction in my projects:

import os
from functools import cache
import anthropic
import instructor
import openai
from instructor import Instructor

class ClientFactory:
    @classmethod
    @cache
    def create(cls, model_name: str, streaming: bool = False) -> Instructor:
        if model_name.startswith("gemini-"):
            return cls._create_gemini_client(streaming)
        elif model_name.startswith("gpt-"):
            return cls._create_openai_client(streaming)
        elif model_name.startswith("claude-"):
            return cls._create_anthropic_client(streaming)
            
        raise ValueError(f"Unsupported model: {model_name}")

    @classmethod
    def _create_gemini_client(cls, streaming: bool) -> Instructor:
        api_key = os.getenv("GEMINI_API_KEY")
        if not api_key:
            raise ValueError("GEMINI_API_KEY environment variable is not set")

        if streaming:
            client = openai.AsyncOpenAI(
                api_key=api_key,
                base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
            )
        else:
            client = openai.OpenAI(
                api_key=api_key,
                base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
            )
        return instructor.from_openai(client, mode=instructor.Mode.JSON)

    @classmethod
    def _create_openai_client(cls, streaming: bool) -> Instructor:
        api_key = os.getenv("OPENAI_API_KEY")
        if not api_key:
            raise ValueError("OPENAI_API_KEY environment variable is not set")

        if streaming:
            client = openai.AsyncOpenAI(api_key=api_key)
        else:
            client = openai.OpenAI(api_key=api_key)
        return instructor.from_openai(client)

    @classmethod
    def _create_anthropic_client(cls, streaming: bool) -> Instructor:
        api_key = os.getenv("ANTHROPIC_API_KEY")
        if not api_key:
            raise ValueError("ANTHROPIC_API_KEY environment variable is not set")

        if streaming:
            client = anthropic.AsyncAnthropic(api_key=api_key)
        else:
            client = anthropic.Anthropic(api_key=api_key)
        return instructor.from_anthropic(client)


# Usage example:
client = ClientFactory.create("gpt-4-turbo-preview")  # or "gemini-1.5-pro", "claude-3-opus"

Perhaps we could:

  1. Keep using the OpenAI adapter short-term since it works.
  2. Add a warning when users try the native Gemini approach pointing them to the OpenAI adapter, or implement the conditional model approach I mention in the issue as a temporary fix
  3. Work on a proper client abstraction and usage as a longer-term solution.

Would love to hear your thoughts on this approach.

@KennyVaneetvelde
Copy link
Member

@marcortola My first thoughts are that this seems like a nice approach, let me think on it for a bit and get back to you!

@KennyVaneetvelde
Copy link
Member

@marcortola Thought about it, I am liking this approach, but then I thought "what about using Ollama? And what about people who use something like LMStudio which is sliiiightly different"

And then I also started thinking about custom GPT-4o finetunes and the like..

Perhaps what we should do is:

  • Have an enum of providers
  • Use this factory pattern, but the signature is something like: ClientFactory.create(provider=atomic_agents.Provider.OPEN_AI, model='gpt-4o-mini', streaming=True, [api_key=optional api key, base_url=Optional base URL])
  • The api key is optional, the factory does os.getenv as in your example but if set in the param it is overwritten
  • The base URL is only used for now when the Provider enum would be LOCAL

That way we can have the control of your original proposal, with more configurability in case of local models/other servers/custom finetunes/...

Does this make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants